Streaming

Streaming is a critical feature of the Agent system -- it allows users to see partial output as the AI generates its reply, rather than waiting for the complete response. OpenClaw's streaming involves delta buffering, block merging, and multiple callbacks.

Streaming Architecture

Delta Processing Pipeline

Each delta (text fragment) received from the model passes through the following steps:

  1. Append to deltaBuffer
  2. Update inline code state (awareness of code blocks)
  3. Detect block state markers (<thinking>, <final>)
  4. Push to BlockChunker for merging
  5. Trigger onPartialReply callback
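
A minimal sketch of this per-delta pipeline, in TypeScript, is shown below. Only deltaBuffer, BlockChunker, and the callback names come from the tutorial; the interface shapes, method names, and the DeltaPipeline class are assumptions made for illustration.

```typescript
// Hypothetical shapes -- only the names deltaBuffer, BlockChunker,
// onPartialReply, and onBlockReply appear in the tutorial itself.
interface BlockChunker {
  updateCodeState(delta: string): void;  // step 2: inline code awareness
  detectMarkers(delta: string): void;    // step 3: <thinking> / <final> markers
  push(delta: string): string | null;    // step 4: returns a merged block when one is ready
}

interface StreamCallbacks {
  onPartialReply?: (delta: string) => void;
  onBlockReply?: (block: string) => void;
}

class DeltaPipeline {
  private deltaBuffer = "";

  constructor(
    private chunker: BlockChunker,
    private callbacks: StreamCallbacks,
  ) {}

  // Called once for every text fragment received from the model.
  handleDelta(delta: string): void {
    this.deltaBuffer += delta;               // 1. append to deltaBuffer
    this.chunker.updateCodeState(delta);     // 2. update inline code state
    this.chunker.detectMarkers(delta);       // 3. detect block state markers
    const block = this.chunker.push(delta);  // 4. push to BlockChunker for merging
    if (block !== null) {
      this.callbacks.onBlockReply?.(block);
    }
    this.callbacks.onPartialReply?.(delta);  // 5. trigger onPartialReply
  }
}
```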

Inline Code State

Streaming needs to be aware of Markdown code block state to avoid breaking messages in the middle of code.

While inside a code block, the chunker will not force-emit -- it waits for the code block to close.
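
One way such a tracker could be implemented is sketched below. This is an assumption for illustration, not OpenClaw's actual code: it toggles on every complete triple-backtick fence and lets the chunker ask whether force-emitting is currently safe.

```typescript
// Illustrative inline code state tracker (hypothetical implementation).
class InlineCodeState {
  private insideCodeBlock = false;
  private pending = ""; // carries backticks of a fence split across deltas

  update(delta: string): void {
    this.pending += delta;
    let index: number;
    // Toggle the state once per complete ``` fence seen so far.
    while ((index = this.pending.indexOf("```")) !== -1) {
      this.insideCodeBlock = !this.insideCodeBlock;
      this.pending = this.pending.slice(index + 3);
    }
    // Keep only the tail that could still complete a fence on the next delta.
    this.pending = this.pending.slice(-2);
  }

  // The chunker may only force-emit while outside a code block.
  canForceEmit(): boolean {
    return !this.insideCodeBlock;
  }
}
```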

Reasoning Modes

The Agent supports three reasoning modes:

| Mode | Behavior | Use Case |
| --- | --- | --- |
| "off" | Discard thinking blocks | Default, fast response |
| "on" | Include thinking in final output | Show full reasoning process |
| "stream" | Send thinking via separate callback | Real-time reasoning display |

Callback Interface

Streaming delivers data to the upper layers through callbacks:

| Callback | Triggered When | Purpose |
| --- | --- | --- |
| onBlockReply | After BlockChunker merge | Deliver meaningful text blocks |
| onPartialReply | After each delta | Typewriter effect |
| onReasoningStream | Thinking content (mode=stream) | Real-time reasoning display |
| onBlockReplyFlush | Before tool call / on completion | Force flush buffer |
| onToolResult | After tool execution | Notify upper layer |
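
Collected into a single TypeScript interface, the callback set might look like the following; the parameter types are assumptions inferred from the table.

```typescript
// Possible shape of the streaming callback interface (parameter types assumed).
interface AgentStreamCallbacks {
  /** After each delta -- drives the typewriter effect. */
  onPartialReply?: (delta: string) => void;
  /** After the BlockChunker merges a meaningful text block. */
  onBlockReply?: (block: string) => void;
  /** Thinking content, only when the reasoning mode is "stream". */
  onReasoningStream?: (thinking: string) => void;
  /** Before a tool call / on completion -- force-flush buffered text. */
  onBlockReplyFlush?: () => void;
  /** After a tool finishes executing -- notify the upper layer. */
  onToolResult?: (result: unknown) => void;
}
```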

Callback Timing Sequence

Key timing rules:

  1. onPartialReply fires immediately after each delta (for typewriter effect)
  2. onBlockReply fires after BlockChunker merges content (for actual delivery)
  3. onBlockReplyFlush fires before tool calls to force output (maintains message boundaries)
  4. onToolResult fires after tool execution completes
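
As a small illustration of this ordering, the hypothetical listener set below simply logs each event; the trailing comment shows the sequence the rules above imply for a turn that contains one tool call.

```typescript
// Hypothetical logging listeners used only to observe callback ordering.
const listeners = {
  onPartialReply: (delta: string) => console.log("partial:", delta),
  onBlockReply: (block: string) => console.log("block:", block),
  onBlockReplyFlush: () => console.log("flush (tool boundary)"),
  onToolResult: (result: unknown) => console.log("tool result:", result),
};

// Expected order for "some text, then one tool call, then more text":
//   partial, partial, ...   (fires on every delta)
//   block                   (chunker emits a merged block)
//   flush (tool boundary)   (forced before the tool call)
//   tool result             (after the tool finishes)
//   partial, ..., block     (streaming resumes after the tool)
```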

Tool Call Boundary Flush

Before a tool call, the streaming system force-flushes all buffered content. This ensures the user has received all pending text before seeing "executing tool..." feedback.

This boundary strategy makes the message flow clearer for the user -- there is an explicit dividing line between the AI's text response and tool operations.
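
A sketch of that boundary flush is shown below, assuming a chunker that exposes a flush() method; apart from the callback names, every identifier here is hypothetical.

```typescript
// Hypothetical flush-then-execute wrapper around a tool call.
async function runToolCall(
  chunker: { flush(): string },
  callbacks: {
    onBlockReply?: (block: string) => void;
    onBlockReplyFlush?: () => void;
    onToolResult?: (result: unknown) => void;
  },
  executeTool: () => Promise<unknown>,
): Promise<void> {
  // Deliver any buffered text before the tool boundary...
  const remaining = chunker.flush();
  if (remaining.length > 0) {
    callbacks.onBlockReply?.(remaining);
  }
  callbacks.onBlockReplyFlush?.();

  // ...so the "executing tool..." feedback never interleaves with pending text.
  const result = await executeTool();
  callbacks.onToolResult?.(result);
}
```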

Summary

  • Streaming consists of three layers: Delta buffering -> Block merging -> Callback dispatch
  • Inline Code State tracks code blocks to avoid breaking in the middle of code
  • Reasoning modes (off / on / stream) control the visibility of thinking blocks
  • Five callback functions cover the complete streaming lifecycle
  • Force flush before tool calls keeps message boundaries clear

Next: Plugin System Overview
