Streaming
Streaming is a critical feature of the Agent system -- it allows users to see partial output as the AI generates its reply, rather than waiting for the complete response. OpenClaw's streaming involves delta buffering, block merging, and multiple callbacks.
Streaming Architecture
Delta Processing Pipeline
Each delta (text fragment) received from the model passes through the following steps:
- Append to deltaBuffer
- Update inline code state (awareness of code blocks)
- Detect block state markers (<thinking>, <final>)
- Push to BlockChunker for merging
- Trigger the onPartialReply callback (a sketch of these steps follows the list)
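A minimal TypeScript sketch of these per-delta steps is shown below. The names here (StreamingState, handleDelta, the BlockChunker.push signature) are illustrative assumptions rather than the actual OpenClaw API, and marker detection is only stubbed out:

```typescript
// Illustrative sketch of the per-delta pipeline; not the real OpenClaw code.
interface BlockChunker {
  /** Returns zero or more merged blocks that are ready to deliver. */
  push(delta: string, insideCode: boolean): string[];
}

interface StreamCallbacks {
  onPartialReply?: (bufferedText: string) => void;
  onBlockReply?: (block: string) => void;
}

class StreamingState {
  private deltaBuffer = "";
  private insideCodeFence = false;

  constructor(
    private chunker: BlockChunker,
    private callbacks: StreamCallbacks,
  ) {}

  handleDelta(delta: string): void {
    // 1. Append the delta to the buffer.
    this.deltaBuffer += delta;

    // 2. Update inline code state by counting ``` fences seen in this delta.
    const fences = (delta.match(/```/g) ?? []).length;
    if (fences % 2 === 1) this.insideCodeFence = !this.insideCodeFence;

    // 3. Block state markers such as <thinking> / <final> would be detected
    //    here and routed by reasoning mode (omitted in this sketch).

    // 4. Push the delta to the block chunker; deliver any merged blocks.
    for (const block of this.chunker.push(delta, this.insideCodeFence)) {
      this.callbacks.onBlockReply?.(block);
    }

    // 5. Trigger onPartialReply with the accumulated text.
    this.callbacks.onPartialReply?.(this.deltaBuffer);
  }
}
```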
Inline Code State
Streaming needs to track Markdown code-block state so that messages are not broken in the middle of code. While the stream is inside a code block, the chunker will not force-emit -- it waits for the code block to close.
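A minimal sketch of that tracking, assuming triple-backtick fences and simplified line-based detection (the CodeFenceTracker name is hypothetical):

```typescript
// Tracks whether streamed Markdown is currently inside a ``` code fence.
// Deliberately simplified: inline backticks and ~~~ fences are ignored.
class CodeFenceTracker {
  private insideFence = false;
  private pending = ""; // carries a partial line across deltas

  update(delta: string): void {
    this.pending += delta;
    let newlineIndex: number;
    // Only complete lines can toggle the fence state.
    while ((newlineIndex = this.pending.indexOf("\n")) !== -1) {
      const line = this.pending.slice(0, newlineIndex);
      this.pending = this.pending.slice(newlineIndex + 1);
      if (line.trimStart().startsWith("```")) {
        this.insideFence = !this.insideFence;
      }
    }
  }

  /** The chunker should not force-emit while this returns true. */
  get insideCodeBlock(): boolean {
    return this.insideFence;
  }
}
```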
Reasoning Modes
The Agent supports three reasoning modes:
| Mode | Behavior | Use Case |
|---|---|---|
"off" | Discard thinking blocks | Default, fast response |
"on" | Include thinking in final output | Show full reasoning process |
"stream" | Send thinking via separate callback | Real-time reasoning display |
Callback Interface
Streaming delivers data to the upper layers through callbacks:
| Callback | Triggered When | Purpose |
|---|---|---|
| onBlockReply | After BlockChunker merge | Deliver meaningful text blocks |
| onPartialReply | After each delta | Typewriter effect |
| onReasoningStream | Thinking content (mode=stream) | Real-time reasoning display |
| onBlockReplyFlush | Before tool call / on completion | Force flush buffer |
| onToolResult | After tool execution | Notify upper layer |
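Collected into a single TypeScript interface, the callback surface might look roughly like this; the callback names and trigger points come from the table above, while the parameter types are assumptions:

```typescript
// Assumed shapes; the real OpenClaw signatures may differ.
interface StreamingCallbacks {
  /** Fired after BlockChunker merges content; delivers meaningful text blocks. */
  onBlockReply?: (block: string) => void;
  /** Fired after every delta; drives the typewriter effect. */
  onPartialReply?: (partialText: string) => void;
  /** Fired for thinking content when the reasoning mode is "stream". */
  onReasoningStream?: (thinking: string) => void;
  /** Fired before a tool call or on completion; forces the buffer to flush. */
  onBlockReplyFlush?: () => void;
  /** Fired after a tool finishes executing; notifies the upper layer. */
  onToolResult?: (toolName: string, result: unknown) => void;
}
```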
Callback Timing Sequence
Key timing rules:
- onPartialReply fires immediately after each delta (for typewriter effect)
- onBlockReply fires after BlockChunker merges content (for actual delivery)
- onBlockReplyFlush fires before tool calls to force output (maintains message boundaries)
- onToolResult fires after tool execution completes
Tool Call Boundary Flush
Before a tool call, the streaming system force-flushes all buffered content. This ensures the user has received all pending text before seeing "executing tool..." feedback.
This boundary strategy makes the message flow clearer for the user -- there is an explicit dividing line between the AI's text response and tool operations.
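A sketch of that boundary handling, assuming a hypothetical chunker.flush() method and executeTool helper (neither name is taken from OpenClaw):

```typescript
interface FlushCallbacks {
  onBlockReply?: (block: string) => void;
  onBlockReplyFlush?: () => void;
  onToolResult?: (toolName: string, result: unknown) => void;
}

// Illustrative ordering around a single tool call.
async function runToolWithFlush(
  chunker: { flush(): string[] },
  callbacks: FlushCallbacks,
  toolName: string,
  executeTool: () => Promise<unknown>,
): Promise<void> {
  // 1. Deliver everything still buffered, then signal the flush, so the user
  //    has all pending text before any "executing tool..." feedback appears.
  for (const block of chunker.flush()) {
    callbacks.onBlockReply?.(block);
  }
  callbacks.onBlockReplyFlush?.();

  // 2. Execute the tool and notify the upper layer with the result.
  const result = await executeTool();
  callbacks.onToolResult?.(toolName, result);
}
```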
Summary
- Streaming consists of three layers: Delta buffering -> Block merging -> Callback dispatch
- Inline Code State tracks code blocks to avoid breaking in the middle of code
- Reasoning modes (off / on / stream) control the visibility of thinking blocks
- Five callback functions cover the complete streaming lifecycle
- Force flush before tool calls keeps message boundaries clear
Next: Plugin System Overview