Streaming

Streaming is a critical feature of the Agent system -- it allows users to see partial output as the AI generates its reply, rather than waiting for the complete response. OpenClaw's streaming involves delta buffering, block merging, and multiple callbacks.

Streaming Architecture

Delta Processing Pipeline

Each delta (text fragment) received from the model passes through the following steps:

  1. Append to deltaBuffer
  2. Update inline code state (awareness of code blocks)
  3. Detect block state markers (<thinking>, <final>)
  4. Push to BlockChunker for merging
  5. Trigger onPartialReply callback
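
A minimal sketch of this per-delta pipeline, in TypeScript, is shown below. Only deltaBuffer, BlockChunker, and the callback names come from the tutorial; the interface shapes, method names, and the DeltaPipeline class are assumptions made for illustration.

```typescript
// Hypothetical shapes -- only the names deltaBuffer, BlockChunker,
// onPartialReply, and onBlockReply appear in the tutorial itself.
interface BlockChunker {
  updateCodeState(delta: string): void;  // step 2: inline code awareness
  detectMarkers(delta: string): void;    // step 3: <thinking> / <final> markers
  push(delta: string): string | null;    // step 4: returns a merged block when one is ready
}

interface StreamCallbacks {
  onPartialReply?: (delta: string) => void;
  onBlockReply?: (block: string) => void;
}

class DeltaPipeline {
  private deltaBuffer = "";

  constructor(
    private chunker: BlockChunker,
    private callbacks: StreamCallbacks,
  ) {}

  // Called once for every text fragment received from the model.
  handleDelta(delta: string): void {
    this.deltaBuffer += delta;               // 1. append to deltaBuffer
    this.chunker.updateCodeState(delta);     // 2. update inline code state
    this.chunker.detectMarkers(delta);       // 3. detect block state markers
    const block = this.chunker.push(delta);  // 4. push to BlockChunker for merging
    if (block !== null) {
      this.callbacks.onBlockReply?.(block);
    }
    this.callbacks.onPartialReply?.(delta);  // 5. trigger onPartialReply
  }
}
```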

Inline Code State

Streaming needs to be aware of Markdown code block state to avoid breaking messages in the middle of code.

While inside a code block, the chunker will not force-emit -- it waits for the code block to close.
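
One way such a tracker could be implemented is sketched below. This is an assumption for illustration, not OpenClaw's actual code: it toggles on every complete triple-backtick fence and lets the chunker ask whether force-emitting is currently safe.

```typescript
// Illustrative inline code state tracker (hypothetical implementation).
class InlineCodeState {
  private insideCodeBlock = false;
  private pending = ""; // carries backticks of a fence split across deltas

  update(delta: string): void {
    this.pending += delta;
    let index: number;
    // Toggle the state once per complete ``` fence seen so far.
    while ((index = this.pending.indexOf("```")) !== -1) {
      this.insideCodeBlock = !this.insideCodeBlock;
      this.pending = this.pending.slice(index + 3);
    }
    // Keep only the tail that could still complete a fence on the next delta.
    this.pending = this.pending.slice(-2);
  }

  // The chunker may only force-emit while outside a code block.
  canForceEmit(): boolean {
    return !this.insideCodeBlock;
  }
}
```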

Reasoning Modes

The Agent supports three reasoning modes:

| Mode | Behavior | Use Case |
| --- | --- | --- |
| "off" | Discard thinking blocks | Default, fast response |
| "on" | Include thinking in final output | Show full reasoning process |
| "stream" | Send thinking via separate callback | Real-time reasoning display |

Callback Interface

Streaming delivers data to the upper layers through callbacks:

| Callback | Triggered When | Purpose |
| --- | --- | --- |
| onBlockReply | After BlockChunker merge | Deliver meaningful text blocks |
| onPartialReply | After each delta | Typewriter effect |
| onReasoningStream | Thinking content (mode=stream) | Real-time reasoning display |
| onBlockReplyFlush | Before tool call / on completion | Force flush buffer |
| onToolResult | After tool execution | Notify upper layer |
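
Collected into a single TypeScript interface, the callback set might look like the following; the parameter types are assumptions inferred from the table.

```typescript
// Possible shape of the streaming callback interface (parameter types assumed).
interface AgentStreamCallbacks {
  /** After each delta -- drives the typewriter effect. */
  onPartialReply?: (delta: string) => void;
  /** After the BlockChunker merges a meaningful text block. */
  onBlockReply?: (block: string) => void;
  /** Thinking content, only when the reasoning mode is "stream". */
  onReasoningStream?: (thinking: string) => void;
  /** Before a tool call / on completion -- force-flush buffered text. */
  onBlockReplyFlush?: () => void;
  /** After a tool finishes executing -- notify the upper layer. */
  onToolResult?: (result: unknown) => void;
}
```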

Callback Timing Sequence

Key timing rules:

  1. onPartialReply fires immediately after each delta (for typewriter effect)
  2. onBlockReply fires after BlockChunker merges content (for actual delivery)
  3. onBlockReplyFlush fires before tool calls to force output (maintains message boundaries)
  4. onToolResult fires after tool execution completes
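
As a small illustration of this ordering, the hypothetical listener set below simply logs each event; the trailing comment shows the sequence the rules above imply for a turn that contains one tool call.

```typescript
// Hypothetical logging listeners used only to observe callback ordering.
const listeners = {
  onPartialReply: (delta: string) => console.log("partial:", delta),
  onBlockReply: (block: string) => console.log("block:", block),
  onBlockReplyFlush: () => console.log("flush (tool boundary)"),
  onToolResult: (result: unknown) => console.log("tool result:", result),
};

// Expected order for "some text, then one tool call, then more text":
//   partial, partial, ...   (fires on every delta)
//   block                   (chunker emits a merged block)
//   flush (tool boundary)   (forced before the tool call)
//   tool result             (after the tool finishes)
//   partial, ..., block     (streaming resumes after the tool)
```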

Tool Call Boundary Flush

Before a tool call, the streaming system force-flushes all buffered content. This ensures the user has received all pending text before seeing "executing tool..." feedback.

This boundary strategy makes the message flow clearer for the user -- there is an explicit dividing line between the AI's text response and tool operations.
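
A sketch of that boundary flush is shown below, assuming a chunker that exposes a flush() method; apart from the callback names, every identifier here is hypothetical.

```typescript
// Hypothetical flush-then-execute wrapper around a tool call.
async function runToolCall(
  chunker: { flush(): string },
  callbacks: {
    onBlockReply?: (block: string) => void;
    onBlockReplyFlush?: () => void;
    onToolResult?: (result: unknown) => void;
  },
  executeTool: () => Promise<unknown>,
): Promise<void> {
  // Deliver any buffered text before the tool boundary...
  const remaining = chunker.flush();
  if (remaining.length > 0) {
    callbacks.onBlockReply?.(remaining);
  }
  callbacks.onBlockReplyFlush?.();

  // ...so the "executing tool..." feedback never interleaves with pending text.
  const result = await executeTool();
  callbacks.onToolResult?.(result);
}
```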

Summary

  • Streaming consists of three layers: Delta buffering -> Block merging -> Callback dispatch
  • Inline Code State tracks code blocks to avoid breaking in the middle of code
  • Reasoning modes (off / on / stream) control the visibility of thinking blocks
  • Five callback functions cover the complete streaming lifecycle
  • Force flush before tool calls keeps message boundaries clear

Next: Plugin System Overview
