Media Processing

OpenClaw's media processing pipeline handles the reception, conversion, and delivery of images, audio, video, and documents. Each channel imposes its own media limits, so the pipeline sits between inbound media and channel delivery as a unified adaptation layer.

Source Location

src/media/
├── store.ts             # Media storage and retrieval
├── fetch.ts             # URL media download
├── image-ops.ts         # Image operations (resize, convert)
├── mime.ts              # MIME type detection
├── parse.ts             # Media metadata parsing
├── png-encode.ts        # PNG encoding
├── input-files.ts       # File input handling
├── constants.ts         # Media constants and limits
└── *.test.ts            # Tests

src/media-understanding/   # AI media understanding
├── image-understanding.ts
├── video-understanding.ts
└── ...

Processing Pipeline

Image Operations

Source: src/media/image-ops.ts

Image processing is based on the sharp library:

typescript

// Image operations (simplified)
// - Resize to fit channel constraints
// - Convert between formats (JPEG, PNG, WebP)
// - Extract metadata (dimensions, format)
// - Apply size caps
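As an illustration of the "resize to fit channel constraints" step, the sketch below computes target dimensions that keep the aspect ratio while bounding the longer side. It is not taken from src/media/image-ops.ts: the `fitWithin` helper, the `Dimensions` type, and the 2048-pixel bound used in the usage note are all hypothetical.

```typescript
// Hypothetical helper, not the actual contents of src/media/image-ops.ts.
interface Dimensions {
  width: number;
  height: number;
}

/**
 * Scale dimensions down so the longer side is at most `maxSide`,
 * preserving aspect ratio. Images already within the bound are returned
 * unchanged (no upscaling).
 */
function fitWithin(source: Dimensions, maxSide: number): Dimensions {
  if (maxSide <= 0) {
    throw new RangeError("maxSide must be positive");
  }
  const longest = Math.max(source.width, source.height);
  if (longest <= maxSide) {
    return { width: source.width, height: source.height };
  }
  const scale = maxSide / longest;
  return {
    // Clamp to 1 so extreme aspect ratios never round down to 0 pixels.
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
  };
}
```

For example, `fitWithin({ width: 4000, height: 3000 }, 2048)` yields 2048 by 1536. With sharp, a comparable result can be obtained directly through `resize` with `fit: "inside"` and `withoutEnlargement: true`; whether OpenClaw does it that way is not stated here.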

Per-Channel Image Limits

Channel	Max Size	Supported Formats
Telegram	10MB (image), 50MB (file)	JPEG, PNG, WebP, GIF
Discord	25MB (standard), 100MB (Nitro)	Common formats
WhatsApp	16MB	JPEG, PNG
Slack	Depends on plan	Common formats
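One way to use a table like this is as a lookup that runs before delivery. The sketch below encodes the image limits from the table above; the names (`IMAGE_LIMITS`, `checkImage`), the binary definition of a megabyte, and the choice to treat "common formats" and "depends on plan" as unconstrained are assumptions for illustration, not the contents of src/media/constants.ts.

```typescript
// Assumption: 1 MB = 1024 * 1024 bytes. The table does not say which unit is meant.
const MB = 1024 * 1024;

type Channel = "telegram" | "discord" | "whatsapp" | "slack";

interface ChannelImageLimit {
  // Left undefined when the limit depends on the plan (Slack).
  maxImageBytes?: number;
  // Left undefined when the table only says "common formats".
  formats?: readonly string[];
}

// Hypothetical constant mirroring the table; the Discord entry uses the standard tier.
const IMAGE_LIMITS: Record<Channel, ChannelImageLimit> = {
  telegram: {
    maxImageBytes: 10 * MB,
    formats: ["image/jpeg", "image/png", "image/webp", "image/gif"],
  },
  discord: { maxImageBytes: 25 * MB },
  whatsapp: { maxImageBytes: 16 * MB, formats: ["image/jpeg", "image/png"] },
  slack: {},
};

type Verdict = "ok" | "too-large" | "unsupported-format";

/** Decide whether an image can be sent as-is or needs conversion or resizing first. */
function checkImage(channel: Channel, mime: string, sizeBytes: number): Verdict {
  const limit = IMAGE_LIMITS[channel];
  if (limit.formats !== undefined && limit.formats.indexOf(mime) === -1) {
    return "unsupported-format";
  }
  if (limit.maxImageBytes !== undefined && sizeBytes > limit.maxImageBytes) {
    return "too-large";
  }
  return "ok";
}
```

A caller could convert the format on "unsupported-format" and resize or recompress on "too-large" before retrying the check.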

MIME Detection

Source: src/media/mime.ts

typescript

// MIME detection
// - File extension mapping
// - Magic byte detection
// - Content-Type header parsing
// - Fallback to application/octet-stream
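The four strategies above can be chained into a single function. The sketch below is a minimal illustration and not the code in src/media/mime.ts: the precedence (magic bytes, then Content-Type, then extension) is an assumption, and `detectMime`, `MimeHints`, and the small extension map are hypothetical names.

```typescript
const FALLBACK_MIME = "application/octet-stream";

// Hypothetical input shape: any subset of the hints may be available.
interface MimeHints {
  data?: Uint8Array;
  contentType?: string;
  filename?: string;
}

// Deliberately small map for illustration; a real table would be much larger.
const EXTENSION_MAP: Record<string, string> = {
  jpg: "image/jpeg", jpeg: "image/jpeg", png: "image/png", gif: "image/gif",
  webp: "image/webp", mp3: "audio/mpeg", mp4: "video/mp4", pdf: "application/pdf",
};

function hasBytes(data: Uint8Array, offset: number, expected: number[]): boolean {
  if (data.length < offset + expected.length) return false;
  return expected.every((byte, i) => data[offset + i] === byte);
}

/** Recognize a few well-known file signatures. */
function sniffMagic(data: Uint8Array): string | undefined {
  if (hasBytes(data, 0, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (hasBytes(data, 0, [0xff, 0xd8, 0xff])) return "image/jpeg";
  if (hasBytes(data, 0, [0x47, 0x49, 0x46, 0x38])) return "image/gif"; // "GIF8"
  // WebP is a RIFF container: "RIFF", 4-byte size, then "WEBP".
  if (hasBytes(data, 0, [0x52, 0x49, 0x46, 0x46]) && hasBytes(data, 8, [0x57, 0x45, 0x42, 0x50])) {
    return "image/webp";
  }
  return undefined;
}

function detectMime(hints: MimeHints): string {
  // 1. Content wins over metadata, since names and headers can be wrong.
  if (hints.data) {
    const sniffed = sniffMagic(hints.data);
    if (sniffed) return sniffed;
  }
  // 2. Content-Type header, with parameters such as "; charset=utf-8" dropped.
  if (hints.contentType) {
    const semi = hints.contentType.indexOf(";");
    const raw = semi >= 0 ? hints.contentType.slice(0, semi) : hints.contentType;
    const essence = raw.trim().toLowerCase();
    if (essence !== "" && essence !== FALLBACK_MIME) return essence;
  }
  // 3. File extension mapping.
  if (hints.filename) {
    const dot = hints.filename.lastIndexOf(".");
    const ext = dot >= 0 ? hints.filename.slice(dot + 1).toLowerCase() : "";
    if (Object.prototype.hasOwnProperty.call(EXTENSION_MAP, ext)) {
      return EXTENSION_MAP[ext] as string;
    }
  }
  // 4. Nothing matched.
  return FALLBACK_MIME;
}
```

Checking the bytes first means a PNG uploaded as `photo.jpg` is still reported as `image/png`, which matters when a channel only accepts specific formats.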

Media Understanding

Source: src/media-understanding/

The media understanding module enables the AI Agent to "see" and "hear" media content:

Capability	Description
Image understanding	Uses vision models to analyze image content
Video understanding	Extracts key frames, then runs visual analysis on them
Audio transcription	Speech-to-text via Whisper integration

Temporary File Management

Media files are stored in a temporary directory with lifecycle management:

typescript

// Temporary file lifecycle
// - Created during processing
// - TTL-based cleanup
// - Channel-specific size caps
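TTL-based cleanup can be sketched as a sweep that compares each file's creation time against a time-to-live. The code below is an assumption-laden illustration rather than the store's actual logic: `TempFile`, `sweepExpired`, the injected `remove` callback, and the 60-second TTL in the usage note are all hypothetical.

```typescript
// Hypothetical record of a temporary media file.
interface TempFile {
  path: string;
  createdAtMs: number;
}

/**
 * Remove every file whose age has reached `ttlMs` and return the survivors.
 * Deletion is injected as a callback so the sweep logic stays independent
 * of the filesystem and can be tested with a fixed clock.
 */
function sweepExpired(
  files: readonly TempFile[],
  ttlMs: number,
  nowMs: number,
  remove: (path: string) => void,
): TempFile[] {
  const kept: TempFile[] = [];
  for (const file of files) {
    if (nowMs - file.createdAtMs >= ttlMs) {
      remove(file.path);
    } else {
      kept.push(file);
    }
  }
  return kept;
}
```

With a 60-second TTL, a sweep at t = 70 s would remove a file created at t = 0 s and keep one created at t = 50 s. In practice the callback would delete the file from disk and the sweep would run on a timer.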

Summary

sharp-based image processing (resize, format conversion)
MIME detection supports file extension, magic bytes, and Content-Type
Media understanding enables AI to analyze images, video, and audio content
Different channels have different size and format limits
Temporary files have TTL-based lifecycle management

Next: Security Model
