Media Processing

OpenClaw's media processing pipeline handles receiving, converting, and delivering images, audio, video, and documents. Each channel imposes its own media limits, and the pipeline acts as a single adaptation layer between incoming media and those limits.

Source Location

src/media/
├── store.ts             # Media storage and retrieval
├── fetch.ts             # URL media download
├── image-ops.ts         # Image operations (resize, convert)
├── mime.ts              # MIME type detection
├── parse.ts             # Media metadata parsing
├── png-encode.ts        # PNG encoding
├── input-files.ts       # File input handling
├── constants.ts         # Media constants and limits
└── *.test.ts            # Tests

src/media-understanding/   # AI media understanding
├── image-understanding.ts
├── video-understanding.ts
└── ...

Processing Pipeline

Image Operations

Source: src/media/image-ops.ts

Image processing is based on the sharp library:

```typescript
// Image operations (simplified)
// - Resize to fit channel constraints
// - Convert between formats (JPEG, PNG, WebP)
// - Extract metadata (dimensions, format)
// - Apply size caps
```
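
The "resize to fit" step can be illustrated with a small sketch. This is not taken from src/media/image-ops.ts; `Size` and `fitWithin` are hypothetical names, and the assumption is that images are only ever scaled down, never up, with the aspect ratio preserved.

```typescript
// Hypothetical helper for illustration; not the actual image-ops.ts code.
interface Size {
  width: number;
  height: number;
}

/**
 * Compute the dimensions an image should be resized to so that it fits
 * inside `max` while keeping its aspect ratio. Never upscales.
 */
function fitWithin(src: Size, max: Size): Size {
  if (src.width <= 0 || src.height <= 0) {
    throw new RangeError("source dimensions must be positive");
  }
  // The smallest ratio wins; capping at 1 prevents enlargement.
  const scale = Math.min(1, max.width / src.width, max.height / src.height);
  return {
    width: Math.max(1, Math.round(src.width * scale)),
    height: Math.max(1, Math.round(src.height * scale)),
  };
}

// A 4000x3000 photo constrained to a 1280x1280 box becomes 1280x960.
const target = fitWithin({ width: 4000, height: 3000 }, { width: 1280, height: 1280 });
console.log(target);
```

With sharp, the same behavior is usually obtained by passing `fit: 'inside'` and `withoutEnlargement: true` to `resize`, so a helper like this is mainly useful for deciding whether a resize is needed at all.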

Per-Channel Image Limits

| Channel | Max Size | Supported Formats |
|---|---|---|
| Telegram | 10 MB (image), 50 MB (file) | JPEG, PNG, WebP, GIF |
| Discord | 25 MB (standard), 100 MB (Nitro) | Common formats |
| WhatsApp | 16 MB | JPEG, PNG |
| Slack | Depends on plan | Common formats |
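
One way to apply these limits in code is a lookup table keyed by channel. The sketch below is an assumption about shape, not the contents of src/media/constants.ts: `LIMITS`, `planDelivery`, and the return values are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, Slack is left out because its limit depends on the plan, and Discord's Nitro tier is ignored.

```typescript
// Hypothetical limits table; the numbers come from the table above.
type Channel = "telegram" | "discord" | "whatsapp";

interface ChannelLimit {
  imageBytes: number; // largest payload that can be sent as an image
  fileBytes?: number; // larger cap when sent as a generic file, if any
}

const MB = 1024 * 1024; // assumption: binary megabytes

const LIMITS: Record<Channel, ChannelLimit> = {
  telegram: { imageBytes: 10 * MB, fileBytes: 50 * MB },
  discord: { imageBytes: 25 * MB },
  whatsapp: { imageBytes: 16 * MB },
};

type Delivery = "image" | "file" | "too-large";

/** Decide how (or whether) an image of `sizeBytes` can be delivered. */
function planDelivery(channel: Channel, sizeBytes: number): Delivery {
  const limit = LIMITS[channel];
  if (sizeBytes <= limit.imageBytes) return "image";
  if (limit.fileBytes !== undefined && sizeBytes <= limit.fileBytes) return "file";
  return "too-large";
}

console.log(planDelivery("telegram", 20 * MB)); // "file"
```

A "too-large" result is where a pipeline would typically fall back to resizing or recompressing before retrying.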

MIME Detection

Source: src/media/mime.ts

```typescript
// MIME detection
// - File extension mapping
// - Magic byte detection
// - Content-Type header parsing
// - Fallback to application/octet-stream
```
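
The detection strategies listed above could be combined roughly as follows. This is a minimal sketch, not the code in src/media/mime.ts: `detectMime`, `sniffMagic`, and `EXTENSION_MAP` are hypothetical names, only a handful of formats are covered, and the precedence (magic bytes, then Content-Type, then extension) is an assumption, since the order is not stated here.

```typescript
// Hypothetical sketch; the real mime.ts may order or implement this differently.
const EXTENSION_MAP: Record<string, string> = {
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  png: "image/png",
  webp: "image/webp",
  gif: "image/gif",
  pdf: "application/pdf",
};

/** Identify a few common formats from their leading bytes. */
function sniffMagic(bytes: Uint8Array): string | undefined {
  const at = (sig: number[], offset = 0): boolean =>
    sig.every((b, i) => bytes[offset + i] === b);
  if (at([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (at([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (at([0x47, 0x49, 0x46, 0x38])) return "image/gif"; // "GIF8"
  // WebP: "RIFF" at offset 0 and "WEBP" at offset 8.
  if (at([0x52, 0x49, 0x46, 0x46]) && at([0x57, 0x45, 0x42, 0x50], 8)) return "image/webp";
  if (at([0x25, 0x50, 0x44, 0x46])) return "application/pdf"; // "%PDF"
  return undefined;
}

interface MimeInput {
  bytes?: Uint8Array;
  contentType?: string;
  filename?: string;
}

function detectMime(input: MimeInput): string {
  // 1. Magic bytes: the content itself is the most reliable signal.
  const sniffed = input.bytes ? sniffMagic(input.bytes) : undefined;
  if (sniffed) return sniffed;
  // 2. Content-Type header, with parameters such as charset stripped.
  const header = input.contentType?.split(";")[0]?.trim().toLowerCase();
  if (header && header !== "application/octet-stream") return header;
  // 3. File extension mapping.
  const dot = input.filename?.lastIndexOf(".") ?? -1;
  if (input.filename && dot >= 0) {
    const mapped = EXTENSION_MAP[input.filename.slice(dot + 1).toLowerCase()];
    if (mapped) return mapped;
  }
  // 4. Fallback.
  return "application/octet-stream";
}
```

Checking content before metadata means a JPEG uploaded as `photo.png` is still treated as a JPEG, which matters when format support differs per channel.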

Media Understanding

Source: src/media-understanding/

The media understanding module enables the AI Agent to "see" and "hear" media content:

| Capability | Description |
|---|---|
| Image understanding | Uses vision models to analyze image content |
| Video understanding | Extracts key frames + visual analysis |
| Audio transcription | Whisper integration, speech-to-text |
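
As a rough illustration of how incoming media might be routed to one of these capabilities, the sketch below dispatches on the top-level MIME type. `Capability` and `capabilityFor` are hypothetical names; how src/media-understanding/ actually selects a handler is not shown here.

```typescript
// Hypothetical routing sketch; not the module's actual dispatch logic.
type Capability =
  | "image-understanding"
  | "video-understanding"
  | "audio-transcription";

/** Map a MIME type to the capability that could handle it, if any. */
function capabilityFor(mime: string): Capability | undefined {
  const major = mime.split("/")[0]?.trim().toLowerCase();
  switch (major) {
    case "image":
      return "image-understanding";
    case "video":
      return "video-understanding";
    case "audio":
      return "audio-transcription";
    default:
      return undefined; // e.g. documents are not covered by this sketch
  }
}

console.log(capabilityFor("audio/ogg")); // "audio-transcription"
```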

Temporary File Management

Media files are stored in a temporary directory with lifecycle management:

```typescript
// Temporary file lifecycle
// - Created during processing
// - TTL-based cleanup
// - Channel-specific size caps
```
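
TTL-based cleanup comes down to comparing each file's age against a time-to-live. The sketch below shows only that selection step under stated assumptions: `TempEntry` and `partitionExpired` are hypothetical names, timestamps are in milliseconds, and an entry whose age equals the TTL is treated as expired.

```typescript
// Hypothetical sketch of the expiry check; not the actual store.ts logic.
interface TempEntry {
  path: string;
  createdAtMs: number;
}

interface Partition {
  expired: TempEntry[];
  live: TempEntry[];
}

/** Split entries into those past their TTL and those still valid. */
function partitionExpired(
  entries: readonly TempEntry[],
  nowMs: number,
  ttlMs: number,
): Partition {
  const expired: TempEntry[] = [];
  const live: TempEntry[] = [];
  for (const entry of entries) {
    const age = nowMs - entry.createdAtMs;
    (age >= ttlMs ? expired : live).push(entry);
  }
  return { expired, live };
}

const { expired } = partitionExpired(
  [
    { path: "/tmp/a.jpg", createdAtMs: 0 },
    { path: "/tmp/b.jpg", createdAtMs: 9_000 },
  ],
  10_000, // now
  5_000, // TTL
);
console.log(expired.map((e) => e.path)); // ["/tmp/a.jpg"]
```

Keeping the check pure makes it easy to test with a fixed clock; a periodic sweep would then delete each expired path, for example with Node's `fs.promises.rm(path, { force: true })`.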

Summary

  • sharp-based image processing (resize, format conversion)
  • MIME detection supports file extension, magic bytes, and Content-Type
  • Media understanding enables AI to analyze images, video, and audio content
  • Different channels have different size and format limits
  • Temporary files have TTL-based lifecycle management

OpenClaw Source Code Tutorial