Media Processing
OpenClaw's media processing pipeline handles the reception, conversion, and delivery of images, audio, video, and documents. Because each channel imposes its own media limits, the pipeline serves as a unified adaptation layer between incoming media and channel-specific delivery.
Source Location
```
src/media/
├── store.ts                  # Media storage and retrieval
├── fetch.ts                  # URL media download
├── image-ops.ts              # Image operations (resize, convert)
├── mime.ts                   # MIME type detection
├── parse.ts                  # Media metadata parsing
├── png-encode.ts             # PNG encoding
├── input-files.ts            # File input handling
├── constants.ts              # Media constants and limits
└── *.test.ts                 # Tests
src/media-understanding/      # AI media understanding
├── image-understanding.ts
├── video-understanding.ts
└── ...
```
Processing Pipeline
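The reception, conversion, and delivery stages can be sketched as a single planning step that compares a received item against a channel's limits and decides whether to deliver it as-is, convert it, or reject it. The types and the `planDelivery` function below are hypothetical illustrations, not OpenClaw's actual API; the sketch also assumes that only images are converted (re-encoded or downscaled), while other media kinds that do not fit are rejected.

```typescript
// Hypothetical types for illustration only; not OpenClaw's actual API.
type MediaKind = "image" | "audio" | "video" | "document";

interface IncomingMedia {
  mime: string;  // detected MIME type, e.g. "image/webp"
  bytes: number; // payload size in bytes
}

interface ChannelLimits {
  maxBytes: number;        // largest payload the channel accepts
  acceptedMimes: string[]; // MIME types the channel can deliver natively
}

type DeliveryPlan =
  | { action: "deliver" }                 // send as-is
  | { action: "transcode"; to: string }   // re-encode and/or downscale first
  | { action: "reject"; reason: string }; // cannot be adapted in this sketch

// Map a MIME type to the broad media kind used for routing.
function kindOf(mime: string): MediaKind {
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("video/")) return "video";
  return "document";
}

// Compare one received item against a channel's limits and decide how to adapt it.
function planDelivery(media: IncomingMedia, limits: ChannelLimits): DeliveryPlan {
  const formatOk = limits.acceptedMimes.includes(media.mime);
  const sizeOk = media.bytes <= limits.maxBytes;
  if (formatOk && sizeOk) return { action: "deliver" };

  // Assumption: only images are converted here (format change and/or resize).
  if (kindOf(media.mime) === "image") {
    const target = formatOk
      ? media.mime
      : limits.acceptedMimes.find((m) => m.startsWith("image/"));
    if (target) return { action: "transcode", to: target };
  }
  return { action: "reject", reason: formatOk ? "too large" : "unsupported format" };
}
```

For example, with limits `{ maxBytes: 16 * 1024 * 1024, acceptedMimes: ["image/jpeg", "image/png"] }`, a small WebP image yields `{ action: "transcode", to: "image/jpeg" }`, because WebP is not accepted and JPEG is the first accepted image type.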
Image Operations
Source: src/media/image-ops.ts
Image processing is based on the sharp library:
```typescript
// Image operations (simplified)
// - Resize to fit channel constraints
// - Convert between formats (JPEG, PNG, WebP)
// - Extract metadata (dimensions, format)
// - Apply size caps
```
Per-Channel Image Limits
| Channel | Max Size | Supported Formats |
|---|---|---|
| Telegram | 10MB (image), 50MB (file) | JPEG, PNG, WebP, GIF |
| Discord | 25MB (standard), 100MB (Nitro) | Common formats |
| | 16MB | JPEG, PNG |
| Slack | Depends on plan | Common formats |
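Because the limits above are expressed in bytes while resizing works on pixel dimensions, a resize step has to translate a byte budget into target dimensions. A minimal sketch, assuming that encoded size scales roughly with pixel count (a heuristic, not something the source specifies); `fitInside` and `scaleForByteCap` are hypothetical helper names:

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Shrink (width, height) to fit inside a maxEdge x maxEdge box, preserving the
// aspect ratio and never enlarging. This approximates sharp's
// resize({ fit: "inside", withoutEnlargement: true }) behavior.
function fitInside(dim: Dimensions, maxEdge: number): Dimensions {
  const scale = Math.min(1, maxEdge / dim.width, maxEdge / dim.height);
  return {
    width: Math.max(1, Math.round(dim.width * scale)),
    height: Math.max(1, Math.round(dim.height * scale)),
  };
}

// Heuristic first guess for an oversized image: encoded size grows roughly with
// pixel count, so scaling each side by sqrt(maxBytes / bytes) targets the cap.
function scaleForByteCap(dim: Dimensions, bytes: number, maxBytes: number): Dimensions {
  if (bytes <= maxBytes) return dim;
  const factor = Math.sqrt(maxBytes / bytes);
  return {
    width: Math.max(1, Math.floor(dim.width * factor)),
    height: Math.max(1, Math.floor(dim.height * factor)),
  };
}
```

In a sharp-based pipeline, the resulting dimensions could be passed to `sharp(input).resize(width, height, { fit: "inside", withoutEnlargement: true })`. Since compression ratios vary by content, the re-encoded output would still need to be checked against the channel cap afterward.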
MIME Detection
Source: src/media/mime.ts
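MIME detection combines file-extension mapping, magic-byte sniffing, and Content-Type header parsing, with `application/octet-stream` as the fallback. A minimal sketch, assuming magic bytes take precedence over the extension, which in turn takes precedence over the header (the source lists these signals but does not state their order); `detectMime`, the signature table, and the extension map are illustrative, not OpenClaw's actual code:

```typescript
// Hypothetical detector. Each signature is one or more byte runs at fixed offsets.
type Signature = { mime: string; parts: { offset: number; bytes: number[] }[] };

const SIGNATURES: Signature[] = [
  { mime: "image/png", parts: [{ offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] }] },
  { mime: "image/jpeg", parts: [{ offset: 0, bytes: [0xff, 0xd8, 0xff] }] },
  { mime: "image/gif", parts: [{ offset: 0, bytes: [0x47, 0x49, 0x46, 0x38] }] }, // "GIF8"
  // WebP is a RIFF container: "RIFF" at offset 0 and "WEBP" at offset 8.
  {
    mime: "image/webp",
    parts: [
      { offset: 0, bytes: [0x52, 0x49, 0x46, 0x46] },
      { offset: 8, bytes: [0x57, 0x45, 0x42, 0x50] },
    ],
  },
  { mime: "application/pdf", parts: [{ offset: 0, bytes: [0x25, 0x50, 0x44, 0x46] }] }, // "%PDF"
];

const EXTENSIONS = new Map<string, string>([
  ["png", "image/png"], ["jpg", "image/jpeg"], ["jpeg", "image/jpeg"],
  ["gif", "image/gif"], ["webp", "image/webp"], ["pdf", "application/pdf"],
  ["mp3", "audio/mpeg"], ["mp4", "video/mp4"],
]);

const FALLBACK = "application/octet-stream";

// True when every byte run of the signature appears at its offset.
function matches(data: Uint8Array, sig: Signature): boolean {
  return sig.parts.every(
    ({ offset, bytes }) =>
      offset + bytes.length <= data.length &&
      bytes.every((b, i) => data[offset + i] === b),
  );
}

function detectMime(data: Uint8Array, fileName?: string, contentType?: string): string {
  // 1. Magic bytes: the content itself is the strongest signal.
  const sniffed = SIGNATURES.find((sig) => matches(data, sig));
  if (sniffed) return sniffed.mime;

  // 2. File extension (case-insensitive).
  const dot = fileName ? fileName.lastIndexOf(".") : -1;
  if (fileName && dot >= 0) {
    const byExt = EXTENSIONS.get(fileName.slice(dot + 1).toLowerCase());
    if (byExt) return byExt;
  }

  // 3. Content-Type header, without parameters such as "; charset=utf-8".
  const header = (contentType ?? "").split(";")[0]?.trim().toLowerCase() ?? "";
  if (header.includes("/") && header !== FALLBACK) return header;

  // 4. Nothing usable: fall back to a generic binary type.
  return FALLBACK;
}
```

Sniffing first means a PNG saved as `photo.jpg` is still reported as `image/png`, which matters when a channel only accepts certain formats.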
```typescript
// MIME detection
// - File extension mapping
// - Magic byte detection
// - Content-Type header parsing
// - Fallback to application/octet-stream
```
Media Understanding
Source: src/media-understanding/
The media understanding module enables the AI Agent to "see" and "hear" media content:
| Capability | Description |
|---|---|
| Image understanding | Uses vision models to analyze image content |
| Video understanding | Extracts key frames and analyzes them visually |
| Audio transcription | Speech-to-text via Whisper integration |
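The capabilities in the table can be reached by routing on the media's MIME type. A minimal sketch with hypothetical names (`planUnderstanding`, `sampleFrameTimes`); it uses evenly spaced frame timestamps as a simple stand-in for key-frame extraction, since the source does not say how key frames are chosen, and it omits the actual vision-model and Whisper calls:

```typescript
// Hypothetical task description handed to a vision model or a transcriber.
type UnderstandingTask =
  | { kind: "vision"; frames: number[] } // timestamps in seconds; [] for still images
  | { kind: "transcription" };

// Pick `count` evenly spaced timestamps strictly inside (0, duration), skipping
// the very first and last frames, which are often black or transitional.
function sampleFrameTimes(durationSec: number, count: number): number[] {
  if (durationSec <= 0 || count <= 0) return [];
  const step = durationSec / (count + 1);
  return Array.from({ length: count }, (_, i) => Number(((i + 1) * step).toFixed(3)));
}

// Route a media item to the matching understanding capability.
function planUnderstanding(
  mime: string,
  durationSec = 0,
  frameCount = 4,
): UnderstandingTask | null {
  if (mime.startsWith("image/")) return { kind: "vision", frames: [] };
  if (mime.startsWith("video/")) {
    return { kind: "vision", frames: sampleFrameTimes(durationSec, frameCount) };
  }
  if (mime.startsWith("audio/")) return { kind: "transcription" };
  return null; // no understanding strategy for other types in this sketch
}
```

For a 10-second video and four frames, the sampled timestamps are 2, 4, 6, and 8 seconds. Scene-change detection would pick more informative frames at the cost of decoding the whole video.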
Temporary File Management
Media files are stored in a temporary directory, and each file goes through a managed lifecycle from creation to cleanup.
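TTL-based cleanup can be sketched as a registry that records when each temporary file was created and periodically removes the ones that have outlived their TTL. This is a minimal sketch under stated assumptions: `TempMediaRegistry` is a hypothetical name, the TTL value is not given in the source, and the actual deletion is delegated to a caller-supplied function so the expiry logic stays independent of the file system.

```typescript
// Hypothetical registry; OpenClaw's real cleanup mechanism may differ.
interface TempEntry {
  path: string;
  createdAt: number; // epoch milliseconds
}

class TempMediaRegistry {
  private readonly entries = new Map<string, TempEntry>();
  private readonly ttlMs: number;
  private readonly remove: (path: string) => void;

  // `remove` performs the actual deletion (e.g. a wrapper around fs.rmSync).
  constructor(ttlMs: number, remove: (path: string) => void) {
    this.ttlMs = ttlMs;
    this.remove = remove;
  }

  // Record a temporary file created during processing.
  track(path: string, now: number = Date.now()): void {
    this.entries.set(path, { path, createdAt: now });
  }

  // Delete every file whose age has reached the TTL; returns the removed paths.
  sweep(now: number = Date.now()): string[] {
    const expired: string[] = [];
    this.entries.forEach((entry) => {
      if (now - entry.createdAt >= this.ttlMs) expired.push(entry.path);
    });
    for (const path of expired) {
      this.remove(path);
      this.entries.delete(path);
    }
    return expired;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A caller might construct it as `new TempMediaRegistry(ttlMs, (p) => fs.rmSync(p, { force: true }))` and call `sweep()` from a `setInterval` timer. Injecting `now` into `track` and `sweep` keeps the expiry rule testable without real clocks or files.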
```typescript
// Temporary file lifecycle
// - Created during processing
// - TTL-based cleanup
// - Channel-specific size caps
```
Summary
- sharp-based image processing (resize, format conversion)
- MIME detection supports file extension, magic bytes, and Content-Type
- Media understanding enables AI to analyze images, video, and audio content
- Different channels have different size and format limits
- Temporary files have TTL-based lifecycle management