OpenClaw
Source summary of OpenClaw, a personal AI assistant platform that runs on your own devices and answers on channels you already use.
Problem and audience
A self-hosted, single-user AI assistant that isn’t trapped in a browser tab. Targets power users who want their assistant reachable on WhatsApp, Telegram, Discord, Slack, Signal, iMessage, and 18+ other platforms simultaneously — with voice, tools, and a visual canvas — while keeping all data local. Competes with cloud-hosted assistants by being provider-agnostic (Anthropic, OpenAI, Vertex, XAI, Gemini, and custom endpoints) and giving the operator full control over security, routing, and tool policy.
Architecture overview
Gateway control plane
The core is a single long-lived WebSocket daemon at ws://127.0.0.1:18789. All state flows through this gateway: channel connections, agent sessions, tool execution, cron jobs, config changes, and event broadcasting. Clients (CLI, macOS app, web Control UI, iOS/Android nodes) connect over a typed RPC protocol with three frame types: request ({type:"req", id, method, params}), response ({type:"res", id, ok, payload|error}), and event ({type:"event", event, payload}).
The protocol defines 100+ methods organized by domain (agent, channels, chat, config, cron, devices, doctor, models, nodes, sessions, skills, system, etc.). Authorization is layered: role-based (operator vs node), scope-based (read/write/admin/pairing), and rate-limited on write methods (3 per 60s for config.apply, config.patch, update.run).
Connection handshake uses a challenge-response flow: gateway sends a nonce, client responds with role, scopes, and signed device fingerprint. Auth modes include shared-secret tokens, Tailscale identity from TLS cert, trusted-proxy headers, and none for loopback-only.
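A minimal sketch of the shared-secret variant, assuming an HMAC-signed nonce response; the actual wire format and signature scheme are not specified here, so function names and the choice of HMAC-SHA256 are assumptions:

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical challenge-response: the gateway issues a random nonce and the
// client returns HMAC(secret, nonce) along with its role and scopes.
function makeChallenge(): string {
  return randomBytes(16).toString("hex");
}

function signChallenge(secret: string, nonce: string): string {
  return createHmac("sha256", secret).update(nonce).digest("hex");
}

function verifyChallenge(secret: string, nonce: string, signature: string): boolean {
  const expected = Buffer.from(signChallenge(secret, nonce), "hex");
  const got = Buffer.from(signature, "hex");
  // Constant-time comparison avoids leaking the expected MAC via timing.
  return expected.length === got.length && timingSafeEqual(expected, got);
}
```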
Channel plugin system
24+ messaging platforms implemented as typed adapter plugins. The ChannelPlugin type defines 20+ optional adapter interfaces — pure composition, no inheritance:
- gateway — start/stop account connections, QR login flows
- outbound — send formatted payloads to the platform
- config — account resolution and validation
- setup/setupWizard — onboarding and configuration flows
- pairing — DM access control with pairing codes
- security — per-channel security policy
- groups — group message routing and mention handling
- threading — conversation thread binding across platforms
- streaming — real-time response streaming support
- agentTools — channel-specific tools exposed to the agent
- heartbeat, allowlist, doctor, lifecycle, commands, etc.
Each channel implementation (Telegram, WhatsApp, Discord, Slack, etc.) composes only the adapters it needs. Discovery via openclaw.plugin.json manifests. Registry caches plugins by version with deduplication and deterministic ordering.
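The composition style can be sketched as plain optional fields on a plugin object; the interface names and shapes below are assumptions modeled on the adapter list above, not the real SDK types:

```typescript
// Two of the optional adapter interfaces, illustratively narrowed.
interface OutboundAdapter {
  send(accountId: string, text: string): Promise<void>;
}
interface PairingAdapter {
  issueCode(senderId: string): string;
}

// A channel plugin is pure composition: every adapter is optional.
interface ChannelPlugin {
  id: string;
  outbound?: OutboundAdapter;
  pairing?: PairingAdapter;
  // ...other optional adapters (gateway, config, groups, streaming, ...)
}

// A channel composes only the adapters it needs: no base class, no inheritance.
const echoChannel: ChannelPlugin = {
  id: "echo",
  outbound: {
    async send(accountId, text) {
      console.log(`[${accountId}] ${text}`);
    },
  },
};
```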
This is a concrete implementation of the channel adapter pattern and a strong example of the multi-channel AI gateway architecture.
Plugin SDK boundary
Extensions import only from openclaw/plugin-sdk/* (10+ public subpaths). Core src/** is off-limits. This strict boundary enables third-party plugins without coupling to internal implementation. The 55+ bundled plugins in the extensions/ workspace follow the same rules as external plugins, serving as reference implementations.
When a bundled plugin needs a new seam, the contract flows: add a typed SDK facade first, then consume it. No hardcoded extension IDs in core — everything goes through manifest metadata, capabilities, and registry lookups.
Agent runtime
The embedded “Pi” agent runtime runs inference in-process (not as an external service). Execution pipeline:
- Command ingress — loads config, resolves session and agent metadata, builds workspace skill snapshot
- Model resolution — selects provider + model with auth profile rotation and fallback chains
- Inference loop — LLM call, tool dispatch, budget enforcement (tokens/turns), context compaction when approaching limits
- Post-turn — session persistence to .openclaw/sessions/*.jsonl, usage tracking, ACP event logging
Lane-based concurrency serializes execution within a lane (global, session, subagent) while running lanes in parallel.
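A minimal sketch of lane serialization via per-lane promise chaining, assuming lanes are keyed by string IDs such as "global" or "session:<id>"; this is one common way to implement the described behavior, not necessarily OpenClaw's:

```typescript
// Each lane keeps a tail promise; new tasks chain onto it, so tasks in the
// same lane run one at a time while distinct lanes run concurrently.
const lanes = new Map<string, Promise<unknown>>();

function runInLane<T>(lane: string, task: () => Promise<T>): Promise<T> {
  const tail = lanes.get(lane) ?? Promise.resolve();
  const next = tail.then(task, task); // run even if the previous task failed
  lanes.set(lane, next.catch(() => undefined)); // keep the chain alive on error
  return next;
}
```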
Skills platform
Markdown-based procedural memory, same pattern as Hermes Agent. Skills are SKILL.md files with YAML frontmatter, discovered from user (~/.skills/), project (./.openclaw/skills/), and plugin-owned directories. At agent invocation, the skills index is injected into the system prompt as XML:
<available_skills>
<skill>
<name>docker</name>
<description>Run containers</description>
<location>~/.skills/docker/SKILL.md</location>
</skill>
</available_skills>
Skill filtering applies exposure rules, agent-level allowlists, and capability/binary availability on the target node.
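On disk, a skill consistent with the injected fields above might look like the following sketch; the frontmatter keys are assumptions inferred from the XML index, not a documented schema:

```markdown
---
name: docker
description: Run containers
---

# Docker

1. Check the daemon is up: `docker info`.
2. Run a throwaway container: `docker run --rm -it alpine sh`.
```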
Native companion apps
macOS (Swift), iOS (Swift), and Android (Kotlin) connect as capability “nodes” over the gateway WebSocket. Nodes provide capabilities the gateway can’t: camera, screen, location, voice. Voice Wake provides always-on wake words on macOS/iOS. Talk Mode enables continuous voice conversation on Android. Live Canvas + A2UI gives the agent a visual workspace to render into.
Notable patterns and approaches
Lazy loading boundaries — Heavy modules live behind *.runtime.ts files with dynamic imports. Channel registration stays fast (~100ms) because send, monitor, and probe code loads on demand. The CLAUDE.md enforces: never mix static and dynamic imports for the same heavy module.
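The memoized dynamic-import pattern behind such a boundary can be sketched generically; here the heavy module is simulated by an async factory standing in for a hypothetical `import("./channel.runtime.js")`:

```typescript
// Memoize a dynamic load so the heavy module is imported at most once,
// on first use, no matter how many callers race to it.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load());
}

let loads = 0;
const getRuntime = lazy(async () => {
  loads++; // stands in for the one-time cost of importing *.runtime.ts
  return { send: (text: string) => `sent: ${text}` };
});
```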
Prompt cache stability — Ordering of maps, sets, registries, plugin lists, and MCP catalogs is determinized before building model requests. Context compaction mutates newest content first so the cached prefix stays byte-identical across turns.
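Determinization can be as simple as sorting registry entries with a byte-stable comparator before serialization; a sketch of the technique, not OpenClaw's code:

```typescript
// Sort map entries by key so the serialized form is identical regardless of
// insertion order. A plain < / > comparison is byte-stable (unlike
// localeCompare, whose order can vary by locale).
function stableEntries<T>(registry: Map<string, T>): Array<[string, T]> {
  return [...registry.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}
```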
JSON5 config with includes — Configuration stored as JSON5 at ~/.openclaw/config.json with environment variable substitution (${VAR}), recursive includes for splitting large configs, zod validation at load time, legacy migration paths, and automatic backup rotation.
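A hypothetical fragment showing those features together; the key names are illustrative, not the real schema:

```json5
// ~/.openclaw/config.json (JSON5: comments and trailing commas allowed)
{
  channels: {
    // environment variable substitution at load time
    telegram: { botToken: "${TELEGRAM_BOT_TOKEN}" },
  },
  // split a large config via includes (path and key are illustrative)
  include: ["./agents.json5"],
}
```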
DM pairing security — Unknown senders on real messaging platforms receive a short pairing code. The bot won’t process their message until the operator approves with openclaw pairing approve <channel> <code>. Public inbound requires explicit opt-in (dmPolicy="open" + "*" in allowlist).
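The gating logic can be sketched as follows; this is an illustrative model of the described flow, not the actual implementation:

```typescript
// Unknown senders get a short pairing code and their messages are held;
// once the operator approves the code, the sender's messages are processed.
const approved = new Set<string>();
const pending = new Map<string, string>(); // senderId -> pairing code

function onInbound(senderId: string): "process" | "hold" {
  if (approved.has(senderId)) return "process";
  if (!pending.has(senderId)) {
    const code = Math.random().toString(36).slice(2, 8).toUpperCase();
    pending.set(senderId, code); // the code is sent back to the sender
  }
  return "hold";
}

function approve(code: string): boolean {
  for (const [sender, c] of pending) {
    if (c === code) {
      pending.delete(sender);
      approved.add(sender);
      return true;
    }
  }
  return false;
}
```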
Plugin auto-enable — Plugins activate automatically based on config hints (e.g., adding a Telegram token enables the Telegram plugin) without requiring hardcoded extension IDs in core.
Key dependencies
- @anthropic-ai/sdk + openai — dual-provider LLM clients
- Baileys — WhatsApp Web session (unofficial)
- grammY — Telegram Bot API
- discord.js — Discord client
- @slack/bolt — Slack integration
- hono — lightweight HTTP framework for webhook/health endpoints
- zod — runtime validation at config and API boundaries
- TypeBox + AJV — JSON Schema for gateway protocol
- sharp — image processing in media pipeline
Connections
- Architecturally parallel to Hermes Agent: same multi-platform gateway + skills + tools + session pattern, different language (TypeScript vs Python) and different design philosophy (composition-based adapters vs inheritance-based ABC). OpenClaw has stricter plugin boundaries and a richer native companion app story; Hermes Agent has a built-in learning loop and RL training infrastructure.
- The channel adapter pattern is a typed variant of the plugin composition approach described in managed agents architecture — tools and capabilities compose without inheritance.
- Skills system parallels Hermes Agent’s SKILL.md format and the agent learning loop pattern, though OpenClaw’s skills are not self-created by the agent (they’re authored by users or shipped by plugins).
- The gateway-as-control-plane design echoes the brain-hands decoupling pattern: reasoning (agent runtime) and execution (channel adapters, tool backends, nodes) are separated behind stable typed interfaces.
- Context compaction mirrors context window compression — auto-summarizing when approaching limits, preserving recent context, keeping cached prefixes stable. OpenClaw adds staged summarization for large histories and an identifier preservation policy.
- The dreaming system (memory-core plugin) implements sleep-phase memory consolidation — three-phase offline consolidation (light/REM/deep) with evidence accumulation thresholds before durable promotion. This is the offline complement to the agent learning loop.
- Tool loop detection uses content-aware result hashing to distinguish legitimate polling from stuck loops, with four independent detectors and an escalating response (warn -> critical -> circuit break).
- The agent exec policy implements three-axis human-in-the-loop tool approval (security x ask x fallback) with fail-closed composition across host and session policies.
- Two-tier model failover extends the credential pool pattern: auth profile rotation (inner loop) + model fallback chain (outer loop) + cooldown probing near expiry.
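The nested loops can be sketched as follows; names and error handling are illustrative, and cooldown probing is omitted:

```typescript
type Attempt = (model: string, profile: string) => Promise<string>;

// Outer loop walks the model fallback chain; inner loop rotates auth
// profiles for the current model. The last error is rethrown only after
// every (model, profile) pair has been tried.
async function callWithFailover(
  models: string[],
  profiles: string[],
  attempt: Attempt,
): Promise<string> {
  let lastErr: unknown;
  for (const model of models) {
    for (const profile of profiles) {
      try {
        return await attempt(model, profile);
      } catch (err) {
        lastErr = err; // e.g. rate limit or expired credential; try the next pair
      }
    }
  }
  throw lastErr;
}
```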