OpenClaw
Source summary of OpenClaw, a personal AI assistant platform that runs on your own devices and answers on channels you already use.
Problem and audience
A self-hosted, single-user AI assistant that isn’t trapped in a browser tab. Targets power users who want their assistant reachable on WhatsApp, Telegram, Discord, Slack, Signal, iMessage, and 18+ other platforms simultaneously — with voice, tools, and a visual canvas — while keeping all data local. Competes with cloud-hosted assistants by being provider-agnostic (Anthropic, OpenAI, Vertex, XAI, Gemini, and custom endpoints) and giving the operator full control over security, routing, and tool policy.
Architecture overview
Gateway control plane
The core is a single long-lived WebSocket daemon at ws://127.0.0.1:18789. All state flows through this gateway: channel connections, agent sessions, tool execution, cron jobs, config changes, and event broadcasting. Clients (CLI, macOS app, web Control UI, iOS/Android nodes) connect over a typed RPC protocol with three frame types: request ({type:"req", id, method, params}), response ({type:"res", id, ok, payload|error}), and event ({type:"event", event, payload}).
The protocol defines 100+ methods organized by domain (agent, channels, chat, config, cron, devices, doctor, models, nodes, sessions, skills, system, etc.). Authorization is layered: role-based (operator vs node), scope-based (read/write/admin/pairing), and rate-limited on write methods (3 per 60s for config.apply, config.patch, update.run).
Connection handshake uses a challenge-response flow: gateway sends a nonce, client responds with role, scopes, and signed device fingerprint. Auth modes include shared-secret tokens, Tailscale identity from TLS cert, trusted-proxy headers, and none for loopback-only.
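A minimal sketch of the shared-secret variant, assuming an HMAC-signed nonce response; the actual wire format and signature scheme are not specified here, so function names and the choice of HMAC-SHA256 are assumptions:

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical challenge-response: the gateway issues a random nonce and the
// client returns HMAC(secret, nonce) along with its role and scopes.
function makeChallenge(): string {
  return randomBytes(16).toString("hex");
}

function signChallenge(secret: string, nonce: string): string {
  return createHmac("sha256", secret).update(nonce).digest("hex");
}

function verifyChallenge(secret: string, nonce: string, signature: string): boolean {
  const expected = Buffer.from(signChallenge(secret, nonce), "hex");
  const got = Buffer.from(signature, "hex");
  // Constant-time comparison avoids leaking the expected MAC via timing.
  return expected.length === got.length && timingSafeEqual(expected, got);
}
```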
Channel plugin system
24+ messaging platforms implemented as typed adapter plugins. The ChannelPlugin type defines 20+ optional adapter interfaces — pure composition, no inheritance:
- gateway — start/stop account connections, QR login flows
- outbound — send formatted payloads to the platform
- config — account resolution and validation
- setup/setupWizard — onboarding and configuration flows
- pairing — DM access control with pairing codes
- security — per-channel security policy
- groups — group message routing and mention handling
- threading — conversation thread binding across platforms
- streaming — real-time response streaming support
- agentTools — channel-specific tools exposed to the agent
- heartbeat, allowlist, doctor, lifecycle, commands, etc.
Each channel implementation (Telegram, WhatsApp, Discord, Slack, etc.) composes only the adapters it needs. Discovery via openclaw.plugin.json manifests. Registry caches plugins by version with deduplication and deterministic ordering.
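The composition style can be sketched as plain optional fields on a plugin object; the interface names and shapes below are assumptions modeled on the adapter list above, not the real SDK types:

```typescript
// Two of the optional adapter interfaces, illustratively narrowed.
interface OutboundAdapter {
  send(accountId: string, text: string): Promise<void>;
}
interface PairingAdapter {
  issueCode(senderId: string): string;
}

// A channel plugin is pure composition: every adapter is optional.
interface ChannelPlugin {
  id: string;
  outbound?: OutboundAdapter;
  pairing?: PairingAdapter;
  // ...other optional adapters (gateway, config, groups, streaming, ...)
}

// A channel composes only the adapters it needs: no base class, no inheritance.
const echoChannel: ChannelPlugin = {
  id: "echo",
  outbound: {
    async send(accountId, text) {
      console.log(`[${accountId}] ${text}`);
    },
  },
};
```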
This is a concrete implementation of the channel adapter pattern and a strong example of the multi-channel AI gateway architecture.
Plugin SDK boundary
Extensions import only from openclaw/plugin-sdk/* (10+ public subpaths). Core src/** is off-limits. This strict boundary enables third-party plugins without coupling to internal implementation. The 55+ bundled plugins in the extensions/ workspace follow the same rules as external plugins, serving as reference implementations.
When a bundled plugin needs a new seam, the contract flows: add a typed SDK facade first, then consume it. No hardcoded extension IDs in core — everything goes through manifest metadata, capabilities, and registry lookups.
Agent runtime
The embedded “Pi” agent runtime runs inference in-process (not as an external service). Execution pipeline:
- Command ingress — loads config, resolves session and agent metadata, builds workspace skill snapshot
- Model resolution — selects provider + model with auth profile rotation and fallback chains
- Inference loop — LLM call, tool dispatch, budget enforcement (tokens/turns), context compaction when approaching limits
- Post-turn — session persistence to .openclaw/sessions/*.jsonl, usage tracking, ACP event logging
Lane-based concurrency serializes execution within a lane (global, session, subagent) while running lanes in parallel.
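A minimal sketch of lane serialization via per-lane promise chaining, assuming lanes are keyed by string IDs such as "global" or "session:<id>"; this is one common way to implement the described behavior, not necessarily OpenClaw's:

```typescript
// Each lane keeps a tail promise; new tasks chain onto it, so tasks in the
// same lane run one at a time while distinct lanes run concurrently.
const lanes = new Map<string, Promise<unknown>>();

function runInLane<T>(lane: string, task: () => Promise<T>): Promise<T> {
  const tail = lanes.get(lane) ?? Promise.resolve();
  const next = tail.then(task, task); // run even if the previous task failed
  lanes.set(lane, next.catch(() => undefined)); // keep the chain alive on error
  return next;
}
```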
Skills platform
Markdown-based procedural memory, same pattern as Hermes Agent. Skills are SKILL.md files with YAML frontmatter, discovered from user (~/.skills/), project (./.openclaw/skills/), and plugin-owned directories. At agent invocation, the skills index is injected into the system prompt as XML:
<available_skills>
<skill>
<name>docker</name>
<description>Run containers</description>
<location>~/.skills/docker/SKILL.md</location>
</skill>
</available_skills>
Skill filtering applies exposure rules, agent-level allowlists, and capability/binary availability on the target node.
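On disk, a skill consistent with the injected fields above might look like the following sketch; the frontmatter keys are assumptions inferred from the XML index, not a documented schema:

```markdown
---
name: docker
description: Run containers
---

# Docker

1. Check the daemon is up: `docker info`.
2. Run a throwaway container: `docker run --rm -it alpine sh`.
```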
Native companion apps
macOS (Swift), iOS (Swift), and Android (Kotlin) connect as capability “nodes” over the gateway WebSocket. Nodes provide capabilities the gateway can’t: camera, screen, location, voice. Voice Wake provides always-on wake words on macOS/iOS. Talk Mode enables continuous voice conversation on Android. Live Canvas + A2UI gives the agent a visual workspace to render into.
Notable patterns and approaches
Lazy loading boundaries — Heavy modules live behind *.runtime.ts files with dynamic imports. Channel registration stays fast (~100ms) because send, monitor, and probe code loads on demand. The CLAUDE.md enforces: never mix static and dynamic imports for the same heavy module.
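The memoized dynamic-import pattern behind such a boundary can be sketched generically; here the heavy module is simulated by an async factory standing in for a hypothetical `import("./channel.runtime.js")`:

```typescript
// Memoize a dynamic load so the heavy module is imported at most once,
// on first use, no matter how many callers race to it.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load());
}

let loads = 0;
const getRuntime = lazy(async () => {
  loads++; // stands in for the one-time cost of importing *.runtime.ts
  return { send: (text: string) => `sent: ${text}` };
});
```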
Prompt cache stability — Ordering of maps, sets, registries, plugin lists, and MCP catalogs is determinized before building model requests. Context compaction mutates newest content first so the cached prefix stays byte-identical across turns.
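Determinization can be as simple as sorting registry entries with a byte-stable comparator before serialization; a sketch of the technique, not OpenClaw's code:

```typescript
// Sort map entries by key so the serialized form is identical regardless of
// insertion order. A plain < / > comparison is byte-stable (unlike
// localeCompare, whose order can vary by locale).
function stableEntries<T>(registry: Map<string, T>): Array<[string, T]> {
  return [...registry.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}
```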
JSON5 config with includes — Configuration stored as JSON5 at ~/.openclaw/config.json with environment variable substitution (${VAR}), recursive includes for splitting large configs, zod validation at load time, legacy migration paths, and automatic backup rotation.
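A hypothetical fragment showing those features together; the key names are illustrative, not the real schema:

```json5
// ~/.openclaw/config.json (JSON5: comments and trailing commas allowed)
{
  channels: {
    // environment variable substitution at load time
    telegram: { botToken: "${TELEGRAM_BOT_TOKEN}" },
  },
  // split a large config via includes (path and key are illustrative)
  include: ["./agents.json5"],
}
```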
DM pairing security — Unknown senders on real messaging platforms receive a short pairing code. The bot won’t process their message until the operator approves with openclaw pairing approve <channel> <code>. Public inbound requires explicit opt-in (dmPolicy="open" + "*" in allowlist).
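The gating logic can be sketched as follows; this is an illustrative model of the described flow, not the actual implementation:

```typescript
// Unknown senders get a short pairing code and their messages are held;
// once the operator approves the code, the sender's messages are processed.
const approved = new Set<string>();
const pending = new Map<string, string>(); // senderId -> pairing code

function onInbound(senderId: string): "process" | "hold" {
  if (approved.has(senderId)) return "process";
  if (!pending.has(senderId)) {
    const code = Math.random().toString(36).slice(2, 8).toUpperCase();
    pending.set(senderId, code); // the code is sent back to the sender
  }
  return "hold";
}

function approve(code: string): boolean {
  for (const [sender, c] of pending) {
    if (c === code) {
      pending.delete(sender);
      approved.add(sender);
      return true;
    }
  }
  return false;
}
```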
Plugin auto-enable — Plugins activate automatically based on config hints (e.g., adding a Telegram token enables the Telegram plugin) without requiring hardcoded extension IDs in core.
Key dependencies
- @anthropic-ai/sdk + openai — dual-provider LLM clients
- Baileys — WhatsApp Web session (unofficial)
- grammY — Telegram Bot API
- discord.js — Discord client
- @slack/bolt — Slack integration
- hono — lightweight HTTP framework for webhook/health endpoints
- zod — runtime validation at config and API boundaries
- TypeBox + AJV — JSON Schema for gateway protocol
- sharp — image processing in media pipeline
Connections
- Architecturally parallel to Hermes Agent: same multi-platform gateway + skills + tools + session pattern, different language (TypeScript vs Python) and different design philosophy (composition-based adapters vs inheritance-based ABC). OpenClaw has stricter plugin boundaries and a richer native companion app story; Hermes Agent has a built-in learning loop and RL training infrastructure.
- The channel adapter pattern is a typed variant of the plugin composition approach described in managed agents architecture — tools and capabilities compose without inheritance.
- Skills system parallels Hermes Agent’s SKILL.md format and the agent learning loop pattern, though OpenClaw’s skills are not self-created by the agent (they’re authored by users or shipped by plugins).
- The gateway-as-control-plane design echoes the brain-hands decoupling pattern: reasoning (agent runtime) and execution (channel adapters, tool backends, nodes) are separated behind stable typed interfaces.
- Context compaction mirrors context window compression — auto-summarizing when approaching limits, preserving recent context, keeping cached prefixes stable. OpenClaw adds staged summarization for large histories and an identifier preservation policy.
- The dreaming system (memory-core plugin) implements sleep-phase memory consolidation — three-phase offline consolidation (light/REM/deep) with evidence accumulation thresholds before durable promotion. This is the offline complement to the agent learning loop.
- Tool loop detection uses content-aware result hashing to distinguish legitimate polling from stuck loops, with four independent detectors and an escalating response (warn -> critical -> circuit break).
- The agent exec policy implements three-axis human-in-the-loop tool approval (security x ask x fallback) with fail-closed composition across host and session policies.
- Two-tier model failover extends the credential pool pattern: auth profile rotation (inner loop) + model fallback chain (outer loop) + cooldown probing near expiry.
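The nested loops can be sketched as follows; names and error handling are illustrative, and cooldown probing is omitted:

```typescript
type Attempt = (model: string, profile: string) => Promise<string>;

// Outer loop walks the model fallback chain; inner loop rotates auth
// profiles for the current model. The last error is rethrown only after
// every (model, profile) pair has been tried.
async function callWithFailover(
  models: string[],
  profiles: string[],
  attempt: Attempt,
): Promise<string> {
  let lastErr: unknown;
  for (const model of models) {
    for (const profile of profiles) {
      try {
        return await attempt(model, profile);
      } catch (err) {
        lastErr = err; // e.g. rate limit or expired credential; try the next pair
      }
    }
  }
  throw lastErr;
}
```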