Hermes Agent
Source summary of Hermes Agent, a self-improving multi-platform AI agent framework by Nous Research.
Problem and audience
An open-source, self-hosted AI agent that isn’t tied to a laptop, learns about its user across sessions, and works from any messaging platform. Targets power users and developers who want a general-purpose agent (not just coding) with full control over model choice and deployment. Competes with proprietary agent platforms by being provider-agnostic: Nous Portal, OpenRouter (200+ models), OpenAI, Anthropic, z.ai/GLM, Kimi, MiniMax, or any custom endpoint.
Architecture overview
Agent loop
The core is AIAgent.chat() in run_agent.py (9,400 lines). Each turn:
- System prompt assembly — identity + platform hints + memory prefetch + skills index + context files (AGENTS.md, SOUL.md) + token budget
- LLM API call — model selection (optional cheap-vs-strong routing), tool schema injection, prompt caching (Anthropic Cache-Control headers)
- Tool-calling loop — check response.tool_calls, dispatch via handle_function_call(), persist results to disk (interrupt recovery), inject results as new turn, check budget (tokens/chars/turns), repeat until no more tool calls
- Post-turn — sync memory providers, emit hooks (agent:end), queue background prefetch for next turn
Budget enforcement prevents runaway loops. Context compression auto-triggers at configurable threshold (default 50% of context window), summarizing middle turns via auxiliary LLM while preserving a structured template (Goal, Progress, Decisions, Files, Next Steps).
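The loop above can be sketched as follows. This is a minimal illustration, not the actual Hermes internals: call_llm, the message shapes, and MAX_TURNS are assumptions; only response.tool_calls and handle_function_call() come from the source.

```python
# Minimal sketch of the tool-calling loop. MAX_TURNS stands in for the
# real budget checks (tokens/chars/turns); persistence and hooks omitted.
MAX_TURNS = 20

def chat_turn(messages, call_llm, handle_function_call):
    """Run one user turn: call the LLM, dispatch tools until done."""
    for _ in range(MAX_TURNS):
        response = call_llm(messages)
        tool_calls = response.get("tool_calls")
        if not tool_calls:
            return response["content"]          # final answer, loop ends
        for call in tool_calls:
            result = handle_function_call(call)  # dispatch to tool handler
            messages.append({"role": "tool",     # inject result as new turn
                             "tool_call_id": call["id"],
                             "content": result})
    return "[budget exceeded]"                   # budget enforcement fallback
```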
Multi-platform gateway
A single gateway process handles all messaging platforms concurrently. Platform adapters inherit from PlatformAdapter(ABC) — all async, all reducing to the same AIAgent.chat() call. Supported: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, SMS, Home Assistant, DingTalk, Feishu, WeChat Work. PII redaction via deterministic hashing in logs. Session management with configurable reset policies (time-based, message count, manual).
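The adapter pattern can be sketched like this. PlatformAdapter(ABC) and the convergence on AIAgent.chat() are from the source; the specific method names (listen, send, on_message) are illustrative assumptions.

```python
# Sketch: every platform adapter reduces incoming messages to the same
# AIAgent.chat() call. Method names are assumptions for illustration.
from abc import ABC, abstractmethod

class PlatformAdapter(ABC):
    """Base class each messaging platform implements."""

    def __init__(self, agent):
        self.agent = agent  # shared AIAgent instance

    @abstractmethod
    async def listen(self) -> None:
        """Connect to the platform and stream incoming messages."""

    @abstractmethod
    async def send(self, chat_id: str, text: str) -> None:
        """Deliver the agent's reply back to the platform."""

    async def on_message(self, chat_id: str, text: str) -> None:
        # Every platform converges on the same agent entry point.
        reply = await self.agent.chat(text, session=chat_id)
        await self.send(chat_id, reply)
```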
Tools (40+)
Terminal execution across 6 backends (local, Docker, SSH, Daytona, Modal, Singularity), file operations, web search, browser automation (Browserbase/CamoFox), vision (Claude), subagent delegation, MCP client, code execution (sandboxed Python), session search (FTS5 + LLM summarization), cron scheduling, TTS (Edge TTS, ElevenLabs). Tool registry pattern — each tool module registers schemas and handlers at import time.
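The registry pattern can be sketched as a decorator that runs at import time. The pattern itself is from the source; the registry layout and the read_file example tool are assumptions.

```python
# Sketch of the import-time tool registry: each tool module registers
# its schema and handler when imported. Structure is illustrative.
TOOL_REGISTRY: dict[str, dict] = {}

def register_tool(name: str, schema: dict):
    """Decorator: add a tool's schema and handler to the global registry."""
    def wrap(handler):
        TOOL_REGISTRY[name] = {"schema": schema, "handler": handler}
        return handler
    return wrap

# A tool module simply decorates its handler; importing it registers it.
@register_tool("read_file", {
    "type": "function",
    "function": {
        "name": "read_file",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"}}},
    },
})
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def tool_schemas() -> list[dict]:
    """Collect all registered schemas for injection into the LLM call."""
    return [entry["schema"] for entry in TOOL_REGISTRY.values()]
```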
Memory system
MemoryManager coordinates one builtin provider (MEMORY.md/USER.md files) and at most one external provider (Honcho, Hindsight, Mem0, Supermemory, RetainDB, etc.). Each provider implements MemoryProvider(ABC) with lifecycle hooks: prefetch() (before turn), sync_turn() (after turn), on_session_end(), on_pre_compress(), on_delegation(). Failures in one provider never block the other.
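The lifecycle hooks and failure isolation can be sketched as below. The hook names come from the source; the coordinator's merge logic is an assumption.

```python
# Sketch of MemoryProvider lifecycle hooks and MemoryManager isolation:
# an exception in one provider never blocks the other.
from abc import ABC

class MemoryProvider(ABC):
    async def prefetch(self, session) -> str: return ""     # before turn
    async def sync_turn(self, session, turn) -> None: ...   # after turn
    async def on_session_end(self, session) -> None: ...
    async def on_pre_compress(self, session) -> None: ...
    async def on_delegation(self, session, task) -> None: ...

class MemoryManager:
    def __init__(self, builtin, external=None):
        # one builtin provider, at most one external provider
        self.providers = [p for p in (builtin, external) if p]

    async def prefetch(self, session) -> str:
        chunks = []
        for provider in self.providers:
            try:
                chunks.append(await provider.prefetch(session))
            except Exception:
                continue  # a failing provider never blocks the other
        return "\n".join(c for c in chunks if c)
```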
The builtin provider exposes memory <action> and user <action> tools for the agent to read/write persistent notes. External providers can expose their own tool schemas. Memory context is injected into the system prompt wrapped in <memory-context> fences.
Skills system
Markdown-based procedural memory. Skills are SKILL.md files with optional YAML frontmatter, config, and scripts. The agent autonomously creates new skills after complex tasks and improves existing ones during use. Discovery: scans ~/.hermes/skills/, parses frontmatter, injects skills index into system prompt. Invocation: user types /skill-name, skill_commands.py loads the skill, resolves config values, injects payload. Compatible with agentskills.io open standard.
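Discovery can be sketched as a directory scan plus frontmatter parse. The SKILL.md layout is from the source; the minimal `key: value` parser is an assumption (real YAML frontmatter needs a proper parser).

```python
# Sketch of skill discovery: scan a skills directory for */SKILL.md and
# parse simple `key: value` YAML frontmatter between --- fences.
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Extract flat key: value pairs from a leading --- fenced block."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta

def discover_skills(root: Path) -> dict[str, dict]:
    """Map skill name -> frontmatter for every */SKILL.md under root."""
    skills = {}
    for path in root.glob("*/SKILL.md"):
        meta = parse_frontmatter(path.read_text())
        skills[meta.get("name", path.parent.name)] = meta
    return skills
```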
This is a concrete implementation of the agent learning loop pattern: memory provides facts that persist, skills provide procedures that compound, and session search (FTS5) provides cross-session recall.
Notable patterns and approaches
Credential pool — Multi-credential failover with configurable strategies (fill-first, round-robin, random, least-used). Tracks per-credential status (OK, exhausted) with cooldown timers (1h for rate limits, 24h for quota). Supports OAuth refresh and custom endpoints. See credential pool pattern.
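A fill-first variant of this pool can be sketched as follows. The strategy names and the 1h/24h cooldowns are from the source; the class shape and field names are assumptions.

```python
# Sketch of the credential pool with fill-first failover: always return
# the first key not on cooldown; failed keys cool down for 1h (rate
# limit) or 24h (quota exhaustion).
import time

RATE_LIMIT_COOLDOWN = 3600        # 1h for rate limits
QUOTA_COOLDOWN = 24 * 3600        # 24h for quota exhaustion

class CredentialPool:
    def __init__(self, keys):
        # cooldown_until: timestamp before which a key is unusable
        self.keys = [{"key": k, "cooldown_until": 0.0} for k in keys]

    def acquire(self) -> str:
        """Fill-first strategy: first key whose cooldown has expired."""
        now = time.time()
        for entry in self.keys:
            if entry["cooldown_until"] <= now:
                return entry["key"]
        raise RuntimeError("all credentials exhausted")

    def mark_failed(self, key: str, quota: bool = False) -> None:
        """Put a key on cooldown after a rate-limit or quota error."""
        cooldown = QUOTA_COOLDOWN if quota else RATE_LIMIT_COOLDOWN
        for entry in self.keys:
            if entry["key"] == key:
                entry["cooldown_until"] = time.time() + cooldown
```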
Prompt injection scanning — prompt_builder._scan_context_content() scans context files for invisible Unicode, injection patterns (“ignore previous instructions”, “curl exfil”, “cat .env”), and fence-escape sequences before injection into system prompt.
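A scan of this kind can be sketched with stdlib tools: flag format-category Unicode (zero-width spaces, bidi overrides) and match a pattern list. The pattern list here is a small illustrative subset, not the real one.

```python
# Sketch of context-file scanning before system-prompt injection:
# invisible Unicode + a few injection phrase patterns.
import re
import unicodedata

INJECTION_PATTERNS = [
    r"ignore (all |any )?previous instructions",
    r"cat\s+\.env",          # reading secrets
    r"curl\s+\S+.*\|",       # piping fetched content (exfil / shell)
]

def scan_context_content(text: str) -> list[str]:
    findings = []
    # Category "Cf" = invisible format characters (ZWSP, bidi overrides...)
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            findings.append(f"invisible char U+{ord(ch):04X}")
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            findings.append(f"injection pattern: {pattern}")
    # Fence escape: a stray close tag could break out of prompt framing
    if "</memory-context>" in text:
        findings.append("possible fence escape")
    return findings
```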
Thread pooling for async — Persistent event loops and thread pools (128 workers default) resolve asyncio deadlocks in nested event loop contexts (RL training inside Atropos). _run_async() bridges sync callers to async tool handlers without closing cached HTTP clients.
Plugin architecture — Three-tier discovery (user ~/.hermes/plugins/ > project ./.hermes/plugins/ > pip entry-points). Plugins declare hooks via YAML manifests and register via PluginContext. Non-blocking execution — plugin errors never block the agent.
Skin engine — Data-driven visual theming for CLI. Builtin skins (default/gold, ares/crimson, mono/grayscale, slate/blue) plus user-created YAML skins. Customizable colors, spinner faces, branding, tool emojis.
RL and training infrastructure
Atropos integration via HermesAgentBaseEnv. Two-phase operation: Phase 1 uses OpenAI-compatible servers (vLLM, SGLang, OpenRouter), Phase 2 uses ManagedServer with client-side tool parsing. HermesAgentLoop provides a reusable multi-turn agent engine for RL with reasoning extraction (multiple provider formats), tool error tracking, and budget enforcement.
Tool-call parsers for model families: Hermes, DeepSeek v3/v3.1, Qwen/Qwen3-Coder, Llama, Mistral, GLM-4.5/4.7, Kimi K2, Longcat. Benchmarks: TBLite (OpenThoughts), Terminal-Bench 2.0, YC-Bench. Trajectory compression for training data generation.
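For the Hermes family, the model emits a JSON object wrapped in <tool_call>...</tool_call> tags; other families in the list use different delimiters, hence the per-family parser table. A minimal illustrative parser for the Hermes format (the real parsers handle streaming and malformed output more carefully):

```python
# Sketch of client-side tool-call parsing for the Hermes format:
# extract JSON objects from <tool_call>...</tool_call> spans.
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_hermes_tool_calls(text: str) -> list[dict]:
    """Return the parsed {"name": ..., "arguments": ...} objects."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # malformed calls are counted as tool errors
    return calls
```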
Key dependencies
- openai + anthropic — LLM API clients (dual-provider design)
- prompt_toolkit — rich CLI TUI
- httpx — async HTTP (tools, MCP, platform adapters)
- pydantic — config validation, data models
- tenacity — retry logic with jittered backoff
Connections
- Implements the LLM wiki pattern at the agent level — skills compound like wiki pages, memory persists like the raw layer
- The 6 terminal backends directly implement the capsules concept — isolated environments that hibernate when idle
- Skills system parallels CORAL’s shared persistent memory — markdown files as accumulating knowledge
- Context compression preserves structured summaries, similar to how the heartbeat mechanism in CORAL forces periodic reflection
- Embodies the everything-is-text principle: all platforms, skills, memory, and config reduce to text