Hermes Agent
Source summary of Hermes Agent, a self-improving multi-platform AI agent framework by Nous Research.
Problem and audience
An open-source, self-hosted AI agent that isn’t tied to a laptop, learns about its user across sessions, and works from any messaging platform. Targets power users and developers who want a general-purpose agent (not just coding) with full control over model choice and deployment. Competes with proprietary agent platforms by being provider-agnostic: Nous Portal, OpenRouter (200+ models), OpenAI, Anthropic, z.ai/GLM, Kimi, MiniMax, or any custom endpoint.
Architecture overview
Agent loop
The core is AIAgent.chat() in run_agent.py (9,400 lines). Each turn:
- System prompt assembly — identity + platform hints + memory prefetch + skills index + context files (AGENTS.md, SOUL.md) + token budget
- LLM API call — model selection (optional cheap-vs-strong routing), tool schema injection, prompt caching (Anthropic Cache-Control headers)
- Tool-calling loop — check response.tool_calls, dispatch via handle_function_call(), persist results to disk (interrupt recovery), inject results as new turn, check budget (tokens/chars/turns), repeat until no more tool calls
- Post-turn — sync memory providers, emit hooks (agent:end), queue background prefetch for next turn
Budget enforcement prevents runaway loops. Context compression auto-triggers at configurable threshold (default 50% of context window), summarizing middle turns via auxiliary LLM while preserving a structured template (Goal, Progress, Decisions, Files, Next Steps).
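The loop above can be sketched as follows. This is a minimal illustration, not the actual Hermes internals: call_llm, the message shapes, and MAX_TURNS are assumptions; only response.tool_calls and handle_function_call() come from the source.

```python
# Minimal sketch of the tool-calling loop. MAX_TURNS stands in for the
# real budget checks (tokens/chars/turns); persistence and hooks omitted.
MAX_TURNS = 20

def chat_turn(messages, call_llm, handle_function_call):
    """Run one user turn: call the LLM, dispatch tools until done."""
    for _ in range(MAX_TURNS):
        response = call_llm(messages)
        tool_calls = response.get("tool_calls")
        if not tool_calls:
            return response["content"]          # final answer, loop ends
        for call in tool_calls:
            result = handle_function_call(call)  # dispatch to tool handler
            messages.append({"role": "tool",     # inject result as new turn
                             "tool_call_id": call["id"],
                             "content": result})
    return "[budget exceeded]"                   # budget enforcement fallback
```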
Multi-platform gateway
A single gateway process handles all messaging platforms concurrently. Platform adapters inherit from PlatformAdapter(ABC) — all async, all reducing to the same AIAgent.chat() call. Supported: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, SMS, Home Assistant, DingTalk, Feishu, WeChat Work. PII redaction via deterministic hashing in logs. Session management with configurable reset policies (time-based, message count, manual).
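The adapter pattern can be sketched like this. PlatformAdapter(ABC) and the convergence on AIAgent.chat() are from the source; the specific method names (listen, send, on_message) are illustrative assumptions.

```python
# Sketch: every platform adapter reduces incoming messages to the same
# AIAgent.chat() call. Method names are assumptions for illustration.
from abc import ABC, abstractmethod

class PlatformAdapter(ABC):
    """Base class each messaging platform implements."""

    def __init__(self, agent):
        self.agent = agent  # shared AIAgent instance

    @abstractmethod
    async def listen(self) -> None:
        """Connect to the platform and stream incoming messages."""

    @abstractmethod
    async def send(self, chat_id: str, text: str) -> None:
        """Deliver the agent's reply back to the platform."""

    async def on_message(self, chat_id: str, text: str) -> None:
        # Every platform converges on the same agent entry point.
        reply = await self.agent.chat(text, session=chat_id)
        await self.send(chat_id, reply)
```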
Tools (40+)
Terminal execution across 6 backends (local, Docker, SSH, Daytona, Modal, Singularity), file operations, web search, browser automation (Browserbase/CamoFox), vision (Claude), subagent delegation, MCP client, code execution (sandboxed Python), session search (FTS5 + LLM summarization), cron scheduling, TTS (Edge TTS, ElevenLabs). Tool registry pattern — each tool module registers schemas and handlers at import time.
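The registry pattern can be sketched as a decorator that runs at import time. The pattern itself is from the source; the registry layout and the read_file example tool are assumptions.

```python
# Sketch of the import-time tool registry: each tool module registers
# its schema and handler when imported. Structure is illustrative.
TOOL_REGISTRY: dict[str, dict] = {}

def register_tool(name: str, schema: dict):
    """Decorator: add a tool's schema and handler to the global registry."""
    def wrap(handler):
        TOOL_REGISTRY[name] = {"schema": schema, "handler": handler}
        return handler
    return wrap

# A tool module simply decorates its handler; importing it registers it.
@register_tool("read_file", {
    "type": "function",
    "function": {
        "name": "read_file",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"}}},
    },
})
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def tool_schemas() -> list[dict]:
    """Collect all registered schemas for injection into the LLM call."""
    return [entry["schema"] for entry in TOOL_REGISTRY.values()]
```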
Memory system
MemoryManager coordinates one builtin provider (MEMORY.md/USER.md files) and at most one external provider (Honcho, Hindsight, Mem0, Supermemory, RetainDB, etc.). Each provider implements MemoryProvider(ABC) with lifecycle hooks: prefetch() (before turn), sync_turn() (after turn), on_session_end(), on_pre_compress(), on_delegation(). Failures in one provider never block the other.
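The lifecycle hooks and failure isolation can be sketched as below. The hook names come from the source; the coordinator's merge logic is an assumption.

```python
# Sketch of MemoryProvider lifecycle hooks and MemoryManager isolation:
# an exception in one provider never blocks the other.
from abc import ABC

class MemoryProvider(ABC):
    async def prefetch(self, session) -> str: return ""     # before turn
    async def sync_turn(self, session, turn) -> None: ...   # after turn
    async def on_session_end(self, session) -> None: ...
    async def on_pre_compress(self, session) -> None: ...
    async def on_delegation(self, session, task) -> None: ...

class MemoryManager:
    def __init__(self, builtin, external=None):
        # one builtin provider, at most one external provider
        self.providers = [p for p in (builtin, external) if p]

    async def prefetch(self, session) -> str:
        chunks = []
        for provider in self.providers:
            try:
                chunks.append(await provider.prefetch(session))
            except Exception:
                continue  # a failing provider never blocks the other
        return "\n".join(c for c in chunks if c)
```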
The builtin provider exposes memory <action> and user <action> tools for the agent to read/write persistent notes. External providers can expose their own tool schemas. Memory context is injected into the system prompt wrapped in <memory-context> fences.
Skills system
Markdown-based procedural memory. Skills are SKILL.md files with optional YAML frontmatter, config, and scripts. The agent autonomously creates new skills after complex tasks and improves existing ones during use. Discovery: scans ~/.hermes/skills/, parses frontmatter, injects skills index into system prompt. Invocation: user types /skill-name, skill_commands.py loads the skill, resolves config values, injects payload. Compatible with agentskills.io open standard.
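Discovery can be sketched as a directory scan plus frontmatter parse. The SKILL.md layout is from the source; the minimal `key: value` parser is an assumption (real YAML frontmatter needs a proper parser).

```python
# Sketch of skill discovery: scan a skills directory for */SKILL.md and
# parse simple `key: value` YAML frontmatter between --- fences.
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Extract flat key: value pairs from a leading --- fenced block."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta

def discover_skills(root: Path) -> dict[str, dict]:
    """Map skill name -> frontmatter for every */SKILL.md under root."""
    skills = {}
    for path in root.glob("*/SKILL.md"):
        meta = parse_frontmatter(path.read_text())
        skills[meta.get("name", path.parent.name)] = meta
    return skills
```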
This is a concrete implementation of the agent learning loop pattern: memory provides facts that persist, skills provide procedures that compound, and session search (FTS5) provides cross-session recall.
Notable patterns and approaches
Credential pool — Multi-credential failover with configurable strategies (fill-first, round-robin, random, least-used). Tracks per-credential status (OK, exhausted) with cooldown timers (1h for rate limits, 24h for quota). Supports OAuth refresh and custom endpoints. See credential pool pattern.
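A fill-first variant of this pool can be sketched as follows. The strategy names and the 1h/24h cooldowns are from the source; the class shape and field names are assumptions.

```python
# Sketch of the credential pool with fill-first failover: always return
# the first key not on cooldown; failed keys cool down for 1h (rate
# limit) or 24h (quota exhaustion).
import time

RATE_LIMIT_COOLDOWN = 3600        # 1h for rate limits
QUOTA_COOLDOWN = 24 * 3600        # 24h for quota exhaustion

class CredentialPool:
    def __init__(self, keys):
        # cooldown_until: timestamp before which a key is unusable
        self.keys = [{"key": k, "cooldown_until": 0.0} for k in keys]

    def acquire(self) -> str:
        """Fill-first strategy: first key whose cooldown has expired."""
        now = time.time()
        for entry in self.keys:
            if entry["cooldown_until"] <= now:
                return entry["key"]
        raise RuntimeError("all credentials exhausted")

    def mark_failed(self, key: str, quota: bool = False) -> None:
        """Put a key on cooldown after a rate-limit or quota error."""
        cooldown = QUOTA_COOLDOWN if quota else RATE_LIMIT_COOLDOWN
        for entry in self.keys:
            if entry["key"] == key:
                entry["cooldown_until"] = time.time() + cooldown
```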
Prompt injection scanning — prompt_builder._scan_context_content() scans context files for invisible Unicode, injection patterns (“ignore previous instructions”, “curl exfil”, “cat .env”), and fence-escape sequences before injection into system prompt.
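A scan of this kind can be sketched with stdlib tools: flag format-category Unicode (zero-width spaces, bidi overrides) and match a pattern list. The pattern list here is a small illustrative subset, not the real one.

```python
# Sketch of context-file scanning before system-prompt injection:
# invisible Unicode + a few injection phrase patterns.
import re
import unicodedata

INJECTION_PATTERNS = [
    r"ignore (all |any )?previous instructions",
    r"cat\s+\.env",          # reading secrets
    r"curl\s+\S+.*\|",       # piping fetched content (exfil / shell)
]

def scan_context_content(text: str) -> list[str]:
    findings = []
    # Category "Cf" = invisible format characters (ZWSP, bidi overrides...)
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            findings.append(f"invisible char U+{ord(ch):04X}")
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            findings.append(f"injection pattern: {pattern}")
    # Fence escape: a stray close tag could break out of prompt framing
    if "</memory-context>" in text:
        findings.append("possible fence escape")
    return findings
```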
Thread pooling for async — Persistent event loops and thread pools (128 workers default) resolve asyncio deadlocks in nested event loop contexts (RL training inside Atropos). _run_async() bridges sync callers to async tool handlers without closing cached HTTP clients.
Plugin architecture — Three-tier discovery (user ~/.hermes/plugins/ > project ./.hermes/plugins/ > pip entry-points). Plugins declare hooks via YAML manifests and register via PluginContext. Non-blocking execution — plugin errors never block the agent.
Skin engine — Data-driven visual theming for CLI. Builtin skins (default/gold, ares/crimson, mono/grayscale, slate/blue) plus user-created YAML skins. Customizable colors, spinner faces, branding, tool emojis.
RL and training infrastructure
Atropos integration via HermesAgentBaseEnv. Two-phase operation: Phase 1 uses OpenAI-compatible servers (vLLM, SGLang, OpenRouter), Phase 2 uses ManagedServer with client-side tool parsing. HermesAgentLoop provides a reusable multi-turn agent engine for RL with reasoning extraction (multiple provider formats), tool error tracking, and budget enforcement.
Tool-call parsers for model families: Hermes, DeepSeek v3/v3.1, Qwen/Qwen3-Coder, Llama, Mistral, GLM-4.5/4.7, Kimi K2, Longcat. Benchmarks: TBLite (OpenThoughts), Terminal-Bench 2.0, YC-Bench. Trajectory compression for training data generation.
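For the Hermes family, the model emits a JSON object wrapped in <tool_call>...</tool_call> tags; other families in the list use different delimiters, hence the per-family parser table. A minimal illustrative parser for the Hermes format (the real parsers handle streaming and malformed output more carefully):

```python
# Sketch of client-side tool-call parsing for the Hermes format:
# extract JSON objects from <tool_call>...</tool_call> spans.
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_hermes_tool_calls(text: str) -> list[dict]:
    """Return the parsed {"name": ..., "arguments": ...} objects."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # malformed calls are counted as tool errors
    return calls
```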
Key dependencies
- openai + anthropic — LLM API clients (dual-provider design)
- prompt_toolkit — rich CLI TUI
- httpx — async HTTP (tools, MCP, platform adapters)
- pydantic — config validation, data models
- tenacity — retry logic with jittered backoff
Connections
- Implements the LLM wiki pattern at the agent level — skills compound like wiki pages, memory persists like the raw layer
- The 6 terminal backends directly implement the capsules concept — isolated environments that hibernate when idle
- Skills system parallels CORAL’s shared persistent memory — markdown files as accumulating knowledge
- Context compression preserves structured summaries, similar to how the heartbeat mechanism in CORAL forces periodic reflection
- Embodies the everything-is-text principle: all platforms, skills, memory, and config reduce to text