Agent Learning Loop

concept
ai-agents · memory · skills · learning · self-improvement

A closed learning loop in an AI agent that enables self-improvement across sessions. Three components work together:

  1. Persistent memory — facts about the user, preferences, project state. Prefetched before each turn, synced after. The agent nudges itself to persist knowledge it judges worth keeping.
  2. Procedural skills — reusable procedures (markdown files, scripts) that the agent creates after completing complex tasks and improves during subsequent use. Skills compound — each successful execution refines the procedure.
  3. Cross-session recall — searchable history of past conversations, enabling the agent to retrieve context from earlier sessions rather than starting from zero.

The loop closes when knowledge gained in one session (a new skill, a memory entry, a refined procedure) improves performance in future sessions without human intervention.
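The per-turn cycle can be sketched in a few lines. All names here (`MemoryStore`, `SkillLibrary`, `run_turn`, `agent_turn`) are hypothetical placeholders for illustration, not any specific framework's API:

```python
# A minimal sketch of one iteration of the learning loop:
# prefetch memory -> run turn -> sync memory -> optionally create a skill.

class MemoryStore:
    def __init__(self):
        self.entries = []

    def prefetch(self):
        # Inject persisted facts into the turn's context.
        return list(self.entries)

    def sync(self, new_facts):
        # Persist whatever the agent judged worth keeping.
        self.entries.extend(new_facts)


class SkillLibrary:
    def __init__(self):
        self.skills = {}

    def record(self, name, procedure):
        # Create or refine a reusable procedure after a completed task.
        self.skills[name] = procedure


def run_turn(user_input, context):
    # Placeholder for the model call; returns the output plus
    # whatever the agent decided is worth remembering.
    facts = [f"user asked about: {user_input}"]
    return f"answer to {user_input!r}", facts


def agent_turn(memory, skills, user_input):
    context = memory.prefetch()            # memory in, before the turn
    output, new_facts = run_turn(user_input, context)
    memory.sync(new_facts)                 # memory out, after the turn
    if "deploy" in user_input:             # toy trigger for skill creation
        skills.record("deploy", "steps distilled from this session")
    return output
```

A second session constructed over the same `MemoryStore` and `SkillLibrary` starts with the first session's facts and skills already in place, which is what closes the loop.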

Implementations

Hermes Agent implements all three components:

  • Memory: MemoryManager coordinates a built-in store (MEMORY.md/USER.md) with external providers (Honcho, Hindsight, Mem0). Providers expose tools for the agent to read/write persistent notes. Lifecycle hooks sync after each turn.
  • Skills: Markdown SKILL.md files with optional config and scripts. The agent autonomously creates skills after complex tasks. Skills self-improve during use. Distributed via agentskills.io.
  • Recall: FTS5 full-text search over session history with LLM summarization for cross-session retrieval.
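The recall component can be approximated with SQLite's built-in FTS5 extension. The table and column names below are illustrative, not Hermes Agent's actual schema, and the LLM summarization step is omitted:

```python
import sqlite3

# Index session history in an FTS5 virtual table for full-text recall.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(session_id, content)")
db.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("s1", "debugged the flaky deploy pipeline; root cause was a stale cache"),
        ("s2", "drafted the quarterly report outline"),
    ],
)


def recall(query, limit=3):
    # bm25() ranks matches (lower score = better); in the full pattern
    # the matched sessions would then be summarized by the LLM.
    return db.execute(
        "SELECT session_id, content FROM sessions "
        "WHERE sessions MATCH ? ORDER BY bm25(sessions) LIMIT ?",
        (query, limit),
    ).fetchall()
```

A query like `recall("deploy")` surfaces the earlier debugging session, giving the agent context it would otherwise have to rebuild from zero.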

Relationship to other patterns

The learning loop is the operational counterpart to the LLM wiki pattern (Karpathy). Where the wiki pattern applies ingest-compile-query to external knowledge, the learning loop applies the same cycle to the agent’s own experience: experience is “ingested” into memory, “compiled” into skills, and “queried” via session search.

CORAL’s shared persistent memory extends this to multi-agent settings — multiple agents accumulate knowledge in a shared filesystem, and cross-agent parents inherit successful strategies.

The key constraint: the loop must be automatic. If it requires human curation, it degrades to a note-taking system. The agent must decide what to remember, when to create a skill, and how to improve it — which requires the LLM to judge the value of its own experience.

Offline consolidation

The learning loop operates in real-time: the agent decides what to remember during conversations. OpenClaw adds an offline complement via sleep-phase memory consolidation — a cron-triggered background process modeled on human sleep phases (light/REM/deep) that retrospectively evaluates which short-term memories actually matter. Memories must accumulate evidence across multiple queries and days before promotion to durable storage. This filters out task-specific noise that real-time judgment often lets through.

The two approaches are complementary: real-time writing captures intent while it’s fresh; offline consolidation provides the retrospective evaluation that prevents memory bloat.
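The evidence-accumulation rule behind consolidation can be sketched as follows. The thresholds (3 hits across 2 distinct days) and all names are illustrative assumptions, not OpenClaw's actual policy:

```python
# Sketch of evidence-based promotion from short-term to durable memory.
MIN_HITS = 3   # assumed threshold: times a memory proved useful
MIN_DAYS = 2   # assumed threshold: distinct days it proved useful


class ShortTermMemory:
    def __init__(self):
        self.evidence = {}  # memory_id -> list of days it was queried usefully

    def record_hit(self, memory_id, day):
        self.evidence.setdefault(memory_id, []).append(day)

    def consolidate(self):
        # Cron-triggered sweep: promote only memories with repeated,
        # multi-day evidence; the rest stay short-term and can expire.
        return [
            mid
            for mid, days in self.evidence.items()
            if len(days) >= MIN_HITS and len(set(days)) >= MIN_DAYS
        ]
```

A memory touched three times but only on a single day stays short-term; one that keeps proving useful across days gets promoted, which is what filters out task-specific noise.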