Sleep-Phase Memory Consolidation
A background memory consolidation process for AI agents modeled on human sleep phases. Short-term signals accumulate over multiple sessions and days; only high-confidence, multi-evidence memories are promoted to durable long-term storage. The system runs offline (typically cron-triggered at 3 AM), not during conversations.
Why offline consolidation matters
Real-time memory systems (writing to MEMORY.md during a conversation) have a fundamental problem: the agent decides what’s important while still inside the task. One-shot judgments produce noisy memory — the agent saves things that seemed important in context but aren’t, and misses patterns that only emerge across sessions. Offline consolidation solves this by requiring evidence from multiple queries and multiple days before promoting anything.
The three phases
The design borrows from human sleep architecture, where different sleep stages serve different consolidation functions:
Light sleep — ingestion and deduplication
Ingests raw signals from two sources: daily memory notes (markdown files written during conversations) and redacted session transcripts. Deduplicates using Jaccard similarity on tokenized snippets (configurable threshold). Stages candidates with metadata: source, timestamp, content hash. Never writes to durable memory. Output: a pool of staged recall candidates.
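The dedup step above can be sketched as follows. This is a minimal illustration, not the plugin's actual code; the `0.8` threshold is an assumed default (the document only says it is configurable), and tokenization here is plain whitespace splitting.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity on whitespace-tokenized snippets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def dedupe(snippets, threshold=0.8):
    """Stage each snippet unless it is a near-duplicate of one already staged.

    threshold is an illustrative default; the real value is configurable.
    """
    staged = []
    for s in snippets:
        if all(jaccard(s, t) < threshold for t in staged):
            staged.append(s)
    return staged
```

In practice each staged entry would also carry its metadata (source, timestamp, content hash); only the similarity gate is shown here.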
REM sleep — cross-cutting theme extraction
Analyzes the staged candidate pool for recurring patterns across sessions. Builds concept-tag statistics (how often a concept appears across memories, on how many different days). Identifies “candidate truths” — snippets that appear repeatedly with high average retrieval scores. Confidence scoring combines four signals:
- Average retrieval score (0.45 weight) — how relevant was this when recalled
- Recall strength (0.25) — how often was it recalled, log-scaled
- Consolidation (0.20) — how many distinct days did this appear across
- Conceptual richness (0.10) — how many concept tags does it connect
Still does not write to durable memory. Output: ranked candidate truths with confidence scores.
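The four-signal confidence formula above can be sketched like this. The weights are taken from the list; the normalizers (`max_days`, `max_tags`) and the log base for recall scaling are illustrative assumptions, since the document does not specify them.

```python
import math

# Weights from the REM-phase confidence formula above.
WEIGHTS = {"relevance": 0.45, "recall": 0.25, "consolidation": 0.20, "richness": 0.10}

def rem_confidence(avg_score, recall_count, distinct_days, tag_count,
                   max_days=7, max_tags=5):
    """Combine the four REM signals into a 0..1 confidence score.

    avg_score is assumed to already be normalized to 0..1;
    max_days and max_tags are assumed normalizers, not documented values.
    """
    # Log-scaled recall strength, saturating around 10 recalls (assumption).
    recall = min(math.log1p(recall_count) / math.log1p(10), 1.0)
    return (WEIGHTS["relevance"] * avg_score
            + WEIGHTS["recall"] * recall
            + WEIGHTS["consolidation"] * min(distinct_days / max_days, 1.0)
            + WEIGHTS["richness"] * min(tag_count / max_tags, 1.0))
```

A candidate that maxes out every signal scores 1.0; any weaker evidence lowers the score proportionally to the weights.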
Deep sleep — selective promotion
The only phase that writes to durable MEMORY.md. Uses a six-signal weighted ranking:
- Relevance (0.30) — average retrieval quality score
- Frequency (0.24) — number of short-term signals accumulated
- Query diversity (0.15) — distinct query contexts that recalled this
- Recency (0.15) — time-decayed freshness (exponential decay with configurable half-life)
- Consolidation (0.10) — multi-day recurrence strength
- Conceptual richness (0.06) — concept-tag density
Candidates must pass three threshold gates: minimum score, minimum recall count, and minimum unique queries. Only then is a memory promoted to the durable store.
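The ranking plus the three gates can be sketched as below. The weights come from the list above; the gate thresholds and the 14-day half-life are illustrative assumptions (the document says both are configurable), and each signal is assumed to be pre-normalized to 0..1.

```python
DEEP_WEIGHTS = {
    "relevance": 0.30, "frequency": 0.24, "diversity": 0.15,
    "recency": 0.15, "consolidation": 0.10, "richness": 0.06,
}
# Threshold gates -- illustrative values; the real thresholds are configurable.
MIN_SCORE, MIN_RECALLS, MIN_QUERIES = 0.5, 3, 2

def recency(age_days, half_life_days=14.0):
    """Exponential time decay with a configurable half-life (14 days assumed)."""
    return 0.5 ** (age_days / half_life_days)

def should_promote(cand):
    """cand: dict of normalized 0..1 signals plus raw evidence counts.

    Returns (promote?, weighted score). A high score alone is not enough:
    the recall-count and unique-query gates must also pass.
    """
    score = sum(DEEP_WEIGHTS[k] * cand[k] for k in DEEP_WEIGHTS)
    promote = (score >= MIN_SCORE
               and cand["recall_count"] >= MIN_RECALLS
               and cand["unique_queries"] >= MIN_QUERIES)
    return promote, score
```

Note that the gates are conjunctive: a memory recalled once with a perfect score still fails the recall-count gate, which is what enforces evidence accumulation.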
Key design decisions
Evidence accumulation over single-shot judgment. The multi-phase pipeline with threshold gates means a memory must be recalled multiple times, across multiple queries, across multiple days before it becomes permanent. This filters out task-specific noise.
Separation of write authority. Light and REM phases can only stage and score — they cannot modify durable memory. Deep sleep is the only writer. This prevents intermediate processing from corrupting the long-term store.
Temporal decay with exemptions. Older signals decay exponentially, but evergreen memories (non-dated files like MEMORY.md itself) are exempt. This lets the system forget stale context while preserving deliberately-stored knowledge.
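A minimal sketch of the exemption logic, assuming dated files are recognized by a `YYYY-MM-DD` pattern in the filename and a 14-day half-life — both assumptions, not documented behavior:

```python
import re

def signal_weight(filename, age_days, half_life_days=14.0):
    """Dated daily notes decay; non-dated ('evergreen') files like MEMORY.md do not.

    The date-in-filename heuristic and the half-life are illustrative assumptions.
    """
    is_dated = re.search(r"\d{4}-\d{2}-\d{2}", filename) is not None
    if not is_dated:
        return 1.0  # evergreen: exempt from decay
    return 0.5 ** (age_days / half_life_days)
```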
Narrative dream diary. After each phase, a subagent generates a first-person narrative reflection — “a poet who happens to be a programmer.” These diary entries are stored in DREAMS.md for the user to read but are never used as a promotion source. This makes the consolidation process legible and debuggable for the human operator.
Relationship to other patterns
This is the offline complement to the agent learning loop. The learning loop handles real-time self-improvement (persist memory during conversation, create skills after complex tasks). Dreaming handles what the learning loop can’t: retrospective evaluation of which memories actually matter, discovery of cross-session patterns, and controlled promotion with evidence thresholds.
The evidence accumulation approach contrasts with Hermes Agent’s real-time memory sync, where the agent writes to MEMORY.md during each conversation turn. Both approaches have trade-offs: real-time writing captures intent while it’s fresh but produces noisy memory; offline consolidation is more selective but can miss things that were important in context.
Implementation
OpenClaw implements this as a plugin extension (memory-core), not in core. The dreaming system hooks into the gateway’s cron service for scheduling and uses subagent spawning for narrative generation. State persists in JSON files under memory/.dreams/. Session transcript ingestion tracks per-session watermarks to avoid reprocessing.
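The watermark idea can be sketched as follows. The state-file name and the line-offset representation are hypothetical; the actual plugin stores its state in JSON under memory/.dreams/, but its schema is not specified here.

```python
import json
from pathlib import Path

def unprocessed_lines(state_path, session_id, transcript_lines):
    """Return only transcript lines past this session's watermark, then advance it.

    state_path is a JSON file mapping session_id -> last-processed line count
    (a hypothetical schema for illustration).
    """
    state_path = Path(state_path)
    marks = json.loads(state_path.read_text()) if state_path.exists() else {}
    start = marks.get(session_id, 0)
    new_lines = transcript_lines[start:]
    # Advance the watermark so the next run skips everything seen so far.
    marks[session_id] = len(transcript_lines)
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(marks))
    return new_lines
```

Running ingestion twice over the same transcript processes each line exactly once; a session that grows between runs yields only its new tail.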