CORAL: Autonomous Multi-Agent Evolution

Tags: multi-agent, evolution, open-ended-discovery, knowledge-accumulation

CORAL (Qu et al., 2026) is a framework for autonomous evolution on open-ended discovery problems. It replaces the fixed heuristics of prior LLM-based evolutionary search (FunSearch, AlphaEvolve, OpenEvolve, ShinkaEvolve, EvoX) with long-running agents that decide for themselves what to explore, when to test, and what knowledge to carry forward.

Core mechanisms

Shared persistent memory as filesystem. Knowledge is structured into three directories — attempts/ (historical evaluations), notes/ (observations and reflections), and skills/ (reusable procedures). Each agent’s workspace symlinks to this shared repository, maintaining a single source of truth while preserving workspace isolation.
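The symlink arrangement can be sketched in a few lines. The three directory names come from the paper; the function name and layout here are illustrative, not CORAL's actual API:

```python
from pathlib import Path

MEMORY_DIRS = ("attempts", "notes", "skills")  # the three shared directories

def attach_shared_memory(workspace: Path, shared_repo: Path) -> None:
    """Symlink the shared memory directories into an agent workspace.

    Every agent sees the same attempts/, notes/, and skills/ trees,
    so a write from any agent is immediately visible to all others,
    while each agent's working files stay isolated in its workspace.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    for name in MEMORY_DIRS:
        target = shared_repo / name
        target.mkdir(parents=True, exist_ok=True)
        link = workspace / name
        if not link.exists():
            link.symlink_to(target, target_is_directory=True)
```

Because the links point at one repository, "single source of truth" and "workspace isolation" are not in tension: shared knowledge lives behind the links, private scratch state lives beside them.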

Asynchronous multi-agent organization. Each agent runs in its own git worktree with access to the shared evaluator and memory via symlink. Agents coordinate indirectly through cross-agent knowledge transfer — one agent’s discoveries influence another’s search through written artifacts, not messaging protocols.
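A minimal sketch of the per-agent isolation, assuming standard `git worktree` semantics (the helper and branch naming are hypothetical, not from the paper):

```python
from pathlib import Path

def spawn_agent_worktree(repo: Path, agent_id: int) -> list[str]:
    """Build the git command that gives one agent an isolated worktree.

    Each agent gets its own branch and checkout directory, but all
    worktrees share the repository's single object store, so history
    and commits from every agent remain mutually visible.
    """
    branch = f"agent-{agent_id}"
    path = repo.parent / "workspaces" / branch
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]

# Usage (against a real repository):
#   import subprocess
#   subprocess.run(spawn_agent_worktree(Path("repo"), 0), check=True)
```

Combined with the memory symlinks above the filesystem itself becomes the coordination layer: no message bus, no scheduler, just shared files.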

Heartbeat mechanism. Periodic interventions prevent drift into local minima: per-iteration reflection (note-taking), periodic consolidation (organizing notes into skills every ~10 evaluations), and stagnation-triggered redirection (strategic reassessment after 5 non-improving rounds).
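The three interventions compose into a small state machine. A sketch using the paper's stated defaults (consolidate every ~10 evaluations, redirect after 5 non-improving rounds); the class and method names are illustrative:

```python
class Heartbeat:
    """Periodic interventions layered on top of the evaluation loop."""

    def __init__(self, consolidate_every: int = 10, stagnation_limit: int = 5):
        self.consolidate_every = consolidate_every
        self.stagnation_limit = stagnation_limit
        self.evaluations = 0
        self.non_improving = 0
        self.best = float("-inf")

    def on_evaluation(self, score: float) -> list[str]:
        """Return the interventions due after one evaluation."""
        self.evaluations += 1
        if score > self.best:
            self.best = score
            self.non_improving = 0
        else:
            self.non_improving += 1

        actions = ["reflect"]  # per-iteration note-taking always happens
        if self.evaluations % self.consolidate_every == 0:
            actions.append("consolidate")  # organize notes into skills
        if self.non_improving >= self.stagnation_limit:
            actions.append("redirect")  # strategic reassessment
            self.non_improving = 0  # give the new direction a fresh window
        return actions
```

The key design point is that the heartbeat only schedules interventions; what "reflect" or "redirect" actually does is left to the agent.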

Paradigm progression

CORAL distinguishes three paradigms for LLM-based search on open-ended problems:

  1. Fixed evolutionary search — external algorithm governs parent selection, prompt construction, and population updates. LLM acts only as mutation operator. (FunSearch, AlphaEvolve, EvoX)
  2. Autonomous single-agent evolution — one agent controls all four stages (retrieve, propose, evaluate, update), deciding timing and realization autonomously.
  3. Autonomous multi-agent evolution — multiple agents run asynchronously, coordinating exclusively through shared persistent memory.

Each paradigm grants more control to the agents and less to the scaffolding.
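The four-stage loop of the autonomous paradigms can be sketched as follows. The agent interface here is hypothetical; the point is that the agent, not an outer algorithm, decides when each stage runs:

```python
def autonomous_evolution(agent, evaluator, memory, budget: int):
    """Single-agent evolution: the agent controls all four stages.

    In fixed evolutionary search, an external algorithm would drive
    this loop on a rigid cadence; here the agent chooses when to
    retrieve context, what to propose, and whether a candidate is
    worth spending a real evaluation on.
    """
    spent = 0
    while spent < budget:
        context = agent.retrieve(memory)        # stage 1: consult knowledge
        candidate = agent.propose(context)      # stage 2: new solution attempt
        if agent.wants_evaluation(candidate):   # agent decides the timing
            score = evaluator(candidate)        # stage 3: evaluate
            spent += 1
            agent.update(memory, candidate, score)  # stage 4: write back
    return memory
```

The multi-agent paradigm is this same loop run concurrently by several agents, with `memory` being the shared filesystem described above.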

Results

New SOTA on 8 of 11 benchmarks across mathematical and systems optimization tasks (source).

Metric                          CORAL           Fixed baselines
Improvement rate                3-10x higher    baseline
Evaluations to converge         5-20            60-100
Kernel engineering (4-agent)    1,103 cycles    1,363 (prev. SOTA)
Polyominoes (4-agent, no web)   84.2            87.0 (prev. SOTA)

Ablations confirm causal contributions: disabling knowledge accumulation degrades kernel engineering by 18.6% (1,350 → 1,601 cycles). Co-evolution outperforms best-of-4 independent single-agent runs on all tasks, confirming gains come from active cooperation, not just additional compute.

Why it works

Two mechanisms drive performance:

Local verification. Agents test code locally before consuming evaluations. On tasks with compilable code (Transaction: 61%, Kernel Engineering: 57%), this catches errors cheaply. Tasks with hidden evaluators (PRISM: 0%) cannot benefit.
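For Python candidates, this cheap pre-check amounts to compiling without executing; kernel or systems tasks would use the corresponding compiler. A minimal sketch (the function name is illustrative):

```python
def verify_locally(source: str) -> bool:
    """Cheap syntax check before spending a real evaluation.

    Compilation catches malformed candidates for free; semantic
    failures still require the shared (possibly expensive) evaluator,
    which is why hidden-evaluator tasks cannot benefit.
    """
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False
```

Gating evaluator calls behind such a check is what makes the "consume fewer evaluations" result possible: broken candidates never reach the budget.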

Knowledge accumulation. On advanced tasks, agents create 10x more knowledge artifacts per attempt (0.55-0.68 vs. 0.05 on standard tasks). Knowledge access correlates with 55% improvement rate on kernel engineering vs. 26% baseline. Notes capture reusable insights — architectural bottlenecks, documented “never worked” approaches — rather than lightweight progress logs.
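The distinction between evaluation logs and reusable insights maps naturally onto the attempts/ and notes/ directories. A sketch, assuming the directory names from the memory layout above (the file formats and helper are hypothetical):

```python
import json
from pathlib import Path

def record_attempt(memory: Path, attempt_id: int, score: float,
                   insight: str = "") -> None:
    """Log an evaluation under attempts/; promote any reusable
    insight to a separate artifact under notes/.

    Insights -- documented "never worked" approaches, bottleneck
    analyses -- live apart from scores so later agents can retrieve
    them without replaying the full attempt history.
    """
    attempts = memory / "attempts"
    notes = memory / "notes"
    attempts.mkdir(parents=True, exist_ok=True)
    notes.mkdir(parents=True, exist_ok=True)
    (attempts / f"{attempt_id:04d}.json").write_text(
        json.dumps({"id": attempt_id, "score": score})
    )
    if insight:  # only substantive observations become artifacts
        (notes / f"{attempt_id:04d}.md").write_text(insight)
```

The 0.55-0.68 artifacts-per-attempt figure corresponds to the `insight` branch firing on most attempts for advanced tasks, versus almost never on standard ones.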

Connections

CORAL’s shared persistent memory is a direct application of the LLM wiki pattern — agents compound knowledge into a persistent repository that survives across attempts and agents. The heartbeat mechanism resembles a feedback loop operating at the meta-level: measure (evaluate), compare (reflect), adjust (pivot).

The paper’s emphasis on agent autonomy over fixed orchestration aligns with the knowledge base’s recurring theme that agents should handle execution while humans set direction — though CORAL takes this further by removing human involvement from the search loop entirely.

Limitations

  • Requires frontier models capable of complex coding-agent workflows; no small-model variant yet.
  • All agents start identically — no bootstrapped heterogeneity (distinct roles, private information).
  • Assumes a well-specified evaluator. Many real problems have incomplete or ambiguous evaluators that may need to co-evolve with solutions.