Evaluator-Optimizer Pattern

concept
ai-agents, workflows, patterns, iteration, feedback-loop

An agentic workflow pattern where one LLM generates a response and another evaluates it, iterating in a loop until quality criteria are met. One of five agentic workflow patterns described in Building Effective Agents.

Structure

Generator LLM --> Output --> Evaluator LLM --> Feedback
     ^                                           |
     +-------------------------------------------+
                  (loop until satisfied)

The generator produces; the evaluator critiques. The generator incorporates feedback and tries again. The loop terminates when the evaluator approves or a maximum iteration count is reached.
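The loop above can be sketched in a few lines of Python. The `generate` and `evaluate` functions stand in for LLM calls; they are stubbed here (hypothetical names, not from any specific library) so the control flow runs as written.

```python
# Minimal sketch of the evaluator-optimizer loop.
# generate() and evaluate() are stubs standing in for LLM invocations.

def generate(task, feedback=None):
    # Stub generator: fold prior feedback into the next draft.
    draft = f"draft for {task!r}"
    return draft + (f" (revised per: {feedback})" if feedback else "")

def evaluate(task, output):
    # Stub evaluator: approve once the draft reflects a revision.
    if "revised" in output:
        return {"approved": True, "feedback": None}
    return {"approved": False, "feedback": "add more detail"}

def evaluator_optimizer(task, max_iterations=5):
    feedback = None
    output = None
    for _ in range(max_iterations):
        output = generate(task, feedback)
        verdict = evaluate(task, output)
        if verdict["approved"]:
            return output               # evaluator satisfied: terminate
        feedback = verdict["feedback"]  # error signal drives the next draft
    return output                       # iteration budget exhausted

print(evaluator_optimizer("summarize the report"))
```

The two termination conditions from the description are both visible: the early `return` when the evaluator approves, and the fall-through after `max_iterations`.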

When to use

Two diagnostic questions:

  1. Can a human articulate feedback that demonstrably improves the output? If so, an LLM evaluator can likely provide similar feedback.
  2. Are there clear evaluation criteria? The evaluator needs something concrete to assess against.

If both are true, the pattern is a good fit. If the quality difference between iterations is marginal, the added latency and cost aren’t justified.

Examples

  • Literary translation — the translator LLM misses subtle nuances on the first pass; an evaluator can identify and critique them
  • Complex search — multiple rounds of searching and analysis, with the evaluator deciding whether further searches are warranted
  • Code generation — generate code, run tests, use test results as evaluation, regenerate
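The code-generation case is the easiest to make concrete, because the evaluator need not be an LLM at all: a test run is the evaluation, and the failure message is the feedback. A sketch under that assumption, with the generator stubbed to "fix" its bug once it sees the failure:

```python
# Sketch of the code-generation variant: tests act as the evaluator,
# and a test failure message is fed back as the critique.
# generate_code() is a stub standing in for an LLM call.

def generate_code(feedback=None):
    if feedback is None:
        return "def add(a, b):\n    return a - b"  # deliberately buggy first draft
    return "def add(a, b):\n    return a + b"      # revised draft after feedback

def run_tests(code):
    namespace = {}
    exec(code, namespace)  # load the generated function
    try:
        assert namespace["add"](2, 3) == 5
        return None                        # tests pass: no feedback needed
    except AssertionError:
        return "add(2, 3) should equal 5"  # failure becomes the feedback

def generate_with_tests(max_iterations=3):
    feedback = None
    for _ in range(max_iterations):
        code = generate_code(feedback)
        feedback = run_tests(code)
        if feedback is None:
            return code                    # evaluator (the test suite) approves
    raise RuntimeError("no passing code within iteration budget")

print(generate_with_tests())
```

Using a deterministic evaluator like this also satisfies the second diagnostic question above for free: passing tests are about as concrete as evaluation criteria get.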

The feedback loop connection

This pattern is a feedback loop in the control theory sense: output is measured against a reference (the evaluation criteria), the error signal (evaluator feedback) drives correction, and the system iterates toward convergence.

This makes it structurally identical to:

  • The Agent Learning Loop — generate, evaluate, improve, store as skill
  • The iterative refinement in human writing — draft, review, revise
  • The Heartbeat Mechanism in CORAL — periodic reflection that redirects effort when progress stalls

The key difference from the learning loop: the evaluator-optimizer is stateless across tasks. It refines a single output. The learning loop persists improvements across tasks via memory and skills.

Connections