Augmented LLM

concept
ai-agents, llm, retrieval, tools, memory, building-block

The foundational building block of agentic systems: a language model enhanced with retrieval, tools, and memory. Rather than using a bare LLM, every agentic pattern composes one or more augmented LLMs that can actively search for information, invoke external capabilities, and retain state across interactions.

Three augmentations

| Augmentation | What it does | Model behavior |
| --- | --- | --- |
| Retrieval | Access to external knowledge sources | Model generates its own search queries |
| Tools | Interaction with APIs and services | Model selects and invokes appropriate tools |
| Memory | Persistent state across calls | Model determines what information to retain |

Current models perform all three actively — they are not passive recipients of injected context, but participants that direct the use of their augmentations.
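A minimal sketch of how the three augmentations wrap a single model call. Everything here is illustrative: the corpus, the `add` tool, and the hard-coded tool invocation stand in for decisions a real model would make itself.

```python
def retrieve(query: str) -> list[str]:
    # Retrieval augmentation: a (toy) knowledge store queried per call.
    corpus = {"agents": "Agents compose augmented LLMs."}
    return [text for key, text in corpus.items() if key in query.lower()]

TOOLS = {
    # Tools augmentation: callable capabilities the model can select from.
    "add": lambda a, b: a + b,
}

MEMORY: list[str] = []  # Memory augmentation: state retained across calls.

def augmented_llm_call(prompt: str) -> str:
    context = retrieve(prompt)   # model-directed search (stubbed)
    MEMORY.append(prompt)        # model-directed retention (stubbed)
    # A real model would decide whether and how to invoke a tool;
    # here the invocation is hard-coded to keep the sketch runnable.
    result = TOOLS["add"](2, 3)
    return f"context={context} tool_result={result} turns={len(MEMORY)}"
```

The point of the sketch is the shape, not the stubs: the model sits inside a loop that gives it search, capabilities, and state, and directs all three itself.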

Implementation priorities

Two aspects matter most:

  1. Tailor capabilities to the use case. Generic retrieval or tool sets underperform purpose-built ones.
  2. Provide a well-documented interface. The model’s effectiveness scales with how clearly its capabilities are described. This is the foundation of the Agent-Computer Interface concept.
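What a well-documented interface looks like in practice, as a tool definition in the JSON-schema style most LLM tool-use APIs accept. The tool name, fields, and behavior (`search_orders`, the 10-match limit) are invented for illustration; the exact schema shape varies by provider.

```python
# A purpose-built, clearly described tool. The description tells the model
# what the tool returns, its limits, and when to use it -- the documentation
# the model's effectiveness scales with.
search_tool = {
    "name": "search_orders",
    "description": (
        "Search customer orders by order ID or email. "
        "Returns at most 10 matches, newest first. "
        "Use this before answering any question about order status."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Order ID (e.g. 'ORD-1234') or customer email.",
            },
        },
        "required": ["query"],
    },
}
```

Compare a description like "searches orders": the model has no basis for choosing the tool or formatting the query, which is exactly the underperformance the generic case produces.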

Model Context Protocol

MCP is one approach to implementing augmentations — a standard protocol that lets developers integrate with third-party tools through a simple client. The key value is ecosystem interoperability: one integration works across all MCP-compatible tools.
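Concretely, MCP is built on JSON-RPC 2.0: a client discovers a server's tools with one request and invokes one with another. The message shapes below follow the MCP specification in simplified form; the `get_weather` tool and its argument are hypothetical.

```python
import json

# Discover what tools an MCP server exposes.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Invoke one of them (tool name and arguments are illustrative).
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Oslo"}},
}

wire = json.dumps(call_request)  # what actually crosses the transport
```

Because every MCP server answers these same methods, the client side is written once and works against any compliant server, which is the interoperability claim above.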

Role in the pattern hierarchy

The augmented LLM sits at the base. All five agentic workflow patterns compose augmented LLMs:

  • Prompt chaining: sequential augmented LLM calls
  • Routing: classifier augmented LLM + specialized augmented LLMs
  • Parallelization: multiple augmented LLMs running concurrently
  • Orchestrator-workers: orchestrator augmented LLM delegating to worker augmented LLMs
  • Evaluator-optimizer: generator + evaluator augmented LLMs in a loop
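The composition claim can be made concrete with the simplest pattern, prompt chaining: each stage is an augmented-LLM call whose output feeds the next. The `augmented_llm` stub (bracketing its prompt) stands in for a real model call with retrieval, tools, and memory attached.

```python
def augmented_llm(prompt: str) -> str:
    # Stand-in for one augmented-LLM call; a real one would have
    # retrieval, tools, and memory wired in.
    return f"[{prompt}]"

def chain(task: str, stages: list[str]) -> str:
    # Prompt chaining: sequential augmented-LLM calls, each transforming
    # the previous stage's output.
    out = task
    for stage in stages:
        out = augmented_llm(f"{stage}: {out}")
    return out
```

The other four patterns vary only the wiring: routing picks one stage, parallelization runs them concurrently, orchestrator-workers and evaluator-optimizer add delegation and a feedback loop. The unit being composed never changes.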

Connections

  • The “brain” in Brain-Hands Decoupling is an augmented LLM — Claude plus harness loop, with sandboxes as the tools
  • Hermes Agent is a concrete implementation: Claude + 40 tools + FTS5 memory + skills = one augmented LLM in a chat loop
  • Context Window Compression addresses the memory augmentation — what happens when the model’s retained state exceeds capacity
  • The concept connects to the LLM wiki pattern in LLM Knowledge Bases where the retrieval augmentation is a structured wiki rather than raw documents