# Augmented LLM
The foundational building block of agentic systems: a language model enhanced with retrieval, tools, and memory. Rather than using a bare LLM, every agentic pattern composes one or more augmented LLMs that can actively search for information, invoke external capabilities, and retain state across interactions.
## Three augmentations
| Augmentation | What it does | Model behavior |
|---|---|---|
| Retrieval | Access to external knowledge sources | Model generates its own search queries |
| Tools | Interaction with APIs and services | Model selects and invokes appropriate tools |
| Memory | Persistent state across calls | Model determines what information to retain |
Current models perform all three actively — they are not passive recipients of injected context, but participants that direct the use of their augmentations.
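The three augmentations can be sketched as a thin wrapper around a base model call. This is a minimal illustration, not a production design: the model function, the `TOOL:name:arg` dispatch convention, and the string-based memory are all stand-ins invented here for clarity.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AugmentedLLM:
    """A base model wrapped with retrieval, tool, and memory hooks."""
    model: Callable[[str], str]        # stand-in for a real LLM call
    retrieve: Callable[[str], str]     # retrieval: query -> context string
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)  # state kept across calls

    def run(self, user_input: str) -> str:
        # Retrieval: here the query is just the input; a real system would
        # let the model generate its own search queries, as noted above.
        context = self.retrieve(user_input)
        prompt = "\n".join(self.memory + [f"Context: {context}", f"User: {user_input}"])
        reply = self.model(prompt)
        # Tools: dispatch if the model asked for one (toy "TOOL:name:arg" convention).
        if reply.startswith("TOOL:"):
            _, name, arg = reply.split(":", 2)
            reply = self.tools[name](arg)
        # Memory: the model-directed version would decide *what* to retain;
        # this sketch simply appends the full exchange.
        self.memory.append(f"User: {user_input} -> {reply}")
        return reply
```

Every pattern later in this note composes instances of something shaped like this, swapping in real retrieval backends, tool registries, and memory stores.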
## Implementation priorities
Two aspects matter most:
- Tailor capabilities to the use case. Generic retrieval or tool sets underperform purpose-built ones.
- Provide a well-documented interface. The model’s effectiveness scales with how clearly its capabilities are described. This is the foundation of the Agent-Computer Interface concept.
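A well-documented interface mostly means well-written tool definitions: the name, description, and parameter docs are the part of the system the model actually reads. The following is a hypothetical example (the tool name, fields, and wording are invented for illustration), in the JSON-Schema style most tool-use APIs accept:

```python
# A hypothetical tool definition. The description tells the model *when* to
# use the tool, not just what it does; each parameter is documented too.
search_tool = {
    "name": "search_tickets",
    "description": (
        "Search the support-ticket database. Use this before answering any "
        "question about a customer issue. Returns up to `limit` matches."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Full-text search terms."},
            "limit": {"type": "integer", "description": "Max results (default 5)."},
        },
        "required": ["query"],
    },
}
```

The same capability with a terse description ("searches tickets") typically gets invoked at the wrong times or with malformed arguments; the documentation is the interface.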
## Model Context Protocol
MCP is one approach to implementing augmentations: an open protocol that lets developers integrate third-party tools through a single client implementation. The key value is ecosystem interoperability: one client integration works with any MCP-compatible server.
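Under the hood MCP is JSON-RPC: a client discovers a server's tools and invokes them with standard methods. The request shapes below are an illustrative sketch of that exchange (the tool name and arguments are hypothetical; consult the MCP specification for the authoritative schema):

```python
import json

# Tool discovery: ask the server what it exposes.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Tool invocation: call one of the discovered tools by name.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "search_tickets", "arguments": {"query": "login failure"}},
}

wire_message = json.dumps(call_request)  # what actually crosses the transport
```

Because every server speaks these same methods, the client loop that drives the augmented LLM never needs tool-specific integration code.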
## Role in the pattern hierarchy
The augmented LLM sits at the base. All five agentic workflow patterns compose augmented LLMs:
- Prompt chaining: sequential augmented LLM calls
- Routing: classifier augmented LLM + specialized augmented LLMs
- Parallelization: multiple augmented LLMs running concurrently
- Orchestrator-workers: orchestrator augmented LLM delegating to worker augmented LLMs
- Evaluator-optimizer: generator + evaluator augmented LLMs in a loop
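The simplest of these compositions, prompt chaining, is just function composition over augmented LLM calls. A minimal sketch, with plain functions standing in for the augmented LLMs:

```python
from typing import Callable

Step = Callable[[str], str]  # each step stands in for one augmented LLM call

def chain(steps: list[Step], task: str) -> str:
    """Prompt chaining: each augmented LLM's output feeds the next call."""
    out = task
    for step in steps:
        out = step(out)
    return out

# Stubs in place of real calls (e.g. outline -> draft -> polish):
outline = lambda t: f"outline({t})"
draft = lambda o: f"draft({o})"
polish = lambda d: f"polish({d})"

result = chain([outline, draft, polish], "blog post")
# result == "polish(draft(outline(blog post)))"
```

The other four patterns replace this linear loop with a router, a fan-out, a delegating orchestrator, or a generate-evaluate cycle, but the unit being composed is the same augmented LLM throughout.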
## Connections
- The “brain” in Brain-Hands Decoupling is an augmented LLM — Claude plus harness loop, with sandboxes as the tools
- Hermes Agent is a concrete implementation: Claude + 40 tools + FTS5 memory + skills = one augmented LLM in a chat loop
- Context Window Compression addresses the memory augmentation — what happens when the model’s retained state exceeds capacity
- The concept connects to the LLM wiki pattern in LLM Knowledge Bases where the retrieval augmentation is a structured wiki rather than raw documents