Credential Pool Pattern
A pattern for managing multiple API credentials for the same provider with automatic failover, enabling higher throughput and resilience for AI agent infrastructure.
Structure
A pool holds N credentials for a given provider. Each credential has a status (OK, exhausted) and usage metadata. When a credential hits a rate limit (429) or quota (402), it is marked exhausted with a cooldown timer (e.g. 1 hour for rate limits, 24 hours for quota). The pool selects the next available credential according to a configurable strategy.
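The structure above can be sketched as follows. This is a minimal illustrative sketch, not the actual implementation; all names and the fill-first selection are assumptions:

```python
import time
from dataclasses import dataclass, field

# Cooldowns from the description above: 1 hour for rate limits, 24 hours for quota.
RATE_LIMIT_COOLDOWN = 3600
QUOTA_COOLDOWN = 24 * 3600

@dataclass
class PooledCredential:
    key: str
    status: str = "ok"            # "ok" or "exhausted"
    cooldown_until: float = 0.0   # epoch seconds; 0 means no cooldown pending
    use_count: int = 0

    def available(self, now: float) -> bool:
        # Exhausted credentials become available again once the cooldown expires.
        return self.status == "ok" or now >= self.cooldown_until

@dataclass
class CredentialPool:
    credentials: list = field(default_factory=list)

    def mark_exhausted(self, cred: PooledCredential, http_status: int) -> None:
        cred.status = "exhausted"
        # 429 -> short rate-limit cooldown; anything else (e.g. 402) -> quota cooldown.
        cooldown = RATE_LIMIT_COOLDOWN if http_status == 429 else QUOTA_COOLDOWN
        cred.cooldown_until = time.time() + cooldown

    def acquire(self):
        now = time.time()
        for cred in self.credentials:   # fill-first selection, for illustration
            if cred.available(now):
                cred.status = "ok"      # revive if its cooldown has expired
                cred.use_count += 1
                return cred
        return None                     # pool fully exhausted
```

Other selection strategies (see the table below) slot in by replacing the loop in `acquire`.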
Selection strategies
| Strategy | Behavior | Use case |
|---|---|---|
| Fill-first | Use first credential until exhausted, then next | Minimize active credentials, simplify billing |
| Round-robin | Rotate through all credentials | Distribute load evenly across quota buckets |
| Random | Random selection | Spread load without coordination state |
| Least-used | Pick credential with lowest usage count | Maximize aggregate throughput |
Implementation in Hermes Agent
Hermes Agent implements this pattern in agent/credential_pool.py (47KB). CredentialPool manages PooledCredential objects with status tracking, cooldown timers, and pluggable selection strategies. It supports both OAuth tokens (with refresh) and static API keys, persists state via JSON serialization, and auto-refreshes agent keys with clock skew tolerance.
When to use
Any agent system that:
- Uses multiple API keys for the same provider (personal + team + org)
- Needs to survive rate limits without blocking the user
- Rotates across free-tier keys during development
- Manages credentials for multiple users in a shared gateway
The pattern pairs well with multi-provider routing (e.g. Hermes’s smart model routing), where the pool handles intra-provider failover and the router handles inter-provider failover.
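The pairing can be sketched as two nested loops: the inner loop rotates credentials within a provider, the outer loop falls over to the next provider. `StubPool`, `RateLimitError`, and the provider list shape are hypothetical:

```python
class RateLimitError(Exception):
    pass

class StubPool:
    # Minimal stand-in for a credential pool (illustrative only).
    def __init__(self, creds):
        self.creds = list(creds)
    def acquire(self):
        return self.creds.pop(0) if self.creds else None
    def mark_exhausted(self, cred):
        pass

def call_with_failover(providers, request):
    """providers: list of (pool, call_fn) pairs in preference order."""
    for pool, call_fn in providers:
        cred = pool.acquire()
        while cred is not None:             # intra-provider: pool failover
            try:
                return call_fn(cred, request)
            except RateLimitError:
                pool.mark_exhausted(cred)
                cred = pool.acquire()       # next credential, same provider
        # pool drained: inter-provider failover to the next provider
    raise RuntimeError("all providers exhausted")
```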
Two-tier failover
OpenClaw extends credential pooling into a two-tier failover system:
Tier 1 — auth profile rotation (inner loop). Within a single model, the system rotates through multiple API credentials. Each profile tracks cooldown state, failure reason (closed union: auth, auth_permanent, billing, rate_limit, overloaded, timeout, model_not_found, format, session_expired, unknown), and usage timestamps. Different failure reasons drive different behavior: rate_limit allows cooldown probes, auth_permanent skips immediately.
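The reason-driven behavior can be sketched as a lookup table. The reason names follow the closed union above; the cooldown durations and the `next_action` helper are invented for illustration:

```python
# Permanent failures: rotating back to this profile can never succeed.
SKIP_PERMANENTLY = {"auth_permanent", "model_not_found"}

# Transient failures: cool the profile down, duration by reason (values illustrative).
COOLDOWN_SECONDS = {
    "rate_limit": 60,        # short cooldown, eligible for optimistic probing
    "overloaded": 30,
    "timeout": 30,
    "auth": 300,
    "billing": 24 * 3600,    # quota/billing rarely self-heals quickly
    "session_expired": 0,    # refresh the session and retry immediately
}

def next_action(reason: str) -> str:
    if reason in SKIP_PERMANENTLY:
        return "skip"                        # never retry this profile
    if reason in COOLDOWN_SECONDS:
        return f"cooldown:{COOLDOWN_SECONDS[reason]}"
    return "cooldown:60"                     # format/unknown: conservative default
```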
Tier 2 — model fallback (outer loop). When all auth profiles for a model are exhausted, the system tries the next model in a configured fallback chain. An explicitly empty fallbacks: [] array disables fallback entirely.
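The outer loop reduces to iterating the primary model followed by its fallback chain. A sketch, where `try_model` is a hypothetical callable that returns a result or `None` once every auth profile for that model is exhausted:

```python
def run_with_fallback(primary, fallbacks, try_model):
    # An explicitly empty fallbacks list keeps only the primary model,
    # which disables fallback entirely, matching the semantics above.
    for model in [primary, *fallbacks]:
        result = try_model(model)
        if result is not None:
            return model, result
    raise RuntimeError("all models in the fallback chain are exhausted")
```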
Cooldown probing. When every profile for a provider is in cooldown but the earliest expiry is near (within 2 minutes), the system makes an optimistic probe attempt, throttled to one per 30 seconds per provider. This avoids both premature retries and unnecessary waiting. Probe state is tracked in memory with a 256-key cap and a 24-hour TTL.
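The probing rule can be re-created as a single predicate. This is an illustrative sketch, not OpenClaw's code; the function name and the eviction details are assumptions, while the constants come from the description above:

```python
import time
from collections import OrderedDict

PROBE_WINDOW = 120        # probe only when expiry is within 2 minutes
PROBE_INTERVAL = 30       # at most one probe per 30 seconds per provider
PROBE_CACHE_CAP = 256     # in-memory cap on tracked providers
PROBE_TTL = 24 * 3600     # forget probe state after 24 hours

_last_probe = OrderedDict()   # provider -> timestamp of last probe

def should_probe(provider: str, earliest_expiry: float, now: float = None) -> bool:
    now = time.time() if now is None else now
    # Evict stale entries (TTL) and overflow beyond the key cap (oldest first).
    for key in [k for k, t in _last_probe.items() if now - t > PROBE_TTL]:
        del _last_probe[key]
    while len(_last_probe) > PROBE_CACHE_CAP:
        _last_probe.popitem(last=False)
    if earliest_expiry - now > PROBE_WINDOW:
        return False              # expiry not near yet: keep waiting
    last = _last_probe.get(provider)
    if last is not None and now - last < PROBE_INTERVAL:
        return False              # throttled: probed this provider too recently
    _last_probe[provider] = now
    return True
```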
Runtime auth refresh. For token-exchange providers (Google Vertex, AWS Bedrock), the auth controller schedules proactive token refreshes before expiry via setTimeout, with one retry on failure. This prevents mid-conversation auth failures.
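A Python analogue of that setTimeout-based scheduling (OpenClaw's is TypeScript) might look like the sketch below; the `REFRESH_MARGIN` value and all names are illustrative:

```python
import threading
import time

REFRESH_MARGIN = 300   # refresh 5 minutes before expiry (assumed value)

def schedule_refresh(expires_at: float, refresh_fn, retry_delay: float = 10.0):
    """Schedule a proactive token refresh before expiry, with one retry on failure."""
    delay = max(0.0, expires_at - time.time() - REFRESH_MARGIN)

    def run(attempts_left=1):
        try:
            refresh_fn()
        except Exception:
            if attempts_left > 0:   # single retry, then give up until next cycle
                retry = threading.Timer(
                    retry_delay, run, kwargs={"attempts_left": attempts_left - 1}
                )
                retry.daemon = True
                retry.start()

    timer = threading.Timer(delay, run)
    timer.daemon = True
    timer.start()
    return timer
```

Refreshing ahead of expiry, rather than reacting to a 401, is what prevents the mid-conversation auth failures mentioned above.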