Credential Pool Pattern
A pattern for managing multiple API credentials for the same provider with automatic failover, enabling higher throughput and resilience for AI agent infrastructure.
Structure
A pool holds N credentials for a given provider. Each credential has a status (OK, exhausted) and usage metadata. When a credential hits a rate limit (429) or quota (402), it is marked exhausted with a cooldown timer (e.g. 1 hour for rate limits, 24 hours for quota). The pool selects the next available credential according to a configurable strategy.
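The structure above can be sketched as follows. This is a minimal illustrative sketch, not the actual implementation; all names and the fill-first selection are assumptions:

```python
import time
from dataclasses import dataclass, field

# Cooldowns from the description above: 1 hour for rate limits, 24 hours for quota.
RATE_LIMIT_COOLDOWN = 3600
QUOTA_COOLDOWN = 24 * 3600

@dataclass
class PooledCredential:
    key: str
    status: str = "ok"            # "ok" or "exhausted"
    cooldown_until: float = 0.0   # epoch seconds; 0 means no cooldown pending
    use_count: int = 0

    def available(self, now: float) -> bool:
        # Exhausted credentials become available again once the cooldown expires.
        return self.status == "ok" or now >= self.cooldown_until

@dataclass
class CredentialPool:
    credentials: list = field(default_factory=list)

    def mark_exhausted(self, cred: PooledCredential, http_status: int) -> None:
        cred.status = "exhausted"
        # 429 -> short rate-limit cooldown; anything else (e.g. 402) -> quota cooldown.
        cooldown = RATE_LIMIT_COOLDOWN if http_status == 429 else QUOTA_COOLDOWN
        cred.cooldown_until = time.time() + cooldown

    def acquire(self):
        now = time.time()
        for cred in self.credentials:   # fill-first selection, for illustration
            if cred.available(now):
                cred.status = "ok"      # revive if its cooldown has expired
                cred.use_count += 1
                return cred
        return None                     # pool fully exhausted
```

Other selection strategies (see the table below) slot in by replacing the loop in `acquire`.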
Selection strategies
| Strategy | Behavior | Use case |
|---|---|---|
| Fill-first | Use first credential until exhausted, then next | Minimize active credentials, simplify billing |
| Round-robin | Rotate through all credentials | Distribute load evenly across quota buckets |
| Random | Random selection | Spread load without coordination state |
| Least-used | Pick credential with lowest usage count | Maximize aggregate throughput |
Implementation in Hermes Agent
Hermes Agent implements this pattern in agent/credential_pool.py (47KB). CredentialPool manages PooledCredential objects with status tracking, cooldown timers, and pluggable selection strategies. It supports both OAuth tokens (with refresh) and static API keys, persists state via JSON serialization, and auto-refreshes agent keys with clock skew tolerance.
When to use
Any agent system that:
- Uses multiple API keys for the same provider (personal + team + org)
- Needs to survive rate limits without blocking the user
- Rotates across free-tier keys during development
- Manages credentials for multiple users in a shared gateway
The pattern pairs well with multi-provider routing (e.g. Hermes’s smart model routing), where the pool handles intra-provider failover and the router handles inter-provider failover.
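The pairing can be sketched as two nested loops: the inner loop rotates credentials within a provider, the outer loop falls over to the next provider. `StubPool`, `RateLimitError`, and the provider list shape are hypothetical:

```python
class RateLimitError(Exception):
    pass

class StubPool:
    # Minimal stand-in for a credential pool (illustrative only).
    def __init__(self, creds):
        self.creds = list(creds)
    def acquire(self):
        return self.creds.pop(0) if self.creds else None
    def mark_exhausted(self, cred):
        pass

def call_with_failover(providers, request):
    """providers: list of (pool, call_fn) pairs in preference order."""
    for pool, call_fn in providers:
        cred = pool.acquire()
        while cred is not None:             # intra-provider: pool failover
            try:
                return call_fn(cred, request)
            except RateLimitError:
                pool.mark_exhausted(cred)
                cred = pool.acquire()       # next credential, same provider
        # pool drained: inter-provider failover to the next provider
    raise RuntimeError("all providers exhausted")
```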
Two-tier failover
OpenClaw extends credential pooling into a two-tier failover system:
Tier 1 — auth profile rotation (inner loop). Within a single model, the system rotates through multiple API credentials. Each profile tracks cooldown state, failure reason (closed union: auth, auth_permanent, billing, rate_limit, overloaded, timeout, model_not_found, format, session_expired, unknown), and usage timestamps. Different failure reasons drive different behavior: rate_limit allows cooldown probes, auth_permanent skips immediately.
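The reason-driven behavior can be sketched as a lookup table. The reason names follow the closed union above; the cooldown durations and the `next_action` helper are invented for illustration:

```python
# Permanent failures: rotating back to this profile can never succeed.
SKIP_PERMANENTLY = {"auth_permanent", "model_not_found"}

# Transient failures: cool the profile down, duration by reason (values illustrative).
COOLDOWN_SECONDS = {
    "rate_limit": 60,        # short cooldown, eligible for optimistic probing
    "overloaded": 30,
    "timeout": 30,
    "auth": 300,
    "billing": 24 * 3600,    # quota/billing rarely self-heals quickly
    "session_expired": 0,    # refresh the session and retry immediately
}

def next_action(reason: str) -> str:
    if reason in SKIP_PERMANENTLY:
        return "skip"                        # never retry this profile
    if reason in COOLDOWN_SECONDS:
        return f"cooldown:{COOLDOWN_SECONDS[reason]}"
    return "cooldown:60"                     # format/unknown: conservative default
```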
Tier 2 — model fallback (outer loop). When all auth profiles for a model are exhausted, the system tries the next model in a configured fallback chain. An explicitly empty fallbacks: [] array disables fallback entirely.
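The outer loop reduces to iterating the primary model followed by its fallback chain. A sketch, where `try_model` is a hypothetical callable that returns a result or `None` once every auth profile for that model is exhausted:

```python
def run_with_fallback(primary, fallbacks, try_model):
    # An explicitly empty fallbacks list keeps only the primary model,
    # which disables fallback entirely, matching the semantics above.
    for model in [primary, *fallbacks]:
        result = try_model(model)
        if result is not None:
            return model, result
    raise RuntimeError("all models in the fallback chain are exhausted")
```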
Cooldown probing. When every profile for a provider is in cooldown but the earliest expiry is near (within 2 minutes), the system makes an optimistic probe attempt, throttled to one per 30 seconds per provider. This avoids both premature retries and unnecessary waiting. Probe state is tracked in memory with a 256-key cap and a 24-hour TTL.
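The probing rule can be re-created as a single predicate. This is an illustrative sketch, not OpenClaw's code; the function name and the eviction details are assumptions, while the constants come from the description above:

```python
import time
from collections import OrderedDict

PROBE_WINDOW = 120        # probe only when expiry is within 2 minutes
PROBE_INTERVAL = 30       # at most one probe per 30 seconds per provider
PROBE_CACHE_CAP = 256     # in-memory cap on tracked providers
PROBE_TTL = 24 * 3600     # forget probe state after 24 hours

_last_probe = OrderedDict()   # provider -> timestamp of last probe

def should_probe(provider: str, earliest_expiry: float, now: float = None) -> bool:
    now = time.time() if now is None else now
    # Evict stale entries (TTL) and overflow beyond the key cap (oldest first).
    for key in [k for k, t in _last_probe.items() if now - t > PROBE_TTL]:
        del _last_probe[key]
    while len(_last_probe) > PROBE_CACHE_CAP:
        _last_probe.popitem(last=False)
    if earliest_expiry - now > PROBE_WINDOW:
        return False              # expiry not near yet: keep waiting
    last = _last_probe.get(provider)
    if last is not None and now - last < PROBE_INTERVAL:
        return False              # throttled: probed this provider too recently
    _last_probe[provider] = now
    return True
```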
Runtime auth refresh. For token-exchange providers (Google Vertex, AWS Bedrock), the auth controller schedules proactive token refreshes before expiry via setTimeout, with one retry on failure. This prevents mid-conversation auth failures.
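A Python analogue of that setTimeout-based scheduling (OpenClaw's is TypeScript) might look like the sketch below; the `REFRESH_MARGIN` value and all names are illustrative:

```python
import threading
import time

REFRESH_MARGIN = 300   # refresh 5 minutes before expiry (assumed value)

def schedule_refresh(expires_at: float, refresh_fn, retry_delay: float = 10.0):
    """Schedule a proactive token refresh before expiry, with one retry on failure."""
    delay = max(0.0, expires_at - time.time() - REFRESH_MARGIN)

    def run(attempts_left=1):
        try:
            refresh_fn()
        except Exception:
            if attempts_left > 0:   # single retry, then give up until next cycle
                retry = threading.Timer(
                    retry_delay, run, kwargs={"attempts_left": attempts_left - 1}
                )
                retry.daemon = True
                retry.start()

    timer = threading.Timer(delay, run)
    timer.daemon = True
    timer.start()
    return timer
```

Refreshing ahead of expiry, rather than reacting to a 401, is what prevents the mid-conversation auth failures mentioned above.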