Tool Loop Detection

concept
ai-agents · safety · tool-calling · loop-detection · circuit-breaker

A content-aware system for detecting when an AI agent is stuck in a tool-calling loop, distinguishing between legitimate polling (where results change) and stuck loops (where the agent repeats the same action with no progress).

The problem

Agents with tool-calling capabilities frequently get stuck in loops. Common patterns:

  • Calling command_status repeatedly on a process that has already finished
  • Alternating between two tools that return the same result (ping-pong)
  • Retrying a failing command with identical arguments
  • Polling for a condition that will never be met

Simple call-count limits (“stop after 10 identical calls”) are either too restrictive (blocking legitimate polling of long-running processes) or too permissive (allowing 10 wasted calls before intervening). The key insight is that a loop is stuck when the results stop changing, not when the call count reaches a threshold.

Content-aware detection

The solution hashes both the tool call arguments and the tool result. A call is “no-progress” when the same tool + same arguments produces the same result hash as a previous call. This lets the system distinguish:

  • Polling with progress: calling process.log repeatedly, getting new output each time. Same args, different result hashes. Not a loop.
  • Stuck polling: calling process.log repeatedly, getting the same output. Same args, same result hash. This is a loop.
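A minimal TypeScript sketch of this check. The names (`CallRecord`, `isNoProgress`, `sha256`) are illustrative, not taken from the actual implementation:

```typescript
import { createHash } from "node:crypto";

// SHA-256 over a string, hex-encoded.
const sha256 = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// One record per tool call: which tool, what args, what came back.
interface CallRecord {
  tool: string;
  argsHash: string;
  resultHash: string;
}

// A call is "no-progress" if an earlier call with the same tool and the
// same argument hash produced the same result hash. Same args with a
// *different* result hash is legitimate polling, not a loop.
function isNoProgress(history: CallRecord[], call: CallRecord): boolean {
  return history.some(
    (prev) =>
      prev.tool === call.tool &&
      prev.argsHash === call.argsHash &&
      prev.resultHash === call.resultHash,
  );
}
```

Polling `process.log` with growing output yields a new result hash each time and passes this check; polling a finished process yields a stable hash and is flagged.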

Four detectors

A sliding window (last 30 tool calls, configurable) feeds four independent detectors:

Generic repeat — Same tool + same argument hash called N times. Warns at 10, escalates to critical at 20. Catches simple retry loops.
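The generic-repeat detector can be sketched as a count over the sliding window. Threshold constants here mirror the values above but are illustrative, not the real configuration keys:

```typescript
// Illustrative thresholds matching the described behavior.
const WARN_AT = 10;
const CRITICAL_AT = 20;

type Severity = "ok" | "warning" | "critical";

// Count how many calls in the sliding window share the same tool and
// argument hash as the most recent call, then map to a severity.
function genericRepeat(
  window: { tool: string; argsHash: string }[],
): Severity {
  if (window.length === 0) return "ok";
  const last = window[window.length - 1];
  const repeats = window.filter(
    (c) => c.tool === last.tool && c.argsHash === last.argsHash,
  ).length;
  if (repeats >= CRITICAL_AT) return "critical";
  if (repeats >= WARN_AT) return "warning";
  return "ok";
}
```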

Known-poll no-progress — Specifically targets polling tools (command_status, process.poll, process.log). Tracks whether the result hash changes between calls. Blocks when the result is stable across N calls. Catches the most common stuck pattern: the agent polling a completed process.

Global circuit breaker — Absolute ceiling (30 calls) on identical no-progress outcomes, regardless of which detector caught them. Emergency stop for cases where individual detectors’ thresholds are too generous.

Ping-pong — Detects alternating A-B-A-B patterns with stable result hashes. Catches the agent oscillating between two tools that each trigger the other.
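The ping-pong detector can be sketched as a check over the tail of the window: two tools must strictly alternate, and each side's result hash must be stable. The `minCycles` parameter is an assumption for illustration:

```typescript
// Detect A-B-A-B alternation with stable result hashes on both sides.
function isPingPong(
  window: { tool: string; resultHash: string }[],
  minCycles = 3, // require this many A-B cycles (illustrative default)
): boolean {
  const need = minCycles * 2;
  if (window.length < need) return false;
  const tail = window.slice(-need);
  const [a, b] = tail;
  if (a.tool === b.tool) return false; // not alternating at all
  // Every even position must match the first call, every odd position
  // the second, including the result hash (no progress on either side).
  return tail.every((call, i) => {
    const expect = i % 2 === 0 ? a : b;
    return call.tool === expect.tool && call.resultHash === expect.resultHash;
  });
}
```

If either tool's result hash changes between cycles, the agent is making progress and the pattern is not flagged.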

Escalating response

Detection produces three severity levels:

  • Warning (threshold 10) — logged, injected as a hint in the agent’s next context. The agent can self-correct.
  • Critical (threshold 20) — logged with full diagnostics. Stronger context injection.
  • Circuit break (threshold 30) — tool call is blocked. The agent receives an error explaining why.

The escalating response gives the agent a chance to self-correct before hard intervention. Most loops break at the warning stage because the injected hint tells the agent it’s repeating itself.
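The escalation policy amounts to mapping a no-progress streak length to an action. A sketch, with thresholds mirroring the levels above and hint wording that is purely illustrative:

```typescript
type Action =
  | { kind: "allow" }
  | { kind: "warn"; hint: string }      // injected into the agent's context
  | { kind: "critical"; hint: string }  // stronger injection + diagnostics
  | { kind: "block"; error: string };   // tool call refused

// Map a no-progress streak to an escalating response.
function escalate(noProgressCount: number, tool: string): Action {
  if (noProgressCount >= 30) {
    return {
      kind: "block",
      error: `Call blocked: ${tool} has returned identical results ${noProgressCount} times.`,
    };
  }
  if (noProgressCount >= 20) {
    return {
      kind: "critical",
      hint: `${tool} has returned identical results ${noProgressCount} times; change approach.`,
    };
  }
  if (noProgressCount >= 10) {
    return {
      kind: "warn",
      hint: `Recent ${tool} calls returned identical results; the work may already be done.`,
    };
  }
  return { kind: "allow" };
}
```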

Implementation notes

Tool arguments are stable-serialized (deterministic key ordering) before hashing with SHA-256. Tool results are also hashed. The sliding window is bounded to prevent memory growth in very long sessions. Per-detector state resets when the agent makes a non-repeating call, so a brief legitimate repeat doesn’t permanently poison the detector.
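Stable serialization matters because `{"a":1,"b":2}` and `{"b":2,"a":1}` are the same arguments and must hash identically. A sketch of the idea (recursive key sorting before SHA-256; function names are illustrative):

```typescript
import { createHash } from "node:crypto";

// JSON serialization with object keys sorted recursively, so argument
// objects that differ only in key order produce identical strings.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash tool arguments via their stable serialization.
function hashArgs(args: unknown): string {
  return createHash("sha256").update(stableStringify(args)).digest("hex");
}
```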

Why this matters for agent safety

Tool loops are the most common failure mode in production agent systems. They waste tokens, consume rate limits, and can cause real-world side effects (sending duplicate messages, creating duplicate resources). A content-aware approach catches stuck loops early without restricting legitimate agent behavior like process monitoring or polling for external conditions.

  • OpenClaw implements this in src/agents/tool-loop-detection.ts
  • The agent learning loop is the opposite concern: making sure the agent does repeat successful patterns