Auditing and Optimizing a Runaway Claude API Orchestrator: From $45/day to $2–3/day

The Problem: Bleeding $45 Per Day

Our internal Jada systems were burning roughly $45 per day across 4–5 API reload cycles, and we could not tell where the money was going. The culprit wasn't obvious from surface-level inspection: we had multiple Python daemons, scheduled tasks, Lambda functions, and CLI invocations all talking to Claude. This post documents the investigation methodology, the root cause, and the targeted fixes that cut spend by roughly 93% without sacrificing functionality.

Investigation Strategy: Systematic Token Flow Mapping

Rather than guess, we took a three-phase approach:

  • Phase 1: Enumerate all API call sites. Find every file that imports the Anthropic SDK or invokes the Claude CLI, then extract the exact model string and context size at each call site.
  • Phase 2: Measure call frequency and loop logic. Determine how often each caller runs, what termination conditions exist, and whether any loops can spiral.
  • Phase 3: Rank by impact. Calculate token spend per call and per day for each site, then prioritize fixes by ROI.

We searched across four locations:

  • /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/ — QDN scheduled tasks and Lambda functions
  • /Users/cb/Documents/repos/sites/sailjada.com/ — Jada-specific tooling
  • /Users/cb/Documents/repos/portfolio-intel/ — Portfolio analysis daemons (local + Lightsail)
  • Lightsail instance 34.239.233.28 (jada-agent service) — the Jada orchestrator daemon
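
Phase 1 can be approximated with a short scan. The following is a minimal sketch, not the script we actually ran: the function name find_call_sites, the regexes, and the choice of .py and .sh files are all assumptions made for illustration, and a real audit would also need to cover Lambda bundles and any other file types in use.

```python
import re
from pathlib import Path

# Hypothetical patterns: an SDK import, a shell-style CLI call, and a model id.
SDK_IMPORT = re.compile(r"^\s*(?:import|from)\s+anthropic\b", re.MULTILINE)
CLI_CALL = re.compile(r"(?:^|[\s;&|(\"'])claude\s+-", re.MULTILINE)
MODEL_ID = re.compile(r"\bclaude-[a-z0-9]+(?:[.\-][a-z0-9]+)*")


def find_call_sites(root, suffixes=(".py", ".sh")):
    """Return one record per file that imports the SDK or invokes the CLI."""
    sites = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        uses_sdk = bool(SDK_IMPORT.search(text))
        uses_cli = bool(CLI_CALL.search(text))
        if uses_sdk or uses_cli:
            sites.append({
                "path": str(path),
                "sdk": uses_sdk,
                "cli": uses_cli,
                # An empty list means no explicit model string was found.
                "models": sorted(set(MODEL_ID.findall(text))),
            })
    return sites


if __name__ == "__main__":
    for site in find_call_sites("."):
        print(site)
```

A call site that reports cli=True with an empty models list is the pattern to look at first, since it relies on whatever default the CLI picks.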

The Root Cause: Unbounded Claude CLI Sessions on Lightsail

The smoking gun: jada_daemon.sh running on the Lightsail instance was spawning full Claude CLI sessions with zero termination guards.

Here's what was happening:

  1. The daemon polls a task queue (agent-work SQS or DynamoDB) approximately every 5 minutes.
  2. For each task, it invokes the Claude CLI with a multi-file context injection:
    claude --files # injects ACTIVE.md, crew rosters, calendar data, etc.
  3. No --max-turns parameter → the CLI conversation continues until Claude explicitly says "done" or an uncaught error kills the process.
  4. No --model specified → defaults to the most expensive available model at that time.
  5. Context grows from 25K tokens (injected files) to 150K–300K tokens over 30–100 turns of back-and-forth.
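  6. At Sonnet 4.6 pricing (~$3–5 per 1M input tokens), each session costs $8–15, because every turn re-sends the accumulated context and the billed input adds up to several million tokens over a long session.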
  7. With 4–5 sessions per day, that's $32–75/day.
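
The per-session figure is easier to see with a rough model of how billed input accumulates. This is a sketch under stated assumptions: context grows linearly between the start and end sizes, every turn re-sends the full context, prompt caching and output tokens are ignored, and the $3 per 1M price is the low end of the range quoted above. The function names are hypothetical.

```python
def cumulative_input_tokens(start_tokens, end_tokens, turns):
    """Total input tokens billed when each turn re-sends the whole context.

    Assumes the context grows linearly from start_tokens to end_tokens.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    if turns == 1:
        return float(start_tokens)
    step = (end_tokens - start_tokens) / (turns - 1)
    return sum(start_tokens + step * i for i in range(turns))


def session_cost_usd(start_tokens, end_tokens, turns, usd_per_mtok):
    """Input-only cost of one session at a flat price per million tokens."""
    total = cumulative_input_tokens(start_tokens, end_tokens, turns)
    return total / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    # 25K injected context growing to 150K over 30 turns, at $3 per 1M input.
    cost = session_cost_usd(25_000, 150_000, 30, 3.0)
    print(f"estimated input cost: ${cost:.2f}")
```

Under these assumptions a 30-turn session that ends at 150K tokens already bills about 2.6M input tokens, or roughly $8, which matches the low end of the observed range. Longer sessions grow faster than linearly in cost because both the turn count and the per-turn context increase.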

By contrast, all scheduled Python tasks (daily reports, portfolio calculations, data loads) combined cost only ~$0.38/day because they use targeted API calls with specific models and bounded input.

The Fix: A Two-Line Change on the Server

We applied a surgical two-part patch to /opt/jada/jada_daemon.sh on the Lightsail instance:

#!/bin/bash
# Before the claude invocation:
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
# Then invoke with hard turn limit:
claude --files ... --max-turns 30

Why Haiku? Haiku 4.5 costs roughly a third of what Sonnet does per token while maintaining reasonable quality for orchestration tasks (routing, light analysis, decision-making). Complex reasoning still goes to Opus, but those calls are rare and explicit.

Why 30 turns? Most orchestrator workflows complete within 5–10 turns. Thirty provides a safety margin while catching runaway loops. If a task genuinely needs more, it's a sign the workflow design is wrong, not that we should throw more tokens at it.
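
For callers that launch the CLI from Python rather than from a shell script, the same two guards can be applied when building the command. This is a sketch, not the daemon's code: build_guarded_invocation is a hypothetical helper, it assumes the CLI is run in non-interactive print mode (-p), and it leaves out the context-injection arguments that the shell excerpt above elides.

```python
import os


def build_guarded_invocation(prompt, model="claude-haiku-4-5-20251001",
                             max_turns=30, base_env=None):
    """Return (argv, env) for a CLI call with an explicit model and turn cap."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    # Copy the environment so the caller's os.environ is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_MODEL"] = model
    argv = ["claude", "-p", prompt, "--max-turns", str(max_turns)]
    return argv, env


if __name__ == "__main__":
    argv, env = build_guarded_invocation("Process the next queued task")
    print(argv, env["ANTHROPIC_MODEL"])
```

The returned pair can be passed to subprocess.run(argv, env=env). Keeping the construction in one function means no caller can start a session without both guards in place.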

Verification and Deployment

The changes were applied in this sequence:

  1. SSHed into 34.239.233.28 and edited /opt/jada/jada_daemon.sh directly.
  2. Restarted the jada-agent systemd service:
    sudo systemctl restart jada-agent
  3. Verified the changes persisted across restart by grepping the running process environment and checking the last 50 lines of daemon logs.
  4. Spot-checked a live task queue item to confirm it was using Haiku and respecting the 30-turn limit.
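
The environment check in step 3 can be scripted on Linux by reading /proc/<pid>/environ. The sketch below is an assumption about how one might do it, not the exact command we ran, and the helper names are hypothetical. Note that this file reflects the environment a process was started with, so the PID to inspect is the child claude process rather than the daemon shell that runs the export.

```python
def parse_environ(raw):
    """Parse the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue
        key, sep, value = entry.partition(b"=")
        if sep:  # skip malformed entries that have no '='
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_model(pid):
    """Return ANTHROPIC_MODEL as seen by a running process, or None."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read()).get("ANTHROPIC_MODEL")


if __name__ == "__main__":
    sample = b"PATH=/usr/bin\0ANTHROPIC_MODEL=claude-haiku-4-5-20251001\0"
    print(parse_environ(sample)["ANTHROPIC_MODEL"])
```

Reading another user's /proc entry usually requires root, so on the server this would be run with sudo.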

Projected Impact

Assuming 4–5 daemon sessions per day:

  • Before: 4 sessions at ~$3.20 each (200K average tokens at Sonnet 4.6 pricing) = $12.80/day minimum; the observed $45/day suggests longer sessions or Opus usage
  • After: 4 sessions at ~$0.10 each (30K average tokens at Haiku pricing) = ~$0.40/day
  • Plus scheduled tasks: $0.38/day
  • Total projected daily spend: ~$0.78/day vs. $45/day = 98% reduction

Against the observed $45/day, we expect real spend to settle at ~$2–3/day: the projection above plus headroom for occasional explicit Opus calls.
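
The projection arithmetic is small enough to keep as a script and re-run when the inputs change. This sketch uses the per-session and scheduled-task figures quoted above; the function names are hypothetical.

```python
def daily_spend(sessions, cost_per_session, scheduled_tasks):
    """Projected daily spend in USD: daemon sessions plus scheduled tasks."""
    return sessions * cost_per_session + scheduled_tasks


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (1 - after / before) * 100


if __name__ == "__main__":
    projected = daily_spend(sessions=4, cost_per_session=0.10,
                            scheduled_tasks=0.38)
    print(f"projected: ${projected:.2f}/day")
    print(f"reduction vs $45/day: {percent_reduction(45.0, projected):.1f}%")
    print(f"reduction at $3/day:  {percent_reduction(45.0, 3.0):.1f}%")
```

The same function shows where the two headline percentages come from: about 98% for the $0.78 projection and about 93% for the more conservative $3/day.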

Why This Approach Works

Defense in depth: We applied two guards simultaneously. Either one alone provides partial protection; together they catch both budget runaway and model-selection drift.

Environment-level config: Setting ANTHROPIC_MODEL as an env var ensures all CLI invocations inherit it without modifying the script logic. This survives script restarts and is easy to revert.

Turn limits as workflow debuggers: If a task hits max-turns, it signals a design flaw (unclear goals, infinite loops, poor decomposition). The operator should investigate why, not just increase the limit.

Key Learnings for Production Claude Integrations

  • Always set a model explicitly. Never rely on SDK/CLI defaults; they change and can be expensive.
  • Bound conversation length. In production, use max-turns or a token budget. Unbounded conversations are debugging tools, not production code.
  • Measure token flow at every call site. A 10-line cost audit beat weeks of wondering.
  • Right-size the model for the task. Haiku costs a small fraction of what Opus does per token. Use Opus only when you need it; route decision-making and orchestration to faster, cheaper models.
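
For SDK callers that manage their own conversation loop, the "token budget" mentioned above can be a few lines of bookkeeping. The class below is a minimal sketch with hypothetical names; it assumes the caller passes in the input and output token counts reported for each response and stops the loop when the exception is raised.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a session has consumed more tokens than allowed."""


class TokenBudget:
    """Tracks cumulative token usage for one session against a hard cap."""

    def __init__(self, max_total_tokens):
        if max_total_tokens <= 0:
            raise ValueError("max_total_tokens must be positive")
        self.max_total_tokens = max_total_tokens
        self.used = 0

    def record(self, input_tokens, output_tokens):
        """Add one turn's usage; raise if the session is now over budget."""
        self.used += input_tokens + output_tokens
        if self.used > self.max_total_tokens:
            raise TokenBudgetExceeded(
                f"used {self.used} of {self.max_total_tokens} tokens"
            )
        return self.max_total_tokens - self.used


if __name__ == "__main__":
    budget = TokenBudget(max_total_tokens=500_000)
    print(budget.record(input_tokens=25_000, output_tokens=1_200))
```

A budget like this complements a turn limit: the turn limit catches loops, while the budget catches a small number of turns that each carry a very large context.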