Auditing and Optimizing a Runaway Claude API Orchestrator: From $45/day to $2/day

Executive Summary

A cost audit of our Claude API-powered orchestrator system revealed that uncontrolled token consumption in a Lightsail-hosted daemon was burning approximately $45 per day—while all scheduled batch jobs combined cost only $0.38/day. The culprit: a shell script invoking the Claude CLI with no termination bounds, no model downgrade strategy, and context injection that grew unbounded across 30–100 turns per session. Two targeted fixes—adding --max-turns 30 and switching to Claude Haiku—reduced daily spend to an estimated $2–3 while maintaining functionality.

What Was Done

We performed a comprehensive audit of the Claude API call sites across the entire codebase, spanning four EC2/Lightsail instances and five major Python/shell script entry points. The investigation identified:

Main culprit: jada_daemon.sh on the Lightsail instance (34.239.233.28) running unbounded Claude CLI sessions
Secondary cost drivers: Three daily batch jobs in /tools/ directories, each using Claude Sonnet 4.6 for lightweight tasks
Termination logic gaps: No max-turn caps, no cost guards, and context bloat from injected markdown files

Immediate remediation was applied directly on the Lightsail instance and verified across all local development mirrors.

Technical Details: The Cost Breakdown

The Daemon Problem: Unbounded Turns + Full Context Injection

The Lightsail jada-agent daemon lives at /opt/jada-agent/jada_daemon.sh and runs as a systemd service (name: jada-agent). Every time it picks up an "agent-work" task from the queue, it invokes the Claude CLI like this:

claude -c /tmp/agent_context.md << 'EOF'
... user prompt ...
EOF

The problem: no --max-turns flag. By default, the Claude CLI allows infinite turns until the user explicitly exits. Combined with context injection from ACTIVE.md (475 lines ≈ 15,000 tokens), each session would:

Load 15K tokens of context upfront
Run 30–100 turns of agent reasoning (each turn adding 5K–8K tokens)
Terminate only when the agent got stuck or the shell timeout fired (rarely)
Result in 150K–300K tokens consumed per session at Claude Sonnet pricing (~$8–15/session)

At 4–5 such sessions per day: $32–75/day.

The Scheduled Jobs: Cheap but Fixable

Three daily Python scripts in the /tools/ directories were burning money more efficiently but still unnecessarily:

/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py – calls Claude Sonnet for daily content generation
/Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py – calls Claude Sonnet for data cleaning
/Users/cb/Documents/repos/portfolio-intel/daily.py – calls Claude Sonnet for portfolio analysis

Each was using model="claude-sonnet-4-20250514" (or similar Sonnet variant) for tasks that Haiku could handle equally well: prompt engineering, text cleaning, and structured data extraction. These jobs collectively consumed ~$0.38/day, but represented an optimization opportunity.

Infrastructure & Call Sites: The Full Map

Lightsail Instance: The Main Orchestrator

Host: 34.239.233.28 (us-east-1)
Service: jada-agent (systemd)
Main script: /opt/jada-agent/jada_daemon.sh

This daemon polls a task queue (implementation details redacted) and spawns Claude CLI sessions. The fix was surgical:

# Before:
claude -c /tmp/agent_context.md << 'EOF'

# After:
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
claude --max-turns 30 -c /tmp/agent_context.md << 'EOF'

The --max-turns 30 flag ensures the session terminates after 30 rounds of interaction, regardless of whether the agent has finished. The ANTHROPIC_MODEL environment variable overrides the system default (which was Sonnet) and selects Haiku for all invocations within that shell context.

Local Development Files (Mirrored for Testing)

We also updated the local copies to maintain parity:

/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py – changed model to Haiku
/Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py – changed model to Haiku
/Users/cb/Documents/repos/portfolio-intel/daily.py – changed model to Haiku

These changes will propagate on the next deployment cycle.

Key Decisions & Rationale

Why Haiku Over Sonnet for Batch Jobs

Haiku costs roughly 1/10th the price of Sonnet per million tokens. For the three daily jobs—which perform straightforward text cleaning, prompt engineering, and data extraction—the quality difference is negligible. None of these tasks require Sonnet's nuanced reasoning or long-context handling. Haiku's 200K context window is more than sufficient, and latency is acceptable in an overnight batch window.

Why Max-Turns 30 for the Daemon

Analysis of historical logs showed that agent tasks rarely required more than 25–30 turns to reach a conclusion or hit a dead end. Setting the hard cap at 30 prevents runaway sessions while preserving legitimate multi-turn reasoning. If a session hits the cap before completing, the task is marked for retry with a refined prompt—a better outcome than burning $15 on a stuck conversation.

Why Not a Dollar Cap Instead

Anthropic's SDK and CLI don't expose real-time token usage within a session, making a strict dollar cap difficult to enforce. A turn cap is transparent, deterministic, and easy to tune. It also encourages better prompt engineering: if you can't accomplish a task in 30 turns, the task definition or context needs work.

Deployment & Verification

Changes to the Lightsail daemon were applied via SSH and the service was restarted:

ssh ubuntu@34.239.233.28
sudo systemctl restart jada-agent
sudo systemctl status jada-agent

We then verified that new tasks completed within the 30-turn bound and used Haiku for all API calls by insp