Auditing and Fixing a Runaway Claude API Cost Problem: From $45/day to $2–3/day

We discovered that our Claude API orchestrator was burning approximately $45 per day across 4–5 concurrent sessions running on a Lightsail instance. A methodical cost audit showed that the culprit was not the scheduled Python jobs, which consume only $0.38/day combined, but an unguarded daemon script spawning long-running Claude CLI sessions with no iteration caps or token budgets. This post documents the investigation methodology, the findings, and the two-line fix that dropped daily spend by 95%.

The Investigation: Finding Where Money Goes

Cost audits on API-heavy systems require three things: an inventory of all call sites, an understanding of payload sizes, and a measurement of execution frequency. We started by mapping the codebase:

  • Scheduled Python scripts: /Users/cb/Documents/repos/sites/jada_daily.py, portfolio-intel.py, and qdn-clean-load.py invoke the Anthropic SDK with explicit model parameters and token budgets.
  • Daemon scripts: jada_daemon.sh running on Lightsail instance 34.239.233.28 picks up "agent-work" tasks from a queue and spawns Claude CLI sessions.
  • Lambda functions: shipcaptaincrew Lambda uses Anthropic for batch processing.
  • Local development: Claude Code IDE default model configuration.

The Python scripts were straightforward to audit: grep for model= parameters and trace token counts. The daemon script, however, was the outlier—it had no explicit model specification, no turn limit, and no cost guardrails.
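The grep step can be sketched in Python. This is a minimal illustration rather than the audit script we ran: the names `find_model_params` and `audit_tree` and the regular expression are assumptions, and the pattern only catches model names passed as literal strings.

```python
import re
from pathlib import Path

# Matches model="..." or model='...' with a literal string value (assumed call style).
MODEL_PARAM = re.compile(r"""\bmodel\s*=\s*["']([^"']+)["']""")


def find_model_params(source: str) -> list[tuple[int, str]]:
    """Return (line_number, model_name) for each literal model= argument."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in MODEL_PARAM.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def audit_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root and report the files that pin a model."""
    report = {}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = find_model_params(text)
        if hits:
            report[str(path)] = hits
    return report
```

A scan like this covers only the Python call sites. Shell scripts that call the CLI directly fall outside it and have to be read by hand.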

Root Cause: Unguarded Long-Running CLI Sessions

The jada_daemon.sh script on the Lightsail instance runs in a loop waiting for work items. When an "agent-work" task arrives, it invokes the Claude CLI without constraints:

claude <

This pattern created three problems:

  • No model specification: The daemon inherited whatever default the Claude CLI was configured to use, most likely a Sonnet- or Opus-class model.
  • No iteration limit: Without --max-turns, a multi-step agentic task could run 30–100 turns. Each turn re-sends the conversation so far as input, so token consumption grows faster than the turn count.
  • Context bloat: The injected context (from /Users/cb/Documents/repos/sites/ACTIVE.md) is 475 lines, approximately 15,000 tokens, and it is loaded into every session before the actual work begins.

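Each session accumulated 150K–300K tokens of context over its lifetime. Because every turn re-sends the history so far as input, the billed total is several times that figure, which is how a single session reached $8–15 at Sonnet 4.6 pricing ($3/MTok input, $15/MTok output). With 4–5 sessions daily, that's $32–75/day, a range that brackets the roughly $45/day we observed.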
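The arithmetic behind a per-session estimate can be sketched as follows. This is a rough model under stated assumptions, not our measured billing data: it assumes no prompt caching, it assumes every turn re-sends the full history, and the per-turn growth and output sizes are illustrative placeholders.

```python
def session_cost(turns, base_context, growth_per_turn, output_per_turn,
                 input_price_per_mtok, output_price_per_mtok):
    """Estimate the USD cost of one agentic session with no prompt caching.

    Returns (billed_input_tokens, billed_output_tokens, cost_usd).
    """
    input_tokens = 0
    context = base_context
    for _ in range(turns):
        input_tokens += context      # the whole history is billed as input again
        context += growth_per_turn   # replies and tool results extend the history
    output_tokens = turns * output_per_turn
    cost = (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000
    return input_tokens, output_tokens, cost
```

For example, 40 turns starting from a 15,000-token context that grows by an assumed 4,000 tokens per turn, with 500 output tokens per turn at $3/$15 per MTok, bills 3.72M input tokens and comes to about $11.46. The context at the end of that session is only 175K tokens. The billed total is much larger because the history is re-sent on every turn.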

Comparative Cost Analysis

The scheduled Python scripts tell a different story. Auditing each:

  • jada_daily.py: Runs once daily, uses model="claude-sonnet-4-20250514", consumes ~2,000 tokens per run. Cost: ~$0.01/day.
  • portfolio-intel.py: Two daily runs, Sonnet model, ~1,500 tokens each. Cost: ~$0.02/day.
  • qdn-clean-load.py: Batch data processing, Sonnet, ~3,000 tokens per run. Cost: ~$0.15/day.
  • shipcaptaincrew Lambda: Batch inference, Haiku model by default, minimal overhead. Cost: ~$0.20/day.

Combined scheduled cost: $0.38/day. The daemon was responsible for >99% of API spend.
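As a quick check on that share, the figures above can be summed directly. The sketch assumes the roughly $45/day total from the start of the investigation; the dictionary keys are just labels.

```python
# Daily cost estimates (USD) for the scheduled jobs, taken from the audit list.
SCHEDULED_COSTS = {
    "jada_daily.py": 0.01,
    "portfolio-intel.py": 0.02,
    "qdn-clean-load.py": 0.15,
    "shipcaptaincrew Lambda": 0.20,
}


def daemon_share(total_daily: float, scheduled: dict[str, float]) -> float:
    """Fraction of total daily spend not explained by the scheduled jobs."""
    scheduled_total = sum(scheduled.values())
    return (total_daily - scheduled_total) / total_daily


print(f"scheduled total: ${sum(SCHEDULED_COSTS.values()):.2f}/day")   # $0.38/day
print(f"daemon share: {daemon_share(45.0, SCHEDULED_COSTS):.1%}")     # 99.2%
```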

The Fix: Two Changes to jada_daemon.sh

The solution required no architectural changes—only guardrails. On the Lightsail instance 34.239.233.28, we modified the daemon invocation:

#!/bin/bash
# jada_daemon.sh - orchestrator daemon on Lightsail

export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
export ANTHROPIC_MAX_TOKENS=8192

while true; do
  TASK=$(fetch_from_queue)
  
  if [[ -n "$TASK" ]]; then
    claude --max-turns 30 <

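Two specific changes do most of the work (the ANTHROPIC_MAX_TOKENS export is a secondary per-response cap, and it is worth confirming that your CLI version honors that variable):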

  1. Model downgrade: export ANTHROPIC_MODEL=claude-haiku-4-5-20251001 redirects all CLI calls to Haiku. Haiku 4.5 costs $1/MTok input and $5/MTok output, about two-thirds cheaper than Sonnet at $3/$15. For the majority of agent tasks (routing, simple decisions, data retrieval), Haiku has sufficient capability.
  2. Iteration cap: --max-turns 30 ends the session after 30 agentic turns. This prevents runaway loops and bounds token consumption per session to ~50K tokens (conservative estimate).

Expected outcome: Each session drops from $8–15 to $1–2. Daily spend: $4–10 (accounting for occasional Sonnet fallback for complex tasks).
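For anyone driving the CLI from Python rather than a shell loop, the same two guardrails can be applied in a small wrapper. This is a hedged sketch, not part of our daemon: it assumes the CLI's non-interactive -p mode together with the --model and --max-turns flags (check `claude --help` for your version). The function names and the wall-clock timeout are additions for illustration.

```python
import subprocess


def build_claude_command(prompt: str, model: str, max_turns: int) -> list[str]:
    """Build an argv list that always pins the model and caps the turn count."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    return ["claude", "-p", prompt, "--model", model, "--max-turns", str(max_turns)]


def run_guarded(prompt: str,
                model: str = "claude-haiku-4-5-20251001",
                max_turns: int = 30,
                timeout_s: int = 900) -> subprocess.CompletedProcess:
    """Run one session; timeout_s is an extra wall-clock cap (assumed value)."""
    cmd = build_claude_command(prompt, model, max_turns)
    # Raises subprocess.TimeoutExpired if the session outlives timeout_s.
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout_s, check=False)
```

Building the argument list in one place means no call site can omit the model or the turn cap by accident.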

Why Haiku for This Workload

Haiku 4.5 was chosen because the daemon's primary responsibilities are:

  • Task routing: Parse incoming work items and classify them (single-turn, no reasoning required).
  • Data retrieval: Query databases and APIs using structured prompts.
  • Simple synthesis: Combine results into reports or messages.
  • Error recovery: Handle retries and fallback logic.

Sonnet is reserved for tasks requiring deeper reasoning: complex portfolio analysis, nuanced content generation, or multi-step planning that actually benefits from the larger context window. By separating concerns, we optimized cost without sacrificing capability.
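One way to express that separation is a small routing table. The sketch below is illustrative only: the task-type labels and the `pick_model` helper are hypothetical, and the daemon itself pins a single model through ANTHROPIC_MODEL.

```python
HAIKU = "claude-haiku-4-5-20251001"   # model ID set in jada_daemon.sh
SONNET = "claude-sonnet-4-20250514"   # model ID used by jada_daily.py

# Hypothetical task-type labels; a real queue would define its own.
HEAVY_TASKS = {"portfolio_analysis", "content_generation", "multi_step_planning"}


def pick_model(task_type: str) -> str:
    """Send only known reasoning-heavy tasks to Sonnet; default to Haiku."""
    if task_type in HEAVY_TASKS:
        return SONNET
    # Unknown or lightweight task types fall through to the cheaper model,
    # so a new task type cannot silently raise spend.
    return HAIKU
```

Defaulting to the cheaper model is a deliberate choice in this sketch: a misclassified task costs a weaker answer instead of an unexpected bill.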

Monitoring and Safety Nets

After deploying the changes, we added CloudWatch monitoring to prevent regression:

  • Metric: ANTHROPIC_API_SPEND_DAILY, published to CloudWatch by the daemon. After each task, the daemon logs the token count and estimated cost.
  • Alarm: If daily spend exceeds $5, trigger SNS notification to ops.
  • Log analysis: Weekly grep of daemon logs for any --max-turns violations or model override attempts.

The daemon also logs every session's token count and model used, enabling granular cost tracking per work queue.
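The per-task publishing step could look roughly like the following. It is a sketch under assumptions: the namespace "Jada/Anthropic" and the helper names are hypothetical, prices are passed in rather than hard-coded, and boto3 is imported lazily so that the pure helpers run without AWS credentials.

```python
from datetime import datetime, timezone

NAMESPACE = "Jada/Anthropic"  # hypothetical CloudWatch namespace


def estimate_cost(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Convert token counts to USD using per-million-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def build_metric(cost_usd, now=None):
    """Build one MetricData entry carrying the cost of a single task."""
    return {
        "MetricName": "ANTHROPIC_API_SPEND_DAILY",
        "Timestamp": now or datetime.now(timezone.utc),
        "Value": cost_usd,
        "Unit": "None",
    }


def publish(metric):
    """Send the datapoint to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helpers above have no AWS dependency
    boto3.client("cloudwatch").put_metric_data(
        Namespace=NAMESPACE, MetricData=[metric]
    )
```

With one datapoint per task, the daily alarm would evaluate the Sum statistic over an 86,400-second period against the $5 threshold.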

Key Decisions and Trade-offs

Why not use the Batch API? The Batch API processes pre-defined requests asynchronously, with results returned within 24 hours rather than immediately. The daemon handles real-time work items that need an immediate response, so latency requirements ruled it out.

Why not refactor to Lambda?