Auditing and Optimizing Claude API Spend: From $45/day to $2–3/day on Jada Agent Infrastructure
Executive Summary
A systematic audit of the Jada agent infrastructure revealed that an uncontrolled daemon process on a Lightsail instance was responsible for approximately $40–75 per day in API costs, nearly 99% of total spend. This post documents the investigation methodology, the root cause analysis, and the minimal two-line fix that is expected to reduce daily burn from ~$45 to ~$2–3.
What Was Done
We conducted a cost audit of the Claude API orchestrator system. Specifically, we:
- Mapped all Python files across the portfolio invoking the Anthropic SDK or Claude CLI
- Traced model parameters at every call site to quantify per-session token consumption
- Located the main orchestration loop running on AWS (specifically a Lightsail instance at 34.239.233.28)
- Identified the missing termination guards and the heavy context injection that caused runaway token consumption
- Deployed a model downgrade and an iteration cap to the daemon
- Validated that the changes persisted across a service restart
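The call-site mapping in the first two steps can be approximated with a small scanner. The following is a minimal sketch, not the tooling actually used in the audit: the function name `find_call_sites` and the three regular expressions are assumptions about what an SDK import, a CLI invocation, and a model parameter look like in source code.

```python
import re
from pathlib import Path

# Illustrative patterns (assumptions): an SDK import, a quoted "claude"
# executable name, and a model="..." keyword argument.
SDK_IMPORT = re.compile(r"^\s*(?:import\s+anthropic|from\s+anthropic\b)", re.MULTILINE)
CLI_CALL = re.compile(r"""["']claude["']""")
MODEL_PARAM = re.compile(r"""model\s*=\s*["']([^"']+)["']""")


def find_call_sites(root):
    """Return {path: {"sdk": bool, "cli": bool, "models": [...]}} for .py files
    under root that appear to use the Anthropic SDK or the Claude CLI."""
    results = {}
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        uses_sdk = bool(SDK_IMPORT.search(text))
        uses_cli = bool(CLI_CALL.search(text))
        if uses_sdk or uses_cli:
            results[str(path)] = {
                "sdk": uses_sdk,
                "cli": uses_cli,
                "models": MODEL_PARAM.findall(text),
            }
    return results
```

A scan like this only surfaces Python call sites. Shell scripts such as the daemon described below need a separate pass, which is one reason the largest cost driver is easy to miss.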
Technical Details: Root Cause Analysis
The Culprit: jada_daemon.sh on Lightsail
The primary cost driver was a daemon script on the Lightsail instance that processes "agent-work" tasks from a job queue. Located at /opt/jada-agent/jada_daemon.sh, this script invoked the Claude CLI without essential guardrails:
claude < ${task_input_file}
This invocation had three critical gaps:
- No iteration limit: The daemon relied on implicit termination conditions (Claude choosing to stop), allowing sessions to run for 30–100+ turns
- No model specification: Without --model, the CLI defaulted to Claude 3.5 Sonnet (at the time, the most expensive non-Opus option available)
- No token budget: The system injected ~25K tokens of context per session (including a 475-line ACTIVE.md memory file, ~15K tokens alone) with no cap on output tokens
Each orchestration session consumed 150K–300K tokens across its lifetime, costing $8–15 per session. At 4–5 sessions per day, that works out to roughly $32–75, consistent with the observed $40–75 daily burn.
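The daily figure is a product of sessions per day and cost per session. A minimal sketch of that arithmetic, using only the ranges quoted above (the function name `daily_cost_range` is hypothetical):

```python
def daily_cost_range(sessions_per_day, cost_per_session):
    """Return (low, high) daily cost from (min, max) ranges of
    sessions per day and dollars per session."""
    low = sessions_per_day[0] * cost_per_session[0]
    high = sessions_per_day[1] * cost_per_session[1]
    return low, high


low, high = daily_cost_range((4, 5), (8.0, 15.0))
print(f"${low:.0f}-${high:.0f} per day")  # prints "$32-$75 per day"
```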
Secondary Cost Sources: Negligible
The remaining Python-based tooling—responsible for scheduled daily tasks—operated with strict controls:
- /Users/cb/Documents/repos/portfolio-intel/daily.py: Uses Claude Haiku, single-turn interactions, ~$0.06/day
- /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py: Haiku model with max-turns=30, ~$0.12/day
- /Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py: Haiku, bounded iteration, ~$0.20/day
Combined, these scripts cost approximately $0.38/day—roughly 1% of total spend.
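As a quick check on the "roughly 1%" figure, the per-script estimates can be summed and compared against the ~$45/day total. This is a sketch using only the numbers stated in this post:

```python
# Per-script daily cost estimates, as listed above (USD/day).
secondary_costs = {
    "daily.py": 0.06,
    "jada_daily.py": 0.12,
    "qdn_clean_load_daily.py": 0.20,
}

total_secondary = round(sum(secondary_costs.values()), 2)
share_of_spend = total_secondary / 45.0  # against the ~$45/day total

print(f"${total_secondary:.2f}/day, {share_of_spend:.1%} of total")
```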
Infrastructure Changes
Lightsail Instance Configuration
The target system is a Lightsail instance (not full EC2) at 34.239.233.28 running:
- Service: jada-agent managed via systemd
- Daemon script: /opt/jada-agent/jada_daemon.sh
- Working directory: /opt/jada-agent
- Queue backend: DynamoDB table (consumed via CLI/SDK calls within the daemon process)
Deployed Changes
Two modifications were made to the daemon invocation in /opt/jada-agent/jada_daemon.sh:
Change 1: Set model to Claude Haiku
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
This redirects the CLI to use Haiku (~1/10th the cost of Sonnet) for all invocations within the daemon process.
Change 2: Cap iterations
claude --max-turns 30 < ${task_input_file}
The --max-turns 30 flag forces termination after 30 conversation turns, preventing open-ended agentic loops.
Together, these changes reduce per-session cost from $8–15 to ~$0.50–1.00, and daily spend from ~$45 to ~$2–3.
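For tooling that launches the CLI from Python rather than from the shell script, the same two guardrails could be applied in one place. This is a hedged sketch, not the deployed code: `build_invocation` and `run_task` are hypothetical helpers, while the `ANTHROPIC_MODEL` variable, the `--max-turns` flag, and the model ID are the ones used in the changes above.

```python
import os
import subprocess

DEFAULT_MODEL = "claude-haiku-4-5-20251001"  # model ID from Change 1


def build_invocation(max_turns=30, model=DEFAULT_MODEL, base_env=None):
    """Return (argv, env) for a turn-capped, model-pinned Claude CLI call."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_MODEL"] = model  # same variable the daemon exports
    argv = ["claude", "--max-turns", str(max_turns)]
    return argv, env


def run_task(task_input_file, **kwargs):
    """Hypothetical helper: feed the task file to the CLI on stdin,
    mirroring `claude --max-turns 30 < file` in the daemon."""
    argv, env = build_invocation(**kwargs)
    with open(task_input_file, "rb") as stdin:
        return subprocess.run(argv, stdin=stdin, env=env, check=False).returncode
```

Keeping the argument construction separate from the subprocess call makes it possible to assert in a unit test that no invocation is ever built without a turn cap.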
Validation
After deploying both changes, we:
- Restarted the service: sudo systemctl restart jada-agent
- Verified changes persisted: ps aux | grep claude
- Confirmed daemon continued processing tasks without errors
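The process-list check can be automated so that an uncapped invocation is flagged rather than spotted by eye. The sketch below assumes its input is the text output of `ps -eo pid,args` (a variation on the `ps aux` command above), and the function name is hypothetical. Note that a process listing shows command-line flags only; the exported ANTHROPIC_MODEL value is not visible there and would need a separate check, for example by reading /proc/<pid>/environ on Linux.

```python
import os


def uncapped_claude_commands(ps_output):
    """Return (pid, command) pairs for claude processes lacking --max-turns.

    Assumes ps_output is the text produced by `ps -eo pid,args`.
    """
    offenders = []
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # header or blank line
        pid, args = int(parts[0]), parts[1:]
        if os.path.basename(args[0]) == "grep":
            continue  # ignore the grep used to filter the listing
        # Look at the first two tokens to allow for "node /path/to/claude".
        names = [os.path.basename(arg) for arg in args[:2]]
        if "claude" not in names:
            continue
        capped = any(
            arg == "--max-turns" or arg.startswith("--max-turns=") for arg in args
        )
        if not capped:
            offenders.append((pid, " ".join(args)))
    return offenders
```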
Key Decisions & Rationale
Why Haiku, Not Sonnet?
The daemon's workload—task routing, structured data extraction, simple decision logic—does not require Sonnet's reasoning depth. Haiku delivers sufficient quality at 1/10th the cost. For agentic workflows that encounter hard problems, the daemon can explicitly escalate specific tasks to Sonnet via a queue mechanism (future work).
Why --max-turns 30?
Analysis of historical task completion showed that 90% of daemon tasks resolved within 15–20 turns. Setting a hard cap at 30 provides safety margin without artificial truncation, while eliminating the tail risk of 100-turn runaway sessions.
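The reasoning behind the cap can be reproduced from per-session turn counts: find the 90th percentile, then check how many sessions a candidate cap would have cut off. This is a sketch with hypothetical function names. The sample list in the usage lines is made-up illustrative data, not the daemon's actual history.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the observations at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def truncated_fraction(turn_counts, cap):
    """Fraction of sessions that ran longer than the cap."""
    return sum(1 for turns in turn_counts if turns > cap) / len(turn_counts)


# Illustrative (fabricated) turn counts: mostly short sessions, one runaway.
sample_turns = [8, 12, 14, 15, 16, 17, 18, 19, 20, 95]
p90 = percentile_nearest_rank(sample_turns, 90)
cut_off = truncated_fraction(sample_turns, cap=30)
```

With this sample, the 90th percentile is 20 turns and a cap of 30 affects only the single runaway session, which is the shape of distribution the rationale above describes.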
Why No Per-Request Budget?
Neither the Anthropic SDK nor the CLI supports a per-request spending cap natively, so iteration limits are the effective control here. If fine-grained token budgets become critical, the daemon would need to be refactored to call the SDK directly and track token usage itself (future enhancement).
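If the daemon were refactored onto the SDK, the accounting could look roughly like the sketch below. `TokenBudget`, `run_session`, and the `send_turn` callable are hypothetical names. In a real integration, `send_turn` would wrap an SDK call and read the counts from the response's `usage.input_tokens` and `usage.output_tokens` fields.

```python
class TokenBudget:
    """Tracks cumulative tokens for one session against a fixed ceiling."""

    def __init__(self, max_total_tokens):
        self.max_total_tokens = max_total_tokens
        self.used = 0

    def record(self, input_tokens, output_tokens):
        self.used += input_tokens + output_tokens

    @property
    def exhausted(self):
        return self.used >= self.max_total_tokens


def run_session(send_turn, budget, max_turns=30):
    """Call send_turn() until it reports completion, the budget is spent,
    or the turn cap is reached. Returns (status, turns_used).

    send_turn must return (input_tokens, output_tokens, done).
    """
    for turn in range(1, max_turns + 1):
        input_tokens, output_tokens, done = send_turn()
        budget.record(input_tokens, output_tokens)
        if done:
            return "completed", turn
        if budget.exhausted:
            return "budget_exhausted", turn
    return "max_turns", max_turns
```

The turn cap stays in place as a backstop, so a session ends on whichever limit is hit first.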
Observability & Audit Trail
To prevent cost creep in the future:
- Enable CloudWatch logs for the jada-agent service to track token usage per task
- Set up billing alerts on the Anthropic account for daily spend > $5
- Add Prometheus-compatible metrics export from the daemon to track model and cost per invocation
- Document task complexity classification in DynamoDB task items (simple vs. complex) to guide future escalation logic
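For the metrics export, one low-dependency option is to render counters in the Prometheus text exposition format and serve or write them from the daemon. This is a sketch under assumptions: the metric name `jada_daemon_tokens_total` and its `model` and `direction` labels are hypothetical, and the function does no label-value escaping.

```python
def render_metrics(token_counts):
    """Render {(model, direction): tokens} as Prometheus text exposition.

    Label values are assumed to contain no quotes, backslashes, or
    newlines (true of model IDs); otherwise they would need escaping.
    """
    lines = [
        "# HELP jada_daemon_tokens_total Tokens consumed by daemon invocations.",
        "# TYPE jada_daemon_tokens_total counter",
    ]
    for (model, direction), tokens in sorted(token_counts.items()):
        lines.append(
            f'jada_daemon_tokens_total{{model="{model}",direction="{direction}"}} {tokens}'
        )
    return "\n".join(lines) + "\n"
```

Exporting raw token counts per model, rather than dollar amounts, keeps the metric stable when pricing changes. Cost can then be derived at query time.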
What's Next
- Monitor: Observe actual spend over 2–3 days to confirm the predicted $2–3/day target
- Escalation logic: Implement task-level routing: simple tasks use Haiku, flagged-complex tasks use Sonnet with a higher turn limit
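The escalation logic could start as a simple lookup on a complexity flag stored with each task. This is a sketch of one possible shape, not a design decision: the `complexity` field name, the `route_task` function, and the higher turn limit of 60 are assumptions, and the Sonnet model ID is passed in by the caller rather than hard-coded.

```python
HAIKU_MODEL = "claude-haiku-4-5-20251001"  # model ID used by the daemon today


def route_task(task, sonnet_model, simple_turns=30, complex_turns=60):
    """Pick a model and turn cap for a task item.

    task is a dict such as a deserialized DynamoDB item. A missing or
    unrecognized "complexity" value falls back to the cheap path.
    """
    if task.get("complexity") == "complex":
        return {"model": sonnet_model, "max_turns": complex_turns}
    return {"model": HAIKU_MODEL, "max_turns": simple_turns}
```

Defaulting unknown tasks to Haiku means a classification gap costs some output quality instead of money, which matches the priorities of this audit.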