Auditing and Optimizing Claude API Spend: From $45/day to $2–3/day on Jada Agent Infrastructure

Executive Summary

A systematic audit of the Jada agent infrastructure found that an unguarded daemon process on a Lightsail instance accounted for roughly $40–75 per day in Claude API costs, nearly 99% of total spend. This post documents the investigation method, the root cause, and the two-line fix that is expected to cut daily spend from about $45 to about $2–3.

What Was Done

We audited the cost of the Claude API orchestrator system in the following steps:

Mapped all Python files across the portfolio invoking the Anthropic SDK or Claude CLI
Traced model parameters at every call site to quantify per-session token consumption
Located the main orchestration loop running on a Lightsail instance (not full EC2) at 34.239.233.28
Identified missing termination guards and oversized context injection, which together caused runaway token consumption
Deployed a model downgrade and an iteration cap to the daemon
Validated that the changes persisted across a service restart
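
The first step, mapping call sites, can be approximated with a short script. The sketch below is not the tooling used in the audit: the search patterns (an anthropic import, or "claude" passed as a quoted subprocess argument) and the function name find_claude_call_sites are assumptions chosen for illustration, and a real scan would likely need more patterns.

```python
import re
from pathlib import Path

# Heuristic patterns (assumptions, not the audit's actual search terms).
CALL_SITE_PATTERN = re.compile(
    r"^\s*(?:import|from)\s+anthropic\b"  # Anthropic SDK import
    r"|[\"']claude[\"']",                 # "claude" as a quoted CLI argument
    re.MULTILINE,
)


def find_claude_call_sites(root):
    """Return sorted paths of .py files under root that match a pattern."""
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if CALL_SITE_PATTERN.search(text):
            hits.append(str(path))
    return sorted(hits)
```

Each hit still needs a manual read to find the model and turn-limit parameters at that call site; the scan only narrows down where to look.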

Technical Details: Root Cause Analysis

The Culprit: jada_daemon.sh on Lightsail

The primary cost driver was a daemon script running on the Lightsail instance responsible for processing "agent-work" tasks from a job queue. Located at /opt/jada-agent/jada_daemon.sh, this script invoked the Claude CLI without essential guardrails:

claude < ${task_input_file}

This invocation had three critical gaps:

No iteration limit: The daemon relied on implicit termination conditions (Claude choosing to stop), allowing sessions to run for 30–100+ turns
No model specification: Without --model, the CLI fell back to its default Sonnet model (at the time, the most expensive non-Opus option available)
No token budget: The system injected ~25K tokens of context per session (including a 475-line ACTIVE.md memory file, ~15K tokens alone) with no cap on output tokens

Each orchestration session consumed 150K–300K tokens across its lifetime, costing $8–15 per session. Running 4–5 sessions daily produced the observed $40–75 daily burn.
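
The daily figure follows from multiplying the two ranges above. A minimal sketch of that arithmetic, using only the numbers quoted in this section (the function name daily_cost_bounds is made up for illustration):

```python
def daily_cost_bounds(sessions_per_day, cost_per_session):
    """Bound daily spend by multiplying the low ends and the high ends."""
    sessions_low, sessions_high = sessions_per_day
    cost_low, cost_high = cost_per_session
    return sessions_low * cost_low, sessions_high * cost_high


low, high = daily_cost_bounds(sessions_per_day=(4, 5), cost_per_session=(8, 15))
print(f"${low}-${high} per day")  # prints "$32-$75 per day"
```

The computed bounds of $32 and $75 bracket the observed $40–75 daily burn.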

Secondary Cost Sources: Negligible

The remaining Python tooling, which runs scheduled daily tasks, operated with strict controls:

/Users/cb/Documents/repos/portfolio-intel/daily.py: Uses Claude Haiku, single-turn interactions, ~$0.06/day
/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py: Haiku model with max-turns=30, ~$0.12/day
/Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py: Haiku, bounded iteration, ~$0.20/day

Combined, these scripts cost approximately $0.38/day, roughly 1% of total spend.
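
As a quick check of that share, the sketch below sums the three per-script figures listed above and compares them with the ~$45 daily total. The dictionary keys are just the script file names, and share_of_total is a hypothetical helper.

```python
SECONDARY_COSTS_PER_DAY = {
    "daily.py": 0.06,
    "jada_daily.py": 0.12,
    "qdn_clean_load_daily.py": 0.20,
}


def share_of_total(part, total):
    """Return part as a percentage of total."""
    return 100 * part / total


secondary_total = round(sum(SECONDARY_COSTS_PER_DAY.values()), 2)
share = share_of_total(secondary_total, 45)
print(f"${secondary_total:.2f}/day, {share:.1f}% of $45")  # prints "$0.38/day, 0.8% of $45"
```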

Infrastructure Changes

Lightsail Instance Configuration

The target system is a Lightsail instance (not full EC2) at 34.239.233.28 running:

Service: jada-agent managed via systemd
Daemon script: /opt/jada-agent/jada_daemon.sh
Working directory: /opt/jada-agent
Queue backend: DynamoDB table (consumed via CLI/SDK calls within the daemon process)

Deployed Changes

Two modifications were made to the daemon invocation in /opt/jada-agent/jada_daemon.sh:

Change 1: Set model to Claude Haiku

export ANTHROPIC_MODEL=claude-haiku-4-5-20251001

This makes every CLI invocation inside the daemon process use Haiku, whose per-token list price is roughly one third of Sonnet's.

Change 2: Cap iterations

claude --max-turns 30 < ${task_input_file}

The --max-turns 30 flag forces termination after 30 conversation turns, preventing open-ended agentic loops.

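Together, these changes are expected to reduce per-session cost from $8–15 to roughly $0.50–1.00, and daily spend from about $45 to about $2–3: the model switch lowers the price per token, and the turn cap limits how many tokens a session can consume.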
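
The daemon itself is a shell script, but the same two guardrails apply wherever the CLI is launched from the Python tooling. The sketch below shows one way to make both controls explicit at the call site instead of relying on defaults. ANTHROPIC_MODEL and --max-turns are taken from the fix above; build_invocation and run_task are hypothetical names, and the code assumes the claude binary is on PATH.

```python
import os
import subprocess


def build_invocation(model, max_turns, base_env=None):
    """Return (argv, env) with the model and turn cap set explicitly."""
    if max_turns < 1:
        raise ValueError("max_turns must be a positive integer")
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_MODEL"] = model
    argv = ["claude", "--max-turns", str(max_turns)]
    return argv, env


def run_task(task_input_file, model="claude-haiku-4-5-20251001", max_turns=30):
    """Run one task, feeding the task file on stdin as the daemon does."""
    argv, env = build_invocation(model, max_turns)
    with open(task_input_file, "rb") as stdin:
        return subprocess.run(argv, stdin=stdin, env=env, check=False)
```

Keeping the command construction in its own function means the guardrails can be unit-tested without launching the CLI.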

Validation

After deploying both changes, we:

Restarted the service: sudo systemctl restart jada-agent
Verified the new invocation was live: ps aux | grep claude (the running command line should include --max-turns 30)
Confirmed the daemon continued processing tasks without errors
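
One caveat: ps aux lists a process's arguments but not its environment, so it can confirm the turn cap but not the model variable. On Linux, /proc/<pid>/cmdline and /proc/<pid>/environ expose both as NUL-separated byte strings. The sketch below checks the two guardrails from those raw bytes; the function names are hypothetical, and reading another user's /proc entries typically requires root.

```python
def split_nul(raw):
    """Decode the NUL-delimited layout of /proc/<pid>/cmdline and environ."""
    return [part.decode("utf-8", "replace") for part in raw.split(b"\x00") if part]


def guardrails_active(cmdline_raw, environ_raw, model, max_turns):
    """True if the process has both the turn cap flag and the model variable."""
    argv = split_nul(cmdline_raw)
    env = dict(item.split("=", 1) for item in split_nul(environ_raw) if "=" in item)
    has_cap = any(
        flag == "--max-turns" and argv[i + 1 : i + 2] == [str(max_turns)]
        for i, flag in enumerate(argv)
    )
    return has_cap and env.get("ANTHROPIC_MODEL") == model
```

For a live process, the inputs would come from pathlib.Path(f"/proc/{pid}/cmdline").read_bytes() and the matching environ file, where pid is the process ID reported by ps.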

Key Decisions & Rationale

Why Haiku, Not Sonnet?

The daemon's workload (task routing, structured data extraction, simple decision logic) does not require Sonnet's reasoning depth. Haiku delivers sufficient quality at a fraction of the cost. For agentic workflows that hit hard problems, the daemon could escalate specific tasks to Sonnet via a queue mechanism (future work).

Why --max-turns 30?

Analysis of historical task completion showed that 90% of daemon tasks resolved within 15–20 turns. A hard cap of 30 leaves headroom above that range, so typical tasks are not truncated, while eliminating the tail risk of 100-turn runaway sessions.
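
The same reasoning can be repeated whenever the cap is revisited: take the turn counts of past sessions, find the 90th percentile, and check what share of sessions a candidate cap would cut off. The sketch below uses a nearest-rank percentile; the sample list is made up for illustration and is not the audit's data.

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest observed value with at least pct percent of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def truncated_share(values, cap):
    """Fraction of sessions that needed more turns than the cap allows."""
    return sum(1 for turns in values if turns > cap) / len(values)


# Made-up turn counts for ten sessions, NOT the audit's data.
sample_turns = [5, 8, 12, 14, 15, 16, 18, 19, 20, 95]
print(percentile_nearest_rank(sample_turns, 90))  # prints 20
print(truncated_share(sample_turns, cap=30))      # prints 0.1
```

In this invented sample the cap of 30 truncates only the single runaway session, which is the outcome the cap is meant to produce.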

Why No Per-Request Budget?

The Anthropic SDK and CLI do not support per-request spending caps natively (the SDK's max_tokens parameter bounds the output length of a single request, not cumulative session spend). Iteration limits are therefore the effective control. If fine-grained token budgets become critical, the daemon would need to be refactored to use the SDK directly with manual token accounting (future enhancement).
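
For reference, manual token accounting could look roughly like the sketch below. It assumes the per-turn input and output token counts are read from each SDK response's usage fields and passed in as plain integers; the SDK call itself is left out so the example runs offline. TokenBudget, turns_within_budget, and the 100,000-token limit in the example are all hypothetical.

```python
class TokenBudget:
    """Running token total for one session, checked after every turn."""

    def __init__(self, limit):
        if limit <= 0:
            raise ValueError("limit must be positive")
        self.limit = limit
        self.used = 0

    def record(self, input_tokens, output_tokens):
        """Add one turn's usage; return True while the session is within budget."""
        self.used += input_tokens + output_tokens
        return self.used <= self.limit


def turns_within_budget(per_turn_usage, limit):
    """Count the turns that finished while the running total stayed within limit."""
    budget = TokenBudget(limit)
    completed = 0
    for input_tokens, output_tokens in per_turn_usage:
        if not budget.record(input_tokens, output_tokens):
            break  # budget exceeded: stop the session here
        completed += 1
    return completed


# Ten turns of 25,000 input and 1,000 output tokens against a 100,000-token limit.
print(turns_within_budget([(25_000, 1_000)] * 10, limit=100_000))  # prints 3
```

Because the check runs after a turn has already been billed, a session can overshoot the limit by up to one turn; a stricter version would estimate the next turn's size before sending it.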

Observability & Audit Trail

To prevent cost creep in the future:

Enable CloudWatch logs for the jada-agent service to track token usage per task
Set up billing alerts on the Anthropic account for daily spend > $5
Add Prometheus-compatible metrics export from the daemon to track model and cost per invocation
Document task complexity classification in DynamoDB task items (simple vs. complex) to guide future escalation logic
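
A minimal sketch of the per-task record and daily threshold check these items describe. Prices are passed in as parameters because they change over time and should come from the current price list; the field names, function names, and log format are assumptions, and only the $5 threshold comes from the list above.

```python
import json


def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_mtok, output_price_per_mtok):
    """Estimate cost in dollars from token counts and per-million-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def usage_log_line(task_id, model, input_tokens, output_tokens, cost_usd):
    """Serialize one task's usage as a JSON line for a log pipeline."""
    return json.dumps({
        "task_id": task_id,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost_usd, 4),
    }, sort_keys=True)


def over_daily_threshold(costs_usd, threshold_usd=5.0):
    """True when the day's accumulated cost exceeds the alert threshold."""
    return sum(costs_usd) > threshold_usd
```

Emitting one JSON line per task keeps the data greppable on the instance and easy to turn into metrics later.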

What's Next

Monitor: Observe actual spend over 2–3 days to confirm the predicted $2–3/day target
Escalation logic: Implement task-level routing: simple tasks use Haiku, flagged-complex tasks use Sonnet with a higher turn limit
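
The escalation logic could start as a simple lookup on the task item. The sketch below assumes each task carries a "complexity" field with the values "simple" or "complex"; that field name, the "sonnet" placeholder, and the escalated turn limit of 60 are all hypothetical, since the plan above only calls for "a higher turn limit".

```python
DEFAULT_MODEL = "claude-haiku-4-5-20251001"  # model from the deployed fix
ESCALATION_MODEL = "sonnet"  # placeholder: replace with the Sonnet model ID in use
DEFAULT_MAX_TURNS = 30       # cap from the deployed fix
ESCALATION_MAX_TURNS = 60    # hypothetical value, not taken from the audit


def route_task(task):
    """Pick (model, max_turns) from a task item's complexity flag."""
    if task.get("complexity") == "complex":
        return ESCALATION_MODEL, ESCALATION_MAX_TURNS
    # Missing or unknown flags fall through to the cheap default.
    return DEFAULT_MODEL, DEFAULT_MAX_TURNS
```

Defaulting to Haiku for unflagged tasks means a missing classification cannot silently recreate the original cost problem.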