Upgrading Claude Model Inference in Distributed Agent Orchestration: From Haiku to Sonnet 4.6


What Was Done

We identified a capability gap in our JADA orchestrator system running on EC2 and upgraded the default Claude model from Haiku 4.5 to Sonnet 4.6 for complex task decomposition workloads. Additionally, we configured resource limits (ulimit -n) to support higher concurrent connection handling across agent processes, and implemented a persistent configuration strategy to ensure model consistency across terminal sessions and service restarts.

Technical Context: The Orchestrator Problem

The JADA system uses a multi-agent orchestration pattern where a primary orchestrator agent (running on an EC2 instance in us-east-1) decomposes high-level booking workflow requests into specialized subtasks, then dispatches those subtasks to downstream agents. The orchestrator was initialized with Haiku 4.5 via:

cd ~/Documents/repos && claude --dangerously-skip-permissions

While Haiku 4.5 is efficient for simple inference tasks, it struggles with the complex reasoning chains that orchestration logic requires, specifically:

Breaking multi-step workflows into correct dependency graphs
Reasoning about state transitions in concurrent booking operations
Generating accurate tool-use sequences for downstream agents

These limitations show up as oversimplified task decompositions or missed edge cases, which then require manual intervention or retry loops.
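One way to catch a bad decomposition before dispatch is to check that the proposed subtasks form an acyclic dependency graph. The sketch below uses coreutils tsort for this; the subtask names and the validate_plan helper are hypothetical and are not part of JADA.

```shell
#!/usr/bin/env bash
# Hypothetical subtasks for a booking workflow.
# Each line reads "prerequisite dependent".
deps='check_availability reserve_slot
reserve_slot charge_payment
charge_payment send_confirmation
check_availability quote_price
quote_price charge_payment'

# tsort prints an execution order that respects every edge and
# exits non-zero when the graph contains a cycle.
validate_plan() {
  printf '%s\n' "$1" | tsort 2>/dev/null
}

if order=$(validate_plan "$deps"); then
  printf 'valid plan:\n%s\n' "$order"
else
  echo 'cycle detected: plan rejected' >&2
fi
```

A plan that fails this check can be sent back to the orchestrator for another attempt instead of being dispatched to downstream agents.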

Solution Architecture

Model Configuration Strategy

We implemented a three-tier configuration approach:

# Option 1: Persistent default model configuration
claude /config
# Opens interactive menu to set default model globally in ~/.claude/settings.json

This writes the model ID to ~/.claude/settings.json (here, /Users/cb/.claude/settings.json). The configuration structure looks like:

{
  "model": "claude-sonnet-4-6",
  "defaults": {
    "temperature": 0.7,
    "max_tokens": 4096
  }
}

Why this approach: A --model flag has to be repeated on every invocation, and it is easily dropped when a new terminal or a restarted service launches claude without it. Persisting the model to settings.json keeps the orchestrator on Sonnet 4.6 across session boundaries. This matters for long-running services where the orchestrator process restarts, and when engineers open new terminal sessions for health checks.
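Because the persisted setting is what keeps sessions consistent, it can be worth checking it from a script before starting the orchestrator. The following is a minimal sketch: configured_model and the SETTINGS_FILE variable are hypothetical names, and the sed expression is a naive text match and not a JSON parser.

```shell
#!/usr/bin/env bash
# Print the first "model" value found in a settings file.
# Naive text match; assumes the value is a plain quoted string.
configured_model() {
  sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# SETTINGS_FILE is a script-local override, not a variable claude reads.
settings="${SETTINGS_FILE:-$HOME/.claude/settings.json}"
expected="claude-sonnet-4-6"

if [ -f "$settings" ] && [ "$(configured_model "$settings")" = "$expected" ]; then
  echo "ok: $settings pins $expected"
else
  echo "warning: $settings does not pin $expected" >&2
fi
```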

File Descriptor Limits and Concurrent Connections

Sonnet calls run longer than Haiku calls, so more agent connections are open at the same time. To give the agent processes headroom, we also raised the file descriptor limit:

ulimit -n 2147483646

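This command requests a maximum of 2,147,483,646 open file descriptors (2^31 - 2), which is meant to take the per-process limit out of the picture. On Linux the request cannot exceed the fs.nr_open sysctl (1,048,576 by default), so a value this large may be rejected unless that ceiling is raised first. Reasoning: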

Default limits are typically 256 or 1024, which is tight for the orchestrator's multi-agent communication pattern, where each spawned agent opens its own sockets and pipes
Sonnet's longer responses keep each connection open for longer, so more I/O is in flight across the agent network at any moment
Concurrent booking workflows may spawn 10–50 agents simultaneously, each maintaining open connections to the orchestrator, Lambda functions, and SQS queues
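A more conservative alternative to requesting the largest possible value is to raise the soft limit only as far as the hard limit allows. This is a sketch under stated assumptions: raise_nofile is a hypothetical helper, and the 65536 target is illustrative, not a measured requirement.

```shell
#!/usr/bin/env bash
# Raise the soft open-file limit toward a target, never past the hard limit,
# and never lower a limit that is already high enough.
raise_nofile() {
  target=$1
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  # An unprivileged process cannot exceed the hard limit, so clamp to it.
  if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
    target=$hard
  fi
  # Already sufficient: report the current value and leave it alone.
  if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
    echo "$soft"
    return 0
  fi
  # Apply the new soft limit and print the value now in effect.
  ulimit -Sn "$target" && ulimit -Sn
}

raise_nofile 65536   # illustrative target
```

The change applies to the current shell and the processes it starts afterwards, so the helper would have to run in the same shell that launches the agents.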

To make the limit persistent across service restarts, set LimitNOFILE in the systemd unit file, then run systemctl daemon-reload and restart the service:

# /etc/systemd/system/jada-agent.service
[Service]
LimitNOFILE=2147483646
ExecStart=/opt/jada/bin/claude --dangerously-skip-permissions
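Whether the unit setting actually reached the process can be read back from /proc on Linux. The sketch below assumes a Linux host; proc_nofile is a hypothetical helper, and it falls back to inspecting the current shell when the service is not running.

```shell
#!/usr/bin/env bash
# Print "soft hard" open-file limits of a running process (Linux /proc only).
proc_nofile() {
  awk '/^Max open files/ {print $4, $5}' "/proc/$1/limits"
}

# Ask systemd for the service's main PID when systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
  pid=$(systemctl show -p MainPID --value jada-agent.service 2>/dev/null)
fi

# MainPID is 0 (or empty) when the unit is not running; use this shell instead.
case "${pid:-0}" in
  0) pid=$$ ;;
esac

echo "pid $pid open-file limits (soft hard): $(proc_nofile "$pid")"
```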

Infrastructure and Deployment Considerations

EC2 Instance Configuration

The orchestrator instance, jada-agent, is hosted on AWS Lightsail in us-east-1 (it is the instance referred to as the EC2 instance elsewhere in this write-up). Health checks confirm instance status:

aws lightsail get-instance --instance-name jada-agent --region us-east-1 | grep -A 5 '"state"'

The state object in the output should show "name": "running". Remote verification of systemd service status:

ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ubuntu@34.239.233.28 "systemctl status jada-agent.service 2>&1 | head -20"
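For scripted health checks, the instance state can be polled until it reaches the expected value. The wait_for_state function and the POLL_INTERVAL variable below are hypothetical; the commented usage line relies on the AWS CLI's --query and --output options to print only the state name.

```shell
#!/usr/bin/env bash
# wait_for_state WANT TRIES CMD [ARGS...]
# Runs CMD up to TRIES times and succeeds as soon as its output equals WANT.
wait_for_state() {
  want=$1
  tries=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    [ "$("$@" 2>/dev/null)" = "$want" ] && return 0
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-5}"   # seconds between attempts
  done
  return 1
}

# Usage (not executed here because it needs AWS credentials):
# wait_for_state running 6 aws lightsail get-instance \
#   --instance-name jada-agent --region us-east-1 \
#   --query 'instance.state.name' --output text
```

The same function could wrap the ssh check by passing a command that prints the output of systemctl is-active for the service.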

Cost-Performance Trade-off Analysis

The decision to upgrade from Haiku to Sonnet carries infrastructure implications:

Token cost: ~2–3x increase per orchestration call. For a booking system processing 100–1000 requests daily, monthly API costs scale from ~$50 (Haiku) to ~$150 (Sonnet).
Latency: +200–500ms per call due to Sonnet's more complex inference. For orchestration (which is not user-facing), this is acceptable.
Reliability gain: High. Sonnet's superior task decomposition reduces retry loops by an estimated 60–80%, offsetting token cost increases in real workflows.
EC2 instance sizing: Unchanged. Sonnet runs in Anthropic's infrastructure; local compute requirements don't increase.

Decision rationale: Orchestration is a low-volume, high-complexity operation. Even with 3x token costs, the reliability improvement justifies the upgrade. As long as the high-volume downstream specialist agents stay on Haiku (3.5 or 4.5), total system costs remain reasonable.
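The trade-off can be made concrete with a small retry-adjusted estimate. The monthly figures come from the analysis above; the 0.40 baseline retry rate is purely an assumption for illustration, and 0.12 is that rate reduced by 70%, the midpoint of the estimated 60–80% range.

```shell
#!/usr/bin/env bash
# monthly_cost BASE_USD RETRY_RATE
# Base monthly spend plus the cost of calls that are retried once.
monthly_cost() {
  awk -v base="$1" -v retry="$2" 'BEGIN { printf "%.2f\n", base * (1 + retry) }'
}

haiku=$(monthly_cost 50 0.40)     # assumed baseline retry rate
sonnet=$(monthly_cost 150 0.12)   # baseline reduced by 70%

echo "Haiku  effective monthly cost: \$$haiku"
echo "Sonnet effective monthly cost: \$$sonnet"
```

Under these assumed numbers the API bill is still higher with Sonnet, so fewer retries only partly offset the token cost; the rest of the case rests on less manual intervention, which this estimate does not price in.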

Agent Cascade and Downstream Effects

The upgrade affects only the primary orchestrator (EC2-based). Downstream agents remain configurable independently:

Booking specialists (Lambda functions): Continue using Haiku 4.5 or Claude 3.5 Sonnet (check /opt/jada/lambda/config.json for each function)
Validation agents (SQS-driven workers): Configuration stored in environment variables or SSM Parameter Store (e.g., /jada/agents/validator/model)
Data enrichment agents: Model selection inherited from Lambda runtime or explicit configuration in function code

Verification command to check downstream agent models:

grep -r --include="*.json" '"model"' /opt/jada/lambda/ /opt/jada/agents/
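For an audit that is easier to scan, the same search can be reduced to one "file model" pair per line. list_models is a hypothetical helper; like the grep above it is a text match over JSON files, so it assumes each model value is a plain quoted string.

```shell
#!/usr/bin/env bash
# Print "path model-id" for every "model" key found in *.json under the given paths.
list_models() {
  grep -rHo --include='*.json' '"model"[[:space:]]*:[[:space:]]*"[^"]*"' "$@" 2>/dev/null |
    sed 's/:"model"[[:space:]]*:[[:space:]]*"\([^"]*\)"$/ \1/'
}

# Prints nothing if the directories do not exist on this host.
list_models /opt/jada/lambda/ /opt/jada/agents/
```

Models held in environment variables or SSM Parameter Store are not covered by this file scan and have to be checked separately.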

Deployment and Testing

Immediate Testing (Current Session)

The updated settings in ~/.claude/settings.json apply to sessions started after the change; a session that is already running keeps its current model. To test, open a new terminal:

# New terminal
cd ~/Documents/repos
claude --dangerously-skip-permissions
# Verify the model name shown in the startup prompt

Service-Level Verification

After restarting the jada-agent systemd service on the EC2 instance:

ssh ubuntu@34.239.233.28 "sudo systemctl restart jada-agent.service && sleep 5 && systemctl status jada-agent.service"

Monitor logs for successful inference calls:

ssh ubuntu@34.239.