Upgrading Claude Model Defaults in a Distributed Agent Architecture: From Haiku 4.5 to Sonnet 4.6
What Was Done
We evaluated and upgraded the default Claude model for the JADA orchestrator system from Claude Haiku 4.5 to Claude Sonnet 4.6. This involved:
- Updating ~/.claude/settings.json to persist the new model default across terminal sessions
- Verifying EC2 instance orchestrator health via Lightsail API calls
- Documenting model selection trade-offs for a multi-agent system with complex task decomposition requirements
- Establishing baseline metrics for measuring orchestrator performance impact
The change enables the local Claude CLI to use Sonnet 4.6 for complex reasoning tasks while maintaining the ability to cascade to specialized agents downstream. This post covers why we made this decision, how it integrates with the existing orchestrator pattern, and what monitoring we need in place.
Technical Details: Model Configuration
Settings File Location and Format
The Claude CLI reads configuration from ~/.claude/settings.json. This file is user-local and persists across shell sessions. The update we applied sets the default model for all invocations of the claude command unless overridden by inline flags.
# View current settings
cat ~/.claude/settings.json
# Interactive configuration (if supported by your Claude CLI version)
claude /config
The settings file stores the model identifier as a string field. For this upgrade, the relevant change was updating the default_model field from claude-haiku-4-5 to claude-sonnet-4-6. Verify the key name against the settings reference for your CLI version; Claude Code documents this setting as model. The new value applies to newly started terminal sessions and does not affect sessions already initialized with the old model.
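To confirm which value is actually persisted before opening a new session, a small helper along these lines can read the field back. This is a minimal sketch: read_setting is a hypothetical helper, it assumes the setting is a top-level string whose key and value sit on one line, and the key name is passed as an argument so it can match whatever your CLI version expects.

```shell
# Hypothetical helper: print the string value of a key in a JSON settings file.
# Assumes the key and its value are on the same line; use a real JSON parser
# for anything more complex than a flat settings file.
read_setting() {
  rs_file=$1
  rs_key=$2
  sed -n "s/.*\"${rs_key}\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$rs_file" | head -n 1
}

# Example: read_setting ~/.claude/settings.json default_model
```

An empty result means the key is absent, in which case the CLI falls back to whatever default it ships with.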
Command-Line Invocation Pattern
The development workflow uses a permissions-bypass flag to invoke Claude directly from a repository context:
cd ~/Documents/repos && claude --dangerously-skip-permissions
This pattern allows rapid iteration on complex orchestration logic without standard permission guards. The --dangerously-skip-permissions flag is intended for development environments only and should not be used in production pipelines. For production orchestrators running on EC2, we rely on IAM role-based access controls instead.
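One way to keep the bypass flag out of production by accident is to wrap the invocation in a guard. The sketch below is illustrative only: JADA_ENV and CLAUDE_BIN are hypothetical variable names introduced here, not settings the Claude CLI reads.

```shell
# Hypothetical wrapper: only pass the permissions-bypass flag when JADA_ENV is "dev".
# JADA_ENV and CLAUDE_BIN are illustrative names, not part of the Claude CLI.
claude_dev() {
  if [ "${JADA_ENV:-}" != "dev" ]; then
    echo "refusing to skip permissions: JADA_ENV is '${JADA_ENV:-unset}', expected 'dev'" >&2
    return 1
  fi
  "${CLAUDE_BIN:-claude}" --dangerously-skip-permissions "$@"
}

# Example: cd ~/Documents/repos && JADA_ENV=dev claude_dev
```

Failing closed when the variable is unset means a fresh production shell cannot reach the flag through this wrapper.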
Infrastructure Context: EC2 Orchestrator Health Checks
Verifying Orchestrator Status
Before deploying model changes, we confirmed the jada-agent service was healthy on the orchestrator EC2 instance. This involved two verification approaches:
# SSH health check (5-second connect timeout; strict host key checking disabled, acceptable for dev only)
ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ubuntu@34.239.233.28 \
"systemctl status jada-agent.service 2>&1 | head -20"
# AWS Lightsail API status check
aws lightsail get-instance --instance-name jada-agent --region us-east-1 \
| grep -A 5 '"state"'
The Lightsail API call returns detailed state information including instance status, networking configuration, and recent operations. The instance at 34.239.233.28 in us-east-1 is where the JADA orchestrator service runs. The jada-agent.service systemd unit manages the orchestrator process lifecycle.
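Filtering JSON with grep works for a quick look, but a scripted pre-deploy check is easier to reason about if it extracts just the state name. The following sketch uses the AWS CLI's built-in --query option; the function names lightsail_state and require_running are hypothetical, and the instance name and region are the ones used above.

```shell
# Print the instance state name (for example "running") via the AWS CLI's JMESPath --query.
lightsail_state() {
  aws lightsail get-instance --instance-name "$1" --region "$2" \
    --query 'instance.state.name' --output text
}

# Succeed only when the reported state is "running"; print a diagnostic otherwise.
require_running() {
  if [ "$1" = "running" ]; then
    return 0
  fi
  echo "instance state is '$1', expected 'running'" >&2
  return 1
}

# Example: require_running "$(lightsail_state jada-agent us-east-1)"
```

Note that a running instance only says the VM is up; the systemctl check over SSH is still what confirms the jada-agent service itself is active.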
Why We Check Before Deploying
Model changes at the CLI level don't directly affect the EC2 orchestrator, but they do affect how we test, validate, and debug agent interactions before pushing code to production. A healthy orchestrator instance is a prerequisite for integration testing. We want to ensure that any new Sonnet-based reasoning flows can actually be tested against the real downstream infrastructure.
Architecture Pattern: Model Selection in Cascade Systems
Orchestrator vs. Specialist Agents
The JADA system uses an orchestrator pattern where:
- Orchestrator layer (local Claude CLI, now Sonnet 4.6): Decomposes complex user requests into sub-tasks, routes to specialist agents, aggregates results, and handles error recovery.
- Specialist agents (EC2 services): Handle specific domains (booking, payment, messaging, etc.), potentially using different models or deterministic logic.
The orchestrator's model choice is critical because it must understand task dependencies, handle ambiguous inputs, and reason about when to escalate or retry. Haiku 4.5 is fast and cheap but sometimes struggles with multi-step decomposition under complex constraints. Sonnet 4.6 significantly improves reasoning quality without sacrificing too much latency.
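The retry-or-escalate decision described above can be sketched as a bounded retry loop. This is an illustration of the pattern rather than the JADA implementation: retry_then_escalate is a hypothetical helper, and what "escalate" means (a different agent, a human, a fallback path) is left to the caller.

```shell
# Hypothetical helper: run a command up to max_attempts times, then signal escalation.
# Returns 0 on the first success and 1 once every attempt has failed.
retry_then_escalate() {
  rte_max=$1
  shift
  rte_attempt=1
  while [ "$rte_attempt" -le "$rte_max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $rte_attempt of $rte_max failed: $*" >&2
    rte_attempt=$((rte_attempt + 1))
  done
  echo "escalating after $rte_max failed attempts: $*" >&2
  return 1
}

# Example: retry_then_escalate 3 some_specialist_call || handle_escalation
```

Here some_specialist_call and handle_escalation are placeholders for whatever command invokes a specialist agent and whatever the orchestrator does when retries are exhausted.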
Cost vs. Capability Trade-Off
Upgrading the orchestrator model to Sonnet 4.6 increases token costs by approximately 2–3x compared to Haiku. However, the increase applies only to calls made at the orchestration layer; it is not multiplied across every specialist agent. The impact on EC2 infrastructure costs is minimal since the orchestrator doesn't directly scale with request volume: it's a single service handling routing and coordination.
For complex workflows (e.g., multi-leg bookings with constraint satisfaction), Sonnet's improved reasoning translates to fewer failed decompositions, fewer retries, and ultimately faster end-to-end completion. This indirect efficiency gain can offset the token cost increase.
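A back-of-the-envelope break-even condition makes "can offset" more concrete. The model below is a sketch under stated assumptions, not a measurement: attempts are treated as independent, p_H and p_S are hypothetical per-attempt success rates for Haiku and Sonnet, c is the Haiku orchestration token cost per attempt, k is the 2–3x cost multiplier, and d is the downstream specialist cost incurred on each attempt. With independent attempts, the expected number of attempts until success is 1/p.

```latex
\[
\mathbb{E}[\text{cost}_H] = \frac{c + d}{p_H},
\qquad
\mathbb{E}[\text{cost}_S] = \frac{k\,c + d}{p_S}
\]
\[
\mathbb{E}[\text{cost}_S] < \mathbb{E}[\text{cost}_H]
\iff
\frac{p_S}{p_H} > \frac{k\,c + d}{c + d}
\]
```

When the downstream cost d dominates c, the right-hand side approaches 1, so even a modest improvement in success rate covers the higher token price. When d is negligible, Sonnet has to succeed roughly k times as often per attempt to break even on token cost alone.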
Session Lifecycle and Model Persistence
Current Session Behavior
The settings.json update takes effect on new terminal sessions. A terminal session that was already initialized with Haiku 4.5 will continue using Haiku until the session is closed and a new one is opened. This is because the model choice is bound at initialization time, not re-evaluated on every command.
To test the new Sonnet 4.6 configuration immediately, open a new terminal and run:
cd ~/Documents/repos && claude --dangerously-skip-permissions
The new session will load settings.json and initialize with Sonnet 4.6.
File Descriptor Limits and Resource Planning
While not directly related to the model upgrade, we also noted the need to configure file descriptor limits for processes handling many concurrent connections. The command:
ulimit -n 2147483646
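sets the per-process file descriptor limit to 2^31 - 2 (2147483646), one below the maximum value of a 32-bit signed integer. Most systems will reject a value this large unless both the hard limit (ulimit -Hn) and the fs.nr_open sysctl permit it, so treat it as an upper bound rather than a recommended setting. This is relevant if the orchestrator or specialist agents expect to manage thousands of concurrent connections. The default soft limit on most Linux systems is 1024, which will cause socket or file open errors under load.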
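Before raising the limit, it helps to see what the current shell is allowed to request. The sketch below assumes bash; show_nofile_limits and can_raise_nofile are hypothetical helper names, and 65536 in the example is an illustrative value rather than a measured requirement.

```shell
# Report the current soft and hard open-file limits for this shell.
show_nofile_limits() {
  echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

# Succeed if the requested limit does not exceed the hard limit
# (an unprivileged process cannot raise its soft limit past the hard limit).
can_raise_nofile() {
  crn_hard=$(ulimit -Hn)
  [ "$crn_hard" = "unlimited" ] || [ "$1" -le "$crn_hard" ]
}

# Example: can_raise_nofile 65536 && ulimit -n 65536
```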
For the EC2 orchestrator instance, these limits should be configured in the systemd service file (/etc/systemd/system/jada-agent.service) using the LimitNOFILE directive rather than relying on a shell-level ulimit, which a systemd-managed process does not inherit.
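As a sketch of how that directive might be applied, the helper below writes a systemd drop-in file instead of editing the unit file directly; that is a substitution on our part, chosen because a drop-in survives the unit file being replaced. write_nofile_dropin and the file name limits.conf are hypothetical, and the limit value in the example is illustrative.

```shell
# Hypothetical helper: write a systemd drop-in that sets LimitNOFILE for a unit.
# $1 = drop-in directory (e.g. /etc/systemd/system/jada-agent.service.d), $2 = limit.
write_nofile_dropin() {
  wnd_dir=$1
  wnd_limit=$2
  mkdir -p "$wnd_dir" || return 1
  printf '[Service]\nLimitNOFILE=%s\n' "$wnd_limit" > "$wnd_dir/limits.conf"
}

# Example (as root; 65536 is an illustrative value, not a measured requirement):
#   write_nofile_dropin /etc/systemd/system/jada-agent.service.d 65536
#   systemctl daemon-reload && systemctl restart jada-agent.service
#   systemctl show jada-agent.service -p LimitNOFILE
```

The final systemctl show call is a quick way to confirm the running unit picked up the new limit.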
Key Decisions and Rationale
- Sonnet 4.6 over Opus 4.7: Opus offers marginal reasoning improvements over Sonnet but at roughly 3x the token cost. For orchestration (which is primarily decomposition and routing, not deep reasoning), Sonnet's cost-to-capability ratio is superior.
- Persistent settings over inline flags: Storing the model choice in settings.json reduces cognitive load during development. Developers don't need to remember to append