Tuning Claude Agent Resource Limits and Model Selection for Multi-Agent Orchestration on AWS Lightsail

When scaling a multi-agent system like JADA's booking orchestrator, two critical decisions emerge: how to raise the per-process open-file limit so agents can hold many concurrent connections, and which Claude model to deploy across your agent hierarchy. This post walks through the infrastructure and architectural decisions we made to optimize our agent orchestration pattern running on AWS Lightsail.

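Understanding ulimit -n and the 2147483646 Setting

The command ulimit -n 2147483646 asks the shell to raise the maximum number of open file descriptors (RLIMIT_NOFILE) for itself and for every process it launches afterwards. This limit matters for agent-based systems:

  • What it does: The -n flag caps file descriptors, which cover regular files, sockets (every TCP connection is one), and pipes. Without -S or -H, bash sets both the soft and the hard limit. The value 2147483646 is exactly 2^31 - 2, one below the largest signed 32-bit integer.
  • Kernel ceiling: Linux does not accept that value. setrlimit() rejects any RLIMIT_NOFILE above the fs.nr_open sysctl, which is 1048576 (2^20) in a stock kernel, and fs.nr_open itself cannot exceed 2147483584 on 64-bit kernels. The command therefore fails with "Operation not permitted" and the previous limit stays in effect.
  • Default limits: Linux typically gives processes a soft limit of 1024 (macOS uses 256). When orchestrating multiple agents, each spawning HTTP clients, database connections, and inter-process pipes, you'll hit this ceiling quickly.
  • Why it matters for agents: Each agent in the JADA system may maintain concurrent connections to AWS APIs, external booking services, and message queues. If the limit is too low, agents that exceed it fail with "Too many open files" (EMFILE).

A shell's ulimit also never reaches a systemd service. systemctl start only asks systemd (PID 1) to launch the unit, and systemd applies the limits configured for that unit rather than those of the shell that called systemctl. Our Lightsail initialization script (/opt/jada/bootstrap.sh) therefore sets the limit in a drop-in for the orchestrator's unit before starting it:

#!/bin/bash
# systemd services get RLIMIT_NOFILE from LimitNOFILE=, not from the caller's ulimit.
mkdir -p /etc/systemd/system/jada-agent.service.d
printf '[Service]\nLimitNOFILE=1048576\n' > /etc/systemd/system/jada-agent.service.d/limits.conf
systemctl daemon-reload
systemctl start jada-agent.service

When jada-agent.service (our orchestrator's systemd unit) starts, it now receives a limit of 1048576 descriptors, a value every stock kernel accepts because it equals the default fs.nr_open. Going higher requires raising the fs.nr_open sysctl first, and even then 2147483646 stays out of reach.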
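Because the limit a process actually ends up with can differ from the one requested, the orchestrator can verify it at startup. The following is a minimal Python sketch, assuming a Linux host and a Python orchestrator process; read_nr_open and raise_nofile_soft_limit are hypothetical helpers, not part of orchestrator.py. It reads the kernel ceiling from /proc/sys/fs/nr_open and moves the process's soft limit up toward its hard limit, which needs no root privileges.

```python
import resource
from pathlib import Path
from typing import Optional


def read_nr_open(path: str = "/proc/sys/fs/nr_open") -> Optional[int]:
    """Return the kernel's per-process descriptor ceiling (Linux), or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def raise_nofile_soft_limit(requested: int) -> int:
    """Raise this process's soft RLIMIT_NOFILE toward `requested`, never past the hard limit.

    Moving the soft limit up to the hard limit is allowed without privileges,
    so this is safe to call from an unprivileged service. Returns the soft
    limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Clamp to the hard limit; asking for more would make setrlimit() fail.
    target = requested if hard == resource.RLIM_INFINITY else min(requested, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("fs.nr_open:", read_nr_open())
    # Requesting 2147483646 is harmless here: it is clamped to the hard limit.
    print("soft limit now:", raise_nofile_soft_limit(2_147_483_646))
```

Logging both values at startup exposes a limit that never took effect before agents start failing with "Too many open files".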

Model Selection: Haiku vs. Sonnet 4.6 for Orchestration

Our initial deployment used Claude Haiku 4.5 as the default agent model, invoked via:

cd ~/Documents/repos && claude --dangerously-skip-permissions

However, Haiku's reasoning capacity became a bottleneck for complex task decomposition; context length was not the constraint, since Haiku 4.5 has a 200K-token context window. We moved orchestration tasks to Sonnet 4.6 while keeping specialist agents on the cheaper model to control cost.

Configuration Update: Settings File

The Claude Code CLI (claude) reads its user-level default model from ~/.claude/settings.json. We updated this file to make Sonnet 4.6 the default:

{
  "model": "claude-sonnet-4-6",
  "permissions": {
    "defaultMode": "bypassPermissions"
  }
}

This setting applies to every new claude session for that user. Setting permissions.defaultMode to bypassPermissions is the settings-file equivalent of the --dangerously-skip-permissions CLI flag, which is essential for our orchestrator pattern: agents need to execute system commands and file operations without interactive approval dialogs.
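For scripted provisioning, updating the file in place is safer than overwriting it, because other keys (such as permission settings) live in the same file. Here is a minimal Python sketch; set_default_model is a hypothetical helper, and it assumes the file, if present, holds a single JSON object.

```python
import json
import os
import tempfile
from pathlib import Path


def set_default_model(settings_path: Path, model: str) -> dict:
    """Set "model" in a Claude Code settings file, preserving every other key."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text().strip()
        settings = json.loads(text) if text else {}
    settings["model"] = model

    # Write to a temp file in the same directory, then atomically swap it in,
    # so a concurrent reader never sees a half-written file.
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=settings_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(settings, f, indent=2)
            f.write("\n")
        os.replace(tmp_name, settings_path)
    except Exception:
        os.unlink(tmp_name)  # don't leave a stray temp file behind on failure
        raise
    return settings
```

For example, set_default_model(Path.home() / ".claude" / "settings.json", "claude-sonnet-4-6") switches the default model while leaving every other key untouched.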

Why Sonnet 4.6 for Orchestrators

  • Task decomposition: Sonnet excels at breaking complex booking workflows into parallel, sequential, and conditional task chains, where Haiku's lighter reasoning struggled to keep long multi-step plans coherent.
  • Cost trade-off: At list prices Sonnet costs roughly 3x as much per token as Haiku 4.5, but an orchestrator makes only one decomposition call per request, so the added per-request cost is acceptable. Specialist agents can remain on Haiku.
  • Latency tolerance: Orchestration is typically not latency-critical, so a slower model response is a reasonable price for better plans.
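A back-of-the-envelope model makes the cost trade-off concrete. This Python sketch assumes list prices of $1/$5 per million input/output tokens for Haiku 4.5 and $3/$15 for Sonnet (check current pricing before relying on them), and every token count in it is a hypothetical placeholder, not a JADA measurement. Call, call_cost, and request_cost are illustrative names.

```python
from dataclasses import dataclass

# ASSUMED list prices in USD per million tokens: (input, output). Verify before use.
PRICES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}


@dataclass
class Call:
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(call: Call) -> float:
    """USD cost of one model call under the assumed price table."""
    in_price, out_price = PRICES[call.model]
    return (call.input_tokens * in_price + call.output_tokens * out_price) / 1_000_000


def request_cost(orchestrator_model: str, specialist_model: str, n_specialists: int) -> float:
    """One orchestrator decomposition plus n specialist calls (hypothetical token counts)."""
    calls = [Call(orchestrator_model, 8_000, 1_500)]
    calls += [Call(specialist_model, 2_000, 400) for _ in range(n_specialists)]
    return sum(call_cost(c) for c in calls)


tiered = request_cost("claude-sonnet-4-6", "claude-haiku-4-5", n_specialists=6)
all_sonnet = request_cost("claude-sonnet-4-6", "claude-sonnet-4-6", n_specialists=6)
print(f"tiered: ${tiered:.4f}  all-Sonnet: ${all_sonnet:.4f}")
```

With these placeholder numbers the orchestrator call costs the same in both setups, so the entire gap (about $0.048 per request here) comes from the specialist fan-out, and it widens with every additional specialist call.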

Infrastructure: Lightsail Instance Configuration

Our orchestrator runs on a single AWS Lightsail instance (jada-agent) in the us-east-1 region. We verify its status with:

aws lightsail get-instance \
  --instance-name jada-agent \
  --region us-east-1
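Python monitoring code can run the same check through boto3's Lightsail client. Its get_instance call mirrors the CLI command and reports the state under instance.state.name. In this minimal sketch, instance_state and is_running are hypothetical helpers. The client is passed in so the functions can be exercised with a stub instead of live AWS credentials.

```python
from typing import Any


def instance_state(client: Any, name: str = "jada-agent") -> str:
    """Return the Lightsail instance's state name, e.g. "running" or "stopped".

    `client` is a boto3 Lightsail client, or any object with a compatible
    get_instance(instanceName=...) method (handy for tests).
    """
    response = client.get_instance(instanceName=name)
    return response["instance"]["state"]["name"]


def is_running(client: Any, name: str = "jada-agent") -> bool:
    return instance_state(client, name) == "running"


# Usage against AWS (needs boto3 plus credentials allowing lightsail:GetInstance):
#   import boto3
#   lightsail = boto3.client("lightsail", region_name="us-east-1")
#   print(instance_state(lightsail))
```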

The instance configuration includes:

  • Blueprint and bundle: Ubuntu 22.04 LTS on a 4 GB RAM, 2 vCPU plan
  • Service: jada-agent.service (systemd unit at /etc/systemd/system/jada-agent.service)
  • Working directory: /opt/jada (where agent code and models live)
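  • Environment variables: Claude API keys, AWS credentials, and orchestrator flags, set in the service unit file (Environment= or EnvironmentFile=). systemd does not read /etc/environment for system services, so values kept there reach the service only if the unit loads that file with EnvironmentFile=/etc/environment.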

We validate the service status with SSH:

ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ubuntu@34.239.233.28 \
  "systemctl status jada-agent.service"

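The IP 34.239.233.28 is the static public IP attached to the Lightsail instance. StrictHostKeyChecking=no keeps automated monitoring scripts from stalling on a host-key prompt, but it also lets ssh proceed when a host key has changed, which undercuts the out-of-band verification. Pinning the verified key in known_hosts and using StrictHostKeyChecking=yes (or accept-new, which still refuses changed keys) keeps scripts non-interactive without giving up that check.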
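For a script, systemctl is-active is easier to consume than the multi-line status output: it prints a single word such as active or failed and exits 0 only when the unit is active. The following minimal Python sketch wraps that check. The helpers build_status_command and service_is_active are hypothetical. Host-key policy is left to the caller's ssh configuration and known_hosts.

```python
import subprocess
from typing import List


def build_status_command(host: str, unit: str, connect_timeout_s: int = 5) -> List[str]:
    """ssh argv that asks systemd on `host` whether `unit` is active."""
    return [
        "ssh",
        "-o", f"ConnectTimeout={connect_timeout_s}",
        "-o", "BatchMode=yes",  # never prompt (passwords, passphrases, host-key confirmations)
        host,
        "systemctl", "is-active", unit,
    ]


def service_is_active(host: str, unit: str = "jada-agent.service", timeout_s: float = 30) -> bool:
    """True only if ssh succeeds and `systemctl is-active` reports "active"."""
    try:
        result = subprocess.run(
            build_status_command(host, unit),
            capture_output=True, text=True, timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Hung session, or no ssh binary on this machine: treat as not healthy.
        return False
    # is-active prints one word ("active", "inactive", "failed", ...) and exits 0 only when active.
    return result.returncode == 0 and result.stdout.strip() == "active"
```

A monitoring loop would call service_is_active("ubuntu@34.239.233.28") and alert whenever it returns False.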

Multi-Agent Architecture: Orchestrator + Specialists

Our orchestration pattern follows a hierarchical design:

  1. Tier 1 (Orchestrator): Runs on Lightsail, uses Sonnet 4.6, responsible for task decomposition and agent dispatch. Located at /opt/jada/orchestrator.py.
  2. Tier 2 (Specialist Agents): Spawned by the orchestrator, may run on the same Lightsail instance or be invoked as Lambda functions. Use Haiku 4.5 for cost efficiency (they perform narrow, well-defined tasks like "fetch hotel availability" or "parse booking confirmation").
  3. Tier 3 (External Services): Real booking APIs, database queries, and message queues. Specialist agents call these services; agents do not communicate with each other directly.

Task data flows: Orchestrator → Specialist agents → External APIs → Specialist agents → Orchestrator → Client.
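The dispatch pattern can be illustrated with a minimal Python sketch. This is not orchestrator.py. The model call is injected as a plain async function so the example runs offline, and the "name: prompt" plan format, Subtask, parse_plan, orchestrate, and fake_model are all hypothetical. A real deployment would replace fake_model with an actual Claude API client.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List

ORCHESTRATOR_MODEL = "claude-sonnet-4-6"  # Tier 1: decomposition and dispatch
SPECIALIST_MODEL = "claude-haiku-4-5"     # Tier 2: narrow, well-defined tasks

# (model, prompt) -> completion text. Injected so the sketch runs without network access.
ModelCall = Callable[[str, str], Awaitable[str]]


@dataclass
class Subtask:
    name: str
    prompt: str


def parse_plan(plan: str) -> List[Subtask]:
    """ASSUMED plan format: one "name: prompt" pair per line; other lines are ignored."""
    subtasks = []
    for line in plan.splitlines():
        if ":" in line:
            name, prompt = line.split(":", 1)
            subtasks.append(Subtask(name.strip(), prompt.strip()))
    return subtasks


async def orchestrate(request: str, call_model: ModelCall) -> Dict[str, str]:
    # Tier 1: a single Sonnet call decomposes the request.
    plan = await call_model(ORCHESTRATOR_MODEL, f"Decompose into subtasks: {request}")
    subtasks = parse_plan(plan)
    # Tier 2: specialists run concurrently on Haiku. Each would call the Tier 3
    # services (booking APIs, queues) itself; they never call each other.
    results = await asyncio.gather(
        *(call_model(SPECIALIST_MODEL, task.prompt) for task in subtasks)
    )
    # Results flow back here, where the orchestrator assembles the client response.
    return {task.name: result for task, result in zip(subtasks, results)}


async def fake_model(model: str, prompt: str) -> str:
    """Stand-in for a real Claude API client; returns canned text."""
    if model == ORCHESTRATOR_MODEL:
        return ("availability: fetch hotel availability\n"
                "confirmation: parse booking confirmation")
    return f"[{model}] {prompt}: done"


print(asyncio.run(orchestrate("Book two nights in Lisbon", fake_model)))
```

Because asyncio.gather runs the specialist calls concurrently, the request's latency is bounded by the slowest specialist rather than the sum of all of them.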

Key Decisions and Trade-offs

  • Single Lightsail instance vs. Auto Scaling Group: We chose a single instance because orchestration traffic is bursty and low-volume. ASG complexity wasn't justified. If throughput increases 10x, we'll migrate to a containerized setup (ECS) with auto-scaling.
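  • File descriptor limit: 2147483646 (2^31 - 2) is not an OS maximum on any Linux system; the kernel caps RLIMIT_NOFILE at fs.nr_open (1048576 in a stock kernel) and rejects anything higher. A limit near that ceiling is still excessive for current load (we see ~200 concurrent connections) but is cheap insurance against future scaling. It is not entirely free: code that loops over every possible descriptor (for example, closing them all before exec) slows down as the limit grows, and select() cannot watch descriptors numbered 1024 or above regardless of the limit.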
  • Sonnet for orchestration only: We don't upgrade specialist agents to Sonnet because they run many times per request. Keeping them on Haiku cuts the marginal cost of each call, and Sonnet's extra reasoning would be largely wasted on such narrow tasks.
  • Settings file vs. CLI flag: We set the default in ~/.claude/settings.json rather than using --model flags in every invocation. This centralizes configuration and is less error-prone when scripts are modified by multiple engineers.

Verification