```html

Diagnosing the Jada-Agent Orchestrator: SSH Access Patterns, Daemon Health Verification, and OAuth Token Recovery

During a routine health check of our jada-agent orchestrator instance (Lightsail, 34.239.233.28), we encountered a multi-layered infrastructure challenge: locating SSH credentials that weren't stored locally, establishing secure access through AWS APIs, and identifying a broken OAuth flow that's been silently degrading service availability. This post covers the diagnostic process, architectural decisions, and the actionable findings.

The Challenge: Missing SSH Keys and Access Patterns

The initial request was straightforward: SSH into the jada-agent host and verify service health. However, the private key was not at its expected location, ~/.ssh/jada-key. Rather than stop there, we worked through three possible access paths:

  • Path 1 (Local SSH): Check standard locations and repos.env for key path references
  • Path 2 (AWS SSM Session Manager): Use IAM-based session access without stored keys
  • Path 3 (Lightsail Temporary Credentials): Invoke the Lightsail API to generate ephemeral SSH access credentials

We chose Path 3 because it provides audit trails (CloudTrail logging), doesn't rely on long-lived key material, and aligns with our zero-trust infrastructure model. The Lightsail API endpoint GetInstanceAccessDetails returns a temporary RSA certificate paired with a private key, valid for 60 seconds—perfect for one-off diagnostics.

Technical Execution: Credential Generation and Connection

The workflow looked like this:

# Query Lightsail API for temporary access credentials
aws lightsail get-instance-access-details \
  --instance-name jada-agent-prod \
  --region us-east-1

# Parse the JSON response for the temporary key material
# Write the private key to /tmp/jada_temp_key (mode 0600) and the
# certificate to /tmp/jada_temp_key-cert.pub
# SSH in with the key as the identity and the certificate alongside it

ssh -i /tmp/jada_temp_key \
  -o "CertificateFile=/tmp/jada_temp_key-cert.pub" \
  ec2-user@34.239.233.28 \
  "systemctl status jada-agent.service"

The key insight here: Lightsail's temporary credentials are a private key plus a signed certificate, not a bare key pair. The SSH client has to be given the certificate as well as the key, either through the CertificateFile option or by saving it next to the key as <key>-cert.pub. Once connected, we had 60 seconds to gather diagnostics before the credentials expired, a tight window that forced us to batch multiple commands into a single SSH session.

Daemon Health Snapshot: What We Found

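The jada-agent.service is operationally healthy, though the snapshot surfaced two issues: sessions ending at the turn limit, and a failing port sheet sync covered in the next section.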

  • Service Status: Active and running since May 10 (3 days uptime); load average 0.00; CPU ~0.65% average; memory footprint 144MB / 914MB
  • Session Utilization: 3 of 5 daily sessions consumed; two sessions hit the 30-turn Claude API limit and exited with code 1 (not a crash—the daemon logs and continues)
  • Task Processing: One successful session (Session 2) completed work on e-signature and crew page blockers, creating downstream tasks. Sessions 1 and 3 exhausted turn limits before completion

The turn-limit exits are a known pattern in our architecture. The daemon is working as designed—it respects the 30-turn safety limit to prevent runaway costs and infinite loops. However, this means complex tasks may require multiple sessions to complete, introducing latency.

Critical Issue: Broken Google OAuth Token for Port Sheet Sync

The most actionable finding: port_sheet_sync.py has failed on every 30-minute sync since at least the afternoon of May 13 with:

[port-sheet] token error: HTTP Error 400: Bad Request

A 400 on the token request most likely means the stored Google OAuth refresh token has expired or been revoked (Google typically reports this as invalid_grant). The port sheet syncs, which feed our downstream analytics and reporting workflows, have been failing silently with no alert. This is a classic case of credential rot: the token was likely generated months ago, it lapsed, and no automated re-authentication mechanism caught it.

Root Cause: The port_sheet_sync.py script uses a stored OAuth token that requires periodic refresh. Our monitoring didn't surface the 400 errors as critical alerts; the script simply logged and continued, creating a blind spot.

Infrastructure Pattern: We store OAuth credentials in a secrets vault, but we don't monitor token health proactively. The fix requires:

  • Re-authentication via the OAuth consent flow (requires user interaction; service account credentials would avoid the consent step altogether)
  • Storing the refreshed token back to the vault
  • Adding CloudWatch alarms for repeated 400/401 errors in sync logs

Key Decisions and Trade-Offs

1. Temporary vs. Long-Lived SSH Keys: We chose Lightsail's 60-second credentials over retrieving a permanent key from our secrets vault. Why? Reducing the blast radius—if credentials are accidentally logged or exposed during diagnostics, they're worthless after one minute.

2. Metrics Collection via API: Rather than relying solely on daemon logs, we pulled CPU, network, and status-check metrics directly from the Lightsail API. This bypasses any potential logging issues and gives us the authoritative cloud-provider view.

3. Batch Commands in Single SSH Session: Given the 60-second window, we combined multiple diagnostics into one call:

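# Separate with ';' rather than '&&' so one failing check (for example
# a non-zero exit from systemctl status) does not skip the rest
ssh ... "systemctl status jada-agent.service --no-pager; \
  journalctl -u jada-agent.service -n 50 --no-pager; \
  ps aux | grep jada; \
  curl http://localhost:9090/metrics"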

This reduced authentication overhead and ensured all data was captured before credential expiry.

What's Next: Addressing the Token Degradation

The port_sheet_sync OAuth failure needs immediate attention:

  • Short-term: Trigger a manual re-authentication flow for the Google API client (credentials stored in our secrets vault) and test a sync cycle
  • Mid-term: Implement OAuth token health checks as a separate monitoring task that alerts on repeated 400 errors
  • Long-term: Migrate to service account credentials for machine-to-machine integrations, eliminating the need for user-initiated consent flows

The daemon itself is solid. The turn-limit exits are a feature, not a bug—they prevent runaway costs. But the silent failure of a critical sync job is exactly the kind of infrastructure debt that accumulates into surprises.

```
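
Sketch: Scripting the Temporary-Credential Workflow

The credential steps from the Technical Execution section could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the script we ran: the function names, the mktemp layout, and the use of jq to parse the response are all hypothetical, while the instance name, region, user, and IP come from the post. The response is fetched once, on the assumption that the key and certificate are issued as a matching pair.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; requires the aws CLI and jq (assumed installed).

# Write stdin to a file that only the current user can read (mode 0600).
write_private_file() {
  local path="$1"
  ( umask 077; cat > "$path" ) && chmod 600 "$path"
}

# Fetch temporary credentials once, run one remote command, then clean up.
run_with_temp_credentials() {
  local remote_cmd="$1" workdir details status
  workdir="$(mktemp -d)" || return 1

  details="$(aws lightsail get-instance-access-details \
    --instance-name jada-agent-prod \
    --region us-east-1 \
    --protocol ssh \
    --output json)" || { rm -rf "$workdir"; return 1; }

  # accessDetails.privateKey and accessDetails.certKey hold the key pair
  # and its signed certificate.
  printf '%s' "$details" | jq -r '.accessDetails.privateKey' \
    | write_private_file "$workdir/key"
  printf '%s' "$details" | jq -r '.accessDetails.certKey' \
    | write_private_file "$workdir/key-cert.pub"

  ssh -i "$workdir/key" \
      -o "CertificateFile=$workdir/key-cert.pub" \
      -o "IdentitiesOnly=yes" \
      ec2-user@34.239.233.28 "$remote_cmd"
  status=$?

  rm -rf "$workdir"   # never leave key material behind in /tmp
  return "$status"
}
```

A call such as run_with_temp_credentials "systemctl status jada-agent.service" would then fetch, use, and discard the credentials in one step. Nothing runs until the function is invoked, so the file can be sourced safely.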
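
Sketch: A Token Health Check for the Sync Log

The mid-term item above (alerting on repeated token errors) could start as a small log check before any CloudWatch alarm exists. The sketch below is a local stand-in, not a CloudWatch integration: the function names, the log path argument, and the default threshold of 3 are hypothetical, and the only detail taken from the post is the log message format.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting repeated OAuth token failures.

# Print how many token-error lines (HTTP 400 or 401) the log contains.
count_token_errors() {
  local log_file="$1"
  # grep -c prints 0 but exits non-zero when nothing matches; keep the count.
  grep -c '\[port-sheet\] token error: HTTP Error 40[01]' "$log_file" || true
}

# Print a status line; return 1 when errors reach the threshold,
# 2 when the log cannot be read.
token_health() {
  local log_file="$1" threshold="${2:-3}" errors
  if [ ! -r "$log_file" ]; then
    echo "UNKNOWN: cannot read $log_file"
    return 2
  fi
  errors="$(count_token_errors "$log_file")"
  if [ "$errors" -ge "$threshold" ]; then
    echo "ALERT: $errors token errors in $log_file"
    return 1
  fi
  echo "OK: $errors token errors in $log_file"
}
```

Run from cron after each sync, a non-zero exit from token_health could feed whatever alerting channel is already in place. The threshold exists so that a single transient failure does not page anyone.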