```html

Diagnosing and Resolving OAuth Token Expiration in the JADA Agent Orchestrator

During a routine health check of the JADA agent daemon running on AWS Lightsail instance 34.239.233.28, we discovered a critical but isolated failure in the port_sheet_sync.py background task. This post covers the diagnostic approach, infrastructure interactions, and the remediation strategy for OAuth token lifecycle management in long-running orchestrator systems.

What Happened

The JADA agent daemon—which runs on a Lightsail instance and coordinates task distribution, session management, and background syncs—was healthy overall with 11 days of uptime and active task processing. However, the port_sheet_sync.py script, responsible for syncing booking and scheduling data to a Google Sheet, had been failing consistently every 30 minutes with:

[port-sheet] token error: HTTP Error 400: Bad Request

This pointed to an expired or revoked Google OAuth token, most likely the refresh token. The daemon itself remained operational and continued processing agent tasks, so the background sync kept failing without anyone noticing.

Diagnostic Approach

Remote Access via Lightsail API

The initial challenge was SSH access: the private key (jada-key) was not stored locally in ~/.ssh/. Rather than store sensitive keys in filesystem locations, we leveraged AWS Lightsail's native access mechanism:

  • Called the Lightsail GetInstanceAccessDetails API endpoint to request temporary SSH credentials
  • Received a one-time certificate and temporary access window (typically valid for 1 hour)
  • Used the temporary private key together with the signed certificate returned by the API to establish an SSH session

This approach avoids persistent key management on developer machines while maintaining audit trails through AWS CloudTrail.
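The access flow above can be sketched roughly as follows. This is a minimal illustration, not the tooling used during the incident: the helper names and the lightsail_temp_key file name are hypothetical, the instance name passed to fetch_access_details is whatever the instance is called in Lightsail, and boto3 is assumed to be installed with AWS credentials configured.

```python
import os


def fetch_access_details(instance_name: str) -> dict:
    """Ask Lightsail for short-lived SSH credentials (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper below works without boto3 installed

    client = boto3.client("lightsail")
    response = client.get_instance_access_details(
        instanceName=instance_name, protocol="ssh"
    )
    # accessDetails carries privateKey, certKey, username, ipAddress and expiresAt.
    return response["accessDetails"]


def write_temp_credentials(details: dict, directory: str) -> list[str]:
    """Write the temporary key and certificate to disk and return an ssh command."""
    key_path = os.path.join(directory, "lightsail_temp_key")  # hypothetical file name
    cert_path = key_path + "-cert.pub"
    for path, content in ((key_path, details["privateKey"]), (cert_path, details["certKey"])):
        if not content.endswith("\n"):
            content += "\n"
        # Create the file with mode 0600; ssh rejects private keys readable by others.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
    return [
        "ssh",
        "-i", key_path,
        "-o", f"CertificateFile={cert_path}",
        f"{details['username']}@{details['ipAddress']}",
    ]
```

The returned list can be handed to subprocess.run. Writing the files into a temporary directory and deleting them after the session keeps the short-lived key off the developer machine.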

Service Health Inspection

Once connected, we collected the following diagnostics:

  • Service Status: Checked systemctl status jada-agent.service to confirm active/running state, uptime (3 days), and any recent restarts
  • Resource Utilization: Recorded CPU (0.65% average), memory (144 MB of 914 MB), and disk usage (6.2 GB of 39 GB). The Lightsail metrics API reports CPU utilization; memory and disk figures have to be read on the host itself
  • Log Analysis: Examined daemon logs to identify the pattern of OAuth failures and session activity
  • Process List: Verified that jada-agent was running and that port_sheet_sync.py runs had at least been attempted
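For the log analysis step, a small counter like the one below is often enough to confirm that every failure has the same shape. It is a sketch that assumes the log lines contain the exact "[port-sheet] token error" text quoted earlier; the function name is illustrative.

```python
import re
from collections import Counter

# Matches lines such as "[port-sheet] token error: HTTP Error 400: Bad Request".
TOKEN_ERROR = re.compile(r"\[port-sheet\] token error: HTTP Error (\d{3})")


def count_token_errors(lines):
    """Return a Counter mapping HTTP status code to the number of token-error lines."""
    counts = Counter()
    for line in lines:
        match = TOKEN_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

The input can be any iterable of lines, for example the captured output of journalctl -u jada-agent.service. A result containing only status 400 supports the single-cause reading of the failure.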

Technical Details: OAuth Token Lifecycle

Root Cause

Google OAuth 2.0 access tokens, whether issued to service accounts or to user credentials, expire after a short window (typically 1 hour). The port_sheet_sync.py script stores a refresh token and uses it to periodically request new access tokens. Google's token endpoint answers with HTTP 400 when it rejects that refresh request, so the repeated errors suggest one of these scenarios:

  • The refresh token was revoked (e.g., user changed password, revoked app permissions, or token was explicitly invalidated)
  • The stored token JSON file became corrupted or malformed
  • The Google Cloud project or OAuth application credentials were deleted or had their permissions modified
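The logged message only shows the status line, but the body of a 400 response from an OAuth 2.0 token endpoint normally carries a JSON "error" code (defined in RFC 6749) that helps tell these scenarios apart. The sketch below maps a few of those codes to hints; the function name and the hint wording are illustrative, and how port_sheet_sync.py obtains the response body is an assumption.

```python
import json

# Standard OAuth 2.0 token-endpoint error codes (RFC 6749, section 5.2).
HINTS = {
    "invalid_grant": "refresh token is expired, revoked, or was issued to another client",
    "invalid_client": "client ID or secret in the OAuth app credentials was not accepted",
    "invalid_request": "refresh request is malformed, e.g. a required field is missing",
}


def explain_token_failure(body: str) -> str:
    """Turn the JSON body of a failed token request into a one-line diagnosis."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unparseable error body"
    if not isinstance(payload, dict):
        return "unparseable error body"
    error = payload.get("error", "unknown")
    return f"{error}: {HINTS.get(error, 'no hint available')}"
```

If the script uses urllib, the body is available by calling read() on the caught HTTPError and decoding the bytes. Logging the explained error alongside the status line would make the next failure quicker to classify.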

Where the Credentials Are Stored

The Google Analytics (GA) and Google Sheets credentials are kept in a secrets directory whose location is set by environment variables read from repos.env. The structure follows this pattern:


/path/to/secrets/
  ├── dangerouscentaur_ga_token.json    (GA4 API refresh token)
  ├── google_sheets_token.json          (Sheets API refresh token)
  └── client_secrets.json               (OAuth app credentials)

File permissions on credential files are locked down to 0600 (user read/write only) to prevent unauthorized access.
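Both the permission rule and the "corrupted or malformed token file" scenario can be checked mechanically. The following is a minimal sketch; the function name is hypothetical, and the assumption that a token file is a JSON object with a non-empty "refresh_token" field should be verified against the actual files.

```python
import json
import os
import stat


def check_credential_file(path, required_keys=("refresh_token",)):
    """Return a list of problems found with one credential file (empty list means OK)."""
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
    except FileNotFoundError:
        return [f"{path}: file is missing"]
    problems = []
    if mode & 0o077:  # any group/other permission bit is set
        problems.append(f"{path}: mode {mode:04o} is wider than 0600")
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (json.JSONDecodeError, UnicodeDecodeError):
        problems.append(f"{path}: not valid JSON")
        return problems
    if not isinstance(data, dict):
        problems.append(f"{path}: expected a JSON object")
        return problems
    for key in required_keys:
        if not data.get(key):  # assumed field name, see note above
            problems.append(f"{path}: missing or empty '{key}'")
    return problems
```

For client_secrets.json the required_keys argument would need different values, since that file holds the OAuth app credentials rather than a refresh token.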

Infrastructure: Agent Task Queue and Session Management

Session Limits and Task Lifecycle

The JADA agent operates under a strict session limit: 5 sessions per calendar day (UTC midnight rollover) with a maximum of 30 Claude API turns per session. On the day of diagnosis:

  • Session 1 (00:00 UTC): Hit max turns (30) → exit code 1
  • Session 2 (00:02 UTC): Completed successfully → processed e-signature and crew page generation blockers
  • Session 3 (00:05 UTC): Hit max turns (30) → exit code 1
  • Sessions 4–5: Available for future tasks

After session 3, the daemon polled the task queue and found no pending work, entering idle state (load average 0.00). This is expected behavior; the daemon wakes on schedule or task insertion.
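To make the daily limit concrete, here is one way such a budget with a UTC-midnight rollover could be modelled. The class and method names are hypothetical and this is not the daemon's actual code; the per-session turn cap is left out to keep the sketch short.

```python
from datetime import datetime, timezone

MAX_SESSIONS_PER_DAY = 5  # limit stated above


class SessionBudget:
    """Tracks how many sessions have started in the current UTC calendar day."""

    def __init__(self):
        self._day = None
        self._used = 0

    def try_start(self, now: datetime) -> bool:
        """Reserve a session slot; `now` must be a timezone-aware datetime."""
        today = now.astimezone(timezone.utc).date()
        if today != self._day:
            # First call after UTC midnight: reset the counter.
            self._day, self._used = today, 0
        if self._used >= MAX_SESSIONS_PER_DAY:
            return False
        self._used += 1
        return True

    def remaining(self) -> int:
        """Slots left for the day seen by the most recent try_start call."""
        return MAX_SESSIONS_PER_DAY - self._used
```

In the situation described above, three calls to try_start on the same UTC day would leave remaining() at 2, matching sessions 4 and 5 still being available.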

Background Sync Architecture

The port_sheet_sync.py script runs as a scheduled background process (cron or systemd timer) independently from the main agent loop. It:

  • Reads booking/scheduling data from a local database or queue
  • Authenticates to Google Sheets API using the stored refresh token
  • Appends or updates rows in a designated spreadsheet
  • Logs success or failure to the daemon's log stream

The 30-minute interval means a failure shows up in the logs within half an hour, but because the sync does not block other agent operations, nothing forces anyone to look.

Key Decisions

Why Lightsail API Instead of Local Key Storage

Storing SSH private keys on developer machines introduces security debt:

  • Keys can be accidentally committed to version control
  • Malware on the developer machine can steal keys
  • Key rotation becomes manual and prone to inconsistency

Lightsail's temporary certificate API provides just-in-time access with automatic expiration, CloudTrail audit logging, and no persistent key distribution.

Separating Background Syncs from Agent Task Queue

The port_sheet_sync.py runs independently so that:

  • OAuth failures do not block high-priority agent tasks (e.g., e-signature processing)
  • Sync logic can be updated and redeployed without restarting the daemon
  • Resource contention is minimized (sync runs on its own schedule, not competing for turns)

However, this also means sync failures are "quiet": nothing upstream breaks when the sync fails, so it has to be actively monitored.
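One simple form of active monitoring is a staleness check on the last successful sync. The sketch below assumes the timestamp of the last success is recorded somewhere the check can read it; the function name and the two-missed-runs threshold are illustrative choices.

```python
from datetime import datetime, timedelta


def sync_is_stale(last_success: datetime, now: datetime,
                  interval: timedelta = timedelta(minutes=30),
                  missed_runs: int = 2) -> bool:
    """True when no successful sync has been recorded for more than `missed_runs` intervals."""
    return now - last_success > interval * missed_runs
```

With the defaults, a sync that last succeeded 61 minutes ago counts as stale, while 45 minutes does not. Tolerating one missed run avoids alerting on a single transient failure.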

What's Next: Token Remediation

To resolve the port_sheet_sync.py OAuth failure:

  1. Re-authenticate: Run the Google OAuth flow script (e.g., auth_ga.py) with the account dangerouscentaur@gmail.com to refresh both the GA token and Sheets token
  2. Validate Credentials: Verify that the Google Cloud project still has active OAuth credentials and that the service account or user has the required scopes (Sheets API, Analytics API)