Diagnosing and Resolving OAuth Token Expiration in the JADA Agent Orchestrator
During a routine health check of the JADA agent daemon running on AWS Lightsail instance 34.239.233.28, we discovered a critical but isolated failure in the port_sheet_sync.py background task. This post covers the diagnostic approach, infrastructure interactions, and the remediation strategy for OAuth token lifecycle management in long-running orchestrator systems.
What Happened
The JADA agent daemon—which runs on a Lightsail instance and coordinates task distribution, session management, and background syncs—was healthy overall with 11 days of uptime and active task processing. However, the port_sheet_sync.py script, responsible for syncing booking and scheduling data to a Google Sheet, had been failing consistently every 30 minutes with:
[port-sheet] token error: HTTP Error 400: Bad Request
This pointed to an expired or revoked Google OAuth refresh token. The daemon itself remained operational and kept processing agent tasks, but the background sync had been failing silently.
Diagnostic Approach
Remote Access via Lightsail API
The initial challenge was SSH access: the private key (jada-key) was not stored locally in ~/.ssh/. Rather than copy a long-lived private key onto the workstation, we used AWS Lightsail's native access mechanism:
- Called the Lightsail GetInstanceAccessDetails API endpoint to request temporary SSH credentials
- Received a temporary private key and a signed certificate, valid only for a short window (typically 1 hour)
- Used the temporary key and certificate together to open an SSH session to the instance
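The steps above can be sketched with boto3. This is a minimal sketch, not the exact commands we ran: the instance name (jada-agent) and region (us-east-1) are hypothetical placeholders, and the helper names are illustrative. get_instance_access_details returns the temporary private key and certificate, which OpenSSH accepts through -i plus the CertificateFile option.

```python
import os
from pathlib import Path


def fetch_access_details(instance_name: str, region: str) -> dict:
    """Request temporary SSH credentials from the Lightsail API."""
    import boto3  # lazy import so build_ssh_command works without AWS libraries

    client = boto3.client("lightsail", region_name=region)
    response = client.get_instance_access_details(
        instanceName=instance_name, protocol="ssh"
    )
    return response["accessDetails"]


def build_ssh_command(details: dict, workdir: Path) -> list[str]:
    """Write the temporary key and certificate, then return an ssh argv list."""
    key_path = workdir / "lightsail_tmp_key"
    cert_path = workdir / "lightsail_tmp_key-cert.pub"
    key_path.write_text(details["privateKey"].rstrip("\n") + "\n")
    cert_path.write_text(details["certKey"].rstrip("\n") + "\n")
    os.chmod(key_path, 0o600)  # ssh refuses private keys readable by others
    return [
        "ssh",
        "-i", str(key_path),
        "-o", f"CertificateFile={cert_path}",
        f"{details['username']}@{details['ipAddress']}",
    ]
```

In use, you would call fetch_access_details("jada-agent", "us-east-1"), pass the result to build_ssh_command with a tempfile.TemporaryDirectory() path (private to the current user, and deleted on exit so the key never persists), and hand the list to subprocess.run.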
This approach avoids persistent key management on developer machines while maintaining audit trails through AWS CloudTrail.
Service Health Inspection
Once connected, we collected the following diagnostics:
- Service Status: Checked systemctl status jada-agent.service to confirm the active/running state, uptime (3 days), and any recent restarts
- Resource Utilization: Lightsail's instance metrics API reports CPU (0.65% average) but not memory or disk, so memory (144MB/914MB) and disk usage (6.2GB/39GB) are read on the host itself
- Log Analysis: Examined daemon logs to identify the pattern of OAuth failures and session activity
- Process List: Verified that jada-agent was running and that the sync script (port_sheet_sync.py) had been running or attempted on schedule
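Because memory and disk have to come from the host, a small health snapshot helper is handy. This is a sketch assuming a Linux host with /proc/meminfo and systemd. "Used" memory follows the MemTotal minus MemAvailable convention, and disk figures are in decimal GB. The function names are hypothetical.

```python
import shutil
import subprocess


def service_is_active(unit: str) -> bool:
    """Return True if systemd reports the unit as active."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):  # no systemctl, or it hung
        return False
    return result.stdout.strip() == "active"


def memory_usage_mb(meminfo_text: str) -> tuple[int, int]:
    """Return (used, total) in MB, counting used as MemTotal - MemAvailable."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])  # values are in kB
    total_kb = fields["MemTotal"]
    used_kb = total_kb - fields["MemAvailable"]
    return used_kb // 1024, total_kb // 1024


def disk_usage_gb(path: str = "/") -> tuple[float, float]:
    """Return (used, total) in decimal GB for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return round(usage.used / 1e9, 1), round(usage.total / 1e9, 1)
```

On the instance, memory_usage_mb(open("/proc/meminfo").read()) gives the same kind of figure as the 144MB/914MB reading above, and service_is_active("jada-agent.service") mirrors the systemctl check.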
Technical Details: OAuth Token Lifecycle
Root Cause
Google OAuth 2.0 access tokens are short-lived (typically 1 hour), so port_sheet_sync.py stores a refresh token and uses it to request new access tokens as needed. An HTTP 400 on that refresh means Google's token endpoint rejected the request itself, and repeated 400s suggest one of these scenarios:
- The refresh token was revoked (e.g., user changed password, revoked app permissions, or token was explicitly invalidated)
- The stored token JSON file became corrupted or malformed
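- The Google Cloud project or OAuth application credentials were deleted or had their permissions modified
- The OAuth consent screen is still in "Testing" publishing status, in which case Google issues refresh tokens that expire after 7 days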
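The message "HTTP Error 400: Bad Request" is exactly how Python's urllib.error.HTTPError formats itself, which suggests (but does not confirm) that the script discards the response body, and the body is what distinguishes the scenarios above. Below is a minimal sketch of a refresh that keeps it, assuming the token file uses google-auth's authorized-user JSON fields (refresh_token, client_id, client_secret, token_uri). The cause mapping is our own reading of the RFC 6749 error codes, not an official Google table.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

TOKEN_URI = "https://oauth2.googleapis.com/token"

# Hypothetical mapping from RFC 6749 (section 5.2) error codes to likely causes.
LIKELY_CAUSE = {
    "invalid_grant": "refresh token expired or revoked",
    "invalid_client": "OAuth client deleted or client_secret changed",
    "unauthorized_client": "OAuth client not allowed to use this grant",
}


def classify_token_error(body: bytes) -> str:
    """Map the token endpoint's JSON error body to a likely cause."""
    try:
        error = json.loads(body).get("error", "")
    except (ValueError, AttributeError):  # not JSON, or not a JSON object
        return "unparseable error body"
    return LIKELY_CAUSE.get(error, f"unrecognised error: {error!r}")


def refresh_access_token(token: dict) -> dict:
    """Exchange the stored refresh token for a new access token."""
    data = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": token["refresh_token"],
        "client_id": token["client_id"],
        "client_secret": token["client_secret"],
    }).encode()
    request = urllib.request.Request(token.get("token_uri", TOKEN_URI), data=data)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        # str(exc) alone is "HTTP Error 400: Bad Request"; the body says why.
        raise RuntimeError(f"{exc}: {classify_token_error(exc.read())}") from exc
```

For a revoked or expired refresh token, Google typically answers with an invalid_grant body, so the log line would name the first scenario instead of leaving it to guesswork.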
Where the Credentials Are Stored
The GA and Google Sheets credentials are maintained in a secrets directory referenced by environment variables read from repos.env. The structure follows this pattern:
/path/to/secrets/
├── dangerouscentaur_ga_token.json (GA4 API refresh token)
├── google_sheets_token.json (Sheets API refresh token)
└── client_secrets.json (OAuth app credentials)
File permissions on credential files are locked down to 0600 (user read/write only) to prevent unauthorized access.
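Two of the failure modes above (a corrupted token file and drifting permissions) can be checked mechanically. This sketch uses the file names from the layout above; the function name is hypothetical, and it takes the directory as an argument because the real path and environment variable names are not given here.

```python
import json
import stat
from pathlib import Path

# File names from the secrets layout; the directory itself is passed in.
EXPECTED_FILES = (
    "dangerouscentaur_ga_token.json",
    "google_sheets_token.json",
    "client_secrets.json",
)


def audit_secrets_dir(secrets_dir: Path) -> list[str]:
    """Return a list of problems: missing files, loose permissions, bad JSON."""
    problems = []
    for name in EXPECTED_FILES:
        path = secrets_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode != 0o600:
            problems.append(f"{name}: mode {mode:o}, expected 600")
        try:
            json.loads(path.read_text())
        except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass it
            problems.append(f"{name}: not valid JSON")
    return problems
```

An empty list means all three files exist, are 0600, and parse as JSON; it says nothing about whether Google still accepts the tokens inside them.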
Infrastructure: Agent Task Queue and Session Management
Session Limits and Task Lifecycle
The JADA agent operates under a strict session limit: 5 sessions per calendar day (UTC midnight rollover) with a maximum of 30 Claude API turns per session. On the day of diagnosis:
- Session 1 (00:00 UTC): Hit max turns (30) → exit code 1
- Session 2 (00:02 UTC): Completed successfully → processed e-signature and crew page generation blockers
- Session 3 (00:05 UTC): Hit max turns (30) → exit code 1
- Sessions 4–5: Available for future tasks
After session 3, the daemon polled the task queue and found no pending work, entering idle state (load average 0.00). This is expected behavior; the daemon wakes on schedule or task insertion.
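The session limits can be modelled as a small budget that resets at UTC midnight. This is an illustrative sketch, not the daemon's actual code: SessionBudget and out_of_turns are hypothetical names, and only the two limits (5 sessions per UTC day, 30 turns per session) come from the description above.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone

MAX_SESSIONS_PER_DAY = 5
MAX_TURNS_PER_SESSION = 30


@dataclass
class SessionBudget:
    """Tracks sessions used per UTC calendar day (hypothetical helper)."""
    day: date = field(default_factory=lambda: datetime.now(timezone.utc).date())
    used: int = 0

    def _roll_over(self, now: datetime) -> None:
        today = now.astimezone(timezone.utc).date()
        if today != self.day:  # UTC midnight has passed: reset the counter
            self.day, self.used = today, 0

    def try_start(self, now: datetime) -> bool:
        """Consume one session if any remain today; return whether it started."""
        self._roll_over(now)
        if self.used >= MAX_SESSIONS_PER_DAY:
            return False
        self.used += 1
        return True

    def remaining(self, now: datetime) -> int:
        self._roll_over(now)
        return MAX_SESSIONS_PER_DAY - self.used


def out_of_turns(turns_used: int) -> bool:
    """True once a session has spent its turn allowance (the exit-code-1 case)."""
    return turns_used >= MAX_TURNS_PER_SESSION
```

Replaying the day described above (sessions at 00:00, 00:02 and 00:05 UTC) leaves remaining() at 2, matching "Sessions 4–5: Available".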
Background Sync Architecture
The port_sheet_sync.py script runs as a scheduled background process (cron or systemd timer) independently from the main agent loop. It:
- Reads booking/scheduling data from a local database or queue
- Authenticates to Google Sheets API using the stored refresh token
- Appends or updates rows in a designated spreadsheet
- Logs success or failure to the daemon's log stream
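One pass of such a sync job might look like the sketch below. It covers appends only (not updates), and run_once, append_rows, read_rows and build_service are hypothetical names. In real use, build_service would wrap googleapiclient.discovery.build("sheets", "v4", credentials=...), which is where the token refresh, and therefore the token error, happens. The values().append call and its parameters are from the Sheets API v4.

```python
import logging
from typing import Callable, Sequence

log = logging.getLogger("port-sheet")


def append_rows(service, spreadsheet_id: str, sheet_range: str,
                rows: Sequence[Sequence[object]]) -> int:
    """Append rows via spreadsheets.values.append; return how many were sent."""
    service.spreadsheets().values().append(
        spreadsheetId=spreadsheet_id,
        range=sheet_range,
        valueInputOption="RAW",
        insertDataOption="INSERT_ROWS",
        body={"values": [list(r) for r in rows]},
    ).execute()
    return len(rows)


def run_once(read_rows: Callable[[], list], build_service: Callable[[], object],
             spreadsheet_id: str, sheet_range: str) -> bool:
    """One timer-triggered pass. Errors are logged, never raised, so a failed
    sync cannot take the agent loop down with it."""
    rows = read_rows()
    if not rows:
        log.info("[port-sheet] nothing to sync")
        return True
    try:
        service = build_service()  # credentials are refreshed here
    except Exception as exc:
        log.error("[port-sheet] token error: %s", exc)
        return False
    try:
        count = append_rows(service, spreadsheet_id, sheet_range, rows)
    except Exception as exc:
        log.error("[port-sheet] sheet error: %s", exc)
        return False
    log.info("[port-sheet] appended %d rows", count)
    return True
```

Returning a boolean instead of raising keeps each run self-contained; the next timer tick simply tries again, which is exactly why a persistent token failure repeats every 30 minutes without blocking anything else.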
Because the sync runs every 30 minutes, each failure shows up in the logs within half an hour, but none of them block other agent operations.
Key Decisions
Why Lightsail API Instead of Local Key Storage
Storing SSH private keys on developer machines introduces security debt:
- Keys can be accidentally committed to version control
- Malware on the developer machine can steal keys
- Key rotation becomes manual and prone to inconsistency
Lightsail's temporary certificate API provides just-in-time access with automatic expiration, CloudTrail audit logging, and no persistent key distribution.
Separating Background Syncs from Agent Task Queue
The port_sheet_sync.py runs independently so that:
- OAuth failures do not block high-priority agent tasks (e.g., e-signature processing)
- Sync logic can be updated and redeployed without restarting the daemon
- Resource contention is minimized (sync runs on its own schedule, not competing for turns)
The trade-off is that sync failures are quiet: nothing upstream waits on them, so they have to be monitored actively.
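A minimal monitor for these quiet failures counts consecutive token errors in the log. The failure line format is the one quoted earlier; the assumption that successful runs also log a "[port-sheet]" line is ours, and the function names and default threshold are hypothetical.

```python
import re
from typing import Iterable

# Matches the failure line quoted earlier in this post.
TOKEN_ERROR = re.compile(r"\[port-sheet\] token error:")


def consecutive_failures(lines: Iterable[str]) -> int:
    """Count trailing port-sheet token errors since the last other port-sheet line."""
    streak = 0
    for line in lines:
        if "[port-sheet]" not in line:
            continue  # ignore unrelated daemon output
        streak = streak + 1 if TOKEN_ERROR.search(line) else 0
    return streak


def should_alert(lines: Iterable[str], threshold: int = 2) -> bool:
    """With a 30-minute schedule, threshold=2 means roughly an hour of failures."""
    return consecutive_failures(lines) >= threshold
```

Counting a streak rather than single errors tolerates one transient blip while still catching a revoked token long before it has failed for days.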
What's Next: Token Remediation
To resolve the port_sheet_sync.py OAuth failure:
- Re-authenticate: Run the Google OAuth flow script (e.g., auth_ga.py) with the account dangerouscentaur@gmail.com to refresh both the GA token and the Sheets token
- Validate Credentials: Verify that the Google Cloud project still has active OAuth credentials and that the service account or user has the required scopes (Sheets API, Analytics API)
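The two remediation steps can be sketched with google-auth-oauthlib. This is not the contents of auth_ga.py, which isn't shown here: the scope URLs are our assumption for "Sheets API, Analytics API", and the function names are hypothetical. Since run_local_server opens a browser, one option is to run the flow on a workstation and copy the resulting JSON into the secrets directory. write_token creates the file as 0600 and swaps it in atomically so the sync never reads a half-written token.

```python
import json
import os
from pathlib import Path

# Assumed scopes: Sheets read/write and GA read-only.
SCOPES = [
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/analytics.readonly",
]


def run_consent_flow(client_secrets: Path) -> str:
    """Run the browser consent flow and return authorized-user JSON."""
    from google_auth_oauthlib.flow import InstalledAppFlow  # lazy import

    flow = InstalledAppFlow.from_client_secrets_file(str(client_secrets), SCOPES)
    creds = flow.run_local_server(port=0)
    return creds.to_json()  # includes refresh_token, client_id, scopes, ...


def missing_scopes(token_json: str, required=SCOPES) -> list[str]:
    """List required scopes that the stored token was not granted."""
    granted = set(json.loads(token_json).get("scopes") or [])
    return [s for s in required if s not in granted]


def write_token(path: Path, token_json: str) -> None:
    """Atomically replace the token file, keeping it at mode 0600 throughout."""
    tmp = path.with_suffix(".tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)  # also fixes a leftover tmp file with looser mode
    with os.fdopen(fd, "w") as fh:
        fh.write(token_json)
    os.replace(tmp, path)
```

After writing google_sheets_token.json and dangerouscentaur_ga_token.json, an empty missing_scopes() result covers the scope half of the validation step; the project-side check still has to be done in the Google Cloud console.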