```html

Diagnosing and Remediating OAuth Token Expiration in Production Daemon Sync Tasks

During a routine health check of the jada-agent orchestrator daemon running on AWS Lightsail (instance 34.239.233.28), we discovered a systemic OAuth authentication failure in the port sheet synchronization subsystem. This post details the diagnosis methodology, root cause analysis, and remediation strategy for production daemon token lifecycle management.

What Was Done

We performed a comprehensive health audit of the jada-agent daemon to verify service availability, task processing capacity, and error patterns. The audit revealed:

  • Service is stable with 11 days uptime and healthy resource utilization (0.65% CPU, 144MB/914MB memory)
  • Task processing is nominal (3 sessions used of 5 available daily, load average near zero between tasks)
  • Critical failure: the port_sheet_sync.py OAuth token refresh has been failing for 24+ hours with consistent HTTP Error 400: Bad Request responses
  • Secondary pattern: Two of three agent runs today hit the 30-turn Claude API limit, causing graceful exit with code 1

Technical Details: OAuth Token Failure Diagnosis

The port_sheet_sync.py script, which synchronizes crew scheduling data to Google Sheets every 30 minutes, has been unable to authenticate since at least the afternoon of May 13 (UTC). Each sync cycle logs:

[port-sheet] token error: HTTP Error 400: Bad Request

This error pattern indicates one of three scenarios:

  • Token expiration: the OAuth 2.0 refresh token has expired (for Google, after six months without use, or after 7 days when the OAuth consent screen is in Testing status for an external user type)
  • Token revocation: User revoked access via Google Account Security settings
  • Credential scope mismatch: Token was originally granted with insufficient scopes for current API operations

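Given the consistent 400 response across all 30-minute intervals and the absence of any manual security changes noted in recent session logs, an expired or otherwise invalidated refresh token is the most probable root cause. Per the OAuth 2.0 specification (RFC 6749, section 5.2), the token endpoint reports a rejected grant as 400 Bad Request with an error code in the JSON body, and invalid_grant is the code for an expired or revoked refresh token. A scope problem would more typically surface later, as a 403 (Forbidden) from the Sheets API itself, and bad client credentials would be reported as invalid_client. Because the sync exercises the token every 30 minutes, the six-month inactivity limit alone is unlikely to be the trigger, so the response body should be logged to confirm which case applies.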

Infrastructure and Service Architecture

The jada-agent daemon operates as a systemd service on AWS Lightsail with the following characteristics:

  • Service file: jada-agent.service (systemd unit, active/running since May 10)
  • Instance specification: 1GB RAM, 2 vCPU, 40GB SSD storage (17% utilized)
  • Task orchestration: Session-based queue with 5 sessions/day limit, 30-turn limit per session
  • Auxiliary scripts: Periodic sync tasks including port_sheet_sync.py on 30-minute intervals
  • Secrets management: OAuth credentials stored in local environment configuration (path redacted for security)
  • Monitoring: CloudWatch metrics available via Lightsail API (CPU, network, status checks)

The daemon architecture follows a task-pull model: the orchestrator checks the progress dashboard for pending work, claims sessions, executes Claude agent logic, and logs results. Auxiliary sync tasks run independently on cron-like schedules, using stored OAuth tokens for Google Sheets API authentication.

OAuth Token Lifecycle Management Decision

Google OAuth 2.0 tokens issued to installed applications through a user consent flow follow this lifecycle (service accounts do not use refresh tokens):

  • Initial grant: User authenticates via an installed-application consent flow or equivalent and receives a refresh token
  • Refresh validity: Refresh tokens typically expire after 6 months without use; each use resets the counter
  • Access token expiration: Short-lived access tokens (1 hour) are regenerated from refresh tokens on demand
  • Failure modes: If the refresh token is revoked or expired, every refresh request fails with 400 Bad Request (error code invalid_grant), so no new access token can be obtained and API calls stop

The current implementation stores credentials in a static configuration file without automated token refresh logic. This creates a brittle dependency: any token expiration requires manual re-authentication and credential update.

Remediation Strategy

To resolve the immediate failure and prevent recurrence:

  • Immediate: Re-authenticate the Google account credential for port_sheet_sync.py using the OAuth 2.0 consent flow. This issues a new refresh token and access token with fresh expiration windows.
  • Short-term: Update credential storage to include last-successful-refresh timestamp metadata, enabling alerting when a token has gone unused long enough to approach expiration (e.g., 30 days before the 6-month boundary).
  • Medium-term: Implement automatic access token refresh in the port_sheet_sync.py execution path using the google-auth library's built-in refresh mechanism (google-auth-oauthlib supplies the consent flow).
  • Long-term: Migrate to service account authentication with JSON keyfile (if applicable to use case), which eliminates user-grant token expiration and enables indefinite API access.

Secondary Pattern: 30-Turn Claude Limit Exits

Two of today's three agent sessions exited with code 1 after reaching the 30-turn conversation limit. This is not a failure condition—it's a graceful constraint enforcement. Sessions that complete within the 30-turn budget (like session 2, which successfully created a needs-you task) exit with code 0. Sessions that exhaust the budget exit with code 1 but don't lose work.

If task complexity is consistently exceeding the 30-turn budget, options include:

  • Increasing the per-session turn limit (requires configuration change in daemon scheduler)
  • Restructuring complex tasks into smaller sub-tasks with independent session lifespans
  • Implementing hierarchical task decomposition in the prompt engineering layer

What's Next

Immediate action items:

  • Re-run the OAuth consent flow for the dangerouscentaur@gmail.com account with the Google Sheets API scope
  • Update credential file at (path redacted) with refreshed token
  • Verify next 30-minute sync cycle completes successfully
  • Review port_sheet_sync.py error handling to emit alerts on repeated 400 responses

The daemon remains healthy otherwise—this is a credential lifecycle issue, not a systemic infrastructure problem. Once tokens are refreshed, all pending crew scheduling syncs will resume normally.

```
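
The diagnosis above depends on the error code in the token endpoint's response body, which the current log line does not capture. The following is a minimal sketch, assuming the script refreshes tokens with the standard library against Google's token endpoint. The function names, the action labels, and the choice to raise RuntimeError are hypothetical and are not taken from port_sheet_sync.py.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def classify_token_error(status: int, body: str) -> str:
    """Map a token-endpoint error to a coarse action label (labels are hypothetical)."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    if status == 400 and error == "invalid_grant":
        # Refresh token expired or revoked: only a new consent flow fixes this.
        return "reauthenticate"
    if error in ("invalid_client", "unauthorized_client"):
        # Client ID or secret is wrong, or the client may not use this grant.
        return "check_client_config"
    if status >= 500:
        # Server-side problem: the stored credentials may still be fine.
        return "retry_later"
    return "unknown"


def refresh_access_token(client_id, client_secret, refresh_token, timeout=30):
    """Exchange a refresh token for an access token, logging the error body on failure."""
    data = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    request = urllib.request.Request(TOKEN_URL, data=data)  # POST because data is set
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)["access_token"]
    except urllib.error.HTTPError as exc:
        body = exc.read().decode("utf-8", errors="replace")
        action = classify_token_error(exc.code, body)
        raise RuntimeError(
            f"[port-sheet] token error: HTTP {exc.code} action={action} body={body}"
        ) from exc
```

With a check like this in the sync path, repeated "reauthenticate" results could drive the alerting described in the action items, while "retry_later" results would not page anyone.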