
Diagnosing and Resolving a Broken Google OAuth Token in Port Sheet Sync

During a routine health check of the jada-agent orchestrator daemon running on our Lightsail instance (34.239.233.28), we discovered that the port_sheet_sync.py script had been failing for several hours without raising any alert, because its Google OAuth token had expired or been revoked. This post details the diagnostic process, the root cause analysis, and the approach we're taking to restore authentication.

What Happened

The jada-agent daemon, which orchestrates background tasks and agent runs on our infrastructure, was performing nominally from a service health perspective: active, running for 11 days, minimal CPU and memory footprint, zero status check failures. However, logs revealed a persistent pattern of failures in the port sheet sync routine:

[port-sheet] token error: HTTP Error 400: Bad Request

This error occurred on every sync attempt, one every 30 minutes, since at least mid-afternoon UTC on 2026-05-13. The Google OAuth token that port_sheet_sync.py uses to authenticate with the Google Sheets API had become invalid: expired, revoked, or corrupted on disk.

Diagnostic Approach

We took a multi-layered diagnostic approach:

  • Service-level inspection: Connected via AWS Lightsail API temporary SSH credentials (rather than maintaining persistent key files locally) and pulled daemon status via systemctl status jada-agent.service.
  • Log aggregation: Extracted daemon logs and session transcript logs to identify failure patterns and timestamps. This revealed that port sheet syncs had been consistently failing for hours while other agent sessions were completing successfully.
  • Metrics collection: Pulled CPU, memory, network, and status check metrics from the Lightsail API directly (rather than relying on agent-reported metrics) to rule out resource contention or instance-level issues.
  • Credential audit: Checked the stored Google OAuth token structure in the jada-agent secrets directory to confirm the token was present but potentially stale or malformed.
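
The credential audit can be sketched as a structural check that never prints secret values. This is a minimal sketch, not the check we actually ran: it assumes the token file uses the "authorized user" JSON layout written by google-auth's Credentials.to_json(), and the function name audit_token_file is hypothetical.

```python
import json
from pathlib import Path

# Assumed field names: the "authorized user" layout written by
# google.oauth2.credentials.Credentials.to_json(). The real file may differ.
REQUIRED_FIELDS = ("refresh_token", "client_id", "client_secret")


def audit_token_file(path):
    """Return a list of structural problems; an empty list means the file looks sane.

    Only field names are reported, never field values, so the output is safe to log.
    """
    p = Path(path)
    if not p.is_file():
        return [f"missing file: {p}"]
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems
```

A clean result only shows that the file is well-formed. A revoked refresh token passes this check, so the token endpoint remains the authority on validity.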

The contrast between successful agent runs (sessions 1 and 3 either completed their tasks or hit the Claude turn limit, both expected behavior) and consistently failing port sheet syncs made the scope of the problem clear: the issue was specific to the Google OAuth credential, not a systemic daemon failure.

Root Cause: Expired OAuth Token

Google OAuth2 tokens have a limited lifetime (typically one hour for access tokens). Refresh tokens, which have a much longer lifespan, are used to obtain new access tokens when the original expires. There are several ways a token can become invalid:

  • Refresh token revocation: The user manually revoked the application's access via Google Account settings.
  • Refresh token expiration: The refresh token itself expires if unused for more than 6 months (Google's policy for OAuth tokens obtained via OAuth 2.0 authorization flow with offline access).
  • Credential rotation: The client ID or client secret used to obtain the token was rotated or regenerated.
  • Token corruption: The stored token file was partially overwritten or corrupted.

In this case, the HTTP 400 error is characteristic of an invalid refresh token being sent to Google's token endpoint. Because the sync had been exercising the token every 30 minutes until the failures began, expiration from inactivity is unlikely. Revocation is the more probable cause, with credential rotation and file corruption still to be ruled out.

Technical Details: The OAuth Flow

The port_sheet_sync.py script uses the google-auth-oauthlib library (verified installed via pip list) and stores credentials in a JSON file. The typical flow is:

  1. Script loads the stored refresh token from the credentials file.
  2. Script calls Google's token endpoint to exchange the refresh token for a fresh access token.
  3. Script uses the access token to authenticate requests to Google Sheets API.
  4. If the refresh token is invalid, the token endpoint returns 400 Bad Request with an error code like invalid_grant.
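
The exchange in steps 2 and 4 can be sketched with the standard library alone. This is an illustration under assumptions, not the code in port_sheet_sync.py, which goes through google-auth-oauthlib. The function names are hypothetical; the endpoint URL and form fields are those of Google's OAuth 2.0 token endpoint. The point of the sketch is that the logged "HTTP Error 400: Bad Request" hides the OAuth error code, which only appears once the response body is read.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

TOKEN_URI = "https://oauth2.googleapis.com/token"  # Google's OAuth 2.0 token endpoint


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the form-encoded POST that trades a refresh token for an access token."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URI, data=body, method="POST")


def parse_token_error(status, raw_body):
    """Return (status, oauth_error_code); the code is 'unknown' if the body is not JSON."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        payload = {}
    code = payload.get("error", "unknown") if isinstance(payload, dict) else "unknown"
    return status, code


def refresh_access_token(client_id, client_secret, refresh_token):
    """Perform the exchange; on failure, raise with the OAuth error code included."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)["access_token"]
    except urllib.error.HTTPError as exc:
        status, code = parse_token_error(exc.code, exc.read())
        raise RuntimeError(f"token error: HTTP {status} ({code})") from exc
```

Logging the code alongside the status would have distinguished invalid_grant (re-authentication needed) from a malformed request on the first failure.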

The script correctly catches this error and logs it, but it does not gracefully fall back to re-authentication. The daemon continues running the sync on a 30-minute schedule, but each attempt fails with the same error.
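
One way to avoid repeating a doomed request every 30 minutes is to stop attempting once a permanent auth error is seen, and to resume only when the token file changes. The sketch below is one possible design, not existing behavior: the SyncGate class and its method names are hypothetical, and it assumes the caller can supply the token file's modification time and the OAuth error code.

```python
class SyncGate:
    """Skip sync attempts after a permanent auth failure until the token file changes."""

    # Error codes that retrying cannot fix; re-authentication is required.
    PERMANENT_ERRORS = frozenset({"invalid_grant"})

    def __init__(self):
        self._blocked_mtime = None  # mtime of the token file that failed, if any

    def should_attempt(self, token_mtime):
        """True unless the current token file is the one that already failed."""
        return self._blocked_mtime is None or token_mtime != self._blocked_mtime

    def record_failure(self, error_code, token_mtime):
        """Record a failed attempt; return True when the caller should raise an alert."""
        if error_code in self.PERMANENT_ERRORS:
            self._blocked_mtime = token_mtime
            return True
        return False  # transient error: keep retrying on the normal schedule

    def record_success(self):
        """Clear any block after a successful sync."""
        self._blocked_mtime = None
```

With a gate like this, a revoked token produces one alert instead of hours of identical log lines, and the sync resumes on its own once new credentials are written.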

Infrastructure Context

The jada-agent daemon runs as a systemd service on a Lightsail instance named jada-key (referenced via its public IP 34.239.233.28). The daemon:

  • Runs a 60-second polling loop that checks a progress dashboard for pending tasks.
  • Orchestrates Claude agent runs via local Python scripts.
  • Executes background maintenance scripts like port_sheet_sync.py on scheduled intervals.
  • Manages a 5-session hard limit per UTC day with a turn limit of 30 per session to stay within cost and rate limit bounds.
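
The session and turn limits in the last bullet can be sketched as a small counter keyed on the UTC date. This is a minimal in-memory illustration with hypothetical names; the real daemon presumably persists its count across restarts, which this sketch does not.

```python
from datetime import datetime, timezone

MAX_SESSIONS_PER_DAY = 5    # hard limit per UTC day, from the description above
MAX_TURNS_PER_SESSION = 30  # turn limit per session


class SessionBudget:
    """Track how many sessions have started during the current UTC day."""

    def __init__(self):
        self._day = None
        self._count = 0

    def try_start_session(self, now=None):
        """Return True and consume one session if the daily budget allows it.

        `now` should be a timezone-aware datetime; it defaults to the current UTC time.
        """
        now = now or datetime.now(timezone.utc)
        day = now.astimezone(timezone.utc).date()
        if day != self._day:  # a new UTC day resets the counter
            self._day, self._count = day, 0
        if self._count >= MAX_SESSIONS_PER_DAY:
            return False
        self._count += 1
        return True


def turn_allowed(turns_used):
    """True while the session is still under its turn limit."""
    return turns_used < MAX_TURNS_PER_SESSION
```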

Secrets (including the Google OAuth credentials) are stored in a secrets directory on the instance, managed via environment variables loaded at daemon startup. Access to the instance is controlled via AWS Lightsail's temporary SSH key API rather than long-lived SSH key pairs stored in version control.

Remediation Plan

To resolve this issue, we need to re-authenticate the Google OAuth token for the dangerouscentaur@gmail.com account that owns the port sheet. The steps are:

  • Re-run the auth script: Execute auth_ga.py (or an equivalent Google Sheets auth script if one exists) with the --account dangerouscentaur@gmail.com flag to trigger a fresh OAuth 2.0 authorization flow.
  • Browser-based approval: This will open a browser to Google's OAuth consent screen where the account owner approves access for the jada-agent application.
  • Store new credentials: The script will store the new access and refresh tokens in the secrets directory.
  • Restart the daemon: Restart jada-agent.service so it picks up the new credentials and resumes port sheet syncs.
  • Verify sync resumption: Monitor logs for the next 30-minute sync cycle to confirm no more 400 errors.
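
The final verification step can be sketched as a filter over log lines captured after the restart. The marker string comes from the log line quoted earlier; the function name is hypothetical, and the sketch assumes the daemon's output is available as plain text lines, for example piped from journalctl -u jada-agent.service --since with the restart time.

```python
TOKEN_ERROR_MARKER = "[port-sheet] token error"  # prefix of the failing log line


def find_token_errors(lines):
    """Return every log line that still reports a port sheet token error."""
    return [line.rstrip("\n") for line in lines if TOKEN_ERROR_MARKER in line]
```

An empty result after at least one full 30-minute cycle suggests the sync has recovered. It does not prove a sync ran, so it is worth confirming that the sheet itself was updated.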

Key Decisions

Why use temporary SSH keys via the Lightsail API? Long-lived SSH key files checked into version control or stored in shared team directories introduce credential leakage risk. The Lightsail API's temporary SSH key feature (valid for 60 seconds) reduces the window of exposure and keeps private keys out of git history and local filesystems.
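
As a rough sketch of that access pattern, the boto3 Lightsail client exposes get_instance_access_details, which returns a temporary private key and certificate along with an expiresAt timestamp. The helper names and file names below are hypothetical, and running fetch_access_details assumes boto3 is installed and AWS credentials are configured.

```python
import os


def fetch_access_details(instance_name, region_name=None):
    """Ask Lightsail for short-lived SSH credentials for one instance."""
    import boto3  # imported lazily so the helper below works without boto3

    client = boto3.client("lightsail", region_name=region_name)
    response = client.get_instance_access_details(
        instanceName=instance_name, protocol="ssh"
    )
    return response["accessDetails"]


def write_temp_credentials(details, directory):
    """Write the key and certificate with owner-only permissions; return an ssh argv."""
    key_path = os.path.join(directory, "lightsail_key")  # hypothetical file name
    cert_path = key_path + "-cert.pub"
    for path, content in (
        (key_path, details["privateKey"]),
        (cert_path, details["certKey"]),
    ):
        # Create the file as 0600 so ssh accepts the private key.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
    return [
        "ssh", "-i", key_path,
        "-o", f"CertificateFile={cert_path}",
        f"{details['username']}@{details['ipAddress']}",
    ]
```

Writing into a directory created with tempfile.TemporaryDirectory() and passing the returned list to subprocess.run keeps the key material on disk only for the duration of the session.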

Why separate the auth flow from the sync script? The port_sheet_sync.py script focuses narrowly on the sync logic itself. OAuth re-authentication is a separate concern handled by auth_ga.py (which supports multiple Google accounts). This separation allows us to re-authenticate without modifying the sync script and makes it clear when manual intervention is needed.

Why does the daemon use a 5-session daily limit? Claude API pricing scales with token usage. A hard session limit and a per-session turn limit prevent runaway costs if tasks are misconfigured or if the agent enters a loop. The system logs when limits are hit (exit code 1) so we can review whether the limits need adjustment for specific workloads.

What's Next

Once credentials are refreshed, we