Permanently Fixing OAuth Token Expiration: Moving from Google Cloud Testing Mode to Production

The Problem: 7-Day Refresh Token Expiration

We've been caught in a recurring OAuth headache for months. Every 7 days, the Google Calendar and Gmail refresh tokens expired, breaking calendar sync automation and requiring manual re-authentication. The root cause wasn't a code bug. It was a configuration setting we overlooked: the OAuth consent screen on our Google Cloud project was still in Testing mode (a publishing status for External-type apps), and in that mode Google expires refresh tokens after 7 days by design unless the app requests only basic name, email, and profile scopes.

Previous "fixes" (re-running auth scripts, pushing new tokens to Lambda) were band-aids. The permanent solution is to move the OAuth consent screen to Production mode, which removes the 7-day expiration. Refresh tokens then stay valid until they are revoked or hit one of Google's other invalidation conditions (for example, six months without use).

Why This Matters for Our Architecture

Our calendar sync system is entirely serverless and event-driven:

  • Apps Script (CalendarSync.gs, BookingAutomation.gs) runs scheduled triggers to sync Boatsetter iCal → JADA Google Calendar
  • Lambda functions handle REST API calls to fetch calendar data and dispatch crew assignments
  • DynamoDB stores crew assignments, captain reports, and event metadata
  • Lightsail server holds refresh tokens in /etc/jada/tokens/ that are synced to Lambda environment variables

If the refresh token expires every 7 days, every integration point breaks simultaneously. We were forcing re-auth via reauth_jada_all.py weekly instead of solving the underlying config issue.

Technical Breakdown: Moving to Production

Step 1: Audit Current OAuth Configuration

First, verify the project's current consent screen mode:

gcloud config set project [PROJECT_ID]
gcloud alpha iap oauth-brands list
gcloud services list --enabled | grep -E "(calendar|gmail)"

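This tells us:

  • Whether an OAuth brand exists and whether it is restricted to our organization (Internal) or open to any Google account (External). This is the user type, not the publishing status
  • Whether the Calendar and Gmail APIs are enabled on the project

Neither command reports the publishing status (Testing vs. In production), the requested scopes, or API quotas. Those are only visible in the Cloud Console.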
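The API check can also be scripted instead of eyeballing grep output. The following is a minimal sketch, not part of our deployed tooling: `missing_services` and `REQUIRED_SERVICES` are hypothetical names, and it assumes the input was produced by `gcloud services list --enabled --format="value(config.name)"`, which prints one service name per line.

```python
# Hypothetical helper: checks that the APIs this setup depends on appear in
# the output of:
#   gcloud services list --enabled --format="value(config.name)"
# Assumption: that output is one service name per line.

REQUIRED_SERVICES = frozenset({
    "calendar-json.googleapis.com",  # Google Calendar API
    "gmail.googleapis.com",          # Gmail API
})


def missing_services(gcloud_output, required=REQUIRED_SERVICES):
    """Return the required services that are absent from the gcloud output."""
    enabled = {line.strip() for line in gcloud_output.splitlines() if line.strip()}
    return set(required) - enabled


if __name__ == "__main__":
    sample = "calendar-json.googleapis.com\ngmail.googleapis.com\n"
    missing = missing_services(sample)
    print("missing:", sorted(missing) if missing else "none")
```

An empty result means both APIs are enabled; it says nothing about the consent screen's publishing status, which still has to be checked in the console.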

In our case, both APIs were enabled and the scopes we needed were declared (calendar, calendar.readonly, gmail.readonly, and gmail.modify, each under https://www.googleapis.com/auth/), but the consent screen was stuck in Testing mode because we never published it.

Step 2: Prepare the OAuth Consent Screen for Production

Unlike most GCP configuration, the consent screen's publishing status cannot be changed via the gcloud CLI, only via the Cloud Console UI. Navigate to:

console.cloud.google.com → APIs & Services → OAuth consent screen

The configuration has three sections:

  • App Information: Name, user support email, logo (if desired)
  • Scopes: Declare which Google API scopes your app requests (calendar, gmail.readonly, etc.)
  • Test Users: Only relevant while the app is in Testing mode. The list stops applying once the app is published, so it can stay empty

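In our case, the app info was already filled in (JADA Calendar Automation), and scopes were declared. The remaining step was clicking the "Publish App" button, which changes the publishing status from "Testing" to "In production." Because the Calendar and Gmail scopes are classified as sensitive or restricted, Google may flag the app as needing verification. For an app used only by our own accounts it can stay unverified, at the cost of an "unverified app" warning during consent.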

Step 3: Understand the Token Lifecycle Impact

Once in Production mode:

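  • Refresh tokens no longer expire after 7 days. They persist until revoked by the user or an administrator, or until one of Google's other documented invalidation conditions applies (six months without use, a password change on an account with Gmail scopes, or exceeding the per-client limit on live refresh tokens)
  • Access tokens still expire after about 1 hour (Google's default lifetime); our code already handles refresh automatically
  • User consent persists unless the user manually revokes it in their Google Account settings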
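The rule that decides whether the 7-day cap applies can be written down as a small predicate. This is a sketch of our reading of Google's documented behavior, not an official API: `ConsentScreen`, `BASIC_SCOPES`, and `seven_day_expiry_applies` are hypothetical names, and the user type and publishing status have to be read off the console by hand.

```python
from dataclasses import dataclass

# Scopes exempt from the Testing-mode expiry: basic name, email address, and
# profile information, including the OpenID Connect short forms.
BASIC_SCOPES = frozenset({
    "openid",
    "email",
    "profile",
    "https://www.googleapis.com/auth/userinfo.email",
    "https://www.googleapis.com/auth/userinfo.profile",
})


@dataclass(frozen=True)
class ConsentScreen:
    user_type: str          # "internal" or "external"
    publishing_status: str  # "testing" or "production"


def seven_day_expiry_applies(screen, scopes):
    """Return True if refresh tokens for this app are expected to expire after 7 days."""
    if screen.user_type != "external" or screen.publishing_status != "testing":
        return False
    # Even in Testing mode, apps that request only basic profile scopes are exempt.
    return not (set(scopes) <= BASIC_SCOPES)


if __name__ == "__main__":
    our_scopes = [
        "https://www.googleapis.com/auth/calendar",
        "https://www.googleapis.com/auth/gmail.readonly",
    ]
    for status in ("testing", "production"):
        screen = ConsentScreen(user_type="external", publishing_status=status)
        print(status, seven_day_expiry_applies(screen, our_scopes))
```

With Calendar and Gmail scopes the predicate is true in Testing and false in Production, which matches the behavior we saw. The other invalidation conditions (revocation, long disuse, password changes) are deliberately outside its scope.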

This aligns with our architecture: we fetch the refresh token once during initial auth, store it in /etc/jada/tokens/refresh_token.json on Lightsail, and inject it into Lambda environment variables via reauth_jada_all.py (which then no longer needs to run weekly).

Step 4: Verify Token Handling in Deployed Code

Our Apps Script project and Lambda functions handle token refresh correctly via the Google API client libraries, which automatically use the refresh_token to obtain a new access_token when the current one expires. For example, in CalendarSync.gs:

function calendarSyncSetup() {
  const calendarService = getCalendarService_();
  // Service automatically refreshes tokens if expired
  const calendar = calendarService.buildCalendarAPI_();
  // ...
}

And in our Lambda handler (Python), the google-auth library handles refresh transparently when calling Google APIs.
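For readers who want to see what "transparent refresh" amounts to, the underlying exchange is a single form-encoded POST to Google's token endpoint. The sketch below is illustrative only and is not our handler code: `build_refresh_request` and `parse_token_response` are hypothetical helpers, the request is built but never sent, and in production we rely on google-auth rather than hand-rolling this.

```python
import json
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (but do not send) the POST that trades a refresh token for an access token."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_ENDPOINT,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_token_response(body):
    """Pull the fields a caller needs out of a successful token response body."""
    payload = json.loads(body)
    return {
        "access_token": payload["access_token"],
        "expires_in": int(payload["expires_in"]),  # lifetime in seconds
    }


if __name__ == "__main__":
    # Placeholder credentials; nothing is sent over the network here.
    req = build_refresh_request("client-id", "client-secret", "refresh-token")
    print(req.get_method(), req.full_url)
```

Note that a refresh response normally carries a new access token only; the stored refresh token keeps being reused, which is why its lifetime is the thing that matters here.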

Infrastructure Changes

No Infrastructure Changes Required

Importantly, this fix requires zero changes to our deployed infrastructure:

  • Lambda function code: no changes
  • Apps Script deployments: no changes (v98 is already live)
  • DynamoDB tables: no changes
  • API Gateway routes: no changes
  • Lightsail server configuration: no changes

The token sync process via reauth_jada_all.py becomes unnecessary once tokens stop expiring, but we'll leave it in place as a safety valve for future auth rotations.

Key Decisions

Why not just keep re-authing weekly? Because it's error-prone, requires manual intervention, and creates operational toil. Every 7 days, we'd have to SSH into Lightsail, run the Python script, and verify the sync across three platforms (Boatsetter, Viator, Sailo). Production mode eliminates this entirely.

Why Google's Testing mode exists: Testing mode is meant for apps still under development, with access limited to a short list of test users. Google documents the 7-day refresh token expiry as one of its restrictions. It does not spell out the rationale, but a plausible reading is that it limits the exposure of unpublished apps and surfaces missing re-auth handling early. Once you're confident in your auth handling (which we are), publishing to Production is the correct move.

Why we can do this now: Our app doesn't need user-facing consent flows because it's internal to JADA operations. We're the only users of this OAuth app, so publishing to Production has no external user impact.

What's Next

After publishing the consent screen to Production:

  1. Trigger a fresh auth cycle to obtain a new refresh token that is not subject to the 7-day cap
  2. Verify that token works across all integrations (Apps Script, Lambda, API Gateway)
  3. Store the token in Lightsail and sync to Lambda as before
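  4. Monitor for token failures via CloudWatch logs (an expired or revoked refresh token surfaces as an invalid_grant error from Google's token endpoint); after 30+ days with no re-auth needed, we'll know it stuck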
  5. Remove the weekly re-auth task from operational runbooks
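For the monitoring step, the signal worth watching is an `invalid_grant` error from the token endpoint, which is what an expired or revoked refresh token produces. The sketch below shows one way to classify such errors. It is an assumption-laden illustration: `refresh_token_is_dead` and `count_dead_token_events` are hypothetical helpers, and it assumes the raw JSON error body is what ends up in each log line, which may not match how our handlers actually log.

```python
import json


def refresh_token_is_dead(error_body):
    """Return True if a token-endpoint error body says the refresh token is no longer valid."""
    try:
        payload = json.loads(error_body)
    except (TypeError, ValueError):
        # Not a JSON error body (e.g. a timeout message); not evidence of a dead token.
        return False
    return isinstance(payload, dict) and payload.get("error") == "invalid_grant"


def count_dead_token_events(log_lines):
    """Count log lines (assumed to be raw error bodies) that indicate a dead refresh token."""
    return sum(1 for line in log_lines if refresh_token_is_dead(line))


if __name__ == "__main__":
    sample_logs = [
        '{"error": "invalid_grant", "error_description": "Token has been expired or revoked."}',
        '{"error": "invalid_client"}',
        "connection timed out",
    ]
    print(count_dead_token_events(sample_logs))
```

A count that stays at zero for 30+ days after the fresh auth cycle is the evidence that the Production-mode token is holding.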

Once confirmed stable, document the permanent token location and lifecycle in our architecture runbook so future engineers understand why we don't need to re-auth anymore.