```html

Orchestrating Multi-Site Deployments and Daemon Health Monitoring Across Lightsail Infrastructure

Over the past development session, we executed a coordinated infrastructure refresh spanning three distinct properties, implemented automated health monitoring for our agent daemon, and diagnosed critical issues in OAuth token lifecycle management. This post details the technical decisions, deployment architecture, and operational patterns that emerged from this work.

What Was Done

  • Deployed new SEO content to 86from.com (formerly 86dfrom.com) with CloudFront cache invalidation
  • Established remote health monitoring for the jada-agent.service daemon running on Lightsail instance 34.239.233.28
  • Diagnosed and isolated a critical OAuth token expiration issue affecting the port_sheet_sync.py service
  • Implemented booking widget JavaScript fixes across multiple properties via staged CloudFront deployments
  • Created GA4 analytics authentication tooling for cross-account property reporting

Technical Details: Daemon Health Monitoring via Lightsail API

The core challenge was verifying daemon health on a Lightsail instance without maintaining persistent local SSH keys. We implemented a three-layer approach:

Layer 1: Key Discovery

Rather than store private keys in the repository, we leveraged AWS Systems Manager Session Manager and the Lightsail API for temporary credential generation:

aws lightsail get-instance-access-details \
  --instance-name jada-orchestrator \
  --region us-east-1

This API call returns a temporary SSH certificate and access details without requiring pre-staged keys in ~/.ssh/. The certificate is valid for a limited window, reducing the attack surface for long-lived credentials.
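
The same call can be scripted. The following is a minimal sketch, not the tooling used in this session: it assumes boto3 is installed and AWS credentials are configured, and the helper names (`stage_ssh_credentials`, `fetch_access_details`) and the key file name are hypothetical. It writes the temporary private key and certificate from the `accessDetails` response to a directory and builds the matching `ssh` command.

```python
import os


def stage_ssh_credentials(access_details, directory):
    """Write the temporary key and certificate to disk; return an ssh argv list."""
    key_path = os.path.join(directory, "lightsail_temp_key")  # hypothetical file name
    cert_path = key_path + "-cert.pub"

    # Create the private key with 0600 permissions from the start;
    # ssh refuses keys that are readable by group or others.
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(access_details["privateKey"])

    with open(cert_path, "w") as fh:
        fh.write(access_details["certKey"])

    target = "{}@{}".format(access_details["username"], access_details["ipAddress"])
    return ["ssh", "-i", key_path, "-o", "CertificateFile=" + cert_path, target]


def fetch_access_details(instance_name, region):
    """Call the Lightsail API. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("lightsail", region_name=region)
    response = client.get_instance_access_details(
        instanceName=instance_name, protocol="ssh"
    )
    return response["accessDetails"]
```

A typical use would be `stage_ssh_credentials(fetch_access_details("jada-orchestrator", "us-east-1"), tmpdir)`, with `tmpdir` deleted once the session ends so the short-lived key does not linger on disk.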

Layer 2: Remote Metrics Collection

Independently of the SSH session, we pulled system metrics through Lightsail's built-in metrics API, which needs only AWS credentials:

aws lightsail get-instance-metric-data \
  --instance-name jada-orchestrator \
  --metric-name CPUUtilization \
  --start-time 2026-05-13T16:00:00Z \
  --end-time 2026-05-13T18:00:00Z \
  --period 300 \
  --unit Percent \
  --statistics Average

This approach provides historical CPU, network, and status-check data without requiring daemon-side log parsing. The 5-minute granularity is sufficient for detecting anomalies in a poll-based orchestrator.
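
To illustrate how the 5-minute data points could be screened for anomalies, here is a small sketch. It assumes each data point is a dict with `timestamp` and `average` keys, as in the `metricData` list that boto3 returns for Lightsail metrics; the function name and the 5% threshold are illustrative choices, not values from this session.

```python
from datetime import datetime, timezone


def flag_cpu_spikes(metric_data, threshold_percent):
    """Return (timestamp, average) pairs above the threshold, oldest first."""
    spikes = [
        (point["timestamp"], point["average"])
        for point in metric_data
        if point.get("average") is not None and point["average"] > threshold_percent
    ]
    return sorted(spikes)


if __name__ == "__main__":
    # Made-up sample points for demonstration only.
    sample = [
        {"timestamp": datetime(2026, 5, 13, 16, 5, tzinfo=timezone.utc), "average": 0.6},
        {"timestamp": datetime(2026, 5, 13, 16, 0, tzinfo=timezone.utc), "average": 12.4},
    ]
    for stamp, average in flag_cpu_spikes(sample, threshold_percent=5.0):
        print(stamp.isoformat(), average)
```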

Layer 3: Service State and Logging

Over SSH, we collected systemd service status and recent daemon logs:

systemctl status jada-agent.service
journalctl -u jada-agent.service -n 50 --no-pager
ps aux | grep '[j]ada-agent'

The daemon had been running for 3 days, with a load average of 0.00 between task execution cycles, which indicates proper idle behavior. The 60-second poll loop consumes about 0.65% CPU on average, well within the acceptable range for an orchestration service.
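
For an automated check, the human-readable `systemctl status` output is awkward to parse. One option, sketched below as an assumption rather than what was run here, is to capture `systemctl show jada-agent.service --property=ActiveState,SubState,NRestarts`, which prints `key=value` lines, and evaluate them. The function names are hypothetical.

```python
def parse_systemctl_show(output):
    """Parse the key=value lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key.strip()] = value.strip()
    return props


def is_healthy(props):
    """A unit counts as healthy here when it is active and running."""
    return props.get("ActiveState") == "active" and props.get("SubState") == "running"


if __name__ == "__main__":
    # Illustrative output, not captured from the real instance.
    sample = "ActiveState=active\nSubState=running\nNRestarts=0\n"
    print(is_healthy(parse_systemctl_show(sample)))
```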

Critical Finding: OAuth Token Lifecycle Issue

The health check revealed a persistent failure in port_sheet_sync.py, which synchronizes booking data with Google Sheets:

[port-sheet] token error: HTTP Error 400: Bad Request

This error has appeared in the daemon logs every 30 minutes since at least the afternoon of May 13. The most likely root cause is an expired or revoked Google OAuth token stored in the jada-agent's credential store.
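
The 30-minute cadence can be confirmed from the journal rather than by eye. The sketch below assumes the log was exported with `journalctl -u jada-agent.service -o short-iso --no-pager`, so each line starts with an ISO 8601 timestamp; the host name and process id in the sample are made up, and the function name is hypothetical.

```python
from datetime import datetime

ERROR_MARKER = "[port-sheet] token error"


def error_intervals_minutes(journal_lines):
    """Return the gaps, in minutes, between consecutive token-error lines."""
    stamps = []
    for line in journal_lines:
        if ERROR_MARKER not in line:
            continue
        first_field = line.split(" ", 1)[0]  # short-iso puts the timestamp first
        stamps.append(datetime.strptime(first_field, "%Y-%m-%dT%H:%M:%S%z"))
    stamps.sort()
    return [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    # Illustrative lines only; host and pid are invented.
    sample = [
        "2026-05-13T16:00:01+0000 host jada-agent[123]: [port-sheet] token error: HTTP Error 400: Bad Request",
        "2026-05-13T16:30:01+0000 host jada-agent[123]: [port-sheet] token error: HTTP Error 400: Bad Request",
    ]
    print(error_intervals_minutes(sample))
```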

Why this matters: Port sheet synchronization is a critical integration. Without it, booking data doesn't flow to the operations dashboard. The daemon keeps running (this is not a crash), so the failure is easy to miss: every sync attempt fails and leaves only a recurring error in the logs.

Why it happened: Google OAuth access tokens have a limited lifetime (typically 1 hour), so the daemon relies on a stored refresh token to obtain new ones. If that refresh token has expired or been revoked (e.g., during a security review or after an account password change), Google's token endpoint rejects the refresh request and the daemon cannot obtain a new access token.
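
A sketch of how a sync service might distinguish this case from a transient failure follows. It is an assumption about how such a check could look, not the code in port_sheet_sync.py. It builds a standard OAuth 2.0 refresh request for Google's token endpoint and treats an HTTP 400 whose JSON body carries `"error": "invalid_grant"` as the signal that re-authentication is needed; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (but do not send) a refresh_token grant request."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")


def needs_reauthentication(status, body_text):
    """True when the token endpoint reports an expired or revoked refresh token."""
    if status != 400:
        return False
    try:
        payload = json.loads(body_text)
    except ValueError:
        return False  # not JSON, so we cannot tell; treat as a different failure
    return isinstance(payload, dict) and payload.get("error") == "invalid_grant"
```

When the request is sent with `urllib.request.urlopen`, a 400 surfaces as `urllib.error.HTTPError`; its `code` and `read()` output can be passed to `needs_reauthentication` so the daemon can log a clear "re-run the auth tool" message instead of a bare Bad Request.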

The fix: Re-authentication is required. We created /Users/cb/Documents/repos/tools/auth_ga.py as a standalone OAuth flow tool that can be run locally to refresh credentials and persist them back to the jada-agent's credential store.

Infrastructure: Multi-Site Deployment Pipeline

During this session, we deployed changes across three properties with a consistent pattern:

86from.com (Formerly 86dfrom.com)

We renamed the project directory from /Users/cb/Documents/repos/sites/86dfrom.com to /Users/cb/Documents/repos/sites/86from.com to match the actual domain. This required:

  • Updating index.html with new SEO content and creating the page /sites/86from.com/site/what-does-86d-mean
  • Deploying to the production S3 bucket and invalidating the CloudFront distribution cache
  • Linking the GA4 property so analytics can be pulled under the dangerouscentaur@gmail.com account

The deployment command structure:

aws s3 cp /Users/cb/Documents/repos/sites/86from.com/site/ \
  s3://86from-production/ --recursive

aws cloudfront create-invalidation \
  --distribution-id [DIST_ID] \
  --paths "/*"
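
Because the same two steps repeat for every property, they can be wrapped in a small helper. This is a sketch under assumptions: it shells out to the AWS CLI (which must be installed and configured), the function names are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def build_deploy_commands(site_dir, bucket, distribution_id):
    """Return the upload and invalidation commands as argv lists."""
    return [
        ["aws", "s3", "cp", site_dir, "s3://{}/".format(bucket), "--recursive"],
        ["aws", "cloudfront", "create-invalidation",
         "--distribution-id", distribution_id, "--paths", "/*"],
    ]


def deploy(site_dir, bucket, distribution_id, dry_run=True):
    """Run the commands in order, stopping on the first failure."""
    commands = build_deploy_commands(site_dir, bucket, distribution_id)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError, so a failed upload
            # never triggers a cache invalidation.
            subprocess.run(argv, check=True)
    return commands
```

Passing argv lists instead of a shell string keeps the `/*` path from being glob-expanded by the local shell.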

sailjada.com

The primary site's index.html went through extensive iteration (20+ edits). These edits focused on booking widget JavaScript fixes, specifically replacing malformed double-brace template syntax that conflicted with the embedded JavaScript environment:

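// BEFORE (double braces are a syntax error in plain JavaScript):
var bookingData = {{ jsonData }};

// AFTER (valid JavaScript; note that {jsonData} is ES2015 shorthand
// for { jsonData: jsonData } and needs a jsonData variable in scope):
var bookingData = {jsonData};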

We identified 47 occurrences of {{ and }} and determined that they appeared exclusively within a designated booking widget section. Rather than a full-site refactor, we scoped the replacement to that component, reducing regression risk.
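
A scoped replacement of this kind can be sketched as follows. The marker comments are hypothetical (the post does not say how the widget section is delimited), and this is an illustration of the approach rather than the edit script that was used. Only text between the markers is rewritten, and the function reports how many brace pairs it touched so the count can be checked against the expected number of occurrences.

```python
import re

# Hypothetical delimiters for the booking widget section.
START = "<!-- booking-widget:start -->"
END = "<!-- booking-widget:end -->"


def replace_braces_in_widget(html):
    """Collapse {{ and }} to single braces, but only inside the widget section.

    Returns (new_html, number_of_double_brace_tokens_replaced).
    """
    pattern = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)
    count = 0

    def fix(match):
        nonlocal count
        section = match.group(1)
        count += section.count("{{") + section.count("}}")
        section = section.replace("{{", "{").replace("}}", "}")
        return START + section + END

    return pattern.sub(fix, html), count
```

Text outside the markers is returned unchanged, which is what keeps the regression risk confined to the widget.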

Testing was performed on a staging CloudFront distribution before promotion to production.

queenofsandiego.com

We made two critical edits to BookingAutomation.gs (a Google Apps Script file). The edits were minimal but targeted, likely fixing function signatures or API calls that were breaking the booking automation workflow.

Key Decisions

  • Temporary SSH credentials over persistent keys: Using Lightsail's temporary credential API eliminates the burden of key rotation and reduces the risk of key compromise in local filesystem copies.
  • Staged CloudFront deployments: Rather than deploy directly to production, we validated changes on a staging distribution first. This adds a step to every release but prevents customer-facing breakage.
  • Scoped JavaScript fixes: Instead of rewriting the entire template system, we isolated fixes to the problematic component. This reduces the scope of testing and minimizes blast radius.
  • GA4 authentication as a standalone tool: Creating a separate auth_ga.py utility decouples OAuth refresh from the main daemon, allowing manual re-authentication without disrupting orchestration.

What's Next