Managing Multi-Site Deployments and Daemon Health: Building Resilience into the Jada Orchestrator
Over the past development session, we executed a comprehensive infrastructure health check and multi-site deployment pipeline across three distinct properties while debugging a critical OAuth token failure in our background sync daemon. This post details the technical decisions, architecture patterns, and operational insights from this work.
What Was Done
- Diagnosed and verified health status of the jada-agent.service orchestrator daemon running on AWS Lightsail instance 34.239.233.28
- Implemented cross-site content management for three domains: 86from.com, queenofsandiego.com, and sailjada.com
- Created SEO landing page for 86from.com with integrated booking widget
- Identified and documented persistent OAuth token failure in port_sheet_sync.py background service
- Deployed fixes to staging and production CloudFront distributions with cache invalidation
Technical Details: Daemon Health Assessment
The jada-agent.service runs as a systemd service on the Lightsail instance with a 60-second polling interval for task discovery. Using AWS Systems Manager Session Manager (when SSH key rotation wasn't available locally) combined with the Lightsail API's temporary credential generation, we collected comprehensive metrics without requiring stored private keys in the local development environment.
Service Health Indicators:
- jada-agent.service: Active and running for 3+ days continuously
- CPU utilization: 0.65% average with no spike patterns detected
- Memory footprint: 144MB of 914MB available (15.7% utilization)
- Disk usage: 6.2GB of 39GB (17% used) — adequate headroom for log rotation
- System load average: 0.00 — indicates idle periods between task batches
- Status checks: 0 failures in preceding 2-hour window
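A check like the one above can be scripted so it runs the same way every time. The following is a minimal sketch using only the Python standard library; the HealthSnapshot type, the function names, and the threshold values are illustrative assumptions, not part of the actual daemon.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    disk_used_pct: float  # percentage of the filesystem in use
    load_avg_1m: float    # 1-minute system load average


def take_snapshot(path: str = "/") -> HealthSnapshot:
    """Read disk usage and load average from the local machine (Unix only)."""
    usage = shutil.disk_usage(path)
    disk_used_pct = 100.0 * usage.used / usage.total
    load_avg_1m = os.getloadavg()[0]
    return HealthSnapshot(disk_used_pct, load_avg_1m)


def find_problems(snap: HealthSnapshot,
                  max_disk_pct: float = 80.0,   # assumed threshold
                  max_load: float = 2.0) -> list:  # assumed threshold
    """Return human-readable warnings; an empty list means healthy."""
    problems = []
    if snap.disk_used_pct > max_disk_pct:
        problems.append(
            f"disk usage {snap.disk_used_pct:.1f}% exceeds {max_disk_pct:.0f}%"
        )
    if snap.load_avg_1m > max_load:
        problems.append(
            f"load average {snap.load_avg_1m:.2f} exceeds {max_load:.2f}"
        )
    return problems
```

Calling find_problems(take_snapshot()) on the instance would return an empty list for the figures reported here, since 17% disk usage and a 0.00 load average both sit well under the assumed thresholds.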
Session activity logs showed 3 of 5 daily sessions consumed, with one session completing successfully and two hitting the 30-turn Claude API limit. This is not a service failure but rather expected behavior when agent tasks exceed the turn budget. Session 2 successfully completed meaningful work—processing e-signature page blockers and creating a needs-you task for manual follow-up.
Critical Infrastructure Issue: OAuth Token Expiration
The most significant finding was a broken Google OAuth token in the port_sheet_sync.py script. This background service, which syncs data to Google Sheets every 30 minutes, has been failing with:
[port-sheet] token error: HTTP Error 400: Bad Request
The most likely cause is an expired or revoked token: an HTTP 400 on a token request usually means the stored grant is no longer valid. Unlike the auth_ga.py implementation we developed for Google Analytics API access (which uses OAuth 2.0 with refresh token rotation), the port sheet sync was using a stale credential. Keeping a separate authentication context per service (rather than a centralized token store) prevents cascading failures, but it means each service must be re-authenticated individually when its token expires.
Why This Pattern: Isolating authentication per service reduces blast radius—a compromised or expired token in one service doesn't invalidate credentials across all integrations. However, it increases operational overhead. Future work should implement a credential lifecycle manager that monitors token expiration and alerts before failures occur.
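One possible starting point for such a credential lifecycle manager is sketched below. It assumes each service can report an expiry timestamp for its credential; the function names and the three-day warning window are hypothetical. Note that an expiry check alone would not catch a revoked refresh token, so a fuller monitor would also need to attempt a refresh and alert on failure.

```python
from datetime import datetime, timedelta, timezone


def classify_token(expiry: datetime, now: datetime,
                   warn_window: timedelta = timedelta(days=3)) -> str:
    """Return 'expired', 'expiring', or 'ok' for one credential."""
    if expiry <= now:
        return "expired"
    if expiry - now <= warn_window:
        return "expiring"
    return "ok"


def tokens_needing_attention(expiries: dict, now: datetime = None) -> dict:
    """Map service name -> status for every credential that is not 'ok'.

    Each service is evaluated independently, which mirrors the
    per-service isolation described above.
    """
    now = now or datetime.now(timezone.utc)
    report = {}
    for service, expiry in expiries.items():
        status = classify_token(expiry, now)
        if status != "ok":
            report[service] = status
    return report
```

Run on a schedule, the returned report could feed whatever alerting channel is already in place, so that an "expiring" entry prompts re-authentication before the sync starts failing.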
Multi-Site Deployment Pipeline
Content management across three properties required careful directory structure and deployment sequencing:
Directory Structure:
/Users/cb/Documents/repos/sites/
├── 86from.com/
│   ├── site/
│   │   ├── index.html
│   │   └── what-does-86d-mean/
│   └── [S3 sync target]
├── sailjada.com/
│   ├── index.html
│   └── [CloudFront distribution]
└── queenofsandiego.com/
    ├── BookingAutomation.gs
    └── [Google Apps Script backend]
The 86from.com property required particular attention. The directory was initially named 86dfrom, which we normalized to 86from (removing the extra 'd') to match DNS records and SEO intent. This required:
- Directory rename: 86dfrom.com → 86from.com
- Content migration of static assets and HTML files
- Google Analytics property reconfiguration to track correct domain
- CloudFront invalidation to purge cached content under old directory name
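After a rename like this, leftover references to the old name are easy to miss. The sketch below scans a site directory for them; the function name and the list of file suffixes are assumptions for illustration, not the tooling actually used.

```python
from pathlib import Path


def find_stale_references(root, old_name="86dfrom",
                          suffixes=(".html", ".js", ".css", ".json")):
    """Return (file path, line number) pairs where old_name still appears."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types we do not expect to hold links.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if old_name in line:
                hits.append((str(path), lineno))
    return hits
```

An empty result before deploying gives some confidence that no page still links to the old directory name.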
JavaScript Template Syntax Issue in Booking Widget
During deployment to 86from.com, we discovered an issue in the booking widget embedded within index.html. The widget uses double-brace syntax ({{ variable }}) for template interpolation, which can conflict with server-side template engines that use the same delimiters.
The Problem: Double-brace template syntax appears in two contexts:
- Within the booking widget JavaScript block (intentional, for client-side rendering)
- Potentially in surrounding HTML (conflict risk with server-side templating)
The Solution: We extracted and validated the booking widget JavaScript block separately, confirming double-braces only appear within the widget's scope. This allowed safe deployment without ambiguity. The fix involved:
- Identifying the exact line numbers of the <script> tags containing the widget
- Syntax-checking the extracted JavaScript block with a parser
- Confirming no template conflicts outside the widget section
- Adding a version tag with model ID in the widget's HTML comment for tracking
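The conflict check in the third step can be approximated with a short script. This is a rough sketch, assuming a regular expression is good enough to locate script blocks in this page; the function name is hypothetical, and the sketch does not cover the JavaScript syntax check, which needs a real JavaScript parser.

```python
import re

# Matches a whole <script>...</script> block, including its attributes.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL
)


def braces_outside_scripts(html: str) -> list:
    """Return 1-based line numbers where '{{' appears outside script blocks."""

    def blank(match):
        # Replace the block with the same number of newlines so that
        # line numbers in the remaining text stay aligned with the file.
        return "\n" * match.group(0).count("\n")

    stripped = SCRIPT_BLOCK.sub(blank, html)
    return [
        lineno
        for lineno, line in enumerate(stripped.split("\n"), start=1)
        if "{{" in line
    ]
```

An empty list means every double-brace expression sits inside a script block, which is the condition the deployment relied on.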
Deployment Strategy: Staging Before Production
Rather than deploying directly to production CloudFront distributions, we used a staging bucket pattern:
- Deploy updated HTML to staging S3 bucket
- Invalidate staging CloudFront cache with /index.html and /* patterns
- Verify changes in staging environment
- Promote to the production CloudFront distribution (which has its own distribution ID)
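This approach catches problems before they reach production, and the invalidation step keeps edge caches from serving stale content until the TTL expires. CloudFront charges $0.005 per invalidation path beyond the first 1,000 paths each month, and a wildcard path such as /* counts as a single path, so we submit one invalidation per deploy that covers all changed files rather than invalidating file by file.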
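A single batched invalidation might be assembled as in the sketch below. The helper names and the default caller reference format are assumptions; the create_invalidation call is the standard boto3 CloudFront client method, and it needs AWS credentials and a real distribution ID to run.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for one deploy."""
    unique = sorted(set(paths))  # drop duplicate paths
    if "/*" in unique:
        # The wildcard already covers every object in the distribution.
        unique = ["/*"]
    return {
        "Paths": {"Quantity": len(unique), "Items": unique},
        # CloudFront requires a unique reference for each request.
        "CallerReference": caller_reference or f"deploy-{int(time.time())}",
    }


def invalidate(distribution_id, paths):
    """Submit one invalidation request covering all given paths."""
    import boto3  # imported lazily so the builder works without AWS libraries

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

The same invalidate function can be pointed at the staging distribution first and then at production, with only the distribution ID changing between the two calls.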
Infrastructure: AWS Lightsail, S3, and CloudFront
The daemon runs on AWS Lightsail (not EC2) due to lower management overhead for a dedicated, single-purpose service. Lightsail provides:
- Pre-configured networking and firewall rules
- Built-in metrics via CloudWatch (CPU, network, status checks)
- Temporary SSH credential generation via API (no key rotation needed in local dev)
- Fixed IP address (34.239.233.28) registered in Route53 for reliability
Static sites deploy to S3 with CloudFront distributions fronting them. This separation ensures: