Comprehensive Infrastructure Snapshot Strategy: Protecting JADA's Multi-Site Distributed Architecture
After discovering critical data loss from an unintended reversion of production changes, we implemented a comprehensive snapshot and backup strategy across the JADA ecosystem. This post details the technical architecture, resource inventory, and snapshot methodology we deployed to prevent future incidents and establish a reliable disaster recovery baseline.
The Problem: Understanding the Scope
The JADA infrastructure spans three primary domains (queenofsandiego.com, sailjada.com, salejada.com) distributed across AWS, Google Cloud (Apps Script), and local development environments. When critical event page modifications were unexpectedly reverted, we lacked a comprehensive snapshot of all system states. This forced us to rebuild changes at significant cost and revealed a dangerous gap in our backup strategy.
The core issue: infrastructure snapshots must be granular, versioned, and automated — capturing not just data, but code, configuration, and metadata across heterogeneous platforms.
Infrastructure Inventory: What We're Protecting
Before snapshotting, we conducted a full resource audit across AWS and Google Apps Script:
- S3 Buckets: 46 total (including production, staging, backup, and specialized buckets like media assets, archives, and configuration stores)
- CloudFront Distributions: 66 active distributions (origin configs, cache behaviors, SSL/TLS certificates, geo-restrictions)
- Lambda Functions: 21 functions (request handlers, event processors, data validators, scheduled tasks)
- Route53 Hosted Zones: 16 zones (DNS records, health checks, routing policies)
- DynamoDB Tables: 14 tables (session data, event metadata, user preferences, transactional logs)
- Google Apps Script Projects: 4 major GAS codebases (main JADA, Rady Shell replacement, Rady Shell legacy, EYD integration)
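The AWS counts above can be re-collected at the start of each snapshot run, so every snapshot records what it expected to capture. What follows is a minimal sketch, assuming AWS CLI v2 with credentials for the account; `count_resource` and `inventory_counts` are hypothetical helper names. JSON output is chosen so that `length()` is evaluated once over the merged, auto-paginated list. With text output, the CLI can apply `--query` page by page. The GAS projects are not covered because they live outside AWS.

```shell
#!/usr/bin/env bash
# Sketch: re-count the AWS inventory before a snapshot run.
# Assumes AWS CLI v2; helper names are hypothetical.
# Lambda and DynamoDB counts are per region (the CLI's configured region).
set -eu

# Usage: count_resource LABEL AWS_ARGS...
# Assigning to a variable first lets `set -e` catch a failing aws call.
count_resource() {
  local label="$1" n
  shift
  n="$(aws "$@" --output json)"
  echo "${label}=${n}"
}

# Print one key=value line per resource type.
inventory_counts() {
  count_resource s3_buckets s3api list-buckets --query 'length(Buckets)'
  count_resource cloudfront_distributions cloudfront list-distributions --query 'length(DistributionList.Items)'
  count_resource lambda_functions lambda list-functions --query 'length(Functions)'
  count_resource route53_hosted_zones route53 list-hosted-zones --query 'length(HostedZones)'
  count_resource dynamodb_tables dynamodb list-tables --query 'length(TableNames)'
}
```

A run might look like `inventory_counts > /snapshot/v1.0/inventory.txt`. Diffing that file against the previous snapshot's copy shows resources that were added or removed between runs.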
Snapshot Architecture: The v1.0 Strategy
We implemented a multi-agent parallel snapshot system with five concurrent background processes, each responsible for a distinct infrastructure domain.
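The concurrent-agent design can be sketched in plain shell. This assumes each agent is wrapped in a shell function or command that exits non-zero on failure; `run_agents` and the agent names in the usage line are hypothetical. Each agent writes to its own log file. The script waits on every PID separately, so a failure is reported against the agent that caused it rather than as one combined status.

```shell
#!/usr/bin/env bash
# Sketch: start snapshot agents in the background and report failures per agent.
# run_agents and the agent names are hypothetical.
set -eu

# Usage: run_agents LOG_DIR AGENT...
run_agents() {
  local log_dir="$1" started="" name pid entry failed=0
  shift
  mkdir -p "$log_dir"
  # Launch every agent in the background with its own log file,
  # remembering "name:pid" pairs for the wait loop.
  for name in "$@"; do
    "$name" > "${log_dir}/${name}.log" 2>&1 &
    started="${started} ${name}:$!"
  done
  # Wait on each PID individually to collect its exit status.
  for entry in $started; do
    name="${entry%%:*}"
    pid="${entry##*:}"
    if wait "$pid"; then
      echo "ok: ${name}"
    else
      echo "FAILED: ${name} (see ${log_dir}/${name}.log)" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every agent succeeded.
  [ "$failed" -eq 0 ]
}
```

For example, with hypothetical agent functions: `run_agents /snapshot/v1.0/logs sync_s3 export_lambda export_aws_config snapshot_local snapshot_lightsail`. Waiting on each PID, rather than calling a bare `wait`, keeps one failed agent from being masked by the others.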
Agent 1: S3 Bucket Synchronization
Synced all 46 S3 buckets to local snapshot storage using the AWS CLI's s3 sync command, which copies recursively by default. This captures:
- Production website content (HTML, CSS, JavaScript, media assets)
- Staging environment mirrors
- CloudFront origin configurations
- Lambda deployment packages and layers
- DynamoDB export archives
- SSL certificate archives
Command pattern (credentials omitted; aws s3 sync is recursive by default and does not take a --recursive flag):
aws s3 sync s3://bucket-name /snapshot/v1.0/s3/bucket-name \
--exclude ".git/*" \
--region us-west-2
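The per-bucket command can be applied to all 46 buckets with a loop. The sketch below assumes AWS CLI v2; `bucket_region` and `snapshot_all_buckets` are hypothetical names. Instead of hard-coding us-west-2, it looks up each bucket's region. get-bucket-location returns null for us-east-1 buckets, and text output prints that null as "None".

```shell
#!/usr/bin/env bash
# Sketch: sync every bucket in the account into the snapshot tree.
# Assumes AWS CLI v2; helper names are hypothetical.
set -eu

# Print a bucket's region. A null LocationConstraint (text output "None")
# means us-east-1. Very old eu-west-1 buckets may report the legacy value
# "EU", which this sketch does not map.
bucket_region() {
  local region
  region="$(aws s3api get-bucket-location --bucket "$1" --query 'LocationConstraint' --output text)"
  if [ "$region" = "None" ]; then
    region="us-east-1"
  fi
  echo "$region"
}

# Usage: snapshot_all_buckets SNAPSHOT_ROOT
snapshot_all_buckets() {
  local root="$1" bucket region
  for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    region="$(bucket_region "$bucket")"
    aws s3 sync "s3://${bucket}" "${root}/s3/${bucket}" \
      --exclude ".git/*" \
      --region "$region" \
      --only-show-errors
  done
}
```

For example, `snapshot_all_buckets /snapshot/v1.0` reproduces the /snapshot/v1.0/s3/bucket-name layout for every bucket.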
Final snapshot size: 68MB across all production buckets. Staging buckets were verified to match production file counts before archival.
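The staging-versus-production file-count check can be run against the synced snapshot directories. This is a minimal sketch that works on local directories only. The helper names are hypothetical, and so are the paths in the usage line.

```shell
#!/usr/bin/env bash
# Sketch: compare file counts between production and staging snapshot directories.
# Helper names are hypothetical.
set -eu

# Count regular files under a directory. wc -l pads its output on some
# platforms (e.g. macOS), so spaces are stripped.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Usage: check_parity PROD_DIR STAGING_DIR  (returns 1 on mismatch)
check_parity() {
  local prod="$1" staging="$2" p s
  p="$(count_files "$prod")"
  s="$(count_files "$staging")"
  if [ "$p" -eq "$s" ]; then
    echo "parity ok: ${p} files"
  else
    echo "parity MISMATCH: production=${p} staging=${s}" >&2
    return 1
  fi
}
```

A call might look like `check_parity /snapshot/v1.0/s3/qos-prod /snapshot/v1.0/s3/qos-staging`, where the bucket names are placeholders. Equal counts do not prove equal content. Running diff -rq on the two directories is a stricter follow-up.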
Agent 2: Lambda Function Export
Extracted code, environment variables, and configuration for all 21 Lambda functions:
- Function code (ZIP archives)
- Environment variable manifests (sanitized of secrets)
- Execution role ARN and permissions policies
- VPC configuration and security group associations
- Memory allocation and timeout settings
- Layer dependencies and versions
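Example for a single function (the package URL comes from the Code.Location field of get-function; the Configuration output includes environment variable values in plain text, so it still has to be sanitized before archival):
aws lambda get-function \
--function-name jada-event-processor \
--query 'Configuration' \
--output json > /snapshot/v1.0/lambda/jada-event-processor-config.json
aws lambda get-function \
--function-name jada-event-processor \
--query 'Code.Location' \
--output text | xargs curl -sS -o /snapshot/v1.0/lambda/jada-event-processor.zip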
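Extending the single-function example to all 21 functions, the sketch below also shows one way to produce a configuration manifest "sanitized of secrets". A JMESPath projection keeps selected settings but records only the environment variable names, never their values. This is an assumption about how sanitization could work, not the original tooling. It assumes AWS CLI v2 and curl, and `export_lambda` and `export_all_lambdas` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: export code and a secret-free config manifest for every Lambda function
# in the current region. Assumes AWS CLI v2 and curl; helper names are hypothetical.
set -eu

# Selected settings plus environment variable NAMES only (values are dropped).
# `{}` is a JMESPath literal used when a function has no environment variables.
CONFIG_QUERY='Configuration.{FunctionName: FunctionName, Runtime: Runtime, Handler: Handler, Role: Role, MemorySize: MemorySize, Timeout: Timeout, VpcConfig: VpcConfig, Layers: Layers, EnvironmentKeys: keys(Environment.Variables || `{}`)}'

# Usage: export_lambda FUNCTION_NAME OUT_DIR
export_lambda() {
  local fn="$1" out_dir="$2" url
  mkdir -p "$out_dir"
  aws lambda get-function --function-name "$fn" \
    --query "$CONFIG_QUERY" --output json > "${out_dir}/${fn}-config.json"
  # Code.Location is a short-lived presigned URL for the deployment package.
  url="$(aws lambda get-function --function-name "$fn" --query 'Code.Location' --output text)"
  curl -sSfL -o "${out_dir}/${fn}.zip" "$url"
}

# Usage: export_all_lambdas OUT_DIR
export_all_lambdas() {
  local out_dir="$1" fn
  for fn in $(aws lambda list-functions --query 'Functions[].FunctionName' --output text); do
    export_lambda "$fn" "$out_dir"
  done
}
```

The execution role's attached policies are not part of get-function's output. They would need a separate IAM export.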
Agent 3: AWS Configuration Export
Captured configuration exports (definitions and settings, not stored data) for the remaining services:
- CloudFront: 66 distribution configurations (origins, behaviors, cache policies, SSL certs, geo-restrictions)
- Route53: 16 hosted zone exports with all record sets (A, AAAA, CNAME, MX, TXT, NS records)
- ACM Certificates: Certificate metadata and validation records
- API Gateway: REST API definitions, stages, authorizers, request/response models
- SES Configuration: Email identities, DKIM/SPF records, sending limits
- DynamoDB: Table schemas, GSI definitions, TTL configs, billing mode
CloudFront export example:
aws cloudfront list-distributions \
--query 'DistributionList.Items[*].[Id,DomainName,Enabled,Status]' \
--output table > /snapshot/v1.0/cloudfront/distributions-manifest.txt
for dist_id in $(aws cloudfront list-distributions --query 'DistributionList.Items[*].Id' --output text); do
aws cloudfront get-distribution-config \
--id "$dist_id" > "/snapshot/v1.0/cloudfront/${dist_id}-config.json"
done
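The CloudFront loop pattern carries over to the other services in the Agent 3 list. The sketch below covers Route53 record sets, DynamoDB table definitions plus TTL settings, and ACM certificate metadata. It assumes AWS CLI v2, and the function names and output layout are illustrative. ACM is queried in us-east-1 because certificates attached to CloudFront must live there. Certificates in other regions, such as those used by regional API Gateway endpoints, would need the same loop per region.

```shell
#!/usr/bin/env bash
# Sketch: export Route53, DynamoDB and ACM configuration into the snapshot tree.
# Assumes AWS CLI v2; function names and layout are illustrative.
set -eu

export_route53() {
  local root="$1" zone_id
  mkdir -p "${root}/route53"
  for zone_id in $(aws route53 list-hosted-zones --query 'HostedZones[].Id' --output text); do
    # Ids are returned as /hostedzone/Z...; keep only the final segment.
    zone_id="${zone_id##*/}"
    aws route53 list-resource-record-sets --hosted-zone-id "$zone_id" --output json \
      > "${root}/route53/${zone_id}-records.json"
  done
}

export_dynamodb_schemas() {
  local root="$1" table
  mkdir -p "${root}/dynamodb"
  for table in $(aws dynamodb list-tables --query 'TableNames[]' --output text); do
    # describe-table covers key schema, GSIs and billing settings; TTL is a separate call.
    aws dynamodb describe-table --table-name "$table" --output json \
      > "${root}/dynamodb/${table}-table.json"
    aws dynamodb describe-time-to-live --table-name "$table" --output json \
      > "${root}/dynamodb/${table}-ttl.json"
  done
}

export_acm() {
  local root="$1" arn
  mkdir -p "${root}/acm"
  # CloudFront certificates live in us-east-1. The API's default filter may
  # exclude non-RSA_2048 keys; check list-certificates --includes if needed.
  for arn in $(aws acm list-certificates --region us-east-1 \
      --query 'CertificateSummaryList[].CertificateArn' --output text); do
    aws acm describe-certificate --region us-east-1 --certificate-arn "$arn" --output json \
      > "${root}/acm/${arn##*/}.json"
  done
}
```

Each function takes the snapshot root, e.g. `export_route53 /snapshot/v1.0`, so it can run inside the same agent as the CloudFront loop.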
Agent 4: Local Asset and Code Snapshot
Pulled all development-local resources:
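- Google Apps Script Projects: Cloned via clasp (main JADA, Rady Shell v1, Rady Shell v2, EYD integration). clasp fetches each project's current files rather than its full version history; an earlier version has to be pulled explicitly.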
- Development Tools: Python scripts (update_dashboard.py, release.py), deployment automation, monitoring scripts
- Configuration Files: Environment variable templates, LaunchAgent plist files, cron job definitions
- Documentation: System handoffs, architecture diagrams, runbooks, incident logs, change history
- Secrets Manifest: Encrypted reference of where secrets are stored (without exposing values)
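GAS export example (each project's script ID is assumed to be in a placeholder environment variable; clasp clone writes into the current directory):
mkdir -p /snapshot/v1.0/gas/main-jada && (cd /snapshot/v1.0/gas/main-jada && clasp clone "$MAIN_JADA_SCRIPT_ID")
mkdir -p /snapshot/v1.0/gas/rady-replacement && (cd /snapshot/v1.0/gas/rady-replacement && clasp clone "$RADY_REPLACEMENT_SCRIPT_ID")
mkdir -p /snapshot/v1.0/gas/rady-legacy && (cd /snapshot/v1.0/gas/rady-legacy && clasp clone "$RADY_LEGACY_SCRIPT_ID")
mkdir -p /snapshot/v1.0/gas/eyd-integration && (cd /snapshot/v1.0/gas/eyd-integration && clasp clone "$EYD_INTEGRATION_SCRIPT_ID")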
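The non-GAS items in the Agent 4 list (scripts, plist files, cron definitions, documentation) can be bundled into one archive with a checksum file, so that a later restore can confirm the archive is intact. This is a minimal sketch. The helper names are hypothetical, as are the source paths passed in. It assumes either sha256sum (typical on Linux) or shasum -a 256 (available on macOS, where LaunchAgent plists live).

```shell
#!/usr/bin/env bash
# Sketch: archive local assets into the snapshot and record a SHA-256 checksum.
# Helper names and source paths are hypothetical.
set -eu

# Pick whichever SHA-256 tool this machine has.
sha256_tool() {
  if command -v sha256sum >/dev/null 2>&1; then
    echo "sha256sum"
  else
    echo "shasum -a 256"
  fi
}

# Usage: archive_local_assets DEST_DIR SRC_PATH...
archive_local_assets() {
  local dest="$1" tool
  shift
  mkdir -p "$dest"
  tool="$(sha256_tool)"
  tar -czf "${dest}/local-assets.tar.gz" "$@"
  # $tool is intentionally unquoted so "shasum -a 256" splits into arguments.
  (cd "$dest" && $tool local-assets.tar.gz > local-assets.tar.gz.sha256)
}

# Usage: verify_local_assets DEST_DIR  (non-zero if the archive changed)
verify_local_assets() {
  local dest="$1" tool
  tool="$(sha256_tool)"
  (cd "$dest" && $tool -c local-assets.tar.gz.sha256 >/dev/null)
}
```

A call might look like `archive_local_assets /snapshot/v1.0/local ~/jada/update_dashboard.py ~/jada/release.py`, with the paths as placeholders. Running verify_local_assets before a restore catches a truncated or corrupted archive.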
Agent 5: Lightsail Instance Snapshot
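Requested an AWS-managed Lightsail instance snapshot (jada-agent-v1.0-20260509) of the running instance. An instance snapshot captures the instance's disk, including filesystem state, installed packages, service configuration, and on-disk application logs; it does not capture in-memory process state.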
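The Lightsail step can be scripted so every instance in the region gets a snapshot and the run waits until each one is usable. This sketch assumes AWS CLI v2. The helper names are hypothetical. The naming scheme <instance>-<version>-<YYYYMMDD> is also an assumption, chosen to mirror jada-agent-v1.0-20260509 on the guess that jada-agent is the instance name. Lightsail reports snapshot state as pending, available, or error.

```shell
#!/usr/bin/env bash
# Sketch: snapshot all Lightsail instances in the current region and wait for them.
# Assumes AWS CLI v2; helper names and the naming scheme are assumptions.
set -eu

# Usage: snapshot_lightsail VERSION  (prints one snapshot name per instance)
snapshot_lightsail() {
  local version="$1" stamp name snap
  stamp="$(date +%Y%m%d)"
  for name in $(aws lightsail get-instances --query 'instances[].name' --output text); do
    snap="${name}-${version}-${stamp}"
    aws lightsail create-instance-snapshot \
      --instance-name "$name" \
      --instance-snapshot-name "$snap" >/dev/null
    echo "$snap"
  done
}

# Usage: wait_for_snapshot SNAPSHOT_NAME
# Polls while the state is "pending"; fails on any state other than "available".
wait_for_snapshot() {
  local snap="$1" state
  while :; do
    state="$(aws lightsail get-instance-snapshot --instance-snapshot-name "$snap" \
      --query 'instanceSnapshot.state' --output text)"
    case "$state" in
      available) return 0 ;;
      pending) sleep "${POLL_SECONDS:-30}" ;;
      *) echo "snapshot ${snap} ended in state ${state}" >&2; return 1 ;;
    esac
  done
}
```

A run might look like `for s in $(snapshot_lightsail v1.0); do wait_for_snapshot "$s"; done`. Snapshot names must be unique, which is why the instance name and date are part of each one.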
Critical Findings During Snapshot
The snapshot process revealed several important infrastructure states:
- Staging vs. Production Parity: Verified that staging CloudFront origins properly mirror production file counts (QOS: 4,127 files in both; sailjada: 2,843 files in both)
- Font and Typography Issues: Identified CSS styling problems in brand-name display (text-transform and letter-spacing properties) requiring staging