Building a Comprehensive Infrastructure Snapshot: Lessons from Multi-Region AWS Disaster Recovery
When working with distributed systems spanning multiple AWS regions, S3 buckets, Lambda functions, and Google Apps Script projects, the risk of data loss or configuration drift is substantial. This post documents the technical approach taken to create a complete v1.0 snapshot of the JADA infrastructure—covering 45 S3 buckets, 21 Lambda functions, 66 CloudFront distributions, and four Google Apps Script projects across three production domains.
The Challenge: Distributed State Across Multiple Services
The JADA infrastructure consists of:
- Three production domains: queenofsandiego.com, sailjada.com, and salejada.com
- Cloud storage: 45 S3 buckets distributed across regions
- Content delivery: 66 CloudFront distributions with origin configurations
- Serverless compute: 21 Lambda functions with environment variables and deployment packages
- DNS management: 16 Route53 hosted zones
- Automation: 4 Google Apps Script projects (main JADA, Rady Shell replacement, Rady Shell legacy, EYD)
- Local source: Tools, booking automation, code generation scripts across multiple repositories
The core issue: no single AWS API call captures the entire infrastructure state. Configuration lives in different services, each requiring separate export logic. Recovery from a major incident would require stitching together data from a dozen sources—assuming each was backed up correctly.
Architecture: Parallel Multi-Agent Snapshot Strategy
Rather than sequentially exporting each service (which would take hours), we implemented a four-agent parallel approach:
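The fan-out needs little machinery. The sketch below shows one way to launch four exports concurrently from a single shell and still surface any failure. The script names in the usage comment (snapshot_s3.sh and the others) are hypothetical stand-ins for the four agents, not the tooling actually used here.

```shell
#!/usr/bin/env bash
# Run each argument as a shell command in the background, wait for all of
# them, and return non-zero if any command failed.
run_parallel() {
  local -a pids=()
  local -a cmds=()
  local cmd
  for cmd in "$@"; do
    bash -c "$cmd" &          # start each agent as its own background job
    pids+=("$!")
    cmds+=("$cmd")
  done

  local failed=0 i
  for i in "${!pids[@]}"; do
    # wait <pid> returns that specific job's exit status
    if ! wait "${pids[$i]}"; then
      echo "agent failed: ${cmds[$i]}" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example (hypothetical agent scripts):
# run_parallel ./snapshot_s3.sh ./snapshot_lambda.sh ./snapshot_aws_config.sh ./snapshot_local.sh
```

Each agent runs in its own process, so a slow bucket sync does not hold up the DNS or Lambda exports. Waiting on each PID individually preserves every agent's exit status instead of only the last one.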
Agent 1: S3 Bucket Synchronization
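All 45 S3 buckets were synced locally using aws s3 sync, with transfer concurrency tuned through the AWS CLI's S3 settings. Key considerations:
- Sync destination: /snapshot/v1.0/s3-buckets/, with one subdirectory per bucket name
- Command pattern: aws s3 sync s3://bucket-name /snapshot/v1.0/s3-buckets/bucket-name/ (aws s3 sync has no --parallel flag; concurrent requests per sync are controlled by the s3.max_concurrent_requests setting, default 10, e.g. aws configure set default.s3.max_concurrent_requests 20)
- Progress tracking: Real-time monitoring showed 68MB downloaded across 30+ buckets, with the remaining buckets queued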
- Staging buckets: Identified dedicated staging buckets for QOS and sailjada, synced separately to preserve staging-specific content
- Size optimization: Prioritized buckets by modification date; older buckets queued for batch processing to avoid network saturation
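A sketch of the Agent 1 loop, assuming the CLI's credentials can see every bucket. It bounds concurrency at the bucket level with xargs -P (the default of 4 jobs is an arbitrary example), while per-request parallelism inside each sync still comes from the max_concurrent_requests setting. The function name sync_all_buckets is hypothetical, and the staging-bucket and modification-date ordering described above are left out.

```shell
#!/usr/bin/env bash
# Sync every bucket visible to the current credentials into DEST/<bucket>/,
# running at most JOBS bucket syncs at once to avoid saturating the network.
sync_all_buckets() {
  local dest="$1" jobs="${2:-4}"
  mkdir -p "$dest"
  # list-buckets returns every bucket in the account; --output text separates
  # names with tabs, so split them onto one name per line for xargs.
  aws s3api list-buckets --query 'Buckets[].Name' --output text \
    | tr '\t' '\n' \
    | grep -v '^$' \
    | xargs -P "$jobs" -I {} \
        aws s3 sync "s3://{}" "$dest/{}/" --only-show-errors
}

# Usage:
#   sync_all_buckets /snapshot/v1.0/s3-buckets 4
```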
Agent 2: Lambda Function Export
Lambda functions were exported with the AWS CLI, capturing each deployment package along with its configuration and environment variables:
aws lambda list-functions --region us-west-2 --output json > /snapshot/v1.0/lambda/functions-list.json
# For each function (Code.Location is a presigned URL valid for 10 minutes, so download it immediately):
curl -sS -o /snapshot/v1.0/lambda/FUNCTION_NAME.zip \
"$(aws lambda get-function --function-name FUNCTION_NAME --region REGION \
--query 'Code.Location' --output text)"
aws lambda get-function-configuration --function-name FUNCTION_NAME \
--region REGION > /snapshot/v1.0/lambda/FUNCTION_NAME-config.json
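Wrapping those per-function commands in a loop might look like the following. This is a sketch rather than the exact script used: export_lambda_functions is a hypothetical name, and it writes one subdirectory per region (unlike the flat lambda/ directory shown later) because the same function name can exist in more than one region.

```shell
#!/usr/bin/env bash
# Export code and configuration for every Lambda function in the given regions.
# Writes DEST/<region>/<function>.zip and DEST/<region>/<function>-config.json.
export_lambda_functions() {
  local dest="$1"; shift
  local region fn url
  for region in "$@"; do
    mkdir -p "$dest/$region"
    # Function names contain no whitespace, so word-splitting the
    # tab-separated text output is safe here.
    for fn in $(aws lambda list-functions --region "$region" \
                  --query 'Functions[].FunctionName' --output text); do
      # Code.Location is short-lived, so fetch the package right away.
      url=$(aws lambda get-function --function-name "$fn" --region "$region" \
              --query 'Code.Location' --output text) || return 1
      curl -sSf -o "$dest/$region/$fn.zip" "$url" || return 1
      aws lambda get-function-configuration --function-name "$fn" \
        --region "$region" > "$dest/$region/$fn-config.json" || return 1
    done
  done
}

# Usage (regions are examples):
#   export_lambda_functions /snapshot/v1.0/lambda us-west-2 us-east-1
```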
Critical details captured per function:
- Deployment package (compiled code)
- Runtime version and handler configuration
- Environment variables (sanitized of secrets)
- IAM execution role ARN
- VPC configuration and security groups
- Memory allocation and timeout settings
- Layers and dependencies
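The post does not say how secrets were stripped. Output from get-function-configuration typically includes environment variable values in plaintext, so one blunt option, assuming jq is installed, is to redact every value while keeping the keys. redact_lambda_env is a hypothetical helper, and redacting everything is a deliberate simplification rather than real secret detection.

```shell
#!/usr/bin/env bash
# Replace every Lambda environment variable value with a placeholder, keeping
# the keys so a restore knows which variables must be re-supplied.
# Reads a get-function-configuration JSON file and writes a redacted copy.
redact_lambda_env() {
  local src="$1" out="$2"
  jq 'if .Environment.Variables then
        .Environment.Variables |= with_entries(.value = "REDACTED")
      else . end' "$src" > "$out"
}

# Usage:
#   redact_lambda_env FUNCTION_NAME-config.json FUNCTION_NAME-config.redacted.json
```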
Agent 3: AWS Infrastructure Configuration Export
CloudFront, Route53, DynamoDB, SES, API Gateway, and IAM configurations were exported using read-only AWS CLI list and get commands. The core exports:
# CloudFront distributions (all 66)
aws cloudfront list-distributions --output json > /snapshot/v1.0/aws-config/cloudfront-distributions.json
# Route53 hosted zones (16 zones)
aws route53 list-hosted-zones --output json > /snapshot/v1.0/aws-config/route53-zones.json
# For each hosted zone:
aws route53 list-resource-record-sets --hosted-zone-id ZONE_ID \
--output json > /snapshot/v1.0/aws-config/zone-ZONE_ID-records.json
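# DynamoDB tables (names only; list-tables does not return key schemas or items)
aws dynamodb list-tables --output json > /snapshot/v1.0/aws-config/dynamodb-tables.json
# ACM certificates (tracking expiration dates). ACM is regional: this covers the
# CLI's default region, and certificates attached to CloudFront live in us-east-1
aws acm list-certificates --output json > /snapshot/v1.0/aws-config/acm-certificates.json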
Why separate from the Lambda export: CloudFront and Route53 data is configuration-heavy but code-light, so it exports quickly, and because AWS applies API rate limits per service, these calls can run in parallel without competing for the Lambda API's request quota.
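The list calls above return account-level summaries, so per-resource detail needs a second pass. The sketch below assumes the same /snapshot/v1.0/aws-config layout. The Route53 filenames follow the zone-ZONE_ID-records.json pattern in the directory tree; the CloudFront and DynamoDB filenames and all three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Expand the account-level listings into per-resource detail files under $1.

export_route53_records() {
  local dest="$1" id
  # Zone IDs come back as /hostedzone/Z...; strip the prefix for filenames.
  for id in $(aws route53 list-hosted-zones --query 'HostedZones[].Id' --output text); do
    id="${id##*/}"
    aws route53 list-resource-record-sets --hosted-zone-id "$id" \
      --output json > "$dest/zone-$id-records.json" || return 1
  done
}

export_cloudfront_configs() {
  local dest="$1" id
  # get-distribution-config returns the full DistributionConfig plus the
  # ETag that a later update-distribution call needs.
  for id in $(aws cloudfront list-distributions \
                --query 'DistributionList.Items[].Id' --output text); do
    # The CLI prints "None" for a null result (an account with no distributions).
    [ "$id" = "None" ] && continue
    aws cloudfront get-distribution-config --id "$id" \
      --output json > "$dest/cloudfront-$id-config.json" || return 1
  done
}

export_dynamodb_schemas() {
  local dest="$1" table
  # describe-table captures key schema, indexes, and billing mode, not items.
  for table in $(aws dynamodb list-tables --query 'TableNames[]' --output text); do
    aws dynamodb describe-table --table-name "$table" \
      --output json > "$dest/dynamodb-$table.json" || return 1
  done
}

# Usage:
#   export_route53_records /snapshot/v1.0/aws-config
```

The distribution summaries alone are not enough to rebuild a distribution; the saved DistributionConfig holds the origins, cache behaviors, and viewer certificate settings.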
Agent 4: Local Source Code and Configuration
Google Apps Script projects and local tool repositories were backed up using clasp and a filesystem copy:
# Pull from Google Apps Script projects
cd /Users/cb/Documents/repos/sites/queenofsandiego.com/
clasp pull
# Copy entire project trees (cp -r fails if the destination's parent directory is missing)
mkdir -p /snapshot/v1.0/sites
cp -r /Users/cb/Documents/repos/sites/queenofsandiego.com \
/snapshot/v1.0/sites/queenofsandiego.com
cp -r /Users/cb/Documents/repos/tools \
/snapshot/v1.0/local-tools
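With four GAS projects involved, running clasp pull by hand in each directory is easy to get wrong. A sketch, assuming every project is a clasp-linked directory (one containing .clasp.json) under a common root; pull_all_gas_projects is a hypothetical name.

```shell
#!/usr/bin/env bash
# Find every directory under ROOT containing a .clasp.json (a clasp-linked
# Apps Script project) and run `clasp pull` inside it.
pull_all_gas_projects() {
  local root="$1" cfg dir failed=0
  while IFS= read -r cfg; do
    dir="$(dirname "$cfg")"
    echo "clasp pull: $dir"
    # Subshell keeps the cd local; </dev/null stops clasp from reading
    # the find output that feeds this loop.
    (cd "$dir" && clasp pull < /dev/null) || failed=1
  done < <(find "$root" -name .clasp.json -not -path '*/node_modules/*')
  return "$failed"
}

# Usage:
#   pull_all_gas_projects /Users/cb/Documents/repos
```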
Key GAS files and projects captured:
- BookingAutomation.gs: primary booking system logic
- Code.gs: shared utilities and helpers
- Rady Shell replacement and legacy GAS projects
- EYD automation scripts
Lightsail Instance Snapshot
A Lightsail instance snapshot, jada-agent-v1.0-20260509, was initiated to capture the agent infrastructure itself. It provides an immutable image of the compute environment running the export operations, allowing rapid restoration if the agent infrastructure were compromised.
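Creating the snapshot and waiting for it to become usable can be scripted with the Lightsail CLI. A sketch: the instance name is a placeholder (the post only gives the snapshot name), snapshot_lightsail_instance is hypothetical, and POLL_SECONDS is an assumed environment variable controlling the poll interval.

```shell
#!/usr/bin/env bash
# Create a Lightsail instance snapshot and poll until it is usable.
# Lightsail reports snapshot state as pending, available, or error.
snapshot_lightsail_instance() {
  local instance="$1" snapshot="$2" state
  aws lightsail create-instance-snapshot \
    --instance-name "$instance" \
    --instance-snapshot-name "$snapshot" > /dev/null || return 1
  while :; do
    state=$(aws lightsail get-instance-snapshot \
              --instance-snapshot-name "$snapshot" \
              --query 'instanceSnapshot.state' --output text) || return 1
    case "$state" in
      available) return 0 ;;
      error) echo "snapshot $snapshot failed" >&2; return 1 ;;
      *) sleep "${POLL_SECONDS:-30}" ;;
    esac
  done
}

# Usage (INSTANCE_NAME is a placeholder):
#   snapshot_lightsail_instance INSTANCE_NAME jada-agent-v1.0-20260509
```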
Directory Structure for v1.0 Snapshot
/snapshot/v1.0/
├── s3-buckets/
│ ├── jada-assets/
│ ├── queenofsandiego-prod/
│ ├── queenofsandiego-staging/
│ ├── sailjada-prod/
│ ├── sailjada-staging/
│ └── ... (40 additional buckets)
├── lambda/
│ ├── functions-list.json
│ ├── FUNCTION_NAME.zip
│ ├── FUNCTION_NAME-config.json
│ └── ... (21 functions)
├── aws-config/
│ ├── cloudfront-distributions.json
│ ├── route53-zones.json
│ ├── zone-ZONE_ID-records.json
│ ├── dynamodb-tables.json
│ ├── acm-certificates.json
│ └── api-gateway-apis.json
├── sites/
│ ├── queenofsandiego.com/
│ ├── sailjada.com/
│ └── salejada