Comprehensive Infrastructure Snapshot Strategy: Protecting Multi-Site JADA Deployment at Scale
After a critical incident requiring full infrastructure recovery, we implemented a comprehensive snapshot strategy for the JADA ecosystem spanning three production sites (queenofsandiego.com, sailjada.com, salejada.com). This article details the technical approach, command patterns, and architectural decisions behind creating the v1.0 snapshot.
The Problem: Distributed State Across Multiple Services
The JADA infrastructure sprawls across AWS services, Google Apps Script (GAS) projects, local application code, configuration files, and infrastructure-as-code. A traditional single point-in-time snapshot of one system wasn't sufficient. We needed coordinated captures of:
- 46 S3 buckets (application assets, backups, staging data)
- 66 CloudFront distributions (CDN edge caching)
- 21 Lambda functions (serverless business logic)
- 16 Route53 hosted zones (DNS configuration)
- 4 Google Apps Script projects (spreadsheet automation)
- 14 DynamoDB tables (NoSQL state)
- Local codebase, documentation, and deployment tooling
The challenge: these systems don't snapshot atomically. Changes in one service during capture could invalidate others. We needed a coordinated approach.
Technical Architecture: Parallel Capture with Ordered Dependencies
Phase 1: AWS Infrastructure Enumeration
We started by cataloging all resources to understand scope:
# List all S3 buckets with JADA-related naming patterns
aws s3 ls | grep -E "(jada|qos|sailjada|salejada)"
# Export CloudFront distribution IDs and origins
aws cloudfront list-distributions --query 'DistributionList.Items[].{Id:Id,DomainName:DomainName,Origins:Origins[].DomainName}' --output json
# Enumerate Lambda functions with their runtime and handler (environment variables are captured per function in Phase 2)
aws lambda list-functions --query 'Functions[].{Name:FunctionName,Runtime:Runtime,Handler:Handler}' --output table
# List Route53 hosted zones (record sets are exported per zone in Phase 2)
aws route53 list-hosted-zones-by-name --query 'HostedZones[].{Name:Name,Id:Id}' --output json
Why this approach: Before capturing anything, we needed a definitive inventory. The AWS APIs return structured output that becomes the source of truth for what needs backing up, which keeps obscure resources from being left out of the snapshot.
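One way to turn the inventory into a gate is to save each listing to a file and compare its size against the expected count before any capture starts. The sketch below is illustrative rather than the tooling actually used: the function names and the inventory/ directory are assumptions, and it relies only on the standard three-column output of aws s3 ls (date, time, bucket name).

```shell
# Extract bucket names from `aws s3 ls` output (date, time, name per line)
# and keep only those matching the JADA naming patterns.
filter_jada_buckets() {
  awk '{ print $3 }' | grep -E '(jada|qos)' | sort
}

# Fail loudly when an inventory file does not hold the expected number of
# entries, so a short listing is caught before the capture starts.
verify_count() {
  label=$1 expected=$2 file=$3
  # grep -c exits non-zero on a count of 0, so tolerate that exit status.
  actual=$(grep -c . "$file" || true)
  if [ "$actual" -ne "$expected" ]; then
    echo "$label: expected $expected, found $actual" >&2
    return 1
  fi
  echo "$label: $actual OK"
}

# Usage (requires AWS credentials; inventory/ is a hypothetical directory):
#   aws s3 ls | filter_jada_buckets > inventory/buckets.txt
#   verify_count "S3 buckets" 46 inventory/buckets.txt
```

The same verify_count check could be pointed at listings of distributions, functions, or hosted zones, using the counts given earlier in the article.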
Phase 2: Parallel Multi-Agent Background Capture
We launched four independent agents in parallel, each handling a capture domain without blocking others:
- Agent 1 - S3 Sync: Recursively download all 46 buckets using aws s3 sync with bandwidth limiting
- Agent 2 - Lambda Export: For each function, pull source code via aws lambda get-function, capture environment variables, and extract IAM role permissions
- Agent 3 - AWS Configuration: Export CloudFront configurations, Route53 zone files, DynamoDB schemas, and ACM certificate metadata
- Agent 4 - Local Assets: Clone GAS projects via clasp pull, copy application source code, configuration files, and documentation
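The fan-out described in this list can be sketched with plain shell job control: start each agent in the background, then wait on every process ID and count the failures. This is a minimal illustration, not the actual orchestration code; run_agents and the agent function names in the usage comment are hypothetical.

```shell
# Launch every named agent (a shell function or command) in the background,
# then wait for each one and report how many failed.
run_agents() {
  pids=""
  for agent in "$@"; do
    "$agent" &                      # run the agent without blocking the others
    pids="$pids $!:$agent"          # remember its PID together with its name
  done
  failed=0
  for entry in $pids; do
    pid=${entry%%:*}
    name=${entry#*:}
    if ! wait "$pid"; then          # wait returns that agent's exit status
      echo "agent failed: $name" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"                  # 0 only when every agent succeeded
}

# Usage (hypothetical agent functions, one per capture domain):
#   run_agents agent_s3_sync agent_lambda_export agent_aws_config agent_local_assets
```

Waiting on each PID individually, rather than issuing a bare wait, is what lets the caller tell which capture domain failed.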
This parallel strategy reduced total capture time from ~4 hours (sequential) to ~45 minutes. Each agent had failure retry logic; if an S3 bucket sync failed, it would retry with exponential backoff.
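The agents' retry code is not shown here, so the following is only a sketch of retry with exponential backoff under assumed parameters: a base delay of 2 seconds that doubles on each failed attempt and is capped at 60 seconds. The names backoff_delay, retry_with_backoff, and BACKOFF_BASE are hypothetical.

```shell
# Print the delay in seconds before retry number $1 (1-based):
# base * 2^(attempt - 1), capped at a maximum.
backoff_delay() {
  n=$1 base=${2:-2} cap=${3:-60}
  delay=$base
  i=1
  while [ "$i" -lt "$n" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}

# Run a command up to $1 times, sleeping with exponential backoff
# between failed attempts. Returns 0 on the first success.
retry_with_backoff() {
  max_attempts=$1
  shift
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$(backoff_delay "$attempt" "${BACKOFF_BASE:-2}")"
    attempt=$((attempt + 1))
  done
}

# Usage (requires AWS credentials; the bucket variable is illustrative):
#   retry_with_backoff 5 aws s3 sync "s3://$bucket" "/snapshot/v1.0/s3-buckets/$bucket"
```

Because aws s3 sync only transfers objects that are missing or changed, re-running it after a partial failure resumes the download rather than starting over.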
Phase 3: Google Apps Script Project Capture
GAS projects require special handling because they don't exist in version control by default. We used the Clasp CLI:
# clasp pull needs an existing .clasp.json, so fetch each project by script ID
# with clasp clone, run from that project's versioned subdirectory
mkdir -p /snapshot/v1.0/gas-projects/{main-jada,rady-replacement,rady-old,eyd}
(cd /snapshot/v1.0/gas-projects/main-jada && clasp clone "main-jada-gas-project-id")
(cd /snapshot/v1.0/gas-projects/rady-replacement && clasp clone "rady-shell-replacement-gas-id")
(cd /snapshot/v1.0/gas-projects/rady-old && clasp clone "rady-shell-old-gas-id")
(cd /snapshot/v1.0/gas-projects/eyd && clasp clone "eyd-gas-project-id")
Why clasp over manual export: clasp pulls every script file together with the appsscript.json manifest, which records library dependencies, OAuth scopes, and runtime settings. Copying code out of the editor by hand would lose that metadata.
Phase 4: Lightsail Instance Snapshot
For the primary application server, we created a Lightsail instance snapshot named jada-agent-v1.0-20260509. This captures:
- Full system disk state (filesystem, installed packages, configurations)
- On-disk service configuration and data as of the snapshot moment (process memory is not included)
- System logs in /var/log/
- All installed SSL certificates and permissions
Lightsail snapshots are region-specific and can be exported to Amazon EC2 (which produces an AMI) for portability. This was essential for disaster recovery: if the primary instance fails, we can launch a replacement instance from this snapshot instead of rebuilding the server by hand.
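Creating such a snapshot from the command line might look like the sketch below. It uses the AWS CLI's lightsail create-instance-snapshot and get-instance-snapshot commands; the helper names, the POLL_SECONDS variable, and the instance name in the usage comment are assumptions, since the real instance name is not given here.

```shell
# Build a snapshot name such as "jada-agent-v1.0-20260509".
# The date defaults to today when the third argument is omitted.
snapshot_name() {
  printf '%s-%s-%s\n' "$1" "$2" "${3:-$(date +%Y%m%d)}"
}

# Create an instance snapshot, then poll until Lightsail reports it as
# "available" (the other states are "pending" and "error").
create_lightsail_snapshot() {
  instance=$1 name=$2
  aws lightsail create-instance-snapshot \
    --instance-name "$instance" \
    --instance-snapshot-name "$name" || return 1
  while :; do
    state=$(aws lightsail get-instance-snapshot \
      --instance-snapshot-name "$name" \
      --query 'instanceSnapshot.state' --output text) || return 1
    if [ "$state" = "available" ]; then
      return 0
    elif [ "$state" = "error" ]; then
      return 1
    fi
    sleep "${POLL_SECONDS:-15}"
  done
}

# Usage (requires AWS credentials; "jada-agent" as the instance name is assumed):
#   create_lightsail_snapshot jada-agent "$(snapshot_name jada-agent v1.0)"
```

Polling for the available state matters because the create call returns as soon as the snapshot is queued, not when it is usable for recovery.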
Snapshot Directory Structure and Manifest
We organized everything under a versioned root with clear separation of concerns:
/snapshot/v1.0/
├── MANIFEST.md # Complete inventory with checksums
├── s3-buckets/
│ ├── qos-prod/ # Complete download of queenofsandiego.com assets
│ ├── sailjada-prod/ # sailjada.com production bucket
│ ├── salejada-prod/ # salejada.com production bucket
│ ├── qos-staging/
│ ├── sailjada-staging/
│ └── [41 other buckets]
├── lambda-functions/
│ ├── function-name/
│ │ ├── code.zip # Pulled via get-function
│ │ ├── config.json # Runtime, memory, timeout, VPC settings
│ │ └── environment.json # All env vars (scrubbed of secrets)
│ └── [20 other functions]
├── cloudfront/
│ ├── distributions.json # All 66 distribution configs
│ ├── origin-configs/ # S3 origin, Lambda origin, custom domain configs
│ └── cache-behaviors.json # TTL, compression, headers per distribution
├── route53/
│ ├── hosted-zones.json # All 16 zones with IDs
│ └── record-sets/ # A, AAAA, CNAME, MX records per zone
├── dynamodb/
│ ├── tables-schema.json # Table definitions, indexes, TTL settings
│ ├── sample-items/ # 10-item sample from each table (anonymized)
│ └── billing-mode.json # On-demand vs provisioned capacity
├── gas-projects/
│ ├── main-jada/ # Full source tree
│ ├── rady-replacement/
│ ├── rady-old/
│ └── eyd/
├── local-code/
│ ├── sites/
│ │ ├── queenofsandiego.com/
│ │ ├── sailjada.com/
│ │ └── salejada.com/
│ ├── tools/ # Deployment scripts, utilities
│ ├── docs/ #