Building a Production Snapshot Infrastructure: Comprehensive AWS State Capture for Three E-Commerce Sites
When working with distributed infrastructure across multiple production sites, a single misconfiguration or bad deployment can cascade into hours of recovery work. This post details the approach taken to create a comprehensive v1.0 snapshot of the JADA infrastructure, which spans three e-commerce properties (queenofsandiego.com, sailjada.com, salejada.com), their supporting AWS services, and the associated automation code.
The Problem: No Rollback Point
Without a documented, versioned snapshot of infrastructure state at a known-good point, recovery from deployment errors becomes reactive rather than proactive. The goal was to capture a complete point-in-time record of:
- All S3 bucket contents and configurations (46 buckets total)
- CloudFront distribution settings and cache behaviors (66 distributions)
- Lambda function code, environment variables, and IAM roles (21 functions)
- Route53 DNS records and hosted zones (16 zones)
- Google Apps Script projects powering backend automation (4 GAS projects)
- Lightsail instance snapshots for compute resources
- RDS, DynamoDB, and database configurations
- SES, API Gateway, and integration service configs
- Local development tooling and documentation
Technical Architecture: Parallel Distributed Snapshot
Rather than exporting each service sequentially (which would take hours), the snapshot process ran four agents concurrently:
- Agent 1: S3 Sync — AWS CLI recursive downloads of all 45 JADA-related buckets
- Agent 2: Lambda Export — Code extraction, environment variable capture, and IAM policy documentation for all 21 functions
- Agent 3: AWS Config Export — CloudFront distributions, Route53 zones, DynamoDB tables, SES configuration, ACM certificates
- Agent 4: Local Asset Capture — Google Apps Script projects via clasp CLI, development files, LaunchAgent configurations
This parallelization reduced wall-clock time from an estimated 3+ hours (sequential) to approximately 45 minutes (concurrent).
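The fan-out described above can be sketched in POSIX shell with background jobs and `wait`. This is a minimal illustration, not the original tooling: `run_agents` and the `agent_*` names are hypothetical placeholders for whatever each agent actually runs.

```sh
# Hypothetical sketch: start each named agent function in the background,
# then wait on every PID so that no single failure goes unnoticed.
run_agents() {
    pids=""
    for agent in "$@"; do
        "$agent" &                 # run the agent as a background job
        pids="$pids $!"            # remember its PID
    done
    failed=0
    for pid in $pids; do
        # wait <pid> returns the exit status of that specific job
        wait "$pid" || failed=$((failed + 1))
    done
    echo "agents failed: $failed"
    [ "$failed" -eq 0 ]            # non-zero return if any agent failed
}

# Usage (agent_* are placeholder functions you would define yourself):
#   run_agents agent_s3_sync agent_lambda_export agent_aws_config agent_local_assets
```

Waiting on each PID individually, rather than issuing a bare `wait`, keeps the per-agent exit status, so a failed export can be retried without rerunning the other three.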
Infrastructure Inventory Captured
AWS Storage & CDN
The snapshot documented all S3 bucket configurations:
# Example: bucket naming conventions used
- qos-prod-site (Queen of San Diego production content)
- qos-staging-site
- sailjada-prod, sailjada-staging
- salejada-prod, salejada-staging
- qos-lambda-layers (shared Lambda dependencies)
- jada-backups-archive
- jada-admin-uploads
- [36 additional specialized buckets for assets, logs, archives]
CloudFront distributions were exported with full cache behavior configurations, origin settings, and WAF rules. Route53 hosted zones captured DNS records, health checks, and failover configurations across all three domains and their subdomains (admin, api, staging variants, etc.).
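As a rough sketch of how such an export could be scripted with the AWS CLI, the functions below mirror every bucket and dump CloudFront and Route53 state as JSON. The function names, the `SNAP_DIR` variable, and the output file layout are assumptions for illustration and may differ from the commands actually used.

```sh
# Hypothetical sketch; SNAP_DIR and both function names are illustrative.
SNAP_DIR="${SNAP_DIR:-v1.0}"

sync_all_buckets() {
    mkdir -p "$SNAP_DIR/s3-buckets"
    # With --output text, bucket names come back whitespace-separated
    for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
        aws s3 sync "s3://$bucket" "$SNAP_DIR/s3-buckets/$bucket" --only-show-errors
    done
}

export_cdn_and_dns() {
    mkdir -p "$SNAP_DIR/cloudfront" "$SNAP_DIR/route53/dns-records"
    aws cloudfront list-distributions --output json \
        > "$SNAP_DIR/cloudfront/distributions.json"
    aws route53 list-hosted-zones --output json \
        > "$SNAP_DIR/route53/hosted-zones.json"
    # Zone IDs look like "/hostedzone/Z123..."; keep only the last path segment
    for zone in $(aws route53 list-hosted-zones --query 'HostedZones[].Id' --output text); do
        zone_id="${zone##*/}"
        aws route53 list-resource-record-sets --hosted-zone-id "$zone_id" --output json \
            > "$SNAP_DIR/route53/dns-records/$zone_id.json"
    done
}
```

Note that `aws s3 sync` copies objects only. Bucket-level settings such as policies or versioning would need separate `aws s3api` calls (for example `get-bucket-policy`), which this sketch leaves out.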
Compute & Functions
Lambda function snapshots included:
- Source code (downloaded via AWS Lambda console export)
- Environment variable names and structure (values redacted for security)
- IAM execution role permissions
- Memory allocation, timeout settings, and concurrency limits
- VPC configuration and security group associations
- Layer dependencies and their versions
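For readers who prefer the CLI to a console export, one possible way to capture the same items for a single function is sketched below. `export_lambda`, `SNAP_DIR`, and the output file names are hypothetical, and the sketch assumes a zip-packaged function (container-image functions do not expose `Code.Location`).

```sh
# Hypothetical sketch: save one function's settings and deployment package.
export_lambda() {
    fn="$1"
    out="${SNAP_DIR:-v1.0}/lambda/functions/$fn"
    mkdir -p "$out"

    # Select specific fields and leave out Environment so secret values
    # are never written to disk
    aws lambda get-function-configuration --function-name "$fn" \
        --query '{Runtime: Runtime, Handler: Handler, MemorySize: MemorySize, Timeout: Timeout, Role: Role, VpcConfig: VpcConfig, Layers: Layers[].Arn}' \
        --output json > "$out/config.json"

    # get-function returns a short-lived presigned URL for the code package
    url=$(aws lambda get-function --function-name "$fn" \
        --query 'Code.Location' --output text)
    curl -sSf -o "$out/code.zip" "$url"
}

# Usage: export_lambda my-function-name
```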
Lightsail instance snapshots were initiated for persistent compute resources, capturing disk state, application configurations, and installed packages.
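Initiating such a snapshot from the CLI might look like the sketch below. The `snapshot_instance` helper is hypothetical, and the instance-version-date naming pattern is an assumption inferred from the snapshot name that appears in the directory listing.

```sh
# Hypothetical sketch: request a Lightsail snapshot with a versioned, dated name.
snapshot_instance() {
    instance="$1"
    version="$2"
    stamp=$(date -u +%Y%m%d)       # UTC date, e.g. 20260509
    aws lightsail create-instance-snapshot \
        --instance-name "$instance" \
        --instance-snapshot-name "$instance-$version-$stamp"
}

# Usage: snapshot_instance jada-agent v1.0
```

The call returns as soon as the snapshot is requested. The snapshot itself completes asynchronously, so a full run would still need to poll its state before declaring the capture done.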
Automation & Code
Four Google Apps Script projects were pulled using the clasp CLI:
# Run each clone from its own empty directory
# Main JADA GAS project
clasp clone [project-id]
# Rady Shell replacement GAS
clasp clone [project-id]
# Rady Shell old version (maintained for reference)
clasp clone [project-id]
# EYD (Elizabeth Y. Davis) GAS project
clasp clone [project-id]
These projects handle order processing, customer communication, inventory management, and data synchronization between Shopify, databases, and email services.
Directory Structure & Organization
The v1.0 snapshot was organized hierarchically for easy navigation and future version control:
v1.0/
├── s3-buckets/
│ ├── qos-prod-site/
│ ├── sailjada-prod/
│ ├── salejada-prod/
│ ├── [43 additional buckets]
│ └── MANIFEST.md (file counts, sync timestamps)
├── cloudfront/
│ ├── distributions.json (all 66 distributions)
│ ├── cache-behaviors.json
│ └── origins.json
├── route53/
│ ├── hosted-zones.json
│ └── dns-records/ (per domain)
├── lambda/
│ ├── functions/ (code + config for each of 21)
│ ├── layers/
│ └── permissions.json
├── gas-projects/
│ ├── jada-main/
│ ├── rady-replacement/
│ ├── rady-old/
│ └── eyd/
├── lightsail/
│ ├── snapshots/ (jada-agent-v1.0-20260509)
│ └── instance-configs.json
├── databases/
│ ├── dynamodb-schemas/
│ ├── rds-configs.json
│ └── table-exports/
├── integrations/
│ ├── ses-config.json
│ ├── api-gateway-apis.json
│ └── webhooks.json
└── MANIFEST.md (master inventory)
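The per-bucket manifest under `s3-buckets/` could be generated along the following lines. This is a minimal sketch that assumes the layout shown above; `write_s3_manifest` and the table format are illustrative choices, not the original manifest format.

```sh
# Hypothetical sketch: record file counts per bucket directory in MANIFEST.md.
write_s3_manifest() {
    root="$1/s3-buckets"
    manifest="$root/MANIFEST.md"
    {
        echo "# S3 snapshot manifest"
        echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo
        echo "| Bucket | Files |"
        echo "| --- | --- |"
        # The trailing slash in the glob matches directories only,
        # so MANIFEST.md itself is never counted
        for dir in "$root"/*/; do
            [ -d "$dir" ] || continue
            count=$(find "$dir" -type f | wc -l | tr -d ' ')
            echo "| $(basename "$dir") | $count |"
        done
    } > "$manifest"
}

# Usage: write_s3_manifest v1.0
```

Comparing two such manifests, for example production against staging or v1.0 against a later snapshot, gives a quick check of file-count parity.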
Key Technical Decisions
1. Parallel Agents Over Sequential Exports
Running four independent agents let services that do not depend on one another export simultaneously. Lambda code export doesn't depend on S3 sync completion, so why wait?
2. Redacted Environment Variables
Environment variable names and structure were captured to understand dependencies, but actual values (API keys, database passwords, credentials) were excluded from the snapshot tree. The values themselves live in a separate encrypted file under strict access control.
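One way to capture names without values is to have the AWS CLI return only the keys of the environment map, so the secrets never leave the API response. The sketch below assumes that approach; `env_var_names` is a hypothetical helper, and the original process may have redacted the values differently.

```sh
# Hypothetical sketch: print a function's environment variable names as a
# JSON array, never the values.
env_var_names() {
    # keys() is a built-in JMESPath function; the backtick-quoted {} is a
    # JSON literal used as a fallback when the function has no variables
    aws lambda get-function-configuration --function-name "$1" \
        --query 'keys(Environment.Variables || `{}`)' --output json
}

# Usage: env_var_names my-function-name > env-names.json
```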
3. Infrastructure as Documentation
Rather than maintaining separate documentation, the snapshot itself became the source of truth. CloudFront distribution IDs, Lambda function names, S3 bucket structures, and Route53 records are all captured in queryable formats (JSON, markdown manifests).
4. GAS Projects via Clasp
Using Google's clasp CLI meant GAS source code was versioned alongside infrastructure, enabling side-by-side comparison of automation changes. Each GAS project's appsscript.json manifest was captured to document library dependencies and OAuth scope requirements.
Validation & Verification
Post-snapshot, critical validations were performed:
- S3 file counts — Production vs. staging bucket parity confirmed (exact file counts per bucket documented)
- Lambda function count — All 21 functions accounted for with code size and memory configuration validated
- Route53 records — DNS record counts per zone verified; TTL values captured for failover scenarios
- GAS project consistency — Each project's script.json and code files