Implementing Comprehensive Infrastructure Snapshots for Multi-Site AWS Deployments

```html

Managing three interconnected e-commerce properties (queenofsandiego.com, sailjada.com, salejada.com) with distributed AWS infrastructure requires meticulous version control and disaster recovery capabilities. This post documents the v1.0 snapshot strategy we implemented to capture the complete infrastructure state across S3, Lambda, CloudFront, Route53, and Google Apps Script projects.

The Problem: Distributed State Without Recovery Points

Operating 45+ S3 buckets, 66 CloudFront distributions, 21 Lambda functions, and 4 Google Apps Script projects across three domains created a complex system where individual component changes weren't being tracked holistically. Without a comprehensive snapshot mechanism, rolling back problematic deployments became time-consuming and error-prone. The solution required capturing not just code and configuration, but the complete infrastructure graph at a point in time.

Technical Architecture: Multi-Layer Snapshot Strategy

We split the snapshot into six layers and used four parallel agents (described under Parallel Agent Architecture) to capture them simultaneously:

Layer 1: Compute Infrastructure (Lightsail)

Resource: Lightsail instance running jada-agent
Snapshot ID: jada-agent-v1.0-20260509
Capture method: AWS Lightsail snapshot API
Includes: System configuration, installed tools (clasp, AWS CLI, node modules), environment setup

Layer 2: Object Storage (S3 Inventory)

Synced all 45 JADA-related S3 buckets into local snapshot directories using parallel batch operations. Key buckets included:

Production buckets: queenofsandiego.com, sailjada.com, salejada.com
Staging variants: qos-staging, sailjada-staging, salejada-staging
Asset buckets: brand CSS, media libraries, archive storage
Sync command structure: aws s3 sync s3://bucket-name /snapshot/v1.0/s3/bucket-name (sync is recursive by default and does not take a --recursive flag)

Total sync: more than 68 MB of production and staging content across all properties. A final file-count check confirmed that no sync was left partial.
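
The per-bucket sync can be scripted. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the helper names sync_command and sync_buckets and the injectable run parameter are illustrative and not part of the original tooling.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List

SNAPSHOT_ROOT = Path("/snapshot/v1.0/s3")  # layout described in this post


def sync_command(bucket: str, root: Path = SNAPSHOT_ROOT) -> List[str]:
    """Build the `aws s3 sync` argv for one bucket (sync recurses by default)."""
    return ["aws", "s3", "sync", f"s3://{bucket}", str(root / bucket)]


def sync_buckets(
    buckets: Iterable[str],
    root: Path = SNAPSHOT_ROOT,
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Sync each bucket in turn and return the names whose sync failed."""
    failed = []
    for bucket in buckets:
        # A non-zero exit code means the sync did not complete cleanly.
        if run(sync_command(bucket, root)) != 0:
            failed.append(bucket)
    return failed
```

Returning the list of failed buckets, rather than stopping at the first error, lets a batch finish and leaves a short retry list.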

Layer 3: Serverless Functions (Lambda Export)

Extracted all 21 Lambda functions with complete configuration state:

Function code: Downloaded .zip from deployed version
Environment variables: Exported (values redacted in snapshot manifest)
Configuration: Memory, timeout, VPC settings, IAM role ARN references
Triggers: CloudWatch Events, API Gateway, S3, DynamoDB stream configurations

Export method: aws lambda get-function --function-name [name] --query 'Code.Location' --output text returns a short-lived presigned URL for the deployment package, which then has to be downloaded; aws lambda get-function-configuration supplies the metadata.
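
As a rough illustration of that export, the sketch below wraps the two CLI calls and keeps only the configuration fields listed above. It assumes the AWS CLI is on the PATH; the function names and the output file naming are hypothetical, and environment variables are deliberately left out of the summary.

```python
import json
import subprocess
import urllib.request
from pathlib import Path
from typing import List

# Fields of the get-function-configuration response kept in the snapshot.
SUMMARY_KEYS = ("FunctionName", "Runtime", "Handler",
                "MemorySize", "Timeout", "Role", "VpcConfig")


def code_location_command(name: str) -> List[str]:
    """argv that prints the presigned download URL of the deployment package."""
    return ["aws", "lambda", "get-function", "--function-name", name,
            "--query", "Code.Location", "--output", "text"]


def configuration_command(name: str) -> List[str]:
    """argv that prints the function configuration as JSON."""
    return ["aws", "lambda", "get-function-configuration",
            "--function-name", name, "--output", "json"]


def summarize_configuration(config: dict) -> dict:
    """Keep only the selected fields; missing fields become None."""
    return {key: config.get(key) for key in SUMMARY_KEYS}


def export_function(name: str, out_dir: Path) -> Path:
    """Download one function's .zip and write its configuration summary."""
    out_dir.mkdir(parents=True, exist_ok=True)
    url = subprocess.check_output(code_location_command(name), text=True).strip()
    zip_path = out_dir / f"{name}.zip"
    # The URL expires quickly, so download immediately after requesting it.
    urllib.request.urlretrieve(url, str(zip_path))
    raw = subprocess.check_output(configuration_command(name), text=True)
    summary = summarize_configuration(json.loads(raw))
    (out_dir / f"{name}.config.json").write_text(json.dumps(summary, indent=2))
    return zip_path
```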

Layer 4: DNS and Content Delivery

Captured complete CloudFront and Route53 configuration:

CloudFront: All 66 distributions with exact origin configurations, cache behaviors, TLS certificates, custom headers
Distribution IDs: Stored with domain mappings (e.g., E2QF3K8JZ9X for queenofsandiego.com production)
Route53: All 16 hosted zones with complete record sets including health checks and routing policies
ACM certificates: Certificate ARN references and domain validation records

Export command pattern: aws cloudfront list-distributions --query 'DistributionList.Items[*].[Id,DomainName,Origins.Items[0].DomainName]'
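
The output of that command is a list of three-element rows, which is easy to turn into the ID-to-domain mapping stored in the snapshot. A small sketch, assuming the command is run with --output json; the function names and dictionary keys are illustrative. The get-distribution-config call is the standard CLI subcommand for fetching one distribution's full configuration.

```python
import json
from typing import Dict, List

LIST_QUERY = "DistributionList.Items[*].[Id,DomainName,Origins.Items[0].DomainName]"


def list_command() -> List[str]:
    """argv for the listing command shown above, with JSON output."""
    return ["aws", "cloudfront", "list-distributions",
            "--query", LIST_QUERY, "--output", "json"]


def config_command(dist_id: str) -> List[str]:
    """argv that fetches the full configuration of one distribution."""
    return ["aws", "cloudfront", "get-distribution-config", "--id", dist_id]


def parse_distributions(raw: str) -> Dict[str, Dict[str, str]]:
    """Map each distribution ID to its CloudFront domain and first origin."""
    rows = json.loads(raw) or []  # the query yields null when nothing exists
    return {
        dist_id: {"domain": domain, "first_origin": origin}
        for dist_id, domain, origin in rows
    }
```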

Layer 5: Google Apps Script Projects

Used clasp (Google Apps Script CLI) to pull all project source code:

Main JADA GAS: Core business logic and automation scripts
Rady Shell (Replacement): Updated implementation of key workflows
Rady Shell (Legacy): Previous version for reference
EYD GAS: Event-driven automation project

Snapshot structure: /snapshot/v1.0/gas/[project-name]/src/ with all .gs and .json files preserved.
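
One way to produce that layout is to give each project directory a .clasp.json that points at its script and at a src/ subdirectory, then run clasp pull there. This is a sketch under assumptions: clasp is installed and already authenticated, the script IDs are supplied by the caller, and the helper names are hypothetical.

```python
import json
import subprocess
from pathlib import Path

GAS_ROOT = Path("/snapshot/v1.0/gas")  # layout described in this post


def prepare_project(name: str, script_id: str, root: Path = GAS_ROOT) -> Path:
    """Create <root>/<name>/ with a .clasp.json that pulls sources into src/."""
    project_dir = root / name
    (project_dir / "src").mkdir(parents=True, exist_ok=True)
    settings = {"scriptId": script_id, "rootDir": "src"}
    (project_dir / ".clasp.json").write_text(json.dumps(settings, indent=2))
    return project_dir


def pull_project(project_dir: Path, run=subprocess.call) -> int:
    """Run `clasp pull` inside the project directory and return its exit code."""
    return run(["clasp", "pull"], cwd=project_dir)
```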

Layer 6: Configuration and Documentation

Created a comprehensive MANIFEST.md documenting:

Snapshot creation timestamp and version identifier
Complete inventory of all S3 buckets with file counts
Lambda function names, runtime versions, and handler paths
CloudFront distribution IDs mapped to primary domains
Route53 hosted zone IDs and record counts
DynamoDB table names and key schemas (sensitive data redacted)
GAS project IDs and script function names
Checksum manifests for integrity verification
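
The checksum manifests in the last item could be generated along these lines. This is a minimal sketch using SHA-256 from the Python standard library; the post does not say which hash or format was actually used, so both are assumptions, as are the function names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large assets do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_checksum_manifest(root: Path) -> Dict[str, str]:
    """Map every file under root (by relative POSIX path) to its SHA-256."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return paths that are missing, added, or differ from the manifest."""
    current = build_checksum_manifest(root)
    names = set(current) | set(manifest)
    return sorted(n for n in names if current.get(n) != manifest.get(n))
```

Running changed_files against a stored manifest before a restore gives a quick signal that the snapshot has not been altered or truncated.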

Key Implementation Decisions

Parallel Agent Architecture

Rather than sequential downloads that would take hours, we launched four independent background agents simultaneously:

Agent 1: S3 bucket sync operations (batch A: 24 buckets, batch B: 21 buckets)
Agent 2: Lambda function export and code download
Agent 3: AWS service configuration export (CloudFront, Route53, DynamoDB, API Gateway, SES)
Agent 4: Local project files and Google Apps Script code via clasp

This approach reduced total snapshot time from an estimated 120+ minutes to approximately 25 minutes of wall-clock time.
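
The fan-out itself needs very little code. The sketch below is an illustration, not the original agent launcher: it runs one callable per agent on a thread pool, which suits work that mostly waits on network and subprocess I/O. The agent names and callables passed in are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_agents(agents: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Run every agent concurrently; map each name to its result or exception."""
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(task) for name, task in agents.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure so one agent cannot hide the others' output.
                results[name] = exc
    return results
```

Collecting exceptions instead of re-raising them means a failed Lambda export, for example, does not discard a completed S3 sync.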

Environment Variable Strategy

Rather than storing actual secrets in the snapshot, we captured only the structure and key names. The values remain in AWS Secrets Manager and Systems Manager Parameter Store, so a compromised snapshot doesn't expose credentials. The snapshot configuration includes pointers to the actual secret locations.
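
For Lambda configurations, keeping key names while dropping values can be done on the JSON returned by get-function-configuration, where variables live under Environment.Variables. A minimal sketch; the placeholder string and function name are assumptions.

```python
import copy


def redact_environment(config: dict, placeholder: str = "<redacted>") -> dict:
    """Return a copy of a function configuration with env values masked."""
    redacted = copy.deepcopy(config)  # never mutate the caller's data
    variables = redacted.get("Environment", {}).get("Variables", {})
    for key in variables:
        # Key names survive so the snapshot still documents what is required.
        variables[key] = placeholder
    return redacted
```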

Staging/Production Parity Verification

We implemented file count validation to ensure that both staging and production environments were fully captured, for example by verifying that the qos-staging bucket had a file count equivalent to the production queenofsandiego.com bucket, with expected differences documented.
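
A parity check of this kind might look like the sketch below. Note that it compares relative file names as well as counts, which is slightly stricter than the count check described here; the expected argument stands in for the documented differences, and all names are illustrative.

```python
from pathlib import Path
from typing import Dict, Iterable, Set


def relative_files(root: Path) -> Set[str]:
    """Relative POSIX paths of every file under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def parity_report(
    production: Path, staging: Path, expected: Iterable[str] = ()
) -> Dict[str, object]:
    """Count files on both sides and list differences that were not expected."""
    prod = relative_files(production)
    stage = relative_files(staging)
    differing = prod ^ stage  # present on exactly one side
    return {
        "production_count": len(prod),
        "staging_count": len(stage),
        "unexpected": sorted(differing - set(expected)),
    }
```

Two directories can hold the same number of files and still differ, which is why the report lists names rather than relying on the counts alone.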

Storage Organization

/snapshot/v1.0/
├── s3/                          # All 45 bucket syncs
│   ├── queenofsandiego.com/
│   ├── sailjada.com/
│   ├── salejada.com/
│   ├── qos-staging/
│   └── [41 additional buckets]
├── lambda/                      # All 21 function exports
│   ├── function-names.txt       # Function inventory
│   ├── configurations.json      # Complete metadata
│   └── [function-code-zips]/
├── gas/                         # Google Apps Script projects
│   ├── jada-main/