Building a Comprehensive Infrastructure Snapshot: Lessons from a Multi-Cloud Staging Environment
When working with distributed infrastructure spanning multiple cloud providers, services, and local systems, the ability to capture a complete point-in-time snapshot becomes critical. This post details the technical approach we used to create a v1.0 snapshot of a complex JADA ecosystem comprising three production websites, 45+ S3 buckets, 66 CloudFront distributions, 21 Lambda functions, and multiple Google Apps Script projects.
The Challenge: Capturing Distributed State
The JADA infrastructure spans several layers:
- AWS Services: S3 buckets, CloudFront distributions, Lambda functions, Route53 hosted zones, DynamoDB tables, API Gateway endpoints, ACM certificates, and SES configurations
- Google Cloud: Four distinct Google Apps Script projects for different business functions
- AWS Lightsail: Virtual server instances requiring VM-level snapshots
- Local Development: Repository code, tooling scripts, configuration files, and documentation
A point-in-time backup of any single layer would be insufficient on its own. We needed snapshots of every layer, captured within as narrow a window as possible so the layers remain consistent with one another.
Technical Architecture: Parallel Multi-Agent Approach
Rather than running these operations sequentially, which would have taken hours, we split the work across four agents running in parallel:
Agent 1: S3 Sync Operations
├── sync queenofsandiego.com buckets
├── sync sailjada.com buckets
├── sync salejada.com buckets
├── sync auxiliary buckets
└── parallel batch processing (A/B/C splits)
Agent 2: Lambda & Code Export
├── export all 21 Lambda function code zips
├── capture environment variables
├── extract IAM role policies
└── document configuration and triggers
Agent 3: AWS Configuration Export
├── describe all 66 CloudFront distributions
├── export Route53 hosted zone configurations
├── capture 14 DynamoDB table schemas
├── extract API Gateway definitions
└── document SES configuration and sending limits
Agent 4: Local System Snapshots
├── clasp pull all GAS projects
├── sync repository directories
├── capture development tooling
└── archive documentation and handoffs
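The post doesn't show how the four agents were launched. As a minimal sketch, assuming each agent can be wrapped as a zero-argument Python callable, a thread pool runs them concurrently and records when each started and finished, which is also how you would measure the overall capture window. The names run_agents, capture_window, and split_batches are hypothetical, and the round-robin split is only one assumed way to produce the A/B/C bucket batches; the original split criteria are not stated.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_agents(agents):
    """Run every agent callable concurrently.

    agents maps a name to a zero-argument callable; the result maps each name
    to (return value, start time, end time) measured with time.monotonic().
    """
    def timed(name, fn):
        start = time.monotonic()
        result = fn()
        return name, result, start, time.monotonic()

    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(timed, name, fn) for name, fn in agents.items()]
        # f.result() re-raises any exception an agent hit, so failures are not silent.
        finished = [f.result() for f in futures]
    return {name: (result, start, end) for name, result, start, end in finished}


def capture_window(timings):
    """Seconds from the first agent starting to the last agent finishing."""
    starts = [start for _, start, _ in timings.values()]
    ends = [end for _, _, end in timings.values()]
    return max(ends) - min(starts)


def split_batches(items, n=3):
    """Round-robin items into n batches (hypothetical A/B/C bucket splits)."""
    return [items[i::n] for i in range(n)]
```

Threads fit this workload because each agent spends its time waiting on network-bound CLI or API calls rather than on Python computation.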
S3 and CloudFront Infrastructure Details
The snapshot captured the production S3 bucket structure for three primary sites. For each domain, we created corresponding staging buckets with prefixes:
- queenofsandiego.com → synced to staging with CloudFront distribution d***staging-qos
- sailjada.com → synced to staging with corresponding distribution
- salejada.com → synced to staging with corresponding distribution
Each CloudFront distribution required invalidation after staging updates to ensure cache coherency. The invalidation pattern used was:
aws cloudfront create-invalidation \
--distribution-id D*** \
--paths "/*" \
--query 'Invalidation.Id' \
--output text
This approach ensures staging serves fresh content while maintaining production integrity through separate distribution IDs.
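For scripted runs, the same invalidation can be issued through boto3's CloudFront client, whose create_invalidation call takes a DistributionId and an InvalidationBatch. This sketch assumes boto3 is installed and credentials are configured; the helper names are hypothetical, and the real distribution IDs (redacted as D*** above) must be supplied by the caller.

```python
import time


def build_invalidation_batch(paths=("/*",), caller_reference=None):
    """Build the InvalidationBatch argument for boto3's create_invalidation."""
    paths = list(paths)
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        # CallerReference must be unique per request; a timestamp is a simple choice.
        "CallerReference": caller_reference or f"snapshot-{time.time_ns()}",
    }


def invalidate_all(distribution_ids, paths=("/*",)):
    """Invalidate each staging distribution; returns {distribution_id: invalidation_id}."""
    import boto3  # imported lazily so the batch builder works without boto3 installed

    client = boto3.client("cloudfront")
    ids = {}
    for dist_id in distribution_ids:
        response = client.create_invalidation(
            DistributionId=dist_id,
            InvalidationBatch=build_invalidation_batch(paths),
        )
        ids[dist_id] = response["Invalidation"]["Id"]
    return ids
```

Returning the invalidation IDs mirrors the --query 'Invalidation.Id' filter in the CLI command, so the caller can poll for completion before validating staging.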
Lambda Function Inventory and Deployment State
We captured all 21 Lambda functions with:
- Source code exported via aws lambda get-function, whose Code.Location field is a presigned URL for downloading each deployment package
- Environment variable configurations preserved without exposing values
- IAM role associations documented for permission matrix reconstruction
- Trigger configurations (API Gateway, S3 events, scheduled rules) recorded for infrastructure-as-code regeneration
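A minimal sketch of the export step, assuming the JSON printed by aws lambda get-function has been saved for each function and that the functions are zip-packaged (container-image functions return an ImageUri instead of a Location). Keeping only the environment variable names is one way to preserve the configuration without exposing values; the post doesn't say how it was done, and the helper names are hypothetical.

```python
import json
import urllib.request


def load_response(path):
    """Read JSON saved from `aws lambda get-function --function-name NAME`."""
    with open(path) as fh:
        return json.load(fh)


def summarize_function(response):
    """Reduce a get-function response to a config record with env values redacted."""
    config = response["Configuration"]
    env = config.get("Environment", {}).get("Variables", {})
    return {
        "FunctionName": config["FunctionName"],
        "Runtime": config.get("Runtime"),
        "Handler": config.get("Handler"),
        "Role": config.get("Role"),
        # Keep variable names only, so secret values never land in the snapshot.
        "EnvironmentKeys": sorted(env),
    }


def download_package(response, dest_path):
    """Save the deployment zip from the presigned Code.Location URL."""
    url = response["Code"]["Location"]  # present for zip-packaged functions only
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())
```

The presigned URL is only valid for a short time (up to 10 minutes), so the download should run right after get-function rather than in a later pass.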
The snapshot included the update_dashboard.py deployment utility, which orchestrates Lambda function updates across the JADA ecosystem. This tool was modified during the staging workflow to support better error reporting and batch operations.
Google Apps Script Projects: Clasp Integration
Four GAS projects were pulled via clasp:
- Main JADA GAS project (core automation)
- Rady Shell replacement GAS
- Rady Shell legacy/old GAS
- EYD GAS project (event-specific automation)
Each project was exported to /Users/cb/Documents/repos/memory/snapshot-v1.0/gas/ with subdirectories preserving project structure. The clasp workflow used:
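clasp clone [script-id] --rootDir /snapshot/path/project-name
clasp pull does not accept a project ID; it refreshes a project that already has a .clasp.json, so the first export of each project goes through clasp clone and later refreshes can run clasp pull from inside that directory.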
This captured not only source code but also .clasp.json configuration files containing project script IDs for future redeploy scenarios.
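To turn those files into a redeploy index, a sketch like the following walks the gas/ directory and maps each project folder to its scriptId. It assumes, as described above, that every project subdirectory holds its own .clasp.json; the function names and the folder names in the test data are hypothetical.

```python
import json
from pathlib import Path


def collect_script_ids(gas_root):
    """Map each project folder under gas_root to the scriptId in its .clasp.json."""
    ids = {}
    for clasp_file in sorted(Path(gas_root).glob("*/.clasp.json")):
        config = json.loads(clasp_file.read_text())
        ids[clasp_file.parent.name] = config["scriptId"]
    return ids


def write_index(gas_root, out_path):
    """Persist the folder -> scriptId map next to the snapshot for redeploys."""
    ids = collect_script_ids(gas_root)
    Path(out_path).write_text(json.dumps(ids, indent=2, sort_keys=True) + "\n")
    return ids
```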
Critical Staging Workflow Decisions
During snapshot creation, we encountered and documented three critical staging synchronization issues:
1. Font Rendering in Staging
The staging environment showed letter-spacing inconsistencies in brand headers. We identified the issue in the brand CSS styles and modified the letter-spacing property while preserving the original text-transform declarations. The fix was applied to the staging index files and validated before CloudFront invalidation.
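The post doesn't give the selector or the new value, so the following only sketches the kind of edit it describes: rewrite letter-spacing inside one rule and leave every other declaration, including text-transform, untouched. The selector .brand-title and the values in the test data are hypothetical. A regex is adequate for a targeted fix like this but is not a CSS parser: it matches the selector text literally, only rewrites declarations that end in a semicolon, and never adds a letter-spacing declaration that isn't already there.

```python
import re


def set_letter_spacing(css, selector, value):
    """Rewrite letter-spacing inside every rule for `selector`.

    Other declarations in the rule (text-transform included) are left as-is.
    """
    rule = re.compile(r"(" + re.escape(selector) + r"\s*\{)([^}]*)(\})")
    decl = re.compile(r"letter-spacing\s*:\s*[^;]+;")

    def fix(match):
        # A function replacement avoids backslash escapes in `value` being interpreted.
        body = decl.sub(lambda _: f"letter-spacing: {value};", match.group(2))
        return match.group(1) + body + match.group(3)

    return rule.sub(fix, css)
```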
2. Product Pricing References
The Bob Dylan product page contained hardcoded price references that required staging-specific values. We downloaded the production version from the bobdylan bucket, identified all $225 references, updated them for staging, and deployed to the staging path with CloudFront cache invalidation.
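Finding and swapping the price can be scripted so no reference is missed. This sketch assumes the page is plain HTML text and that the staging price is passed in; the post doesn't state the staging value, so "199" in the test data is a placeholder. The negative lookahead keeps amounts such as $2250 or $225,000 from matching.

```python
import re

# "$225" as a standalone amount: not followed by another digit or a thousands comma.
PRICE_PATTERN = re.compile(r"\$225(?![\d,])")


def find_price_refs(html):
    """Return (line_number, line) pairs that mention the $225 price."""
    return [(n, line) for n, line in enumerate(html.splitlines(), start=1)
            if PRICE_PATTERN.search(line)]


def replace_price(html, new_price):
    """Swap every standalone $225 for new_price; returns (new_html, count)."""
    return PRICE_PATTERN.subn(lambda _: f"${new_price}", html)
```

Reporting the count alongside the new HTML gives a quick check that the number of replacements matches the references found before deploying to the staging path.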
3. Navigation and Event Page Consistency
The events page required synchronization between production and staging, particularly for the "James Taylor" and "All Events" navigation elements. We used the S3 API to list the objects under the events/ prefix so staging could be compared against production before syncing:
aws s3api list-objects-v2 \
--bucket sailjada-staging \
--prefix events/ \
--output table
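The table output is meant for reading; with --output json, the same command returns a Contents array whose entries carry Key and ETag fields. A minimal sketch, assuming listings of both buckets under the same events/ prefix have been parsed from JSON (the production bucket name isn't given in the post), diffs them by key and ETag; the function names are hypothetical.

```python
def index_listing(listing):
    """Map Key -> ETag from parsed `aws s3api list-objects-v2 --output json` output."""
    # A prefix with no objects returns no "Contents" key at all.
    return {obj["Key"]: obj["ETag"] for obj in listing.get("Contents", [])}


def diff_listings(production, staging):
    """Compare two listings taken under the same prefix (e.g. events/)."""
    prod, stage = index_listing(production), index_listing(staging)
    return {
        "missing_in_staging": sorted(prod.keys() - stage.keys()),
        "extra_in_staging": sorted(stage.keys() - prod.keys()),
        "changed": sorted(k for k in prod.keys() & stage.keys() if prod[k] != stage[k]),
    }
```

ETags of multipart uploads are not plain MD5 digests, so identical content uploaded in different ways can show up under "changed"; treat that list as candidates to inspect rather than confirmed differences.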
Infrastructure Snapshot Storage
The complete v1.0 snapshot was organized into the following directory structure:
/snapshot-v1.0/
├── MANIFEST.md # Complete inventory and checksum index
├── s3-buckets/ # All 45 synced buckets (68MB+)
├── lambda/ # 21 function codes + configurations
├── cloudfront/ # 66 distribution definitions
├── route53/ # 16 hosted zone exports
├── dynamodb/ # 14 table schemas and backups
├── gas/ # 4 GAS projects with source
├── lightsail/ # VM snapshot identifier
├── local-repos/ # Development repositories
└── tools/ # Deployment and utility scripts
A MANIFEST.md file was generated documenting every resource, file count, and checksum for verification purposes.
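The post doesn't specify the checksum algorithm or the manifest layout. Assuming SHA-256 and a simple Markdown table, a generator could look like this; build_manifest and sha256_of are hypothetical names, and file counts are grouped by top-level directory (s3-buckets/, lambda/, and so on), with files at the snapshot root counted under ".".

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large bucket objects don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Return (markdown_text, counts) for every file under root except MANIFEST.md."""
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.name != "MANIFEST.md")
    counts = {}
    lines = ["# Snapshot manifest", "", "| File | SHA-256 |", "| --- | --- |"]
    for path in files:
        rel = path.relative_to(root)
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[top] = counts.get(top, 0) + 1
        lines.append(f"| {rel.as_posix()} | {sha256_of(path)} |")
    lines += ["", "## File counts", ""]
    lines += [f"- {name}: {n}" for name, n in sorted(counts.items())]
    return "\n".join(lines) + "\n", counts
```

Excluding MANIFEST.md itself keeps the manifest stable when it is regenerated in place; verification is then a matter of rebuilding it and comparing the two texts.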
Lessons Learned: Why Parallel Agents Mattered
Sequential operations would have taken 4-6 hours. The parallel four-agent approach reduced this to approximately 30 minutes. More importantly, it narrowed the capture window: all components were captured within the same short span, which limits the state drift that long sequential backups invite, although a 30-minute window is still not an atomic snapshot.
What's Next
With the v1.0 snapshot complete, future work includes:
- Automated daily snapshots using the same parallel methodology
- Differential backup system to reduce storage and bandwidth
- Infrastructure-as-Code generation