Comprehensive Infrastructure Snapshot Strategy: Capturing JADA's Multi-Tenant Architecture at v1.0
After discovering critical data loss in production event pages, we implemented an emergency full-stack snapshot of the entire JADA infrastructure spanning three sites (queenofsandiego.com, sailjada.com, salejada.com). This post documents the technical approach, resource inventory, and automated snapshot architecture we deployed to prevent future incidents.
The Problem: Why We Needed v1.0
A previous operation inadvertently reverted work on event pages without proper version control checkpoints. Rather than manually recovering individual components, we decided to implement a comprehensive infrastructure snapshot capturing everything in a single deterministic state. This required identifying and backing up 45 distinct S3 buckets, 41 CloudFront distributions, 21 Lambda functions, 11 Route53 hosted zones, and all associated code, configuration, and environment variables.
Inventory: Complete JADA Resource Map
Before snapshotting, we performed an exhaustive audit across AWS services:
- S3 Buckets (45 total): Document storage, media assets, static site hosting, Lambda layer distributions, backups, and CloudFront origin buckets
- CloudFront Distributions (41 active): CDN configurations for all three domains plus API endpoints, WAF rules, and cache behaviors
- Lambda Functions (21 total): Event handlers, API endpoints, scheduled tasks, image processing, email delivery, and data transformation
- Route53 Hosted Zones (11 active): DNS records for primary domains and subdomains, health checks, weighted routing policies
- DynamoDB Tables (14 found): Events, users, bookings, notifications, analytics, session data
- Lightsail Instance: jada-agent-v1.0-20260509 — development/staging environment
- ACM Certificates: All SSL/TLS certificates across domains
- SES Configuration: Email sending infrastructure and verified sender addresses
- API Gateway: REST and HTTP APIs routing to Lambda and backend services
- IAM Roles & Policies: Service roles, cross-account access, permission boundaries
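The headline counts in this inventory can be re-derived from the CLI whenever the audit needs repeating. The following is a minimal sketch, assuming the active AWS profile has list permissions and that regional services live in us-west-2 (the region used in the commands later in this post). The function name count_resources is hypothetical.

```shell
# REGION is an assumption; S3, CloudFront, and Route53 listings are global.
REGION="${REGION:-us-west-2}"

# Print one "service<TAB>count" line per resource type.
# JSON output is used so the length() query runs once over the full result.
count_resources() {
  printf 's3\t%s\n' "$(aws s3api list-buckets \
    --query 'length(Buckets)' --output json)"
  printf 'cloudfront\t%s\n' "$(aws cloudfront list-distributions \
    --query 'length(DistributionList.Items)' --output json)"
  printf 'lambda\t%s\n' "$(aws lambda list-functions --region "$REGION" \
    --query 'length(Functions)' --output json)"
  printf 'route53\t%s\n' "$(aws route53 list-hosted-zones \
    --query 'length(HostedZones)' --output json)"
  printf 'dynamodb\t%s\n' "$(aws dynamodb list-tables --region "$REGION" \
    --query 'length(TableNames)' --output json)"
}
```

Running count_resources before and after a snapshot gives a quick check that the numbers quoted in a write-up still match the account.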
Snapshot Architecture: Parallel Agent Strategy
We deployed four concurrent background agents, each handling a specific infrastructure layer:
Agent 1: S3 Bucket Synchronization
The first agent performed recursive synchronization of all 45 S3 buckets into a local snapshot directory structure.
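# Conceptual command (no credentials shown)
# sync is recursive by default and copies every object unless filters are given
aws s3 sync s3://[bucket-name] ./snapshots/v1.0/s3/[bucket-name]/ \
--region us-west-2
# For objects encrypted with customer-provided keys (SSE-C), also pass the key,
# assuming the key file holds the raw key bytes:
#   --sse-c AES256 --sse-c-key fileb://$HOME/.aws/sse-keys.txt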
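Why this approach: S3 contains both application data (media files, PDFs) and configuration (Lambda zip files, layer definitions). Syncing every bucket preserves the current contents of each object. Note that aws s3 sync downloads only the latest version of each object; version history, delete markers, and object metadata are not carried into the local copy and have to be exported separately if they matter for recovery. Status: 30/45 buckets synced (68 MB), with larger media buckets still transferring.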
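The per-bucket command can be wrapped in a loop that records progress, which is one way to produce a status line like the one above. This is a sketch only: the function name sync_all_buckets and the synced.txt and failed.txt files are hypothetical, and it assumes the active credentials can list and read every bucket.

```shell
SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-./snapshots/v1.0}"

# Sync every bucket visible to the current credentials.
# A failed bucket is recorded and the loop moves on to the next one.
sync_all_buckets() {
  mkdir -p "$SNAPSHOT_ROOT/s3"
  for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    if aws s3 sync "s3://$bucket" "$SNAPSHOT_ROOT/s3/$bucket/"; then
      echo "$bucket" >> "$SNAPSHOT_ROOT/s3/synced.txt"
    else
      echo "$bucket" >> "$SNAPSHOT_ROOT/s3/failed.txt"
    fi
  done
}
```

Because sync skips files that are already up to date locally, the loop can be re-run after an interruption without re-downloading completed buckets.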
Agent 2: Lambda Code & Configuration Export
The second agent exported all 21 Lambda functions including source code, environment variables, and runtime configuration.
# Export Lambda function code and configuration
aws lambda get-function --function-name [function-name] \
--region us-west-2 \
--query 'Code.Location' \
--output text | xargs -I {} curl {} -o ./snapshots/v1.0/lambda/[function-name].zip
# Export full function configuration (runtime, handler, memory, timeout, environment variables)
aws lambda get-function-configuration --function-name [function-name] \
--region us-west-2 \
> ./snapshots/v1.0/lambda/[function-name]-config.json
Key functions captured: Event API handlers, image processing pipelines, scheduled notification tasks, Stripe webhook processors, email delivery functions, and real-time WebSocket handlers. Status: 10/21 functions exported, remainder in progress.
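To cover all 21 functions, the per-function export can be driven from list-functions. The sketch below assumes us-west-2 and uses hypothetical names (export_lambda, export_all_lambdas, failed.txt). It relies on get-function returning a short-lived presigned URL for the deployment package.

```shell
SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-./snapshots/v1.0}"
REGION="${REGION:-us-west-2}"  # assumed region

# Export one function: configuration JSON first, then the code zip.
export_lambda() {
  name="$1"
  out="$SNAPSHOT_ROOT/lambda"
  mkdir -p "$out"
  aws lambda get-function-configuration --function-name "$name" \
    --region "$REGION" --output json > "$out/$name-config.json" || return 1
  url="$(aws lambda get-function --function-name "$name" \
    --region "$REGION" --query 'Code.Location' --output text)" || return 1
  # --fail makes curl return non-zero on an HTTP error such as an expired URL
  curl --fail --silent --show-error "$url" -o "$out/$name.zip"
}

# Export every function in the region; failures are listed, not fatal.
export_all_lambdas() {
  for name in $(aws lambda list-functions --region "$REGION" \
      --query 'Functions[].FunctionName' --output text); do
    export_lambda "$name" || echo "$name" >> "$SNAPSHOT_ROOT/lambda/failed.txt"
  done
}
```

The configuration JSON includes environment variables in plain text, so the snapshot directory should be treated as sensitive and kept out of version control.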
Agent 3: AWS Service Configuration Export
The third agent performed systematic exports across multiple AWS services using native CLI commands:
# CloudFront distribution backup
aws cloudfront list-distributions \
--region us-west-2 \
--output json > ./snapshots/v1.0/cloudfront/distributions-manifest.json
# Route53 zone export
aws route53 list-hosted-zones \
--output json > ./snapshots/v1.0/route53/hosted-zones-manifest.json
for zone_id in $(aws route53 list-hosted-zones --query 'HostedZones[].Id' --output text); do
# Ids look like /hostedzone/Z123..., so keep only the last path segment for the file name
aws route53 list-resource-record-sets --hosted-zone-id "$zone_id" \
> "./snapshots/v1.0/route53/${zone_id##*/}-records.json"
done
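Completed (11/11 Route53 zones, 41/41 CloudFront distributions): All DNS records, cache behaviors, and origin configurations are now backed up, along with each distribution's certificate reference and WAF web ACL association. The certificates and WAF rules themselves live in ACM and WAF and are not part of these exports.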
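list-distributions returns a summary per distribution. If a restore needs the full configuration document and its ETag, each distribution can also be exported individually with get-distribution-config. A minimal sketch, with the function name export_distribution_configs as a hypothetical choice:

```shell
SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-./snapshots/v1.0}"

# Write one <distribution-id>-config.json per CloudFront distribution.
export_distribution_configs() {
  mkdir -p "$SNAPSHOT_ROOT/cloudfront"
  for dist_id in $(aws cloudfront list-distributions \
      --query 'DistributionList.Items[].Id' --output text); do
    aws cloudfront get-distribution-config --id "$dist_id" --output json \
      > "$SNAPSHOT_ROOT/cloudfront/$dist_id-config.json"
  done
}
```

Each output file holds the DistributionConfig together with the ETag that CloudFront requires when updating a distribution.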
Agent 4: Local Application Code & Configuration
The fourth agent copied all local application code, configuration files, and development tools:
- queenofsandiego.com site source code and build artifacts
- sailjada.com site source code and build artifacts
- salejada.com site source code (in progress)
- Google Apps Script projects and deployments
- Handoff documentation and technical notes
- Development secrets manifest (redacted)
- LaunchAgent configurations for automated tasks
- Project memory files and decision logs
Lightsail Instance Snapshot
In parallel, we initiated a native AWS snapshot of the jada-agent-v1.0-20260509 Lightsail instance, capturing the complete disk state including on-disk databases, caches, and runtime configuration. The snapshot is processed on the AWS side and is expected to complete within about 15 minutes.
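Since the snapshot runs asynchronously, a script has to poll for completion. The sketch below starts a snapshot and waits for a terminal state. The function name snapshot_instance, the POLL_SECONDS variable, and the snapshot name in the usage note are hypothetical, and the Lightsail region is assumed to come from the active profile.

```shell
POLL_SECONDS="${POLL_SECONDS:-30}"

# Start an instance snapshot and block until Lightsail reports
# "available" (success) or "error" (failure).
snapshot_instance() {
  instance="$1"
  snapshot="$2"
  aws lightsail create-instance-snapshot \
    --instance-name "$instance" \
    --instance-snapshot-name "$snapshot" > /dev/null || return 1
  while :; do
    state="$(aws lightsail get-instance-snapshot \
      --instance-snapshot-name "$snapshot" \
      --query 'instanceSnapshot.state' --output text)" || return 1
    case "$state" in
      available) return 0 ;;
      error) return 1 ;;
    esac
    sleep "$POLL_SECONDS"  # still pending; check again later
  done
}
```

A call might look like `snapshot_instance jada-agent-v1.0-20260509 my-snapshot-name`, where the second argument is whatever snapshot name the team chooses.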
Key Technical Decisions
Why Parallel Agents Over Sequential Backup
A sequential approach would require ~2 hours total. Four concurrent agents reduce this to ~30 minutes. Each agent targets a different bottleneck (network I/O for S3, API rate limits for Lambda/CloudFront, local disk I/O for code), allowing maximum parallelization without hitting service quotas.
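In plain shell, the concurrency pattern amounts to starting each agent as a background job and waiting for all of them. The sketch below is an illustration of that pattern, not the tooling used for this snapshot. run_agents and LOG_DIR are hypothetical names, and each agent is assumed to be a shell function or command with a simple name.

```shell
LOG_DIR="${LOG_DIR:-./snapshots/v1.0/logs}"

# Run every named agent concurrently, each with its own log and status file,
# then return the number of agents that failed.
run_agents() {
  mkdir -p "$LOG_DIR"
  for agent in "$@"; do
    (
      if "$agent" > "$LOG_DIR/$agent.log" 2>&1; then echo 0; else echo "$?"; fi \
        > "$LOG_DIR/$agent.status"
    ) &
  done
  wait  # blocks until all background jobs of this shell have exited
  failures=0
  for agent in "$@"; do
    code="$(cat "$LOG_DIR/$agent.status")"
    if [ "$code" -ne 0 ]; then
      echo "agent $agent failed with exit code $code" >&2
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

With four functions defined for the four layers, a call such as `run_agents sync_s3 export_lambda export_services copy_local` (names illustrative) starts them together and reports how many failed.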
Storage Class & Preservation
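We did not re-encode, compress, or prune anything to save cost, so the file contents in the snapshot are byte-for-byte identical to production, which is critical for forensic analysis and recovery. Storage classes, encryption settings, and object metadata are S3-side properties that a local copy does not carry, so they have to be recorded alongside the files rather than assumed from them.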
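One way to make the identity claim checkable later is a checksum manifest over the snapshot directory. This is a sketch under assumptions: the function names are hypothetical, the manifest file is expected to live outside the directory being hashed, and either sha256sum (GNU coreutils) or shasum (macOS) is installed.

```shell
# Pick whichever SHA-256 tool is installed.
sha256_tool() {
  if command -v sha256sum > /dev/null 2>&1; then
    echo "sha256sum"
  else
    echo "shasum -a 256"
  fi
}

# Write "<hash>  <relative path>" lines for every file under a directory.
write_manifest() {
  dir="$1"
  manifest="$2"
  ( cd "$dir" && find . -type f -exec $(sha256_tool) {} + | sort -k 2 ) > "$manifest"
}

# Re-hash the files; returns non-zero if any file differs from the manifest.
verify_manifest() {
  dir="$1"
  manifest="$2"
  ( cd "$dir" && $(sha256_tool) -c - ) < "$manifest"
}
```

Two manifests taken at different times can also be compared with diff to see which files changed between snapshots.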
IAM Permission Handling
One agent encountered permission denied errors on certain IAM operations. Rather than blocking the entire snapshot, we logged these failures separately and will perform manual IAM export using the AWS Management Console with elevated permissions.
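The "log and continue" behavior can be sketched as a small wrapper around each export step. The names try_export and FAILURE_LOG are hypothetical, as is the log format.

```shell
FAILURE_LOG="${FAILURE_LOG:-./snapshots/v1.0/failures.log}"

# Run one export step. On failure, append a timestamped line to the failure
# log and still return success so the calling agent keeps going.
try_export() {
  label="$1"
  shift
  "$@" && return 0
  code="$?"
  mkdir -p "$(dirname "$FAILURE_LOG")"
  printf '%s\t%s\texit %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$label" "$code" >> "$FAILURE_LOG"
  return 0
}
```

For example, `try_export iam-roles aws iam list-roles --output json > roles.json` would record an access-denied failure under the label iam-roles instead of stopping the run, leaving a list of items to export manually afterwards.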
What's Next: Recovery & Prevention
Once the v1.0 snapshot is complete, we can:
- Forensic Analysis: Compare current state against snapshot to identify what changed and when
- Point-in-Time Recovery: Restore any component to this exact state if needed
- Automated Snapshots: Implement weekly snapshot