Preventing S3 Deployment Regressions: A Case Study in Staging-First Architecture and Pre-Flight Validation
Last week, a deployment to queenofsandiego.com regressed three features at once by pushing a stale local index.html over a newer version in production S3. The hero JADA→BOOK NOW crossfade and the Stripe embedded checkout flow disappeared, and a previously removed hero line came back, all from a single cp command. This post documents the failure mode, the architectural gaps that enabled it, and the hard rules we've added to prevent recurrence.
What Went Wrong: The Incident
The QOS deployment pipeline runs like this:
- Local development at /Users/cb/Documents/repos/sites/queenofsandiego.com/
- Staging S3 bucket: staging.queenofsandiego.com
- Production S3 bucket: queenofsandiego.com
- CloudFront distribution: d1j7ixr1hqpi2s.cloudfront.net (prod alias: www.queenofsandiego.com)
The agent pushed both staging and prod in a single command, without pulling the current prod version first to diff against local. The local index.html was ~6 hours stale (last touched during a prior session), but the prod S3 version contained 3 hours of newer CSS classes, Stripe integration code, and hero markup. Result: three features reverted simultaneously, invisible until the CloudFront cache expired.
Root causes:
- No pre-flight S3 diff before overwrite
- Staging and prod deployed in a single command (violates separation of concerns)
- Prior session summary warned about stale local files; warning ignored
- No snapshot of prod before overwriting (S3 versioning not enabled on this bucket)
- No proof block printed to chat before the destructive cp
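The missing-snapshot root cause is something a script can check before any overwrite. The following is a minimal sketch, assuming the AWS CLI is installed and has read access to the bucket configuration; the function names are hypothetical and not part of the QOS tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report whether a bucket keeps object history.

# Prints the bucket's versioning status. With --output text the CLI
# typically prints "Enabled", "Suspended", or "None" (never configured).
bucket_versioning_status() {
  local bucket="$1"
  aws s3api get-bucket-versioning \
    --bucket "$bucket" \
    --query 'Status' \
    --output text
}

# Returns 0 only when an overwrite could be rolled back through S3
# versioning; any other status means a manual snapshot is needed first.
overwrite_is_recoverable() {
  [ "$(bucket_versioning_status "$1")" = "Enabled" ]
}
```

A deploy script could call overwrite_is_recoverable queenofsandiego.com and refuse to continue without a manual backup when it returns non-zero.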
The Fix: Eight Hard Rules for S3 Deployments
We codified these rules into /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md, which auto-loads at the start of every QOS session. Each rule is numbered D1–D8 and written in plain English so an AI agent can parse and follow them without interpretation:
- D1 (Pull and Diff Before Edit): Before touching any file bound for S3, run aws s3 cp s3://queenofsandiego.com/index.html ./index.html.s3-prod and then diff -u index.html.s3-prod index.html. Print the diff to chat. If prod is ahead, stop and escalate.
- D2 (Staging-Only Single-Target Deploys): Deploy to staging first, alone. Never deploy staging and prod in the same command. Separate the two commands by ≥5 minutes so the cache can expire.
- D3 (One File Per Logical Change): Each commit and each deployment targets one file or one cohesive feature block. If you're deploying index.html and styles.css in the same push, split it.
- D4 (Obey Your Own Prior Session Warnings): If a prior session summary says "local files may be stale," treat it as blocking. Pull S3 and verify timestamps before proceeding.
- D5 (Snapshot Prod Before Overwrite): Since S3 versioning is not enabled, manually copy prod to a dated backup with aws s3 cp s3://queenofsandiego.com/index.html s3://queenofsandiego.com/backups/index.html.$(date +%Y%m%d-%H%M%S), then print the backup path to chat.
- D6 (Print a Six-Line Proof Block Before Any cp): Before executing aws s3 cp, print to chat: file path, S3 target, local timestamp, S3 timestamp, the diff summary, and an explicit "PROCEED Y/N?" prompt. Wait for approval.
- D7 (Maintain a Feature Token Registry): Keep a plaintext file at sites/queenofsandiego.com/S3_FEATURES.txt listing every deployed feature and a unique grep token (e.g., "JADA-CROSSFADE: data-fade-state="). Before deploying, grep the S3 prod version for all tokens. If any token is missing, prod has regressed; stop and investigate.
- D8 (Escalate When S3 Is Ahead of Local): If the diff in D1 shows prod is newer, do not overwrite. Post the diff to the session, mark it BLOCKING, and message CB with the diff and a timestamp.
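Rules D1, D7, and D8 lend themselves to a small pre-flight script. The sketch below is one possible shape, not the project's actual tooling: the function names are hypothetical, the registry is assumed to hold one "NAME: token" entry per line (matching the example in D7), and the final check on the local file is an assumed extension of D7 that treats a token missing locally as a sign that local is behind prod.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for rules D1, D7 and D8.

# D1: download the live object next to the local file.
pull_prod() {
  local bucket="$1" key="$2" dest="$3"
  aws s3 cp "s3://${bucket}/${key}" "$dest"
}

# D7: print the name of every registry entry whose token is absent
# from the target file. Blank lines and lines starting with # are skipped.
missing_tokens() {
  local registry="$1" target="$2" line name token
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in ''|'#'*) continue ;; esac
    name="${line%%: *}"    # text before the first ": "
    token="${line#*: }"    # text after the first ": "
    grep -qF -- "$token" "$target" || printf '%s\n' "$name"
  done < "$registry"
}

# Returns non-zero when the deploy must be blocked.
preflight() {
  local local_file="$1" prod_copy="$2" registry="$3" gone
  # D1: show what the deploy would change (diff exits 1 when files differ).
  diff -u "$prod_copy" "$local_file" || true
  # D7: a token missing from prod means prod itself has regressed.
  gone="$(missing_tokens "$registry" "$prod_copy")"
  if [ -n "$gone" ]; then
    printf 'BLOCKING: prod is missing: %s\n' "$gone" >&2
    return 1
  fi
  # Assumed extension: a token missing locally means this deploy would
  # remove a live feature, so escalate instead of overwriting (D8).
  gone="$(missing_tokens "$registry" "$local_file")"
  if [ -n "$gone" ]; then
    printf 'BLOCKING: local is missing: %s\n' "$gone" >&2
    return 1
  fi
}
```

A session would run pull_prod queenofsandiego.com index.html index.html.s3-prod, then preflight index.html index.html.s3-prod S3_FEATURES.txt, and stop on any non-zero exit. The diff alone cannot say which side is newer, which is why the token check is used here as the blocking signal.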
Infrastructure and Deployment Mechanics
QOS uses a straightforward S3 + CloudFront + Route53 setup:
- S3 buckets: queenofsandiego.com (prod) and staging.queenofsandiego.com (staging). Both have public-read ACLs on index.html, styles.css, and assets.
- CloudFront distribution: d1j7ixr1hqpi2s.cloudfront.net, with CNAME alias www.queenofsandiego.com. Cache TTL is 3600s (1 hour) for HTML, 86400s for static assets. Invalidation is manual via AWS CLI.
- Deployment command (staging):
    aws s3 cp index.html s3://staging.queenofsandiego.com/index.html \
      --content-type text/html \
      --acl public-read \
      --cache-control "max-age=3600"
- Deployment command (prod):
    aws s3 cp index.html s3://queenofsandiego.com/index.html \
      --content-type text/html \
      --acl public-read \
      --cache-control "max-age=3600"
- CloudFront invalidation (after prod deploy). Note that --distribution-id expects the distribution's ID as listed in the CloudFront console, not the d1j7ixr1hqpi2s domain prefix, so a placeholder is shown:
    aws cloudfront create-invalidation \
      --distribution-id <DISTRIBUTION_ID> \
      --paths "/*"
Route53 points queenofsandiego.com and www.queenofsandiego.com to the CloudFront distribution via A record (alias). staging.queenofsandiego.com points directly to the staging S3 bucket endpoint.
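The prod deploy command can be wrapped so that the D5 snapshot and the D6 proof block always run before the overwrite. This is a minimal sketch under stated assumptions: the function names and the proof-block labels are hypothetical, and the caller supplies the two timestamps and the diff summary as arguments rather than the script computing them.

```shell
#!/usr/bin/env bash
# Hypothetical guarded deploy for rules D5 and D6.

# D5: build the dated backup key used in the rule's example command.
backup_key() {
  printf 'backups/%s.%s' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# D6: print the six-line proof block. All values are passed in, so this
# function has no side effects.
proof_block() {
  local file="$1" target="$2" local_ts="$3" s3_ts="$4" summary="$5"
  printf 'FILE:      %s\n' "$file"
  printf 'TARGET:    %s\n' "$target"
  printf 'LOCAL TS:  %s\n' "$local_ts"
  printf 'S3 TS:     %s\n' "$s3_ts"
  printf 'DIFF:      %s\n' "$summary"
  printf 'PROCEED Y/N?\n'
}

# Snapshot prod, show the proof block, wait for an explicit "Y" on
# stdin, and only then overwrite the prod object.
guarded_deploy() {
  local file="$1" bucket="$2" key="$3" answer
  local target="s3://${bucket}/${key}"
  local backup="s3://${bucket}/$(backup_key "$key")"

  aws s3 cp "$target" "$backup" || return 1       # D5 snapshot
  echo "BACKUP: $backup"

  proof_block "$file" "$target" "$4" "$5" "$6"    # D6 proof
  read -r answer
  if [ "$answer" != "Y" ]; then
    echo "Aborted; prod untouched."
    return 1
  fi

  aws s3 cp "$file" "$target" \
    --content-type text/html \
    --acl public-read \
    --cache-control "max-age=3600"
}
```

Anything other than a literal Y aborts after the snapshot, so a mistyped answer costs one extra backup object and leaves prod as it was. The hard-coded text/html content type mirrors the index.html command above and would need to change for other files.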
Why These Rules Matter
S3 overwrites are destructive at the API level: there is no "undo" unless you have versioning or backups. A single cp command can destroy hours of work. The