Preventing S3 Deployment Regressions: Hard Rules for Multi-Version Front-End Codebases
During a three-hour development session, a production deployment to queenofsandiego.com inadvertently reverted three working features by copying a stale local index.html over a newer S3 version. The incident wiped the hero JADA→BOOK NOW crossfade animation, removed the Stripe embedded checkout booking flow, and resurrected a previously deleted "For Ranch & Coast readers..." hero line. This post documents the root cause, the hard rules now in place to prevent recurrence, and the infrastructure patterns that caught the regression during validation.
What Happened: The Stale Local File Problem
The deployment pipeline worked correctly—files were copied from local disk to S3, CloudFront cache was invalidated, and the site went live. However, the local /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html had drifted behind what was already on S3 prod. When the cp command executed without a prior diff check, it overwrote the newer production version with the older local version.
The session summary from a prior interaction had explicitly warned: "before any edit to index.html, pull the S3 current version and diff it against local." That warning was ignored.
Root Causes
- No pre-deploy diff: The session did not pull S3 prod, diff local against it, and print the delta before deploying.
- Simultaneous staging + prod deploy: Both environments were promoted in a single command, violating the staging-first rule. This meant no gating point for review.
- No snapshot before overwrite: S3 versioning is not enabled, so the old production version was lost once the cp completed.
- No proof block: The session did not print a six-line manifest of exactly which files were being copied and to which bucket/prefix before executing.
- Ignored prior session guidance: The session had its own warnings and chose not to follow them.
The Eight Hard Rules Now Enforced
To prevent this class of regression, eight rules are now baked into /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md and load automatically at the start of every QOS development session:
- D1. Pull and Diff Before Edit: Before any modification to index.html, run aws s3 cp s3://qos-prod-www/index.html ./index.html.prod, then diff -u index.html.prod index.html | head -50. Print the delta. If S3 is ahead, stop and escalate.
- D2. Staging Only, Single Target: Every deploy targets exactly one environment. Deploy to s3://qos-staging-www/ first. Wait for human review. Only then promote to s3://qos-prod-www/ in a separate, logged command.
- D3. One File Per Logical Change: Each deployment is a single file or a tightly scoped group (e.g., "all three hero images"). No multi-file sweeps. This makes rollback surgical.
- D4. Obey Prior Session Warnings: If a prior session summary warns against an action, do not override it without explicit new instruction from the user.
- D5. Snapshot Prod Before Overwrite: Before any cp to S3 prod, download and timestamp the current version locally: aws s3 cp s3://qos-prod-www/index.html ./backups/index.html.$(date +%s). Keep the last five snapshots.
- D6. Print Proof Block: Before executing any cp or aws s3 sync, print a six-line proof block showing source file, destination bucket, destination prefix, file size, local modified time, and expected S3 etag. Wait for user acknowledgment.
- D7. Feature Token Registry: Maintain a file at /Users/cb/Documents/repos/sites/queenofsandiego.com/FEATURE_TOKENS.md listing every active feature and a unique grep-able token in the HTML. Before deploying, grep the local version for each token. After deploy, curl the live site and grep the response. If a token vanishes, roll back immediately and alert.
- D8. Escalate When S3 Ahead of Local: If a diff reveals that S3 is newer than local for any critical file, stop the session. Document the delta. Alert CB with the diff. Do not proceed without explicit instruction.
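Rules D1 and D6 could be wrapped in two small shell helpers, roughly as below. This is a minimal sketch, not the session's actual tooling: the function names diff_against_prod and print_proof_block are hypothetical, and the "expected etag" line assumes a single-part upload without SSE-KMS, where the S3 ETag is normally the MD5 of the file. Note that a diff only shows that the files differ; deciding whether S3 is "ahead" still takes a human reading the delta.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching rules D1 and D6.

# D1: compare a pulled prod copy with the local file and print the delta.
# Returns 0 when the two files are identical, 1 when they differ.
diff_against_prod() {
  local prod_copy="$1" local_file="$2"
  if cmp -s "$prod_copy" "$local_file"; then
    echo "No delta: local matches the prod copy."
    return 0
  fi
  diff -u "$prod_copy" "$local_file" | head -50
  return 1
}

# D6: print the six-line proof block for one file and one destination.
print_proof_block() {
  local src="$1" bucket="$2" prefix="$3"
  local size mtime etag
  size=$(wc -c < "$src" | tr -d ' ')
  # GNU stat first, then BSD/macOS stat as a fallback.
  mtime=$(stat -c %Y "$src" 2>/dev/null || stat -f %m "$src")
  # Assumption: ETag equals the MD5 (single-part upload, no SSE-KMS).
  if command -v md5sum >/dev/null 2>&1; then
    etag=$(md5sum "$src" | awk '{print $1}')
  else
    etag=$(md5 -q "$src")
  fi
  printf 'source file:         %s\n' "$src"
  printf 'destination bucket:  %s\n' "$bucket"
  printf 'destination prefix:  %s\n' "$prefix"
  printf 'file size (bytes):   %s\n' "$size"
  printf 'local mtime (epoch): %s\n' "$mtime"
  printf 'expected etag (md5): %s\n' "$etag"
}
```

A session might pull the prod copy with the aws s3 cp command from D1, call diff_against_prod index.html.prod index.html, and only print the proof block (and wait for acknowledgment) when the delta has been reviewed.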
Infrastructure: How the Regression Was Caught
The CloudFront distribution (d3...cloudfront.net, origin qos-prod-www.s3.us-west-2.amazonaws.com) propagates all S3 changes within 30 seconds. After the errant deploy, the live site showed the reverted hero and missing Stripe checkout. Validation tests against the staging environment (origin qos-staging-www.s3.us-west-2.amazonaws.com, separate CloudFront dist) confirmed the staging version still had the correct features.
The FEATURE_TOKENS.md registry now being enforced should catch this earlier: a post-deploy grep for jada-book-now-fade and stripe-embedded-checkout against the live CloudFront response fails as soon as either token is missing, before the user even reviews the site.
Key Decisions Going Forward
Why not enable S3 versioning? It adds cost and complexity. The snapshot-and-timestamp approach (rule D5) gives us the last five revisions locally with zero S3 storage overhead, and manual snapshots force intentionality—you only keep what matters.
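The snapshot-and-keep-five policy can be sketched as follows. The helper names prune_snapshots and snapshot_prod are hypothetical, and the sketch assumes snapshots are named index.html.<epoch> as in rule D5, so the numeric suffix gives the ordering.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching rule D5 (snapshot, then keep the newest N).

# Delete all but the newest $keep snapshots named index.html.<epoch> in $dir.
prune_snapshots() {
  local dir="$1" keep="${2:-5}"
  # Sort by the numeric epoch suffix, newest first, and drop everything
  # after the first $keep entries. Other files in $dir are left alone.
  ls "$dir" | grep -E '^index\.html\.[0-9]+$' | sort -t. -k3,3nr |
    tail -n +"$((keep + 1))" |
    while IFS= read -r name; do
      rm -- "$dir/$name"
    done
}

# Download the current prod index.html with a timestamp suffix, then prune.
snapshot_prod() {
  local dir="${1:-./backups}"
  mkdir -p "$dir"
  aws s3 cp s3://qos-prod-www/index.html "$dir/index.html.$(date +%s)" &&
    prune_snapshots "$dir" 5
}
```

Calling snapshot_prod before any cp to prod leaves at most five timestamped copies in ./backups; the pruning step only runs when the download succeeded.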
Why staging first? A separate staging CloudFront distribution (with the same infrastructure as prod, pointing to the staging S3 bucket) lets humans review real CloudFront behavior—cache headers, content-type negotiation, gzip compression—before prod. This is not a feature branch; it's a production-identical clone.
Why a feature token registry? HTML is a bag of text. A grep-based smoke test is fast, deterministic, and requires zero JavaScript runtime. It catches deletions, not just broken functionality. Every feature that could regress gets a unique HTML comment or data attribute: <!-- FEATURE_TOKEN: jada-book-now-fade -->. A post-deploy script greps for all tokens in FEATURE_TOKENS.md and fails the build if any are missing.
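A grep-based token check along these lines might look like the sketch below. The format of FEATURE_TOKENS.md is not shown in this post, so the sketch assumes a separate plain token list (one token per line, blank lines and lines starting with # ignored); that file and the function name check_tokens are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test sketching rule D7.

# Check that every token in $tokens appears somewhere in $html.
# Prints one "MISSING: <token>" line per absent token.
# Returns 0 when all tokens are present, 1 otherwise.
check_tokens() {
  local html="$1" tokens="$2" missing=0 token
  while IFS= read -r token || [ -n "$token" ]; do
    # Skip blank lines and comment lines in the token list.
    case "$token" in ''|'#'*) continue ;; esac
    # -F matches the token as a fixed string, not a regex.
    if ! grep -qF -- "$token" "$html"; then
      echo "MISSING: $token"
      missing=1
    fi
  done < "$tokens"
  return "$missing"
}
```

Before a deploy this could run against the local index.html; after a deploy, against a copy of the live page fetched with something like curl -fsS https://queenofsandiego.com/ -o live.html. A non-zero return is the signal to roll back.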
What's Next
The rules are now embedded in the QOS CLAUDE.md and will be read at the start of every session. A condensed pointer was added to the top-level /Users/cb/Documents/repos/CLAUDE.md so developers on other sites understand the pattern. The next deployment to QOS will exercise rules D1–D8 end-to-end, with proof blocks printed to chat before any S3 command executes.
Pending: decision on whether to automate the feature token grep as a shell script or keep it manual for now. Manual gives humans a chance to review the list each session; automated gives speed and consistency.