Preventing S3 Deployment Regressions: Hard Rules for Stale Local File Detection
Last week, a routine deployment to queenofsandiego.com wiped out three recent changes by uploading a stale local index.html over a newer production version in S3. The hero JADA→BOOK NOW crossfade animation and the Stripe embedded checkout flow vanished, and a previously-removed "Ranch & Coast readers" line reappeared. This post documents the root cause, the detection rules we built to prevent a recurrence, and the infrastructure pattern that should have caught it.
What Happened: The Stale File Problem
The deployment process looked straightforward:
aws s3 cp /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html s3://queenofsandiego-prod/index.html
The problem: the local index.html was older than the production version in S3. Three separate commits had modified the live version, either directly or through a stale git branch, but the developer's working directory hadn't pulled those changes. The upload completed without error, S3 accepted the overwrite, and the regression went live.
The deeper problem: deployment scripts typically don't check that the local file is newer than, or even identical to, the remote version. Within a repository, Git rejects a non-fast-forward push, so you can't silently overwrite commits you haven't pulled. A plain `aws s3 cp` has no equivalent check; it simply replaces whatever object is at the destination.
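The missing check can be small. Below is a minimal shell sketch, assuming the production object has first been downloaded to a local snapshot (for example with `aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.prod-snapshot`); the function name `qos_require_in_sync` is hypothetical. It treats "identical to the remote" as the precondition for starting an edit, the stricter of the two conditions above.

```bash
# Hypothetical guard: succeed only if the local working copy is byte-identical
# to the downloaded production snapshot. Intended to run before an edit starts.
qos_require_in_sync() {
  local snapshot="$1" working="$2"
  # cmp -s compares bytes silently; exit status 0 means identical.
  if cmp -s "$snapshot" "$working"; then
    echo "in sync: $working matches $snapshot"
    return 0
  fi
  # On mismatch, show the start of the unified diff on stderr and fail.
  echo "STALE: $working differs from $snapshot; first 50 lines of diff:" >&2
  diff -u "$snapshot" "$working" | head -n 50 >&2
  return 1
}
```

Calling `qos_require_in_sync ./index.html.prod-snapshot ./index.html || exit 1` before opening the file for editing stops the session on any mismatch. Because `cmp -s` stays silent, the diff is only computed when there is something to show.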
Root Cause: Missing Pre-Deployment Validation
A few factors aligned to create the failure:
- No pull-and-diff step: The deployment process skipped fetching the current S3 object before writing. A simple `aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.remote` followed by `diff -u ./index.html.remote ./index.html` would have surfaced the mismatch immediately.
- Dual-target deploy: The command deployed to both the `staging` and `prod` buckets in a single operation, violating the staging-first principle. Had it been split into two steps with manual review in between, the regression would have been caught on staging first.
- No feature-token registry: The production file contains time-sensitive feature flags and implementation details (e.g., the Stripe checkout session ID, the crossfade animation keyframes). Without a registry of currently active features, there's no way to audit whether a deployed file has lost critical code.
- Ignored prior warnings: The previous session summary explicitly warned: "Local files may be stale relative to S3. Always pull and diff before editing." This guidance was present but not enforced in the deployment tooling.
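The dual-target mistake in particular can be blocked mechanically. A minimal shell sketch, assuming deploys go through a wrapper script that validates its destination before calling `aws s3 cp`; the function `qos_check_target` and the `QOS_APPROVAL` environment variable are hypothetical, and only the two bucket names come from this post:

```bash
# Hypothetical guard: accept exactly one S3 destination per deploy, allow the
# staging bucket freely, and allow prod only when QOS_APPROVAL is "deploy".
qos_check_target() {
  if [ "$#" -ne 1 ]; then
    echo "refusing: expected exactly one target, got $#" >&2
    return 1
  fi
  case "$1" in
    s3://queenofsandiego-staging/*)
      return 0
      ;;
    s3://queenofsandiego-prod/*)
      if [ "${QOS_APPROVAL:-}" = "deploy" ]; then
        return 0
      fi
      echo "refusing: prod target needs explicit approval" >&2
      return 1
      ;;
    *)
      echo "refusing: unrecognized target $1" >&2
      return 1
      ;;
  esac
}
```

A wrapper would run `qos_check_target "$dest" || exit 1` before `aws s3 cp ./index.html "$dest"`. Passing both buckets fails the argument-count check, so the single-operation dual deploy described above cannot get through.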
The Fix: Eight Hard Rules for QOS Deployments
We added an auto-loading ruleset to /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md. Every new agent session for Queen of San Diego automatically inherits these rules:
- D1: Pull and diff before any edit. Before modifying `index.html` locally, run:
aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.prod-snapshot
diff -u ./index.html.prod-snapshot ./index.html | head -50
If the diff is non-empty, stop and report it to CB before proceeding.
- D2: Single-target, staging-first deploys. Never deploy to prod and staging in the same command. Deploy to staging first:
aws s3 cp ./index.html s3://queenofsandiego-staging/index.html --cache-control "max-age=0"
Wait for CB's approval, then promote:
aws s3 cp s3://queenofsandiego-staging/index.html s3://queenofsandiego-prod/index.html
- D3: One logical change per deploy. A single `cp` command moves only one file for one stated reason (e.g., "fix hero fade keyframes" or "update Stripe session logic"). If you're tempted to change multiple files, split the work into separate commits and deploys.
- D4: Obey your own prior warnings. Every session that touches S3 deployment includes a warning block. Re-read it before running any `cp` or `aws s3` command. If your prior self said "check for stale files," check them.
- D5: Snapshot prod before overwriting. S3 bucket versioning is not enabled on `queenofsandiego-prod` (by design, for cost and compliance reasons). Before any `cp` that overwrites, snapshot manually:
aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.backup-$(date +%s)
git add ./index.html.backup-* && git commit -m "Pre-deploy snapshot"
This keeps a rollback one `cp` away.
- D6: Print a six-line proof block before execution. Before running any `cp` to S3, print to chat:
LOCAL FILE: index.html ($(wc -l < index.html) lines, $(md5sum index.html | cut -d' ' -f1))
REMOTE FILE: s3://queenofsandiego-prod/index.html (X lines, HASH)
DIFF LINES: N insertions, M deletions
TARGET: s3://queenofsandiego-prod/index.html
REASON: [your stated reason for this change]
APPROVAL: Waiting for CB confirmation
Do not proceed until CB responds: "deploy" means go, "hold" means stop.
- D7: Maintain a feature-token registry. The `index.html` file contains live feature implementations. At the top of the file (inside an HTML comment, so it doesn't render) or in a separate `FEATURES.md`, keep a simple registry:
  ## Active Features (grep targets for S3 audit)
  - hero-jada-fade: line 342, CSS @keyframes hero-jada, animation-duration 4s
  - stripe-embedded-checkout: line 1204, form id="stripe-embed-checkout"
  - referral-code-input: line 892, input id="referral-code" (disabled until launch)
Before deploying a new version, grep the remote S3 file for each token. If any are missing and shouldn't be, abort and alert CB.
- D8: Escalate to CB if S3 is ahead of local. If `diff -u ./index.html.prod-snapshot ./index.html` shows that the remote has code the local file doesn't (lines prefixed with `-` that you did not delete or change yourself), do not merge or overwrite. Message CB with the diff and ask: "Should I rebase on this?" This is how we catch race conditions.
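The D6 proof block can be generated rather than typed. A minimal shell sketch, assuming the prod snapshot from D1 is already on disk; `qos_md5`, `qos_diff_counts`, and `qos_proof_block` are hypothetical names. Insertions and deletions are counted from `diff -u` output after skipping its two file-header lines, and the MD5 helper falls back from GNU `md5sum` to the macOS `md5 -q`:

```bash
# Hypothetical helper: print a file's MD5 hex digest, using GNU md5sum when
# available and falling back to the BSD/macOS `md5 -q`.
qos_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"
  fi
}

# Hypothetical helper: print "<insertions> <deletions>" for `diff -u OLD NEW`.
# diff exits 1 when files differ, which is expected; exit 2 (error) still fails.
# The first two output lines are the ---/+++ file headers, so they are skipped.
qos_diff_counts() {
  { diff -u "$1" "$2" || [ "$?" -eq 1 ]; } | awk '
    NR > 2 && /^[+]/ { ins++ }
    NR > 2 && /^-/   { del++ }
    END { print ins + 0, del + 0 }'
}

# Hypothetical D6 printer. Arguments: local file, downloaded snapshot of the
# target object, target S3 URI, and the stated reason for the change.
qos_proof_block() {
  local local_file="$1" snapshot="$2" target="$3" reason="$4"
  local counts ins del
  counts="$(qos_diff_counts "$snapshot" "$local_file")"
  ins="${counts% *}"
  del="${counts#* }"
  # $(( )) strips the leading spaces that BSD wc prints before the count.
  printf 'LOCAL FILE: %s (%s lines, %s)\n' \
    "$local_file" "$(( $(wc -l < "$local_file") ))" "$(qos_md5 "$local_file")"
  printf 'REMOTE FILE: %s (%s lines, %s)\n' \
    "$target" "$(( $(wc -l < "$snapshot") ))" "$(qos_md5 "$snapshot")"
  printf 'DIFF LINES: %s insertions, %s deletions\n' "$ins" "$del"
  printf 'TARGET: %s\n' "$target"
  printf 'REASON: %s\n' "$reason"
  printf 'APPROVAL: Waiting for CB confirmation\n'
}
```

For example: `qos_proof_block ./index.html ./index.html.prod-snapshot s3://queenofsandiego-prod/index.html "fix hero fade keyframes"`. Keep in mind that `diff -u` reports an edited line as one deletion plus one insertion, so a nonzero deletion count is not on its own proof that prod is ahead of local (D8); it marks lines worth reading before CB approves.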
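The D7 audit reduces to a fixed-string grep per token. A minimal shell sketch, assuming the tokens are literal strings taken from the registry (such as `id="stripe-embed-checkout"`) and that the file being audited has already been downloaded from S3; `qos_audit_tokens` is a hypothetical name:

```bash
# Hypothetical D7 audit. Arguments: the file to check, then one or more
# literal tokens. Prints each missing token and returns 1 if any are absent.
qos_audit_tokens() {
  local file="$1" token missing=0
  shift
  for token in "$@"; do
    # -F: fixed-string match, -q: quiet; "--" guards tokens starting with "-".
    if ! grep -qF -- "$token" "$file"; then
      echo "MISSING: $token"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `qos_audit_tokens ./index.html.prod-snapshot 'id="stripe-embed-checkout"' 'id="referral-code"' '@keyframes hero-jada'` prints each missing token and exits nonzero, which a deploy script can treat as an abort signal. Fixed-string matching avoids surprises from tokens that happen to contain regex metacharacters such as `.` or `(`.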
Infrastructure Context
S3 buckets involved:
- `queenofsandiego-staging`: CloudFront distribution `D2STAGING`, serves `staging.queenofsandiego.com`. Cache TTL 0 seconds for `index.html`, 3600 seconds for static assets.
- `queenofsandiego-prod`: CloudFront distribution `D2PROD`, serves `queenofsandiego.com`. Cache TTL 300