Preventing S3 State Divergence: How a Stale Local File Regressed Three Features in Production
Last week, a deployment to queenofsandiego.com inadvertently reverted three working features by pushing a stale local index.html over a newer S3 production version. This post breaks down what happened, why it happened, and the hard rules we've implemented to prevent it from happening again.
What Went Wrong
During a three-hour development window, an agent deployed /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html to the production S3 bucket without first pulling the current remote state and diffing against it. The local file was stale by approximately two commits, and the push wiped:
- The working JADA → BOOK NOW hero crossfade animation
- The Stripe embedded checkout booking flow integration
- The prior removal of the "For Ranch & Coast readers..." hero line (which was resurrected)
Additionally, the deployment violated an internal protocol by invalidating both the staging and prod CloudFront distributions in the same deployment step, rather than validating on staging first.
Root Cause: Bidirectional State Drift
The core issue was bidirectional state drift: the local filesystem and S3 had diverged, but the deployment process assumed local was authoritative. No diff was computed before the push, and no snapshot of production was captured beforehand.
The agent had also received a session-summary warning from a prior session that flagged stale local files, but this warning was not automatically enforced or re-checked before deployment.
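The missing drift check can be sketched in a few lines of shell. This is a minimal illustration, not the tooling used on the site: the helper names sha256_of and has_drifted are hypothetical, and it assumes the remote object has already been downloaded to a local file.

```shell
# Sketch: detect divergence between a local file and a downloaded remote copy.
# sha256_of and has_drifted are hypothetical helper names.

sha256_of() {
  # Print only the hex digest. Uses sha256sum (common on Linux) when
  # available, otherwise falls back to shasum (common on macOS).
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

has_drifted() {
  # Returns 0 (true) when the two files differ, 1 when they are identical.
  local local_file="$1" remote_copy="$2"
  [ "$(sha256_of "$local_file")" != "$(sha256_of "$remote_copy")" ]
}
```

After pulling the production object to ./index.html.remote, a call such as `has_drifted index.html index.html.remote && echo "local and S3 have diverged"` would surface the drift before any push. A hash mismatch only says the files differ; it does not say which side is ahead, so a diff and a human decision are still needed.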
Technical Details: The Deployment Flow
The deployment command structure was:
# What was attempted (WRONG):
aws s3 cp index.html s3://queenofsandiego-prod/index.html
aws cloudfront create-invalidation --distribution-id PROD_DIST_ID --paths "/*"
aws cloudfront create-invalidation --distribution-id STAGING_DIST_ID --paths "/*"
The correct flow should have been:
# 1. Pull current S3 state to a temp file
aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.remote
# 2. Diff local against remote
diff -u index.html.remote index.html
# 3. If divergent, inspect the change history (git log for the local repo, and
#    S3 object versions if bucket versioning is enabled) and verify local is
#    truly ahead before proceeding
# 4. Deploy to staging first
aws s3 cp index.html s3://queenofsandiego-staging/index.html
aws cloudfront create-invalidation --distribution-id STAGING_DIST_ID --paths "/*"
# 5. Wait and verify staging, then promote to prod
aws s3 cp index.html s3://queenofsandiego-prod/index.html
aws cloudfront create-invalidation --distribution-id PROD_DIST_ID --paths "/*"
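One way to make the staging-first rule harder to skip is to wrap the upload and invalidation in a function that accepts exactly one target per call. The sketch below is an assumption about how that could look, not the site's actual deploy script: deploy_single_target and the AWS_CLI override variable are hypothetical names, and the bucket names are derived from the ones mentioned in this post.

```shell
# Sketch: a deploy helper that handles one target per call.
# AWS_CLI is a hypothetical override so the commands can be dry-run
# (for example AWS_CLI=echo prints them instead of executing them).
AWS_CLI="${AWS_CLI:-aws}"

deploy_single_target() {
  # Usage: deploy_single_target <staging|prod> <file> <distribution-id>
  if [ "$#" -ne 3 ]; then
    echo "usage: deploy_single_target <staging|prod> <file> <distribution-id>" >&2
    return 2
  fi
  local target="$1" file="$2" dist_id="$3" bucket
  case "$target" in
    staging|prod) bucket="queenofsandiego-$target" ;;
    *) echo "refusing unknown target: $target" >&2; return 2 ;;
  esac
  # Upload first; skip the invalidation if the upload fails.
  "$AWS_CLI" s3 cp "$file" "s3://$bucket/index.html" || return 1
  "$AWS_CLI" cloudfront create-invalidation --distribution-id "$dist_id" --paths "/*"
}
```

Running `AWS_CLI=echo; deploy_single_target staging index.html STAGING_DIST_ID` prints the two commands without touching AWS, which is a cheap way to review what a deploy would do. Promotion to prod then becomes a second, separate call made only after staging has been checked.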
Infrastructure Context
The site uses a standard S3 + CloudFront + Route53 architecture:
- S3 Buckets: queenofsandiego-prod (production content) and queenofsandiego-staging (staging content)
- CloudFront Distributions: Production distribution ID and staging distribution ID (both configured for index.html as default root object)
- Route53: Primary domain routed to prod distribution; staging subdomain routed to staging distribution
- S3 Versioning: Disabled at the time of the incident (a critical gap; we now require versioning to be enabled)
Because S3 versioning was not enabled, there was no recovery path once the stale file overwrote the production version. The old version was lost.
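Closing that gap is a one-time bucket setting. The sketch below wraps the relevant AWS CLI calls (s3api put-bucket-versioning and s3api list-object-versions) in small functions; the function names and the AWS_CLI override variable are hypothetical and only there to make the commands easy to dry-run.

```shell
# Sketch: enable versioning on a bucket and list stored versions of index.html.
# AWS_CLI is a hypothetical override (AWS_CLI=echo prints instead of running).
AWS_CLI="${AWS_CLI:-aws}"

enable_versioning() {
  # Usage: enable_versioning <bucket>
  "$AWS_CLI" s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled
}

list_index_versions() {
  # Usage: list_index_versions <bucket>
  # Lists object versions whose key starts with index.html.
  "$AWS_CLI" s3api list-object-versions --bucket "$1" --prefix index.html
}
```

With versioning on, an overwrite creates a new object version instead of destroying the previous one, so `list_index_versions queenofsandiego-prod` would show the earlier copies that could be restored. Versioning only protects objects written after it is enabled.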
The Hard Rules (D1–D8)
To prevent this class of failure, we've implemented eight hard rules that auto-load into the QOS session context (stored in /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md):
- D1 (Pull S3 and Diff Before Edit): Before modifying index.html, run aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.remote and diff against local. If divergent, escalate to CB.
- D2 (Staging-Only Single-Target Deploys): Never push to prod and staging in the same command. Always deploy to staging first, verify, then promote.
- D3 (One File Per Logical Change): Each logical feature change gets its own commit and deployment window. No batching unrelated changes.
- D4 (Obey Prior Session Warnings): If a prior session flagged stale local state, re-pull before proceeding. Do not assume the warning is stale.
- D5 (Snapshot Prod Before Overwrite): Before deploying to prod, save the current index.html to a timestamped backup file locally. S3 versioning is now enabled, but this is a manual safety net.
- D6 (Six-Line Proof Block): Before any cp to S3, print a six-line proof block showing: the local file hash, the remote file hash, the diff summary, the feature token being changed, the staging invalidation ID, and the prod invalidation ID (once staging is confirmed). This forces explicit confirmation.
- D7 (Feature-Token Registry): Maintain a registry of feature tokens (e.g., JADA_HERO_FADE, STRIPE_CHECKOUT) in the site's CLAUDE.md. Before any push, grep the local file and S3 current state to confirm all expected tokens are present.
- D8 (Escalate When S3 is Ahead): If the remote S3 file is newer or larger than local (indicating undeployed changes in prod), escalate to CB immediately. Do not overwrite.
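The token check in D7 lends itself to a small helper. The following is a rough sketch under the assumption that each feature leaves a literal marker string in index.html; missing_tokens is a hypothetical function name, and the token names are the examples given above rather than a confirmed registry.

```shell
# Sketch: report which expected feature tokens are absent from a file.
# missing_tokens is a hypothetical helper name.

missing_tokens() {
  # Usage: missing_tokens <file> <token>...
  # Prints each token not found in <file>; returns 1 if any are missing.
  local file="$1" status=0 token
  shift
  for token in "$@"; do
    # -F matches the token as a fixed string, -q suppresses output.
    if ! grep -qF -- "$token" "$file"; then
      echo "$token"
      status=1
    fi
  done
  return "$status"
}
```

Running it against both copies, for example `missing_tokens index.html JADA_HERO_FADE STRIPE_CHECKOUT` and then the same call on index.html.remote, shows whether a push would drop a feature that production currently has. A token present in the remote copy but missing locally is exactly the situation that should trigger escalation rather than a deploy.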
Command Examples: The Safe Flow
# Step 1: Pull and diff
aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.remote
diff -u index.html.remote index.html > /tmp/changes.diff
cat /tmp/changes.diff
# Step 2: Verify feature tokens
grep -oE "JADA_HERO_FADE|STRIPE_CHECKOUT|RANCH_COAST_LINE" index.html | sort | uniq -c
# Step 3: Stage to staging bucket
aws s3 cp index.html s3://queenofsandiego-staging/index.html --metadata "deployed-by=agent,timestamp=$(date +%s)"
# Step 4: Invalidate staging CloudFront
STAGING_INVALIDATION=$(aws cloudfront create-invalidation \
--distribution-id STAGING_DIST_ID \
--paths "/*" \
--query 'Invalidation.Id' \
--output text)
echo "Staging invalidation: $STAGING_INVALIDATION"
# Step 5: Wait and verify staging (manual step — check staging subdomain in browser)
# Once verified, proceed to prod
# Step 6: Snapshot prod
cp s3://queenofsand