```html

Preventing S3 Deployment Regressions: Hard Rules for Stale Local File Detection

Last week, a routine deployment to queenofsandiego.com reverted three changes by uploading a stale local index.html over a newer production version in S3. The hero JADA→BOOK NOW crossfade animation and the Stripe embedded checkout flow vanished, and a previously removed "Ranch & Coast readers" line came back. This post documents the root cause, the detection rules we built to prevent recurrence, and the infrastructure pattern that should have caught it.

What Happened: The Stale File Problem

The deployment process looked straightforward:

aws s3 cp /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html s3://queenofsandiego-prod/index.html

The issue: the local index.html was older than the production version in S3. Three separate commits had modified the live version directly (or through a stale git branch), but the developer's working directory hadn't pulled those changes. The aws s3 cp command executed without error, S3 accepted the overwrite, and the regression went live.

The deeper problem: deployment scripts typically don't validate that the local file is derived from (or even identical to) the remote version. Git enforces this within a repository by rejecting a push that is not a fast-forward of the remote branch. S3 performs no such check by default when you copy files directly: a put simply replaces the object.
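One cheap way to add the missing check is to compare the local file's MD5 with the object's ETag before uploading. The sketch below assumes the object was uploaded in a single part without SSE-KMS encryption, since only then is the ETag the MD5 of the content. The function names are hypothetical; the bucket and key in the usage note are the ones from the command above.

```shell
#!/usr/bin/env bash
# Sketch: compare a local file's MD5 with an S3 object's ETag.
# Assumption: single-part upload without SSE-KMS, so ETag == MD5 of the content.
# All function names here are hypothetical.

# local_md5 FILE: print the MD5 hex digest using whichever tool is installed.
local_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"   # macOS fallback
  fi
}

# normalize_etag ETAG: strip the double quotes S3 wraps around ETag values.
normalize_etag() {
  printf '%s\n' "$1" | tr -d '"'
}

# remote_etag BUCKET KEY: fetch the object's ETag (requires AWS credentials).
remote_etag() {
  normalize_etag "$(aws s3api head-object --bucket "$1" --key "$2" \
    --query ETag --output text)"
}

# same_content FILE BUCKET KEY: succeed only if the two digests match.
same_content() {
  [ "$(local_md5 "$1")" = "$(remote_etag "$2" "$3")" ]
}
```

A call such as same_content index.html queenofsandiego-prod index.html || echo "local differs from prod" would flag a mismatch without downloading the object. A mismatch only says the two differ, not which side is newer, so it is a trigger for a full diff rather than a replacement for one.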

Root Cause: Missing Pre-Deployment Validation

A few factors aligned to create the failure:

  • No pull-and-diff step: The deployment process skipped fetching the current S3 object before writing. A simple aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.remote followed by diff -u ./index.html.remote ./index.html would have surfaced the mismatch immediately.
  • Dual-target deploy: The command deployed to both staging and prod buckets in a single operation, violating the staging-first principle. Had it been split into two steps with manual review between them, the regression would have been caught on staging first.
  • No feature-token registry: The production file contains time-sensitive feature flags and implementation details (e.g., the Stripe checkout session ID, the crossfade animation keyframes). Without a registry of currently-active features, there's no way to audit whether a deployed file has lost critical code.
  • Ignored prior warnings: The previous session summary explicitly warned: "Local files may be stale relative to S3. Always pull and diff before editing." This guidance was present but not enforced in the deployment tooling.
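The pull-and-diff step from the first bullet could be enforced in tooling rather than left to memory. A minimal sketch, assuming the remote copy is downloaded to a scratch file first; the function names are hypothetical and this is not the script that was in use at the time of the incident.

```shell
#!/usr/bin/env bash
# Sketch of a pull-and-diff guard. Function names are hypothetical.

# fetch_remote S3_URI DEST: download the current object (requires AWS credentials).
fetch_remote() {
  aws s3 cp "$1" "$2" --only-show-errors
}

# files_match LOCAL REMOTE_COPY: succeed only when the files are byte-identical.
files_match() {
  cmp -s "$1" "$2"
}

# predeploy_check LOCAL REMOTE_COPY: print the start of the diff and fail
# when the local file differs from the downloaded remote copy.
predeploy_check() {
  if files_match "$1" "$2"; then
    echo "OK: local matches remote"
  else
    diff -u "$2" "$1" | head -50
    echo "STOP: local differs from remote, review before deploying" >&2
    return 1
  fi
}
```

In use, the guard would run as fetch_remote s3://queenofsandiego-prod/index.html ./index.html.remote && predeploy_check ./index.html ./index.html.remote, with the upload placed behind the same && chain so that a non-zero exit blocks it.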

The Fix: Eight Hard Rules for QOS Deployments

We added an auto-loading ruleset to /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md. Every new agent session for Queen of San Diego automatically inherits these rules:

  • D1 — Pull and diff before any edit: Before modifying index.html locally, run:
    aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.prod-snapshot
    diff -u ./index.html.prod-snapshot ./index.html | head -50
    
    If the diff is non-empty, stop and report it to CB before proceeding.
  • D2 — Single-target, staging-first deploys: Never deploy to prod and staging in the same command. Deploy to staging first:
    aws s3 cp ./index.html s3://queenofsandiego-staging/index.html --cache-control "max-age=0"
    
    Wait for CB's approval, then promote:
    aws s3 cp s3://queenofsandiego-staging/index.html s3://queenofsandiego-prod/index.html
    
  • D3 — One logical change per deploy: A single cp command moves only one file for one stated reason (e.g., "fix hero fade keyframes" or "update Stripe session logic"). If you're tempted to change multiple files, split into separate commits and deploys.
  • D4 — Obey your own prior warnings: Every session that touches S3 deployment includes a warning block. Re-read it before running any cp or aws s3 command. If your prior self said "check for stale files," check them.
  • D5 — Snapshot prod before overwriting: S3 bucket versioning is not enabled on queenofsandiego-prod (by design—cost and compliance). Before any cp that overwrites, manually snapshot:
    aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.backup-$(date +%s)
    git add ./index.html.backup-* && git commit -m "Pre-deploy snapshot"
    
    This ensures rollback is one cp away.
  • D6 — Print a six-line proof block before execution: Before running any cp to S3, print to chat:
    LOCAL FILE:  index.html ($(wc -l < index.html) lines, $(md5sum index.html | cut -d' ' -f1))
    REMOTE FILE: s3://queenofsandiego-prod/index.html (X lines, HASH)
    DIFF LINES:  N insertions, M deletions
    TARGET:      s3://queenofsandiego-prod/index.html
    REASON:      [your stated reason for this change]
    APPROVAL:    Waiting for CB confirmation
    
    Do not proceed until CB responds "deploy" or "hold."
  • D7 — Maintain a feature-token registry: The index.html file contains live feature implementations. At the top of the file (or in a separate FEATURES.md), keep a simple registry:
    ## Active Features (grep targets for S3 audit)
    - hero-jada-fade: line 342, CSS @keyframes hero-jada, animation-duration 4s
    - stripe-embedded-checkout: line 1204, form id="stripe-embed-checkout"
    - referral-code-input: line 892, input id="referral-code" (disabled until launch)
    
    Before deploying a new version, grep both the remote S3 file and the local candidate for each token. If a token is present in the remote file but missing from the local one, and its removal was not intended, abort and alert CB.
  • D8 — Escalate to CB if S3 is ahead of local: If diff -u ./index.html.prod-snapshot ./index.html shows that the remote has code the local doesn't, do not merge or overwrite. Message CB with the diff and ask: "Should I rebase on this?" This is how we catch race conditions.
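Rules D6 and D7 lend themselves to small helper functions. The sketch below works on the local file and a prod snapshot that has already been downloaded (as in D1), so it needs no AWS access itself. All function names are hypothetical, and cksum (a POSIX CRC) stands in for md5sum purely for portability.

```shell
#!/usr/bin/env bash
# Sketch of the D6 proof block and the D7 token audit.
# Function names are hypothetical; cksum replaces md5sum for portability.

# line_count FILE: number of lines, without the padding some wc builds add.
line_count() {
  wc -l < "$1" | tr -d ' '
}

# diff_stats OLD NEW: print "<insertions> <deletions>" going from OLD to NEW.
diff_stats() {
  diff "$1" "$2" | awk '/^>/ {ins++} /^</ {del++} END {print ins+0, del+0}'
}

# missing_tokens FILE TOKEN...: print each token that does not occur in FILE.
missing_tokens() {
  local file=$1
  shift
  local token
  for token in "$@"; do
    grep -qF -- "$token" "$file" || printf '%s\n' "$token"
  done
}

# proof_block LOCAL SNAPSHOT TARGET REASON: print the six-line block from D6.
proof_block() {
  local stats ins del
  stats=$(diff_stats "$2" "$1")
  ins=${stats% *}
  del=${stats#* }
  echo "LOCAL FILE:  $1 ($(line_count "$1") lines, cksum $(cksum < "$1" | cut -d' ' -f1))"
  echo "REMOTE FILE: $3 ($(line_count "$2") lines, cksum $(cksum < "$2" | cut -d' ' -f1))"
  echo "DIFF LINES:  $ins insertions, $del deletions"
  echo "TARGET:      $3"
  echo "REASON:      $4"
  echo "APPROVAL:    Waiting for CB confirmation"
}
```

For example, missing_tokens ./index.html hero-jada stripe-embed-checkout prints nothing when both registry tokens survive in the local candidate, and prints the name of any token that was lost. Running the same call against the prod snapshot shows which tokens are live, so a token present there but absent locally is the D8 signal to stop and escalate.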

Infrastructure Context

S3 buckets involved:

  • queenofsandiego-staging — CloudFront distribution D2STAGING, serves staging.queenofsandiego.com. Cache TTL 0 for index.html, 3600 for static assets.
  • queenofsandiego-prod — CloudFront distribution D2PROD, serves queenofsandiego.com. Cache TTL 300