Preventing Deployment Regressions: Hardening a Multi-Environment Static Site Pipeline
Over a three-hour development session, a regression on queenofsandiego.com exposed a critical weakness in our CloudFront + S3 deployment workflow: a stale local file overwrote the production copy, erasing three working features with a single cp command. This post documents the failure mode, the eight hard rules now enforced in our CLAUDE.md system prompts, and the architectural patterns we adopted to make regressions detectable and preventable.
The Incident: What Broke
Our queenofsandiego.com site is hosted on S3 (s3://queenofsandiego.com) behind CloudFront distribution E2L4K9X1Z5M7Q. The HTML entry point is a single 3,650-line index.html that bundles hero animations, Stripe embedded checkout, and booking logic.
During the session, the local copy of index.html was stale. It lacked three changes that had been deployed to the prod bucket over the previous two days:
- A JADA → BOOK NOW hero crossfade animation (CSS + JavaScript)
- The Stripe embedded checkout session flow (form handler)
- Removal of the "For Ranch & Coast readers…" hero line (deleted in a prior refactor)
The deployment command:
aws s3 cp /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html \
s3://queenofsandiego.com/index.html
succeeded without error, but it overwrote the newer S3 version with the older local version. No warning, no prior version to roll back to, no diff output.
The CloudFront invalidation that followed (--paths "/*") purged the cached copies, so the stale file was being served globally within seconds. By the time the session summary warned that S3 was ahead of local, the damage was live to users.
Why This Happened
Four human and process gaps combined:
- No pre-deploy diff discipline. The session did not pull S3 and diff against local before editing. We didn't know local was stale.
- Staging was bypassed. The deployment went directly to prod. Our stated rule is: staging first, always, single file, single environment. This was violated.
- Prior warnings were ignored. The session's own summary noted "stale local files detected" — but the next turn deployed anyway.
- No feature-token registry. We had no grep-able markers in S3 to verify that critical features were present before and after a deploy.
The Eight Hard Rules: Architecture of Prevention
We now enforce eight rules in /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md, automatically loaded by Claude on every QOS session. These are not suggestions—they block deployment until satisfied:
D1: Pull and diff before any local edit.
aws s3 cp s3://queenofsandiego.com/index.html ./index.html.prod
diff -u index.html.prod index.html | head -50
This shows exactly how local differs from S3 before we touch anything. If S3 is ahead, we bring local up to date first.
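The pull-and-diff step can be wrapped in a guard that returns a status code, so a script or an assistant can refuse to continue when the files differ. This is a minimal sketch, not the pipeline's actual tooling: the function name check_local_is_current and its exit codes are assumptions made for illustration.

```shell
# Hypothetical D1 guard (name and exit codes are illustrative assumptions).
# Returns 0 when local matches prod, 1 when they differ, 2 if the pull failed.
check_local_is_current() {
    s3_url="$1"        # e.g. s3://queenofsandiego.com/index.html
    local_file="$2"    # e.g. ./index.html
    prod_copy="${local_file}.prod"

    # Pull the live object next to the local file.
    aws s3 cp "$s3_url" "$prod_copy" --only-show-errors || return 2

    # cmp -s prints nothing and exits 0 only for byte-identical files.
    if cmp -s "$prod_copy" "$local_file"; then
        echo "OK: $local_file matches $s3_url"
        return 0
    fi

    echo "STOP: $local_file differs from $s3_url" >&2
    diff -u "$prod_copy" "$local_file" | head -50 >&2
    return 1
}
```

A call such as `check_local_is_current s3://queenofsandiego.com/index.html ./index.html || exit 1` at the top of a session would stop work before any edit is made on a stale file.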
D2: Deploy to staging first, alone, always.
Every change goes to s3://staging-queenofsandiego.com with CloudFront dist E5N2X8P9K1Y4W first. We do NOT deploy to both in one command. After staging validation, we promote with a separate explicit command.
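One way to keep staging and promotion as separate, explicit commands is to give each its own function. The sketch below assumes the bucket names and distribution IDs quoted in this post; the helper names (deploy_file, deploy_staging, promote_prod) and the typed "prod" confirmation are illustrative assumptions, not the actual tooling.

```shell
# Hypothetical helpers; bucket names and distribution IDs are the ones quoted above.
STAGING_BUCKET="s3://staging-queenofsandiego.com"
STAGING_DIST="E5N2X8P9K1Y4W"
PROD_BUCKET="s3://queenofsandiego.com"
PROD_DIST="E2L4K9X1Z5M7Q"

# Upload one file to one environment, then invalidate that distribution.
deploy_file() {
    file="$1"; bucket="$2"; dist="$3"
    key=$(basename "$file")
    aws s3 cp "$file" "$bucket/$key" || return 1
    aws cloudfront create-invalidation --distribution-id "$dist" --paths "/*"
}

# Staging is the only target this command can reach.
deploy_staging() {
    deploy_file "$1" "$STAGING_BUCKET" "$STAGING_DIST"
}

# Promotion is a separate command and requires the literal word "prod".
promote_prod() {
    if [ "$2" != "prod" ]; then
        echo "refusing: pass 'prod' as the second argument to confirm" >&2
        return 1
    fi
    deploy_file "$1" "$PROD_BUCKET" "$PROD_DIST"
}
```

Because neither function accepts a bucket argument, there is no single invocation that writes to both environments.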
D3: One logical change per deployment.
If you're touching the hero animation AND the booking form AND fixing a typo, that's three deploys. This prevents "one bug hides another" scenarios and lets us bisect regressions fast.
D4: Obey your own prior-session warnings.
If a session summary says "S3 is ahead of local" or "stale files detected," the next turn must pull S3. Non-negotiable. We added a check in the memory system to flag this explicitly.
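The post does not show how the memory-system check works. As a rough illustration only, a prior session summary could be scanned for the two warning phrases quoted above. The function name has_stale_warning and the idea of reading the summary from standard input are assumptions.

```shell
# Hypothetical check: exits 0 if the summary text on stdin contains a staleness warning.
# The phrases matched are the two quoted in rule D4; matching is case-insensitive.
has_stale_warning() {
    grep -qiE 'S3 is ahead of local|stale (local )?files? detected'
}

# Example use (the summary file name is illustrative):
#   if has_stale_warning < last_session_summary.txt; then
#       echo "Prior warning found: pull S3 before doing anything else" >&2
#   fi
```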
D5: Snapshot prod before overwriting.
The bucket has versioning disabled (for cost), so before any cp we take a snapshot:
aws s3 cp s3://queenofsandiego.com/index.html \
s3://queenofsandiego.com/.backups/index.html.$(date +%s)
This gives us a recoverable state if the deploy is wrong.
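The snapshot is only useful if restoring it is equally mechanical. Below is a sketch of a restore, assuming the index.html.<unix-timestamp> naming produced by the command above; the function names latest_snapshot and restore_latest_snapshot are hypothetical.

```shell
# Hypothetical helper: read `aws s3 ls` output on stdin and print the newest snapshot name.
# Names end in a Unix timestamp, so a numeric sort on the third dot-separated field orders them.
latest_snapshot() {
    awk '{print $NF}' \
        | grep '^index\.html\.[0-9][0-9]*$' \
        | sort -t. -k3,3n \
        | tail -n 1
}

# Copy the newest snapshot back over the live object (bucket-to-bucket copy).
restore_latest_snapshot() {
    name=$(aws s3 ls "s3://queenofsandiego.com/.backups/" | latest_snapshot)
    if [ -z "$name" ]; then
        echo "no snapshot found under .backups/" >&2
        return 1
    fi
    aws s3 cp "s3://queenofsandiego.com/.backups/$name" \
        "s3://queenofsandiego.com/index.html"
}
```

A restore still needs a CloudFront invalidation afterwards, for the same reason a deploy does.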
D6: Print a six-line proof block in chat before any cp.
Before executing aws s3 cp, the assistant must print:
=== DEPLOYMENT PROOF ===
TARGET: s3://queenofsandiego.com/index.html
SOURCE: /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html
ENV: [staging|prod]
FEATURES PRESENT: [hero-fade, stripe-checkout, ...]
PRIOR DIFF: [5 lines of diff context]
===
This forces explicitness. If the proof looks wrong, you stop it before it runs.
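For deploys that run from a script rather than a chat turn, the same block could be generated automatically. This is a sketch only: the function name print_proof and its argument order are assumptions, and it expects a previously pulled prod copy (as in D1) to diff against.

```shell
# Hypothetical generator for the proof block.
# Args: target URL, local source file, environment, feature list, pulled prod copy.
print_proof() {
    target="$1"; source_file="$2"; env_name="$3"; features="$4"; prod_copy="$5"
    echo "=== DEPLOYMENT PROOF ==="
    echo "TARGET: $target"
    echo "SOURCE: $source_file"
    echo "ENV: $env_name"
    echo "FEATURES PRESENT: [$features]"
    echo "PRIOR DIFF:"
    # First five lines of the unified diff between prod and the file about to ship.
    diff -u "$prod_copy" "$source_file" | head -5
    echo "==="
}

# Example:
#   print_proof s3://staging-queenofsandiego.com/index.html ./index.html \
#       staging "hero-fade, stripe-checkout" ./index.html.prod
```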
D7: Maintain a feature-token registry in S3.
We now commit a _FEATURE_TOKENS.txt file to both staging and prod. It's a simple newline-delimited list:
hero-jada-crossfade=true
stripe-embedded-checkout=true
ranch-coast-hero-removed=true
After every deploy, we fetch the registry from S3 and grep it to verify that the critical tokens are still present. A missing token means a regression and triggers a rollback.
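The token check can be sketched as a small filter that reads the registry on standard input and fails if any required token is absent. The function name verify_tokens is hypothetical; the token names are the three listed above.

```shell
# Hypothetical verifier: registry contents on stdin, required token names as arguments.
# Returns 0 only if every required token appears as an exact "name=true" line.
verify_tokens() {
    tokens=$(cat)
    missing=0
    for name in "$@"; do
        # -x matches the whole line, -F treats the pattern as a fixed string.
        if ! printf '%s\n' "$tokens" | grep -qxF "${name}=true"; then
            echo "MISSING: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example ("-" as the destination streams the object to stdout):
#   aws s3 cp s3://queenofsandiego.com/_FEATURE_TOKENS.txt - \
#       | verify_tokens hero-jada-crossfade stripe-embedded-checkout ranch-coast-hero-removed \
#       || echo "regression suspected: roll back" >&2
```

Matching whole lines means a token flipped to `=false`, or a name that merely contains another token's name, counts as missing.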
D8: Escalate to CB if S3 is ahead of local.
If we detect that S3 contains changes missing from local (via the D1 diff, or by checking git log for the corresponding commits), we stop, report it, and ask CB whether to update local or review the S3 changes first. We never silently overwrite a bucket that is ahead of local.
Infrastructure: Deployment Pipeline Architecture
Our QOS deployment now looks like this:
- Local repo: /Users/cb/Documents/repos/sites/queenofsandiego.com/ (git-tracked)
- Staging S3: s3://staging-queenofsandiego.com/ + CloudFront E5N2X8P9K1Y4W
- Prod S3: s3://queenofsandiego.com/ + CloudFront E2L4K9X1Z5M7Q
- Backup prefix (inside the prod bucket): s3://queenofsandiego.com/.backups/ (timestamped snapshots)
- Feature registry: s3://queenofsandiego.com/_FEATURE_TOKENS.txt (grep-indexed)
The promotion flow is now explicit