Preventing Deployment Regressions: Hard Rules for S3 CI/CD Without Versioning
Over the past 3 hours, a development session caused three regressions on queenofsandiego.com by deploying a stale local index.html to production S3. The deploy wiped the hero image crossfade animation and the Stripe embedded checkout booking flow, and it resurrected a previously deleted marketing line. The root cause: no S3 versioning, no pre-deploy S3 snapshot, and deployment to both staging and production in a single command. This post documents the exact hard rules we've now baked into the codebase to prevent this class of failure.
What Happened
- Local /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html was stale (last edited 36 hours ago)
- Production S3 bucket queenofsandiego.com held a newer, feature-complete version deployed 12 hours prior
- A single cp command overwrote S3 prod with the stale local file
- CloudFront distribution E2ABC123DEFG45 cached the broken version before manual invalidation
- Three regressions: (1) the JADA → BOOK NOW hero fade was lost, (2) Stripe Session initialization was lost, (3) the conditional "For Ranch & Coast readers..." hero text, removed in an earlier session, reappeared
Technical Details: The Eight Hard Rules
We've codified eight mandatory checks into /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md, loaded automatically on every session:
D1: Pull and Diff S3 Before Any Edit
aws s3 cp s3://queenofsandiego.com/index.html ./index.html.prod
diff -u index.html.prod index.html | head -50
Why: Know what's actually live before touching anything. Local files can lag by hours; S3 is the source of truth for prod.
D2: Staging-Only, Single-Target Deploys
Never deploy to both staging.queenofsandiego.com and queenofsandiego.com in one command. Each deployment is a separate, deliberate action with explicit confirmation.
Why: The prior session had a shell script that deployed both in a loop. One typo = production is down.
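One way to make the single-target rule hard to violate is a small guard that maps exactly one target label to exactly one bucket and rejects anything else. This is a minimal sketch, not the script used on the site: the function name deploy_target_bucket and the labels staging and prod are hypothetical, while the bucket names are the ones quoted in this post.

```shell
# Hypothetical helper: resolve exactly one target label to one bucket.
# The function name and the labels "staging"/"prod" are assumptions.
# The body runs in a subshell, so "exit" only sets the return status.
deploy_target_bucket() (
  if [ "$#" -ne 1 ]; then
    echo "error: pass exactly one target (staging or prod)" >&2
    exit 2
  fi
  case "$1" in
    staging) echo "s3://staging.queenofsandiego.com" ;;
    prod)    echo "s3://queenofsandiego.com" ;;
    *)       echo "error: unknown target '$1'" >&2; exit 2 ;;
  esac
)
```

A deploy line would then look like bucket=$(deploy_target_bucket staging) && aws s3 cp index.html "$bucket/index.html", so a loop over both targets or a mistyped label fails before any upload starts.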
D3: One Logical Change Per Deployment
If you're changing the hero image AND the checkout flow, that's two separate PRs, two separate deploys. Each change lands in a named feature branch and is deployed to staging for isolated review.
Why: When something breaks, you know exactly which change did it. Rollback is a one-file cp, not a six-file mystery.
D4: Obey Your Own Prior Session Warnings
The Sonnet 4.6 session that caused this regression left a summary note in MEMORY.md saying "Local index.html may be stale; pull S3 and diff before deploying." That note was ignored 90 minutes later in the same session.
Why: Session context drifts. Memory files exist to prevent your own future self from repeating mistakes. Treat them as law.
D5: Snapshot Prod Before Overwriting (No S3 Versioning)
S3 versioning is disabled on both the production and staging buckets to keep costs flat. Before any cp to prod:
aws s3 cp s3://queenofsandiego.com/index.html ./snapshots/index.html.$(date +%s).backup
Why: You now have a recovery point. If the deploy was a mistake, you can restore in under 30 seconds without version history clutter.
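The restore side of this rule is only fast if the newest snapshot is easy to find. The sketch below assumes the index.html.<epoch-seconds>.backup naming produced by the command above; the function name latest_snapshot is hypothetical.

```shell
# Hypothetical helper: print the path of the newest D5 snapshot.
# Assumes files named index.html.<epoch-seconds>.backup in one directory.
latest_snapshot() (
  dir="${1:-./snapshots}"
  cd "$dir" 2>/dev/null || exit 1
  newest=$(
    for f in index.html.*.backup; do
      # Skip the literal pattern when nothing matches the glob.
      [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -t. -k3,3n | tail -n 1   # numeric sort on the epoch field
  )
  [ -n "$newest" ] || exit 1
  echo "$dir/$newest"
)
```

A rollback would then be aws s3 cp "$(latest_snapshot)" s3://queenofsandiego.com/index.html, followed by a CloudFront invalidation so the restored file is actually served.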
D6: Print a Six-Line Proof Block Before Pushing
Before executing the final aws s3 cp to production, print a block to chat showing:
- Source file path and timestamp
- Target S3 path and current CloudFront cache status
- The specific lines changed (diff stat)
- Which features are affected
- CloudFront distribution ID that will need invalidation
- Your explicit intent: "Deploying X to prod, invalidating Y, then verifying Z"
Why: CB reads this before you hit Enter. Catches "wait, that's the wrong file" before S3 is overwritten.
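The proof block can be generated rather than typed by hand, which keeps its shape consistent between sessions. This is a sketch under assumptions: the function name print_proof_block, the argument order, and the line labels are all hypothetical, and it reports only what can be computed from local files (it does not query CloudFront for cache status).

```shell
# Hypothetical helper: print a six-line D6 proof block.
# Args: local file, prod snapshot, target URI, distribution ID,
#       affected features, intent. Labels are illustrative only.
print_proof_block() (
  src=$1 snapshot=$2 target=$3 dist=$4 features=$5 intent=$6
  # GNU stat uses -c, BSD/macOS stat uses -f; try one, then the other.
  mtime=$(stat -c %y "$src" 2>/dev/null || stat -f %Sm "$src")
  # In normal diff output, ">" marks lines only in the local file
  # and "<" marks lines only in the prod snapshot.
  added=$(diff "$snapshot" "$src" | grep -c '^>' || true)
  removed=$(diff "$snapshot" "$src" | grep -c '^<' || true)
  echo "Source:     $src (modified $mtime)"
  echo "Target:     $target"
  echo "Diff:       +$added / -$removed lines vs. live snapshot"
  echo "Features:   $features"
  echo "Invalidate: CloudFront $dist"
  echo "Intent:     $intent"
)
```

A large removed count on the Diff line is the signal to stop: it usually means the local file is missing content that is live.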
D7: Feature-Token Registry
Maintain a FEATURES.md file in the site root listing every major feature and a unique identifier (e.g., JADA_BOOK_NOW_FADE, STRIPE_CHECKOUT_V2). Before deploying to prod, grep the live S3 file for those tokens:
aws s3 cp s3://queenofsandiego.com/index.html ./index.html.live
grep -oE "JADA_BOOK_NOW_FADE|STRIPE_CHECKOUT_V2|RANCH_COAST_HERO" index.html.live | sort -u
Every token found in the live file must also appear in your local file. If any token is missing locally, abort and investigate.
Why: This check would have caught the exact regression that just happened. Automated, un-ignorable, requires active deletion of the check to bypass.
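The comparison between live and local tokens can be scripted so that a missing token produces a non-zero exit status. The sketch below is one possible shape, not the check in the repository: the function name check_feature_tokens and the TOKENS variable are hypothetical, and the token list is hard-coded from the examples above rather than parsed from FEATURES.md, whose format this post does not specify.

```shell
# Hypothetical token list; in practice this would come from FEATURES.md.
TOKENS="JADA_BOOK_NOW_FADE STRIPE_CHECKOUT_V2 RANCH_COAST_HERO"

# Hypothetical helper: fail if a token present in the live file is
# absent from the local file. Args: live file, local file.
check_feature_tokens() (
  live=$1 local_file=$2 missing=0
  for token in $TOKENS; do
    if grep -qF -- "$token" "$live" && ! grep -qF -- "$token" "$local_file"; then
      echo "MISSING in local: $token" >&2
      missing=1
    fi
  done
  exit "$missing"
)
```

Chained as check_feature_tokens index.html.live index.html && aws s3 cp ..., the upload never runs when the local file has lost a feature that is live.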
D8: Escalate to CB if S3 is Ahead of Local
If the diff in D1 shows S3 has features local doesn't, stop. Write a message to CB explaining the delta and wait for a response. Never overwrite forward progress with an old snapshot.
Why: This is the nuclear scenario. It means either (a) another session deployed newer code and didn't pull it locally, or (b) production diverged for a reason. Either way, human judgment is required.
Infrastructure & Deployment Pipeline
S3 Configuration:
- Production: s3://queenofsandiego.com (public, CloudFront distribution E2ABC123DEFG45)
- Staging: s3://staging.queenofsandiego.com (same distribution, prefix-based routing)
- Versioning: Disabled (cost optimization)
- Static website hosting: Enabled, default index index.html
Deployment Steps (Post-Rules):
- Pull current S3 prod, diff locally (D1)
- Make edits, test locally
- Deploy to staging-only (D2)
- Manual verification on staging.queenofsandiego.com
- Snapshot prod (D5)
- Print proof block (D6)
- Check feature tokens (D7)
- Deploy to prod, invalidate CloudFront cache
- Verify live on queenofsandiego.com
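The production half of these steps (snapshot, upload, invalidate) can be printed as a plan for review before anything runs. This is a dry-run sketch only: the function name prod_deploy_plan is hypothetical, the bucket and distribution ID are the values quoted in this post, and invalidating just the single uploaded path is an assumption about what the site needs.

```shell
# Hypothetical dry-run helper: print the prod deploy commands in order
# instead of executing them. Args: file name, snapshot timestamp.
prod_deploy_plan() (
  file=${1:-index.html}
  stamp=${2:-$(date +%s)}
  # 1. Snapshot what is live (D5).
  echo "aws s3 cp s3://queenofsandiego.com/$file ./snapshots/$file.$stamp.backup"
  # 2. Upload the reviewed local file to the single prod target (D2).
  echo "aws s3 cp ./$file s3://queenofsandiego.com/$file"
  # 3. Invalidate the cached copy so CloudFront serves the new file.
  echo "aws cloudfront create-invalidation --distribution-id E2ABC123DEFG45 --paths /$file"
)
```

Printing the plan first fits the proof-block habit from D6: the commands are visible and can be pasted one at a time once they have been read.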