Preventing Deployment Regressions: Hard Rules for Multi-Environment S3 + CloudFront Workflows
During a three-hour development session, a regression on queenofsandiego.com wiped out three working features when a stale local index.html was deployed over a newer production version in S3. This post documents the failure mode, the infrastructure decisions that enabled it, and the hard rules we implemented to prevent it from recurring.
What Happened: The Regression
A prior session had deployed a JADA → BOOK NOW hero crossfade and a Stripe embedded checkout flow, and had removed a deprecated "For Ranch & Coast readers..." hero text. All three changes lived in a 3,650-line index.html file in production S3.
The regression session:
- Edited index.html locally without pulling the current S3 version first
- Deployed to both staging and prod environments in a single command (violating the staging-first rule)
- Overwrote production with local state that was hours old, restoring the deleted hero line and removing the working Stripe checkout
- Had prior session-summary warnings in memory that explicitly flagged "stale local files" but did not re-read them
The immediate impact: the booking flow broke, and the deprecated marketing copy reappeared.
Why This Was Possible: Infrastructure Design
The sailjada.com repo uses this pattern for queenofsandiego.com:
/Users/cb/Documents/repos/sites/sailjada.com/
├── index.html (working copy, local)
├── staging/ (S3 bucket: staging.queenofsandiego.com)
└── prod/ (S3 bucket: queenofsandiego.com)
Both S3 buckets sit behind CloudFront distributions:
- Distribution ID for prod: serves queenofsandiego.com with cache TTL 3600s
- Distribution ID for staging: serves staging.queenofsandiego.com with cache TTL 300s
The deployment flow was manual: edit the local index.html, then run cp index.html staging/index.html && aws s3 cp staging/ s3://staging.queenofsandiego.com/ --recursive for staging, followed by cp index.html prod/index.html && aws s3 cp prod/ s3://queenofsandiego.com/ --recursive for production. (aws s3 cp needs --recursive when the source is a directory.)
The weakness: no automated diff-check, no pull-before-edit step, and no file-level tracking of which features live in which lines. When a local file drifted from S3 (whether through git merges, concurrent edits, or caching), deploying local state could silently revert remote changes.
Root Cause: Assumptions About Local State
The deployment scripts assumed the local index.html was always current. It wasn't. In a multi-session workflow, this breaks because:
- Session A deploys to production, leaving local unchanged (or with an older version)
- Session B edits local without comparing to S3 first
- Session B's local copy clobbers Session A's production changes
- No versioning on S3 means no rollback path
- CloudFront may keep serving cached content for up to the cache TTL (300s on staging, 3600s on prod), masking the damage initially
The Fix: Eight Hard Rules (D1–D8)
We codified these rules into /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md to auto-load at the start of every session:
D1: Pull-before-edit rule. Before any edit to index.html, run:
aws s3 cp s3://queenofsandiego.com/index.html index.html.prod-current
Then diff local against prod-current. If they differ, investigate why before proceeding.
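The diff step in D1 can be wrapped in a small function so that a mismatch stops the session instead of relying on someone reading the diff output. This is a minimal sketch: check_drift is a hypothetical helper name, not part of the existing scripts, and it only compares two local files (the prod copy is assumed to have been downloaded with the aws s3 cp command above).

```shell
# Sketch of the D1 check. check_drift is a hypothetical helper name.
# Returns 0 when the local file matches the downloaded prod copy, 1 when they differ.
check_drift() {
    local_file=$1
    prod_copy=$2
    if cmp -s "$local_file" "$prod_copy"; then
        echo "OK: $local_file matches $prod_copy"
        return 0
    fi
    echo "DRIFT: $local_file differs from $prod_copy" >&2
    # Print a unified diff for investigation; diff exits non-zero on differences.
    diff -u "$prod_copy" "$local_file" >&2 || true
    return 1
}

# Intended use, after pulling the prod copy:
#   check_drift index.html index.html.prod-current || exit 1
```

Using cmp -s for the decision and diff only for the report keeps the exit status unambiguous: any byte-level difference counts as drift.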
D2: Staging-only single-target deploys. Never deploy to both staging and prod in one command. Enforce the order:
# First: staging only
aws s3 cp index.html s3://staging.queenofsandiego.com/index.html
aws cloudfront create-invalidation --distribution-id [STAGING_DIST_ID] --paths "/*"
# Wait for CloudFront cache clear (~30s)
# PAUSE: CB reviews staging.queenofsandiego.com
# Then: prod only (separate command)
aws s3 cp index.html s3://queenofsandiego.com/index.html
aws cloudfront create-invalidation --distribution-id [PROD_DIST_ID] --paths "/*"
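One way to make the single-target rule hard to violate is a wrapper that accepts exactly one target per call and refuses prod until staging has been approved. The sketch below is an assumption about how that could look: deploy_target, DRY_RUN, and STAGING_APPROVED are hypothetical names. The CloudFront invalidation from the commands above would follow the copy; it is left out here because the distribution IDs are not filled in.

```shell
# Sketch of a D2 guard. deploy_target, DRY_RUN and STAGING_APPROVED are
# hypothetical names, not part of the existing deployment scripts.
deploy_target() {
    if [ "$#" -ne 1 ]; then
        echo "refusing: exactly one target per invocation" >&2
        return 2
    fi
    case $1 in
        staging) bucket=staging.queenofsandiego.com ;;
        prod)    bucket=queenofsandiego.com ;;
        *)       echo "unknown target: $1" >&2; return 2 ;;
    esac
    # Prod requires an explicit approval flag set after staging review.
    if [ "$1" = prod ] && [ "${STAGING_APPROVED:-}" != yes ]; then
        echo "refusing: set STAGING_APPROVED=yes after staging is reviewed" >&2
        return 3
    fi
    # With DRY_RUN set, print the command instead of running it.
    run=${DRY_RUN:+echo}
    $run aws s3 cp index.html "s3://$bucket/index.html"
}
```

Running DRY_RUN=1 deploy_target staging prints the copy command without touching S3, which is a cheap way to check the target before a real deploy.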
D3: One logical change per deployment. If you edit the Stripe checkout flow, don't also edit hero text or email templates in the same commit. This keeps diffs reviewable and rollbacks surgical.
D4: Obey prior session warnings. Every session summary flags known risks (stale files, cache state, pending questions). Re-read them. Don't assume they were resolved.
D5: Snapshot prod before overwriting. Add to the deployment checklist:
aws s3 cp s3://queenofsandiego.com/index.html backups/index.html.prod.$(date +%s)
No S3 versioning is enabled, so this is the only rollback path if local state was corrupt.
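Because the snapshots are named with an epoch suffix, picking the most recent one for a rollback can be scripted. The helper below is a sketch under that naming assumption; latest_snapshot is a hypothetical name, and it only inspects local files in the backups directory.

```shell
# Sketch for D5 rollbacks. latest_snapshot is a hypothetical helper that prints
# the newest <dir>/index.html.prod.<epoch> file, or returns 1 if none exists.
latest_snapshot() {
    dir=${1:-backups}
    best=""
    best_ts=-1
    for f in "$dir"/index.html.prod.*; do
        [ -e "$f" ] || continue          # glob matched nothing
        ts=${f##*.}                      # text after the last dot
        case $ts in
            ''|*[!0-9]*) continue ;;     # skip suffixes that are not plain digits
        esac
        if [ "$ts" -gt "$best_ts" ]; then
            best=$f
            best_ts=$ts
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Possible rollback, assuming the snapshot is known to be good:
#   aws s3 cp "$(latest_snapshot)" s3://queenofsandiego.com/index.html
```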
D6: Proof block before deploy. Print a six-line block showing what changed, and have CB visually confirm before the cp command runs:
--- Deploying to STAGING ---
File: index.html (lines 240–310: Stripe Session ID fetch logic)
Old hash: abc123...
New hash: def456...
Size change: +48 bytes
Features affected: booking_flow, payment_verify
[CB: type YES to proceed]
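The mechanical parts of the proof block (hashes and size change) can be generated rather than typed. The following is a sketch, not the script actually in use: file_hash, proof_block, and confirm_deploy are hypothetical names, and the line-range and "Features affected" fields are omitted because they cannot be derived from the files alone.

```shell
# Sketch of the D6 proof block. All function names here are hypothetical.
file_hash() {
    # Prefer sha256sum (GNU coreutils); fall back to shasum (common on macOS).
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -c1-12
    else
        shasum -a 256 "$1" | cut -c1-12
    fi
}

proof_block() {
    target=$1
    old=$2      # e.g. the downloaded prod or staging copy
    new=$3      # the file about to be deployed
    old_size=$(wc -c < "$old" | tr -d ' ')
    new_size=$(wc -c < "$new" | tr -d ' ')
    echo "--- Deploying to $target ---"
    echo "File: $new"
    echo "Old hash: $(file_hash "$old")"
    echo "New hash: $(file_hash "$new")"
    printf 'Size change: %+d bytes\n' "$((new_size - old_size))"
}

confirm_deploy() {
    printf '[CB: type YES to proceed] '
    read -r answer
    [ "$answer" = YES ]     # anything other than exactly YES aborts
}
```

A deploy step could then be gated as proof_block STAGING index.html.prod-current index.html && confirm_deploy before the cp command runs.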
D7: Feature-token registry. Maintain a FEATURE_TOKENS.md listing which lines implement which features (e.g., "Stripe checkout: lines 285–310", "Hero JADA crossfade: lines 45–72"). Before deploying, grep the new file to confirm tokens are present. If a token vanishes, abort and investigate.
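The grep step in D7 needs a searchable string per feature, not only a line range. The sketch below assumes a hypothetical plain-text token list with one name|token pair per line (for example stripe_checkout|id="checkout"); that file format, the example tokens, and the check_tokens name are illustrations, not the actual contents of FEATURE_TOKENS.md.

```shell
# Sketch of the D7 check. check_tokens and the "name|token" file format are
# assumptions made for illustration.
# Returns 0 if every token is found in the HTML file, 1 if any is missing.
check_tokens() {
    token_file=$1
    html=$2
    missing=0
    # The "|| [ -n "$name" ]" clause handles a last line with no trailing newline.
    while IFS='|' read -r name token || [ -n "$name" ]; do
        case $name in
            ''|'#'*) continue ;;         # skip blank lines and comments
        esac
        # -F treats the token as a fixed string, not a regular expression.
        if ! grep -qF -- "$token" "$html"; then
            echo "MISSING: $name ($token)" >&2
            missing=1
        fi
    done < "$token_file"
    return "$missing"
}

# Intended use before a deploy:
#   check_tokens feature_tokens.txt index.html || exit 1
```

Checking for strings rather than line numbers keeps the check valid when unrelated edits shift the file's line numbering.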
D8: Escalate when S3 is ahead. If prod-current differs from local and the prod version is newer, stop and ask CB before overwriting. Don't assume local is right.
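The D8 decision reduces to two inputs: whether the files differ, and which side is newer. A minimal sketch of that logic follows; should_escalate is a hypothetical name, and the two timestamps are passed in as epoch seconds because converting them is platform dependent. The prod timestamp could come from aws s3api head-object --bucket queenofsandiego.com --key index.html --query LastModified --output text, converted to epoch seconds by the caller.

```shell
# Sketch of the D8 decision. should_escalate is a hypothetical helper.
# Usage: should_escalate LOCAL_FILE PROD_COPY PROD_EPOCH LOCAL_EPOCH
# Returns 0 (escalate to CB) only when the files differ AND prod is newer.
should_escalate() {
    cmp -s "$1" "$2" && return 1    # identical files: nothing to escalate
    [ "$3" -gt "$4" ]               # files differ: escalate if prod is newer
}

# Intended use:
#   if should_escalate index.html index.html.prod-current "$prod_ts" "$local_ts"; then
#       echo "S3 is ahead of local: stop and ask CB" >&2
#       exit 1
#   fi
```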
Implementation in the Codebase
These rules are version-controlled in two places:
- /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md: site-specific, loaded automatically each session
- /Users/cb/Documents/repos/CLAUDE.md: top-level pointer for engineers working on other sites, linking to the full ruleset
A KEELY_BUILD_RUNBOOK.md was added to embed these checks into the Keely referral flow deployment, which touches the same