```html

Incident Recovery: Preventing S3 Deployment Regressions with Pre-Flight Validation and Hard Rules

What Happened

During a recent development session, a deployment of queenofsandiego.com/index.html to S3 inadvertently undid three changes that were already live in production. The first two items below were lost, and the third, which had been deliberately deleted, reappeared:

  • The JADA → BOOK NOW hero crossfade animation
  • The Stripe embedded checkout booking flow
  • A previously-deleted "For Ranch & Coast readers..." hero line

The root cause was deploying a stale local copy of index.html over a newer version already in production on S3. The local file was several commits behind, and no diff or other validation ran before the cp command executed.

Secondary failure: both staging and prod CloudFront distributions were invalidated in the same command, violating the staging-first approval pattern that was already documented.

Technical Details: The Failure Chain

The deployment workflow lacked three critical gates:

  • Pre-flight S3 state snapshot: No pull of the current production file before local editing began. S3 versioning was not enabled on the bucket.
  • Diff validation before deploy: No comparison of local file against S3 current state. The developer relied on local commit history, which was incomplete relative to what was already live.
  • Single-target enforcement: CloudFront invalidation targeted both staging and production simultaneously, bypassing the human review gate on staging.
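
The diff-validation gate could be approximated with a check that counts lines present in the live copy but absent from the local file, which is exactly the content a deploy would delete. This is a minimal sketch, not the project's actual tooling: the function name lines_deploy_would_remove is hypothetical, and it assumes the live object has already been downloaded to a local path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original workflow): count lines that
# exist in the live copy but not in the local file. A non-zero count means
# the deploy would delete live content and deserves a human look.
lines_deploy_would_remove() {
  local live_copy="$1" local_file="$2"
  # Plain diff prefixes lines found only in the first file with "<".
  # grep -c exits 1 when the count is 0, so "|| true" keeps the status clean.
  diff "$live_copy" "$local_file" | grep -c '^<' || true
}
```

A caller might stop the deploy whenever the printed count is greater than zero and show the full diff to a human instead.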

The file in question, /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html (3,650 lines), contains both HTML structure and inline JavaScript for booking logic. Changes to the Stripe integration, hero animations, or CMS content are easily lost if the wrong version is committed or deployed.

Infrastructure Context

The Queen of San Diego site uses a standard S3 + CloudFront + Route53 architecture:

  • S3 bucket: queenofsandiego.com (content origin)
  • CloudFront staging distribution: d...staging.cloudfront.net (internal review, cache TTL 60s)
  • CloudFront production distribution: d...prod.cloudfront.net (public, cache TTL 3600s)
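  • Route53 records: staging.queenofsandiego.com and queenofsandiego.com each point to their respective distribution (staging can be a CNAME; the apex has to be a Route53 alias record, since DNS does not permit a CNAME at a zone apex)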

Without S3 object versioning, lifecycle policies, or pre-deploy snapshots, there was nothing in the bucket to roll back to once the stale file was copied. Recovery meant manually restoring from git history and re-validating against live data.

Solution: Eight Hard Rules (D1–D8)

To prevent this category of failure, we encoded eight mandatory rules into the project's Claude instructions file at /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md. These auto-load on every QOS-related session:

  • D1 — Pull S3 current state before any edit: Before modifying index.html, fetch the production version and diff it against local. This reveals what's actually live vs. what's in git.
  • D2 — Staging-only single-target deploys: Each deploy, meaning the aws s3 cp and the CloudFront invalidation that follows it, must target only one environment. Staging is deployed and reviewed first; prod is promoted separately after approval.
  • D3 — One logical change per deployment: Do not batch unrelated feature changes. If a single deploy breaks, rollback scope is clear.
  • D4 — Obey prior session summaries: If a previous session documented a known risk (e.g., "stale local files"), escalate rather than proceed.
  • D5 — Snapshot S3 before overwrite: Print the MD5 or first 200 bytes of the file being replaced to chat. Keep a timestamped backup in a backups/ prefix within the bucket.
  • D6 — Proof block before any cp: Print a six-line summary showing: file path, old version (LastModified from aws s3api head-object), new version (local git sha), diff summary, target distribution (exactly one, per D2), and who approved it.
  • D7 — Feature-token registry: Maintain a comment-based registry in the file listing all active features (e.g., // FEATURE: hero-crossfade v1.2 [2025-01-15]). After any deploy, grep the live S3 object to confirm tokens are present.
  • D8 — Escalate when S3 is ahead: If S3 production is newer than local, stop. Ask the human for the latest commit, rebase, or fetch the S3 version explicitly before proceeding.
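
Several of these rules reduce to small, checkable steps. The helpers below are a minimal sketch of D5, D6, and D8, not the contents of CLAUDE.md: the function names, the backup key layout, and the choice to compare epoch seconds are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for D5, D6, and D8. The function names and the
# backup key layout are assumptions, not taken from CLAUDE.md.

# D5: build a timestamped backup key under the backups/ prefix.
backup_key() {
  local key="$1" stamp="$2"   # stamp example: 20250115T151100Z
  printf 'backups/%s.%s\n' "$key" "$stamp"
}

# D8: return 0 when the S3 object is newer than the local commit.
# Both arguments are Unix epoch seconds.
s3_is_ahead() {
  local s3_epoch="$1" local_epoch="$2"
  [ "$s3_epoch" -gt "$local_epoch" ]
}

# D6: print the six-line proof block from six positional arguments.
print_proof_block() {
  printf 'FILE:     %s\n' "$1"
  printf 'OLD:      %s\n' "$2"
  printf 'NEW:      %s\n' "$3"
  printf 'DIFF:     %s\n' "$4"
  printf 'TARGET:   %s\n' "$5"
  printf 'APPROVED: %s\n' "$6"
}
```

For D8, the local side could come from git log -1 --format=%ct -- index.html. Converting the S3 LastModified value to epoch seconds is left to the caller here because date flags differ between GNU and BSD systems.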

Deployment Command Example (Post-Rules)

The new workflow for deploying a single-file fix to staging:

# 1. Inspect the live object's metadata (LastModified, ETag); this does not download the file
aws s3api head-object \
  --bucket queenofsandiego.com \
  --key index.html

# 2. Diff local against S3
aws s3 cp s3://queenofsandiego.com/index.html /tmp/s3-index.html
diff -u /tmp/s3-index.html \
  /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html \
  | head -50

# 3. Print proof block (example output)
# ─────────────────────────────────────────────────
# FILE:     queenofsandiego.com/index.html
# OLD:      2025-01-15T14:22:33Z (MD5: abc123...)
# NEW:      local commit d7e9f1c (2025-01-15T15:10:00Z)
# DIFF:     Stripe Session init + hero fade CSS only (18 lines changed)
# TARGET:   staging.queenofsandiego.com (d...staging.cloudfront.net)
# APPROVED: CB via Slack [timestamp]
# ─────────────────────────────────────────────────

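# 4. Upload the file. Only staging is invalidated in step 5; if prod reads
#    this same key, it will pick up the change when its cached copy expires.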
aws s3 cp \
  /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html \
  s3://queenofsandiego.com/index.html \
  --metadata "deployed-by=claude,deployed-at=2025-01-15T15:11:00Z,change-id=hero-fade-stripe-fix"

# 5. Invalidate staging distribution only
aws cloudfront create-invalidation \
  --distribution-id d...staging \
  --paths "/index.html"

# 6. Verify feature tokens on staging (invalidations are asynchronous, so run this once step 5 has completed)
curl -s https://staging.queenofsandiego.com/index.html \
  | grep -o "FEATURE: [^/]*" | head -5
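
Step 6 lists whatever tokens happen to be present, which still leaves a human to notice one that is missing. A stricter variant, sketched here under assumptions, checks an explicit list of expected tokens and fails when any is absent. The function name missing_feature_tokens is hypothetical, and it reads from a local file, so the live page would first be saved with curl -o.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected feature tokens are missing
# from a file. Returns 0 when all are present and 1 otherwise.
missing_feature_tokens() {
  local file="$1"; shift
  local token missing=0
  for token in "$@"; do
    # -F matches the token literally; -q suppresses grep's own output.
    if ! grep -Fq -- "FEATURE: $token" "$file"; then
      echo "MISSING: $token"
      missing=1
    fi
  done
  return "$missing"
}
```

Called as missing_feature_tokens /tmp/staging-index.html hero-crossfade, it prints nothing and returns 0 when the token from the D7 example is present. The path /tmp/staging-index.html is illustrative only.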

Key Decisions

  • Rules in CLAUDE.md, not in code: Hard constraints live in the project's instruction file so they load automatically. The instructions themselves become the source of truth.
  • Proof blocks over checklists: A