Preventing S3 Deployment Regressions: A Case Study in State Management and Deployment Safety
Yesterday, a routine deployment to queenofsandiego.com silently reverted three production changes by deploying a stale local index.html over a newer S3 version. The hero crossfade animation and the Stripe embedded checkout disappeared, and previously deleted content reappeared. This post documents the root cause, the detection process, and the systematic rules we've implemented to prevent similar regressions across our deployment pipeline.
What Went Wrong
The failure pattern was:
- Local working directory had an outdated index.html snapshot
- S3 production bucket (sailjada-prod-site in us-west-2) already contained a newer version with the hero fade and Stripe checkout
- A deployment command pushed the stale local file, overwriting the production version
- No pre-deployment diff was run, so the regression went undetected until manual testing caught it
- The deployment violated its own prior session notes warning about stale local files
The specific file path involved: /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html (3,650 lines, production-critical).
Root Cause Analysis
This wasn't a tooling failure—it was a state management gap. The deployment process assumed local files were the source of truth, but in a multi-session environment where different agents modify S3 directly, that assumption breaks. We had three data sources in potential conflict:
- Local filesystem: The developer's working copy (potentially stale)
- S3 production: The live version serving customers (canonical, but not version-controlled)
- Git history: The committed baseline (useful for blame, not current state)
The agent that performed the deployment had received a session-summary warning from the previous session explicitly noting "stale local files risk S3 overwrite," but didn't escalate the conflict before proceeding. This is a classic information-in-context problem—the warning existed, but wasn't enforced at decision points.
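The conflict between the three sources can be made explicit with a small three-way comparison. The following is a minimal sketch, not part of the original rules: `classify_state` is a hypothetical helper, and it assumes the committed git version is a usable baseline and that index.html sits at the repository root.

```shell
# Hypothetical helper (not from the original CLAUDE.md): classify how three
# copies of one file relate to each other.
#   $1 = baseline copy (assumed here to come from git)
#   $2 = local working copy
#   $3 = copy fetched from S3
classify_state() {
  local base="$1"
  local local_copy="$2"
  local remote="$3"
  local f
  # Refuse to guess if any of the three copies is unreadable.
  for f in "$base" "$local_copy" "$remote"; do
    [ -r "$f" ] || { echo "error: cannot read $f" >&2; return 2; }
  done
  if cmp -s "$local_copy" "$remote"; then
    echo "in-sync"        # local and S3 are byte-identical
  elif cmp -s "$base" "$local_copy"; then
    echo "s3-ahead"       # only S3 changed since the baseline
  elif cmp -s "$base" "$remote"; then
    echo "local-ahead"    # only the local copy changed since the baseline
  else
    echo "diverged"       # both changed; needs a manual merge decision
  fi
}

# Example usage (assumes AWS credentials are already configured):
#   git show HEAD:index.html > /tmp/index.base
#   aws s3 cp s3://sailjada-prod-site/index.html /tmp/index.s3
#   classify_state /tmp/index.base index.html /tmp/index.s3
```

A result of `s3-ahead` or `diverged` is the situation that caused this incident: deploying the local file would overwrite changes that exist only in S3.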
Technical Safeguards Implemented
We've codified eight hard deployment rules into /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md, automatically loaded at the start of every session:
- D1 (Pull-and-diff before edit): Before modifying any S3-deployed file locally, fetch the current S3 version and diff it against local. This is the critical early-warning gate.
    # Example workflow (no credentials shown)
    aws s3 cp s3://sailjada-prod-site/index.html ./index.html.s3-current
    diff -u index.html index.html.s3-current | head -50
- D2 (Staging-only single-target deploys): Never deploy to staging and prod in the same command. Always promote staging → prod as a separate, deliberate step.
    # Correct: Two commands, with review in between
    aws s3 cp index.html s3://sailjada-staging-site/index.html
    # [review on staging.sailjada.com]
    aws s3 cp index.html s3://sailjada-prod-site/index.html
- D3 (One file per logical change): Each deployment targets a single file per feature. This prevents accidental overwrites of unrelated, up-to-date content.
- D4 (Obey prior session warnings): If a prior session summary warns about a specific risk (stale files, S3 drift, etc.), escalate to the user instead of proceeding.
- D5 (Pre-deployment snapshot): Before overwriting any production file, save a timestamped copy to an archive directory. S3 versioning is not enabled on these buckets, so local snapshots are the recovery mechanism.
    # Capture the timestamp once so both commands use the same directory
    SNAP_DIR=./s3-snapshots/$(date +%Y%m%d_%H%M%S)
    mkdir -p "$SNAP_DIR"
    aws s3 cp s3://sailjada-prod-site/index.html "$SNAP_DIR/index.html"
- D6 (Six-line proof block): Before any cp or aws s3 command that touches production, print a summary block showing: source file, target bucket, size comparison, feature tokens affected, and timestamp. This creates a human checkpoint.
- D7 (Feature-token registry): Maintain a grep-able list of key features and their implementation details (CSS class names, function signatures, HTML IDs). Before deploying, verify that tokens for all in-scope features are present in the new version.
    # Example registry entry for hero fade:
    # Feature: JADA → BOOK NOW crossfade
    # Tokens: .hero-fade, fadeInOnScroll(), data-fade-trigger
    # S3 Location: sailjada-prod-site/index.html lines 842–889
    # Last verified: [timestamp]
- D8 (Escalate on S3 ahead of local): If S3 is ahead of local—i.e., the remote file has changes not present locally—stop and ask the user whether to pull, merge, or abort.
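The token check in D7 can be sketched as a small shell function. This is an illustrative sketch only: `verify_tokens` is a hypothetical name, and the tokens in the usage comment are the ones from the example registry entry, matched as literal strings rather than parsed as CSS or JavaScript.

```shell
# Hypothetical helper (not from the original CLAUDE.md): report every
# feature token that is absent from the file about to be deployed.
#   $1     = candidate file
#   $2...  = literal tokens that must appear somewhere in the file
verify_tokens() {
  local file="$1"
  shift
  local missing=0
  local token
  for token in "$@"; do
    # -F matches the token as a fixed string; -- protects tokens that
    # begin with a dash from being read as grep options.
    if ! grep -qF -- "$token" "$file"; then
      echo "MISSING: $token"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage before a deploy:
#   verify_tokens index.html '.hero-fade' 'fadeInOnScroll()' 'data-fade-trigger' \
#     || echo "Token check failed: do not deploy"
```

A literal string match is deliberately crude. It cannot prove a feature works, only that its markers were not silently dropped, which is the failure this incident exhibited.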
Infrastructure Changes
No infrastructure changes were required. The buckets and CloudFront distribution remain:
- S3 buckets: sailjada-staging-site (us-west-2), sailjada-prod-site (us-west-2)
- CloudFront: Distribution ID E1ABCDEF2GHIJKL (invalidated on prod deploys; staging uses direct S3 access)
- Route53: sailjada.com zone, A record points to CloudFront
The only new persistent artifact is the CLAUDE.md file itself, which lives in git and auto-loads on every session, ensuring these rules propagate across all agents and sessions.
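For reference, the CloudFront invalidation that accompanies a prod deploy might look like the sketch below. The wrapper `invalidate_prod` and the `DRY_RUN` variable are assumptions made for illustration. Only the underlying `aws cloudfront create-invalidation` command is standard AWS CLI, and the exact paths this pipeline invalidates are not stated above.

```shell
# Hypothetical wrapper (not from the original pipeline): invalidate one or
# more CloudFront paths, defaulting to a dry run so nothing is sent to AWS
# unless DRY_RUN=0 is set explicitly.
#   $1     = CloudFront distribution ID
#   $2...  = paths to invalidate, e.g. /index.html
invalidate_prod() {
  local dist_id="$1"
  shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "DRY RUN: aws cloudfront create-invalidation --distribution-id $dist_id --paths $*"
  else
    aws cloudfront create-invalidation --distribution-id "$dist_id" --paths "$@"
  fi
}

# Example usage:
#   invalidate_prod E1ABCDEF2GHIJKL /index.html            # prints the command only
#   DRY_RUN=0 invalidate_prod E1ABCDEF2GHIJKL /index.html  # actually invalidates
```

Defaulting to a dry run follows the same principle as the rules above: the irreversible action requires a deliberate extra step.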
Why This Approach
We chose constraint-based rules (D1–D8) over automated tooling because:
- Auditability: Each rule is explicit and checkable. We can ask "was D6 followed?" and verify the answer in chat history.
- Human decision-making: Rules create pause points where a human (or escalation protocol) can review state before irreversible action.
- Portability: The rules don't depend on CI/CD tooling, custom deployment scripts, or cloud-specific SDKs. They work with standard AWS CLI and diff.
- Teachability: New team members and agents can read CLAUDE.md in a few minutes and understand the discipline required.
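The auditability point depends on the D6 proof block actually appearing in the session transcript. One possible shape for it is sketched below. The rule names five items, so this hypothetical `print_proof_block` reports the size comparison as two lines to reach six; the real layout in CLAUDE.md may differ.

```shell
# Hypothetical helper (layout is an assumption): print a six-line summary
# before any command that touches production.
#   $1 = local source file
#   $2 = target S3 URI
#   $3 = freshly fetched copy of the current production file
#   $4 = feature tokens affected (free text)
print_proof_block() {
  local src="$1"
  local target="$2"
  local prod_copy="$3"
  local tokens="$4"
  local src_size
  local prod_size
  # tr strips the leading padding that some wc implementations emit.
  src_size=$(wc -c < "$src" | tr -d ' ')
  prod_size=$(wc -c < "$prod_copy" | tr -d ' ')
  echo "SOURCE:     $src"
  echo "TARGET:     $target"
  echo "LOCAL SIZE: $src_size bytes"
  echo "PROD SIZE:  $prod_size bytes"
  echo "TOKENS:     $tokens"
  echo "TIMESTAMP:  $(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Example usage:
#   aws s3 cp s3://sailjada-prod-site/index.html ./index.html.s3-current
#   print_proof_block index.html s3://sailjada-prod-site/index.html \
#     ./index.html.s3-current ".hero-fade, data-fade-trigger"
```

A local file noticeably smaller than the production copy is a cheap early signal that the local snapshot may be stale.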
We deliberately did not implement:
- Automatic S3 versioning (not cost-justified for static sites)
- Pre-deployment test suite (complex to maintain; manual review is acceptable for low-velocity changes)
- Deployment approval queue (adds latency; escalation-to-user on