Debugging a Cascading Deployment Failure: When AI Agents Break Production

```html

This post documents a critical incident where an AI agent (Claude 4.5) was tasked with fixing a booking calendar race condition on sailjada.com but introduced multiple breaking changes across 23 HTML files, requiring emergency rollback and manual remediation.

The Initial Task: A Legitimate Bug Fix

The original issue was straightforward—a race condition in the booking modal on sailjada.com:

The jadaOpenBook() function opened the booking calendar modal immediately
Availability data was fetched asynchronously in the background
Users could interact with an empty/incomplete calendar for 200-500ms before data populated
This created UX jank and potential booking errors

The fix required adding an isLoading state flag and deferring modal open until availabilityData was available. This was a reasonable architectural improvement.

What Went Wrong: The Cascade

The agent was instructed to "apply the same fix to all other pages." Instead of verifying the fix was correct first, the agent:

Applied the fix to all 23 HTML files in the sailjada.com repository
Introduced invalid JavaScript by leaving Python format-string escapes in the code
Deployed to staging without syntax validation
Marked the staging deployment as "ready for CB review" without catching the errors

The primary issue: the agent's fix included lines like:

{{ isLoading: false }}  // Invalid JS — looks like Jinja/Mustache templating

This syntax is valid in Python f-strings and Jinja2 templates, but not in JavaScript. The double-braces should have been single curly braces in the object literal.

Root Cause Analysis

Upon investigation, the repository revealed two distinct issues:

Legitimate CSS: Many files contain {{ ... }} in CSS media queries and animations—this is valid
Invalid JavaScript: The new jadaOpenBook() function used double-braces in object initialization
Template Artifacts: Some production files still contained Python placeholders like {STRIPE_LINK}, suggesting a hybrid templating workflow

The agent didn't distinguish between:

CSS contexts where {{ }} is legitimate (e.g., @supports (display: {{ var }}))
JavaScript contexts where only { } is valid
Template-rendered contexts where {{ }} is valid

Emergency Response: Rollback Procedure

Once the issue was identified, we executed a full rollback:

Identified all affected files: Searched for jadaBookingState (the broken variable name) across the repository
```
find . -name "*.html" -type f | xargs grep -l "jadaBookingState"
```
Downloaded production versions from S3: Retrieved the known-good versions from the production S3 bucket
```
aws s3 cp s3://sailjada-prod/releases/rc1/index.html ./releases/rc1/index.html
```
Overwrote all 23 local files: Restored production versions across:
- /Users/cb/Documents/repos/sites/sailjada.com/index.html
- /Users/cb/Documents/repos/sites/sailjada.com/releases/rc1/index.html
- All 21 other affected pages
Verified restoration: Confirmed jadaOpenBook was restored to its original (working) implementation and jadaBookingState was completely removed
Deleted broken staging deployment: Removed https://queenofsandiego.com/_staging/sailjada/ to prevent accidental promotion

Infrastructure Implications

The incident revealed process gaps:

No pre-deployment linting: The HTML/JS files should be validated with eslint or similar before staging
No syntax checking in CI/CD: A simple node -c check would have caught the invalid JS
Template vs. static HTML confusion: The repo appears to have hybrid Python templating + static HTML, which confused the agent
No agent guardrails: The agent should have been restricted from deploying without manual approval

Production files are stored in S3 buckets (exact bucket names withheld for security), served through CloudFront distribution with Route53 DNS. The staging deployment was to a separate S3 path (_staging/sailjada/) for review before production push.

What's Next: Process Improvements

To prevent similar cascades:

Add pre-deployment validation: Integrate ESLint and HTMLHint into the build pipeline
```
npm run lint -- "releases/**/*.html"
```
Require code review before staging: Agent staging deployments should be marked as "awaiting review" and not auto-promoted
Separate template and static files: Store Jinja2/Python templates separately from pure HTML to reduce confusion
Use version control properly: The 12+ edits to index.html without meaningful commit messages made rollback harder to reason about
Establish agent task boundaries: "Apply fix to all files" should trigger a safety prompt requiring explicit per-file approval

Status: Production is restored and clean. Staging has been purged. The original race condition fix is pending proper implementation with correct syntax and pre-deployment validation.