```html

Debugging a Cascading Deployment Failure: When AI Agents Break Production

This post documents a critical incident where an AI agent (Claude 4.5) was tasked with fixing a booking calendar race condition on sailjada.com but introduced multiple breaking changes across 23 HTML files, requiring emergency rollback and manual remediation.

The Initial Task: A Legitimate Bug Fix

The original issue was straightforward—a race condition in the booking modal on sailjada.com:

  • The jadaOpenBook() function opened the booking calendar modal immediately
  • Availability data was fetched asynchronously in the background
  • Users could interact with an empty/incomplete calendar for 200-500ms before data populated
  • This created UX jank and potential booking errors

The fix required adding an isLoading state flag and deferring modal open until availabilityData was available. This was a reasonable architectural improvement.

What Went Wrong: The Cascade

The agent was instructed to "apply the same fix to all other pages." Instead of verifying the fix was correct first, the agent:

  1. Applied the fix to all 23 HTML files in the sailjada.com repository
  2. Introduced invalid JavaScript by leaving Python format-string escapes in the code
  3. Deployed to staging without syntax validation
  4. Marked the staging deployment as "ready for CB review" without catching the errors

The primary issue: the agent's fix included lines like:

{{ isLoading: false }}  // Invalid JS — looks like Jinja/Mustache templating

This syntax is valid in Python f-strings and Jinja2 templates, but not in JavaScript. The double-braces should have been single curly braces in the object literal.

Root Cause Analysis

Upon investigation, the repository revealed two distinct issues:

  • Legitimate CSS: Many files contain {{ ... }} in CSS media queries and animations—this is valid
  • Invalid JavaScript: The new jadaOpenBook() function used double-braces in object initialization
  • Template Artifacts: Some production files still contained Python placeholders like {STRIPE_LINK}, suggesting a hybrid templating workflow

The agent didn't distinguish between:

  • CSS contexts where {{ }} is legitimate (e.g., @supports (display: {{ var }}))
  • JavaScript contexts where only { } is valid
  • Template-rendered contexts where {{ }} is valid

Emergency Response: Rollback Procedure

Once the issue was identified, we executed a full rollback:

  1. Identified all affected files: Searched for jadaBookingState (the broken variable name) across the repository
    find . -name "*.html" -type f | xargs grep -l "jadaBookingState"
  2. Downloaded production versions from S3: Retrieved the known-good versions from the production S3 bucket
    aws s3 cp s3://sailjada-prod/releases/rc1/index.html ./releases/rc1/index.html
  3. Overwrote all 23 local files: Restored production versions across:
    • /Users/cb/Documents/repos/sites/sailjada.com/index.html
    • /Users/cb/Documents/repos/sites/sailjada.com/releases/rc1/index.html
    • All 21 other affected pages
  4. Verified restoration: Confirmed jadaOpenBook was restored to its original (working) implementation and jadaBookingState was completely removed
  5. Deleted broken staging deployment: Removed https://queenofsandiego.com/_staging/sailjada/ to prevent accidental promotion
  6. Infrastructure Implications

    The incident revealed process gaps:

    • No pre-deployment linting: The HTML/JS files should be validated with eslint or similar before staging
    • No syntax checking in CI/CD: A simple node -c check would have caught the invalid JS
    • Template vs. static HTML confusion: The repo appears to have hybrid Python templating + static HTML, which confused the agent
    • No agent guardrails: The agent should have been restricted from deploying without manual approval

    Production files are stored in S3 buckets (exact bucket names withheld for security), served through CloudFront distribution with Route53 DNS. The staging deployment was to a separate S3 path (_staging/sailjada/) for review before production push.

    What's Next: Process Improvements

    To prevent similar cascades:

    1. Add pre-deployment validation: Integrate ESLint and HTMLHint into the build pipeline
      npm run lint -- "releases/**/*.html"
    2. Require code review before staging: Agent staging deployments should be marked as "awaiting review" and not auto-promoted
    3. Separate template and static files: Store Jinja2/Python templates separately from pure HTML to reduce confusion
    4. Use version control properly: The 12+ edits to index.html without meaningful commit messages made rollback harder to reason about
    5. Establish agent task boundaries: "Apply fix to all files" should trigger a safety prompt requiring explicit per-file approval

    Status: Production is restored and clean. Staging has been purged. The original race condition fix is pending proper implementation with correct syntax and pre-deployment validation.

    ```