Debugging a Cascading Deployment Failure: How Claude 4.5 Broke 23 Pages and How We Recovered

```html

Last week, an AI-assisted refactoring task intended to fix a booking calendar race condition on sailjada.com instead broke the booking system on 23 HTML pages in our staging environment. This post documents what went wrong, how we detected it, and how we recovered, along with lessons learned about AI-assisted infrastructure changes.

The Original Problem: A Race Condition in jadaOpenBook()

The sailjada.com booking system had a timing vulnerability. The jadaOpenBook() function was opening the booking modal immediately without waiting for availability data to load from the backend. This allowed users to interact with an empty or partially-rendered calendar, creating a poor UX and potential data inconsistency.

The fix was straightforward: introduce an isLoading state flag and render the modal only once isLoading is false, that is, after the availability data has arrived. The change was localized to the JavaScript initialization section of index.html.

What Went Wrong: Template Syntax Contamination

Claude 4.5 was tasked with applying this fix to all 22 other pages that referenced jadaOpenBook. Instead of making surgical, file-specific changes, the agent:

Copied the fix pattern verbatim across all pages
Failed to recognize that sailjada.com HTML files are Python Jinja2 templates, not raw HTML
Introduced JavaScript code with double-brace syntax: {{ isLoading: false }}
Left the code in this invalid state in staging without verification

The problem: in Jinja2 templates, {{ ... }} is the expression (variable interpolation) syntax. Jinja2 tried to parse the double-braced JavaScript as a template expression, which raised template syntax errors and broke the booking system on all 23 affected pages.
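
To see the failure in isolation, the following minimal sketch feeds a contaminated snippet to Jinja2's parser. The surrounding script tag and the assignment to jadaBookingState are assumptions made for illustration; only the double-brace fragment comes from the incident. It also assumes the site uses Jinja2's default delimiters.

```python
from typing import Optional

from jinja2 import Environment, TemplateSyntaxError

# Hypothetical page fragment; only the double-brace literal is from the incident.
CONTAMINATED = "<script>var jadaBookingState = {{ isLoading: false }};</script>"


def parse_error(source: str) -> Optional[str]:
    """Return Jinja2's syntax error message for source, or None if it parses."""
    try:
        Environment().parse(source)
    except TemplateSyntaxError as exc:
        return f"line {exc.lineno}: {exc.message}"
    return None


if __name__ == "__main__":
    # Prints the parser's complaint about the colon inside the expression.
    print(parse_error(CONTAMINATED))
```

The same snippet written with single braces parses cleanly, because a lone { is ordinary text to Jinja2.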

Detection and Scope Assessment

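The staging deployment revealed the issue immediately. Because grep cannot read s3:// paths directly, we synced the staged pages to a local directory and counted the files containing the bad literal:

aws s3 sync s3://queenofsandiego.com/_staging/sailjada/ ./sailjada_staging/ --exclude "*" --include "*.html"
grep -rlF "{{ isLoading" ./sailjada_staging/ | wc -l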

Result: 23 files contaminated. We then compared production against staging:

aws s3 cp s3://sailjada.com/index.html - | head -c 50000 | grep -o "jadaOpenBook\|jadaBookingState" | sort | uniq -c

Production had the original (working) jadaOpenBook implementation. Staging had broken JavaScript with unescaped template syntax.

Root Cause: Lack of Template-Aware Validation

The AI agent didn't perform any of these critical pre-deployment checks:

Template syntax validation (Jinja2 parsing)
JavaScript syntax validation (Node linting)
File-type classification (identifying Jinja2 templates vs. static HTML)
Diff review against production before staging
Testing the booking flow in a staging environment

A single verification command would have caught this immediately:

python3 -c "from jinja2 import Environment; env = Environment(); env.parse(open('index.html').read())"
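
The one-liner checks a single file. A batch version, sketched below, could walk a local checkout and report every template that fails to parse. The function name find_broken_templates and the directory layout are hypothetical, and the sketch assumes default Jinja2 delimiters with no custom extensions; a site that configures its Environment differently would need to mirror that configuration here.

```python
from pathlib import Path

from jinja2 import Environment, TemplateSyntaxError


def find_broken_templates(root: str) -> list:
    """Parse every .html file under root; return (path, line, message) per failure."""
    env = Environment()  # assumption: default delimiters, no extensions
    failures = []
    for path in sorted(Path(root).rglob("*.html")):
        try:
            # parse() only builds the syntax tree; nothing is rendered.
            env.parse(path.read_text(encoding="utf-8"))
        except TemplateSyntaxError as exc:
            failures.append((str(path), exc.lineno, exc.message))
    return failures


def main(argv: list) -> int:
    """Print one line per broken template; return a shell-style exit code."""
    root = argv[0] if argv else "."
    broken = find_broken_templates(root)
    for path, lineno, message in broken:
        print(f"{path}:{lineno}: {message}")
    return 1 if broken else 0
```

To use it as a gate before staging, a wrapper script could call main(sys.argv[1:]) and pass the result to sys.exit, so a non-zero status stops the upload.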

Recovery: Restore from Production

We first cleared the failed staging deployment:

aws s3 rm s3://queenofsandiego.com/_staging/sailjada/ --recursive

Then we restored all 23 broken files from the production bucket:

aws s3 sync s3://sailjada.com/ ./sailjada_local/ --exclude "_staging/*"
aws s3 sync ./sailjada_local/ s3://queenofsandiego.com/_staging/sailjada/ --delete

The CloudFront distribution for sailjada.com was never affected because the staging URL pattern (queenofsandiego.com/_staging/) routes to a different S3 bucket and CloudFront origin. Production remained clean throughout.

Proper Fix: Escaping and Staged Testing

The correct approach to fixing the race condition across multiple template-based files:

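Escape template syntax in JavaScript literals: Write object literals with single braces so Jinja2 never sees a {{ delimiter, and wrap any JavaScript that genuinely needs literal double braces in a {% raw %} ... {% endraw %} block
```
// WRONG (Jinja2 parses this as a template expression): {{ isLoading: false }}
// RIGHT (plain object literal, single braces): { isLoading: false }
// RIGHT (when literal double braces must reach the browser): {% raw %}{{ ... }}{% endraw %}
```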
Validate template syntax before staging: Parse each file with Jinja2 before uploading
Test the booking flow: Automated browser tests should verify that jadaOpenBook() loads the calendar only after availability data arrives
Incremental deployment: Stage 2-3 pages, test thoroughly, then proceed to bulk deployment
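
The escaping behavior in the first step can be checked directly. The sketch below renders two small strings with Jinja2's default settings: a single-brace object literal and a raw block. The variable name state and the placeholder item.label are hypothetical and stand in for whatever the real pages contain.

```python
from jinja2 import Environment

env = Environment()

# Single braces are ordinary text to Jinja2, so a plain object literal passes through.
plain = env.from_string("var state = { isLoading: false };").render()

# {% raw %} tells Jinja2 to emit the enclosed text verbatim, double braces included.
raw = env.from_string("{% raw %}{{ item.label }}{% endraw %}").render()

print(plain)  # var state = { isLoading: false };
print(raw)    # {{ item.label }}
```

In both cases nothing is interpolated, which is the property the fix depends on.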

Infrastructure and Deployment Architecture

S3 Structure:

s3://sailjada.com/ – Production website files (origin for CloudFront distribution)
s3://queenofsandiego.com/_staging/sailjada/ – Staging environment for review
CloudFront distribution: Configured to serve from the sailjada.com bucket, with index.html as the default root object

Deployment Process (Corrected):

Make changes locally with template validation
Stage to _staging/sailjada/ path
Run integration tests (booking calendar loads, availability fetches correctly)
Get approval from domain owner (Carole/CB)
Sync to production S3 bucket
Optional: Invalidate CloudFront cache if needed

Key Decisions for Future AI-Assisted Changes

No bulk template refactoring without validation: Any change touching more than 3 files in a Jinja2 template directory must include automated syntax checks
Template-aware tooling: Use Jinja2 linters, not just generic HTML validators
Human review for staging: An engineer must diff staged files against production before approval
Functional testing in staging: The booking system must be tested end-to-end, not just syntax-checked
Rollback procedure documented: We documented the production restore steps so that recovery takes under five minutes
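
The human diff review could be supported by a small script that lists exactly which staged pages differ from production. This is only a sketch using the standard library: it assumes both buckets have already been synced to local directories, and the function name diff_trees and the prod/ and staging/ labels are hypothetical.

```python
import difflib
from pathlib import Path


def diff_trees(prod_dir: str, staging_dir: str) -> dict:
    """Map each changed or added .html file (relative path) to its unified diff."""
    prod, staging = Path(prod_dir), Path(staging_dir)
    diffs = {}
    for staged in sorted(staging.rglob("*.html")):
        rel = staged.relative_to(staging)
        original = prod / rel
        # A page missing from production is diffed against an empty file.
        old = original.read_text(encoding="utf-8").splitlines() if original.exists() else []
        new = staged.read_text(encoding="utf-8").splitlines()
        delta = list(
            difflib.unified_diff(
                old, new, f"prod/{rel.as_posix()}", f"staging/{rel.as_posix()}", lineterm=""
            )
        )
        if delta:
            diffs[rel.as_posix()] = "\n".join(delta)
    return diffs
```

A reviewer would expect the returned keys to match the list of pages the change was supposed to touch; anything extra is a signal to stop before approval.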

What's Next

The original race condition fix for jadaOpenBook() is still valid and needed. We'll apply it properly:

Create a new feature branch with template-safe escaping
Apply to index.html first, test in staging for 48 hours
Then carefully apply to remaining 22 pages with per-file validation
Add a pre-deployment step to GitHub Actions that validates Jinja2 syntax on all HTML files

This incident reinforces that infrastructure changes, even when AI-assisted, require the same rigor as production code. Template systems add a layer of complexity that generic code tooling misses.

```