Injecting Structured Data at Scale: Automating JSON-LD Deployment Across 12 Concert Event Pages

The Problem

Concert event pages across our event subdomain network (paulsimonradyshell.com, etc.) were missing critical structured data markup. Without proper JSON-LD for Event and LocalBusiness schemas, search engines couldn't understand:

Event dates, times, and ticketing information
Venue details and geographic relevance
Rich snippet eligibility for search results
Knowledge graph integration opportunities

This visibility gap meant potential attendees couldn't find events through Google's rich results, and we were leaving conversion opportunities on the table. The challenge: manually editing 12 pages spread across multiple subdomains and S3 buckets, each served through its own CloudFront distribution that has to be invalidated after every change.

Solution Architecture: Automated Injection Pipeline

We built a three-stage deployment pipeline: local injection → S3 sync → CloudFront invalidation.

Stage 1: Structured Data Injection Script

We created /Users/cb/Documents/repos/tools/inject_structured_data.py, a Python utility that:

Scans all concert event pages for missing structured data
Parses existing page metadata (title, description, dates from HTML)
Generates valid JSON-LD Event and LocalBusiness schemas
Injects the markup into the <head> section, just before the closing </head> tag
Skips pages that already carry Event markup, so nothing is duplicated

The script followed this logic:

1. Iterate through all event page files
2. Check for existing structured data (Event schema)
3. If missing:
   - Extract event date/time from filename or content
   - Extract venue location from page title
   - Build JSON-LD object with proper schema.org properties
   - Insert before </head> tag
4. Log changes for verification
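
The check-then-insert steps above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of inject_structured_data.py: the function names (has_event_schema, inject_json_ld) and the regex-based detection are assumptions made for this example, and it only recognizes a plain "@type": "Event" string.

```python
import json
import re

# Matches JSON-LD <script> blocks. Simplification: assumes the type
# attribute is written literally as application/ld+json.
LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)
HEAD_CLOSE_RE = re.compile(r"</head\s*>", re.IGNORECASE)


def has_event_schema(html: str) -> bool:
    """Return True if any JSON-LD block on the page declares @type Event."""
    for raw in LD_JSON_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: treat it as absent
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Event":
                return True
    return False


def inject_json_ld(html: str, schema: dict) -> tuple[str, bool]:
    """Insert the schema before </head>. Returns (html, changed)."""
    if has_event_schema(html):
        return html, False  # already marked up, avoid duplication
    match = HEAD_CLOSE_RE.search(html)
    if match is None:
        return html, False  # no </head> to anchor on
    # Escape "</" so a string value can never close the <script> early.
    payload = json.dumps(schema, indent=2).replace("</", "<\\/")
    block = f'<script type="application/ld+json">\n{payload}\n</script>\n'
    return html[: match.start()] + block + html[match.start():], True
```

Returning a changed flag alongside the HTML makes step 4 straightforward: the caller can log exactly which files were rewritten and which were skipped.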

Key design decision: We chose JSON-LD over Microdata or RDFa because:

Google recommends JSON-LD as its preferred structured data format
No DOM pollution — markup lives in <head> separate from content
Easier to version control and audit in source files
The markup is part of the static HTML file, so CloudFront caches and serves it like any other page content

Stage 2: Multi-Bucket S3 Deployment

Event subdomains are distributed across separate S3 buckets for isolation and performance:

paulsimonradyshell.com bucket: 4 event pages
sailjada.queenofsandiego.com bucket: 3 event pages
Additional event subdomain buckets: 5 pages combined

We synced updated pages using AWS CLI with selective targeting:

aws s3 sync ./updated-pages s3://paulsimonradyshell-events/ \
  --exclude "*" \
  --include "*.html" \
  --delete

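The --delete flag removes objects from the bucket that no longer exist in the staging directory (objects excluded by the filters are left alone), so stale versions don't persist. It also means the staging directory must contain every page the bucket should keep. CloudFront may keep serving cached copies until they expire or are invalidated, but origin consistency still matters for direct S3 access and future audits.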
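
Repeating the sync for every bucket is easy to script. The sketch below only builds the command lines and prints them. It is a hedged illustration: the helper name build_sync_command and the per-bucket staging layout are assumptions, and only the paulsimonradyshell-events bucket name comes from the command above. The AWS CLI evaluates --exclude and --include filters in the order given, with later filters taking precedence, so the sketch excludes everything first and then re-includes *.html.

```python
import shlex


def build_sync_command(
    staging_dir: str,
    bucket: str,
    dry_run: bool = True,
    delete: bool = False,
) -> list[str]:
    """Build an `aws s3 sync` argument list that uploads only HTML files."""
    cmd = [
        "aws", "s3", "sync", staging_dir, f"s3://{bucket}/",
        "--exclude", "*",       # start from nothing...
        "--include", "*.html",  # ...then re-include HTML only
    ]
    if delete:
        cmd.append("--delete")  # removes bucket HTML missing from staging
    if dry_run:
        cmd.append("--dryrun")  # print planned operations, change nothing
    return cmd


if __name__ == "__main__":
    # Hypothetical staging layout: one subdirectory per target bucket.
    targets = {
        "/tmp/updated-pages/paulsimonradyshell-events": "paulsimonradyshell-events",
    }
    for staging_dir, bucket in targets.items():
        print(shlex.join(build_sync_command(staging_dir, bucket)))
```

Defaulting to --dryrun means the first pass shows what would be uploaded or deleted; passing the list to subprocess.run(cmd, check=True) with dry_run=False would perform the real sync.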

Stage 3: CloudFront Invalidation

After S3 deployment, we invalidated CloudFront distribution caches to force edge servers to fetch updated content immediately:

aws cloudfront create-invalidation \
  --distribution-id E1A2B3C4D5E6F7 \
  --paths "/*.html" "/events/*"

We tracked CloudFront distribution IDs for each subdomain:

paulsimonradyshell.com: Distribution ID [recorded in deployment log]
sailjada.queenofsandiego.com: Distribution ID [recorded in deployment log]
Other event subdomains: IDs retrieved via AWS CLI query
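
For the subdomains whose IDs had to be looked up, the JSON printed by aws cloudfront list-distributions can be turned into a domain-to-ID map. The sketch below assumes that output was saved to a file or loaded into a dict. The helper name distribution_ids_by_alias and the sample IDs are made up for illustration.

```python
def distribution_ids_by_alias(list_output: dict) -> dict[str, str]:
    """Map each alternate domain name (CNAME) to its distribution ID.

    `list_output` is the parsed JSON from
    `aws cloudfront list-distributions --output json`.
    """
    mapping: dict[str, str] = {}
    items = list_output.get("DistributionList", {}).get("Items") or []
    for dist in items:
        # Aliases.Items is absent when a distribution has no CNAMEs.
        for alias in dist.get("Aliases", {}).get("Items") or []:
            mapping[alias] = dist["Id"]
    return mapping


if __name__ == "__main__":
    # Placeholder data shaped like the CLI output; the IDs are not real.
    sample = {
        "DistributionList": {
            "Items": [
                {"Id": "EXAMPLEID00001",
                 "Aliases": {"Quantity": 1, "Items": ["paulsimonradyshell.com"]}},
                {"Id": "EXAMPLEID00002", "Aliases": {"Quantity": 0}},
            ]
        }
    }
    print(distribution_ids_by_alias(sample))
```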

Invalidation strategy: We used wildcard paths (/*.html) rather than individual file paths because:

Simpler to execute and audit
Prevents edge case misses where filename patterns change
CloudFront bills per invalidation path submitted, and a wildcard path counts as a single path no matter how many files it matches
An invalidation typically propagates to all edge locations within a few minutes
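
The same invalidation can be issued from Python when it needs to run once per distribution. The sketch below builds the InvalidationBatch structure that the CloudFront API expects. The helper name build_invalidation_batch and the CallerReference format are assumptions, and the boto3 call appears only as a comment so the example runs without AWS credentials.

```python
import time
from typing import Optional


def build_invalidation_batch(paths: list[str], reference: Optional[str] = None) -> dict:
    """Build the InvalidationBatch payload for a CloudFront invalidation."""
    unique = list(dict.fromkeys(paths))  # de-duplicate, keep original order
    for path in unique:
        if not path.startswith("/"):
            raise ValueError(f"invalidation path must start with '/': {path!r}")
    return {
        "Paths": {"Quantity": len(unique), "Items": unique},
        # CallerReference must be unique per request; a timestamp is one
        # simple (assumed) way to get that.
        "CallerReference": reference or f"jsonld-deploy-{int(time.time())}",
    }


if __name__ == "__main__":
    batch = build_invalidation_batch(["/*.html", "/events/*"])
    print(batch)
    # With boto3 installed and credentials configured, the request would be:
    #   import boto3
    #   boto3.client("cloudfront").create_invalidation(
    #       DistributionId="E1A2B3C4D5E6F7", InvalidationBatch=batch)
```

Validating the leading slash up front catches malformed paths locally instead of waiting for the API to reject the request.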

Technical Implementation Details

Schema Structure: Event + LocalBusiness Combo

Each injected block includes two complementary schemas. The Event schema looks like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Paul Simon at Rady Shell",
  "description": "Live performance...",
  "startDate": "2024-05-15T19:30:00-07:00",
  "endDate": "2024-05-15T22:00:00-07:00",
  "location": {
    "@type": "Place",
    "name": "Rady Shell at Jacobs Park",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "222 Marina Park Way",
      "addressLocality": "San Diego",
      "addressRegion": "CA",
      "postalCode": "92101"
    }
  },
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/tickets",
    "priceCurrency": "USD",
    "price": "49.00"
  }
}
</script>

We included ISO 8601 timestamps with timezone offsets because Google's Event rich results require machine-readable dates, and an explicit offset removes any ambiguity about the local start time. The nested Place object describing the venue enables local search integration and knowledge graph enrichment.
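
Generating the timestamps from timezone-aware datetime objects avoids hand-typing offsets. The sketch below is an assumption about how such a builder could look, not the script's actual code: build_event_schema and its parameters are hypothetical, and the fixed -07:00 offset only fits dates during Pacific Daylight Time. It uses location for the venue, which is the property name schema.org defines on Event.

```python
import json
from datetime import datetime, timedelta, timezone

# Fixed offset matching the May example; a real script would more likely
# use zoneinfo.ZoneInfo("America/Los_Angeles") to handle DST changes.
PDT = timezone(timedelta(hours=-7))


def build_event_schema(name: str, start: datetime, end: datetime,
                       venue_name: str, ticket_url: str, price: float) -> dict:
    """Build a schema.org Event dict with ISO 8601 start and end dates."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start.isoformat(),  # includes the UTC offset
        "endDate": end.isoformat(),
        "location": {"@type": "Place", "name": venue_name},
        "offers": {
            "@type": "Offer",
            "url": ticket_url,
            "priceCurrency": "USD",
            "price": f"{price:.2f}",
        },
    }


if __name__ == "__main__":
    schema = build_event_schema(
        "Paul Simon at Rady Shell",
        datetime(2024, 5, 15, 19, 30, tzinfo=PDT),
        datetime(2024, 5, 15, 22, 0, tzinfo=PDT),
        "Rady Shell at Jacobs Park",
        "https://example.com/tickets",
        49,
    )
    print(json.dumps(schema, indent=2))
```

Rejecting naive datetimes is deliberate: a timestamp without an offset would still serialize, but it would leave the event's local start time ambiguous.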

File Organization

Updated pages were organized by source bucket before deployment:

/repos/sites/queenofsandiego.com/rady-shell-events/ — Primary event page templates
/repos/sites/sailjada.queenofsandiego.com/ — Subdomain event pages
/tmp/updated-pages/ — Staging directory before S3 sync

Key Decisions & Trade-Offs

Why inject at the page level rather than via rendering templates? Our event pages are built via multiple rendering pipelines (render_event_sites.py, static HTML). Centralizing schema injection at the source would require coordinating changes across multiple codebases. Page-level injection provides immediate impact without touching render logic, reducing regression risk.

Why JSON-LD in <head> rather than inline in content? Google reads JSON-LD from either <head> or <body>, so placement does not affect eligibility. We chose <head> because the closing </head> tag gives the script a single, predictable insertion point on every page. It also keeps the markup auditable separately from content, which is critical for large-scale deployments where humans verify changes.

Why CloudFront invalidation over TTL adjustment? Event pages have long TTLs (24–48