```html

Injecting Structured Data at Scale: Automating JSON-LD Deployment Across 12 Concert Event Pages

The Problem

Concert event pages across our event subdomain network (paulsimonradyshell.com, etc.) were missing critical structured data markup. Without JSON-LD for the Event and LocalBusiness schemas, the pages lacked:

  • Event dates, times, and ticketing information
  • Venue details and geographic relevance
  • Rich snippet eligibility for search results
  • Knowledge graph integration opportunities

This visibility gap meant potential attendees couldn't find events through Google's rich results, and we were leaving conversion opportunities on the table. The challenge: editing 12 pages by hand across multiple subdomains and S3 buckets, each behind a CloudFront distribution that has to be invalidated after every change.

Solution Architecture: Automated Injection Pipeline

We built a three-stage deployment pipeline: local injection → S3 sync → CloudFront invalidation.

Stage 1: Structured Data Injection Script

We created /Users/cb/Documents/repos/tools/inject_structured_data.py, a Python utility that:

  • Scans all concert event pages for missing structured data
  • Parses existing page metadata (title, description, dates from HTML)
  • Generates valid JSON-LD Event and LocalBusiness schemas
  • Injects the markup into the <head> section, immediately before the closing </head> tag
  • Skips pages that already carry Event markup, so nothing is duplicated

The script followed this logic:

1. Iterate through all event page files
2. Check for existing structured data (Event schema)
3. If missing:
   - Extract event date/time from filename or content
   - Extract venue location from page title
   - Build JSON-LD object with proper schema.org properties
   - Insert before </head> tag
4. Log changes for verification
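
The check-then-insert steps above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of inject_structured_data.py: the function names has_event_schema and inject_event_schema are hypothetical, the detection is a regex rather than a full HTML parser, and the event dictionary is assumed to be built elsewhere.

```python
import json
import re

# Matches a JSON-LD script block; used only to detect existing Event markup.
JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def has_event_schema(html: str) -> bool:
    """Return True if any JSON-LD block on the page declares @type Event."""
    for raw in JSON_LD_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: it cannot count as valid Event markup
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Event":
                return True
    return False


def inject_event_schema(html: str, event: dict) -> str:
    """Insert an Event JSON-LD block before </head>, unless one already exists."""
    if has_event_schema(html):
        return html  # idempotent: running the script twice changes nothing
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        raise ValueError("page has no closing </head> tag")
    # Escape "</" so a value containing "</script>" cannot end the block early.
    payload = json.dumps(event, indent=2).replace("</", "<\\/")
    block = f'<script type="application/ld+json">\n{payload}\n</script>\n'
    return html[: match.start()] + block + html[match.start():]
```

Because the function returns the page unchanged when Event markup is already present, it is safe to rerun over the whole page set after a partial failure.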

Key design decision: We chose JSON-LD over Microdata or RDFa because:

  • Google recommends JSON-LD for structured data wherever the site setup allows it
  • No DOM pollution: markup lives in <head>, separate from content
  • Easier to version control and audit in source files
  • Ships as part of the static HTML, so CloudFront caches it like the rest of the page

Stage 2: Multi-Bucket S3 Deployment

Event subdomains are distributed across separate S3 buckets for isolation and performance:

  • paulsimonradyshell.com bucket: 4 event pages
  • sailjada.queenofsandiego.com bucket: 3 event pages
  • Additional event subdomain buckets: 5 pages combined

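We synced updated pages using AWS CLI, restricting the sync to HTML files. The CLI applies filters in order and includes everything by default, so the command excludes everything first and then re-includes *.html:

aws s3 sync ./updated-pages s3://paulsimonradyshell-events/ \
  --exclude "*" \
  --include "*.html" \
  --delete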

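The --delete flag removes objects from the bucket that no longer exist in the local source (objects excluded by a filter are left alone), so stale pages don't persist at the origin. Users served from CloudFront's cache won't see the difference until the cache is refreshed, but origin consistency matters for direct S3 access and future audits.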
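
Since the same sync runs once per bucket, it can help to generate the commands rather than type them. The sketch below is an illustration only: build_sync_command is a hypothetical helper, and it defaults to the CLI's --dryrun flag so the first pass only reports what would change. It puts --exclude "*" ahead of --include "*.html" because the AWS CLI applies filters in order and includes everything by default.

```python
import shlex


def build_sync_command(source_dir: str, bucket: str, dry_run: bool = True) -> list[str]:
    """Build an `aws s3 sync` argument list that uploads only HTML files."""
    cmd = [
        "aws", "s3", "sync", source_dir, f"s3://{bucket}/",
        "--exclude", "*",        # start from an empty selection
        "--include", "*.html",   # then re-include only the event pages
        "--delete",              # remove matching objects that no longer exist locally
    ]
    if dry_run:
        cmd.append("--dryrun")   # report actions without performing them
    return cmd


if __name__ == "__main__":
    # Bucket name taken from the command above; the local path is the same one it uses.
    command = build_sync_command("./updated-pages", "paulsimonradyshell-events")
    print(shlex.join(command))
```

Passing the list to subprocess.run without a shell avoids any quoting or glob expansion of the filter patterns.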

Stage 3: CloudFront Invalidation

After S3 deployment, we invalidated CloudFront distribution caches to force edge servers to fetch updated content immediately:

aws cloudfront create-invalidation \
  --distribution-id E1A2B3C4D5E6F7 \
  --paths "/*.html" "/events/*"

We tracked CloudFront distribution IDs for each subdomain:

  • paulsimonradyshell.com: Distribution ID [recorded in deployment log]
  • sailjada.queenofsandiego.com: Distribution ID [recorded in deployment log]
  • Other event subdomains: IDs retrieved via AWS CLI query
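
For the subdomains whose IDs come from a CLI query, one option is to parse the JSON printed by aws cloudfront list-distributions --output json and match on the distribution's alternate domain names. The helper below is a sketch under that assumption; find_distribution_id is a hypothetical name, and it expects the already-parsed JSON document as input.

```python
from typing import Optional


def find_distribution_id(list_output: dict, domain: str) -> Optional[str]:
    """Return the ID of the distribution that lists `domain` as an alias, if any."""
    items = list_output.get("DistributionList", {}).get("Items") or []
    for distribution in items:
        # Aliases.Items holds the alternate domain names (CNAMEs) of a distribution.
        aliases = distribution.get("Aliases", {}).get("Items") or []
        if domain in aliases:
            return distribution["Id"]
    return None
```

A None result is worth treating as a hard stop in a deployment script, since invalidating the wrong distribution silently leaves the stale page in place.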

Invalidation strategy: We used wildcard paths (/*.html) rather than individual file paths because:

  • Simpler to execute and audit
  • Prevents edge case misses where filename patterns change
  • A wildcard path is billed as a single invalidation path, no matter how many files it matches
  • Typically propagates across all edge locations within minutes
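
With one distribution per subdomain, the invalidation step is a loop over a small mapping. The following sketch builds one create-invalidation command per distribution; build_invalidation_commands is a hypothetical helper, and the only distribution ID shown is the placeholder from the command above.

```python
def build_invalidation_commands(
    distributions: dict[str, str], paths: list[str]
) -> dict[str, list[str]]:
    """Map each subdomain to an `aws cloudfront create-invalidation` argument list."""
    if not paths:
        raise ValueError("at least one invalidation path is required")
    commands = {}
    for subdomain, distribution_id in distributions.items():
        commands[subdomain] = [
            "aws", "cloudfront", "create-invalidation",
            "--distribution-id", distribution_id,
            "--paths", *paths,  # passed as separate arguments, never shell-expanded
        ]
    return commands


if __name__ == "__main__":
    # Placeholder ID from the example command; real IDs live in the deployment log.
    planned = build_invalidation_commands(
        {"paulsimonradyshell.com": "E1A2B3C4D5E6F7"},
        ["/*.html", "/events/*"],
    )
    for subdomain, command in planned.items():
        print(subdomain, command)
```

Running each argument list through subprocess.run without a shell keeps patterns such as /*.html from being expanded against the local filesystem.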

Technical Implementation Details

Schema Structure: Event + LocalBusiness Combo

Each injected block pairs an Event schema with LocalBusiness markup. The Event portion, with the venue nested as a Place, looks like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Paul Simon at Rady Shell",
  "description": "Live performance...",
  "startDate": "2024-05-15T19:30:00-07:00",
  "endDate": "2024-05-15T22:00:00-07:00",
  "location": {
    "@type": "Place",
    "name": "Rady Shell at Jacobs Park",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "222 Marina Park Way",
      "addressLocality": "San Diego",
      "addressRegion": "CA",
      "postalCode": "92101"
    }
  },
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/tickets",
    "priceCurrency": "USD",
    "price": "49.00"
  }
}
</script>

We included ISO 8601 timestamps with timezone offsets because Google's Event rich results require machine-readable dates, and an offset removes any ambiguity about the local start time. The nested Place object describing the venue enables local search integration and knowledge graph enrichment.
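
Producing the offset-qualified timestamps is straightforward with the standard library. The sketch below is an illustration rather than the script's actual code: event_times is a hypothetical helper, and it assumes a fixed UTC-7 offset (Pacific Daylight Time) instead of looking up the zone's rules for each date.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: a fixed UTC-7 offset (Pacific Daylight Time).
PDT = timezone(timedelta(hours=-7))


def event_times(start: datetime, duration: timedelta) -> dict:
    """Return startDate/endDate as ISO 8601 strings that carry a UTC offset."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware so the offset is emitted")
    end = start + duration
    return {"startDate": start.isoformat(), "endDate": end.isoformat()}


if __name__ == "__main__":
    times = event_times(
        datetime(2024, 5, 15, 19, 30, tzinfo=PDT),
        timedelta(hours=2, minutes=30),
    )
    print(times["startDate"])  # 2024-05-15T19:30:00-07:00
    print(times["endDate"])    # 2024-05-15T22:00:00-07:00
```

Rejecting naive datetimes up front is the useful part: a timestamp without an offset still serializes cleanly, so the mistake would otherwise go unnoticed until the rich result shows the wrong time.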

File Organization

Updated pages were organized by source bucket before deployment:

  • /repos/sites/queenofsandiego.com/rady-shell-events/ — Primary event page templates
  • /repos/sites/sailjada.queenofsandiego.com/ — Subdomain event pages
  • /tmp/updated-pages/ — Staging directory before S3 sync

Key Decisions & Trade-Offs

Why inject at the page level rather than via rendering templates? Our event pages are built via multiple rendering pipelines (render_event_sites.py, static HTML). Centralizing schema injection at the source would require coordinating changes across multiple codebases. Page-level injection provides immediate impact without touching render logic, reducing regression risk.

Why JSON-LD in <head> rather than inline in content? Google reads JSON-LD from either <head> or <body>, so this is a maintainability choice rather than a performance one. A single injection point before </head> is easy for a script to target, and it keeps the markup separate from content so humans can audit changes, which is critical for large-scale deployments.

Why CloudFront invalidation over TTL adjustment? Event pages have long TTLs (24–48