Injecting Structured Data at Scale: Automating JSON-LD Deployment Across 12 Concert Event Pages
The Problem
Concert event pages across our event subdomain network (paulsimonradyshell.com, etc.) were missing critical structured data markup. Without proper JSON-LD for Event and LocalBusiness schemas, search engines couldn't understand:
- Event dates, times, and ticketing information
- Venue details and geographic relevance
- Rich snippet eligibility for search results
- Knowledge graph integration opportunities
This visibility gap meant potential attendees couldn't find events through Google's rich results, and we were leaving conversion opportunities on the table. The challenge: hand-editing 12+ pages spread across multiple subdomains and S3 buckets, each fronted by a CloudFront distribution whose cache has to be invalidated after every change.
Solution Architecture: Automated Injection Pipeline
We built a three-stage deployment pipeline: local injection → S3 sync → CloudFront invalidation.
Stage 1: Structured Data Injection Script
We created /Users/cb/Documents/repos/tools/inject_structured_data.py, a Python utility that:
- Scans all concert event pages for missing structured data
- Parses existing page metadata (title, description, dates from HTML)
- Generates valid JSON-LD Event and LocalBusiness schemas
- Injects markup into the <head> section before the closing tag
- Preserves existing markup to avoid duplication
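The metadata-parsing step can be sketched with the standard library alone. This is a minimal illustration, not the actual contents of inject_structured_data.py: the class name PageMetadataParser and the function extract_metadata are hypothetical, and the sketch assumes each page carries a <title> element and a <meta name="description"> tag.

```python
from html.parser import HTMLParser


class PageMetadataParser(HTMLParser):
    """Collects the <title> text and the meta description from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            # Attribute values can be None (e.g. <meta name>), hence the "or".
            if (attr_map.get("name") or "").lower() == "description":
                self.description = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_metadata(html_text):
    """Return the page title and description as a small dict."""
    parser = PageMetadataParser()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description.strip(),
    }
```

Calling extract_metadata on a page's HTML returns a dict with "title" and "description" keys; either value is an empty string when the page lacks that element.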
The script followed this logic:
1. Iterate through all event page files
2. Check for existing structured data (Event schema)
3. If missing:
   - Extract event date/time from filename or content
   - Extract venue location from page title
   - Build JSON-LD object with proper schema.org properties
   - Insert before </head> tag
4. Log changes for verification
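The check-then-insert logic above might look roughly like the following. It is a sketch under assumptions: the function names has_event_schema and inject_json_ld are hypothetical, the regular expressions assume reasonably well-formed pages, and the real script may detect existing markup differently.

```python
import json
import re

# Matches any <script type="application/ld+json"> block and captures its body.
LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)
HEAD_CLOSE_RE = re.compile(r"</head\s*>", re.IGNORECASE)


def has_event_schema(html_text):
    """Return True if any JSON-LD block on the page declares an Event."""
    for raw in LD_JSON_RE.findall(html_text):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than crash
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Event":
                return True
    return False


def inject_json_ld(html_text, schema):
    """Insert one JSON-LD block before </head>. Returns (html, changed)."""
    if has_event_schema(html_text):
        return html_text, False  # already present: avoid duplication
    match = HEAD_CLOSE_RE.search(html_text)
    if match is None:
        return html_text, False  # no </head> found: leave the page untouched
    # Escape "</" so a string value cannot terminate the script element early.
    payload = json.dumps(schema, indent=2).replace("</", "<\\/")
    block = '<script type="application/ld+json">\n' + payload + "\n</script>\n"
    new_html = html_text[: match.start()] + block + html_text[match.start():]
    return new_html, True
```

Returning a changed flag alongside the HTML makes the final logging step straightforward: the caller can record which files were modified and which were skipped. Running the function twice on the same page leaves it unchanged the second time.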
Key design decision: We chose JSON-LD over Microdata or RDFa because:
- Google recommends JSON-LD as its preferred structured data format, and other major search engines support it
- No DOM pollution: markup lives in <head>, separate from content
- Easier to version control and audit in source files
- Works seamlessly with CloudFront caching (static asset)
Stage 2: Multi-Bucket S3 Deployment
Event subdomains are distributed across separate S3 buckets for isolation and performance:
- paulsimonradyshell.com bucket: 4 event pages
- sailjada.queenofsandiego.com bucket: 3 event pages
- Additional event subdomain buckets: 5 pages combined
We synced updated pages using AWS CLI with selective targeting:
# Filters apply in order: exclude everything, then re-include only HTML files.
aws s3 sync ./updated-pages s3://paulsimonradyshell-events/ \
  --exclude "*" \
  --include "*.html" \
  --delete
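The --delete flag removes objects from the bucket that no longer exist in the local source directory, so stale versions don't persist. Objects excluded by the filters are left alone. CloudFront's cache hides stale origin objects from most visitors, but origin consistency still matters for direct S3 access and future audits.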
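To repeat the sync across several buckets without retyping the command, the argument list can be built in Python and handed to subprocess. The sketch below only constructs the command. The function name build_sync_command is hypothetical, and it assumes the goal is to upload HTML files only, which is why it excludes everything before re-including *.html (aws s3 sync applies filters in the order given).

```python
def build_sync_command(source_dir, bucket, delete=True, dry_run=False):
    """Return the argv list for an HTML-only `aws s3 sync` to one bucket."""
    cmd = [
        "aws", "s3", "sync", source_dir, f"s3://{bucket}/",
        # Order matters: later filters take precedence over earlier ones.
        "--exclude", "*",
        "--include", "*.html",
    ]
    if delete:
        cmd.append("--delete")  # remove bucket objects missing from source_dir
    if dry_run:
        cmd.append("--dryrun")  # print planned operations without running them
    return cmd


# Example (not executed here):
#   import subprocess
#   cmd = build_sync_command("./updated-pages", "paulsimonradyshell-events",
#                            dry_run=True)
#   subprocess.run(cmd, check=True)
```

Running once with dry_run=True before the real sync is a cheap way to confirm that --delete will only touch the objects you expect.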
Stage 3: CloudFront Invalidation
After S3 deployment, we invalidated CloudFront distribution caches to force edge servers to fetch updated content immediately:
aws cloudfront create-invalidation \
--distribution-id E1A2B3C4D5E6F7 \
--paths "/*.html" "/events/*"
We tracked CloudFront distribution IDs for each subdomain:
- paulsimonradyshell.com: Distribution ID [recorded in deployment log]
- sailjada.queenofsandiego.com: Distribution ID [recorded in deployment log]
- Other event subdomains: IDs retrieved via AWS CLI query
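One way to look up the remaining IDs is to map each alternate domain name to its distribution from a list-distributions response. The helper below is an illustrative sketch, with distribution_ids_by_alias as a hypothetical name. It assumes the response shape returned by boto3's CloudFront list_distributions call and, for brevity, ignores pagination of truncated results.

```python
def distribution_ids_by_alias(list_response):
    """Map each alternate domain name (CNAME) to its distribution ID."""
    mapping = {}
    # "Items" is omitted from the response when there is nothing to list.
    items = list_response.get("DistributionList", {}).get("Items", [])
    for dist in items:
        for alias in dist.get("Aliases", {}).get("Items", []):
            mapping[alias] = dist["Id"]
    return mapping


# Example (requires boto3 and AWS credentials, so not executed here):
#   import boto3
#   response = boto3.client("cloudfront").list_distributions()
#   ids = distribution_ids_by_alias(response)
#   ids.get("paulsimonradyshell.com")
```

Keeping the lookup as a pure function over the response makes it easy to test against a saved response without touching AWS.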
Invalidation strategy: We used wildcard paths (/*.html) rather than individual file paths because:
- Simpler to execute and audit
- Avoids missed files when filename patterns change
- A wildcard path counts as a single path for invalidation billing, no matter how many files it matches
- Propagates across all edge locations within minutes
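The same invalidation can be issued from Python. The sketch below builds the InvalidationBatch structure that boto3's create_invalidation expects and takes the client as a parameter so the batch logic stays testable. The function names are hypothetical, the "deploy-" caller reference format is an assumption, and the leading-slash check is a conservative guard rather than a complete validation of CloudFront's path rules.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch dict for a list of invalidation paths."""
    unique_paths = list(dict.fromkeys(paths))  # drop duplicates, keep order
    for path in unique_paths:
        if not path.startswith("/"):
            raise ValueError(f"invalidation path must start with '/': {path!r}")
    return {
        "Paths": {"Quantity": len(unique_paths), "Items": unique_paths},
        # CallerReference must be unique per request; a timestamp is one option.
        "CallerReference": caller_reference or f"deploy-{int(time.time())}",
    }


def invalidate(client, distribution_id, paths):
    """Submit one invalidation; `client` is a boto3 CloudFront client."""
    batch = build_invalidation_batch(paths)
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=batch,
    )


# Example (requires boto3 and AWS credentials, so not executed here):
#   import boto3
#   invalidate(boto3.client("cloudfront"), "E1A2B3C4D5E6F7",
#              ["/*.html", "/events/*"])
```

Because each wildcard entry counts as one path, the Quantity field here also gives a quick read on how many billable paths a deployment submits.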
Technical Implementation Details
Schema Structure: Event + LocalBusiness Combo
Each injected block combines two complementary schemas, Event and LocalBusiness. The Event schema looks like this:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Paul Simon at Rady Shell",
  "description": "Live performance...",
  "startDate": "2024-05-15T19:30:00-07:00",
  "endDate": "2024-05-15T22:00:00-07:00",
  "location": {
    "@type": "Place",
    "name": "Rady Shell at Jacobs Park",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "2901 Park Boulevard",
      "addressLocality": "San Diego",
      "addressRegion": "CA",
      "postalCode": "92103"
    }
  },
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/tickets",
    "priceCurrency": "USD",
    "price": "49.00"
  }
}
</script>
We used ISO 8601 timestamps with explicit UTC offsets because Google's Event rich results require machine-readable dates. The nested Place object gives search engines the venue details they need for local search integration and knowledge graph enrichment.
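Generating the schema in code makes it possible to reject bad timestamps before they reach a page. The following is a minimal sketch, not the script's actual generator: build_event_schema and its parameters are hypothetical. It places the venue under location, which is the property name schema.org defines on Event for a nested Place.

```python
from datetime import datetime


def build_event_schema(name, start, end, venue_name, address, ticket_url,
                       price, currency="USD"):
    """Build a schema.org Event dict; start and end are ISO 8601 strings."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    # Naive timestamps are ambiguous, so insist on an explicit UTC offset.
    if start_dt.tzinfo is None or end_dt.tzinfo is None:
        raise ValueError("startDate and endDate must carry a UTC offset")
    if end_dt <= start_dt:
        raise ValueError("endDate must be later than startDate")
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_dt.isoformat(),
        "endDate": end_dt.isoformat(),
        "location": {
            "@type": "Place",
            "name": venue_name,
            # `address` holds plain PostalAddress fields such as streetAddress.
            "address": {"@type": "PostalAddress", **address},
        },
        "offers": {
            "@type": "Offer",
            "url": ticket_url,
            "priceCurrency": currency,
            "price": price,
        },
    }
```

The resulting dict can be serialized with json.dumps and wrapped in a script tag of type application/ld+json. Validating the output with Google's Rich Results Test is still worthwhile, since this sketch checks only the date fields.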
File Organization
Updated pages were organized by source bucket before deployment:
- /repos/sites/queenofsandiego.com/rady-shell-events/: Primary event page templates
- /repos/sites/sailjada.queenofsandiego.com/: Subdomain event pages
- /tmp/updated-pages/: Staging directory before S3 sync
Key Decisions & Trade-Offs
Why inject at the page level rather than via rendering templates? Our event pages are built via multiple rendering pipelines (render_event_sites.py, static HTML). Centralizing schema injection at the source would require coordinating changes across multiple codebases. Page-level injection provides immediate impact without touching render logic, reducing regression risk.
Why JSON-LD in <head> rather than inline in content? Google reads JSON-LD from either <head> or <body>, so placement is mainly a maintainability choice. Keeping the block in <head> puts the structured data in one predictable place, auditable separately from content, which is critical for large-scale deployments where humans verify changes.
Why CloudFront invalidation over TTL adjustment? Event pages have long TTLs (24–48