Injecting Structured Data at Scale: Automating JSON-LD Schema Deployment Across Concert Event Subdomains
Over the past development session, we tackled a critical SEO infrastructure gap: 12 event pages across multiple concert subdomains were missing structured data entirely. This post walks through the automation pipeline we built to inject Event and LocalBusiness JSON-LD schema, deploy to S3, and invalidate CloudFront caches across distributed infrastructure.
The Problem: Invisible Events in Search Results
Event pages are prime candidates for rich snippets in Google Search results. Without structured data markup, search engines must infer event details (date, time, location, ticketing) from unstructured HTML. Our Rady Shell concert event subdomains—including paulsimonradyshell.com and others—were live but invisible to schema-aware indexing.
The audit revealed zero structured data across all active concert pages. With 157 reviews and strong local authority, we were losing real estate in Google's knowledge panel and event carousel.
Technical Solution: Automated Schema Injection Pipeline
Step 1: Structured Data Script
We created /Users/cb/Documents/repos/tools/inject_structured_data.py to programmatically inject Event and LocalBusiness JSON-LD blocks into HTML templates.
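# Simplified structure (helper bodies are illustrative, not the production code)
import json

def build_event_schema(event_metadata):
    """Serialize event metadata as a JSON-LD <script> block."""
    payload = {"@context": "https://schema.org", "@type": "Event", **event_metadata}
    return f'<script type="application/ld+json">\n{json.dumps(payload, indent=2)}\n</script>\n'

def inject_into_head(html_content, schema_block):
    """Insert schema_block immediately before the first </head> tag."""
    head, closing, rest = html_content.partition("</head>")
    return head + schema_block + closing + rest if closing else html_content

def inject_event_schema(html_content, event_metadata):
    """Inject <script type="application/ld+json"> blocks into <head>.
    Event: schema.org/Event (date, location, organizer, ticketing)
    Venue: schema.org/LocalBusiness (address, phone, reviews)"""
    return inject_into_head(html_content, build_event_schema(event_metadata))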
The script:
- Parses each HTML file's metadata (title, date, location, ticket URL)
- Generates valid JSON-LD conforming to schema.org/Event specification
- Injects into <head> before the closing </head> tag
- Validates output against Google's Structured Data Testing Tool (via command-line validation)
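Before submitting pages to an external validator, a local pre-check can catch malformed JSON or missing fields. The sketch below is an assumption about what such a check might look like, not the validation the script actually performs. It extracts JSON-LD blocks with the standard-library HTML parser and checks for name, startDate, and location, which Google's Event documentation lists as required. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser

# Properties Google's Event documentation lists as required.
REQUIRED_EVENT_FIELDS = ("name", "startDate", "location")


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_event_schema(html):
    """Return a list of problems found in the page's JSON-LD (empty if none)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.blocks:
        problems.append("no JSON-LD block found")
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON: {exc.msg}")
            continue
        if isinstance(data, dict) and data.get("@type") == "Event":
            for field in REQUIRED_EVENT_FIELDS:
                if not data.get(field):
                    problems.append(f"Event missing {field}")
    return problems
```

An empty list means the page passed this local check only; it does not replace a run through Google's own tools.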
Schema example injected:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "[Event Name]",
  "description": "[Event Description]",
  "startDate": "2024-XX-XXTXX:XX:XX",
  "endDate": "2024-XX-XXTXX:XX:XX",
  "location": {
    "@type": "Place",
    "name": "The Rady Shell at Jacobs Park",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "San Diego",
      "addressRegion": "CA",
      "addressCountry": "US"
    }
  },
  "organizer": {
    "@type": "Organization",
    "name": "Queen of San Diego"
  },
  "offers": {
    "@type": "Offer",
    "url": "[ticket URL]",
    "priceCurrency": "USD",
    "price": "[price]",
    "availability": "https://schema.org/InStock"
  }
}
</script>
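As a minimal sketch of how a block like the one above might be assembled from page metadata, the hypothetical helper below maps a few fields onto the Event shape and formats dates as ISO 8601 with `datetime.isoformat()`. The function name, its parameters, and any sample values passed to it are assumptions for illustration, not the production script.

```python
import json
from datetime import datetime


def build_event_payload(name, description, start, end, ticket_url, price):
    """Map event metadata onto the schema.org/Event shape shown above.

    Hypothetical helper: the parameter names are illustrative only.
    `start` and `end` are datetime objects; `price` is a number in USD.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "description": description,
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "location": {
            "@type": "Place",
            "name": "The Rady Shell at Jacobs Park",
            "address": {
                "@type": "PostalAddress",
                "addressLocality": "San Diego",
                "addressRegion": "CA",
                "addressCountry": "US",
            },
        },
        "organizer": {"@type": "Organization", "name": "Queen of San Diego"},
        "offers": {
            "@type": "Offer",
            "url": ticket_url,
            "priceCurrency": "USD",
            "price": f"{price:.2f}",
            "availability": "https://schema.org/InStock",
        },
    }


def to_script_block(payload):
    """Wrap a payload in the <script> element that carries JSON-LD."""
    body = json.dumps(payload, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Building the payload as a dictionary and serializing it with `json.dumps` avoids hand-written JSON, so quoting and escaping mistakes in event names or descriptions cannot produce an invalid block.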
Step 2: Template-Based Generation in Event Renderers
We also updated /Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/tools/render_event_sites.py and /Users/cb/Documents/repos/sites/quickdumpnow.com/tools/generate_service_area_pages.py to bake structured data directly into template rendering, ensuring future event pages include schema from creation.
Key decision: Rather than relying on post-hoc injection, we embedded schema generation into the build pipeline. This means:
- Every new event page gets schema by default (no manual step)
- Metadata changes in source files automatically propagate to schema
- Single source of truth for event details
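To illustrate the single-source-of-truth idea, here is a minimal sketch in which one metadata dictionary feeds both the visible HTML and the JSON-LD block. It uses `string.Template` from the standard library as a stand-in; the real renderers may use a different templating approach, and the template, function name, and dictionary keys are all assumptions.

```python
import html
import json
from string import Template

# Hypothetical page template; the real renderer's template is not shown in this post.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head>
<title>$title</title>
<script type="application/ld+json">
$schema_json
</script>
</head>
<body>
<h1>$title</h1>
<p>$start_date</p>
</body>
</html>
""")


def render_event_page(event):
    """Render one event page, deriving the schema from the same metadata."""
    schema = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": event["name"],
        "startDate": event["start_date"],
    }
    # Escape "</" so a stray "</script>" in metadata cannot end the block early.
    schema_json = json.dumps(schema, indent=2).replace("</", "<\\/")
    return PAGE_TEMPLATE.substitute(
        title=html.escape(event["name"]),
        start_date=html.escape(event["start_date"]),
        schema_json=schema_json,
    )
```

Because the heading and the schema read from the same `event` dictionary, a change to the event name or date in the source data shows up in both places on the next build.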
Infrastructure: S3 and CloudFront Deployment
S3 Bucket Distribution
Event subdomains are distributed across separate S3 buckets tied to Route53 CNAME records. We identified and updated buckets for:
- paulsimonradyshell.com: S3 bucket paulsimonradyshell.com-site
- Additional event subdomains: similar naming convention
Updated pages were synced using AWS CLI batch operations:
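# Filters apply in order: exclude everything, re-include HTML, then drop index.html
aws s3 sync ./updated_event_pages s3://paulsimonradyshell.com-site \
  --exclude "*" \
  --include "*.html" \
  --exclude "index.html" \
  --metadata-directive REPLACE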
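The AWS CLI applies --include and --exclude filters in the order given, with later filters taking precedence, and every file starts out included. A small model of that rule makes it easy to check which files a filter list would select before running a sync. The function below is an illustrative model written for this post, not part of the AWS CLI, and it only approximates the CLI's pattern matching with `fnmatchcase`.

```python
from fnmatch import fnmatchcase


def is_selected(relative_path, filters):
    """Apply (kind, pattern) filters in order; the last matching filter wins.

    Every file starts out included, mirroring the AWS CLI default.
    `kind` is either "include" or "exclude".
    """
    selected = True
    for kind, pattern in filters:
        if fnmatchcase(relative_path, pattern):
            selected = (kind == "include")
    return selected


# Exclude everything, re-include HTML, then drop the index page.
HTML_ONLY = [("exclude", "*"), ("include", "*.html"), ("exclude", "index.html")]

for path in ("event.html", "index.html", "style.css"):
    print(path, is_selected(path, HTML_ONLY))
```

With this model, a list that starts with an include selects every file, because nothing was excluded beforehand. Restricting a sync to HTML requires an initial exclude of "*".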
CloudFront Invalidation at Scale
After the S3 sync, we invalidated each CloudFront distribution's cache so that updated pages would propagate within about 60 seconds rather than waiting for TTL expiry. Each subdomain has its own CloudFront distribution ID.
Why separate invalidations? Each distribution corresponds to a distinct subdomain and origin. Batch invalidation by subdomain allows targeted cache purges without affecting unrelated properties.
aws cloudfront create-invalidation \
  --distribution-id [DISTRIBUTION_ID] \
  --paths "/*" \
  --query 'Invalidation.Id' \
  --output text
We tracked distribution IDs in a configuration map to automate the invalidation loop across all 12 updated pages and their parent domains.
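The invalidation loop might look something like the sketch below, which builds the same `aws cloudfront create-invalidation` command for each entry in a configuration map. The map contents, distribution IDs, and function names are placeholders invented for illustration. The loop defaults to a dry run that only returns the commands it would execute.

```python
import shlex
import subprocess

# Hypothetical map of site -> CloudFront distribution ID (placeholders, not real IDs).
DISTRIBUTIONS = {
    "paulsimonradyshell.com": "EXAMPLEDISTID1",
}


def build_invalidation_command(distribution_id, paths=("/*",)):
    """Return the AWS CLI argument list for one invalidation request."""
    return [
        "aws", "cloudfront", "create-invalidation",
        "--distribution-id", distribution_id,
        "--paths", *paths,
        "--query", "Invalidation.Id",
        "--output", "text",
    ]


def invalidate_all(distributions, dry_run=True):
    """Invalidate every distribution; return site -> invalidation ID.

    In dry-run mode nothing is executed and the value is the shell-quoted
    command that would have been run.
    """
    results = {}
    for site, distribution_id in distributions.items():
        command = build_invalidation_command(distribution_id)
        if dry_run:
            results[site] = shlex.join(command)
            continue
        completed = subprocess.run(
            command, check=True, capture_output=True, text=True
        )
        results[site] = completed.stdout.strip()
    return results
```

Passing the command as a list rather than a shell string means the "/*" path is never expanded by the shell, and `check=True` stops the loop on the first failed invalidation instead of continuing silently.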
Key Architectural Decisions
1. Inject into <head>, Not Body
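Some teams inject schema into the footer or just before the closing </body> tag. Google reads JSON-LD from either <head> or <body>, so this is a convention rather than a requirement. We chose <head> because:
- The schema appears early in the document, so a parser reaches it before any body content
- Cleaner separation of metadata (head) from content (body)
- We expected earlier discovery on initial crawl, although we did not measure this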
2. LocalBusiness + Event (Dual Schema)
Rather than Event schema alone, we included LocalBusiness schema for the venue itself. This allows Google to:
- Surface reviews and ratings independently
- Map the venue in knowledge panels
- Link past events to the venue entity
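One way to express the dual schema is a single JSON-LD document with an `@graph` holding both entities, where the event's location points at the venue by `@id`. This is a sketch under assumptions: the `@id` value, the function name, and the rating parameters are hypothetical, and it does not verify how any particular search engine consumes the linkage.

```python
import json

# Hypothetical identifier for the venue entity; any stable URL would do.
VENUE_ID = "https://example.com/#venue"


def build_dual_schema(event_name, start_date, rating_value, review_count):
    """Return one JSON-LD document describing the venue and an event at it."""
    venue = {
        "@type": "LocalBusiness",
        "@id": VENUE_ID,
        "name": "The Rady Shell at Jacobs Park",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "San Diego",
            "addressRegion": "CA",
            "addressCountry": "US",
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
    }
    event = {
        "@type": "Event",
        "name": event_name,
        "startDate": start_date,
        # Reference the venue node instead of repeating its details.
        "location": {"@id": VENUE_ID},
    }
    return {"@context": "https://schema.org", "@graph": [venue, event]}


def to_json(schema):
    """Serialize the schema for embedding in a JSON-LD script block."""
    return json.dumps(schema, indent=2)
```

Keeping the venue as its own node means every event page can reference the same entity, which is what lets past and future events accumulate under one venue.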
3. Immediate Cache Invalidation vs. Lazy TTL
CloudFront's default TTL is 24 hours. We invalidated immediately after each S3 update because:
- Crawlers only see the copy the CDN serves, so a stale cache delays when new structured data can be indexed
- Google crawls active event pages frequently (hours before event)
- Stale schema could cause outdated event info in rich snippets
Validation and Verification
Post-deployment, we validated using:
- Google Rich Results Test: