Building an Integrated GA4 Audit Pipeline with Orchestrator-Driven Analytics and Email Campaign Management
Over the past development session, we constructed a comprehensive analytics audit system that combines Google Analytics 4 data collection, multi-site tracking verification, and orchestrator-driven reporting—all feeding into a kanban-style dashboard for actionable intelligence. This post walks through the technical architecture, decision rationale, and operational patterns we implemented.
The Problem We Solved
Our multi-property analytics setup had fragmented visibility. We needed:
- Confirmation that every page across all platforms had GA tracking code
- Last 30 days of traffic data aggregated from all properties
- Automated recommendations for traffic growth and operational improvements
- Clear visibility into scheduled email campaigns and their approval status
- A single source of truth dashboard that surfaces urgent blockers
Manual spot-checks weren't scaling. We needed programmatic verification with orchestrator intelligence driving insights.
Technical Architecture: Three Integrated Systems
System 1: GA Code Audit Across All Sites
We built a recursive HTML scanner that checks every static file across our repo structure. The scanner:
- Traverses all site directories: /Users/cb/Documents/repos/sites/sailjada, /Users/cb/Documents/repos/sites/burialsatsea, /Users/cb/Documents/repos/sites/salejada, and /Users/cb/Documents/repos/sites/dangerouscentaur
- Parses HTML files for GA tracking patterns: gtag('config', ...) calls, GA_MEASUREMENT_ID values, and Google Analytics script tags
- Cross-references against known GA4 property IDs (obtained from GA Admin API)
- Logs gaps by site and page path
Why this approach: HTML parsing catches misconfigurations that API calls miss. A page might load but have an incorrect property ID or malformed tracking code. Parsing the actual source shows exactly what each static page ships.
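The scanner steps above can be sketched roughly as follows. This is a minimal illustration, not our actual scanner: the function names, the regex, and the gap record shape are all assumptions, and it only looks for gtag('config', ...) calls in static HTML.

```python
import re
from pathlib import Path

# Assumed pattern: a GA4 measurement ID (G-XXXXXXX) passed to gtag('config', ...).
GTAG_CONFIG_RE = re.compile(r"""gtag\(\s*['"]config['"]\s*,\s*['"](G-[A-Z0-9]+)['"]""")


def find_measurement_ids(html: str) -> set[str]:
    """Return every measurement ID configured via gtag('config', ...) in the HTML."""
    return set(GTAG_CONFIG_RE.findall(html))


def audit_site(site_root: Path, expected_ids: set[str]) -> list[dict]:
    """Walk site_root recursively and list pages with a missing or unexpected tag."""
    gaps = []
    for page in sorted(site_root.rglob("*.html")):
        found = find_measurement_ids(page.read_text(encoding="utf-8", errors="replace"))
        rel_path = page.relative_to(site_root).as_posix()
        if not found:
            gaps.append({"page": rel_path, "issue": "no_tag"})
        elif not found & expected_ids:
            # A tag is present, but none of its IDs belong to a known property.
            gaps.append({"page": rel_path, "issue": "unknown_id", "found": sorted(found)})
    return gaps
```

Calling audit_site once per site directory, with the set of IDs expected for that site, yields the per-site, per-page gap log described above.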
System 2: GA4 Data API Integration with Service Account Auth
We established programmatic access to GA4 using service account credentials stored securely. The pattern:
#!/usr/bin/env python3
# /Users/cb/Documents/repos/tools/reauth_ga.py
from google.oauth2 import service_account
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import RunReportRequest

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
PROPERTY_ID = "123456789"  # placeholder: numeric GA4 property ID

# Build read-only credentials from the service account key file
credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account-key.json",
    scopes=SCOPES,
)
client = BetaAnalyticsDataClient(credentials=credentials)

# Query last 30 days for property ID (numeric format)
request = RunReportRequest(
    property=f"properties/{PROPERTY_ID}",
    date_ranges=[{"start_date": "30daysAgo", "end_date": "today"}],
    dimensions=[{"name": "pagePath"}],
    metrics=[{"name": "activeUsers"}, {"name": "screenPageViews"}],
)
response = client.run_report(request)
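To hand the report to downstream steps, the response can be flattened into plain dictionaries. The sketch below assumes only the response shape the Data API returns (dimension_headers, metric_headers, and rows whose values are strings); the helper names rows_to_records and top_pages are ours for illustration.

```python
def rows_to_records(response) -> list[dict]:
    """Flatten a RunReportResponse-like object into dicts keyed by header name."""
    dim_names = [header.name for header in response.dimension_headers]
    metric_names = [header.name for header in response.metric_headers]
    records = []
    for row in response.rows:
        record = {name: v.value for name, v in zip(dim_names, row.dimension_values)}
        # Metric values arrive as strings; activeUsers and screenPageViews are counts.
        record.update(
            {name: int(v.value) for name, v in zip(metric_names, row.metric_values)}
        )
        records.append(record)
    return records


def top_pages(records: list[dict], metric: str = "screenPageViews", limit: int = 10) -> list[dict]:
    """Return the highest-traffic pages by the chosen metric."""
    return sorted(records, key=lambda r: r[metric], reverse=True)[:limit]
```

With the request above, rows_to_records(response) would produce one record per pagePath, which is a convenient shape for aggregating across properties.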
Key decision: We use service account authentication rather than user OAuth tokens. Service accounts are ideal for background jobs because:
- No interactive consent or refresh-token handling in long-running processes; the client library mints short-lived access tokens from the key automatically
- Credentials stored in secure location and rotated via infrastructure tooling
- Access scoped to analytics.readonly (least privilege)
- Audit trail tied to service account identity, not individual user
We discovered that the GA4 Data API addresses properties by numeric ID (e.g., properties/123456789), not by display name. This required a preflight script to map display names to numeric IDs.
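A preflight mapping of this kind might look like the sketch below. It is not our actual script: build_property_index and fetch_summaries are hypothetical names, and the fetch step assumes the google-analytics-admin package and its account summaries listing, imported lazily so the mapping logic runs without it.

```python
def build_property_index(summaries) -> dict[str, str]:
    """Map display name -> 'properties/<numeric id>'.

    summaries is an iterable of (resource_name, display_name) pairs.
    """
    index: dict[str, str] = {}
    for resource_name, display_name in summaries:
        prefix, _, numeric_id = resource_name.partition("/")
        if prefix != "properties" or not numeric_id.isdigit():
            raise ValueError(f"not a numeric property resource: {resource_name!r}")
        if display_name in index:
            raise ValueError(f"duplicate display name: {display_name!r}")
        index[display_name] = resource_name
    return index


def fetch_summaries(credentials):
    """Yield (resource_name, display_name) pairs from the GA Admin API (assumed setup)."""
    # Lazy import: only needed when actually calling the Admin API.
    from google.analytics.admin import AnalyticsAdminServiceClient

    client = AnalyticsAdminServiceClient(credentials=credentials)
    for account in client.list_account_summaries():
        for prop in account.property_summaries:
            yield prop.property, prop.display_name
```

Failing loudly on duplicate display names is a deliberate choice here: two properties with the same label would otherwise silently shadow each other in the index.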
System 3: Orchestrator-Driven Report Generation
The orchestrator agent receives a structured brief containing:
- GA traffic data (30-day snapshot per property)
- Page-level tracking gaps identified by the HTML audit
- Email campaign status from Constant Contact exports
- Current dashboard card state
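One way to picture the brief is as a small typed container serialized to JSON before it is handed to the orchestrator. The class and field names below are illustrative assumptions, not the orchestrator's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AuditBrief:
    """Hypothetical container for the four inputs listed above."""

    traffic_by_property: dict[str, list[dict]]  # 30-day snapshot per property
    tracking_gaps: list[dict]                   # output of the HTML audit
    campaign_status: list[dict]                 # rows from Constant Contact exports
    dashboard_cards: list[dict] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize deterministically so identical inputs give identical briefs."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Sorting keys keeps the serialized brief stable between runs, which makes diffs between audit runs easier to read.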
The orchestrator then synthesizes recommendations and outputs a kanban card. Each card has a stable ID (e.g., t-31aa2593) that is addressable through the URL hash, enabling deep links:
https://progress.queenofsandiego.com/#card-t-31aa2593
This deep link pattern matters operationally: team members can Slack the link and context arrives instantly, no searching through a card list.
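Building and parsing these links is simple enough to sketch. The ID pattern below (t- followed by eight hex characters) is inferred from the single example above and may not cover every card; the function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

DASHBOARD_URL = "https://progress.queenofsandiego.com/"
# Assumption: card IDs look like the one example we have, t-31aa2593.
CARD_ID_RE = re.compile(r"t-[0-9a-f]{8}")


def card_deep_link(card_id: str) -> str:
    """Return the dashboard URL whose hash targets the given card."""
    if not CARD_ID_RE.fullmatch(card_id):
        raise ValueError(f"unexpected card id format: {card_id!r}")
    return f"{DASHBOARD_URL}#card-{card_id}"


def card_id_from_link(url: str) -> Optional[str]:
    """Extract the card ID from a deep link, or None if the hash is not a card."""
    fragment = urlsplit(url).fragment
    if not fragment.startswith("card-"):
        return None
    return fragment[len("card-"):]
```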
Infrastructure and Tooling Decisions
Dashboard Deep Linking
The progress dashboard at progress.queenofsandiego.com supports hash-based routing. Card IDs are prefixed with card- and anchored via JavaScript event listeners. This prevents full-page reloads and keeps state in the URL bar.
CSV Export Pipeline for Campaign Management
Email campaigns are tracked through Constant Contact CSV exports. The blast script (/Users/cb/Documents/repos/tools/blast.py) contains dedup logic that:
- Reads contact lists from CSV files stored in a known location
- Cross-references against a campaign log (stored in S3) to prevent duplicate sends
- The S3 bucket structure: s3://jada-orchestrator-state/campaign-logs/
- Campaign log format: date-prefixed JSON with contact email addresses marked as sent
Why S3 for campaign state: Campaign logs must persist across ephemeral tool executions. S3 provides durable state without database overhead. Logs are immutable (append-only), enabling audit trails.
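The dedup step can be illustrated with a small sketch. It is not the code in blast.py: the log schema (a JSON object with a "sent" list) and the "email" column name are assumptions for illustration, and reading the log from S3 is left out so the logic stands alone.

```python
import csv
import io
import json


def load_sent(log_json: str) -> set[str]:
    """Parse a campaign log (assumed shape: {"sent": [emails]}) into a normalized set."""
    return {email.strip().lower() for email in json.loads(log_json).get("sent", [])}


def pending_recipients(
    contacts_csv: str, already_sent: set[str], email_column: str = "email"
) -> list[str]:
    """Return contacts not yet sent to, in CSV order, without repeats."""
    seen = set(already_sent)
    pending = []
    for row in csv.DictReader(io.StringIO(contacts_csv)):
        email = (row.get(email_column) or "").strip().lower()
        if email and email not in seen:
            seen.add(email)  # also guards against duplicates inside the CSV itself
            pending.append(email)
    return pending
```

Normalizing case and whitespace before comparing matters in practice, since exports and logs rarely agree on capitalization.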
CloudFront + S3 for Static Asset Serving
During the session, we verified that dangerouscentaur.com is served through CloudFront. The distribution uses an S3 origin bucket. When we needed to verify domain ownership in Google Search Console, we uploaded an HTML verification file directly to that bucket—the file immediately became available via the CloudFront edge network.
This pattern is powerful for dynamic domain verification workflows: no SSH, no wait for DNS propagation, just an S3 PUT, plus a CloudFront cache invalidation if the path was already cached.
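The upload-then-invalidate flow can be sketched as a function that takes the two AWS clients as arguments. The function name and parameters are hypothetical; the calls themselves (put_object and create_invalidation) are the standard boto3 S3 and CloudFront client methods.

```python
import time


def publish_verification_file(
    s3, cloudfront, bucket: str, distribution_id: str, filename: str, body: bytes
) -> str:
    """Upload a verification file to the origin bucket and invalidate its path.

    s3 and cloudfront are expected to be boto3 clients, e.g.
    boto3.client("s3") and boto3.client("cloudfront").
    """
    path = f"/{filename}"
    s3.put_object(Bucket=bucket, Key=filename, Body=body, ContentType="text/html")
    cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": [path]},
            # CallerReference must be unique per invalidation request.
            "CallerReference": f"verify-{filename}-{int(time.time())}",
        },
    )
    return path
```

Passing the clients in rather than creating them inside the function keeps credentials and region handling outside the helper and makes it easy to exercise with stand-in objects.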
Operational Outcomes and Urgent Items
The audit surfaced three immediate blockers:
- Mother's Day Email Blast: Scheduled for April 29, still unapproved as of the audit run. Event was 4 days out. A needs-you card was created linking to the approval workflow.
- Paul Simon Blast Proof: Proof needed by May 12. The blast script was ready; we prepared the proof send command and queued it for approval.
- GA Data API Access Gap: The service account lacked permissions to list GA4 properties. Fix: grant the service account's email address access in the GA Admin Console under Property Access Management (a 3-minute manual step).
Key Technical Decisions Explained
1. HTML Parsing Over API-Only Verification
We could have relied on GA4's tag validation API, but