Building a Multi-Property Analytics Audit Pipeline with OAuth2 Service Accounts and Orchestrator Automation
Over the past development session, we built and executed a comprehensive analytics audit system that simultaneously crawls multiple properties for Google Analytics instrumentation, pulls 30 days of GA4 traffic data via the Data API, audits email campaign status, and generates actionable recommendations through an orchestrator agent. This post details the technical architecture, infrastructure decisions, and patterns that made this possible.
What We Built
The system consists of three core components working in concert:
- GA Code Audit Scanner: Crawls HTML files across all properties to detect GA4 measurement IDs and identify gaps
- GA4 Data API Client: Authenticates via OAuth2 service account and pulls traffic metrics programmatically
- Orchestrator Agent: Synthesizes audit findings, API data, and campaign status into a structured report card on the progress dashboard
Authentication: OAuth2 Service Account Pattern
The critical first step was enabling programmatic access to Google Analytics without user interaction. We implemented OAuth2 service account authentication, which differs from user-based OAuth flows:
# Service account flow (no user login required)
1. Load service account JSON credentials from disk
2. Create JWT token signed with private key
3. Exchange JWT for access token (via Google token endpoint)
4. Use access token to call GA4 Data API
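Steps 2 and 3 can be sketched with the standard library alone, up to the point where the private key is needed. This is a minimal illustration, not the contents of reauth_ga.py: the function names are hypothetical, and the RS256 signature itself is left out because it requires an RSA library. In practice the google-auth package's `service_account.Credentials.from_service_account_file` performs steps 1 through 3 for you.

```python
import base64
import json
import time

TOKEN_URI = "https://oauth2.googleapis.com/token"
SCOPE = "https://www.googleapis.com/auth/analytics.readonly"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_claims(client_email: str, now: int, lifetime: int = 3600) -> dict:
    """Step 2a: the claim set that will be signed with the private key."""
    return {
        "iss": client_email,  # service account email from the JSON file
        "scope": SCOPE,
        "aud": TOKEN_URI,
        "iat": now,
        "exp": now + lifetime,
    }


def signing_input(claims: dict) -> str:
    """Step 2b: 'header.payload', the string that gets RS256-signed."""
    header = {"alg": "RS256", "typ": "JWT"}
    return ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )


def token_request_form(signed_jwt: str) -> dict:
    """Step 3: form fields POSTed to the token endpoint."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": signed_jwt,
    }


if __name__ == "__main__":
    # The email below is a placeholder, not a real service account.
    claims = build_claims("audit@example-project.iam.gserviceaccount.com", int(time.time()))
    print(signing_input(claims))
```

The signed JWT is the signing input plus a third dot-separated segment holding the base64url-encoded signature.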
The reauth script at /Users/cb/Documents/repos/tools/reauth_ga.py handles this flow. Key differences from user OAuth:
- No browser redirect or user consent dialog: the service account is created in Google Cloud and granted access ahead of time (for GA4, by adding its email address as a user on each property)
- Token scopes are narrow: https://www.googleapis.com/auth/analytics.readonly grants read-only access to GA4 data only
- Private key lives in the service account JSON and signs the JWT locally
- Tokens are cached and reused until expiration (~1 hour)
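The caching behavior in the last bullet can be sketched as a small wrapper around whatever function performs the token exchange. The class name, the 60-second safety margin, and the injectable clock are all assumptions made for illustration; the source only states that tokens are reused for roughly an hour.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch  # returns (access_token, lifetime_in_seconds)
        self._skew = skew    # refresh this many seconds before expiry
        self._clock = clock  # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if it is stale."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Refreshing slightly early avoids sending a token that expires while a request is in flight.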
This pattern is critical for headless automation: the orchestrator can run unattended and pull fresh data on every execution without storing user credentials.
GA4 Property Discovery and Multi-Property Aggregation
Our infrastructure spans multiple Google Analytics properties across different sites:
- sailjada.com: Primary property
- burialsatsea.com: Secondary property
- salejada.com: Tertiary property
- dangerouscentaur.com: Newly claimed and verified
Each has a numeric GA4 property ID (e.g., properties/123456789). The audit pulls the last 30 days of traffic for each property simultaneously by calling the Data API's runReport method (run_report on google.analytics.data_v1beta.BetaAnalyticsDataClient in the Python client) with dimensions like pagePath and pageTitle and metrics like activeUsers and screenPageViews.
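As a rough sketch of the per-property call, the snippet below builds the REST form of a runReport request and fans it out over a thread pool. The helper names are hypothetical, the `run_report` callable stands in for whatever actually sends the authenticated request, and the exact date window ("30daysAgo" through "today") is an assumption about how "last 30 days" is expressed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

API_ROOT = "https://analyticsdata.googleapis.com/v1beta"


def report_url(property_id: str) -> str:
    """REST endpoint for runReport on one numeric GA4 property ID."""
    return f"{API_ROOT}/properties/{property_id}:runReport"


def report_body() -> dict:
    """Request body using the dimensions and metrics named above."""
    return {
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
        "dimensions": [{"name": "pagePath"}, {"name": "pageTitle"}],
        "metrics": [{"name": "activeUsers"}, {"name": "screenPageViews"}],
    }


def fetch_all(
    property_ids: List[str],
    run_report: Callable[[str, dict], dict],
) -> Dict[str, dict]:
    """Call run_report(url, body) for every property concurrently."""
    with ThreadPoolExecutor(max_workers=len(property_ids) or 1) as pool:
        futures = {
            pid: pool.submit(run_report, report_url(pid), report_body())
            for pid in property_ids
        }
        # .result() re-raises any exception from the worker thread.
        return {pid: fut.result() for pid, fut in futures.items()}
```

Passing the sender in as a callable keeps the fan-out logic testable without network access or credentials.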
Why this approach? Aggregating across properties into a single report gives us a holistic view of traffic patterns and identifies which properties are underperforming—critical for prioritization.
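One way to do that aggregation, assuming each report is the JSON shape the REST runReport method returns (metricHeaders plus rows of string-valued metricValues), is to total each metric per property and sort ascending so the weakest property comes first. The function names are illustrative only.

```python
from typing import Dict, List, Tuple


def metric_totals(report: dict) -> Dict[str, int]:
    """Sum every metric column across all rows of one report."""
    names = [header["name"] for header in report.get("metricHeaders", [])]
    totals = {name: 0 for name in names}
    # "rows" is absent from the response when a property has no traffic.
    for row in report.get("rows", []):
        for name, cell in zip(names, row["metricValues"]):
            totals[name] += int(cell["value"])  # the API returns strings
    return totals


def rank_properties(
    reports: Dict[str, dict],
    metric: str = "activeUsers",
) -> List[Tuple[str, int]]:
    """Return (site, total) pairs, lowest total first."""
    totals = {
        site: metric_totals(report).get(metric, 0)
        for site, report in reports.items()
    }
    return sorted(totals.items(), key=lambda pair: pair[1])
```

Note that summing activeUsers over pagePath rows counts a visitor once per page they viewed, so the totals are only a relative ranking signal; a separate report without page dimensions gives the true user count.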
GA Code Instrumentation Audit
The audit scanner walks through HTML files in the repos directory structure and extracts all <script> tags containing GA measurement IDs (format: G-XXXXXXXXXX). This reveals:
- Which pages are instrumented
- Which pages are missing GA4 code entirely
- Whether multiple measurement IDs are present on a single page (which can cause double-counting)
The gap analysis identifies pages that need instrumentation added—typically static HTML templates, email landing pages, or newly deployed features that didn't receive GA code in their initial build.
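A minimal version of such a scanner might look like the following. It is a sketch rather than the actual audit script: the regular expressions, the assumed ID length range (6 to 12 characters after the G- prefix), and the three status labels are all illustrative choices.

```python
import re
from pathlib import Path
from typing import Dict, Set

# Whole <script> elements, including the opening tag (for gtag.js src URLs).
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)
# Assumed shape of a GA4 measurement ID.
GA4_ID_RE = re.compile(r"\bG-[A-Z0-9]{6,12}\b")


def measurement_ids(html: str) -> Set[str]:
    """Collect distinct measurement IDs found inside <script> elements."""
    ids: Set[str] = set()
    for script in SCRIPT_RE.findall(html):
        ids.update(GA4_ID_RE.findall(script))
    return ids


def classify(html: str) -> str:
    """Label a page as missing, instrumented, or carrying several IDs."""
    count = len(measurement_ids(html))
    if count == 0:
        return "missing"
    return "instrumented" if count == 1 else "multiple-ids"


def audit_tree(root: Path) -> Dict[str, str]:
    """Map each .html file under root (relative path) to its status."""
    return {
        str(path.relative_to(root)): classify(
            path.read_text(encoding="utf-8", errors="replace")
        )
        for path in sorted(root.rglob("*.html"))
    }
```

Restricting the search to script elements avoids false positives from IDs that merely appear in body text, at the cost of missing tags injected by a tag manager at runtime.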
Infrastructure: Dashboard Card Generation and Deep Linking
Results are surfaced as a kanban card on the progress dashboard at https://progress.queenofsandiego.com. The dashboard HTML supports hash-based deep linking:
https://progress.queenofsandiego.com/#card-{id}
The orchestrator generates a card with ID t-31aa2593 (hash-based identifier), which can be directly linked as https://progress.queenofsandiego.com/#card-t-31aa2593. The dashboard JavaScript intercepts hash changes and renders the card in focus—no page reload required.
This pattern avoids email/notification link rot: even if the card's position in the kanban board changes, the deep link remains valid because it references the card ID, not its DOM position.
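On the orchestrator side, producing and checking these links is a small string operation. The helpers below are hypothetical names sketched for illustration; only the base URL and the #card-{id} format come from the dashboard described above.

```python
from typing import Optional
from urllib.parse import urlsplit

DASHBOARD_URL = "https://progress.queenofsandiego.com/"
CARD_PREFIX = "card-"


def card_link(card_id: str) -> str:
    """Build the deep link for a card from its ID."""
    return f"{DASHBOARD_URL}#{CARD_PREFIX}{card_id}"


def card_id_from_link(url: str) -> Optional[str]:
    """Recover the card ID from a deep link, or None if there is none."""
    fragment = urlsplit(url).fragment
    if fragment.startswith(CARD_PREFIX):
        return fragment[len(CARD_PREFIX):] or None
    return None


if __name__ == "__main__":
    print(card_link("t-31aa2593"))
```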
Orchestrator as Multi-Step Coordinator
The orchestrator agent is a single entry point that spawns multiple async tasks:
- Task 1: GA code audit across all repos (output: instrumentation map by page)
- Task 2: GA4 Data API call for last 30 days (output: traffic metrics JSON)
- Task 3: Constant Contact campaign query (output: scheduled email status)
- Task 4: Synthesis into recommendations (output: kanban card)
Tasks 1-3 run in parallel; Task 4 depends on their completion. The orchestrator waits for all subtasks, aggregates their outputs, and publishes a single structured report. This avoids multiple notifications and ensures the dashboard card has complete data.
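The dependency structure maps naturally onto asyncio.gather. The sketch below assumes each task is exposed as a coroutine function; the parameter names and the shape of the dictionary passed to the synthesis step are placeholders, not the orchestrator's real interface.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def orchestrate(
    audit_ga_code: Callable[[], Awaitable[Any]],
    pull_ga4_traffic: Callable[[], Awaitable[Any]],
    query_campaigns: Callable[[], Awaitable[Any]],
    synthesize: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Run Tasks 1-3 concurrently, then hand their outputs to Task 4."""
    # gather() preserves argument order and raises if any task fails,
    # so synthesis never runs on partial data.
    instrumentation, traffic, campaigns = await asyncio.gather(
        audit_ga_code(),
        pull_ga4_traffic(),
        query_campaigns(),
    )
    return synthesize(
        {
            "instrumentation": instrumentation,
            "traffic": traffic,
            "campaigns": campaigns,
        }
    )
```

Because the three inputs are awaited together, the wall-clock cost is roughly that of the slowest task rather than the sum of all three.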
Email Campaign Integration
The audit discovered two urgent email campaigns:
- Mother's Day Blast: Scheduled for April 29, requires approval (event 4 days away)
- Paul Simon Blast: Proof due May 12 (6 days out)
These were surfaced as needs-you cards on the board, triggered automatically when the orchestrator detected unapproved campaigns within 7 days of send date. The blast scripts live in /Users/cb/Documents/repos/tools/ and use Constant Contact CSV exports stored in S3 for deduplication and campaign logging.
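The trigger rule (unapproved and within 7 days of the send date) reduces to a date comparison. The `Campaign` record and its field names below are assumptions for illustration; the real blast scripts may represent campaigns differently.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Campaign:
    name: str
    send_date: date
    approved: bool


def needs_you(
    campaigns: List[Campaign],
    today: date,
    window_days: int = 7,
) -> List[Campaign]:
    """Return unapproved campaigns whose send date falls in the window."""
    cutoff = today + timedelta(days=window_days)
    return [
        campaign
        for campaign in campaigns
        if not campaign.approved and today <= campaign.send_date <= cutoff
    ]
```

Each campaign returned here would become one needs-you card; campaigns already past their send date are excluded by this sketch, which is a design choice rather than something the source specifies.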
Search Console and GBP Verification Additions
As part of the audit, we:
- Verified dangerouscentaur.com in Google Search Console by uploading an HTML verification file to its S3 origin bucket (CloudFront is in front, so the file must live in S3)
- Claimed the Google Business Profile for dangerouscentaur.com and submitted the sitemap
- Created a needs-you card to test GBP Account Management API with the analytics service account token once scopes are granted
Key Decisions and Trade-offs
Service Account vs. User OAuth: Service accounts avoid interactive re-authentication and user consent dialogs, making them ideal for headless automation. The tradeoff is that the service account key file is a long-lived secret that must be stored securely (we use Google Cloud credential files, never committed to git).
Async Parallel Execution: Running audit, API, and email queries in parallel reduces total wall-clock time from ~45 seconds (sequential) to ~15 seconds. The orchestrator blocks only on final synthesis, keeping