```html

Building a Multi-Property Analytics Audit Pipeline with OAuth2 Service Accounts and Orchestrator Automation

In a recent development session, we built and ran an analytics audit system that does four things in one pass: it crawls multiple properties for Google Analytics instrumentation, pulls 30 days of GA4 traffic data via the Data API, audits email campaign status, and generates actionable recommendations through an orchestrator agent. This post covers the architecture, the infrastructure decisions, and the patterns behind it.

What We Built

The system consists of three core components working in concert:

  • GA Code Audit Scanner: Crawls HTML files across all properties to detect GA4 measurement IDs and identify gaps
  • GA4 Data API Client: Authenticates via OAuth2 service account and pulls traffic metrics programmatically
  • Orchestrator Agent: Synthesizes audit findings, API data, and campaign status into a structured report card on the progress dashboard

Authentication: OAuth2 Service Account Pattern

The critical first step was enabling programmatic access to Google Analytics without user interaction. We implemented OAuth2 service account authentication, which differs from user-based OAuth flows:


# Service account flow (no user login required)
1. Load service account JSON credentials from disk
2. Create a JWT and sign it with the service account's private key
3. Exchange JWT for access token (via Google token endpoint)
4. Use access token to call GA4 Data API
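
To make steps 2 and 3 concrete, here is a minimal sketch of the JWT claim set and the token request body, using only the standard library. It stops short of the RS256 signature, which needs an RSA library; in practice a package such as google-auth performs all of these steps. The function names are illustrative and are not taken from reauth_ga.py.

```python
from __future__ import annotations

import base64
import json
import time
from typing import Optional

TOKEN_URI = "https://oauth2.googleapis.com/token"
SCOPE = "https://www.googleapis.com/auth/analytics.readonly"
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:jwt-bearer"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _segment(obj: dict) -> str:
    """Serialize one JWT segment (header or claims) compactly."""
    return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def build_signing_input(client_email: str, now: Optional[int] = None) -> str:
    """Return '<header>.<claims>', the string that step 2 signs with RS256."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": client_email,      # service account email from the JSON file
        "scope": SCOPE,
        "aud": TOKEN_URI,
        "iat": issued_at,
        "exp": issued_at + 3600,  # assertions may live at most one hour
    }
    return f"{_segment(header)}.{_segment(claims)}"


def token_request_body(signed_jwt: str) -> dict:
    """Form fields POSTed to TOKEN_URI in step 3."""
    return {"grant_type": GRANT_TYPE, "assertion": signed_jwt}
```

The signed JWT is the signing input plus a third segment: the base64url-encoded RS256 signature over that input.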

The reauth script at /Users/cb/Documents/repos/tools/reauth_ga.py handles this flow. Key differences from user OAuth:

  • No browser redirect or user consent dialog: access is granted ahead of time by adding the service account's email address as a user (for example, with the Viewer role) on each GA4 property
  • Token scopes are narrow: https://www.googleapis.com/auth/analytics.readonly grants read-only access to GA4 data only
  • Private key lives in the service account JSON and signs the JWT locally
  • Tokens are cached and reused until expiration (~1 hour)
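
The caching behavior in the last bullet can be sketched as a small wrapper that only re-runs the token exchange when the cached token is close to expiry. This is an illustration, not the code in reauth_ga.py: the TokenCache class, the injected fetch callable, and the 60-second safety margin are all assumptions.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # returns (token, lifetime_seconds)
        clock: Callable[[], float] = time.time,
        skew: float = 60.0,                      # refresh this many seconds early
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._skew = skew
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if missing or near expiry."""
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = self._clock() + lifetime
        return self._token
```

Injecting the clock and the fetch function keeps the cache testable without any network calls.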

This pattern is what makes headless automation workable: the orchestrator can run unattended and pull fresh data on every execution without storing user credentials.

GA4 Property Discovery and Multi-Property Aggregation

Our infrastructure spans multiple Google Analytics properties across different sites:

  • sailjada.com — Primary property
  • burialsatsea.com — Secondary property
  • salejada.com — Tertiary property
  • dangerouscentaur.com — Newly claimed and verified

Each has a numeric GA4 property ID (referenced as, for example, properties/123456789). The audit pulls the last 30 days of traffic for all properties in one run by calling the runReport method (run_report in the Python client) on google.analytics.data_v1beta.BetaAnalyticsDataClient, once per property, with dimensions like pagePath and pageTitle and metrics like activeUsers and screenPageViews.
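
For reference, the request has roughly the following shape. This sketch deliberately uses the REST form of runReport with only the standard library rather than the Python client library, so it is a stand-in for the real call, not a copy of it. The helper names are hypothetical, and the date range assumes "last 30 days" means the 30 full days ending yesterday.

```python
from __future__ import annotations

import json
import urllib.request

API_ROOT = "https://analyticsdata.googleapis.com/v1beta"


def build_report_request() -> dict:
    """JSON body for runReport: 30 full days, per-page users and views."""
    return {
        # Date ranges are inclusive, so 30daysAgo..yesterday covers 30 days.
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "yesterday"}],
        "dimensions": [{"name": "pagePath"}, {"name": "pageTitle"}],
        "metrics": [{"name": "activeUsers"}, {"name": "screenPageViews"}],
    }


def report_url(property_id: str) -> str:
    """REST endpoint for one property; property_id is the numeric ID only."""
    return f"{API_ROOT}/properties/{property_id}:runReport"


def run_report(property_id: str, access_token: str) -> dict:
    """POST runReport for one property and return the decoded JSON response."""
    request = urllib.request.Request(
        report_url(property_id),
        data=json.dumps(build_report_request()).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling run_report once per property ID, with the same access token, yields one response per site.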

Why this approach? Aggregating all properties into a single report lets us compare traffic side by side and see which properties are underperforming, which in turn drives prioritization.
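
One way to do that comparison, sketched under the assumption that each property's runReport response has been decoded into a dict: total the page views per property and sort ascending so the weakest property comes first. The function names are hypothetical. The sketch sums screenPageViews rather than activeUsers, because a user who visits several pages appears in several rows and would be counted more than once.

```python
from __future__ import annotations


def total_page_views(report: dict) -> int:
    """Sum screenPageViews over all rows of one runReport response."""
    rows = report.get("rows", [])
    if not rows:  # the API omits "rows" when there is no data
        return 0
    names = [header["name"] for header in report["metricHeaders"]]
    index = names.index("screenPageViews")
    # Metric values arrive as strings in the JSON response.
    return sum(int(row["metricValues"][index]["value"]) for row in rows)


def rank_properties(reports: dict[str, dict]) -> list[tuple[str, int]]:
    """Return (site, total views) pairs, lowest traffic first."""
    totals = {site: total_page_views(report) for site, report in reports.items()}
    return sorted(totals.items(), key=lambda pair: pair[1])
```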

GA Code Instrumentation Audit

The audit scanner walks through HTML files in the repos directory structure and extracts all <script> tags containing GA measurement IDs (format: G-XXXXXXXXXX). This reveals:

  • Which pages are instrumented
  • Which pages are missing GA4 code entirely
  • Whether multiple measurement IDs are present on a single page (which can cause double-counting)

The gap analysis identifies pages that need instrumentation added—typically static HTML templates, email landing pages, or newly deployed features that didn't receive GA code in their initial build.
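
A scanner along these lines can be sketched with the standard library's HTML parser. This is an illustration rather than the actual scanner: the ID pattern, the function names, and the three status labels are assumptions. It looks only inside script tags (both the src attribute and inline code), so a measurement ID that merely appears in page text is ignored.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from pathlib import Path

# Assumed pattern: "G-" followed by uppercase letters and digits.
GA4_ID = re.compile(r"\bG-[A-Z0-9]{4,}\b")


class _ScriptCollector(HTMLParser):
    """Collect measurement IDs found in <script src=...> and inline scripts."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.ids: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.ids.update(GA4_ID.findall(dict(attrs).get("src") or ""))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.ids.update(GA4_ID.findall(data))


def ids_in_html(html: str) -> list[str]:
    """Return the sorted, de-duplicated measurement IDs in one document."""
    parser = _ScriptCollector()
    parser.feed(html)
    return sorted(parser.ids)


def classify(ids: list[str]) -> str:
    """Label a page as 'missing', 'ok', or 'multiple' (double-counting risk)."""
    if not ids:
        return "missing"
    return "ok" if len(ids) == 1 else "multiple"


def scan_tree(root: Path) -> dict[str, str]:
    """Map each HTML file under root to its instrumentation status."""
    return {
        path.relative_to(root).as_posix(): classify(
            ids_in_html(path.read_text(encoding="utf-8", errors="replace"))
        )
        for path in sorted(root.rglob("*.html"))
    }
```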

Infrastructure: Dashboard Card Generation and Deep Linking

Results are surfaced as a kanban card on the progress dashboard at https://progress.queenofsandiego.com. The dashboard HTML supports hash-based deep linking:


https://progress.queenofsandiego.com/#card-{id}

The orchestrator generates a card with ID t-31aa2593 (hash-based identifier), which can be directly linked as https://progress.queenofsandiego.com/#card-t-31aa2593. The dashboard JavaScript intercepts hash changes and renders the card in focus—no page reload required.

This pattern avoids email/notification link rot: even if the card's position in the kanban board changes, the deep link remains valid because it references the card ID, not its DOM position.
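
On the orchestrator side, producing and checking such links is a one-liner each way. The sketch below is hypothetical helper code in Python (the dashboard's own hash handling is JavaScript and is not shown here); only the URL format and the example card ID come from the text above.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlsplit

DASHBOARD = "https://progress.queenofsandiego.com/"
PREFIX = "card-"


def card_link(card_id: str) -> str:
    """Build the deep link for a card ID."""
    return f"{DASHBOARD}#{PREFIX}{card_id}"


def card_id_from_link(url: str) -> Optional[str]:
    """Extract the card ID from a deep link, or None if there is none."""
    fragment = urlsplit(url).fragment
    if fragment.startswith(PREFIX) and len(fragment) > len(PREFIX):
        return fragment[len(PREFIX):]
    return None
```

Because the link carries only the card ID, it keeps working when the card moves between columns.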

Orchestrator as Multi-Step Coordinator

The orchestrator agent is a single entry point that spawns multiple async tasks:

  • Task 1: GA code audit across all repos (output: instrumentation map by page)
  • Task 2: GA4 Data API call for last 30 days (output: traffic metrics JSON)
  • Task 3: Constant Contact campaign query (output: scheduled email status)
  • Task 4: Synthesis into recommendations (output: kanban card)

Tasks 1-3 run in parallel; Task 4 depends on their completion. The orchestrator waits for all subtasks, aggregates their outputs, and publishes a single structured report. This avoids multiple notifications and ensures the dashboard card has complete data.
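
The fan-out and fan-in described above maps naturally onto asyncio.gather. The following is a minimal sketch, assuming each task is exposed as a coroutine function; the orchestrate function and its parameter names are hypothetical, and the real orchestrator may be structured differently.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def orchestrate(
    audit: Callable[[], Awaitable[Any]],      # Task 1: GA code audit
    traffic: Callable[[], Awaitable[Any]],    # Task 2: GA4 Data API pull
    campaigns: Callable[[], Awaitable[Any]],  # Task 3: campaign status query
    synthesize: Callable[[Any, Any, Any], Any],
) -> Any:
    """Run Tasks 1-3 concurrently, then hand all three results to Task 4."""
    # gather() starts all three coroutines and waits for every one to finish;
    # results come back in argument order, not completion order.
    code_map, metrics, emails = await asyncio.gather(audit(), traffic(), campaigns())
    return synthesize(code_map, metrics, emails)
```

If any of the three tasks raises, gather propagates the first exception by default, so synthesis never runs on partial data.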

Email Campaign Integration

The audit discovered two urgent email campaigns:

  • Mother's Day Blast: Scheduled for April 29, requires approval (event 4 days away)
  • Paul Simon Blast: Proof due May 12 (6 days out)

These were surfaced as needs-you cards on the board, triggered automatically when the orchestrator detected unapproved campaigns within 7 days of send date. The blast scripts live in /Users/cb/Documents/repos/tools/ and use Constant Contact CSV exports stored in S3 for deduplication and campaign logging.
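
The trigger rule can be sketched as a simple date filter. The Campaign record and the needs_you function are hypothetical, and the sketch assumes that an unapproved campaign whose send date has already passed should also be flagged; the actual blast scripts may apply a different rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class Campaign:
    name: str
    send_date: date
    approved: bool


def needs_you(
    campaigns: Iterable[Campaign], today: date, window_days: int = 7
) -> list[Campaign]:
    """Return unapproved campaigns due within window_days, soonest first."""
    urgent = [
        campaign
        for campaign in campaigns
        # Overdue campaigns have negative days left and are flagged too.
        if not campaign.approved
        and (campaign.send_date - today).days <= window_days
    ]
    return sorted(urgent, key=lambda campaign: campaign.send_date)
```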

Search Console and GBP Verification Additions

As part of the audit, we:

  • Verified dangerouscentaur.com in Google Search Console by uploading an HTML verification file to its S3 origin bucket (CloudFront is in front, so the file must live in S3)
  • Claimed the Google Business Profile for dangerouscentaur.com and submitted the sitemap
  • Created a needs-you card to test the GBP Account Management API with the analytics service account token once the required scopes are granted

Key Decisions and Trade-offs

Service Account vs. User OAuth: Service accounts need no interactive consent dialog and no user refresh tokens, which makes them a good fit for headless automation. The tradeoff is that the service account key is a long-lived secret that must be stored securely (we keep the JSON credential file on disk and never commit it to git).

Async Parallel Execution: Running audit, API, and email queries in parallel reduces total wall-clock time from ~45 seconds (sequential) to ~15 seconds. The orchestrator blocks only on final synthesis, keeping