```html

Multi-Site Google Analytics Audit and Programmatic Reporting Pipeline

Over the past development session, we executed a comprehensive Google Analytics audit across our multi-platform infrastructure, integrated the orchestrator for automated reporting, and surfaced critical gaps in both tracking implementation and API access. This post details the technical approach, infrastructure patterns, and decisions made during this work.

The Problem Statement

We needed answers to three interconnected questions:

  • Which pages across all platforms lack GA tracking instrumentation?
  • What does the last 30 days of traffic look like, and what are our blind spots?
  • What operational and campaign management improvements would have the most impact?

The challenge wasn't just pulling data—it was doing so programmatically, storing findings durably, and surfacing them in a way the team couldn't ignore.

Technical Architecture: The Audit Pipeline

Phase 1: Static Code Analysis Across All Sites

We spawned parallel audit jobs scanning HTML files across multiple repositories:

  • /Users/cb/Documents/repos/sailjada (primary SaaS platform)
  • /Users/cb/Documents/repos/burialsatsea (marketing site)
  • /Users/cb/Documents/repos/dangerouscentaur (newly-claimed property)
  • /Users/cb/Documents/repos/tools (dashboard and admin interfaces)

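The audit script used regex-based pattern matching to find Google Analytics tracking codes (both GA4 and Universal Analytics) and Google Tag Manager container IDs, looking for:

  • gtag('config', 'G-XXXXXXXXXX') (GA4 measurement ID)
  • ga('create', 'UA-XXXXXXXX-X') (Universal Analytics property ID)
  • GTM-XXXXXXX (Google Tag Manager container ID)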

Results were categorized by:

  • Pages with modern GA4 instrumentation
  • Pages with only legacy UA code
  • Pages with no tracking whatsoever
  • Dashboard and admin interfaces (often excluded from public tracking)
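The classification step can be sketched roughly as follows. This is a minimal illustration, not the actual audit script: the function names, the category labels, and the choice to report GTM-only pages as their own category are assumptions made for the example.

```python
import re
from pathlib import Path

# Patterns mirror the three snippets listed above. Real IDs vary in length,
# so the patterns are deliberately loose.
GA4_RE = re.compile(r"""gtag\(\s*['"]config['"]\s*,\s*['"]G-[A-Z0-9]+['"]""")
UA_RE = re.compile(r"UA-\d+(?:-\d+)?")
GTM_RE = re.compile(r"GTM-[A-Z0-9]+")


def classify_html(text: str) -> str:
    """Return a hypothetical category label for one HTML document."""
    if GA4_RE.search(text):
        return "ga4"
    if GTM_RE.search(text):
        # Assumption: a GTM container may load GA4 at runtime, which static
        # analysis cannot confirm, so it gets its own bucket.
        return "gtm"
    if UA_RE.search(text):
        return "ua_only"
    return "untracked"


def audit_repo(root: str) -> dict[str, list[str]]:
    """Walk a repository and group its HTML files by tracking category."""
    results: dict[str, list[str]] = {"ga4": [], "gtm": [], "ua_only": [], "untracked": []}
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        results[classify_html(text)].append(str(path))
    return results
```

Calling audit_repo on each repository root yields one dictionary per site; separating dashboard and admin pages would need an extra path-based rule that is not shown here.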

Phase 2: Programmatic GA4 Data API Access

The audit revealed we had zero programmatic access to GA4 data. The fix required three steps:

  1. Service Account Setup: Created a service account in Google Cloud Console for the GCP project associated with our GA4 properties. Downloaded the JSON key file.
  2. OAuth Token Generation: Implemented /Users/cb/Documents/repos/tools/reauth_ga.py using the Google Auth library to generate access tokens with the analytics.readonly scope.
  3. Property Enumeration: Built a discovery pattern to list all GA4 properties and their numeric IDs (distinct from measurement IDs used in tracking code).
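The three steps above might look something like the sketch below. It assumes the google-auth package (plus requests) is installed and that the key file path is passed in by the caller; the function names are hypothetical and the contents of reauth_ga.py are not reproduced here.

```python
# Read-only scope named in step 2.
SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]


def load_credentials(key_path: str):
    """Build service-account credentials from the downloaded JSON key file."""
    # Imported lazily so the pure helper below works without google-auth.
    from google.oauth2 import service_account

    return service_account.Credentials.from_service_account_file(
        key_path, scopes=SCOPES
    )


def fetch_access_token(key_path: str) -> str:
    """Exchange the service-account key for a short-lived access token."""
    from google.auth.transport.requests import Request

    creds = load_credentials(key_path)
    creds.refresh(Request())  # performs the token request over HTTPS
    return creds.token


def property_resource_name(property_id: str) -> str:
    """Format a numeric GA4 property ID as an API resource name.

    Measurement IDs such as 'G-XXXXXXXXXX' belong in tracking code and are
    rejected here, since the Admin and Data APIs expect the numeric ID.
    """
    if not property_id.isdigit():
        raise ValueError(f"expected a numeric property ID, got {property_id!r}")
    return f"properties/{property_id}"
```

Keeping the numeric-ID check in one helper makes it harder to paste a measurement ID into an API call by mistake.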

Once authenticated, we queried the Google Analytics Admin API to list all accounts and properties:

python tools/reauth_ga.py
# Outputs: [
#   {'propertyId': '123456789', 'displayName': 'sailjada.com'},
#   {'propertyId': '234567890', 'displayName': 'burialsatsea.com'},
#   {'propertyId': '345678901', 'displayName': 'dangerouscentaur.com'},
# ]

We then issued batch queries to the GA4 Data API covering the last 30 days for every property, aggregating traffic by page path, traffic source, and conversion events.
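A 30-day query of this kind could be expressed against the Data API's REST runReport method roughly as shown below. The dimension and metric choices are assumptions for illustration (conversion metrics are omitted), and the helper names are hypothetical; the access token is expected to come from the service-account flow described earlier.

```python
import json
import urllib.request

DATA_API = "https://analyticsdata.googleapis.com/v1beta"


def build_report_request(days: int = 30) -> dict:
    """Request body for a page-path / traffic-source breakdown."""
    return {
        "dateRanges": [{"startDate": f"{days}daysAgo", "endDate": "today"}],
        "dimensions": [{"name": "pagePath"}, {"name": "sessionSource"}],
        "metrics": [{"name": "screenPageViews"}, {"name": "sessions"}],
    }


def report_url(property_id: str) -> str:
    """runReport endpoint for a numeric GA4 property ID."""
    return f"{DATA_API}/properties/{property_id}:runReport"


def run_report(access_token: str, property_id: str, days: int = 30) -> dict:
    """POST the report request and return the decoded JSON response."""
    req = urllib.request.Request(
        report_url(property_id),
        data=json.dumps(build_report_request(days)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Looping run_report over the property IDs returned by the enumeration step gives one response per site, which can then be merged for the report.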

Phase 3: Orchestrator Report Generation

The orchestrator is a multi-threaded agent that consumes audit data and generates structured reports. It:

  • Reads the GA code audit results from the static analysis phase
  • Fetches 30-day traffic data from the GA4 Data API
  • Queries Constant Contact API for active/scheduled email campaigns
  • Synthesizes recommendations based on traffic patterns and operational gaps
  • Generates a structured kanban card with findings organized into sections

The report output was persisted as card t-31aa2593 on our progress dashboard at https://progress.queenofsandiego.com/#card-t-31aa2593.

Dashboard Integration and Deep Linking

The progress dashboard uses hash-based routing for card navigation. The deep link format follows the pattern:

https://progress.queenofsandiego.com/#card-{cardId}
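Building and parsing links in this format takes only a few lines. The sketch below assumes nothing beyond the URL pattern shown above; the helper names are hypothetical.

```python
from urllib.parse import urlsplit

DASHBOARD_URL = "https://progress.queenofsandiego.com/"


def card_deep_link(card_id: str) -> str:
    """Return the hash-routed deep link for a dashboard card."""
    return f"{DASHBOARD_URL}#card-{card_id}"


def card_id_from_link(url: str) -> str:
    """Recover the card ID from a deep link (inverse of card_deep_link)."""
    fragment = urlsplit(url).fragment
    if not fragment.startswith("card-"):
        raise ValueError(f"not a card deep link: {url!r}")
    return fragment[len("card-"):]
```

For example, card_deep_link("t-31aa2593") produces the link to the audit report card mentioned earlier.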

This allowed the orchestrator to generate direct links to findings, ensuring urgent items (like the Mother's Day blast needing approval) surface immediately rather than getting buried in logs.

Infrastructure Decisions and Rationale

Why Parallel Audit Threads?

Scanning multiple repositories and running GA API queries sequentially would have taken 8–10 minutes. By parallelizing file system scans and API calls, we reduced total execution time to under 90 seconds. The orchestrator manages thread pools and aggregates partial results as they complete.
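The fan-out pattern can be illustrated with the standard library's thread pool. This is a sketch of the general technique rather than the orchestrator's actual code; the run_parallel name, the task dictionary shape, and the worker count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_parallel(tasks: dict[str, Callable[[], object]], max_workers: int = 8) -> dict:
    """Run named zero-argument tasks concurrently and collect their results.

    A failing task is recorded under its name instead of aborting the run,
    so partial results from the other tasks are still returned.
    """
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"ok": True, "value": future.result()}
            except Exception as exc:  # keep going; report the failure by name
                results[name] = {"ok": False, "error": repr(exc)}
    return results
```

Threads suit this workload because both file reads and API calls spend most of their time waiting on I/O.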

Why Service Accounts for GA Access?

User-based OAuth (where we'd use personal Google credentials) creates fragility:

  • If a team member leaves, API access breaks
  • Personal credential rotation isn't centrally managed
  • No audit trail linking API calls to a specific person

Service accounts are bound to the GCP project, persist across team changes, and support role-based access control in Google Cloud IAM.

Why Static HTML Analysis Before API Calls?

Static analysis is fast, doesn't consume API quotas, and catches obvious gaps (entire pages with no tracking). API data tells us what users did; HTML analysis tells us where our measurement infrastructure is broken. Running both gives a complete picture.

Key Findings and Immediate Actions

The audit surfaced three blocking items:

  1. Mother's Day Email Blast (4 days out): Campaign scheduled for April 29, still unapproved. Uses the template at /Users/cb/Documents/repos/blast_templates/mothers_day_2024.html and a contacts CSV exported from Constant Contact. A needs-you card was created on the dashboard requesting an approval decision.
  2. Paul Simon Campaign (6 days out): Proof due May 12. Similar workflow: the template exists, but the proof hasn't been generated and sent for approval.
  3. GA Data API Access: Zero programmatic access. Fix: add the service account's email address as a user on the GA4 property under Google Analytics Admin > Property Settings > Property Access Management. We granted the Editor role, although Viewer is sufficient for read-only Data API queries. This was the 3-minute fix that unblocked the entire audit pipeline.

What's Next

With GA access now enabled, we'll establish a recurring audit cadence:

  • Weekly orchestrator runs to detect new pages missing tracking instrumentation
  • Monthly traffic reports fed into operational planning (which pages are high-value? which drive conversions?)
  • Campaign tracking improvements: ensure all email blast links include UTM parameters so we can attribute traffic back to a specific campaign ID
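Tagging links with UTM parameters could be done with a small helper like the one below. The default source and medium values, the add_utm name, and the example URL path are assumptions for illustration; only the utm_source, utm_medium, and utm_campaign parameter names are standard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url: str, campaign: str,
            source: str = "constant_contact", medium: str = "email") -> str:
    """Append UTM parameters to a link, replacing any existing utm_* values."""
    parts = urlsplit(url)
    # Keep the link's own query parameters but drop stale UTM tags.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Running every link in a blast template through a helper like this before sending would let the traffic reports group sessions by campaign.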

For the dangerouscentaur.com property (