Building a Unified Analytics Audit Pipeline: GA Code Coverage, Data API Access, and Orchestrator Integration
Over the past development session, we built an analytics auditing system that does three things: it scans the HTML in every site repository for Google Analytics tracking code, establishes programmatic access to GA4 data through a service account, and turns the findings into recommendations on our kanban-based progress dashboard. This post covers the technical architecture, the reasoning behind each decision, and the specific implementation patterns.
The Problem: Fragmented Analytics Visibility
We were operating with incomplete visibility into user traffic across multiple properties (sailjada.com, burialsatsea.com, salejada.com, queenofsandiego.com). The core issues were:
- No programmatic access to GA4 data — all analysis was manual and delayed
- Unknown GA code coverage across properties — some pages likely missing tracking
- No systematic way to capture audit findings and recommendations
- Campaign status scattered across email providers and local logs
The solution required three parallel tracks: a code auditor, API access configuration, and orchestrator integration.
Part 1: GA Code Coverage Audit
Scanning Strategy
We built a recursive HTML scanner that traverses all site repositories and searches for Google Analytics tracking patterns. The script checks for both legacy (ga.js, analytics.js) and modern (gtag.js, GA4) implementations.
Script Location: /Users/cb/Documents/repos/tools/
Files scanned:
- /Users/cb/Documents/repos/*/public/**/*.html
- /Users/cb/Documents/repos/*/src/**/*.html
- /Users/cb/Documents/repos/*/templates/**/*.html
- /Users/cb/Documents/repos/*/dist/**/*.html
Patterns searched:
- gtag.js loader (served from googletagmanager.com): <script src="https://www.googletagmanager.com/gtag/js">
- gtag initialization: window.dataLayer = window.dataLayer || [];
- GA4 measurement IDs: G-[A-Z0-9]{10}
- Legacy analytics.js: <script src="https://www.google-analytics.com/analytics.js">
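The scan described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the names GA_PATTERNS, detect_ga, and scan_repos are hypothetical, and the regexes are one plausible reading of the patterns listed.

```python
import re
from pathlib import Path

# Illustrative regexes for the patterns listed above (assumed, not the
# script's real ones).
GA_PATTERNS = {
    "gtag_loader": re.compile(r"googletagmanager\.com/gtag/js"),
    "datalayer_init": re.compile(
        r"window\.dataLayer\s*=\s*window\.dataLayer\s*\|\|\s*\[\]"
    ),
    "ga4_id": re.compile(r"\bG-[A-Z0-9]{10}\b"),
    "legacy_analytics_js": re.compile(r"google-analytics\.com/analytics\.js"),
}

# Directories scanned inside each repository.
SCAN_DIRS = ("public", "src", "templates", "dist")


def detect_ga(html: str) -> dict[str, bool]:
    """Report which tracking patterns appear in one HTML document."""
    return {name: bool(rx.search(html)) for name, rx in GA_PATTERNS.items()}


def scan_repos(repos_root: Path) -> dict[Path, dict[str, bool]]:
    """Classify every <repo>/<scan dir>/**/*.html file under repos_root."""
    results = {}
    for scan_dir in SCAN_DIRS:
        for path in repos_root.glob(f"*/{scan_dir}/**/*.html"):
            html = path.read_text(encoding="utf-8", errors="replace")
            results[path] = detect_ga(html)
    return results
```

A page where every value is False has no recognizable tracking code; a page where only legacy_analytics_js is True is a migration candidate.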
Why This Approach
Rather than relying on self-reported analytics implementation, we chose to scan the HTML itself, including the build output in dist/ that actually gets deployed. This catches gaps caused by misconfiguration or outdated documentation. We prioritized modern GA4 (gtag.js) detection since that's our current standard, but still checked for legacy implementations to catch pages that hadn't been migrated.
The audit output feeds directly into a dashboard card with specific gaps by property and page, making remediation actionable.
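To illustrate how per-page results might roll up into gaps by property, here is a small sketch. The input shape (property, page, has_ga4) and the function name coverage_by_property are assumptions made for the example, not the format the real audit emits.

```python
from collections import defaultdict


def coverage_by_property(results):
    """Summarize coverage from (property, page, has_ga4) tuples."""
    pages = defaultdict(list)
    for prop, page, has_ga4 in results:
        pages[prop].append((page, has_ga4))

    report = {}
    for prop, entries in pages.items():
        # Pages without a GA4 tag are the actionable gaps.
        missing = sorted(page for page, ok in entries if not ok)
        report[prop] = {
            "total": len(entries),
            "tagged": len(entries) - len(missing),
            "missing": missing,
        }
    return report
```

The missing list is what makes a card actionable: it names the exact pages that need a tag rather than only a percentage.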
Part 2: GA4 Data API Access via Service Account
Authentication Flow
We established programmatic access to GA4 by configuring a service account with the Analytics Readonly scope. The implementation follows Google's OAuth 2.0 Service Account flow:
File: /Users/cb/Documents/repos/tools/reauth_ga.py
Process:
1. Read service account JSON from secure credential store
2. Build JWT assertion with Analytics Readonly scope
3. Exchange JWT for access token via Google token endpoint
4. Cache token with 1-hour TTL to minimize API calls
5. Use token in GA4 Data API requests
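Steps 2 and 4 can be sketched as below. This is a simplified illustration under stated assumptions: build_claims and TokenCache are hypothetical names, signing the JWT and the HTTP exchange are left to an injected fetch callable, and the 60-second refresh margin is our own choice. In practice the google-auth library's service_account.Credentials handles the signing and exchange; the post does not say exactly how reauth_ga.py does it.

```python
import time
from typing import Callable

TOKEN_URI = "https://oauth2.googleapis.com/token"
SCOPE = "https://www.googleapis.com/auth/analytics.readonly"


def build_claims(client_email: str, now: int, lifetime: int = 3600) -> dict:
    """Claim set for the JWT assertion (step 2); signing is not shown."""
    return {
        "iss": client_email,   # the service account's email address
        "scope": SCOPE,
        "aud": TOKEN_URI,      # audience is the token endpoint
        "iat": now,
        "exp": now + lifetime,
    }


class TokenCache:
    """Reuse an access token until shortly before it expires (step 4)."""

    def __init__(self, fetch: Callable[[], tuple[str, int]],
                 skew: int = 60, clock: Callable[[], float] = time.time):
        self._fetch = fetch    # returns (access_token, expires_in_seconds)
        self._skew = skew      # refresh this many seconds early (assumed)
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token
```

Refreshing slightly before expiry avoids sending a token that lapses mid-request, which a cache TTL exactly equal to the token lifetime would allow.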
Service Account Scopes:
- https://www.googleapis.com/auth/analytics.readonly
GA4 Properties Mapped:
- sailjada.com: G-[NUMERIC_ID]
- burialsatsea.com: G-[NUMERIC_ID]
- salejada.com: G-[NUMERIC_ID]
- queenofsandiego.com: G-[NUMERIC_ID]
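With a token in hand, a last-30-days pull might be built as in the sketch below. Two points are worth stating as assumptions: the Data API's runReport method addresses a property by its numeric property ID (distinct from the G- measurement ID embedded in page tags), and the metric selection here (sessions, totalUsers) is only an example. The helper names build_run_report and auth_headers are hypothetical.

```python
DATA_API_BASE = "https://analyticsdata.googleapis.com/v1beta"


def build_run_report(property_id: str, days: int = 30) -> tuple[str, dict]:
    """Return the (url, JSON body) for a runReport call over recent days."""
    if not property_id.isdigit():
        raise ValueError(
            "expected a numeric GA4 property ID, not a G- measurement ID"
        )
    url = f"{DATA_API_BASE}/properties/{property_id}:runReport"
    body = {
        # Ending at "yesterday" keeps the window to complete days only.
        "dateRanges": [{"startDate": f"{days}daysAgo", "endDate": "yesterday"}],
        "metrics": [{"name": "sessions"}, {"name": "totalUsers"}],
    }
    return url, body


def auth_headers(access_token: str) -> dict:
    """Headers for a Data API request using a cached access token."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
```

The URL and body can then be sent with any HTTP client as a POST, or the same report can be requested through the google-analytics-data client library.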
Why Service Accounts Over User-Based OAuth
We chose service account authentication instead of user-based OAuth because:
- No token refresh UI needed — service runs unattended in orchestrator
- Scoped to readonly — minimizes blast radius if credentials leak
- Service account exists in GCP project where GA properties are managed
- Access tokens can be cached for their full one-hour lifetime and re-issued without any user interaction
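The service account itself is added in the Google Analytics Admin console with the Editor role on the GA4 properties. Editor is broader than this pipeline needs: the Viewer role is sufficient for read-only Data API queries and is more consistent with the readonly scope above. Service accounts are a common pattern for unattended backend data pipelines.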
Part 3: Orchestrator Integration & Dashboard Reporting
Data Flow Architecture
Local Scripts (reauth_ga.py, preflight_check.py)
↓
Orchestrator Service (async task spawning)
↓
GA4 Data API (last 30 days metrics pull)
↓
Constant Contact API (campaign status check)
↓
HTML Code Audit Results (GA code coverage)
↓
Recommendation Engine
↓
Kanban Card Generation (t-31aa2593)
↓
Dashboard Deep Link Storage
↓
User Notification
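A rough sketch of the orchestration step follows. The diagram shows the three inputs in sequence; since they do not depend on each other, this example gathers them concurrently, which is our assumption rather than a description of the real orchestrator. The fetchers are injected, and the rules in recommend are hypothetical placeholders for the recommendation engine.

```python
import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[], Awaitable[dict]]


def recommend(metrics: dict, campaigns: dict, audit: dict) -> list[str]:
    """Hypothetical rules; the real engine is not described in this post."""
    recs = []
    for prop, missing in audit.get("missing_pages", {}).items():
        if missing:
            recs.append(f"Add GA4 tag to {len(missing)} page(s) on {prop}")
    if not campaigns.get("scheduled"):
        recs.append("No email campaigns are currently scheduled")
    return recs


async def run_audit(fetch_metrics: Fetcher, fetch_campaigns: Fetcher,
                    load_audit: Fetcher) -> dict:
    """Collect the three inputs concurrently, then derive recommendations."""
    metrics, campaigns, audit = await asyncio.gather(
        fetch_metrics(), fetch_campaigns(), load_audit()
    )
    return {
        "metrics": metrics,
        "campaigns": campaigns,
        "audit": audit,
        "recommendations": recommend(metrics, campaigns, audit),
    }
```

The returned dictionary is the raw material for a kanban card; calling asyncio.run(run_audit(...)) with real fetchers would produce one audit pass.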
Card Storage & Deep Linking
Audit findings are stored as dashboard cards at the kanban API endpoint with full hash-navigation support:
Deep link format: https://progress.queenofsandiego.com/#card-{id}
Example live card from this audit:
https://progress.queenofsandiego.com/#card-t-31aa2593
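Building and parsing links in this format is straightforward. The helper names below are hypothetical; only the URL shape comes from the format above.

```python
from urllib.parse import urlsplit

DASHBOARD_URL = "https://progress.queenofsandiego.com/"
CARD_PREFIX = "card-"


def card_deep_link(card_id: str) -> str:
    """Build the hash-navigation link for a card ID."""
    return f"{DASHBOARD_URL}#{CARD_PREFIX}{card_id}"


def card_id_from_link(url: str):
    """Extract the card ID from a deep link, or None if it is not one."""
    fragment = urlsplit(url).fragment
    if not fragment.startswith(CARD_PREFIX):
        return None
    return fragment[len(CARD_PREFIX):]
```

For the example card, card_deep_link("t-31aa2593") reproduces the link shown above.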
Card sections generated:
1. GA Code Coverage Report (by property, by page)
2. Last 30 Days Traffic Metrics (sessions, users, conversion rate)
3. Email Campaign Status (current scheduled blasts)
4. Traffic Growth Recommendations
5. Operational Excellence Gaps
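As an illustration of how those five sections could be assembled into a card body, here is a minimal sketch. The plain-text layout and the render_card name are assumptions; the real card format is whatever the kanban API expects.

```python
SECTION_TITLES = (
    "GA Code Coverage Report",
    "Last 30 Days Traffic Metrics",
    "Email Campaign Status",
    "Traffic Growth Recommendations",
    "Operational Excellence Gaps",
)


def render_card(sections: dict[str, list[str]]) -> str:
    """Render the sections in a fixed order as numbered plain text."""
    lines = []
    for number, title in enumerate(SECTION_TITLES, start=1):
        lines.append(f"{number}. {title}")
        # An empty or absent section is shown explicitly, not dropped.
        items = sections.get(title) or ["(no findings)"]
        lines.extend(f"   - {item}" for item in items)
    return "\n".join(lines)
```

Keeping the section order fixed means two audits can be compared card to card without hunting for where a section moved.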
Why Kanban Over Email
Instead of dumping findings into console logs or email, we chose kanban cards because:
- Findings don't get buried in inboxes or chat history
- Deep links allow precise references in discussions
- Card metadata tags (urgent, needs-you, in-progress) provide workflow clarity
- Multiple engineers can see the same state without async communication delays
- Findings stay queryable and filterable over time
Key Infrastructure & Configuration Changes
Google Cloud Project Setup
- Service account created in main GCP project (credentials stored in secure vault)
- Service account added as Editor to GA4 properties via Admin console (the Viewer role would suffice for read-only reporting)
- GA4 Data API enabled in GCP APIs dashboard
- Token cache configured with 3600-second TTL
Script Locations & Dependencies
/Users/cb/Documents/repos/tools/reauth_ga.py
- Handles GA4 service account auth and token caching
- Dependencies: google-auth, google-analytics-data
/Users/cb/Documents/repos/tools/preflight_check.py
- Validates service account credentials and scopes
- Checks API enablement in GCP project
- Verifies GA properties are accessible
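The credential validation part of a preflight check could look something like the sketch below. It only inspects the service account JSON offline; the API enablement and property access checks need live calls and are not shown. The function name and the exact set of required fields checked are our assumptions about what such a check would cover.

```python
# Fields that a Google service account key file normally contains and that
# the JWT flow depends on.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_service_account(info: dict) -> list[str]:
    """Return a list of problems found in parsed service account JSON."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not info.get(name)
    ]
    # A present but wrong type usually means a different kind of credential
    # file (for example, an OAuth client secret) was supplied.
    if info.get("type") and info["type"] != "service_account":
        problems.append(f"unexpected credential type: {info['type']}")
    return problems
```

An empty list means the file is structurally usable; anything else can be reported before the pipeline attempts a token exchange.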
/Users/cb/Documents/repos/tools/reauth_gbp.py
- Similar auth pattern for Google Business Profile API (future use)
- Shows the extensibility of the service account pattern
Feedback Memory System
We documented the deep link format and audit patterns in the project memory system:
File: /Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/
feedback_dashboard_deep_links