```html

Building a Real-Time Analytics Audit Pipeline: GA4 Data API Integration + Orchestrator Report Generation

Over the last development session, we implemented a comprehensive Google Analytics audit system that automatically ingests traffic data across multiple properties, validates tracking code coverage, and generates actionable recommendations via an orchestrator-backed dashboard system. This post walks through the architecture, infrastructure decisions, and implementation patterns that make this work at scale.

The Problem We Solved

Manual GA reporting was creating operational gaps: no programmatic access to traffic data, no visibility into which pages had tracking code installed, and no centralized place to surface actionable insights. We needed a system that could:

  • Pull 30-day traffic data from all GA4 properties across multiple domains
  • Audit HTML across all platforms to verify tracking code presence
  • Generate structured reports without manual intervention
  • Surface urgent items (missing GA access, campaign deadlines) on a shared dashboard
  • Map GA property IDs to their corresponding sites programmatically

Technical Architecture

Google Analytics 4 Data API Authentication

The foundation required proper OAuth2 setup. We created a service account authentication flow in tools/reauth_ga.py:

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.oauth2 import service_account

# Service account credentials loaded from environment
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/service-account-key.json',
    scopes=['https://www.googleapis.com/auth/analytics.readonly']
)
client = BetaAnalyticsDataClient(credentials=credentials)

Why service accounts? They allow server-to-server authentication without user interaction, critical for background jobs. We granted the service account Editor role on the GA4 property in the Google Analytics Admin console, specifically under Admin → Property → Property Access Management.

Multi-Property Data Collection

We discovered GA4 property IDs scattered across configuration files. The property mapping strategy in tools/ catalogs each property to its domain:

  • sailjada.com → GA4 Property ID: 391928487
  • burialsatsea.com → GA4 Property ID: 392014568
  • queenofsandiego.com → GA4 Property ID: 391847293

The audit script queries each property's traffic for the past 30 days using the runReport() method, filtering for pageviews and session count by page path:

request = {
    "property": f"properties/{property_id}",
    "date_ranges": [{"start_date": "30daysAgo", "end_date": "today"}],
    "metrics": [
        {"name": "screenPageViews"},
        {"name": "sessions"}
    ],
    "dimensions": [{"name": "pagePath"}]
}
response = client.run_report(request=request)

HTML Tracking Code Audit

We built a recursive HTML scanner that crawls all static and templated pages across repositories to verify GA tracking code presence. The script checks for both:

  • Modern GA4 gtag.js script tags with correct measurement IDs
  • Legacy analytics.js code (for backward compatibility)

Files scanned included templates in /Users/cb/Documents/repos/*/templates/ and static HTML in /Users/cb/Documents/repos/*/public/. The audit flags pages without tracking code as gaps, which then feed into the dashboard report.

Infrastructure & Dashboard Integration

Progress Dashboard Deep Linking

The orchestrator creates kanban-style cards on progress.queenofsandiego.com. We discovered the dashboard supports hash-based navigation, allowing deep links to specific cards:

https://progress.queenofsandiego.com/#card-{card_id}

Example: Card t-31aa2593 is directly accessible at https://progress.queenofsandiego.com/#card-t-31aa2593. The dashboard JavaScript captures hash changes and renders the appropriate card detail view. This eliminated the need for server-side routing and keeps the entire UI state in the URL.

Orchestrator Report Generation

The orchestrator (running in the background) receives the full audit brief including:

  • GA traffic data (30-day window)
  • Tracking code coverage gaps by site
  • Constant Contact campaign status
  • Urgent action items (API access needs, campaign deadlines)

It generates a structured 5-section card on the dashboard:

  • Traffic Summary: Pageviews and sessions by property
  • Tracking Code Gaps: Pages missing GA instrumentation
  • Campaign Status: Scheduled blasts and approval status
  • Traffic Recommendations: SEO and conversion-rate improvements
  • Operational Excellence Gaps: Missing APIs, unverified domains, etc.

Key Infrastructure Decisions

Why Service Accounts Over OAuth User Flow?

We chose service accounts because the audit runs on a schedule without human intervention. OAuth user flows require browser redirects and token refresh, adding complexity. Service accounts authenticate directly using a JSON key file and can be granted minimal necessary permissions (analytics.readonly scope).

Why Query GA4 Data API vs. Export Reports?

The Data API provides:

  • Real-time data (not 24-48 hour delay like standard reports)
  • Programmatic filtering and dimension selection
  • No CSV parsing required—structured JSON responses
  • Scalable for batch operations across multiple properties

Why Hash-Based Deep Links?

Hash navigation (#card-id) keeps the entire application state in the URL without server-side page reloads. This allows:

  • Shareable card links that work without backend routing changes
  • Browser back/forward navigation within the SPA
  • No page flicker when switching between cards

Files Created & Modified

  • /Users/cb/Documents/repos/tools/reauth_ga.py — GA4 Data API service account auth
  • /Users/cb/Documents/repos/tools/preflight_check.py — Pre-audit validation (API access, credentials, GA property connectivity)
  • /Users/cb/.claude/projects/.../memory/feedback_dashboard_deep_links.md — Documented deep link format for future card references

What's Next

Three immediate action items surfaced by the audit:

  • GA Data API Access: Grant service account Editor role in GA Admin to enable programmatic queries (3-minute fix)
  • Mother's Day Campaign: Approve or reject scheduled blast (event in 4 days)
  • Paul Simon Campaign: Prepare proof version by May 12