```html

Building a Real-Time Analytics Audit Pipeline: GA Data Integration with Orchestrator Automation

Over the past development session, we implemented a comprehensive Google Analytics audit system that programmatically validates tracking implementation across multiple properties, pulls historical traffic data, and feeds findings into an orchestrator-driven reporting pipeline. This post details the technical architecture, infrastructure decisions, and automation patterns we built.

What Was Done

We constructed a three-stage analytics audit system:

  • Stage 1 (Code Audit): Swept all HTML files across three platforms (sailjada.com, burialsatsea.com, queenofsandiego.com) to verify that GA4 tracking code was installed on every page
  • Stage 2 (Data Pull): Integrated with Google Analytics Data API v1 to retrieve last-30-days traffic metrics for all GA4 properties
  • Stage 3 (Reporting): Passed audit findings and traffic data to an orchestrator service that generated actionable recommendations and surfaced them on a live dashboard

The entire pipeline ran asynchronously, with results landing as a kanban card on the progress dashboard rather than blocking the console.

Technical Details: The Implementation Stack

Google Analytics Data API Integration

We chose the Google Analytics Data API v1 over the older Universal Analytics Reporting API because it is the reporting API that natively supports GA4 properties. The implementation required three components:

Service Account Authentication: Rather than OAuth user tokens (which are tied to a person's account and need an interactive consent flow), we provisioned a service account in the Google Cloud project. The service account key is stored securely in the project's secret manager. This approach suits long-running automation: the key does not expire on its own, the client library refreshes the short-lived access tokens automatically, and the only manual step is a one-time grant in the GA Admin console (adding the service account email to each GA4 property; the Viewer role is enough for the read-only scope used below).

We created a reauth helper script at /Users/cb/Documents/repos/tools/reauth_ga.py that handles token refresh and credential caching:

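from google.auth.transport.requests import Request
from google.oauth2.service_account import Credentials

# Read-only scope: the audit only pulls reports and never modifies properties.
SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']

# Load the service account key (placeholder path).
credentials = Credentials.from_service_account_file(
    '/path/to/service-account-key.json',
    scopes=SCOPES
)

# Exchange the key for a short-lived access token.
# Call this again whenever credentials.valid is False.
credentials.refresh(Request())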

Property ID Mapping: We identified five unique GA4 properties, one for each of the three audited sites plus two additional properties:

  • sailjada.com: Property ID 12345 (GA4 property IDs are numeric, unlike the G- measurement ID embedded in the tag)
  • burialsatsea.com: Property ID 67890
  • queenofsandiego.com: Property ID 11111
  • dangerouscentaur.com: Property ID 22222
  • additional affiliate property: Property ID 33333

We stored these mappings in a configuration file rather than hardcoding them, allowing the audit to scale if new properties were added without code changes.
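As a rough illustration of that configuration approach, the sketch below loads a site-to-property mapping from JSON and checks that each ID is numeric. The JSON layout, the `load_property_map` name, and the inline example are assumptions for illustration; the post does not show the actual config file.

```python
import json

# Hypothetical config contents; the IDs mirror the placeholder values listed above.
EXAMPLE_CONFIG = """
{
  "sailjada.com": "12345",
  "burialsatsea.com": "67890"
}
"""


def load_property_map(raw_json: str) -> dict[str, str]:
    """Parse a site -> GA4 property ID mapping and reject non-numeric IDs."""
    mapping = json.loads(raw_json)
    for site, property_id in mapping.items():
        if not str(property_id).isdigit():
            raise ValueError(
                f"{site}: GA4 property IDs are numeric, got {property_id!r}"
            )
    return {site: str(property_id) for site, property_id in mapping.items()}


property_map = load_property_map(EXAMPLE_CONFIG)
```

Adding a property then means adding one line to the config; the validation catches the common mistake of pasting a G- measurement ID where a numeric property ID belongs.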

Traffic Query: The Data API's batchRunReports method bundles up to five reports for a single property, not reports across properties, so the 30-day pull issues one runReport request per property (five in total). Each request asks for pageviews, sessions, users, and bounce rate, grouped by date to show traffic trends:

POST https://analyticsdata.googleapis.com/v1beta/properties/{propertyId}:runReport
{
  "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
  "dimensions": [{"name": "date"}],
  "metrics": [
    {"name": "activeUsers"},
    {"name": "sessions"},
    {"name": "screenPageViews"},
    {"name": "bounceRate"}
  ]
}
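A minimal Python sketch of the same request, plus a helper that flattens a runReport response into one record per date. The function names and the sample response are hypothetical; the response shape (dimensionHeaders, metricHeaders, rows) follows the REST API's documented format, and the HTTP call itself is left out so the example runs offline.

```python
METRICS = ("activeUsers", "sessions", "screenPageViews", "bounceRate")


def report_url(property_id: str) -> str:
    """Build the runReport endpoint for one GA4 property."""
    return (
        "https://analyticsdata.googleapis.com/v1beta/"
        f"properties/{property_id}:runReport"
    )


def build_report_request(days: int = 30) -> dict:
    """Build the JSON body shown above for a trailing window of `days` days."""
    return {
        "dateRanges": [{"startDate": f"{days}daysAgo", "endDate": "today"}],
        "dimensions": [{"name": "date"}],
        "metrics": [{"name": name} for name in METRICS],
    }


def flatten_report(response: dict) -> list[dict]:
    """Turn a runReport response into a list of {date, metric...} records."""
    dim_names = [h["name"] for h in response.get("dimensionHeaders", [])]
    metric_names = [h["name"] for h in response.get("metricHeaders", [])]
    records = []
    for row in response.get("rows", []):
        record = {}
        for name, cell in zip(dim_names, row.get("dimensionValues", [])):
            record[name] = cell["value"]
        for name, cell in zip(metric_names, row.get("metricValues", [])):
            record[name] = float(cell["value"])  # the API returns values as strings
        records.append(record)
    return records


# Made-up response with a single row, for illustration only.
sample_response = {
    "dimensionHeaders": [{"name": "date"}],
    "metricHeaders": [{"name": "activeUsers"}, {"name": "sessions"}],
    "rows": [
        {
            "dimensionValues": [{"value": "20240101"}],
            "metricValues": [{"value": "12"}, {"value": "15"}],
        }
    ],
}
daily = flatten_report(sample_response)
```

In a real run, the body from `build_report_request` would be POSTed to `report_url(property_id)` with the service account's access token in the Authorization header.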

HTML Tracking Code Audit

We built /Users/cb/Documents/repos/tools/preflight_check.py to scan all HTML files and verify GA tracking implementation. The script:

  • Recursively traverses the document root for each site (typically /var/www/html or equivalent in CDN origins)
  • Parses HTML files looking for the GA4 gtag script: <script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
  • Extracts the measurement ID (the G-XXXXXXXXXX value) from the script tag
  • Compares it against the expected measurement ID for that site
  • Logs any pages missing tracking code or using outdated Universal Analytics IDs (UA-XXXXXXX-X)
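The per-page check can be sketched as follows. This is a simplified stand-in, not the contents of preflight_check.py: it uses regular expressions instead of a full HTML parser, and the `audit_html` name and status strings are assumptions.

```python
import re

# Matches the G- ID in the gtag loader URL.
GTAG_SRC = re.compile(r"googletagmanager\.com/gtag/js\?id=(G-[A-Z0-9]+)")
# Matches legacy Universal Analytics IDs such as UA-1234567-1.
UA_ID = re.compile(r"\bUA-\d+-\d+\b")


def audit_html(html: str, expected_id: str) -> str:
    """Classify one page as 'ok', 'wrong-id', 'legacy-ua', or 'missing'."""
    ga4_ids = set(GTAG_SRC.findall(html))
    if expected_id in ga4_ids:
        return "ok"
    if ga4_ids:
        return "wrong-id"  # GA4 tag present, but for a different property
    if UA_ID.search(html):
        return "legacy-ua"  # only an outdated Universal Analytics ID found
    return "missing"


tagged = (
    '<script async src="https://www.googletagmanager.com/gtag/js'
    '?id=G-ABC123XYZ9"></script>'
)
legacy = "<script>_gaq.push(['_setAccount', 'UA-1234567-1']);</script>"
```

Walking each document root with `pathlib.Path.rglob("*.html")` and passing every file's text through `audit_html` would produce the per-page statuses that feed the coverage report.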

The audit discovered that two older template files in the archive directory were using deprecated ga.js syntax from 2015. Rather than modify them (they're not served), we flagged them in the report for historical awareness.

Infrastructure: Dashboard and Orchestrator Integration

Results surface on the dashboard at progress.queenofsandiego.com, which uses hash-based deep linking. The dashboard supports the format:

https://progress.queenofsandiego.com/#card-{card-id}

The audit pipeline spawned an orchestrator task that created card t-31aa2593, which is accessible via:

https://progress.queenofsandiego.com/#card-t-31aa2593
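For scripts that need to emit or read these links (for example, a notifier that posts the card URL), the format can be handled with the standard library. The helper names below are hypothetical; only the URL format comes from the dashboard.

```python
from urllib.parse import urlsplit

DASHBOARD = "https://progress.queenofsandiego.com/"
PREFIX = "card-"


def card_link(card_id: str) -> str:
    """Build the deep link for a card ID."""
    return f"{DASHBOARD}#{PREFIX}{card_id}"


def card_id_from_link(url: str):
    """Return the card ID from a deep link, or None if the hash is not a card."""
    fragment = urlsplit(url).fragment
    if fragment.startswith(PREFIX):
        return fragment[len(PREFIX):]
    return None
```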

The dashboard HTML checks for hash changes and renders the appropriate card detail view. The card itself contains five sections:

  1. GA code coverage report (by site and page type)
  2. Last-30-days traffic summary (users, sessions, bounce rate)
  3. Current email campaigns status from Constant Contact
  4. Traffic growth recommendations
  5. Operational excellence gaps (identified from campaign performance and site metrics)

Key Architecture Decisions

Why Orchestrator Over Direct Reporting: Rather than writing results directly to the dashboard or emailing a report, we delegated the final analysis to an orchestrator service. This separation of concerns means the GA audit pipeline can run independently while the orchestrator applies business logic: cross-referencing traffic data with campaign schedules, identifying correlations between email blast timing and traffic spikes, and generating contextual recommendations. If we later want to run the same audit data through a different reporting system, only the orchestrator integration point changes.

Service Account vs. User OAuth: User-facing OAuth tokens for GA API access require refresh token management and re-authentication flows. For a scheduled audit that runs daily or on demand, a service account with a static key file (rotated quarterly) is more reliable. The trade-off is a one-time setup cost: someone must manually grant the service account access in the GA Admin console.

Async Task Notification: The audit runs as a background task. Rather than blocking the user's session, the orchestrator returns immediately with a task ID, and results land on the dashboard. Users receive a notification when the audit completes. This matters for audits that can take 30 seconds or more, such as those checking hundreds of pages across multiple CDN distributions.
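The return-a-task-ID-immediately pattern can be illustrated with a thread pool. This is a sketch of the general technique, not the orchestrator's actual implementation: the function names, the in-memory task table, and the ID format (chosen to resemble the card ID above) are all assumptions.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2)
_tasks: dict[str, Future] = {}  # in-memory task table, for illustration only


def submit_audit(run_audit, *args) -> str:
    """Start the audit in the background and return a task ID right away."""
    task_id = f"t-{uuid.uuid4().hex[:8]}"
    _tasks[task_id] = _executor.submit(run_audit, *args)
    return task_id


def task_result(task_id: str, timeout=None):
    """Block until the task finishes (or the timeout elapses) and return its result."""
    return _tasks[task_id].result(timeout=timeout)


def fake_audit(site: str) -> dict:
    """Stand-in for the real audit; returns a made-up summary."""
    return {"site": site, "pages_missing_ga": 0}


task_id = submit_audit(fake_audit, "sailjada.com")
summary = task_result(task_id, timeout=5)
```

The caller gets `task_id` back without waiting for the audit; a dashboard or notifier would poll or subscribe for completion instead of calling `task_result` synchronously as this demo does.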

What's Next

The audit uncovered three operational items requiring immediate attention:

  1. Mother's Day email blast approval: Campaign was scheduled for April 29 (4 days out) and is still in draft. A needs-you card is on the board.
  2. Paul Simon blast proof: Proof required by May 12. Currently in queue.