```html

Building an Integrated GA4 Audit Pipeline with Orchestrator-Driven Analytics and Email Campaign Management

Over the past development session, we built an analytics audit system that combines Google Analytics 4 data collection, multi-site tracking verification, and orchestrator-driven reporting, all feeding a kanban-style dashboard that surfaces what needs action. This post walks through the technical architecture, decision rationale, and operational patterns we implemented.

The Problem We Solved

Our multi-property analytics setup had fragmented visibility. We needed:

  • Confirmation that every page across all platforms had GA tracking code
  • Last 30 days of traffic data aggregated from all properties
  • Automated recommendations for traffic growth and operational improvements
  • Clear visibility into scheduled email campaigns and their approval status
  • A single source of truth dashboard that surfaces urgent blockers

Manual spot-checks weren't scaling. We needed programmatic verification, with the orchestrator turning the raw results into recommendations.

Technical Architecture: Three Integrated Systems

System 1: GA Code Audit Across All Sites

We built a recursive HTML scanner that checks every static file across our repo structure. The scanner:

  • Traverses all site directories: /Users/cb/Documents/repos/sites/sailjada, /Users/cb/Documents/repos/sites/burialsatsea, /Users/cb/Documents/repos/sites/salejada, and /Users/cb/Documents/repos/sites/dangerouscentaur
  • Parses HTML files for GA tracking patterns: gtag('config', 'GA_MEASUREMENT_ID') calls and Google Analytics script tags
  • Cross-references against known GA4 property IDs (obtained from GA Admin API)
  • Logs gaps by site and page path

Why this approach: parsing the deployed HTML catches misconfigurations that an API-only check can miss. A page might load fine yet carry an incorrect property ID or malformed tracking code. Reading the actual source shows exactly what each page ships.
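The scanner described above can be sketched roughly as follows. This is a minimal illustration, not the actual audit script: the function name audit_site, the regular expression, and the gap messages are assumptions, and it only detects IDs passed to a gtag('config', ...) call in static HTML.

```python
import re
from pathlib import Path

# Matches gtag('config', 'G-XXXXXXX') with either quote style and captures the ID.
CONFIG_RE = re.compile(r"""gtag\(\s*['"]config['"]\s*,\s*['"](G-[A-Z0-9]+)['"]""")


def audit_site(site_root, expected_id):
    """Return {relative_path: problem} for each .html file that fails the check."""
    root = Path(site_root)
    gaps = {}
    for html_file in sorted(root.rglob("*.html")):
        text = html_file.read_text(encoding="utf-8", errors="replace")
        found = set(CONFIG_RE.findall(text))
        rel = html_file.relative_to(root).as_posix()
        if not found:
            gaps[rel] = "no gtag config call found"
        elif expected_id not in found:
            gaps[rel] = f"unexpected ID(s): {', '.join(sorted(found))}"
    return gaps
```

Running audit_site once per site directory, with that site's expected measurement ID, would yield the per-site, per-page gap log. Tags injected at runtime (for example through a tag manager) would not be visible to a static scan like this one.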

System 2: GA4 Data API Integration with Service Account Auth

We established programmatic access to GA4 using service account credentials stored securely. The pattern:

#!/usr/bin/env python3
# /Users/cb/Documents/repos/tools/reauth_ga.py

from google.oauth2 import service_account
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import RunReportRequest

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account-key.json",
    scopes=SCOPES
)

client = BetaAnalyticsDataClient(credentials=credentials)

# Numeric GA4 property ID (placeholder value; replace with a real one)
PROPERTY_ID = "123456789"

# Query the last 30 days for that property
request = RunReportRequest(
    property=f"properties/{PROPERTY_ID}",
    date_ranges=[{"start_date": "30daysAgo", "end_date": "today"}],
    dimensions=[{"name": "pagePath"}],
    metrics=[{"name": "activeUsers"}, {"name": "screenPageViews"}]
)

response = client.run_report(request)
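Once the report returns, its rows still need to be turned into something the orchestrator can read. The sketch below shows one way to do that, assuming the single-dimension, two-metric request shown above; the helper name top_pages and the output shape are hypothetical. The Data API returns metric values as strings, so they are converted to integers here.

```python
def top_pages(rows, limit=10):
    """Convert report rows into dicts, sorted by page views (descending).

    Expects rows shaped like the request above: one dimension (pagePath)
    and two metrics in the order activeUsers, screenPageViews.
    """
    pages = []
    for row in rows:
        users, views = (int(metric.value) for metric in row.metric_values)
        pages.append(
            {
                "pagePath": row.dimension_values[0].value,
                "activeUsers": users,
                "screenPageViews": views,
            }
        )
    pages.sort(key=lambda page: page["screenPageViews"], reverse=True)
    return pages[:limit]
```

With the client from the script, this would be called as top_pages(response.rows).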

Key decision: We use service account authentication rather than user OAuth tokens. Service accounts are ideal for background jobs because:

  • No interactive consent or manual token refresh in long-running processes; the client library obtains short-lived access tokens from the key automatically
  • Credentials stored in secure location and rotated via infrastructure tooling
  • Access scoped to analytics.readonly — least privilege
  • Audit trail tied to service account identity, not individual user

We discovered that the Data API addresses GA4 properties by numeric ID (e.g., properties/123456789), not by display name. This required a preflight script to map display names to numeric IDs.
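A minimal sketch of that preflight mapping follows. It assumes the input is an iterable of account summaries as returned by the GA Admin API (for instance from AnalyticsAdminServiceClient.list_account_summaries() in the google-analytics-admin package), where each property summary exposes display_name and a property resource name. The function name build_property_index is hypothetical, and the actual preflight script may differ.

```python
def build_property_index(account_summaries):
    """Map each property's display name to its numeric ID (as a string)."""
    index = {}
    for account in account_summaries:
        for prop in account.property_summaries:
            # prop.property is a resource name such as "properties/123456789".
            numeric_id = prop.property.split("/", 1)[1]
            # If two properties share a display name, the later one wins.
            index[prop.display_name] = numeric_id
    return index
```

The resulting IDs slot directly into the properties/{PROPERTY_ID} string used in the report request.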

System 3: Orchestrator-Driven Report Generation

The orchestrator agent receives a structured brief containing:

  • GA traffic data (30-day snapshot per property)
  • Page-level tracking gaps identified by the HTML audit
  • Email campaign status from Constant Contact exports
  • Current dashboard card state

The orchestrator then synthesizes recommendations and outputs a kanban card. The card ID (for example, t-31aa2593) doubles as a URL fragment, enabling deep links:

https://progress.queenofsandiego.com/#card-t-31aa2593

This deep link pattern matters operationally: a team member can paste the link into Slack and the recipient lands directly on the card, with no searching through a card list.
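For tooling that needs to generate or read these links, the mapping between card IDs and URLs is small enough to sketch. The helper names below are hypothetical; the base URL and the card- prefix come from the example link above.

```python
from urllib.parse import urlsplit

DASHBOARD_URL = "https://progress.queenofsandiego.com/"
CARD_PREFIX = "card-"


def card_link(card_id):
    """Build the deep link for a card ID such as 't-31aa2593'."""
    return f"{DASHBOARD_URL}#{CARD_PREFIX}{card_id}"


def card_id_from_link(url):
    """Extract the card ID from a deep link, or return None if there is none."""
    fragment = urlsplit(url).fragment
    if fragment.startswith(CARD_PREFIX):
        return fragment[len(CARD_PREFIX):]
    return None
```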

Infrastructure and Tooling Decisions

Dashboard Deep Linking

The progress dashboard at progress.queenofsandiego.com supports hash-based routing. Card anchors are prefixed with card- and resolved by JavaScript event listeners. Because only the URL fragment changes, navigating between cards avoids full-page reloads and keeps the selected card in the URL bar.

CSV Export Pipeline for Campaign Management

Email campaigns are tracked through Constant Contact CSV exports. The blast script (/Users/cb/Documents/repos/tools/blast.py) contains dedup logic that:

  • Reads contact lists from CSV files stored in a known location
  • Cross-references them against a campaign log stored in S3 to prevent duplicate sends

The campaign logs live under s3://jada-orchestrator-state/campaign-logs/ as date-prefixed JSON, with contact email addresses marked as sent.

Why S3 for campaign state: Campaign logs must persist across ephemeral tool executions. S3 provides durable state without database overhead. Logs are immutable (append-only), enabling audit trails.
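The dedup step itself can be illustrated with a short sketch. This is not the code in blast.py: the function name, the "Email address" column header, and the idea of passing the already-sent addresses in as a plain collection (after loading them from the campaign log) are all assumptions made for the example.

```python
import csv
import io


def pending_contacts(csv_text, already_sent, email_column="Email address"):
    """Return emails from the CSV that have not been sent to yet.

    Comparison is case-insensitive, and duplicates inside the CSV itself
    are also dropped. Order follows the CSV.
    """
    seen = {email.strip().lower() for email in already_sent}
    pending = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        email = (row.get(email_column) or "").strip().lower()
        if email and email not in seen:
            seen.add(email)
            pending.append(email)
    return pending
```

Keeping the comparison on normalized addresses avoids a second send when the same contact appears with different capitalization in two exports.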

CloudFront + S3 for Static Asset Serving

During the session, we verified that dangerouscentaur.com is served through CloudFront. The distribution uses an S3 origin bucket. When we needed to verify domain ownership in Google Search Console, we uploaded an HTML verification file directly to that bucket, and the file immediately became available via the CloudFront edge network.

This pattern is useful for domain verification workflows: no SSH and no waiting for DNS propagation, just an S3 PUT plus a CloudFront cache invalidation.
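The upload-then-invalidate step could look something like the sketch below. It assumes the clients are boto3 clients created with boto3.client("s3") and boto3.client("cloudfront"); the function name and all argument values are hypothetical, and passing the clients in keeps the function easy to exercise without AWS access.

```python
import time


def publish_verification_file(s3, cloudfront, bucket, distribution_id, filename, body):
    """Upload a verification file to the origin bucket, then invalidate its path."""
    s3.put_object(
        Bucket=bucket,
        Key=filename,
        Body=body,
        ContentType="text/html",
    )
    # CallerReference must be unique per invalidation request.
    return cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": [f"/{filename}"]},
            "CallerReference": f"verify-{filename}-{int(time.time())}",
        },
    )
```

The invalidation matters mainly when the path was requested before the upload and an error response is still cached at the edge; for a path that has never been requested, the next request goes to the origin anyway.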

Operational Outcomes and Urgent Items

The audit surfaced three immediate blockers:

  • Mother's Day Email Blast: Scheduled for April 29, still unapproved as of the audit run. Event was 4 days out. A needs-you card was created linking to the approval workflow.
  • Paul Simon Blast Proof: Proof needed by May 12. The blast script was ready; we prepared the proof send command and queued it for approval.
  • GA Data API Access Gap: The service account lacked permission to list GA4 properties. Fix: grant the service account access to each property in the GA Admin console (a 3-minute manual step).

Key Technical Decisions Explained

1. HTML Parsing Over API-Only Verification

We could have relied on GA4's tag validation API, but