
Building an Automated GA4 Traffic Audit Pipeline with Orchestrated Reporting

What Was Done

We built an end-to-end Google Analytics 4 (GA4) audit system that programmatically inventories tracking code across all web properties, pulls 30-day traffic data via the GA4 Data API, validates campaign configurations, and generates a consolidated executive report delivered as a kanban card on our progress dashboard. The system identified three critical gaps: missing GA4 tracking on several pages, zero programmatic API access for automated reporting, and two time-sensitive email campaigns requiring immediate attention.

Technical Details: GA4 Code Audit

The audit phase scanned HTML files across all deployed sites to verify GA4 measurement ID presence. Rather than manually checking each property, we built a recursive file scanner that:

  • Traverses /Users/cb/Documents/repos/ directory structure across all site repositories
  • Extracts HTML files and searches for GA4 script tags: <script async src="https://www.googletagmanager.com/gtag/js?id=G-[MEASUREMENT_ID]">
  • Maps each measurement ID to its corresponding site and environment (staging/production)
  • Flags missing instrumentation by comparing deployed page inventory against GA property configuration
  • Generates a detailed gap report by site
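
The scanner steps above can be sketched roughly as follows. This is a minimal illustration, not the actual scanner: the function names are hypothetical, it assumes each immediate subdirectory of the root is one site repository, and it treats any page without a gtag.js loader as a gap. It does not attempt the staging/production mapping.

```python
import re
from pathlib import Path

# Matches the measurement ID inside a gtag.js loader URL,
# e.g. https://www.googletagmanager.com/gtag/js?id=G-ABCD1234EF
GTAG_ID_PATTERN = re.compile(r"googletagmanager\.com/gtag/js\?id=(G-[A-Z0-9]+)")


def find_measurement_ids(html: str) -> set[str]:
    """Return every GA4 measurement ID referenced by a gtag.js loader."""
    return set(GTAG_ID_PATTERN.findall(html))


def scan_repos(root: Path) -> dict[str, dict[str, set[str]]]:
    """Map site name -> {relative HTML path -> measurement IDs found}.

    Assumption: each immediate subdirectory of `root` is one site repo.
    """
    report: dict[str, dict[str, set[str]]] = {}
    for repo in sorted(p for p in root.iterdir() if p.is_dir()):
        pages: dict[str, set[str]] = {}
        for html_file in sorted(repo.rglob("*.html")):
            text = html_file.read_text(encoding="utf-8", errors="replace")
            pages[html_file.relative_to(repo).as_posix()] = find_measurement_ids(text)
        report[repo.name] = pages
    return report


def missing_pages(report: dict[str, dict[str, set[str]]]) -> dict[str, list[str]]:
    """Return site -> sorted list of pages that carry no GA4 tag at all."""
    gaps: dict[str, list[str]] = {}
    for site, pages in report.items():
        untagged = sorted(path for path, ids in pages.items() if not ids)
        if untagged:
            gaps[site] = untagged
    return gaps
```

Calling scan_repos(Path("/Users/cb/Documents/repos")) and passing the result to missing_pages would give a per-site list of untagged pages. A regex match is a cheap heuristic: it will miss tags injected by a tag manager or built at runtime.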

This approach catches the most common tracking failure mode: new pages deployed without corresponding GA4 initialization code. We discovered that dangerouscentaur.com and several secondary properties were missing instrumentation entirely, meaning traffic on those domains was invisible to our analytics.

GA4 Data API Access: The Service Account Pattern

To pull 30-day traffic data programmatically, we needed to establish OAuth 2.0 service account authentication to the GA4 Data API. The process involved:

  • Creating a service account in Google Cloud (project: analytics-automation)
  • Generating a service account key file in JSON format (stored securely in ~/.config/gcloud/)
  • Granting the service account Viewer role on each GA4 property in Google Analytics Admin
  • Testing token generation and API calls using the Python Google Analytics Data client library

The key insight: service accounts bypass the OAuth browser flow entirely, enabling automated, unattended API access. We installed the official client library with:

pip install --user google-analytics-data

Then validated token generation against the correct GA4 numeric property IDs (not the measurement IDs). This distinction tripped up the initial attempt—GA4 property IDs are numeric (e.g., 123456789), while measurement IDs are alphanumeric prefixed with G- (e.g., G-ABCD1234EF).
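
One way to guard against mixing up the two ID types is to validate the value before building the API resource name. The sketch below is illustrative only: the helper names are hypothetical, and the patterns simply encode the two shapes described above (all digits versus a G- prefix).

```python
import re

PROPERTY_ID_RE = re.compile(r"\d+")             # e.g. 123456789
MEASUREMENT_ID_RE = re.compile(r"G-[A-Z0-9]+")  # e.g. G-ABCD1234EF


def classify_ga4_id(value: str) -> str:
    """Return "property", "measurement", or "unknown" for a GA4 identifier."""
    value = value.strip()
    if PROPERTY_ID_RE.fullmatch(value):
        return "property"
    if MEASUREMENT_ID_RE.fullmatch(value):
        return "measurement"
    return "unknown"


def property_resource_name(property_id: str) -> str:
    """Build the "properties/<id>" string the Data API expects.

    Raises ValueError early if a measurement ID (or anything else) is passed.
    """
    kind = classify_ga4_id(property_id)
    if kind != "property":
        raise ValueError(
            f"expected a numeric GA4 property ID, got {kind} ID: {property_id!r}"
        )
    return f"properties/{property_id.strip()}"
```

Failing fast here gives a clearer error message than the API's rejection of a malformed property name.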

Multi-Property Data Aggregation

We pull traffic data from three main GA4 properties covering our portfolio:

  • sailjada.com (primary commerce property)
  • burialsatsea.events (event ticketing)
  • queenofsandiego.com (content/marketing)

The orchestrator queries the google.analytics.data_v1beta.BetaAnalyticsDataClient for each property using:

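from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest

# One request per property; [NUMERIC_ID] is the numeric property ID, not a G- measurement ID
RunReportRequest(
    property="properties/[NUMERIC_ID]",
    date_ranges=[DateRange(start_date="30daysAgo", end_date="today")],
    dimensions=[Dimension(name="pagePath"), Dimension(name="sessionDefaultChannelGroup")],
    metrics=[Metric(name="activeUsers"), Metric(name="sessions"), Metric(name="bounceRate")]
)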

Each request returns user engagement by page and acquisition channel for one property over the past 30 days. The orchestrator then merges the per-property responses into a single report.
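
A minimal sketch of the merge step, assuming the report responses expose dimension_headers, metric_headers, and rows the way the Data API's RunReportResponse does. The function names are hypothetical. The Data API returns metric values as strings, so they are converted before summing.

```python
from collections import defaultdict


def rows_to_dicts(response) -> list[dict[str, str]]:
    """Flatten a RunReportResponse-like object into one dict per row.

    Keys are the dimension and metric names from the response headers.
    """
    names = [h.name for h in response.dimension_headers]
    names += [h.name for h in response.metric_headers]
    flat = []
    for row in response.rows:
        values = [v.value for v in row.dimension_values]
        values += [v.value for v in row.metric_values]
        flat.append(dict(zip(names, values)))
    return flat


def sum_metric_by(
    rows_by_property: dict[str, list[dict[str, str]]],
    dimension: str,
    metric: str,
) -> dict[str, dict[str, int]]:
    """Sum an additive integer metric per property, grouped by one dimension."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for prop, rows in rows_by_property.items():
        for row in rows:
            totals[prop][row[dimension]] += int(row[metric])
    return {prop: dict(groups) for prop, groups in totals.items()}
```

Only additive metrics such as sessions should be summed this way. A ratio like bounceRate needs a session-weighted average, and summing activeUsers across pages counts a user once for every page they visited.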

Infrastructure: Orchestrator-to-Dashboard Pipeline

The results feed into our kanban-style dashboard system running at progress.queenofsandiego.com. The dashboard supports hash-based deep linking with the pattern:

https://progress.queenofsandiego.com/#card-{id}

The orchestrator creates a card with ID t-31aa2593 containing five report sections:

  • GA Code Audit: Site-by-site instrumentation status and identified gaps
  • Last 30 Days Traffic: Aggregated users, sessions, and engagement metrics by property
  • Channel Performance: Organic search vs. paid vs. direct vs. referral traffic breakdown
  • Email Campaign Status: Constant Contact campaign inventory and send status
  • Operational Excellence Recommendations: Specific, actionable next steps
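
The card structure and deep link could be modeled along these lines. This is a sketch only: the ReportCard class and its fields are hypothetical and say nothing about how the dashboard actually stores cards. Only the URL pattern and the five section titles come from the description above.

```python
from dataclasses import dataclass, field

DASHBOARD_URL = "https://progress.queenofsandiego.com/"

# The five report sections listed above, in display order.
REQUIRED_SECTIONS = (
    "GA Code Audit",
    "Last 30 Days Traffic",
    "Channel Performance",
    "Email Campaign Status",
    "Operational Excellence Recommendations",
)


@dataclass
class ReportCard:
    """Hypothetical in-memory model of one dashboard card."""

    card_id: str
    sections: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Section titles that have not been filled in yet."""
        return [name for name in REQUIRED_SECTIONS if name not in self.sections]

    def deep_link(self) -> str:
        """Hash-based deep link following the #card-{id} pattern."""
        return f"{DASHBOARD_URL}#card-{self.card_id}"
```

For example, ReportCard("t-31aa2593").deep_link() yields https://progress.queenofsandiego.com/#card-t-31aa2593, and missing_sections() lets the orchestrator refuse to publish a card until all five sections are present.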

This card-based approach prevents report fatigue—findings live in a versioned kanban system rather than disappearing into email.

Email Campaign Monitoring: Constant Contact Integration

The audit revealed two time-sensitive campaigns:

  • Mother's Day Blast (scheduled April 29, 4 days away): Currently unapproved. Contacts already deduped and ready. Marked as needs-you for immediate approval.
  • Paul Simon Concert Promotion (proof deadline May 12, 6 days away): Template ready but proof version needs sign-off before send.

We cross-reference the Constant Contact export CSV (located at /var/contacts/constant_contact_export.csv) against campaign logs stored in S3 at s3://campaign-logs/[campaign_id]/sent_log.json to identify which contacts have already received each campaign, preventing duplicate sends.
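
The cross-reference reduces to a set difference on normalized email addresses. The sketch below assumes an export column named "Email address" and a sent log shaped as a JSON array of objects with an "email" key. Both are assumptions, since neither file schema is described here, and the function names are hypothetical.

```python
import csv
import io
import json


def normalize(email: str) -> str:
    """Compare addresses case-insensitively and ignore stray whitespace."""
    return email.strip().lower()


def load_contacts(csv_text: str, email_column: str = "Email address") -> list[str]:
    """Read email addresses from export CSV text (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[email_column] for row in reader if row.get(email_column)]


def load_sent(json_text: str) -> set[str]:
    """Read a sent log assumed to look like [{"email": "..."}, ...]."""
    return {normalize(entry["email"]) for entry in json.loads(json_text)}


def pending_recipients(contacts: list[str], already_sent: set[str]) -> list[str]:
    """Contacts who have not received the campaign, deduplicated, in order."""
    seen = set(already_sent)
    pending = []
    for email in contacts:
        key = normalize(email)
        if key and key not in seen:
            seen.add(key)
            pending.append(key)
    return pending
```

In practice the CSV text would come from the export file and the JSON text from the campaign's sent_log.json object. Keeping the functions string-based makes the matching logic easy to test without touching the filesystem or S3.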

Search Console and CDN Validation

As part of operational excellence, we also verified Search Console ownership for all properties. For dangerouscentaur.com, which was missing GA tracking, we:

  • Identified its CloudFront distribution ID (used for cache invalidation)
  • Located the origin S3 bucket
  • Generated a GSC verification token and uploaded the HTML verification file to the bucket root
  • Confirmed ownership in Search Console and submitted the sitemap

This ensures Google can crawl and index the property while we add GA4 instrumentation.

Key Decisions and Trade-offs

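Service Accounts vs. User OAuth: We chose service accounts because they need no interactive consent. A user OAuth flow requires a browser login to grant initial consent, and its refresh token can be revoked or expire, which makes it fragile for a scheduled, unattended pipeline. Access tokens are short-lived (about an hour) in both cases, but with a service account the client library mints new ones from the key file automatically.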

Numeric Property IDs vs. Measurement IDs: The Data API requires numeric GA4 property IDs, not the G-XXXX measurement IDs. This is non-obvious from the docs and caused an initial failure. Neither ID can be derived from the other, so we now maintain a mapping file in /Users/cb/Documents/repos/tools/ga_property_map.py to look up one from the other.

Card-Based Reporting: Rather than email reports or static dashboards, we use kanban cards because they enable filtering, sorting, and deep-linking. Engineers can reference findings by card ID, and cards live in version control.

What's Next

  • Add GA4 tracking code to all flag