Building an Automated GA4 Audit Pipeline with Orchestrator Integration and Deep-Link Dashboard Navigation
What Was Done
We implemented a Google Analytics 4 audit system that programmatically scans all website repositories for tracking code coverage, pulls the last 30 days of traffic data across multiple GA4 properties, surfaces findings through an orchestrator-generated report, and deep-links each report into our kanban-style progress dashboard. This effort closed a critical visibility gap: we previously had no programmatic access to GA4 data and no systematic way to verify tracking implementation across our platform portfolio.
Technical Architecture
The system consists of three main components working in tandem:
- GA Code Audit Scanner — Walks the file tree across all site repositories, grepping for Google Analytics measurement IDs in HTML templates and checking for gtag initialization patterns
- GA4 Data API Client — Authenticates via service account credentials and pulls traffic metrics (sessions, users, pageviews) for the last 30 days across all GA4 properties
- Orchestrator Report Generator — Consumes audit and traffic data, generates structured findings, creates kanban cards on the progress dashboard, and identifies operational gaps
Service Account Setup and OAuth Flow
We followed Google's recommended pattern for headless/backend API access. Rather than using user OAuth tokens (which expire and require browser interaction), we created a service account in Google Cloud Console and granted it read-only access to our GA4 properties via the Analytics Admin API.
The pattern used in /Users/cb/Documents/repos/tools/reauth_ga.py establishes this flow:
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Numeric GA4 property ID (placeholder value; this is not the G-XXXXX measurement ID)
ga4_property_id = '123456789'

# Load service account JSON from secure storage
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/service-account-key.json',
    scopes=['https://www.googleapis.com/auth/analytics.readonly']
)

# Build GA4 Data API client
analyticsdata_v1beta = build(
    'analyticsdata',
    'v1beta',
    credentials=credentials
)

# Query last 30 days of traffic data
request = analyticsdata_v1beta.properties().runReport(
    property=f'properties/{ga4_property_id}',
    body={
        'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
        'metrics': [
            {'name': 'sessions'},
            {'name': 'totalUsers'},
            {'name': 'screenPageViews'}
        ]
    }
)

# Execute the request; the response is a plain dict
response = request.execute()
Why this approach: Service accounts remove the need for interactive re-authentication, so the system can run unattended; the client library refreshes the short-lived access tokens from the key automatically. We requested only the analytics.readonly scope, following the principle of least privilege. No dashboard code or report generator has write access to GA4 configuration.
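To turn the response from the query above into numbers the report generator can use, the metric headers and values have to be paired up. The following is a minimal sketch, not the pipeline's actual code: the function name `summarize_report` and the sample payload are illustrative assumptions, and the sample values are made up. It relies only on the documented response shape, where `metricHeaders` lists metric names and each row carries string-valued `metricValues` in the same order.

```python
from typing import Dict


def summarize_report(response: dict) -> Dict[str, int]:
    """Pair metric names with values and total them across all rows."""
    names = [header["name"] for header in response.get("metricHeaders", [])]
    totals = {name: 0 for name in names}
    for row in response.get("rows", []):
        for name, cell in zip(names, row.get("metricValues", [])):
            # The Data API returns metric values as strings.
            totals[name] += int(cell["value"])
    return totals


# Hand-written sample in the shape of a runReport response (not real data).
sample = {
    "metricHeaders": [
        {"name": "sessions", "type": "TYPE_INTEGER"},
        {"name": "totalUsers", "type": "TYPE_INTEGER"},
        {"name": "screenPageViews", "type": "TYPE_INTEGER"},
    ],
    "rows": [
        {"metricValues": [{"value": "120"}, {"value": "95"}, {"value": "340"}]}
    ],
}

print(summarize_report(sample))
```

Because the query requests no dimensions, a single aggregated row is expected. If dimensions were added later, summing `totalUsers` across rows would overcount users who appear in more than one row, so that metric would need separate handling.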
GA Property Mapping and Multi-Site Coverage
We identified five distinct GA4 properties across our site portfolio:
- sailjada.com (primary booking/operational site)
- queenofsandiego.com (tour/experience site)
- dangerouscentaur.com (newly onboarded, required verification)
- burialsatsea.com (legacy property)
- salejada.com (secondary sales funnel)
Each property ID was mapped in our tools configuration. The audit scanner checked each site's codebase for the corresponding measurement ID pattern. For example, in sailjada.com templates, we verified presence of:
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXX"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-XXXXX');
</script>
The audit identified pages missing this code and flagged them as high-priority fixes.
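The per-file check can be sketched roughly as follows. This is an illustration under stated assumptions rather than the scanner itself: `audit_repo` and `GTAG_CONFIG` are hypothetical names, and the sketch assumes each HTML file carries its own `gtag('config', ...)` call. Sites that inject the snippet through a shared layout or include would need the check applied to the rendered output instead.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches gtag('config', 'G-XXXXX') with either quote style and captures the ID.
GTAG_CONFIG = re.compile(
    r"gtag\(\s*['\"]config['\"]\s*,\s*['\"](G-[A-Z0-9]+)['\"]"
)


def audit_repo(root: Path, expected_id: str) -> Dict[str, List[str]]:
    """Classify every HTML file under root by whether it configures expected_id."""
    result: Dict[str, List[str]] = {"tracked": [], "missing": []}
    for path in sorted(root.rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        configured_ids = set(GTAG_CONFIG.findall(text))
        key = "tracked" if expected_id in configured_ids else "missing"
        # Store paths relative to the repo root, with forward slashes.
        result[key].append(path.relative_to(root).as_posix())
    return result
```

A call such as `audit_repo(Path("path/to/site-repo"), "G-XXXXX")` returns the two lists, and the `missing` list corresponds to the pages flagged for fixes.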
Dashboard Deep-Linking Implementation
The progress dashboard at https://progress.queenofsandiego.com uses hash-based routing (client-side navigation without page reloads). When the orchestrator generates a report card, it outputs a deep-link URL following this format:
https://progress.queenofsandiego.com/#card-{card-id}
For example, the GA audit report card is accessible at:
https://progress.queenofsandiego.com/#card-t-31aa2593
This was critical for handoff clarity. Rather than dumping findings in console logs or email, the system generates a structured kanban card with five distinct sections (code gaps by site, traffic trends, campaign status, recommendations, and next steps), and team members can jump directly to it via a deep link. The dashboard JavaScript file (in /Users/cb/Documents/repos/dashboard/) listens for hash changes and loads the appropriate card data from our backend API.
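On the orchestrator side, building and parsing these links is a small amount of string handling. The sketch below assumes the orchestrator is written in Python like the other tools; the helper names `card_deep_link` and `card_id_from_url` are hypothetical, and only the URL format comes from the description above.

```python
from typing import Optional
from urllib.parse import urlsplit

DASHBOARD_URL = "https://progress.queenofsandiego.com/"
CARD_PREFIX = "card-"


def card_deep_link(card_id: str) -> str:
    """Build the hash-based deep link for a kanban card."""
    return f"{DASHBOARD_URL}#{CARD_PREFIX}{card_id}"


def card_id_from_url(url: str) -> Optional[str]:
    """Extract the card ID from a deep link, or None if the hash is not a card."""
    fragment = urlsplit(url).fragment
    if fragment.startswith(CARD_PREFIX) and len(fragment) > len(CARD_PREFIX):
        return fragment[len(CARD_PREFIX):]
    return None


print(card_deep_link("t-31aa2593"))
```

The parsing helper mirrors what the dashboard's hash-change listener has to do in the browser: read the fragment, strip the `card-` prefix, and use the remainder to look up the card.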
Preflight Checks and Dependency Verification
Before running the audit, we built /Users/cb/Documents/repos/tools/preflight_check.py to verify:
- Google Analytics client libraries are installed (google-analytics-data, google-auth-service-account)
- Service account JSON is readable and valid
- All five GA4 property IDs are reachable via the Analytics Admin API
- Repository file trees are accessible
- Orchestrator process is running
This prevented wasted time diagnosing failures mid-audit. The preflight check runs synchronously before spawning the orchestrator agent.
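Two of these checks need no network access and can be sketched with the standard library alone. This is an assumed shape, not the contents of preflight_check.py: the function names and the list of required key fields are illustrative choices. The remaining checks (property reachability, repository access, orchestrator liveness) depend on the environment and are left out.

```python
import importlib.util
import json
from pathlib import Path
from typing import List

# Fields a service account key file is expected to contain (assumed minimum).
REQUIRED_KEY_FIELDS = ("type", "client_email", "private_key")


def module_available(name: str) -> bool:
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name is missing.
        return False


def check_key_file(path: Path) -> List[str]:
    """Return a list of problems with the key file; an empty list means it passed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except OSError as exc:
        return [f"cannot read key file: {exc}"]
    except ValueError as exc:
        # Covers invalid JSON and undecodable bytes.
        return [f"key file is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["key file is not a JSON object"]
    problems = [f"missing field: {field}" for field in REQUIRED_KEY_FIELDS if field not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

Returning a list of problems rather than raising on the first failure lets the preflight report every issue in one pass before the orchestrator is spawned.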
Operational Excellence Gaps Identified
The audit surfaced three immediate action items:
- Mother's Day Email Blast Unapproved: Scheduled for April 29, with only 4 days until the event. The blast script at /Users/cb/Documents/repos/tools/blast_scheduler.py had generated the campaign, but it was blocked in a "needs-you" state pending approval. We coordinated approval and deduplication logic against the Constant Contact export CSV.
- Paul Simon Blast Proof Pending: Proof needed by May 12. The email template is located at /Users/cb/Documents/repos/campaigns/paul_simon_blast/template.html.
- GA Data API Access Not Granted: The service account had not been granted access to the properties in the GA4 Admin console. This was a 3-minute fix: adding the service account email to each GA4 property's user access list with the "Viewer" role resolved it.
GBP and Search Console Verification
We also executed verification workflows for newly claimed Google Business Profile and Search Console access for dangerouscentaur.com. The S3 origin bucket for dangerouscentaur's CloudFront distribution was identified, and an HTML verification file was uploaded to the root of the bucket. Search Console acceptance followed within minutes, allowing us to submit the sitemap and unlock keyword/query reporting.
Infrastructure and File Paths
- Service Account Key Storage: Kept in secure key management (not in repos)
- Orchestrator Output: Kanban cards written to backend API at https://api.queenofsandiego.com/