Building a Real-Time Analytics Audit Pipeline with Orchestrator Integration and Kanban-Based Reporting
Over the past development session, we built out a comprehensive Google Analytics audit system that automatically scans all deployed sites for tracking code coverage, pulls historical traffic data, and surfaces actionable insights through an orchestrator-driven kanban board. This post covers the technical architecture, infrastructure decisions, and the automation patterns we used to make analytics visibility frictionless.
The Problem We Solved
Without programmatic GA access, and with audits done by hand, we had visibility gaps across multiple platforms. Campaign timing decisions were being made without current traffic data, and we couldn't verify that every page had proper instrumentation. The solution needed to be:
- Automated—run without manual intervention
- Comprehensive—audit all HTML across all repos simultaneously
- Accessible—surface findings where the team already works (the dashboard)
- Actionable—include specific remediation steps
Architecture: The Three-Layer Audit System
Layer 1: Static Code Audit
We scan the HTML files across all deployed sites looking for Google Analytics measurement IDs. The audit crawls known site roots:
- /Users/cb/Documents/repos/[site-name]/public/ for static HTML
- /Users/cb/Documents/repos/[site-name]/dist/ for built artifacts
- CloudFront origins to catch any deployed pages missed in source
The scanner looks for the GA4 measurement pattern (typically G-XXXXXXXXXX) in script tags and data-layer configurations. We log coverage by page path, flagging any pages without instrumentation for immediate remediation. This is important because partial instrumentation creates blind spots—a single uninstrumented page path can skew traffic reports and break conversion funnels.
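A minimal sketch of the scan described above, assuming a plain regex over each HTML file is enough. The function names and the 6-to-12 character length bound on the ID are illustrative assumptions, not the audit's actual code.

```python
import re
from pathlib import Path

# GA4 measurement IDs look like "G-" followed by uppercase letters and digits.
# The 6-12 length bound is an assumption chosen to avoid matching short tokens.
GA4_ID = re.compile(r"\bG-[A-Z0-9]{6,12}\b")


def find_measurement_ids(html: str) -> set[str]:
    """Return every GA4-style measurement ID found in an HTML string."""
    return set(GA4_ID.findall(html))


def audit_site(root: Path) -> dict[str, list[str]]:
    """Map each HTML file under root (relative path) to the sorted IDs it contains.

    An empty list marks a page with no instrumentation.
    """
    coverage = {}
    for page in sorted(root.rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="ignore")
        coverage[page.relative_to(root).as_posix()] = sorted(find_measurement_ids(text))
    return coverage


def missing_pages(coverage: dict[str, list[str]]) -> list[str]:
    """List the page paths that carry no measurement ID."""
    return [path for path, ids in coverage.items() if not ids]
```

Calling audit_site on a site's public/ or dist/ directory and passing the result to missing_pages yields the list of pages to flag for remediation.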
Layer 2: Traffic Data Pipeline
Once we identified the GA Data API access gap, we created a handoff card documenting the 3-minute fix: granting the orchestrator's service account Editor role in the GA Admin console (specific steps saved in the card). This unblocks programmatic queries for the last 30 days of traffic across all properties.
The query pattern requests:
- Sessions and users by page path
- Conversion events by campaign source
- Bounce rate and average session duration
- Traffic trend (daily granularity for the 30-day window)
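As a rough illustration, the four queries above could be expressed as request bodies for the GA4 Data API runReport method (POST https://analyticsdata.googleapis.com/v1beta/properties/{propertyId}:runReport). The grouping into four requests, the dictionary keys, and the choice of sessionSource for "campaign source" are assumptions; the dimension and metric names are standard GA4 API names, though newer properties may prefer keyEvents over conversions.

```python
def build_report_requests() -> dict[str, dict]:
    """Build one runReport request body per query in the list above."""
    # Relative dates are accepted by the Data API; this covers the 30-day window.
    last_30_days = [{"startDate": "30daysAgo", "endDate": "today"}]

    def body(dimensions: list[str], metrics: list[str]) -> dict:
        return {
            "dateRanges": last_30_days,
            "dimensions": [{"name": d} for d in dimensions],
            "metrics": [{"name": m} for m in metrics],
        }

    return {
        # Sessions and users by page path
        "pages": body(["pagePath"], ["sessions", "totalUsers"]),
        # Conversion events by campaign source (dimension choice is an assumption)
        "conversions": body(["sessionSource"], ["conversions"]),
        # Bounce rate and average session duration
        "engagement": body(["pagePath"], ["bounceRate", "averageSessionDuration"]),
        # Daily traffic trend
        "daily_trend": body(["date"], ["sessions"]),
    }
```

Each body would be sent with the service account's credentials once the role assignment is in place.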
This data flows into an intermediate JSON cache at s3://[audit-bucket]/ga-reports/last-30d.json with a 6-hour TTL, preventing API quota exhaustion while keeping data fresh enough for operational decisions.
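S3 has no built-in TTL for reads, so one way to enforce the 6-hour window is to compare the object's LastModified timestamp against the current time before deciding whether to call the GA API again. The sketch below assumes that approach; the function names are hypothetical, and s3_client is expected to behave like boto3.client("s3").

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CACHE_TTL = timedelta(hours=6)


def is_fresh(last_modified: datetime, now: datetime, ttl: timedelta = CACHE_TTL) -> bool:
    """True while the cached object is younger than the TTL."""
    return now - last_modified < ttl


def cache_is_fresh(s3_client, bucket: str, key: str, now: Optional[datetime] = None) -> bool:
    """Check the cached report's age via its S3 LastModified timestamp.

    A missing object surfaces as an exception from head_object, which
    callers should treat as a stale cache.
    """
    now = now or datetime.now(timezone.utc)
    head = s3_client.head_object(Bucket=bucket, Key=key)
    return is_fresh(head["LastModified"], now)
```

When the check returns False, the pipeline would re-run the queries and overwrite the cached JSON.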
Layer 3: Orchestrator Analysis and Reporting
The orchestrator consumes both audit outputs (code coverage gaps) and traffic data (usage patterns) to generate five structured report sections delivered as a kanban card on the progress dashboard:
- GA Code Coverage Report: which pages are instrumented, which are missing tags, remediation priority
- Traffic Insights: top pages, traffic trends, seasonal patterns
- Conversion Funnel Analysis: where users drop off, which campaigns convert
- Operational Excellence Gaps: pages with high bounce rates, short session durations, missing error tracking
- Email Campaign Status: scheduled blasts from Constant Contact, delivery metrics, engagement data
Infrastructure and Deployment Details
Dashboard Integration
The progress dashboard at https://progress.queenofsandiego.com uses hash-based deep linking to route to specific kanban cards. The format is:
https://progress.queenofsandiego.com/#card-{card-id}
The audit card (t-31aa2593) was rendered with this structure and deployed to the live dashboard. The dashboard's JavaScript router in /public/js/router.js intercepts hash changes and renders the appropriate card component with its five report sections expanded.
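The real router is browser JavaScript, but the link format itself is easy to illustrate. The sketch below, written in Python for consistency with the other examples here, builds a deep link and extracts the card ID from the URL fragment; the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

DASHBOARD = "https://progress.queenofsandiego.com"
CARD_PREFIX = "card-"


def card_link(card_id: str) -> str:
    """Build a shareable deep link for a kanban card."""
    return f"{DASHBOARD}/#{CARD_PREFIX}{card_id}"


def card_id_from_url(url: str) -> Optional[str]:
    """Return the card ID encoded in the URL fragment, or None if absent."""
    fragment = urlsplit(url).fragment
    if fragment.startswith(CARD_PREFIX) and len(fragment) > len(CARD_PREFIX):
        return fragment[len(CARD_PREFIX):]
    return None
```

For the audit card, card_link("t-31aa2593") produces https://progress.queenofsandiego.com/#card-t-31aa2593, and card_id_from_url recovers t-31aa2593 from it.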
Data Storage and Caching
Report data flows to S3 with this key structure:
- GA code audit: s3://[audit-bucket]/audits/[timestamp]/coverage-report.json
- Traffic cache: s3://[audit-bucket]/ga-reports/last-30d.json
- Campaign logs: s3://[blast-bucket]/campaign-logs/mother-day-emergency.log
We use CloudFront distribution ID [dist-id] to cache the final HTML at edge locations, with cache invalidation triggered after each audit completion to ensure stakeholders see fresh data within 30 seconds of the report finishing.
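A hedged sketch of the post-audit invalidation step, assuming it goes through boto3's CloudFront create_invalidation call. The helper names and the "audit-" caller-reference scheme are illustrative assumptions; cloudfront_client is expected to behave like boto3.client("cloudfront").

```python
import time


def invalidation_batch(paths: list[str], caller_reference: str) -> dict:
    """Build the InvalidationBatch structure that CloudFront expects."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request so retries are not duplicated.
        "CallerReference": caller_reference,
    }


def invalidate(cloudfront_client, distribution_id: str, paths: list[str]) -> dict:
    """Request invalidation of the given paths after an audit completes."""
    batch = invalidation_batch(paths, caller_reference=f"audit-{int(time.time())}")
    return cloudfront_client.create_invalidation(
        DistributionId=distribution_id, InvalidationBatch=batch
    )
```

Invalidating only the paths the audit rewrote, rather than "/*", keeps the request small.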
Key Technical Decisions and Why
Async Orchestrator Delegation Instead of Synchronous Querying
GA API calls can time out on large datasets. By spawning the orchestrator as an async background agent, we unblock the user immediately while comprehensive scans complete. Results land on the dashboard as cards, which makes them visible to the whole team and keeps them from getting lost in console output.
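The delegation pattern can be illustrated with asyncio. This is a toy sketch, not the orchestrator's actual mechanism: run_audit stands in for the long-running scan, and the job ID and results dictionary are hypothetical.

```python
import asyncio


async def run_audit(results: dict, job_id: str) -> None:
    """Stand-in for the long-running scan; the sleep represents slow API calls."""
    await asyncio.sleep(0.01)
    results[job_id] = "done"


async def main() -> tuple[str, str]:
    results: dict[str, str] = {}
    job_id = "audit-1"  # hypothetical job identifier
    # Schedule the audit in the background; this call does not block.
    task = asyncio.create_task(run_audit(results, job_id))
    # Control returns here immediately, before the audit has finished.
    status_at_submit = results.get(job_id, "pending")
    await task  # a real caller would check the dashboard instead of awaiting
    return status_at_submit, results[job_id]
```

Running asyncio.run(main()) returns ("pending", "done"): the caller regains control before the audit finishes, and the result appears later.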
Hash-Based Deep Linking for Card Navigation
We chose hash routing over query params because it's client-side only—no server round-trip needed to jump to a specific card. The format #card-{id} is URL-friendly, shareable, and persists in browser history. The dashboard JavaScript uses the hashchange event to render the appropriate card instantly.
Dual-Layer Audit (Static + API-Driven)
Static code scanning catches configuration issues immediately; API data validates real-world behavior. Together they answer two questions: "Is the code there?" and "Is it working?" This prevents false positives in the static audit, where tags are present in the source but not firing.
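One way to combine the two layers is to cross-reference the static coverage result with per-page session counts. The sketch below assumes both inputs are keyed by the same normalized page path; the status labels and function name are hypothetical.

```python
def classify_pages(tagged: dict[str, bool], sessions: dict[str, int]) -> dict[str, str]:
    """Label each page by combining static tag presence with observed traffic."""
    status = {}
    for path in sorted(set(tagged) | set(sessions)):
        has_tag = tagged.get(path, False)
        has_traffic = sessions.get(path, 0) > 0
        if has_tag and has_traffic:
            status[path] = "ok"
        elif has_tag:
            # Tag is in the source but GA reports nothing: possibly not firing.
            status[path] = "tagged_but_silent"
        elif has_traffic:
            # GA sees the page but the scan found no tag: source may be out of date.
            status[path] = "traffic_without_tag_in_source"
        else:
            status[path] = "untagged"
    return status
```

A tagged_but_silent page may simply have had no visitors in the window, so that label is a prompt to investigate rather than proof of a broken tag.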
Campaign Log Deduplication via S3 Keys
When the Mother's Day blast was prepared, the script checked the S3 campaign log path to see which contacts had already been sent to, preventing duplicate sends. The logic reads the existing log from s3://[blast-bucket]/campaign-logs/[campaign-name].log, parses the contact IDs, and filters them from the incoming CSV before batching to Constant Contact.
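The filtering step might look like the following sketch. It assumes the log holds one contact ID per line and that the incoming CSV has a contact_id column; both are assumptions, since the post does not specify either format.

```python
import csv
import io


def already_sent(log_text: str) -> set[str]:
    """Parse contact IDs from the campaign log (assumed: one ID per line)."""
    return {line.strip() for line in log_text.splitlines() if line.strip()}


def unsent_contacts(csv_text: str, log_text: str, id_column: str = "contact_id") -> list[dict]:
    """Return the CSV rows whose contact ID does not appear in the log."""
    sent = already_sent(log_text)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[id_column] not in sent]
```

The rows that remain are the ones to batch for sending; appending their IDs to the log after a successful send keeps the next run idempotent.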
What's Next
Three immediate follow-ups are documented on the dashboard:
- Grant GA API access: complete the service account role assignment in GA Admin (blocking programmatic traffic pulls)
- Approve Mother's Day blast: event is 4 days out; the prepared campaign is queued at s3://[blast-bucket]/campaigns/mother-day-emergency.json
- Paul Simon proof delivery: due May 12; proof template is staged, ready for final subject line and send authorization
Beyond these operational items, we're expanding the audit to include: custom event tracking coverage, GA4 data retention settings verification, and cross-domain tracking configuration validation. The infrastructure is in place to add these as new orchestrator report sections.
The key lesson: making analytics and operational data accessible on the team's existing kanban board (rather than in separate dashboards or reports) dramatically improves decision velocity. Engineers and campaign managers can now pull current insights without context-switching.