Building a Granular Technical Blog System: Auto-Generated Session Capture Across Four Domain Properties
What Was Done
Implemented a comprehensive technical blog generation system that automatically captures, processes, and publishes detailed development work across four independent domain properties: tech.queenofsandiego.com, tech.sailjada.com, tech.dangerouscentaur.com, and tech.burialsatseasandiego.com. The system triggers at the end of each Claude Code development session, extracts granular session data (file modifications, commands executed, reasoning), filters out sensitive credentials, and publishes a formatted blog post to the appropriate tech blog property.
Technical Architecture
The system consists of three core components working in concert:
- Stop Hook Script (/Users/cb/.claude/hooks/tech_blog_stop.sh): Bash script invoked by Claude Code when a session ends. Extracts the session transcript path from the environment, validates it exists, and triggers the Python blog generator with the appropriate domain configuration.
- Blog Generator (/Users/cb/Documents/repos/tools/tech_blog_generator.py): Python application that reads session transcripts in JSONL format, parses file modifications and command history, applies credential filtering patterns, generates formatted HTML content, and uploads to the appropriate S3 bucket with CloudFront invalidation.
- Infrastructure Initializer (/Users/cb/Documents/repos/tools/tech_blog_init.py): One-time setup script that provisions S3 buckets, CloudFront distributions, ACM certificates, and DNS records for each tech blog subdomain.
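The handoff from the stop hook to the generator can be sketched as a small command-line entry point. This is a minimal illustration, not the actual tech_blog_generator.py: the --transcript and --domain flag names and the resolve_job helper are assumptions made for the example. Only the four domain names come from the description above.

```python
import argparse
from pathlib import Path

# The four tech blog properties named in this post.
TECH_DOMAINS = (
    "tech.queenofsandiego.com",
    "tech.sailjada.com",
    "tech.dangerouscentaur.com",
    "tech.burialsatseasandiego.com",
)


def resolve_job(argv):
    """Parse hypothetical CLI flags and validate the transcript path.

    Returns (transcript_path, domain). Raises FileNotFoundError when the
    transcript is missing, mirroring the hook's "validate it exists" step.
    """
    parser = argparse.ArgumentParser(description="Tech blog generator (sketch)")
    parser.add_argument("--transcript", required=True, type=Path)
    parser.add_argument("--domain", required=True, choices=TECH_DOMAINS)
    args = parser.parse_args(argv)

    if not args.transcript.is_file():
        raise FileNotFoundError(f"transcript not found: {args.transcript}")
    return args.transcript, args.domain
```

Restricting --domain to a fixed set of choices means a typo in the hook configuration fails immediately instead of publishing to the wrong property.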
Infrastructure Provisioning Details
Each of the four tech blog properties required identical infrastructure patterns:
- S3 Buckets: Created with names matching pattern tech-[domain]-blog (e.g., tech-queenofsandiego-blog, tech-sailjada-blog). Buckets configured as static website hosts with index.html as the default document. Block public access settings disabled to allow CloudFront access; IAM policies restrict direct S3 access to CloudFront origin identity only.
- CloudFront Distributions: Provisioned for each bucket with origin access identity (OAI) to prevent public S3 bucket access. HTTP-to-HTTPS redirect enforced. Cache behavior configured with 300-second TTL for HTML files and 3600-second TTL for static assets.
- ACM Certificates: Leveraged existing wildcard certificates where available and provisioned new certificates otherwise:
  - *.queenofsandiego.com wildcard certificate (already present in us-east-1)
  - *.sailjada.com wildcard certificate (already present in us-east-1)
  - tech.dangerouscentaur.com certificate provisioned and validated via email
  - tech.burialsatseasandiego.com certificate provisioned with DNS validation via GoDaddy API
- DNS Configuration:
  - queenofsandiego.com & sailjada.com: Route53 hosted zones (IDs: Z1234... redacted) updated with CNAME records pointing subdomains to CloudFront distribution domain names
  - dangerouscentaur.com: Namecheap DNS updated with CNAME tech.dangerouscentaur.com → [cloudfront-domain-name]
  - burialsatseasandiego.com: GoDaddy DNS updated with CNAME via GoDaddy API integration; certificate validation performed through DNS CNAME method rather than email
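Because the four properties share one infrastructure pattern, the bucket name and cache lifetime can be derived rather than hard-coded. The sketch below is illustrative only: the helper names bucket_name_for and cache_control_for are hypothetical, and expressing the TTLs as Cache-Control max-age values is an assumption (the post states the TTLs, not how they are applied). Only the two queenofsandiego and sailjada bucket names are confirmed by the examples above.

```python
from pathlib import PurePosixPath

HTML_TTL_SECONDS = 300     # TTL stated for HTML files
ASSET_TTL_SECONDS = 3600   # TTL stated for static assets


def bucket_name_for(parent_domain):
    """Map e.g. 'queenofsandiego.com' to 'tech-queenofsandiego-blog'."""
    label = parent_domain.split(".")[0]
    return f"tech-{label}-blog"


def cache_control_for(key):
    """Pick a max-age by file extension (assumed mechanism)."""
    is_html = PurePosixPath(key).suffix.lower() in {".html", ".htm"}
    ttl = HTML_TTL_SECONDS if is_html else ASSET_TTL_SECONDS
    return f"max-age={ttl}"
```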
Session Transcript Parsing
Claude Code session transcripts are stored in JSONL format with entries containing type, content, tool_use_id, and other metadata. The blog generator parses this format to extract:
- File Operations: Lines prefixed with "Write:" or "Edit:" followed by absolute file paths
- Commands Executed: Lines prefixed with "- " containing shell commands, AWS CLI calls, or tool invocations
- Agent Reasoning: Multi-line blocks beginning with "Agent notes / reasoning:" that explain decision-making
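A minimal sketch of this extraction step follows. It assumes each JSONL line is a JSON object whose content field is a plain string; entries with any other shape are skipped. The function name parse_transcript is hypothetical, and reasoning-block extraction is left out because the post does not say how a block ends.

```python
import json

FILE_OP_PREFIXES = ("Write:", "Edit:")


def parse_transcript(jsonl_text):
    """Collect file operations and commands from JSONL transcript text."""
    file_ops, commands = [], []
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the whole post
        content = entry.get("content") if isinstance(entry, dict) else None
        if not isinstance(content, str):
            continue  # assumption: only string content is inspected
        for line in content.splitlines():
            line = line.strip()
            if line.startswith(FILE_OP_PREFIXES):
                # Split on the first colon only, so paths containing ':' survive.
                op, _, path = line.partition(":")
                file_ops.append((op, path.strip()))
            elif line.startswith("- "):
                commands.append(line[2:].strip())
    return file_ops, commands
```

Skipping unparseable lines instead of raising keeps one corrupt entry from blocking the whole post, at the cost of silently dropping that entry.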
Content filtering removes patterns matching:
- AWS credentials and access keys (pattern: AKIA[0-9A-Z]{16})
- GoDaddy API keys and secrets (pattern: sso_key=.*, api_key=.*)
- Slack tokens and webhook URLs
- Database connection strings containing passwords
- Email addresses and phone numbers from certain contexts
- Environment variable values containing sensitive data
The generator preserves legitimate technical context (S3 bucket names, CloudFront distribution IDs, Route53 zone IDs, file paths) while redacting only the sensitive components.
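The redact-only-the-secret behavior can be illustrated with two of the patterns listed above. This is a sketch under assumptions, not the generator's actual filter: the [REDACTED] placeholder and the redact helper are hypothetical, and the key/value pattern is deliberately narrowed from .* to \S+ so that the rest of the line (bucket names, zone IDs) survives.

```python
import re

REDACTED = "[REDACTED]"

# AWS access key ID pattern quoted in the post.
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")
# Narrowed variant of sso_key=.* / api_key=.*: stops at the first whitespace.
KEY_VALUE_RE = re.compile(r"\b(sso_key|api_key)=\S+")


def redact(text):
    """Replace credential values while leaving surrounding context intact."""
    text = AWS_KEY_RE.sub(REDACTED, text)
    # Keep the parameter name so readers can still see which setting was used.
    return KEY_VALUE_RE.sub(lambda m: f"{m.group(1)}={REDACTED}", text)
```

Applied to a line such as "api_key=abc123 zone=Z1", only the value after api_key= is replaced and the trailing zone=Z1 is kept.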
HTML Generation and Publishing
The generator creates semantic HTML with:
- Metadata Section: Publication timestamp, session duration, session ID for reference
- File Modifications Table: Two-column format showing operation type (Write/Edit) and file path, organized by directory for readability
- Commands Section: Grouped by category (AWS, Git, Python, etc.) with syntax highlighting markers for code blocks
- Architecture/Reasoning Section: Prose explanation extracted from agent reasoning blocks, maintaining technical depth
- Navigation Integration: Each blog post includes breadcrumb navigation back to parent domain and link to tech blog index
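As an illustration of the File Modifications Table, the sketch below renders (operation, path) pairs as a two-column table sorted by directory. The function name render_file_table and the exact markup are assumptions; the post specifies only the two columns and the by-directory ordering.

```python
from html import escape
from posixpath import dirname


def render_file_table(file_ops):
    """Render (operation, path) pairs as a two-column table, sorted by directory."""
    rows = sorted(file_ops, key=lambda op: (dirname(op[1]), op[1]))
    parts = ["<table>", "<tr><th>Operation</th><th>File</th></tr>"]
    for op, path in rows:
        # Escape both cells: paths come from transcripts and may contain markup.
        parts.append(
            f"<tr><td>{escape(op)}</td><td><code>{escape(path)}</code></td></tr>"
        )
    parts.append("</table>")
    return "\n".join(parts)
```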
Generated HTML is uploaded to the appropriate S3 bucket's posts/ directory with filename pattern YYYY-MM-DD-HH-MM-SS-session-summary.html. An index.html is maintained in the S3 bucket root, listing all posts with chronological ordering and search capability (client-side JavaScript filtering).
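The filename pattern has a useful property: because every field is zero-padded and ordered from year down to second, sorting the keys as strings also sorts them by time. A minimal sketch, assuming hypothetical helpers post_key and newest_first and assuming the index lists the newest post first (the post says only "chronological"):

```python
from datetime import datetime

POSTS_PREFIX = "posts/"


def post_key(finished_at):
    """Build the S3 key using the YYYY-MM-DD-HH-MM-SS-session-summary.html pattern."""
    return f"{POSTS_PREFIX}{finished_at:%Y-%m-%d-%H-%M-%S}-session-summary.html"


def newest_first(keys):
    """Order post keys for the index; the timestamp prefix sorts lexicographically."""
    return sorted(keys, reverse=True)
```

For example, post_key(datetime(2024, 1, 2, 3, 4, 5)) yields posts/2024-01-02-03-04-05-session-summary.html.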
Integration with Site Navigation
Each primary domain (queenofsandiego.com, sailjada.com, dangerouscentaur.com, burialsatseasandiego.com) has its "Ship's Papers" or main navigation menu updated to include a "Technical Blog" link pointing to the appropriate tech subdomain. This allows stakeholders like Sergio to access detailed technical documentation without leaving the main site context.
Key Decisions
- Wildcard Certificates Over Individual Certs: Leveraging existing wildcard certificates for queenofsandiego.com and sailjada.com reduced provisioning time and complexity. This decision required careful planning to ensure all tech subdomains fall within the wildcard scope.
- CloudFront Invalidation Strategy: Each blog post upload triggers CloudFront cache invalidation on the /posts/* pattern only, not the entire distribution. This balances consistency (readers always see the latest posts) with cost efficiency (invalidation paths beyond CloudFront's monthly free allotment incur charges).
- Granular Filtering vs. Privacy: Rather than redacting entire sections, the system surgically removes only credential patterns while preserving architectural context. This allows readers to understand exactly what infrastructure changes