```html

Building SMS Digest Automation: Extracting Conversation Threads and Delivering Summaries via SES

During a recent development session, we implemented a workflow to extract SMS conversation threads from the JADA business line and deliver curated digests via Amazon SES. This post covers the technical architecture, tooling decisions, and implementation patterns behind the SMS digest workflow.

What Was Done

We built a multi-stage pipeline that:

  • Locates and parses SMS export files from the JADA infrastructure
  • Extracts specific conversation threads by phone number matching
  • Filters messages by date range (April 25–29 window)
  • Compiles conversation headers and message bodies into a structured digest
  • Delivers the compiled digest via Amazon SES to stakeholders

The workflow was triggered manually but is designed for integration into automated monitoring pipelines or scheduled jobs.

Technical Architecture

SMS Data Source

Rather than calling the Twilio API (which requires `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` in the environment), we used an existing SMS export file maintained by the voice agent infrastructure. This export is the authoritative record of all inbound and outbound SMS for the JADA business line (+16199867344).

File location: The SMS export is located in the voice agent repository and contains structured conversation logs with metadata headers (sender phone, timestamp, conversation ID) followed by message bodies.

Conversation Parsing Strategy

SMS conversations in the export follow a consistent structure:


--- Conversation Header ---
Phone: +1XXXXXXXXXX
Last Activity: YYYY-MM-DD HH:MM:SS
Conversation ID: [system-generated]

Message 1 [timestamp]: body text
Message 2 [timestamp]: body text

Our extraction logic:

  • Reads the export file sequentially
  • Identifies conversation boundaries using regex patterns matching phone number headers
  • Filters conversations by target phone number (e.g., +15302623442, +16194164690, +16193650804)
  • Applies date range filters to isolate relevant message windows
  • Preserves conversation context by keeping headers intact

Line-Range Based Extraction

When dealing with large SMS export files (which can exceed 50,000 lines), locating a conversation's boundaries and then pulling only that line range is cheaper than parsing the whole file. Once the range was known, we extracted it directly:


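# List conversation boundaries with their line numbers
grep -n '^--- Conversation Header ---' /path/to/sms_export.txt
# Example: print Sergio's thread (lines 8942-9187), then quit at 9187 instead of reading to EOF
sed -n '8942,9187p;9187q' /path/to/sms_export.txt

Because sed streams the file line by line, this never loads the whole export into memory, and the `9187q` command stops reading as soon as the range ends. The approach is exact only when the conversation boundaries are already known.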
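If a Python equivalent of the grep/sed approach is preferred, the sketch below finds a thread's line range and then extracts only those lines. It assumes the `Phone:` line immediately follows the header marker and that a thread ends just before the next header (or at end of file); `find_thread_range` and `extract_lines` are illustrative names, not part of the existing tooling.

```python
import itertools
import re
from typing import List, Optional, Tuple

HEADER_LINE = "--- Conversation Header ---"


def find_thread_range(path: str, phone: str) -> Optional[Tuple[int, int]]:
    """Return 1-based inclusive (first, last) line numbers of the first thread for `phone`.

    Assumes the "Phone:" line directly follows the header marker and that a
    thread ends on the line before the next header (or at end of file).
    """
    phone_re = re.compile(r"^Phone:\s*" + re.escape(phone) + r"\s*$")
    start: Optional[int] = None
    prev_was_header = False
    lineno = 0
    with open(path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.strip()
            if line == HEADER_LINE:
                if start is not None:
                    return start, lineno - 1  # the next header closes our thread
                prev_was_header = True
                continue
            if prev_was_header and phone_re.match(line):
                start = lineno - 1  # include the header marker line itself
            prev_was_header = False
    return (start, lineno) if start is not None else None


def extract_lines(path: str, first: int, last: int) -> List[str]:
    """Python analogue of `sed -n 'first,lastp;lastq'`: stream, then stop at `last`."""
    with open(path, encoding="utf-8") as f:
        # islice skips first-1 lines and stops after line `last` without reading further.
        return list(itertools.islice(f, first - 1, last))
```

`find_thread_range` returns `None` when the phone number has no thread, so check the result before passing it to `extract_lines`.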

Infrastructure & Integration Points

Amazon SES for Digest Delivery

Compiled digests are delivered via Amazon SES (Simple Email Service) to stakeholder inboxes. The SES configuration requires:

  • Verified sender address: The From address, or its domain, must be a verified identity in the SES console
  • IAM permissions: The role running the digest job needs `ses:SendEmail` (plus `ses:SendRawEmail` if it sends raw MIME messages)
  • Recipient verification: In sandbox mode, recipients must also be verified; once the account has production access, mail can go to unverified recipients

The digest is formatted as a plain-text email with clear section headers and bullet points, which keeps it scannable on mobile devices and renders consistently across email clients.
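A minimal sketch of the delivery step, assuming the SES v1 `send_email` operation from a boto3 client. The client is passed in rather than created inside the function, which keeps the function testable without AWS credentials; `send_plain_text_digest` is a hypothetical name.

```python
from typing import Any, Sequence


def send_plain_text_digest(ses_client: Any, sender: str, recipients: Sequence[str],
                           subject: str, body: str) -> str:
    """Send a plain-text digest through SES and return the SES MessageId.

    Assumes `ses_client` is a boto3 SES (v1) client, e.g. boto3.client("ses").
    `sender` must be a verified SES identity; in sandbox mode every recipient
    must be verified too.
    """
    response = ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": list(recipients)},
        Message={
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            # Text-only body: no HTML part, so every client shows the same layout.
            "Body": {"Text": {"Data": body, "Charset": "UTF-8"}},
        },
    )
    return response["MessageId"]
```

In production this would be called as `send_plain_text_digest(boto3.client("ses"), "digest@example.com", ["c.b.ladd@gmail.com"], subject, body)`, where `digest@example.com` stands in for whichever verified sender identity is actually configured.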

Voice Agent Integration

The SMS export is maintained by the voice agent service, which handles:

  • Inbound SMS routing to the JADA business line
  • Conversation state management and persistence
  • Export file updates on each new message

The voice agent exports SMS data to a persistent file (not a database query), so the digest pipeline runs independently: it needs no Twilio API credentials and is not subject to Twilio API rate limits.

Key Technical Decisions

Why File-Based Export Over Live API Calls

The initial approach assumed we would query Twilio's REST API in real time. However, Twilio credentials are not stored in the shared `repos.env` configuration; they are injected only into the voice agent runtime. Rather than add credential-management complexity, we used the existing SMS export file, which provides:

  • Zero API call overhead: No rate limits, no authentication delays
  • Reliable archival: The export file is the source of truth for audit trails
  • Offline operation: Digest generation works even if the Twilio API is temporarily unavailable
  • Simplified secrets management: No additional credentials to rotate

Digest Compilation vs. Real-Time Streaming

We chose batch digest compilation (collecting 4–5 days of messages and sending once) over real-time SMS notifications because:

  • Reduces notification fatigue for high-volume conversations
  • Allows human curation: extracting "hot items" and actionable summaries from raw message volume
  • Enables cross-conversation context: understanding broader patterns (e.g., payment confirmations from multiple clients)
  • Fits the business rhythm: daily or weekly schedules aligned with operational decisions

The digest email becomes a single source of truth for SMS activity rather than 50+ individual notifications.
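The batch window itself is simple date arithmetic: a window of `days` calendar days ending on the run date, so a 5-day window ending April 29 covers April 25–29. The sketch below also groups messages from all conversations by day, which is one way to get the cross-conversation view described above; `digest_window`, `bucket_by_day`, and the `(conversation_id, timestamp, body)` input shape are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def digest_window(run_date: date, days: int) -> Tuple[date, date]:
    """Inclusive [start, end] window covering `days` calendar days ending on run_date."""
    return run_date - timedelta(days=days - 1), run_date


def bucket_by_day(messages: Iterable[Tuple[str, datetime, str]],
                  start: date, end: date) -> Dict[date, List[Tuple[str, str]]]:
    """Group in-window messages from all conversations by calendar day, oldest first."""
    buckets: Dict[date, List[Tuple[str, str]]] = defaultdict(list)
    for conv_id, ts, body in sorted(messages, key=lambda m: m[1]):
        if start <= ts.date() <= end:
            buckets[ts.date()].append((conv_id, body))
    return dict(buckets)
```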

Workflow Example: Sergio's SMS Thread Extraction

To illustrate the pipeline, here's how we extracted and delivered Sergio's conversation:

  1. Locate export file: Found in the voice agent's tools directory
  2. Search for conversation header: Regex search for Sergio's phone number in the conversation headers
  3. Identify line ranges: Determined that Sergio's thread spans lines 8942–9187
  4. Extract with preservation: Used sed to pull exact line range, preserving conversation context
  5. Compile summary: A human reviewer distilled the raw thread into actionable items:
    • Payment confirmations (two successful transfers)
    • Equipment needs (compressor replacement, truck/trailer maintenance)
    • Business opportunity (24/7 tire roadside service partnership)
    • Logistics (trailer consolidation at Carlos's warehouse)
  6. Format for email: Structured as executive summary + financial metrics + action items
  7. Send via SES: Delivered to c.b.ladd@gmail.com with conversation reference IDs for traceability
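Step 6 can be sketched as a small formatter that renders human-curated items into the plain-text layout described earlier. The section names mirror the executive summary, financial metrics, and action items above, but `format_digest` and its input shape are assumptions; the curation itself stays manual.

```python
from typing import Dict, List


def format_digest(title: str, sections: Dict[str, List[str]],
                  conversation_ids: List[str]) -> str:
    """Render curated items as a scannable plain-text email body (hypothetical helper)."""
    lines = [title, "=" * len(title), ""]
    for heading, items in sections.items():  # dicts keep insertion order
        if not items:
            continue  # omit empty sections to keep the email short
        lines.append(heading.upper())
        lines.extend(f"  - {item}" for item in items)
        lines.append("")
    if conversation_ids:
        # Reference IDs let readers trace each item back to the raw thread.
        lines.append("CONVERSATION REFERENCES")
        lines.extend(f"  - {cid}" for cid in conversation_ids)
    return "\n".join(lines).rstrip() + "\n"
```

For Sergio's thread, `sections` might map "Financial Metrics" to `["Two successful payment transfers confirmed"]` and "Action Items" to `["Compressor replacement", "Truck/trailer maintenance"]`, with the thread's conversation IDs passed as `conversation_ids`.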

What's Next

This manual workflow is a foundation for automation:

  • Scheduled digest job: Integrate into a Lambda function triggered daily at 6 AM via EventBridge, automating the extraction and SES delivery
  • Conversation clustering: Use NLP to automatically group related threads (e.g., "maintenance requests," "payment confirmations") for faster scanning
  • Alert thresholds: Implement keyword detection to surface urgent items (e.g., "broken," "urgent," "ASAP") with higher priority flags
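The alert-threshold idea could start as plain keyword matching before any NLP is introduced. This sketch uses the example keywords listed above with case-insensitive, word-boundary matching; `flag_urgent` and the `(conversation_id, body)` input shape are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Example keywords from the alert-threshold item above; tune per business line.
URGENT_KEYWORDS = ("broken", "urgent", "asap")
URGENT_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, URGENT_KEYWORDS)) + r")\b", re.IGNORECASE
)


def flag_urgent(messages: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, List[str]]]:
    """Return (conversation_id, body, matched_keywords) for messages with urgent terms."""
    flagged = []
    for conv_id, body in messages:
        # findall returns the captured keyword; normalize case and dedupe.
        hits = sorted({m.lower() for m in URGENT_RE.findall(body)})
        if hits:
            flagged.append((conv_id, body, hits))
    return flagged
```

Word boundaries keep longer words such as "brokenness" from triggering an alert. Keyword matching will still miss misspellings and paraphrases like "need it right away", which is where the clustering and NLP work above would help.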