Building a Local SMS Digest Pipeline: Bridging macOS Messages with Development Infrastructure
Over the past development session, we built out a local SMS aggregation and digest system that bridges native macOS Messages.app with our existing development infrastructure. Rather than relying on external SMS APIs (Twilio), this approach leverages the local Messages database and scheduled background tasks to extract, process, and email SMS digests. Here's a technical breakdown of what was implemented and why.
What Was Built
We created a complete SMS digest pipeline consisting of three components:
- Core extraction script: /Users/cb/Documents/repos/tools/samsung_sms_sync.py, a Python script that queries the macOS Messages SQLite database, filters conversations by date range, and formats SMS threads
- Launch Agent daemon: /Users/cb/Library/LaunchAgents/com.cb.samsung-sms-sync.plist, the macOS launchd configuration for scheduled execution
- Email delivery: Integration with AWS SES for digest distribution
The system ingests SMS from multiple conversations, performs basic thread aggregation, and generates readable digests sent to designated recipients.
Technical Architecture
Data Source: macOS Messages SQLite Database
macOS stores SMS and iMessage data in a SQLite database located at ~/Library/Messages/chat.db. The database schema includes:
- chat table: conversation metadata (identifiers, display names)
- message table: individual messages with timestamps, content, and sender info
- chat_message_join table: relational data linking messages to conversations
Our extraction script opens this database in read-only mode and constructs SQL queries to filter messages by date range and participant phone numbers:
SELECT m.text, m.date, m.is_from_me, c.chat_identifier
FROM message m
JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
JOIN chat c ON cmj.chat_id = c.ROWID
WHERE m.date BETWEEN ? AND ?
ORDER BY m.date DESC
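Timestamps in the Messages database are measured from the macOS Cocoa epoch (2001-01-01 00:00:00 UTC) rather than the Unix epoch. Older macOS releases store the value in seconds, while recent releases store it in nanoseconds, so the bounds passed to the BETWEEN clause must use the same unit as the column. We convert the results to Unix epoch for consistency with our logging and analytics infrastructure.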
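As a minimal sketch of the read-only connection and the epoch conversion described above: connect_to_messages_db is the function name from our script, but its signature here is an assumption, and cocoa_to_unix, unix_to_cocoa, and the 1e11 threshold for telling seconds from nanoseconds are illustrative choices rather than the script's actual code.

```python
import sqlite3
from pathlib import Path

# Seconds between the Unix epoch (1970-01-01) and the Cocoa epoch (2001-01-01), both UTC.
COCOA_EPOCH_OFFSET = 978_307_200

# Assumption: values larger than this are nanoseconds, smaller values are seconds.
NANOSECOND_THRESHOLD = 100_000_000_000


def cocoa_to_unix(value):
    """Convert a Messages `date` value to Unix seconds."""
    seconds = value / 1_000_000_000 if value > NANOSECOND_THRESHOLD else value
    return seconds + COCOA_EPOCH_OFFSET


def unix_to_cocoa(unix_seconds, nanoseconds=True):
    """Convert Unix seconds to a Cocoa-epoch value for use as a query bound."""
    cocoa_seconds = unix_seconds - COCOA_EPOCH_OFFSET
    return int(cocoa_seconds * 1_000_000_000) if nanoseconds else int(cocoa_seconds)


def connect_to_messages_db(path=Path.home() / "Library/Messages/chat.db"):
    """Open chat.db through a SQLite URI with mode=ro so writes are rejected."""
    uri = Path(path).expanduser().resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

On recent macOS versions the process running the script typically also needs Full Disk Access before it can read chat.db at all.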
Conversation Grouping and Filtering
The script groups messages by chat_identifier (the phone number or iMessage address) and applies temporal filtering. For the SMS digest workflow, we typically extract the past 24–48 hours and sort by conversation recency. This allows us to prioritize actionable threads (e.g., pending replies, confirmations) over older conversations.
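The grouping step can be sketched roughly as follows. group_by_conversation is the function name from our script; the input shape (tuples in the same column order as the SELECT above) and the returned structure are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_conversation(rows):
    """Group query rows by chat_identifier, most recently active thread first.

    rows: iterable of (text, date, is_from_me, chat_identifier) tuples.
    """
    threads = defaultdict(list)
    for text, date, is_from_me, chat_identifier in rows:
        threads[chat_identifier].append(
            {"text": text, "date": date, "from_me": bool(is_from_me)}
        )

    # Within each thread, read oldest to newest.
    for messages in threads.values():
        messages.sort(key=lambda m: m["date"])

    # Across threads, rank by the timestamp of the latest message.
    return sorted(
        threads.items(),
        key=lambda item: item[1][-1]["date"],
        reverse=True,
    )
```

Sorting by the last message's timestamp is what puts threads with pending replies near the top of the digest.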
Scheduled Execution with launchd
Rather than relying on cron, we use macOS launchd for more reliable scheduling and better integration with system resources. The plist configuration at com.cb.samsung-sms-sync.plist includes:
- StartInterval: execution frequency (e.g., 3600 seconds = hourly)
- StandardOutPath and StandardErrorPath: log file paths for debugging
- WorkingDirectory: ensures the script runs in the correct context
- EnvironmentVariables: passes AWS credentials and configuration as needed
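To make the keys above concrete, here is a sketch that generates an equivalent plist with Python's plistlib. The Label is assumed to match the plist filename, and the interpreter path, log paths, and environment variable are placeholders rather than the values in our actual file.

```python
import plistlib


def build_launch_agent(script_path, interval_seconds=3600):
    """Return launchd job XML (as bytes) for running the script on an interval."""
    job = {
        "Label": "com.cb.samsung-sms-sync",  # assumed to match the filename
        "ProgramArguments": ["/usr/bin/python3", script_path],  # assumed interpreter
        "StartInterval": interval_seconds,
        "WorkingDirectory": "/Users/cb/Documents/repos/tools",
        "StandardOutPath": "/tmp/samsung-sms-sync.out.log",  # placeholder path
        "StandardErrorPath": "/tmp/samsung-sms-sync.err.log",  # placeholder path
        "EnvironmentVariables": {"AWS_DEFAULT_REGION": "us-west-2"},
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    xml = build_launch_agent("/Users/cb/Documents/repos/tools/samsung_sms_sync.py")
    print(xml.decode("utf-8"))
```

Writing the output to ~/Library/LaunchAgents/ and loading it gives the same result as a hand-edited file.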
Load the agent with:
launchctl load ~/Library/LaunchAgents/com.cb.samsung-sms-sync.plist
And verify status:
launchctl list | grep samsung-sms-sync
Integration with AWS SES
Digests are delivered via AWS SES (Simple Email Service) rather than SMTP. This avoids managing credentials for third-party mail servers and integrates seamlessly with our existing AWS account and IAM permissions. The script uses boto3 to invoke send_email:
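import boto3

# formatted_html_digest is the HTML string produced by format_digest()
client = boto3.client('ses', region_name='us-west-2')
response = client.send_email(
    Source='noreply@sailjada.com',
    Destination={'ToAddresses': ['recipient@example.com']},
    Message={
        'Subject': {'Data': 'SMS Digest: April 25–29', 'Charset': 'UTF-8'},
        # Each body part is a dict with a 'Data' key, not a bare string
        'Body': {'Html': {'Data': formatted_html_digest, 'Charset': 'UTF-8'}},
    },
)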
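SES requires the sender address (or its domain) to be verified before it will send mail, and while the account is in the SES sandbox, recipient addresses must be verified as well. For production, we'd use a verified domain identity and configure DKIM/SPF records in Route 53.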
Key Design Decisions
Why Local Database Over Twilio API
During the session, we confirmed that Twilio credentials (SID and auth token) weren't available in the development environment. Rather than blocking on credential retrieval, we pivoted to the local Messages database, which is:
- Always available: No external API dependency or rate limits
- Immediate: Messages appear in the local database within seconds of delivery
- Privacy-preserving: Data never leaves the machine; aggregation happens locally
The tradeoff is that this approach only works on the machine running Messages.app. For a distributed team or multi-device setup, Twilio would be preferable.
Why Scheduled Aggregation Over Real-Time Streaming
SMS digests are batch-processed on a fixed schedule (e.g., every hour or daily) rather than triggering on each incoming message. This approach:
- Reduces noise — digest recipients get a curated view, not individual message notifications
- Enables summarization — context from recent threads is available at digest time
- Simplifies state management — no need to track which messages have been "sent" to the digest
Date Range Flexibility
The script accepts configurable start/end timestamps (via command-line args or environment variables), allowing manual runs for historical lookups or ad-hoc audits. For example:
python samsung_sms_sync.py --start "2024-04-25 00:00:00" --end "2024-04-29 23:59:59"
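A sketch of how the window could be resolved from arguments and environment variables. The --start and --end flags are the ones shown above; the function name parse_window, the environment variable names SMS_DIGEST_START and SMS_DIGEST_END, and the 24-hour default are assumptions for illustration.

```python
import argparse
import os
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_window(argv=None, env=None, now=None):
    """Resolve (start, end) from CLI flags, then env vars, then a 24-hour default."""
    env = os.environ if env is None else env
    now = now or datetime.now()

    parser = argparse.ArgumentParser(description="Build an SMS digest for a date range")
    # Hypothetical env var names act as defaults when the flags are omitted.
    parser.add_argument("--start", default=env.get("SMS_DIGEST_START"))
    parser.add_argument("--end", default=env.get("SMS_DIGEST_END"))
    args = parser.parse_args(argv)

    end = datetime.strptime(args.end, TIMESTAMP_FORMAT) if args.end else now
    if args.start:
        start = datetime.strptime(args.start, TIMESTAMP_FORMAT)
    else:
        start = end - timedelta(hours=24)

    if start > end:
        parser.error("--start must not be later than --end")
    return start, end
```

Flags take precedence over environment variables, so a manual run can override whatever the launchd job has configured.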
File Structure and Execution Flow
/Users/cb/Documents/repos/tools/samsung_sms_sync.py
├─ connect_to_messages_db() # Open SQLite, handle permissions
├─ query_messages(start, end) # Filter by date range
├─ group_by_conversation() # Organize by phone number/iMessage
├─ format_digest() # HTML/plain-text rendering
└─ send_via_ses() # Invoke boto3, handle delivery
/Users/cb/Library/LaunchAgents/com.cb.samsung-sms-sync.plist
└─ Triggers samsung_sms_sync.py on a schedule
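For the rendering step in the flow above, a minimal HTML-only sketch might look like this. format_digest is the name from the file structure, but the input shape (a list of (chat_identifier, messages) pairs with 'text' and 'from_me' keys), the markup, and the placeholder label are assumptions. Message text is escaped because it is untrusted input, and the text column can be NULL (for example, attachment-only messages).

```python
from html import escape


def format_digest(threads):
    """Render grouped threads as a simple HTML document.

    threads: list of (chat_identifier, messages) pairs, where each message
    is a dict with 'text' (str or None) and 'from_me' (bool).
    """
    parts = ["<html><body>"]
    for chat_identifier, messages in threads:
        parts.append(f"<h3>{escape(chat_identifier)}</h3><ul>")
        for message in messages:
            sender = "Me" if message["from_me"] else "Them"
            # Fall back to a placeholder when the row has no text.
            text = message["text"] or "[no text content]"
            parts.append(f"<li><b>{sender}:</b> {escape(text)}</li>")
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)
```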
What's Next
- Selective thread subscriptions: Allow users to subscribe to specific phone numbers or keywords, receiving only relevant digests
- Sentiment analysis: Flag high-priority conversations (e.g., urgent requests, cancellations) with visual indicators in the digest
- Multi-device sync: If we restore