Building a Local SMS Sync Bridge: Extracting Messages from macOS without External APIs

Over a recent development session, I built a lightweight SMS synchronization tool that extracts message threads from macOS Messages.app and processes them locally, without relying on Twilio or other cloud-based SMS services. This post walks through the architecture, the problems we solved, and why we chose a local-first approach.

The Problem: SMS Access Without Cloud Dependencies

The original ask was simple: read SMS conversations from a specific business line and generate digests. The initial assumption was that we'd use Twilio's API to pull inbound messages. However, after investigating the codebase, we discovered that:

  • Twilio credentials weren't persisted in .secrets/repos.env
  • The voice agent infrastructure referenced SMS capabilities but didn't actively sync them
  • Messages were already being exported locally from macOS Messages.app
  • We had a complete SMS export file with threading and metadata

This led to an architectural shift: instead of adding another external dependency, we'd build a local sync bridge that reads the native macOS Messages database and processes conversations in memory.

Architecture: Three-Layer Approach

Layer 1: Data Source

macOS Messages stores conversations in a SQLite database at ~/Library/Messages/chat.db. This database contains:

  • chat table: conversation metadata (identifiers, names, service type)
  • message table: individual messages with timestamps, content, and sender info
  • chat_message_join table: relationships between chats and messages
  • handle table: phone numbers and contact identifiers

The database structure allows us to reconstruct full conversation threads without any external API calls. Messages supports several backends, each with a distinct service identifier: SMS (carrier text messages relayed to the Mac from a phone, not sent over iMessage), iMessage (Apple's encrypted protocol), and Bonjour (local network).
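To see which of these backends actually appear in a given chat.db, a quick census of the chat table helps. This is a minimal sketch, assuming the chat table keeps the backend identifier in a service_name column (as it does on recent macOS releases; check your own schema first). The helper name service_breakdown is hypothetical.

```python
import sqlite3
from collections import Counter


def service_breakdown(conn: sqlite3.Connection) -> Counter:
    """Count conversations per Messages backend, e.g. 'SMS' or 'iMessage'.

    Assumes chat.service_name holds the backend identifier; NULL values
    are reported under '<unknown>'.
    """
    rows = conn.execute(
        "SELECT service_name, COUNT(*) FROM chat GROUP BY service_name"
    )
    return Counter(
        {(svc if svc is not None else "<unknown>"): n for svc, n in rows}
    )
```

Pass it a connection to ~/Library/Messages/chat.db and it returns something like a per-service tally, which is a fast sanity check before writing any SMS-only filter.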

Layer 2: Extraction Engine

The core script, /Users/cb/Documents/repos/tools/samsung_sms_sync.py, implements:

  • Chat discovery: Queries the chat table for SMS-only conversations (filtering on its service_name column)
  • Message reconstruction: Joins messages with handles to build complete threads with proper attribution
  • Timestamp normalization: Converts macOS internal timestamps, counted from 2001-01-01 (in seconds on older releases, nanoseconds since macOS 10.13), to the standard Unix epoch by adding a 978,307,200-second offset
  • Thread grouping: Organizes messages by conversation ID and sorts chronologically

Example query pattern for extracting SMS threads:

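-- Every SMS thread and its messages, oldest first within each chat.
-- Note: the chat table names its backend column service_name; message uses service.
SELECT
  c.guid,
  c.display_name,
  h.id AS phone_number,
  m.text,              -- can be NULL on recent macOS, where the body may live in attributedBody
  m.date,              -- seconds (older macOS) or nanoseconds (10.13+) since 2001-01-01
  m.is_from_me,
  m.service
FROM chat c
JOIN chat_message_join cmj ON c.rowid = cmj.chat_id
JOIN message m ON cmj.message_id = m.rowid
LEFT JOIN handle h ON m.handle_id = h.rowid
WHERE c.service_name = 'SMS'
  AND h.id LIKE '+1%'  -- +1 numbers only; this also drops rows with no matching handle
ORDER BY c.rowid, m.date ASC;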
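Running that query from Python takes only the standard sqlite3 module. In this sketch the helper name fetch_sms_rows is hypothetical, the service and number prefix are bound as parameters instead of being hard-coded, and the query filters on the chat table's service_name column. Rows come back as dicts keyed by column name.

```python
import sqlite3

# Same shape as the SMS-thread query, with the filters bound as parameters.
SMS_THREADS_SQL = """
SELECT c.guid, c.display_name, h.id AS phone_number,
       m.text, m.date, m.is_from_me
FROM chat c
JOIN chat_message_join cmj ON c.ROWID = cmj.chat_id
JOIN message m ON cmj.message_id = m.ROWID
LEFT JOIN handle h ON m.handle_id = h.ROWID
WHERE c.service_name = ?
  AND h.id LIKE ?
ORDER BY c.ROWID, m.date ASC
"""


def fetch_sms_rows(conn: sqlite3.Connection, service: str = "SMS",
                   number_prefix: str = "+1") -> list:
    """Return the query's rows as dicts keyed by column name."""
    cur = conn.execute(SMS_THREADS_SQL, (service, number_prefix + "%"))
    columns = [desc[0] for desc in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

Binding the prefix keeps the +1 filter configurable if the business line ever receives international numbers.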

Layer 3: Daemon Integration

We created a launchd plist at ~/Library/LaunchAgents/com.cb.samsung-sms-sync.plist to run the sync script periodically. This daemon:

  • Runs every 5 minutes during business hours (StartInterval has no time-of-day setting, so the business-hours check has to live in the script)
  • Checks the chat.db modification time to avoid redundant processing
  • Outputs digest summaries for high-priority conversations
  • Integrates with the existing SES email infrastructure to push summaries to c.b.ladd@gmail.com
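The mtime check and the business-hours window can be sketched in a few lines. The 8:00 to 18:00 window, the helper names, and the assumption that the previous run's mtime is persisted between invocations are all hypothetical. One detail worth hedging on: chat.db usually runs in SQLite's WAL mode (a chat.db-wal file sits next to it), so new messages can land in the WAL before chat.db's own mtime changes; the sketch therefore takes the newest mtime of both files.

```python
import os
from datetime import datetime
from typing import Optional

# Hypothetical business-hours window in local time (start inclusive, end exclusive).
BUSINESS_START_HOUR = 8
BUSINESS_END_HOUR = 18


def latest_db_mtime(db_path: str) -> float:
    """Newest modification time across chat.db and its -wal file, if present."""
    candidates = [db_path, db_path + "-wal"]
    return max(os.path.getmtime(p) for p in candidates if os.path.exists(p))


def should_sync(now: datetime, db_mtime: float,
                last_synced_mtime: Optional[float]) -> bool:
    """Return True only inside business hours and when the database changed."""
    if not (BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR):
        return False
    return last_synced_mtime is None or db_mtime > last_synced_mtime
```

A run that returns False exits immediately, so the 5-minute cadence costs almost nothing outside business hours or when no new messages have arrived.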

The plist configuration uses StartInterval for regular scheduling and includes error logging to /tmp/samsung-sms-sync.log for debugging.
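For reference, a plist along these lines can be generated with Python's standard plistlib module. The label, script path, and log path come from the post; the interpreter path /usr/bin/python3, the 300-second interval, and RunAtLoad are assumptions.

```python
import plistlib


def build_launch_agent() -> bytes:
    """Serialize a LaunchAgent definition for the sync script (XML plist)."""
    agent = {
        "Label": "com.cb.samsung-sms-sync",
        "ProgramArguments": [
            "/usr/bin/python3",  # assumed interpreter path
            "/Users/cb/Documents/repos/tools/samsung_sms_sync.py",
        ],
        # Fire every 300 seconds; launchd applies no time-of-day filter here.
        "StartInterval": 300,
        "StandardOutPath": "/tmp/samsung-sms-sync.log",
        "StandardErrorPath": "/tmp/samsung-sms-sync.log",
        "RunAtLoad": True,
    }
    return plistlib.dumps(agent)
```

Write the bytes to ~/Library/LaunchAgents/com.cb.samsung-sms-sync.plist and load it with launchctl bootstrap gui/$(id -u) <path>, or the older launchctl load <path>.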

Key Technical Decisions

1. Why Not Use Twilio?

Twilio is excellent for inbound/outbound SMS automation, but in this case:

  • Messages were already flowing through Messages.app (the business line was synced locally)
  • Adding Twilio would require API key management, credential rotation, and ongoing API costs
  • The macOS database is the system of record: it's already indexed, queryable, and local, so there are no network round-trips
  • Processing happens on the machine where the data exists, reducing attack surface

2. SQLite Direct vs. Messages Framework

We chose direct SQLite queries over Apple's own interfaces to Messages because:

  • Messages.app's scripting support is geared toward sending, and Apple offers no public API for bulk message retrieval
  • Direct queries are faster and don't require AppleScript or Objective-C bridging (PyObjC) from Python
  • We get full control over filtering and sorting logic
  • Easier to debug by examining the database directly with standard SQLite tools

3. Local Processing Over Cloud

All digest generation happens locally rather than shipping raw messages to a Lambda or microservice:

  • No raw message history leaves the device; only the condensed digest is sent out through SES
  • No network dependency for core functionality
  • Faster processing (local disk I/O vs. API calls)
  • Messages stay in the Messages app: we're reading, not intercepting
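The "reading, not intercepting" property can be enforced at the connection level rather than by convention. A minimal sketch using Python's built-in sqlite3 module, with a hypothetical helper name: opening chat.db through a mode=ro URI makes SQLite reject any write, so a bug in the sync script cannot modify the Messages database.

```python
import sqlite3
from pathlib import Path


def open_chat_db_readonly(path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only; any INSERT/UPDATE/DELETE will fail."""
    # as_uri() needs an absolute path and percent-encodes spaces for SQLite.
    uri = Path(path).expanduser().resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Called as open_chat_db_readonly("~/Library/Messages/chat.db"). macOS privacy controls still apply: the process doing the reading (Terminal, or the interpreter launchd starts) typically needs Full Disk Access before it can open anything under ~/Library/Messages.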

Implementation Details

The script iterates over recent conversations (filtered by date range) and extracts key metadata:

  • Conversation ID: Maps to phone numbers in the handle table
  • Thread reconstruction: Groups messages by sender and timestamp to identify sequences
  • Keyword flagging: Identifies action items and follow-ups by matching keywords
  • Digest formatting: Converts raw message threads into structured email summaries
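Keyword flagging can be as simple as one compiled regular expression over inbound messages. The keyword list below is hypothetical, since the post doesn't publish the real one, and the (is_from_me, text) input shape is an assumption that mirrors the is_from_me and text columns from the query.

```python
import re

# Hypothetical action-item patterns; tune these to the business's vocabulary.
ACTION_PATTERNS = [
    r"\bcall (me|back)\b",
    r"\bplease\b",
    r"\bneed(s|ed)?\b",
    r"\bfollow[- ]?up\b",
    r"\basap\b",
    r"\?\s*$",  # message ends with a question
]
ACTION_RE = re.compile("|".join(ACTION_PATTERNS), re.IGNORECASE)


def flag_action_items(messages):
    """Return inbound message texts that match any action pattern.

    messages: iterable of (is_from_me, text) pairs; text may be None.
    """
    return [
        text
        for is_from_me, text in messages
        if not is_from_me and text and ACTION_RE.search(text)
    ]
```

A regex heuristic like this is cheap and transparent, but it will miss paraphrases and flag some false positives (a casual "please" triggers it), which is usually acceptable for a digest a human skims anyway.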

The output is a structured digest email sent via SES, with each conversation condensed into a few key bullet points highlighting:

  • Who contacted us and when
  • What they need or reported
  • What action is pending
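The three-part digest (who and when, what they need, what's pending) maps naturally onto a small formatter plus the keyword arguments for an SES send. The thread dictionary keys (phone_number, last_unix, summary, pending) and both function names are hypothetical; the Source/Destination/Message structure follows the SES SendEmail request accepted by boto3's send_email.

```python
from datetime import datetime, timezone


def format_digest(threads) -> str:
    """Render one short block per conversation.

    threads: list of dicts with hypothetical keys phone_number,
    last_unix (Unix seconds), summary, and pending.
    """
    lines = []
    for t in threads:
        when = datetime.fromtimestamp(t["last_unix"], tz=timezone.utc)
        lines.append(f"{t['phone_number']} (last message {when:%Y-%m-%d %H:%M} UTC)")
        lines.append(f"  - Need: {t['summary']}")
        lines.append(f"  - Pending: {t['pending'] or 'nothing'}")
    return "\n".join(lines)


def ses_send_email_kwargs(sender: str, recipient: str,
                          subject: str, body: str) -> dict:
    """Keyword arguments for an SES client's send_email call (plain text)."""
    return {
        "Source": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Message": {
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            "Body": {"Text": {"Data": body, "Charset": "UTF-8"}},
        },
    }
```

With boto3 installed and AWS credentials configured, the send would be boto3.client("ses").send_email(**ses_send_email_kwargs(sender, recipient, subject, body)); SES only accepts a Source address that is a verified identity in the account.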

What's Next

Future iterations will focus on:

  • Android bridge support: The Samsung device sync opens possibilities for reading Android Messages or SMS backups via ADB (Android Debug Bridge)
  • Conversation threading: Implementing response-chain detection to group related messages across multiple contacts
  • Scheduled exports: Creating a weekly archive of conversations with full-text search capability
  • Alert escalation: Triggering real-time notifications for high-priority messages (e.g., from key contacts) while batching routine updates into daily digests

The approach validates a broader principle: before reaching for an external API, inspect what data is already available locally. macOS, iOS, and Android all store messaging data in well-structured, queryable formats. A thoughtful sync bridge can often extract more value from that data with lower latency, cost, and complexity than cloud-based alternatives.