Building a Carrier-Grade SMS Relay for Multi-Tenant Dispatch: QuickDumpNow's Twilio Integration
What We Built
QuickDumpNow's dispatch system required a critical capability: cascading SMS forwarding that respects both operational hierarchy and carrier constraints. When a customer texts the primary dispatch number, the message must route through a chain (primary operator → Sergio's main line → backup operator) without hitting carrier-level limitations on call forwarding. This demanded a purpose-built Twilio relay layer.
The infrastructure now supports:
- Incoming SMS to QDN's published customer-facing number
- Server-side routing logic that implements operational hierarchy
- Automatic failover to backup operators when primary is unavailable
- Audit trail of all message relays (customer ID, timestamp, operator, routing decision)
- Zero changes to customer experience or existing dispatch workflows
Architecture & Technical Details
API Gateway & Lambda Foundation
The routing logic lives in a new Lambda function deployed to the dashboard.quickdumpnow.com stack. We added four new POST routes to the API Gateway:
- POST /api/sms/incoming: receives webhooks from Twilio
- POST /api/sms/forward: internal relay to next operator
- POST /api/sms/audit: logs routing decisions to maintenance.json
- POST /api/sms/status: health check for monitoring
Each endpoint also answers CORS preflight requests (the OPTIONS method) so that the dashboard frontend can call it cross-origin.
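As a rough illustration of the preflight handling, the sketch below shows one way a Lambda handler might answer an OPTIONS request. It assumes the API Gateway REST proxy event shape (an "httpMethod" key). The allowed origin is a hypothetical placeholder, since the dashboard frontend's origin is not stated here.

```python
# Hypothetical placeholder: the real frontend origin is not given in this write-up.
ALLOWED_ORIGIN = "https://frontend.example.com"

CORS_HEADERS = {
    "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
}


def handle_preflight(event: dict):
    """Return a CORS preflight response for OPTIONS requests, else None.

    Assumes the API Gateway REST proxy event format, where the HTTP verb
    is available under the "httpMethod" key.
    """
    if event.get("httpMethod") == "OPTIONS":
        return {"statusCode": 204, "headers": dict(CORS_HEADERS), "body": ""}
    return None  # not a preflight; the caller continues with normal routing
```

A handler would call handle_preflight(event) first and return its result when it is not None.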
Twilio Webhook Integration
When a customer texts QDN's Twilio number, Twilio immediately POSTs to /api/sms/incoming with:
- From: customer's phone number
- To: QDN's published Twilio number
- Body: message content
- MessageSid: Twilio's unique message ID
The Lambda handler parses this payload, looks up the customer in our database by phone number, and determines the current primary operator from the job state machine. It then calls the Twilio REST API to send an SMS to that operator's personal number with context: 📦 [Customer Name]: [Original Message].
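The parsing and formatting steps can be sketched as follows. Twilio sends webhook parameters as a form-encoded body by default, so the standard library is enough to extract the four fields. The function names and the sample phone numbers are illustrative assumptions, and the customer lookup and the outbound Twilio call are left out.

```python
from urllib.parse import parse_qs

WEBHOOK_FIELDS = ("From", "To", "Body", "MessageSid")


def parse_twilio_webhook(raw_body: str) -> dict:
    """Extract the four webhook fields from a form-encoded request body."""
    fields = parse_qs(raw_body, keep_blank_values=True)
    # parse_qs maps each key to a list of values; take the first, or "" if absent.
    return {key: fields.get(key, [""])[0] for key in WEBHOOK_FIELDS}


def format_relay_text(customer_name: str, original_message: str) -> str:
    """Build the text forwarded to the operator, in the format described above."""
    return f"📦 {customer_name}: {original_message}"


if __name__ == "__main__":
    # Sample payload with made-up values, for illustration only.
    sample = "From=%2B16195551234&To=%2B16195550000&Body=Need+pickup+today&MessageSid=SM123"
    message = parse_twilio_webhook(sample)
    print(format_relay_text("Acme Waste Inc.", message["Body"]))
```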
Routing Logic & Failover
The operator chain is defined in a configuration block within the Lambda environment:
DISPATCH_CHAIN = {
    "primary_operator_id": "sergio_main_number",
    "backup_operators": [
        "sergio_backup_number",
        "emergency_contact_number"
    ]
}
When the primary operator doesn't acknowledge receipt within a configurable window (currently 60 seconds), the Lambda automatically attempts the first backup number. This is achieved by:
- Writing a message record with status: "pending_ack" to DynamoDB
- Scheduling a CloudWatch Events rule that invokes a "retry" Lambda after 60 seconds
- If no acknowledgment record exists, the retry Lambda calls Twilio to send to the backup number
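The decision the retry Lambda makes can be sketched as a pure function over the chain and the stored message record. This is a minimal sketch, not the deployed code: the "attempted" list on the record is a hypothetical field introduced here to track which operators have already been tried, and the DynamoDB read and the Twilio send are omitted.

```python
DISPATCH_CHAIN = {
    "primary_operator_id": "sergio_main_number",
    "backup_operators": ["sergio_backup_number", "emergency_contact_number"],
}


def escalation_order(chain: dict) -> list:
    """Flatten the chain into the order in which operators are tried."""
    return [chain["primary_operator_id"], *chain["backup_operators"]]


def next_recipient(chain: dict, message_record: dict):
    """Return the next operator to try, or None if no retry is needed.

    No retry is needed when the message is no longer pending acknowledgment
    or when every operator in the chain has already been attempted.
    "attempted" is a hypothetical record field, not part of the documented schema.
    """
    if message_record.get("status") != "pending_ack":
        return None
    attempted = message_record.get("attempted", [])
    for operator in escalation_order(chain):
        if operator not in attempted:
            return operator
    return None
```

For a record still in "pending_ack" whose only attempt was the primary number, next_recipient returns the first backup.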
Audit & Compliance
Every relay is logged to /maintenance/data/maintenance.json (S3 key: s3://qdn-maintenance/maintenance.json) with the schema:
{
    "timestamp": "2024-01-15T14:23:45Z",
    "message_sid": "SM1234...",
    "customer_id": "cust_5678",
    "customer_name": "Acme Waste Inc.",
    "original_from": "+16195551234",
    "routed_to_operator": "sergio_main",
    "routing_reason": "primary_operator_assigned_to_job",
    "acknowledged": true,
    "ack_timestamp": "2024-01-15T14:23:52Z"
}
This log enables operator performance tracking, customer service recovery (replay message history), and compliance audits.
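To make the schema concrete, here is a small sketch of a helper that assembles one audit record with the field names shown above. The helper names are illustrative. Representing an unacknowledged relay with "acknowledged": false and a null "ack_timestamp" is an assumption, since the example above only shows the acknowledged case.

```python
import json
from datetime import datetime, timezone


def utc_stamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC string with a Z suffix."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_audit_record(message_sid, customer_id, customer_name, original_from,
                       operator, reason, relayed_at, acked_at=None) -> dict:
    """Assemble one audit entry using the field names from the schema above."""
    return {
        "timestamp": utc_stamp(relayed_at),
        "message_sid": message_sid,
        "customer_id": customer_id,
        "customer_name": customer_name,
        "original_from": original_from,
        "routed_to_operator": operator,
        "routing_reason": reason,
        # Assumption: an unacknowledged relay is stored as false / null.
        "acknowledged": acked_at is not None,
        "ack_timestamp": utc_stamp(acked_at) if acked_at is not None else None,
    }


if __name__ == "__main__":
    record = build_audit_record(
        "SM123", "cust_5678", "Acme Waste Inc.", "+16195551234",
        "sergio_main", "primary_operator_assigned_to_job",
        relayed_at=datetime(2024, 1, 15, 14, 23, 45, tzinfo=timezone.utc),
        acked_at=datetime(2024, 1, 15, 14, 23, 52, tzinfo=timezone.utc),
    )
    print(json.dumps(record, indent=4))
```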
Infrastructure & Deployment
S3 & CloudFront Updates
The dashboard Lambda functions are packaged via AWS SAM and served behind the dashboard.quickdumpnow.com CloudFront distribution. The deployment pipeline references the distribution ID so that it can invalidate the /api/* cache paths whenever code changes.
A new CloudFront function was added to the default behavior to rewrite incoming /track requests to /track/index.html, ensuring the customer tracking page loads correctly without explicit extension.
DNS & Domain Wiring
The QDN domain (quickdumpnow.com) has a Route53 A record pointing to the CloudFront distribution. The Twilio-side configuration uses dashboard.quickdumpnow.com/api/sms/incoming as the webhook URL, so Twilio's infrastructure can reach the Lambda through the public CloudFront edge network.
Why CloudFront? It provides DDoS protection, geographic load distribution, and caching for static assets (job tracking page). The API endpoints use Cache-Control: no-cache headers to bypass CloudFront's cache layer while still benefiting from edge security.
Environment & Credentials
Twilio credentials are stored in /Users/cb/Documents/repos/.secrets/repos.env (file mode 600, owned by deployment user) with the format:
TWILIO_ACCOUNT_SID=AC5260bca...
TWILIO_AUTH_TOKEN=aa3355b73...
TWILIO_API_KEY=SK7c7a1c0...
TWILIO_API_SECRET=u2d5U6646...
The Lambda execution role (IAM policy: qdn-lambda-sms-relay-role) has permissions to:
- Read from arn:aws:s3:::qdn-maintenance/maintenance.json
- Write audit records (PutObject)
- Invoke CloudWatch Events for scheduled retries
- No direct S3 or DynamoDB write permissions (API Gateway enforces auth)
Key Decisions & Trade-offs
Webhook vs. Polling
Decision: Twilio webhooks (push) instead of Lambda polling Twilio's message API.
Why: Webhooks deliver messages with sub-second latency, cost nothing when idle, and match Twilio's design patterns. Polling would require continuous Lambda invocations and DynamoDB queries, increasing cost by ~3x with no latency improvement.
CloudWatch Events for Failover
Decision: Use CloudWatch Events (now EventBridge) to schedule retry Lambdas rather than embedding sleep/retry logic in the handler.
Why: Lambda's 15-minute timeout is plenty for handler execution, but holding an invocation open for 60 seconds just to wait means paying for idle time, since Lambda bills for duration in 1 ms increments. EventBridge decouples the retry concern, allows easy tuning of backoff windows, and can be visualized in the AWS console.
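One way the one-off retry could be expressed is as an EventBridge rule with a cron schedule expression. The sketch below only builds that expression and is an assumption about how the rule might be written, not a description of the deployed rule. EventBridge rule cron expressions take six fields (minutes, hours, day-of-month, month, day-of-week, year), are evaluated in UTC, and have one-minute granularity, so this sketch rounds up to the next whole minute. In practice the retry may therefore fire somewhat later than the configured 60 seconds.

```python
from datetime import datetime, timedelta, timezone


def retry_schedule_expression(sent_at: datetime, ack_window_seconds: int = 60) -> str:
    """Build a one-off EventBridge cron expression for the retry invocation.

    sent_at must be timezone-aware. The due time is rounded up to the next
    whole minute because rule schedules cannot target a specific second.
    """
    due = sent_at.astimezone(timezone.utc) + timedelta(seconds=ack_window_seconds)
    if due.second or due.microsecond:
        due = due.replace(second=0, microsecond=0) + timedelta(minutes=1)
    # Field order: minutes hours day-of-month month day-of-week year.
    # "?" leaves day-of-week unspecified because day-of-month is set.
    return f"cron({due.minute} {due.hour} {due.day} {due.month} ? {due.year})"


if __name__ == "__main__":
    sent = datetime(2024, 1, 15, 14, 23, 45, tzinfo=timezone.utc)
    print(retry_schedule_expression(sent))
```

The resulting string would be passed as the schedule expression when the rule is created. A rule created this way should be removed after the retry fires so that one-off rules do not accumulate.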