
Building a Carrier-Grade SMS Relay for Multi-Tenant Dispatch: QuickDumpNow's Twilio Integration

What We Built

QuickDumpNow's dispatch system needed one critical capability: cascading SMS forwarding that respects both the operational hierarchy and carrier constraints. When a customer texts the primary dispatch number, the message must route through a chain (primary operator → Sergio's main line → backup operator) without running into carrier-level limitations on call forwarding. That called for a purpose-built Twilio relay layer.

The infrastructure now supports:

  • Incoming SMS to QDN's published customer-facing number
  • Server-side routing logic that implements operational hierarchy
  • Automatic failover to backup operators when primary is unavailable
  • Audit trail of all message relays (customer ID, timestamp, operator, routing decision)
  • Zero changes to customer experience or existing dispatch workflows

Architecture & Technical Details

API Gateway & Lambda Foundation

The routing logic lives in a new Lambda function deployed to the dashboard.quickdumpnow.com stack. We added four new POST routes to the API Gateway:

POST /api/sms/incoming      — receives webhooks from Twilio
POST /api/sms/forward       — internal relay to next operator
POST /api/sms/audit         — logs routing decisions to maintenance.json
POST /api/sms/status        — health check for monitoring

Each endpoint also answers CORS preflight requests (OPTIONS method) so that the dashboard frontend can make cross-origin calls.
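
A minimal sketch of how one Lambda handler might dispatch the four routes and answer CORS preflight requests. It assumes an API Gateway proxy integration that passes httpMethod and path in the event. The allowed origin, the ROUTES table, and the placeholder handlers are hypothetical and stand in for the real implementation.

```python
import json

# Assumption: the dashboard frontend is served over HTTPS from this origin.
ALLOWED_ORIGIN = "https://dashboard.quickdumpnow.com"

CORS_HEADERS = {
    "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    "Access-Control-Allow-Methods": "POST,OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}


def _response(status, payload):
    """Build a proxy-integration response that always carries the CORS headers."""
    return {"statusCode": status, "headers": dict(CORS_HEADERS), "body": json.dumps(payload)}


# Hypothetical placeholder handlers; real ones would do the actual work.
def handle_incoming(event):
    return _response(200, {"route": "incoming"})


def handle_forward(event):
    return _response(200, {"route": "forward"})


def handle_audit(event):
    return _response(200, {"route": "audit"})


def handle_status(event):
    return _response(200, {"status": "healthy"})


ROUTES = {
    "/api/sms/incoming": handle_incoming,
    "/api/sms/forward": handle_forward,
    "/api/sms/audit": handle_audit,
    "/api/sms/status": handle_status,
}


def lambda_handler(event, context):
    method = event.get("httpMethod", "")
    path = event.get("path", "")
    if path not in ROUTES:
        return _response(404, {"error": "unknown route"})
    if method == "OPTIONS":
        # Preflight: no body, only the CORS headers.
        return {"statusCode": 204, "headers": dict(CORS_HEADERS), "body": ""}
    if method != "POST":
        return _response(405, {"error": "method not allowed"})
    return ROUTES[path](event)
```

Keeping the routes in a dictionary means a fifth endpoint would only need one new entry and one new handler.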

Twilio Webhook Integration

When a customer texts QDN's Twilio number, Twilio immediately POSTs to /api/sms/incoming with:

  • From — customer's phone number
  • To — QDN's published Twilio number
  • Body — message content
  • MessageSid — Twilio's unique message ID

The Lambda handler parses this payload, looks up the customer in our database by phone number, and determines the current primary operator from the job state machine. It then calls the Twilio REST API to send an SMS to that operator's personal number with context: 📦 [Customer Name]: [Original Message].
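
The parsing and formatting steps could look roughly like the sketch below. Twilio sends webhook parameters as a form-encoded POST body by default, and API Gateway may deliver that body base64-encoded, so both cases are handled. The function names are illustrative, and the customer lookup and the Twilio REST call are left out.

```python
import base64
from urllib.parse import parse_qs

# The four fields the relay reads from the webhook payload.
WEBHOOK_FIELDS = ("From", "To", "Body", "MessageSid")


def parse_twilio_webhook(event):
    """Extract the webhook fields from an API Gateway proxy event."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    fields = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; take the first value or an empty string.
    return {name: fields.get(name, [""])[0] for name in WEBHOOK_FIELDS}


def format_relay_text(customer_name, original_body):
    """Prefix the forwarded message with the customer context."""
    return f"📦 {customer_name}: {original_body}"
```

For example, a body of From=%2B16195551234&Body=Need+pickup+today parses to a From of +16195551234 and a Body of "Need pickup today".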

Routing Logic & Failover

The operator chain is defined in a configuration block within the Lambda environment:

DISPATCH_CHAIN = {
  "primary_operator_id": "sergio_main_number",
  "backup_operators": [
    "sergio_backup_number",
    "emergency_contact_number"
  ]
}
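
One way to consume this block, assuming it is stored as a JSON string in a Lambda environment variable (the variable name DISPATCH_CHAIN is an assumption here), is to flatten it into a single ordered list of targets:

```python
import json


def load_chain(environ):
    """Read the chain from an environment mapping such as os.environ."""
    return json.loads(environ["DISPATCH_CHAIN"])


def ordered_targets(chain):
    """Primary first, then backups in the order they are listed."""
    return [chain["primary_operator_id"], *chain.get("backup_operators", [])]
```

With the configuration above, ordered_targets yields sergio_main_number, then sergio_backup_number, then emergency_contact_number.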

When the primary operator doesn't acknowledge receipt within a configurable window (currently 60 seconds), the Lambda automatically attempts the first backup number. The sequence is:

  1. Writing a message record with status: "pending_ack" to DynamoDB
  2. Scheduling a CloudWatch Events rule that invokes a "retry" Lambda after 60 seconds
  3. If no acknowledgment record exists, the retry Lambda calls Twilio to send to the backup number
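
The decision the retry Lambda makes in step 3 can be sketched as a pure function. The record fields routed_to and sent_at are hypothetical names for illustration; only the pending_ack status comes from the steps above. DynamoDB reads and the Twilio call are omitted.

```python
from datetime import datetime, timedelta, timezone

ACK_WINDOW = timedelta(seconds=60)  # the configurable acknowledgment window


def next_target(record, targets, now):
    """Return the next operator to try, or None if no retry is needed."""
    if record["status"] != "pending_ack":
        return None  # already acknowledged, nothing to do
    sent_at = datetime.fromisoformat(record["sent_at"])
    if now - sent_at < ACK_WINDOW:
        return None  # the current operator still has time to acknowledge
    position = targets.index(record["routed_to"])
    if position + 1 >= len(targets):
        return None  # chain exhausted
    return targets[position + 1]
```

Returning None when the chain is exhausted leaves it to the caller to decide what happens next, for example raising an alert.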

Audit & Compliance

Every relay is logged to /maintenance/data/maintenance.json (S3 URI: s3://qdn-maintenance/maintenance.json) using this schema:

{
  "timestamp": "2024-01-15T14:23:45Z",
  "message_sid": "SM1234...",
  "customer_id": "cust_5678",
  "customer_name": "Acme Waste Inc.",
  "original_from": "+16195551234",
  "routed_to_operator": "sergio_main",
  "routing_reason": "primary_operator_assigned_to_job",
  "acknowledged": true,
  "ack_timestamp": "2024-01-15T14:23:52Z"
}

This log enables operator performance tracking, customer service recovery (replay message history), and compliance audits.
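
A sketch of how a record in this schema might be built and added to the log. It assumes maintenance.json holds a JSON array of records, which the source does not state. S3 objects cannot be appended to in place, so the whole file would have to be read, modified, and written back; concurrent writers would need some form of coordination that is not shown here.

```python
import json
from datetime import datetime, timezone


def _iso_utc(ts):
    """Format an aware datetime in the same style as the schema timestamps."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_audit_record(message_sid, customer_id, customer_name, original_from,
                       operator, reason, sent_at, ack_at=None):
    """Assemble one audit entry; ack_at is None until the operator acknowledges."""
    return {
        "timestamp": _iso_utc(sent_at),
        "message_sid": message_sid,
        "customer_id": customer_id,
        "customer_name": customer_name,
        "original_from": original_from,
        "routed_to_operator": operator,
        "routing_reason": reason,
        "acknowledged": ack_at is not None,
        "ack_timestamp": _iso_utc(ack_at) if ack_at else None,
    }


def append_audit_record(existing_json, record):
    """Return the new file contents with the record added to the array."""
    records = json.loads(existing_json) if existing_json.strip() else []
    records.append(record)
    return json.dumps(records, indent=2)
```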

Infrastructure & Deployment

S3 & CloudFront Updates

The dashboard Lambda functions are packaged with AWS SAM and served behind the dashboard.quickdumpnow.com CloudFront distribution. The deployment pipeline references the distribution ID to invalidate the /api/* cache paths whenever code changes.

A new CloudFront function was added to the default behavior to rewrite incoming /track requests to /track/index.html, so the customer tracking page loads correctly without an explicit file extension.

DNS & Domain Wiring

The QDN domain (quickdumpnow.com) has a Route53 A record pointing to the CloudFront distribution. The Twilio-side configuration uses dashboard.quickdumpnow.com/api/sms/incoming as the webhook URL, so Twilio's infrastructure can reach the Lambda through the public CloudFront edge network.

Why CloudFront? It provides DDoS protection, geographic load distribution, and caching for static assets (job tracking page). The API endpoints use Cache-Control: no-cache headers to bypass CloudFront's cache layer while still benefiting from edge security.

Environment & Credentials

Twilio credentials are stored in /Users/cb/Documents/repos/.secrets/repos.env (file mode 600, owned by deployment user) with the format:

TWILIO_ACCOUNT_SID=AC5260bca...
TWILIO_AUTH_TOKEN=aa3355b73...
TWILIO_API_KEY=SK7c7a1c0...
TWILIO_API_SECRET=u2d5U6646...
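
A small sketch of reading a KEY=VALUE file in this format and checking that all four Twilio settings are present before deployment. The parser and the missing_keys helper are illustrative, not part of the existing tooling, and the values shown in the test data would be placeholders, never real credentials.

```python
REQUIRED_KEYS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_API_KEY",
    "TWILIO_API_SECRET",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """List the required settings that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Failing fast on a missing key at deploy time is cheaper than discovering it when the first customer message arrives.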

The Lambda execution role (IAM role: qdn-lambda-sms-relay-role) has permissions to:

  • Read from arn:aws:s3:::qdn-maintenance/maintenance.json
  • Write audit records (PutObject)
  • Schedule retries through CloudWatch Events

Beyond these, the role has no direct S3 or DynamoDB write permissions (API Gateway enforces auth).

Key Decisions & Trade-offs

Webhook vs. Polling

Decision: Twilio webhooks (push) instead of Lambda polling Twilio's message API.

Why: Webhooks deliver messages with sub-second latency, cost nothing when idle, and match Twilio's design patterns. Polling would require continuous Lambda invocations and DynamoDB queries, increasing cost by roughly 3x with no latency improvement.

CloudWatch Events for Failover

Decision: Use CloudWatch Events (now EventBridge) to schedule retry Lambdas rather than embedding sleep/retry logic in the handler.

Why: Lambda's 15-minute timeout leaves plenty of room for handler execution, but holding an invocation open for 60 seconds just to wait wastes execution time, and Lambda bills for every millisecond of it. EventBridge decouples the retry concern, makes the backoff window easy to tune, and exposes the schedule in the AWS console.
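
As an illustration of the scheduling side, the sketch below builds a one-time cron expression for a rule that should fire after the acknowledgment window. Rule schedules are evaluated in UTC with one-minute granularity, so the time is rounded up to the next whole minute, and a nominal 60-second window would in practice land somewhere between one and two minutes. The function name is hypothetical, and creating the rule itself is not shown.

```python
from datetime import datetime, timedelta, timezone


def retry_cron_expression(now, delay_seconds=60):
    """Return a cron(...) expression for the first whole minute at or after now + delay."""
    due = now.astimezone(timezone.utc) + timedelta(seconds=delay_seconds)
    if due.second or due.microsecond:
        # Round up so the rule never fires before the window has elapsed.
        due = due.replace(second=0, microsecond=0) + timedelta(minutes=1)
    # Field order: minutes hours day-of-month month day-of-week year.
    return f"cron({due.minute} {due.hour} {due.day} {due.month} ? {due.year})"
```

If tighter timing than a minute matters, a delay-based mechanism would be a better fit than a cron rule, which is a trade-off worth weighing against the simplicity described above.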