Building a Multi-Carrier SMS Relay for QuickDumpNow: Twilio Integration & Lambda Architecture
What Was Done
We implemented a cascading SMS relay system for QuickDumpNow (QDN) to route customer notifications through multiple carriers when the primary line becomes unavailable. The system integrates Twilio as the relay backbone, connects to existing AWS Lambda functions, and uses CloudFront for API routing. This solves a critical operational gap: previously, if Sergio's main Twilio number went down, there was no automatic failover path to his backup carrier.
The Problem
QDN's dispatch workflow requires real-time customer notifications (job status updates, arrival alerts, completion confirmations). The original architecture sent all SMS through a single Twilio number. When that number or its carrier encountered issues, customers received no updates. Because carrier-level failover wasn't possible, we needed application-layer relay logic.
Technical Architecture
Lambda Function Updates
The core relay logic lives in /Users/cb/Documents/repos/sites/dashboard.quickdumpnow.com/lambda/lambda_function.py. We extended the existing message-append function with relay fallback:
def send_notification(job_id, message_text, phone, primary_carrier=None):
    """
    Send customer notification with carrier failover.
    - Try primary Twilio number first
    - On failure, route through backup via secondary carrier
    - Log all attempts to maintenance.json for audit trail
    """
The function now:
- Accepts an optional primary_carrier parameter to route through Twilio's API
- Catches carrier-level failures and retries through backup routes
- Persists all relay attempts to the maintenance.json state file in S3
- Returns a structured response indicating which path succeeded
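The failover behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the body of the deployed send_notification: the names send_with_failover, Attempt, and RelayResult are hypothetical, and the carrier senders are injected as plain callables so the logic runs without network access. In production each sender would presumably wrap a Twilio call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A sender takes (phone, message_text) and returns a provider message ID,
# raising an exception on failure. Hypothetical interface for this sketch.
Sender = Callable[[str, str], str]


@dataclass
class Attempt:
    carrier: str
    ok: bool
    detail: str  # message ID on success, error text on failure


@dataclass
class RelayResult:
    job_id: str
    delivered: bool
    path: Optional[str]  # name of the carrier that succeeded, if any
    attempts: List[Attempt] = field(default_factory=list)


def send_with_failover(job_id: str, message_text: str, phone: str,
                       carriers: List[Tuple[str, Sender]]) -> RelayResult:
    """Try each (name, sender) pair in order and stop at the first success."""
    result = RelayResult(job_id=job_id, delivered=False, path=None)
    for name, sender in carriers:
        try:
            message_id = sender(phone, message_text)
        except Exception as exc:  # any carrier error moves on to the next route
            result.attempts.append(Attempt(name, False, str(exc)))
            continue
        result.attempts.append(Attempt(name, True, message_id))
        result.delivered = True
        result.path = name
        break
    return result


# Demo with stand-in senders: the primary fails, the backup succeeds.
def failing_primary(phone, text):
    raise RuntimeError("carrier unavailable")


def working_backup(phone, text):
    return "SM-backup-001"


outcome = send_with_failover(
    "job-42", "Driver is 10 minutes away", "+15555550100",
    [("primary", failing_primary), ("backup", working_backup)],
)
```

Recording every attempt, including the failed ones, is what lets the structured response report which path succeeded and why the earlier ones did not.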
API Gateway Routes
Four new endpoints were added to API Gateway for the QDN distribution (resource ID varies per environment):
- POST /notify/sms: Send customer SMS with automatic failover
- POST /notify/voice: Send voice notification (future capability)
- POST /relay/status: Query relay health and routing state
- OPTIONS /notify/*: CORS preflight for browser-based calls
Each route maps to the same Lambda function with different action path parameters. CORS headers are configured to allow dashboard.quickdumpnow.com and the customer-facing quickdumpnow.com origin.
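One way a single Lambda function can serve several routes is a small dispatch table keyed on method and path. The sketch below assumes a REST API with Lambda proxy integration (so the event carries httpMethod, path, and body) and uses stub handlers with hypothetical names. The real function may dispatch differently, and the path it sees may carry a stage or /api prefix. CORS headers are deliberately not set here, since the post configures them at the Gateway.

```python
import json


# Stub handlers; hypothetical stand-ins for the real relay logic.
def handle_sms(body):
    return {"action": "sms", "accepted": True}


def handle_voice(body):
    return {"action": "voice", "accepted": False, "note": "future capability"}


def handle_status(body):
    return {"action": "status", "healthy": True}


ROUTES = {
    ("POST", "/notify/sms"): handle_sms,
    ("POST", "/notify/voice"): handle_voice,
    ("POST", "/relay/status"): handle_status,
}


def _response(status_code, payload):
    return {"statusCode": status_code, "body": json.dumps(payload)}


def lambda_handler(event, context):
    """Dispatch an API Gateway proxy event to the matching action handler."""
    handler = ROUTES.get((event.get("httpMethod", ""), event.get("path", "")))
    if handler is None:
        return _response(404, {"error": "unknown route"})
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "request body is not valid JSON"})
    return _response(200, handler(body))
```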
CloudFront Distribution Setup
The QDN CloudFront distribution (ID: varies by env) now includes:
- A Lambda@Edge function attached to the default behavior, rewriting /track requests to /track/index.html
- Two origins: one pointing to the S3 bucket for static assets, one for the API Gateway (origin path: /prod)
- Cache behavior for /api/* with TTL set to 0 (no caching for API responses)
The tracking page lives at /Users/cb/Documents/repos/sites/quickdumpnow.com/track/index.html and is deployed to the S3 bucket used by the CloudFront distribution.
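A rewrite of this kind is typically a few lines. The sketch below assumes a Python Lambda@Edge handler on a request trigger, where the request sits under Records[0].cf.request. Treating a trailing-slash /track/ the same way is an assumption of this example, not something the deployed function is known to do.

```python
def lambda_handler(event, context):
    """Rewrite /track to the tracking page's index document."""
    request = event["Records"][0]["cf"]["request"]
    # "/track/" is handled as well; that variant is an assumption of this sketch.
    if request["uri"] in ("/track", "/track/"):
        request["uri"] = "/track/index.html"
    # Returning the request lets CloudFront continue on to the origin.
    return request
```

Matching on the exact path avoids accidentally rewriting unrelated URIs that merely start with /track.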
Data Flow & State Management
The maintenance.json file (seeded at /tmp/maintenance_seed.json before S3 upload) serves as the relay audit log. Every notification attempt is logged with:
- Timestamp and job ID
- Primary carrier attempt (success/failure reason)
- Fallback carrier attempt (if triggered)
- Final delivery status
This file is stored in the QDN data bucket and can be queried via the POST /relay/status endpoint to monitor relay health over time.
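As an illustration of the audit entries described above, here is a minimal sketch of building one record and appending it to the log. The field names, the top-level relay_attempts key, and the function names are assumptions made for this example, since the actual layout of maintenance.json is not shown. The S3 wrapper takes a boto3 S3 client as a parameter and assumes the seeded file already exists.

```python
import json
from datetime import datetime, timezone


def build_record(job_id, primary, fallback, final_status, now=None):
    """Assemble one audit entry; `fallback` is None when failover was not triggered."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "job_id": job_id,
        "primary": primary,      # e.g. {"ok": False, "reason": "timeout"}
        "fallback": fallback,    # same shape, or None
        "final_status": final_status,
    }


def append_to_log(raw, record):
    """Return the updated log as UTF-8 bytes, given the current file contents."""
    log = json.loads(raw)
    log.setdefault("relay_attempts", []).append(record)
    return json.dumps(log, indent=2).encode("utf-8")


def record_attempt(s3_client, bucket, key, record):
    """Read-modify-write the log object in S3 (assumes the object exists)."""
    current = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=append_to_log(current, record),
        ContentType="application/json",
    )
```

A read-modify-write on a single S3 object is not atomic, so two concurrent invocations could overwrite each other's entries. Whether that matters depends on how often notifications overlap.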
Twilio Integration Details
Twilio credentials are stored securely in /Users/cb/Documents/repos/.secrets/repos.env (mode 600, not in version control):
- TWILIO_ACCOUNT_SID: Used for admin operations (webhook setup, number provisioning)
- TWILIO_AUTH_TOKEN: Used for SDK runtime (sending SMS/voice)
- API key pair: For programmatic access to rate-limiting and analytics endpoints
The Lambda function imports the Twilio Python SDK and initializes the client at function startup. All SMS sends include metadata headers (job ID, customer name) for routing and analytics.
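Client initialization along these lines might look like the sketch below. It assumes the credentials reach the function as environment variables named as above, which is a guess about the deployment. The helper names load_twilio_config, get_client, and send_sms are hypothetical. The Twilio import is deferred so the configuration check can run even where the SDK is not installed.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TwilioConfig:
    account_sid: str
    auth_token: str


def load_twilio_config(env=None):
    """Read and validate credentials from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing Twilio settings: " + ", ".join(missing))
    return TwilioConfig(env["TWILIO_ACCOUNT_SID"], env["TWILIO_AUTH_TOKEN"])


_client = None  # reused across warm invocations of the same execution environment


def get_client():
    """Create the Twilio REST client once and cache it."""
    global _client
    if _client is None:
        from twilio.rest import Client  # requires the twilio package
        cfg = load_twilio_config()
        _client = Client(cfg.account_sid, cfg.auth_token)
    return _client


def send_sms(to_number, from_number, text):
    """Send one SMS and return Twilio's message SID."""
    message = get_client().messages.create(to=to_number, from_=from_number, body=text)
    return message.sid
```

Failing fast on missing settings surfaces a misconfigured environment at the first call rather than as an opaque authentication error from the API.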
Deployment Process
Lambda deployment uses a multi-step process:
# Package function + dependencies
zip -r lambda_function.zip lambda_function.py twilio/ boto3/ ...
# Upload to S3 staging bucket
aws s3 cp lambda_function.zip s3://qdn-lambda-staging/
# Update Lambda function code
aws lambda update-function-code \
    --function-name qdn-data-crud \
    --s3-bucket qdn-lambda-staging \
    --s3-key lambda_function.zip
# Verify deployment
aws lambda invoke --function-name qdn-data-crud response.json
After updating the Lambda, API Gateway automatically picks up the new code on the next invocation. CloudFront's cache is invalidated for /api/* patterns so that edge nodes do not serve stale API responses.
Key Decisions
- Lambda@Edge for URL rewriting: The /track → /track/index.html rewrite happens at the edge (in a Lambda@Edge function, which is deployed from us-east-1), avoiding repeated S3 404 responses and improving perceived latency.
- Synchronous relay with timeout: The send_notification function uses synchronous calls (not async queues) to provide immediate feedback to the dashboard. A 5-second timeout ensures the failover attempt doesn't block job creation indefinitely.
- maintenance.json as single source of truth: Rather than logging to CloudWatch or a separate database, relay state is persisted to the same S3 bucket where job and customer data live. This simplifies audit queries and keeps the data model unified.
- CORS on API Gateway, not Lambda: CORS headers are configured at the Gateway level, not in Lambda response headers. This ensures consistent CORS behavior across all environments and simplifies local testing.
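The synchronous-with-timeout decision can be illustrated with a small wrapper that bounds how long a carrier call may take. This is a sketch using the standard library's thread pool, and call_with_timeout is a hypothetical helper. The deployed function may instead rely on a timeout in the HTTP client, which is usually the simpler option.

```python
import concurrent.futures
import time

# A module-level pool is used instead of a `with` block on purpose: leaving a
# `with` block waits for running tasks, which would defeat the timeout.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_seconds, *args):
    """Run func(*args) and return (ok, value_or_error_text) within the time limit."""
    future = _POOL.submit(func, *args)
    try:
        return True, future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # cancel() only stops a call that has not started; a running call
        # keeps going in the background until it returns.
        future.cancel()
        return False, "timed out"
    except Exception as exc:
        return False, str(exc)


# Usage: a 5-second budget per attempt, matching the figure in the text.
ok, detail = call_with_timeout(lambda: "SM-primary-001", 5.0)
```

Because the abandoned call is not interrupted, this only bounds how long the caller waits. It does not guarantee that the slow carrier request never completes.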
What's Next
With the relay infrastructure in place, the following steps are agent-actionable:
- Provision backup phone numbers: Claim secondary Twilio numbers (one per backup carrier) and test failover with synthetic job events.
- Smoke test end-to-end: Create a test job in the QDN dashboard, trigger a notification, verify it reaches the customer through the primary number, then simulate carrier failure and verify failover.
- Set up CloudWatch alarms: Monitor relay failure rates via the POST /relay/status endpoint. Alert Sergio when primary carrier failover is triggered more than 3 times in