```html

Building a Multi-Carrier SMS Relay for QuickDumpNow: Twilio Integration & Lambda Architecture

What Was Done

We implemented a cascading SMS relay system for QuickDumpNow (QDN) to route customer notifications through multiple carriers when the primary line becomes unavailable. The system integrates Twilio as the relay backbone, connects to existing AWS Lambda functions, and uses CloudFront for API routing. This solves a critical operational gap: previously, if Sergio's main Twilio number went down, there was no automatic failover path to his backup carrier.

The Problem

QDN's dispatch workflow requires real-time customer notifications (job status updates, arrival alerts, completion confirmations). The original architecture sent all SMS through a single Twilio number, so when that number or its carrier had issues, customers received no updates. Because failover at the carrier level wasn't possible, we needed relay logic at the application layer.

Technical Architecture

Lambda Function Updates

The core relay logic lives in /Users/cb/Documents/repos/sites/dashboard.quickdumpnow.com/lambda/lambda_function.py. We extended the existing message-append function with relay fallback:

def send_notification(job_id, message_text, phone, primary_carrier=None):
    """
    Send customer notification with carrier failover.
    
    - Try primary Twilio number first
    - On failure, route through backup via secondary carrier
    - Log all attempts to maintenance.json for audit trail
    """

The function now:

  • Accepts an optional primary_carrier parameter to route through Twilio's API
  • Catches carrier-level failures and retries through backup routes
  • Persists all relay attempts to the maintenance.json state file in S3
  • Returns a structured response indicating which path succeeded
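The failover behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the actual body of send_notification: the names send_with_failover, RelayResult, and the route tuples are hypothetical, and the senders are injected as plain callables so the logic runs without Twilio or network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A "sender" is any callable that delivers one SMS and raises on failure.
# In production it would wrap a Twilio call; here it is injected (assumption).
Sender = Callable[[str, str], None]


@dataclass
class RelayResult:
    delivered: bool
    path: Optional[str]  # name of the route that succeeded, if any
    attempts: List[dict] = field(default_factory=list)


def send_with_failover(phone: str, text: str,
                       routes: List[Tuple[str, Sender]]) -> RelayResult:
    """Try each (name, sender) route in order and stop at the first success."""
    result = RelayResult(delivered=False, path=None)
    for name, sender in routes:
        try:
            sender(phone, text)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            result.attempts.append({"route": name, "ok": False, "error": str(exc)})
            continue
        result.attempts.append({"route": name, "ok": True, "error": None})
        result.delivered = True
        result.path = name
        break
    return result


# Usage: the primary route fails, so the backup route delivers the message.
def broken_primary(phone: str, text: str) -> None:
    raise RuntimeError("carrier unavailable")


sent = []


def working_backup(phone: str, text: str) -> None:
    sent.append((phone, text))


result = send_with_failover(
    "+15555550100",
    "Your driver is 10 minutes away",
    [("primary", broken_primary), ("backup", working_backup)],
)
```

The attempts list records every route tried, which is the information the audit trail needs regardless of which path finally succeeded.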

API Gateway Routes

Four new endpoints were added to API Gateway for the QDN distribution (resource ID varies per environment):

  • POST /notify/sms — Send customer SMS with automatic failover
  • POST /notify/voice — Send voice notification (future capability)
  • POST /relay/status — Query relay health and routing state
  • OPTIONS /notify/* — CORS preflight for browser-based calls

Each route maps to the same Lambda function, which distinguishes them by an action path parameter. CORS headers are configured to allow two origins: dashboard.quickdumpnow.com and the customer-facing quickdumpnow.com.
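One way a single Lambda function could dispatch these routes is sketched below. It assumes an API Gateway proxy-style event carrying httpMethod, path, and body keys; the handler names and response payloads are placeholders, not the real QDN handlers.

```python
import json


# Placeholder handlers (hypothetical); the real ones would send SMS, etc.
def handle_sms(body: dict) -> dict:
    return {"action": "sms", "accepted": True}


def handle_voice(body: dict) -> dict:
    return {"action": "voice", "accepted": False, "note": "future capability"}


def handle_status(body: dict) -> dict:
    return {"action": "status", "healthy": True}


ROUTES = {
    ("POST", "/notify/sms"): handle_sms,
    ("POST", "/notify/voice"): handle_voice,
    ("POST", "/relay/status"): handle_status,
}


def lambda_handler(event, context):
    method = event.get("httpMethod", "")
    path = event.get("path", "")

    # CORS preflight: headers are added at the Gateway level in this setup,
    # so the function only needs to answer with an empty success response.
    if method == "OPTIONS" and path.startswith("/notify/"):
        return {"statusCode": 204, "body": ""}

    handler = ROUTES.get((method, path))
    if handler is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown route"})}

    body = json.loads(event.get("body") or "{}")
    return {"statusCode": 200, "body": json.dumps(handler(body))}
```

Keeping the routes in one table means adding an endpoint is a one-line change in the function, plus the matching API Gateway route.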

CloudFront Distribution Setup

The QDN CloudFront distribution (ID: varies by env) now includes:

  • A Lambda@Edge function attached to the default behavior, rewriting /track requests to /track/index.html
  • Two origins: one pointing to the S3 bucket for static assets, one for the API Gateway (origin path: /prod)
  • Cache behavior for /api/* with TTL set to 0 (no caching for API responses)

The tracking page lives at /Users/cb/Documents/repos/sites/quickdumpnow.com/track/index.html and is deployed to the S3 bucket used by the CloudFront distribution.
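A rewrite of this kind can be sketched as a small Lambda@Edge handler. The sketch assumes a Python runtime and the standard CloudFront request event shape (Records[0].cf.request); whether the deployed function matches only /track and /track/ is an assumption.

```python
def lambda_handler(event, context):
    """Rewrite /track and /track/ to the tracking page's index document."""
    request = event["Records"][0]["cf"]["request"]

    if request["uri"] in ("/track", "/track/"):
        request["uri"] = "/track/index.html"

    # Returning the (possibly modified) request lets CloudFront continue
    # to the origin with the new URI.
    return request


# Usage with a hand-built event containing only the fields the handler reads.
sample_event = {"Records": [{"cf": {"request": {"uri": "/track", "method": "GET"}}}]}
rewritten = lambda_handler(sample_event, None)
```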

Data Flow & State Management

The maintenance.json file (seeded at /tmp/maintenance_seed.json before S3 upload) serves as the relay audit log. Every notification attempt is logged with:

  • Timestamp and job ID
  • Primary carrier attempt (success/failure reason)
  • Fallback carrier attempt (if triggered)
  • Final delivery status

This file is stored in the QDN data bucket and can be queried via the POST /relay/status endpoint to monitor relay health over time.
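As an illustration of what one audit entry might look like, the sketch below builds a record with the four pieces of information listed above and appends it to a JSON list. The field names and the list-shaped layout of maintenance.json are assumptions, and the S3 read and write steps are left out so the example stays self-contained.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_record(job_id: str, primary_error: Optional[str],
                 fallback_error: Optional[str] = None,
                 fallback_tried: bool = False) -> dict:
    """Build one audit entry. All field names here are hypothetical."""
    delivered = primary_error is None or (fallback_tried and fallback_error is None)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "job_id": job_id,
        "primary": {"ok": primary_error is None, "reason": primary_error},
        "fallback": ({"ok": fallback_error is None, "reason": fallback_error}
                     if fallback_tried else None),
        "status": "delivered" if delivered else "failed",
    }


def append_record(log_text: str, record: dict) -> str:
    """Return the log JSON with one more entry; an empty log starts a new list."""
    entries = json.loads(log_text) if log_text.strip() else []
    entries.append(record)
    return json.dumps(entries, indent=2)


# Usage: one direct delivery, then one delivery that needed the fallback route.
log = append_record("", build_record("job-1", None))
log = append_record(log, build_record("job-2", "carrier unavailable",
                                      fallback_tried=True))
```

Because an S3 object is rewritten whole rather than appended to, two concurrent invocations following this read-modify-write pattern could overwrite each other's entries; that is worth keeping in mind if notification volume grows.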

Twilio Integration Details

Twilio credentials are stored securely in /Users/cb/Documents/repos/.secrets/repos.env (mode 600, not in version control):

  • TWILIO_ACCOUNT_SID: Identifies the Twilio account; sent with every API request, including admin operations (webhook setup, number provisioning)
  • TWILIO_AUTH_TOKEN: The secret paired with the account SID; used by the SDK at runtime (sending SMS/voice)
  • API key pair: For programmatic access to rate-limiting and analytics endpoints

The Lambda function imports the Twilio Python SDK and initializes the client at function startup. All SMS sends include metadata headers (job ID, customer name) for routing and analytics.
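Client initialization at startup might look like the sketch below. It assumes the two variables from repos.env are exposed to the function as Lambda environment variables, which the write-up does not state; load_twilio_credentials and get_client are hypothetical helper names. The Twilio import is deferred so the credential check can run even where the SDK is not installed.

```python
import os

REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def load_twilio_credentials(env=os.environ):
    """Return (account_sid, auth_token), failing loudly if either is missing."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("missing Twilio settings: " + ", ".join(missing))
    return env["TWILIO_ACCOUNT_SID"], env["TWILIO_AUTH_TOKEN"]


_client = None  # cached across warm invocations of the same Lambda container


def get_client(env=os.environ):
    """Create the Twilio REST client once and reuse it afterwards."""
    global _client
    if _client is None:
        from twilio.rest import Client  # deferred import; requires the twilio package
        account_sid, auth_token = load_twilio_credentials(env)
        _client = Client(account_sid, auth_token)
    return _client
```

Caching the client at module level means warm invocations skip the setup cost, while a missing variable surfaces as a clear error on the first call rather than as an opaque authentication failure.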

Deployment Process

Lambda deployment uses a multi-step process:

# Package function + dependencies
zip -r lambda_function.zip lambda_function.py twilio/ boto3/ ...

# Upload to S3 staging bucket
aws s3 cp lambda_function.zip s3://qdn-lambda-staging/

# Update Lambda function code
aws lambda update-function-code \
  --function-name qdn-data-crud \
  --s3-bucket qdn-lambda-staging \
  --s3-key lambda_function.zip

# Verify deployment
aws lambda invoke --function-name qdn-data-crud response.json

After the Lambda is updated, API Gateway picks up the new code automatically. CloudFront's cache is then invalidated for /api/* patterns so that edge nodes do not serve stale responses.

Key Decisions

  • Lambda@Edge for URL rewriting: The /track to /track/index.html rewrite happens at the edge (in a Lambda@Edge function, which must be created in us-east-1), avoiding repeated S3 404 responses and improving perceived latency.
  • Synchronous relay with timeout: The send_notification function uses synchronous calls (not async queues) to provide immediate feedback to the dashboard. A 5-second timeout ensures the failover attempt doesn't block job creation indefinitely.
  • maintenance.json as single source of truth: Rather than logging to CloudWatch or a separate database, relay state is persisted to the same S3 bucket where job and customer data live. This simplifies audit queries and keeps the data model unified.
  • CORS on API Gateway, not Lambda: CORS headers are configured at the Gateway level, not in Lambda response headers. This ensures consistent CORS behavior across all environments and simplifies local testing.
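The synchronous-with-timeout decision can be illustrated with a small helper that bounds how long a single send may take. This is one possible approach using only the standard library, not necessarily how send_notification enforces its 5-second limit; call_with_timeout is a hypothetical name.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a worker thread; raise TimeoutError after timeout_s seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread is not killed; it finishes in the background.
        raise TimeoutError(f"no response within {timeout_s}s") from None
    finally:
        # Do not block the caller waiting for a slow worker to finish.
        executor.shutdown(wait=False)


# Usage: a fast call returns its value, a slow one is abandoned.
def fast_send(phone):
    return f"sent to {phone}"


def slow_send(phone):
    time.sleep(0.3)
    return f"sent to {phone}"


fast_result = call_with_timeout(fast_send, 5, "+15555550100")

try:
    call_with_timeout(slow_send, 0.05, "+15555550100")
    timed_out = False
except TimeoutError:
    timed_out = True
```

A timed-out primary attempt can then be treated like any other failure and handed to the backup route, so the dashboard gets an answer within a bounded time.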

What's Next

With the relay infrastructure in place, the following steps are agent-actionable:

  • Provision backup phone numbers: Claim secondary Twilio numbers (one per backup carrier) and test failover with synthetic job events.
  • Smoke test end-to-end: Create a test job in the QDN dashboard, trigger a notification, verify it reaches the customer through the primary number, then simulate carrier failure and verify failover.
  • Set up CloudWatch alarms: Monitor relay failure rates via the POST /relay/status endpoint. Alert Sergio when primary carrier failover is triggered more than 3 times in