Building a Twilio Relay for Multi-Carrier SMS Cascading: Infrastructure & Integration Patterns
What Was Done
We integrated Twilio into the Quick Dump Now (QDN) dispatch workflow to get cascading call forwarding that QDN's carrier could not provide natively. QDN's primary dispatch line needed to ring Sergio's main number first and then fall back to a backup carrier line (858-335-4807), but the carrier couldn't configure that cascade on its side. Twilio's programmable SMS and voice APIs supply the missing abstraction layer.
The implementation centered on three moves:
- Credential management: Stored the Twilio Account SID, Auth Token, and API Key/Secret pair in /Users/cb/Documents/repos/.secrets/repos.env (mode 600), with a reference memory file to guide future sessions on which credential type to use when.
- Lambda integration point: Identified the qdn-data-crud Lambda as the operational anchor for dispatch state mutations, positioning Twilio calls within that execution boundary.
- State machine alignment: Mapped the QDN job status state machine to understand where relay triggers should live, specifically at transitions where external notification becomes necessary.
Technical Details: Credential Strategy
Twilio provisions credentials in distinct flavors, and choosing the wrong one causes runtime failures or security antipatterns:
- Account SID + Auth Token: The master authentication pair. Grants full account access. Appropriate for administrative setup, credential provisioning, and number management via REST API. Example use: listing provisioned numbers, fetching account details.
- API Key + Secret: Scoped, rotatable credentials suitable for application runtime. Better than embedding Account SID + Token in Lambda environment variables or application code. Preferred for SDK initialization within the data-crud Lambda.
We stored both pairs in repos.env so that:
- Local dev/audit scripts can use Account SID + Token for introspection and provisioning.
- The Lambda reads the API Key + Secret from its environment variables at runtime, following the principle of least privilege.
- Reference memory (/Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/reference_twilio_credentials.md) documents this split for future sessions.
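A minimal sketch of how a local audit script might load repos.env and pick the right pair. It assumes the file holds plain KEY=VALUE lines, and TWILIO_AUTH_TOKEN is an assumed variable name (only the API-key, account-SID, and from-number names appear elsewhere in this post). The loader refuses to read the file unless its mode is exactly 600, and the selector returns positional arguments in the order twilio.rest.Client expects.

```python
import os
import stat


def load_env_file(path):
    """Parse KEY=VALUE lines, refusing files not locked down to mode 600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o600:
        raise PermissionError(f"{path} has mode {oct(mode)}, expected 0o600")
    values = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and simple quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def twilio_client_args(env, purpose):
    """Return positional args for twilio.rest.Client(username, password, account_sid).

    purpose="admin"   -> Account SID + Auth Token (provisioning, number management)
    purpose="runtime" -> API Key + Secret, with the Account SID as third argument
    """
    if purpose == "admin":
        return (env["TWILIO_ACCOUNT_SID"], env["TWILIO_AUTH_TOKEN"])  # assumed name
    if purpose == "runtime":
        return (env["TWILIO_API_KEY"], env["TWILIO_API_SECRET"], env["TWILIO_ACCOUNT_SID"])
    raise ValueError(f"unknown purpose: {purpose!r}")
```

An audit script would then call `Client(*twilio_client_args(load_env_file(path), "admin"))`. Keeping the choice in one function means a script cannot silently fall back to the master credentials.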
Infrastructure: State Machine & Lambda Positioning
QDN's job lifecycle is tracked by maintenance.json (synced to S3, served via CloudFront). Job states include pending, assigned, in-progress, completed, and cancelled. The dispatch relay should trigger when:
- A job transitions to assigned (SMS to crew with job details and a magic link for claim/handoff).
- A job transitions to completed (confirmation SMS to stakeholders).
- A job hits an exception state (e.g., crew abandonment, timeout), which requires an escalation SMS.
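The trigger rules above can be expressed as a lookup table, which keeps the "should we notify?" decision a pure function that is easy to unit-test. This is a sketch: the audience and template labels and the two exception event names (crew_abandoned, timeout) are hypothetical; the five status values are the ones listed above.

```python
from dataclasses import dataclass

JOB_STATES = {"pending", "assigned", "in-progress", "completed", "cancelled"}


@dataclass(frozen=True)
class RelayAction:
    audience: str   # who receives the SMS (hypothetical labels)
    template: str   # which message body to render (hypothetical labels)


# State transitions that require an outbound SMS.
STATE_TRIGGERS = {
    "assigned": RelayAction("crew", "job_assigned_with_magic_link"),
    "completed": RelayAction("stakeholders", "job_completed_confirmation"),
}

# Exception events that escalate regardless of the current state (hypothetical names).
ESCALATION_EVENTS = {"crew_abandoned", "timeout"}


def relay_action_for(new_state=None, event=None):
    """Return the RelayAction to send, or None if no notification is needed."""
    if event in ESCALATION_EVENTS:
        return RelayAction("dispatch", "escalation")
    if new_state is not None and new_state not in JOB_STATES:
        raise ValueError(f"unknown job state: {new_state!r}")
    return STATE_TRIGGERS.get(new_state)
```

Transitions such as pending or in-progress fall through to None, so adding a new notification later means adding one table entry rather than another branch in the handler.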
The qdn-data-crud Lambda (source pulled into the repo during audit) already handles maintenance.json mutations. Adding Twilio relay logic here means:
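- No new Lambda needed. Twilio calls are outbound HTTPS requests, not AWS API calls, so the existing function needs the Twilio credentials and outbound network access rather than new execution-role permissions (if it runs inside a VPC, that means a route through a NAT gateway).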
- Use the Twilio helper library for Python to initialize a client with the API Key + Secret at module load, so warm invocations reuse it instead of re-creating it per request.
- Wrap relay calls in try/except so a transient Twilio failure cannot fail the invocation after the job mutation has already been written (best-effort: the call is synchronous, but its failure is logged and swallowed).
Example sketch of the integration point (update_maintenance_json and log_relay_event are the Lambda's own helpers, not shown; passing magic_link in as a parameter is an assumption):
```python
# In qdn-data-crud Lambda handler module
import os

from twilio.rest import Client

# Created once per cold start and reused by warm invocations.
# API-key auth takes (api_key_sid, api_key_secret, account_sid) in that order.
client = Client(
    os.environ['TWILIO_API_KEY'],
    os.environ['TWILIO_API_SECRET'],
    os.environ['TWILIO_ACCOUNT_SID'],
)


def update_job_and_notify(job_id, new_state, crew_phone, magic_link=None):
    # Authoritative write first. update_maintenance_json and log_relay_event
    # are helpers defined elsewhere in the Lambda (not shown).
    job = update_maintenance_json(job_id, new_state)

    # Best-effort relay: any Twilio or network error is logged, never raised.
    try:
        if new_state == 'assigned':
            message = client.messages.create(
                # magic_link as a parameter is an assumption; its source isn't shown.
                body=f"Job {job_id} assigned. Claim here: {magic_link}",
                from_=os.environ['TWILIO_FROM_NUMBER'],
                to=crew_phone,
            )
            log_relay_event(job_id, 'sms_sent', message.sid)
    except Exception as e:
        log_relay_event(job_id, 'sms_failed', str(e))

    # The job mutation stands regardless of the relay outcome.
    return job
```
Key Decisions
Why Twilio over direct carrier API: Carriers offer limited programmatic control; Twilio abstracts across multiple carriers and handles failover. For a cascading forward scenario, Twilio's programmable forwarding and webhook support let us define arbitrary logic (e.g., "try Sergio's number for 15 seconds, then fall back to 858-335-4807") without carrier-specific contracts.
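The "try Sergio's number for 15 seconds, then fall back" rule maps onto TwiML's `<Dial>` verb: `timeout` bounds the ring time, and Twilio POSTs a `DialCallStatus` parameter to the `action` URL when the dial ends. The sketch below assumes a webhook handler that returns these two documents; Sergio's number and the action URL are placeholders, and the backup is 858-335-4807 written in E.164 form. Using the action callback rather than two back-to-back `<Dial>` verbs matters: without an `action`, Twilio moves on to the next verb once the first dial ends, even if it was answered, so a caller still on the line after Sergio hangs up would be rung through to the backup. The Twilio helper library's VoiceResponse class can build the same XML; the standard library keeps this sketch dependency-free.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

BACKUP_NUMBER = "+18583354807"  # 858-335-4807 in E.164 form
RETRY_STATUSES = {"no-answer", "busy", "failed", "canceled"}


def first_leg_twiml(primary_number, action_url, timeout_s=15):
    """Ring the primary number; Twilio POSTs DialCallStatus to action_url when the Dial ends."""
    resp = Element("Response")
    dial = SubElement(resp, "Dial", timeout=str(timeout_s), action=action_url, method="POST")
    SubElement(dial, "Number").text = primary_number
    return tostring(resp, encoding="unicode")


def dial_action_twiml(dial_call_status, backup_number=BACKUP_NUMBER):
    """Handle the action callback: fall back only if the primary leg was not answered."""
    resp = Element("Response")
    if dial_call_status in RETRY_STATUSES:
        dial = SubElement(resp, "Dial")
        SubElement(dial, "Number").text = backup_number
    else:
        SubElement(resp, "Hangup")  # primary answered and the call has finished
    return tostring(resp, encoding="unicode")
```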
Why API Key + Secret for Lambda, not Account SID + Token: Account credentials are master keys; embedding them in Lambda environment variables violates least-privilege. API Keys can be rotated independently and can be scoped to specific operations (though Twilio's scoping is coarse). In a future iteration, we'd use AWS Secrets Manager to rotate these credentials without Lambda redeployment.
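A minimal sketch of that future iteration, assuming the Twilio credentials are stored as one JSON secret (the secret name qdn/twilio and its key names are hypothetical). Caching with a TTL keeps Secrets Manager calls off the hot path, and a rotated secret is picked up within `ttl_seconds` with no redeploy. In the Lambda, the client argument would be `boto3.client("secretsmanager")`, whose `get_secret_value(SecretId=...)` response carries the payload in `SecretString`.

```python
import json
import time


class CachedSecret:
    """Fetch a JSON secret from AWS Secrets Manager and cache it for ttl_seconds.

    sm_client is expected to be boto3.client("secretsmanager"); any object with a
    compatible get_secret_value(SecretId=...) method works, which keeps this testable.
    """

    def __init__(self, sm_client, secret_id, ttl_seconds=300, clock=time.monotonic):
        self._client = sm_client
        self._secret_id = secret_id
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        now = self._clock()
        # Refetch on first use or once the cached copy is older than the TTL.
        if self._value is None or now - self._fetched_at >= self._ttl:
            resp = self._client.get_secret_value(SecretId=self._secret_id)
            self._value = json.loads(resp["SecretString"])
            self._fetched_at = now
        return self._value
```

Because the Twilio client is built from these values, a handler using this would also need to rebuild its client when the returned credentials change.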
Why fire-and-forget for relay: QDN's job mutation (writing to S3) is authoritative and must not depend on Twilio availability. Inverting the dependency—relay is best-effort, logged separately—keeps the dispatch state machine resilient.
Why reference memory for credentials: Where each credential lives and which type to use changes rarely, but both are easy to forget or misuse. Storing a structured reference in the project memory ensures the next engineer (or future you) doesn't guess which credential type to use or where to find it.
What's Next
The blocking work:
- Lambda environment setup: Add TWILIO_API_KEY, TWILIO_API_SECRET, TWILIO_ACCOUNT_SID, and TWILIO_FROM_NUMBER to the qdn-data-crud Lambda's environment variables (via CloudFormation or Terraform, not the console UI).
- IAM role extension: If we migrate credentials to AWS Secrets Manager (recommended for production), confirm the Lambda's execution role allows secretsmanager:GetSecretValue on that secret.
- State machine integration: Extend update_job_and_notify() to handle all job state transitions, with unit tests covering relay success and failure paths.
- Smoke test: Create a QDN job end to end, confirm SMS delivery to a test phone, and verify that CloudWatch Logs show the relay events.
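- Carrier cascading config: Once the relay is live, define the Twilio forwarding rules for QDN's primary number (Sergio's number, then fallback 858-335-4807). This logic belongs in a separate Twilio voice webhook handler that returns TwiML; a Lambda authorizer only returns allow/deny decisions for API Gateway, so at most it could guard that handler's endpoint.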
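For the unit tests called out under state machine integration, the relay path can be exercised without network access by injecting a fake client. This sketch assumes a variant of update_job_and_notify that takes its dependencies as arguments (the in-handler version uses a module-level client and helpers instead). FakeClient mimics only the `client.messages.create(body=..., from_=..., to=...)` call and the `.sid` attribute the handler uses, and the dict store is a stand-in for maintenance.json; all of these names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    sid: str


@dataclass
class FakeMessages:
    """Stands in for client.messages; raises when fail=True."""
    fail: bool = False
    sent: list = field(default_factory=list)

    def create(self, body, from_, to):
        if self.fail:
            raise ConnectionError("simulated Twilio outage")
        self.sent.append((body, from_, to))
        return FakeMessage(sid=f"SM{len(self.sent)}")


@dataclass
class FakeClient:
    messages: FakeMessages


def update_job_and_notify(job_id, new_state, crew_phone, magic_link,
                          client, store, events, from_number):
    """Injectable variant: store is a dict standing in for maintenance.json."""
    store[job_id] = new_state  # authoritative mutation happens first
    try:
        if new_state == "assigned":
            msg = client.messages.create(
                body=f"Job {job_id} assigned. Claim here: {magic_link}",
                from_=from_number, to=crew_phone)
            events.append((job_id, "sms_sent", msg.sid))
    except Exception as exc:
        events.append((job_id, "sms_failed", str(exc)))
    return store[job_id]


def test_relay_success():
    store, events = {}, []
    client = FakeClient(FakeMessages())
    update_job_and_notify("J1", "assigned", "+15555550100", "https://example.invalid/claim",
                          client, store, events, "+15555550199")
    assert store["J1"] == "assigned"
    assert events == [("J1", "sms_sent", "SM1")]


def test_relay_failure_does_not_block_mutation():
    store, events = {}, []
    client = FakeClient(FakeMessages(fail=True))
    result = update_job_and_notify("J2", "assigned", "+15555550100", "https://example.invalid/claim",
                                   client, store, events, "+15555550199")
    assert result == "assigned" and store["J2"] == "assigned"
    assert events == [("J2", "sms_failed", "simulated Twilio outage")]


def test_non_trigger_state_sends_nothing():
    store, events = {}, []
    messages = FakeMessages()
    update_job_and_notify("J3", "in-progress", "+15555550100", None,
                          FakeClient(messages), store, events, "+15555550199")
    assert messages.sent == [] and events == []
```

These run under pytest as-is; the failure-path test is the one that pins down the fire-and-forget contract.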
With credentials in place and the integration point identified, the Twilio relay build is unblocked and can be scaffolded.