Building a Task Notification System for Distributed Maintenance Operations
The Queen of San Diego maintenance tool (maintenance.queenofsandiego.com) needed a critical capability: surfacing new tasks added by crew members and notifying operations leadership in real time. This post documents the architecture, implementation decisions, and deployment strategy for a multi-component notification system that prioritizes tasks by criticality.
The Problem Statement
Travis was adding maintenance tasks to the system, but there was no mechanism for:
- Surfacing newly added tasks to the primary user interface
- Notifying Sergio and other operations personnel when tasks were added
- Determining notification cadence based on task criticality
- Maintaining clean separation between staging and production environments
The existing maintenance tool at /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/maintenance/staging-index.html was a static interface without persistence or notification capabilities.
Architecture Overview
We implemented a three-tier solution:
- Persistence Layer: MaintenancePersistence.gs, a Google Apps Script module handling Cloud Firestore operations
- Notification Layer: Lambda function triggered on task creation events
- UI Integration Layer: Modified staging HTML with task polling and real-time updates
Technical Implementation Details
1. Google Apps Script Persistence Module
Created MaintenancePersistence.gs to abstract Firestore operations:
// Function signatures for task persistence (bodies summarized as comments)
function saveMaintenanceTask(taskData) {
  // Validates task criticality enum
  // Stores in Firestore with timestamp
  // Returns document ID for reference
}
function getMaintenanceTasks(filters) {
  // Queries by status, criticality, date range
  // Returns sorted array with metadata
}
function updateTaskStatus(taskId, newStatus) {
  // Atomic update operation
  // Triggers notification handler via doPost
}
This module was added to the existing GAS project managing queenofsandiego.com. The project's script ID is recorded in .clasp.json, and the code is pushed to the script bound to the Google Sheet that backs the booking and maintenance systems.
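The validation and query behavior that the stub comments describe can be sketched in plain JavaScript, which also runs in Apps Script's V8 runtime. This is a minimal sketch under assumptions: the field names (`title`, `criticality`, `status`, `createdAt`), the default status `open`, and the `P1` to `P4` enum values (borrowed from the notification tiers) are hypothetical. The Firestore read and write are left out because the post does not say which Firestore client the GAS project uses, so `filterAndSortTasks` works on an in-memory array as a stand-in for a Firestore query.

```javascript
// Hypothetical criticality enum, matching the P1-P4 notification tiers.
const CRITICALITY_LEVELS = ["P1", "P2", "P3", "P4"];

// Validate and normalize a task before it is written to Firestore.
// Throws on bad input so the doPost handler can return an error response.
function validateMaintenanceTask(taskData) {
  if (!taskData || typeof taskData.title !== "string" || taskData.title.trim() === "") {
    throw new Error("Task title is required");
  }
  if (!CRITICALITY_LEVELS.includes(taskData.criticality)) {
    throw new Error(`Invalid criticality: ${taskData.criticality}`);
  }
  return {
    title: taskData.title.trim(),
    criticality: taskData.criticality,
    status: taskData.status || "open",
    createdAt: new Date().toISOString(), // timestamp assigned at save time
  };
}

// In-memory equivalent of the getMaintenanceTasks query: filter by status,
// criticality, and creation date range, then sort most critical first and
// newest first within the same criticality.
function filterAndSortTasks(tasks, filters = {}) {
  const from = filters.from ? Date.parse(filters.from) : -Infinity;
  const to = filters.to ? Date.parse(filters.to) : Infinity;
  return tasks
    .filter((t) => !filters.status || t.status === filters.status)
    .filter((t) => !filters.criticality || t.criticality === filters.criticality)
    .filter((t) => {
      const created = Date.parse(t.createdAt);
      return created >= from && created <= to;
    })
    .sort(
      (a, b) =>
        CRITICALITY_LEVELS.indexOf(a.criticality) - CRITICALITY_LEVELS.indexOf(b.criticality) ||
        Date.parse(b.createdAt) - Date.parse(a.createdAt)
    );
}
```

Rejecting an unknown criticality at save time keeps every stored task inside one of the four tiers, so the notification side never has to guess how to treat a task.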
2. Routing in BookingAutomation.gs
Extended the existing doPost handler in BookingAutomation.gs to route maintenance-related actions:
function doPost(e) {
const action = e.parameter.action;
if (action === 'log_maintenance') {
return handleMaintenanceLogCreation(e);
} else if (action === 'get_maintenance_tasks') {
return handleGetMaintenanceTasks(e);
} else if (action === 'update_task_status') {
return handleTaskStatusUpdate(e);
}
// ... other routing
}
This routing approach maintains backward compatibility while adding new maintenance endpoints. Each handler validates inputs and calls the appropriate MaintenancePersistence.gs function.
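As more maintenance actions are added, the `if`/`else if` chain can be replaced by a lookup table from action name to handler, with one place that rejects missing parameters. This is a sketch under assumptions: the required-parameter lists (`title`, `criticality`, `taskId`, `status`) and the handler bodies are hypothetical placeholders for calls into MaintenancePersistence.gs, and the function returns plain objects so it can run outside Apps Script.

```javascript
// Hypothetical route table: action name -> required parameters and handler.
// The handler bodies are placeholders for calls into MaintenancePersistence.gs.
const MAINTENANCE_ROUTES = {
  log_maintenance: {
    required: ["title", "criticality"],
    handle: (p) => ({ ok: true, action: "log_maintenance", title: p.title }),
  },
  get_maintenance_tasks: {
    required: [],
    handle: () => ({ ok: true, action: "get_maintenance_tasks", tasks: [] }),
  },
  update_task_status: {
    required: ["taskId", "status"],
    handle: (p) => ({ ok: true, action: "update_task_status", taskId: p.taskId }),
  },
};

// Dispatch a doPost-style event ({ parameter: {...} }) to its route.
// Returns null for actions this table does not own, so doPost can fall
// through to the existing booking routes.
function routeMaintenanceAction(e) {
  const params = (e && e.parameter) || {};
  // Own-property check so names like "toString" are not treated as routes.
  if (!Object.prototype.hasOwnProperty.call(MAINTENANCE_ROUTES, params.action)) {
    return null;
  }
  const route = MAINTENANCE_ROUTES[params.action];
  const missing = route.required.filter((name) => !params[name]);
  if (missing.length > 0) {
    return { ok: false, error: `Missing parameters: ${missing.join(", ")}` };
  }
  return route.handle(params);
}
```

Inside the real `doPost`, a non-null result would be returned as `ContentService.createTextOutput(JSON.stringify(result)).setMimeType(ContentService.MimeType.JSON)`, and a null result would continue to the existing booking routing.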
3. Notification Strategy
Patterns used by DevOps and maintenance teams, including AWS maintenance windows, Google SRE practices, and incident management systems, informed our notification approach:
- Critical Tasks (P1): Immediate email to jadasailing@gmail.com
- High Priority Tasks (P2): Digest email every 4 hours during operational windows
- Standard Tasks (P3): Daily digest at end of day (6 PM PT)
- Low Priority Tasks (P4): Weekly summary
This tiered approach reduces alert fatigue while ensuring critical issues surface immediately, a pattern described in Google's SRE book and common among teams running continuous operations.
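The four tiers can be expressed as a lookup table plus a check that decides whether a digest is due. This sketch assumes each digest tier is sent on a fixed interval measured from its last send (4 hours, 24 hours, 7 days); it deliberately leaves out the operational-window and 6 PM PT scheduling details, which need a time-zone-aware scheduler the post does not describe. The names `CADENCE`, `planNotifications`, and `isDigestDue` are hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical cadence table mirroring the P1-P4 tiers. Fixed intervals
// stand in for "every 4 hours", "daily", and "weekly".
const CADENCE = {
  P1: { mode: "immediate" },
  P2: { mode: "digest", intervalMs: 4 * HOUR_MS },
  P3: { mode: "digest", intervalMs: 24 * HOUR_MS },
  P4: { mode: "digest", intervalMs: 7 * 24 * HOUR_MS },
};

// Own-property lookup so unexpected values never match inherited keys.
function cadenceFor(criticality) {
  return Object.prototype.hasOwnProperty.call(CADENCE, criticality)
    ? CADENCE[criticality]
    : undefined;
}

// Split new tasks into ones to email immediately and per-tier digest queues.
function planNotifications(tasks) {
  const plan = { immediate: [], digests: { P2: [], P3: [], P4: [] } };
  for (const task of tasks) {
    const cadence = cadenceFor(task.criticality);
    if (!cadence) throw new Error(`Unknown criticality: ${task.criticality}`);
    if (cadence.mode === "immediate") plan.immediate.push(task);
    else plan.digests[task.criticality].push(task);
  }
  return plan;
}

// A digest tier is due once its interval has elapsed since the last send;
// lastSentMs === null means that tier has never been sent.
function isDigestDue(criticality, lastSentMs, nowMs) {
  const cadence = cadenceFor(criticality);
  if (!cadence || cadence.mode !== "digest") return false;
  return lastSentMs === null || nowMs - lastSentMs >= cadence.intervalMs;
}
```

A periodic job could call `isDigestDue` for each tier and send whatever has accumulated in that tier's queue, while P1 tasks bypass the queues entirely.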
4. Lambda Deployment for Notifications
Rather than handle all notifications synchronously in GAS, we deployed a Lambda function triggered by Cloud Firestore events. This follows the AWS Well-Architected Framework principle of decoupling components:
// Lambda trigger configuration
// Event source: Firestore collection "maintenance_tasks"
// Trigger type: Document create
// Function timeout: 60 seconds
// Memory: 256 MB
// Runtime: Node.js 18.x
// Function executes:
// 1. Extract task criticality from event
// 2. Query email roster from DynamoDB table
// 3. Build email template with task details
// 4. Send via SES to appropriate recipients
// 5. Log to CloudWatch for audit trail
The function reuses the execution role of the tips-box Lambda function already deployed in the account, which keeps permissions and auditing consistent across both functions.
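The five steps in the trigger configuration can be sketched as a Node.js handler whose core is a pure function that builds the input for SES's `SendEmail` operation (`Source`, `Destination.ToAddresses`, `Message.Subject.Data`, `Message.Body.Text.Data`). This is a minimal sketch under several assumptions: AWS Lambda has no built-in Cloud Firestore event source and the post does not say how events are relayed, so the `event.task` payload shape is hypothetical, as are the sender address and task fields. The DynamoDB roster lookup (step 2) and the SES send (step 4) are injected as functions rather than shown.

```javascript
// Hypothetical sender address; SES only sends from verified identities.
const SENDER = "maintenance-alerts@example.com";

// Steps 1 and 3: read the task from the (assumed) event payload and build
// an SES SendEmail input with the task details.
function buildTaskEmail(event, recipients) {
  const task = event && event.task; // assumed shape of the relayed event
  if (!task || !task.criticality) {
    throw new Error("Event does not contain a task with a criticality");
  }
  if (!Array.isArray(recipients) || recipients.length === 0) {
    throw new Error("No recipients for this task");
  }
  const body = [
    `Title: ${task.title}`,
    `Criticality: ${task.criticality}`,
    `Added by: ${task.createdBy || "unknown"}`,
    `Created: ${task.createdAt || "unknown"}`,
  ].join("\n");
  return {
    Source: SENDER,
    Destination: { ToAddresses: recipients },
    Message: {
      Subject: { Data: `[${task.criticality}] New maintenance task: ${task.title}`, Charset: "UTF-8" },
      Body: { Text: { Data: body, Charset: "UTF-8" } },
    },
  };
}

// Lambda-style entry point. getRecipients (step 2) and sendEmail (step 4)
// are injected so the logic can run without AWS credentials.
async function handleTaskCreated(event, { getRecipients, sendEmail }) {
  const criticality = event && event.task && event.task.criticality;
  const recipients = await getRecipients(criticality);
  const params = buildTaskEmail(event, recipients);
  await sendEmail(params);
  // Step 5: console output from a Lambda function lands in CloudWatch Logs.
  console.log(JSON.stringify({ event: "task_notification_sent", criticality, recipients: recipients.length }));
  return params;
}
```

In the deployed function, `sendEmail` would wrap `sesClient.send(new SendEmailCommand(params))` from `@aws-sdk/client-ses`; keeping that call at the edge makes the template logic testable on its own.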
Frontend UI Modifications
Updated staging-index.html with JavaScript to poll for new tasks and display them prominently:
- Added task polling interval (30-second check for new additions)
- Implemented "new tasks" banner that appears above the fold when tasks are added
- Added task criticality color coding (red for P1, orange for P2, yellow for P3, gray for P4)
- Integrated with existing authentication system to verify user permissions before displaying sensitive tasks
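The polling and banner behavior can be sketched as a function that remembers which task IDs it has already seen and reports only new ones on each tick. The sketch assumes a hypothetical `fetchTasks` function that resolves to an array of `{ id, criticality }` objects (for example, a call to the `get_maintenance_tasks` action) and a hypothetical `onNewTasks` callback that renders the banner; the color names follow the list above.

```javascript
// Color coding from the list above: red P1, orange P2, yellow P3, gray P4.
const CRITICALITY_COLORS = { P1: "red", P2: "orange", P3: "yellow", P4: "gray" };

// Return tasks whose IDs are not in seenIds, and record them as seen.
function collectNewTasks(seenIds, tasks) {
  const fresh = tasks.filter((task) => !seenIds.has(task.id));
  fresh.forEach((task) => seenIds.add(task.id));
  return fresh;
}

// Poll every intervalMs (30 s by default). The first successful poll only
// seeds the seen set, so tasks that already exist when the page loads do not
// trigger the banner. Returns a function that stops polling.
function startTaskPolling(fetchTasks, onNewTasks, intervalMs = 30000) {
  const seenIds = new Set();
  let seeded = false;
  const tick = async () => {
    try {
      const fresh = collectNewTasks(seenIds, await fetchTasks());
      if (seeded && fresh.length > 0) onNewTasks(fresh);
      seeded = true;
    } catch (err) {
      console.warn("Task poll failed; retrying on the next interval", err);
    }
  };
  tick();
  const timer = setInterval(tick, intervalMs);
  return () => clearInterval(timer);
}
```

Tracking IDs rather than comparing counts means a task that is added while another is removed still shows up as new.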
The staging HTML was deployed to the S3 bucket backing the CloudFront distribution for maintenance.queenofsandiego.com at path /staging/index.html, with CloudFront cache invalidation immediately after deployment.
Staging vs. Production Strategy
Until formal staging/production separation is implemented, testing follows this pattern:
- All email notifications during the testing phase are sent to jadasailing@gmail.com
- A configuration parameter in GAS controls the "environment" flag
- Lambda checks this flag before sending to the full recipient list
- The staging UI is served from a separate path and uses a staging Lambda function
This provides test isolation without requiring infrastructure duplication during the pilot phase.
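The environment check in the Lambda reduces to one function that decides who receives an email. This sketch assumes the flag reaches the function as a string (`"staging"` or `"production"`) and that the full roster is already loaded; how the Lambda reads the GAS configuration parameter is not described, and the function name and flag values are hypothetical.

```javascript
// Test inbox that receives every notification during the testing phase.
const TEST_RECIPIENT = "jadasailing@gmail.com";

// Decide the recipient list from the environment flag. Anything other than
// an explicit "production" is treated as staging, so a missing or misspelled
// flag can only ever email the test inbox.
function resolveRecipients(environment, fullRoster) {
  if (environment !== "production") return [TEST_RECIPIENT];
  // Drop empty entries and duplicates, keeping roster order.
  return [...new Set(fullRoster.filter(Boolean))];
}
```

Making staging the default means the risky case (emailing the whole roster) requires an explicit opt-in rather than relying on the flag being set correctly.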
Calendar Integration
Created a shared Google Calendar, "Jada Maintenance," accessible to jadasailing@gmail.com. The notification Lambda automatically creates calendar events for all P1 and P2 tasks, providing:
- Visual representation in calendar interfaces
- Integration with ops team's existing calendar workflows
- Automatic reminders 2 hours before estimated start time
This leverages Google Calendar's native sharing and notification capabilities rather than building custom calendar logic.
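For P1 and P2 tasks, the calendar entry can be built as an event resource in the shape the Google Calendar API v3 `events.insert` method accepts, with the default reminders replaced by a single popup 120 minutes before the start. This sketch assumes hypothetical task fields `estimatedStart` (an ISO 8601 timestamp) and `estimatedHours` (defaulting to one hour); how the Lambda authenticates to the calendar is not described in the post and is left out.

```javascript
// Pacific Time, matching the PT schedule used for the daily digest.
const CALENDAR_TIME_ZONE = "America/Los_Angeles";

// Build a Calendar API v3 event resource for P1/P2 tasks; returns null for
// lower tiers, which do not get calendar entries.
function buildCalendarEvent(task) {
  if (task.criticality !== "P1" && task.criticality !== "P2") return null;
  const start = new Date(task.estimatedStart);
  if (Number.isNaN(start.getTime())) {
    throw new Error("Task needs a valid estimatedStart timestamp");
  }
  const hours = task.estimatedHours > 0 ? task.estimatedHours : 1;
  const end = new Date(start.getTime() + hours * 60 * 60 * 1000);
  return {
    summary: `[${task.criticality}] ${task.title}`,
    description: task.description || "",
    start: { dateTime: start.toISOString(), timeZone: CALENDAR_TIME_ZONE },
    end: { dateTime: end.toISOString(), timeZone: CALENDAR_TIME_ZONE },
    reminders: {
      useDefault: false,
      overrides: [{ method: "popup", minutes: 120 }], // 2 hours before start
    },
  };
}
```

Setting `useDefault: false` is what lets the 2-hour override replace the calendar's own default reminders instead of adding to them.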
Deployment and Rollout
The implementation was deployed in phases:
# 1. Push GAS changes
clasp push
# 2. Deploy staging HTML
aws s3 cp staging-index.html \
s3://jada-maintenance-staging/staging/index.html
# 3. Invalidate CloudFront cache
aws cloudfront create-invalidation \
--distribution-id [STAGING_DIST_ID] \
--paths "/staging/*"
# 4. Deploy Lambda function
# (via existing CI/CD pipeline integration)
Key Decision: Why Lambda Instead of GAS for Notifications
GAS has a 6-minute execution timeout and quotas on emails sent per