Notification System Design — Silent SMS Soft-Ban Traps
HTTP 200 from Twilio doesn't mean delivery.
20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.
- Decoupled architecture: event producer → message broker → notification service → channel providers
- Fan-out strategies: direct per-user, batch per channel, or tiered routing based on importance
- Rate limiting per channel (e.g., SMS caps) prevents provider bans and ensures fair use
- Deduplication via idempotency keys avoids sending the same notification twice
- Monitoring delivery receipts is essential; a silent failure (e.g., email throttled) won't appear in logs
Imagine a school secretary who gets a single message — 'School is closed tomorrow' — and then has to call every parent by phone, send a text, AND stick a note in every backpack. She can't do all three at once, so she builds a list, hands jobs to helpers, and tracks who actually got the message so nobody is called twice. A notification system is exactly that secretary — it takes one event, figures out who needs to know and how, then farms the work out reliably at millions-of-messages scale.
Every time you get a payment confirmation from your bank, a 'your package shipped' email from Amazon, or a red badge on your Instagram icon, a notification system fired behind the scenes. These systems are invisible when they work and catastrophic when they don't — a failed OTP SMS locks a user out of their account, a duplicate push notification at 3 AM turns a loyal customer into a one-star reviewer. Notification systems are deceptively simple on the surface and brutally hard in production.
The core problem a notification system solves is decoupling event producers from delivery channels. Your payment service shouldn't need to know whether a user prefers SMS, email, or push — and it definitely shouldn't block waiting for Twilio to respond. The notification system absorbs that complexity: it stores user preferences, throttles sends, fans out to multiple channels, retries failures, and records delivery receipts — all asynchronously and at scale.
By the end of this article you'll be able to design a notification system that handles 10 million daily active users across email, SMS, and push — including the fan-out architecture, idempotency strategy, rate limiting design, and the five production edge cases that trip up even experienced engineers. You'll also have the vocabulary and depth to walk through this confidently in a senior system design interview.
What Notification System Design Actually Means
A notification system delivers messages across channels (push, SMS, email, in-app) to users or services. The core mechanic is a publish-subscribe pipeline: a producer emits an event, a router determines delivery rules, and a channel adapter sends the message. The system must handle variable latency, channel failures, and rate limits without dropping messages.
In practice, the design hinges on three properties: reliability (at-least-once delivery via retry queues), idempotency (deduplication keys to prevent double sends), and channel abstraction (each channel has its own failure semantics—SMS can soft-ban, push tokens expire). A typical pipeline uses a message broker (Kafka, SQS) with a dead-letter queue for failed deliveries.
You need this pattern when your application must notify users asynchronously without blocking the request path. It matters because naive inline sends (e.g., calling Twilio in a web request) introduce latency spikes and partial failures. A dedicated notification service isolates concerns, allows throttling, and provides observability into delivery rates.
Functional & Non-Functional Requirements
Before designing any system, define what it must do and how well it must do it. For a notification system, functional requirements include the ability to send notifications via email, SMS, and push; support multiple languages and templates; respect user preferences (opt-in/out); and track delivery status. Non-functional requirements are where senior engineers focus: latency (under 1 second for critical OTPs, up to 5 minutes for promotional emails), throughput (10 million notifications/day), availability (99.99% uptime), and durability (never lose a notification once accepted). The biggest mistake? Treating all notifications with the same priority. An OTP failure is a revenue and security incident; a marketing email failure is a minor miss. Your architecture must distinguish between them.
- Define SLAs per notification type: OTP < 1s, push < 5s, email < 5 min.
- Route critical notifications to a separate high-priority queue with dedicated workers.
- Use a priority queue (e.g., RabbitMQ with priority) or two separate Kafka topics.
- Bulk email can be batched and sent in hourly chunks to avoid rate limiting.
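As a rough sketch of the routing rule in the list above, the snippet below maps a notification type to a queue based on its SLA. The queue names, the Notification class, and the 5-second cutoff are assumptions made for illustration, not part of any specific broker's API.

```python
from dataclasses import dataclass

# SLA targets in seconds, taken from the list above.
SLA_SECONDS = {"otp": 1, "push": 5, "email": 300}

# Hypothetical queue names; any broker with separate queues or topics would do.
CRITICAL_QUEUE = "notifications.critical"
BULK_QUEUE = "notifications.bulk"

# Assumed cutoff: anything with an SLA at or under 5 seconds is "critical".
CRITICAL_CUTOFF_SECONDS = 5


@dataclass(frozen=True)
class Notification:
    kind: str      # "otp", "push", or "email"
    user_id: str


def choose_queue(n: Notification) -> str:
    """Send anything with a tight SLA to the dedicated high-priority queue."""
    sla = SLA_SECONDS.get(n.kind)
    if sla is None:
        raise ValueError(f"unknown notification kind: {n.kind}")
    return CRITICAL_QUEUE if sla <= CRITICAL_CUTOFF_SECONDS else BULK_QUEUE
```

Keeping the SLA table in one place means a new notification type only needs one new entry before it lands in the right queue.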
Capacity Estimation: Daily Volume, Peak QPS, and Storage
Before building, estimate the load. For 10 million daily active users (DAU) with an average of 5 notifications per user per day, the total daily volume is 50 million notifications. Assume a channel mix of 60% email, 25% push, and 15% SMS. Most traffic falls within waking hours (8 AM–10 PM, a 14-hour window), and marketing campaigns add spikes on top. Assume the peak hour is 2x the average: average hourly volume = 50M / 24 ≈ 2.08M/hour; peak hour ≈ 4.17M/hour ≈ 1,157 notifications/second. For critical notifications (OTPs), peak QPS might double again during flash events, to roughly 2,300 QPS. Storage: each notification_events record (event metadata) is ~500 bytes, and each delivery_receipts record (status, timestamps) is ~200 bytes per attempt. If ~15% of deliveries fail and each failure is retried 3 times, the average notification produces 1 + 0.15 × 3 = 1.45 attempts. Daily ingestion is then notification_events: 50M × 500 B = 25 GB; delivery_receipts: 50M × 1.45 × 200 B ≈ 14.5 GB. Total ~40 GB/day, or ~1.2 TB/month. Plan for hot data retention (30 days) in MySQL/PostgreSQL (~1.2 TB), with cold archival to S3 beyond that. Also cache user preferences in Redis at ~100 bytes per user: 10M × 100 B = 1 GB, which fits easily. The key takeaway: storage grows fast, so use time-based partitioning and data lifecycle policies.
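The same back-of-the-envelope numbers can be reproduced in a few lines, which makes it easy to change one assumption and see what moves. This is only a sketch: every constant is an assumption from the estimate above, and the variable names are made up for the example.

```python
# All inputs are the assumptions stated in the estimate above.
DAILY_USERS = 10_000_000
PER_USER_PER_DAY = 5
PEAK_FACTOR = 2             # peak hour assumed to be 2x the average hour
EVENT_BYTES = 500           # one notification_events row
RECEIPT_BYTES = 200         # one delivery_receipts row per attempt
FAILURE_RATE = 0.15
RETRIES_ON_FAILURE = 3

daily = DAILY_USERS * PER_USER_PER_DAY            # 50,000,000 notifications
avg_per_hour = daily / 24
peak_qps = avg_per_hour * PEAK_FACTOR / 3600

# Each notification costs one attempt, plus 3 more for the 15% that fail.
attempts_per_notification = 1 + FAILURE_RATE * RETRIES_ON_FAILURE

events_gb = daily * EVENT_BYTES / 1e9
receipts_gb = daily * attempts_per_notification * RECEIPT_BYTES / 1e9
total_gb_per_day = events_gb + receipts_gb
total_tb_per_month = total_gb_per_day * 30 / 1000

print(f"peak QPS      : {peak_qps:,.0f}")
print(f"events/day    : {events_gb:.1f} GB")
print(f"receipts/day  : {receipts_gb:.1f} GB")
print(f"total/month   : {total_tb_per_month:.2f} TB")
```

Doubling PEAK_FACTOR or the failure rate shows quickly which assumption dominates the sizing.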
High-Level Architecture: Event Ingestion to Delivery
A notification system can be broken into four layers: ingestion, processing, delivery, and tracking. Ingestion receives an event from any service (e.g., PaymentService sends a 'payment_confirmed' event). The event enters a message broker like Kafka. The processing layer reads events, enriches them with user preferences and templates, decides which channels to use, and applies rate limiting. It then creates individual notification tasks per channel and pushes them to channel-specific queues. The delivery layer consumes those tasks, calls third-party providers (Twilio for SMS, SendGrid for email, Firebase for push), and stores the result. The tracking layer captures delivery receipts, opens, clicks, and bounces. This architecture decouples producers from delivery, allows independent scaling of each layer, and provides a single place to add monitoring, retries, and compliance.
End-to-End Architecture Diagram
The diagram below shows the four layers of the notification system and the data flow between them. Each layer is independently scalable. The ingestion layer accepts events from any internal service and publishes them to a Kafka topic. The processing layer consumes events, enriches them (user preferences, templates), and fans out to channel-specific queues. The delivery layer's workers pick up tasks and call third-party providers. Finally, the tracking layer receives delivery receipts via callbacks and updates the database. A monitoring dashboard aggregates metrics from all layers.
Database Schema: notification_events and delivery_receipts
Two core tables store notification data. The notification_events table captures the original event before fan-out. It includes a unique event_id (UUID), user identifier, notification type, channel, template ID, payload (JSON), and creation timestamp. This table is useful for auditing, replay, and deduplication. The delivery_receipts table records the outcome of each delivery attempt per channel. It links to notification_events via event_id, stores the channel, provider response code, delivery status (sent, delivered, bounced, failed), provider-specific message ID, attempt number, and timestamps. Index on event_id and user_id for fast lookups. For 50M daily events, partition delivery_receipts by created_at (monthly) to keep query performance high. Use a write-optimized engine (RocksDB / InnoDB with proper sizing). Also consider a separate table for user notification preferences with columns: user_id, channel, opt_in, last_suppression_window. Cache preferences in Redis.
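To make the two tables concrete, here is a minimal sketch that creates them in an in-memory SQLite database. SQLite is only a stand-in so the example runs anywhere; the column types, index names, and the notification_type column name are assumptions, and monthly partitioning would be added in whatever production database you pick.

```python
import sqlite3

SCHEMA = """
CREATE TABLE notification_events (
    event_id          TEXT PRIMARY KEY,      -- UUID
    user_id           TEXT NOT NULL,
    notification_type TEXT NOT NULL,
    channel           TEXT NOT NULL,
    template_id       TEXT NOT NULL,
    payload           TEXT NOT NULL,         -- JSON document
    created_at        TEXT NOT NULL
);
CREATE INDEX idx_events_user ON notification_events (user_id);

CREATE TABLE delivery_receipts (
    receipt_id          INTEGER PRIMARY KEY,
    event_id            TEXT NOT NULL REFERENCES notification_events (event_id),
    channel             TEXT NOT NULL,
    provider_code       TEXT,
    status              TEXT NOT NULL
        CHECK (status IN ('sent', 'delivered', 'bounced', 'failed')),
    provider_message_id TEXT,
    attempt             INTEGER NOT NULL,
    created_at          TEXT NOT NULL
);
CREATE INDEX idx_receipts_event ON delivery_receipts (event_id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables and their lookup indexes."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print([row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")])
```

The CHECK constraint keeps the status column limited to the four states named above, so a typo in a worker fails loudly instead of polluting the receipts.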
delivery_receipts with proper indexing reduced reads from 800ms to 5ms.
Fan-Out Strategies: Direct, Batch, and Tiered
Fan-out is how one event becomes many deliveries. The simplest approach is direct fan-out: for each event, the processing layer queries the user preference store and creates one notification task per subscribed channel. This works for low volumes but becomes expensive at scale, because an event with N recipients costs N preference lookups. Batch fan-out groups users by channel and preference, reducing lookups. For example, a 'promotion' event might define a target segment (e.g., all users in tier 'premium'), and the system generates email tasks in bulk without individual lookups. Tiered fan-out combines both: critical events use direct (fast, per-user), bulk events use batch. The trade-off is latency vs. throughput. Direct fan-out costs one lookup per recipient, so O(N) per event; batch fan-out costs one lookup per segment. Choose based on the notification type's SLA.
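The three strategies can be sketched side by side. This is an illustration only: the function names, the in-memory prefs dictionary standing in for the preference store, and the default bulk channel are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Task = Tuple[str, str, str]  # (event_id, user_id, channel)


def direct_fan_out(event_id: str, user_ids: Iterable[str],
                   prefs: Dict[str, List[str]]) -> List[Task]:
    """Per-user path: one preference lookup per recipient."""
    tasks: List[Task] = []
    for uid in user_ids:
        for channel in prefs.get(uid, []):   # stand-in for a preference-store read
            tasks.append((event_id, uid, channel))
    return tasks


def batch_fan_out(event_id: str, segment: Iterable[str],
                  channel: str) -> List[Task]:
    """Segment path: the segment is resolved once, with no per-user reads."""
    return [(event_id, uid, channel) for uid in segment]


def tiered_fan_out(event_id: str, critical: bool, user_ids: Iterable[str],
                   prefs: Dict[str, List[str]],
                   bulk_channel: str = "email") -> List[Task]:
    """Critical events take the direct path; everything else is batched."""
    if critical:
        return direct_fan_out(event_id, user_ids, prefs)
    return batch_fan_out(event_id, user_ids, bulk_channel)
```

In a real system the batch path would still have to honor opt-outs, typically by resolving the segment from a query that already excludes them.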
Reliability, Retries, and Dead-Letter Queues
Third-party providers fail: they return 5xx, throttle you, or go down. Your notification system must handle these failures gracefully. The standard pattern is exponential backoff with jitter (e.g., 1s, 4s, 16s, capped at 64s), with configurable max retries per channel and per priority. After exhausting retries, move the task to a dead-letter queue (DLQ). The DLQ stores failed notifications for manual inspection or later reprocessing. Crucially, retries must be idempotent: the same retry should not send two SMS messages. Use a unique idempotency key (e.g., eventId + channel) stored in Redis with a TTL longer than the retry window. For example, if the full retry sequence can take up to 5 minutes, set the Redis TTL to 30 minutes. Also, the DLQ must trigger an alert, because a silent DLQ means you're losing notifications.
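Here is one way the retry loop could look, as a sketch under stated assumptions. The in-memory set stands in for the Redis idempotency store (no TTL here), the list stands in for the DLQ, and deliver, backoff_delays, and the "full jitter" variant (a random delay between zero and the exponential ceiling) are choices made for this example rather than anything prescribed above.

```python
import random
import time
from typing import Callable, Iterator, List, Set


def backoff_delays(base: float = 1.0, factor: float = 4.0, cap: float = 64.0,
                   retries: int = 4,
                   rng: Callable[[], float] = random.random) -> Iterator[float]:
    """Yield jittered delays whose ceilings grow 1s, 4s, 16s, 64s."""
    for n in range(retries):
        ceiling = min(cap, base * factor ** n)
        yield rng() * ceiling              # full jitter: uniform in [0, ceiling)


def deliver(event_id: str, channel: str, send: Callable[[], None],
            sent_keys: Set[str], dlq: List[str], max_retries: int = 4,
            sleep: Callable[[float], None] = time.sleep,
            rng: Callable[[], float] = random.random) -> str:
    key = f"{event_id}:{channel}"          # idempotency key: eventId + channel
    if key in sent_keys:
        return "duplicate"                 # already delivered, do not resend
    delays = list(backoff_delays(retries=max_retries, rng=rng))
    for attempt in range(max_retries + 1):
        try:
            send()
        except Exception:                  # a real worker would catch provider-specific errors
            if attempt < max_retries:
                sleep(delays[attempt])
        else:
            sent_keys.add(key)
            return "sent"
    dlq.append(key)                        # retries exhausted: park for inspection
    return "dead-lettered"
```

The sleep and rng parameters exist so the loop can be tested without waiting or relying on randomness. A production version would also have to decide what to do when the provider accepted the message but the worker crashed before recording the key.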
Production Gotchas: Rate Limiting, Deduplication, and Channel Failures
Three non-obvious issues that bring down notification systems:
- Rate limiting at the provider level: SMS providers like Twilio impose per-second and per-day limits. Exceeding them triggers soft bans (HTTP 200 but no delivery). Solution: implement a token bucket rate limiter per provider and per channel. Monitor the token consumption against the limit.
- Deduplication across retries and chains: If a user receives a notification that triggers another notification (e.g., 'payment received' triggers 'transaction alert'), you need cross-event deduplication. Use a configurable suppression window (e.g., don't send same type within 5 minutes). Store suppression keys in Redis with TTL.
- Channel failures: Email domain may have strict SPF/DKIM, causing bounces. Push notification certificates expire. SMS aggregator may be down in a region. Solution: have a fallback channel strategy (e.g., if email fails, send SMS). Define fallback rules per notification type. Monitor channel health with synthetic probes.
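The token bucket mentioned in the first item is small enough to sketch in full. This is a single-process illustration: the class name and parameters are assumptions, and a multi-worker deployment would need the counters in shared storage such as Redis.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)      # start full
        self.updated = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; never blocks."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# One bucket per (provider, channel); the limits here are made-up numbers.
sms_bucket = TokenBucket(rate_per_sec=10, capacity=20)
```

A worker calls try_acquire() before each provider request and requeues the task with a short delay when it returns False. Exporting the current token count as a metric gives you the consumption-versus-limit view described above.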
Notification Channel Comparison: Push, Email, SMS
Choosing the right channel for each notification type depends on cost, latency, and delivery reliability. Push notifications are cheap (volume pricing from FCM/APNs), low latency (< 1s), but require an installed app and user opt-in. Email is moderate cost (e.g., SendGrid ~$0.03 per 1000), latency of seconds to minutes due to SMTP delays, but offers rich formatting and tracking. SMS is expensive ($0.0075–$0.02 per message), low latency (< 5s), but high reliability since almost every phone receives SMS. Each channel has different provider options and SLAs. The table below summarizes key characteristics.
APNs vs FCM: Push Notification Provider Comparison
For push notifications, two major providers dominate: Apple Push Notification service (APNs) for iOS and Firebase Cloud Messaging (FCM) for Android (and also supported on iOS). While both provide HTTP/2 endpoints for sending, key differences exist in delivery semantics, payload size limits, and topic/subscription models. APNs requires a TLS connection with certificate-based or token-based authentication, and each request is addressed to a single device token. FCM uses a JSON request with a target (device token, topic, or condition) and can batch messages. APNs provides immediate feedback about invalid tokens; FCM delivers feedback via Cloud Messaging API responses. Payload limits: APNs 4 KB for regular notifications (up to 5 KB for VoIP), FCM 4 KB. Both support silent data notifications, but APNs throttles background updates. For production systems, you'll likely send through both using a unified gateway that routes based on OS.
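A unified gateway can start as a size check plus a lookup on the device's OS. The sketch below is hypothetical: route_push and the platform strings are invented for the example, it applies the 4 KB limit quoted above to both providers, and it does not call either provider's API.

```python
import json

MAX_PAYLOAD_BYTES = 4096   # 4 KB limit cited above for regular APNs and FCM messages

PROVIDER_BY_PLATFORM = {"ios": "apns", "android": "fcm"}


def route_push(platform: str, payload: dict) -> str:
    """Return the provider to use, rejecting payloads that exceed the size limit."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    if len(body) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(body)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    try:
        return PROVIDER_BY_PLATFORM[platform]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform}") from None
```

Checking the size before enqueueing means an oversized payload is rejected at your own boundary instead of coming back as a provider error after fan-out.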
Rate Limiter, Notification Handler, and User Preferences: The Holy Trinity of Surviving Peak Load
Your fan-out engine does the heavy lifting, but without a rate limiter and a preferences filter you're just a spam cannon. Rate limiting isn't just about API quotas—it’s per-user, per-channel, per-rule. Five push notifications per customer per day, period. Otherwise your e-commerce platform gets banned by FCM and your user uninstalls everything.
The Notification Handler sits behind the rate limiter. It reads user preferences before any channel touches the wire. That user who opted out of promotional SMS at 2 AM? Respect it. You don't send. You don't retry. You log and move on. This reduces channel costs by 18-30% in production.
Why this order? Rate limiting before preferences prevents wasted compute on messages that will be discarded anyway. Every millisecond counts at 100K QPS. Cache preferences in Redis with a TTL of five minutes. If Redis is down, default to send—downtime is worse than a minor preference miss.
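The ordering described above, rate limit first and preferences second, might look like this in miniature. Everything here is an assumption for illustration: the NotificationGate name, the in-memory Counter standing in for a per-day counter store, and the dictionary standing in for the Redis preference cache. A missing preference is treated as opted in, mirroring the fail-open default the text argues for.

```python
from collections import Counter
from typing import Dict, Tuple

DAILY_CAPS = {"push": 5}   # from the text: five push notifications per user per day


class NotificationGate:
    def __init__(self, prefs: Dict[Tuple[str, str], bool]) -> None:
        self.prefs = prefs                       # (user_id, channel) -> opted in?
        self.sent_today: Counter = Counter()     # would reset daily in a real system

    def decide(self, user_id: str, channel: str) -> str:
        # 1. Rate limit first, so capped messages never reach the preference lookup.
        cap = DAILY_CAPS.get(channel)
        if cap is not None and self.sent_today[(user_id, channel)] >= cap:
            return "rate_limited"
        # 2. Preferences second; an opt-out is logged and dropped, never retried.
        if not self.prefs.get((user_id, channel), True):
            return "opted_out"
        self.sent_today[(user_id, channel)] += 1
        return "send"
```

Only an actual send consumes the daily budget, so an opted-out message does not eat into the user's cap.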
Notification Validator and Prioritizer: The Bouncer That Saves Your Queue
Most architects skip validation—they assume every incoming notification is well-formed. Then a developer accidentally sends an empty payload with a mismatched user ID, and your entire pipeline chokes trying to parse garbage JSON. A notification validator sits at the ingest boundary. It checks: does the message have a template ID? Is the recipient list non-empty? Are channel flags valid? If not, reject it in under 5ms with a clear error code. Never let bad data touch your fan-out.
The prioritizer runs right after validation. Not all notifications are equal. An OTP has a hard SLA of 30 seconds. A promotional email can wait three hours. Tag each message with a priority level—P0, P1, P2. Route P0 messages to a dedicated high-priority queue (Redis Streams with consumer groups). P2 goes to Kafka with longer batch intervals. This ensures critical messages skip the backlog when there's a traffic spike.
In production, I've seen a 40% drop in P0 SLA violations just by adding a fast validation layer and a dedicated priority queue. Cheap win. Do it.
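A validator and prioritizer of the kind described in this section can be sketched in a few lines. The field names (template_id, recipients, channels, type), the error codes, and the type-to-priority table are all hypothetical; use whatever your event contract defines.

```python
from typing import List

VALID_CHANNELS = {"email", "sms", "push"}

# Hypothetical mapping from notification type to priority tier.
PRIORITY_BY_TYPE = {"otp": "P0", "transactional": "P1", "promotional": "P2"}


def validate(msg: dict) -> List[str]:
    """Return a list of error codes; an empty list means the message is accepted."""
    errors: List[str] = []
    if not msg.get("template_id"):
        errors.append("MISSING_TEMPLATE_ID")
    if not msg.get("recipients"):
        errors.append("EMPTY_RECIPIENTS")
    channels = msg.get("channels") or []
    if not channels or not set(channels) <= VALID_CHANNELS:
        errors.append("INVALID_CHANNELS")
    return errors


def prioritize(msg: dict) -> str:
    """Tag a validated message; unknown types fall to the lowest tier."""
    return PRIORITY_BY_TYPE.get(msg.get("type"), "P2")
```

Returning every error code at once, instead of failing on the first, gives the calling team a complete picture in a single rejection.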
Why Pipelines Beat Monolithic Handlers
A monolithic notification handler does everything: templating, batching, rate limiting, and sending. When one step fails, the entire message drops. Pipelines decompose processing into ordered stages connected by queues. Each stage runs independently, scales separately, and fails without losing the message. The scheduler pushes raw events into the first stage; then a templater merges user data; a prioritizer reorders the batch; a rate limiter throttles per channel; and a sender finalizes delivery. If the email channel slows down, the sender stage backs up but the templater keeps working. Dead-letter queues catch only the stage that errored, not the whole pipeline. Why this matters: pipelines let you add or remove stages—like a spam filter or a deduplication step—without rewriting the system. You also get natural backpressure: downstream queue length signals upstream producers to slow down. In production, never tie pipeline stage count to the number of channels. Use a configurable topology where stages are pluggable workers. Each stage writes metrics: input rate, output rate, error count. That single dashboard tells you exactly where the system is choking.
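To illustrate pluggable stages with per-stage metrics and dead letters, here is a deliberately simplified sketch. It runs the stages in-process, one after another, whereas the design above puts a queue between each pair; the Stage class and run_pipeline function are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Stage:
    name: str
    fn: Callable[[dict], dict]
    received: int = 0                  # input count
    emitted: int = 0                   # output count
    errors: int = 0                    # error count
    dead_letters: List[dict] = field(default_factory=list)

    def process(self, msg: dict) -> Optional[dict]:
        self.received += 1
        try:
            out = self.fn(msg)
        except Exception:
            self.errors += 1
            self.dead_letters.append(msg)   # only this stage's DLQ sees the failure
            return None
        self.emitted += 1
        return out


def run_pipeline(stages: List[Stage], messages: Iterable[dict]) -> List[dict]:
    """Push each message through the stages in order; stop at the first failure."""
    delivered: List[dict] = []
    for msg in messages:
        current: Optional[dict] = msg
        for stage in stages:
            current = stage.process(current)
            if current is None:
                break
        else:
            delivered.append(current)
    return delivered
```

Because the topology is just a list, adding a deduplication or spam-filter stage is a matter of inserting one more Stage, and the three counters per stage are the metrics that show where the pipeline is choking.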
The Scheduler Is Your Demand Governor
Notification systems fail not because they cannot send, but because they send too much at the wrong time. The scheduler sits before all processing pipelines, not after. Its job is to delay, batch, and reorder events based on three signals: user timezone, channel capacity, and event priority. In practice: if a user is asleep (e.g., 3 AM their local time), hold the push notification until 8 AM. If the email API has a 10 TPS limit, batch pending emails into 100-message groups and release them at exactly 10 per second. If a password-reset alert has priority 10 and a marketing blast has priority 1, the scheduler always picks the password reset first. The scheduler uses a priority queue keyed by scheduled_timestamp, not arrival time. Every new event runs through a UserPreferences lookup that returns timezone + quiet_hours. Then the scheduler computes the earliest safe delivery time. That timestamp becomes the sort key. Workers pull only from the head of the queue, so the system never sends during quiet hours. In production, measure scheduler queue depth per channel. A growing queue means either capacity is too low or you need to throttle upstream producers. Never let the scheduler buffer more than 10 minutes of peak traffic, or memory use will blow up.
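The scheduling rule can be sketched with a heap keyed by the computed delivery time. Several things here are assumptions: quiet hours of 22:00 to 08:00, a fixed UTC offset per user instead of a full timezone database, priority as the tie-breaker among events due at the same time, and the names earliest_safe_time and Scheduler.

```python
import heapq
import itertools
from datetime import datetime, time, timedelta, timezone
from typing import List, Tuple


def earliest_safe_time(now_utc: datetime, utc_offset_hours: int,
                       quiet_start: time = time(22, 0),
                       quiet_end: time = time(8, 0)) -> datetime:
    """Return now, or the end of quiet hours if now falls inside them.

    Assumes the quiet window wraps past midnight (quiet_start > quiet_end).
    """
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    t = local.time()
    if quiet_end <= t < quiet_start:
        return now_utc                         # outside quiet hours: send now
    release = local.replace(hour=quiet_end.hour, minute=quiet_end.minute,
                            second=0, microsecond=0)
    if t >= quiet_start:                       # evening: release tomorrow morning
        release += timedelta(days=1)
    return release.astimezone(timezone.utc)


class Scheduler:
    def __init__(self) -> None:
        self._heap: List[Tuple[datetime, int, int, str]] = []
        self._seq = itertools.count()          # stable tie-breaker

    def push(self, event_id: str, priority: int, now_utc: datetime,
             utc_offset_hours: int) -> None:
        due = earliest_safe_time(now_utc, utc_offset_hours)
        # Sort by scheduled time first, then by higher priority.
        heapq.heappush(self._heap, (due, -priority, next(self._seq), event_id))

    def pop_due(self, now_utc: datetime) -> List[str]:
        """Pop every event whose scheduled time has arrived."""
        ready: List[str] = []
        while self._heap and self._heap[0][0] <= now_utc:
            ready.append(heapq.heappop(self._heap)[3])
        return ready
```

This sketch holds every event during quiet hours; a real scheduler would most likely let critical types such as OTPs bypass the quiet-hours check entirely.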
The Silent SMS Block: How Unlimited Retries Brought Down a Payment Pipeline
- HTTP 200 from a third-party provider does NOT mean delivery — always monitor delivery receipts.
- Retry policies must respect provider rate limits and include exponential backoff with jitter.
- Silent failures are the most dangerous: create synthetic probes that actually verify end-to-end delivery.
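The first lesson above suggests a simple check: compare what the provider accepted against what the delivery receipts later confirmed. The sketch below is an assumption-heavy illustration; the function names, the "delivered" status string, and the 5% alert threshold are invented for the example and would need tuning per channel.

```python
from typing import Dict, Iterable


def undelivered_ratio(accepted_ids: Iterable[str],
                      receipt_status: Dict[str, str]) -> float:
    """Fraction of provider-accepted messages with no 'delivered' receipt."""
    accepted = list(accepted_ids)
    if not accepted:
        return 0.0
    missing = sum(1 for mid in accepted
                  if receipt_status.get(mid) != "delivered")
    return missing / len(accepted)


def should_alert(ratio: float, threshold: float = 0.05) -> bool:
    """Hypothetical threshold: page someone when over 5% go unconfirmed."""
    return ratio > threshold
```

Run over a sliding window per provider, this catches the case where every request returns HTTP 200 while the confirmed-delivery rate quietly falls to zero.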
kubectl rollout restart -n notifications deployment/notification-worker
curl localhost:8080/actuator/health | jq .
Key takeaways
Common mistakes to avoid
- Assuming HTTP 200 from provider means delivery
- No rate limiting per channel
- Using a single queue for all notification types
- Missing idempotency on retries
Interview Questions on This Topic
How would you design a notification system that handles 10 million daily active users across SMS, email, and push? Discuss the key architectural components and trade-offs.