Senior 11 min · March 06, 2026

Notification System Design — Silent SMS Soft-Ban Traps

HTTP 200 from Twilio doesn't mean delivery.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.

Last updated: May 23, 2026
Quick Answer
  • Decoupled architecture: event producer → message broker → notification service → channel providers
  • Fan-out strategies: direct per-user, batch per channel, or tiered routing based on importance
  • Rate limiting per channel (e.g., SMS caps) prevents provider bans and ensures fair use
  • Deduplication via idempotency keys avoids sending the same notification twice
  • Monitoring delivery receipts is essential; a silent failure (e.g., email throttled) won't appear in logs
Definition
What Is Notification System Design?

Notification system design is the architecture and engineering behind reliably delivering messages—push, email, SMS, in-app—to users at scale. It's not just about sending; it's about guaranteeing delivery within latency SLAs while handling failures, deduplication, and rate limits.

A well-designed system decouples event ingestion from delivery, uses queues for backpressure, and tracks every message through a state machine from 'created' to 'delivered' or 'failed'. Companies like Uber and Slack process billions of notifications daily, requiring systems that can burst to 100K+ QPS during peak events like a ride completion or a team @mention.

A critical but often overlooked pattern is the 'silent SMS soft-ban trap'. This occurs when carriers silently drop SMS messages—no error code, no bounce—because the recipient's number has been flagged for spam or the sender's throughput exceeds carrier thresholds.

Unlike email hard bounces, these failures are invisible to the sender, leading to degraded deliverability metrics and wasted costs. The solution is a delivery receipt tracking system that correlates sent messages with carrier acknowledgments (DLRs) and applies exponential backoff or channel fallback when a number shows repeated silent drops.

Without this, your notification system can appear healthy while actually failing 30-40% of SMS deliveries.
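
The receipt-correlation idea described above can be sketched as follows. This is a minimal illustration, not a provider integration: the class name `SoftBanDetector`, its method names, the 5-minute receipt timeout, and the threshold of three consecutive missing receipts are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

@dataclass
class SoftBanDetector:
    """Flags numbers whose accepted sends repeatedly never get a delivery receipt (DLR)."""
    dlr_timeout_s: float = 300.0   # assumed: how long to wait for a DLR
    max_silent_drops: int = 3      # assumed: consecutive misses before falling back
    _pending: dict = field(default_factory=dict)  # message_id -> (phone, sent_at)
    _silent: dict = field(default_factory=dict)   # phone -> consecutive silent drops

    def record_sent(self, message_id: str, phone: str, now: float) -> None:
        """Call when the provider accepts a message (the HTTP 200 case)."""
        self._pending[message_id] = (phone, now)

    def record_dlr(self, message_id: str) -> None:
        """Call when a delivery receipt arrives; a confirmed delivery resets the streak."""
        entry = self._pending.pop(message_id, None)
        if entry is not None:
            self._silent[entry[0]] = 0

    def sweep(self, now: float) -> None:
        """Count every send whose receipt is overdue as a silent drop."""
        for message_id, (phone, sent_at) in list(self._pending.items()):
            if now - sent_at > self.dlr_timeout_s:
                del self._pending[message_id]
                self._silent[phone] = self._silent.get(phone, 0) + 1

    def should_fallback(self, phone: str) -> bool:
        """True once a number has hit the silent-drop threshold."""
        return self._silent.get(phone, 0) >= self.max_silent_drops


detector = SoftBanDetector(dlr_timeout_s=10, max_silent_drops=2)
detector.record_sent("m1", "+15550001", now=0)
detector.record_sent("m2", "+15550001", now=1)
detector.record_sent("m3", "+15550002", now=1)
detector.record_dlr("m3")   # only the second number confirms delivery
detector.sweep(now=20)      # m1 and m2 are now overdue
```

After the sweep, `should_fallback("+15550001")` is true and the other number is unaffected. A production version would persist this state and run the sweep on a schedule.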

In the ecosystem, notification systems sit between event producers (user actions, cron jobs, external webhooks) and delivery channels (Twilio, Firebase, SendGrid). Alternatives include managed services like AWS SNS or OneSignal, which handle delivery logic but limit customization of soft-ban detection.

You should not build your own when your volume is under 10K messages/day or when latency isn't critical—use a vendor. But if you need carrier-specific routing, custom retry policies, or cost optimization across multiple SMS providers, a custom design with a delivery receipt pipeline is necessary.

The key tradeoff is operational complexity versus control over deliverability and cost.

Plain-English First

Imagine a school secretary who gets a single message — 'School is closed tomorrow' — and then has to call every parent by phone, send a text, AND stick a note in every backpack. She can't do all three at once, so she builds a list, hands jobs to helpers, and tracks who actually got the message so nobody is called twice. A notification system is exactly that secretary — it takes one event, figures out who needs to know and how, then farms the work out reliably at millions-of-messages scale.

Every time you get a payment confirmation from your bank, a 'your package shipped' email from Amazon, or a red badge on your Instagram icon, a notification system fired behind the scenes. These systems are invisible when they work and catastrophic when they don't — a failed OTP SMS locks a user out of their account, a duplicate push notification at 3 AM turns a loyal customer into a one-star reviewer. Notification systems are deceptively simple on the surface and brutally hard in production.

The core problem a notification system solves is decoupling event producers from delivery channels. Your payment service shouldn't need to know whether a user prefers SMS, email, or push — and it definitely shouldn't block waiting for Twilio to respond. The notification system absorbs that complexity: it stores user preferences, throttles sends, fans out to multiple channels, retries failures, and records delivery receipts — all asynchronously and at scale.

By the end of this article you'll be able to design a notification system that handles 10 million daily active users across email, SMS, and push — including the fan-out architecture, idempotency strategy, rate limiting design, and the five production edge cases that trip up even experienced engineers. You'll also have the vocabulary and depth to walk through this confidently in a senior system design interview.

What Notification System Design Actually Means

A notification system delivers messages across channels (push, SMS, email, in-app) to users or services. The core mechanic is a publish-subscribe pipeline: a producer emits an event, a router determines delivery rules, and a channel adapter sends the message. The system must handle variable latency, channel failures, and rate limits without dropping messages.

In practice, the design hinges on three properties: reliability (at-least-once delivery via retry queues), idempotency (deduplication keys to prevent double sends), and channel abstraction (each channel has its own failure semantics—SMS can soft-ban, push tokens expire). A typical pipeline uses a message broker (Kafka, SQS) with a dead-letter queue for failed deliveries.

You need this pattern when your application must notify users asynchronously without blocking the request path. It matters because naive inline sends (e.g., calling Twilio in a web request) introduce latency spikes and partial failures. A dedicated notification service isolates concerns, allows throttling, and provides observability into delivery rates.

Delivery ≠ Read
A successful HTTP 200 from an SMS provider does not mean the user saw the message. Track delivery receipts and read receipts separately.
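
One way to keep "accepted by the provider", "delivered", and "read" apart is an explicit status state machine. The sketch below is illustrative: 'created', 'delivered', and 'failed' come from the definition above, while the intermediate states and the allowed transitions are assumptions.

```python
from enum import Enum

class Status(Enum):
    CREATED = "created"
    QUEUED = "queued"
    SENT = "sent"            # provider accepted the request (for example HTTP 200)
    DELIVERED = "delivered"  # a delivery receipt arrived
    READ = "read"            # a read receipt arrived, where the channel supports one
    FAILED = "failed"

# Assumed transition table: a message can only move forward, and READ requires DELIVERED.
ALLOWED = {
    Status.CREATED: {Status.QUEUED, Status.FAILED},
    Status.QUEUED: {Status.SENT, Status.FAILED},
    Status.SENT: {Status.DELIVERED, Status.FAILED},
    Status.DELIVERED: {Status.READ},
    Status.READ: set(),
    Status.FAILED: set(),
}

def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

With this table, a message stuck in `SENT` is visibly different from one in `DELIVERED`, which is the distinction the callout asks you to track.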
Production Insight
A ride-sharing app sent SMS confirmations via a single provider without fallback. During a holiday surge, the provider soft-banned their account for high volume, silently dropping 40% of messages. The team only noticed when user complaints spiked 3 hours later.
Symptom: No error returned from provider—just a 200 OK with a 'queued' status, then no delivery.
Rule: Always implement at least two independent channel providers with automatic failover and monitor delivery receipts, not just send success.
Key Takeaway
Design for channel failure as the norm, not the exception.
Idempotency keys are non-negotiable—duplicate sends erode user trust.
Observability must track each hop: producer → broker → channel → device.
[Figure: Notification System Design: Silent SMS Soft-Ban Traps. Architecture from event ingestion to delivery with fan-out strategies. Panels: Event Ingestion (receive notification requests via API or queue); Fan-Out Strategies (direct, batch, or tiered distribution); Delivery Service (send via SMS, email, push with retries); Dead-Letter Queue (store failed deliveries after max retries); Silent SMS Soft-Ban (carrier silently drops messages without error); Delivery Success (confirmed delivery with status tracking). Warning shown: in a silent SMS soft-ban the carrier drops messages without an error code; use delivery receipts and fallback channels to detect it.]

Functional & Non-Functional Requirements

Before designing any system, define what it must do and how well it must do it. For a notification system, functional requirements include the ability to send notifications via email, SMS, and push; support multiple languages and templates; respect user preferences (opt-in/out); and track delivery status. Non-functional requirements are where senior engineers focus: latency (under 1 second for critical OTPs, up to 5 minutes for promotional emails), throughput (10 million notifications/day), availability (99.99% uptime), and durability (never lose a notification once accepted). The biggest mistake? Treating all notifications with the same priority. An OTP failure is a revenue and security incident; a marketing email failure is a minor miss. Your architecture must distinguish between them.

The Two-Priority Pipeline
  • Define SLAs per notification type: OTP < 1s, push < 5s, email < 5 min.
  • Route critical notifications to a separate high-priority queue with dedicated workers.
  • Use a priority queue (e.g., RabbitMQ with priority) or two separate Kafka topics.
  • Bulk email can be batched and sent in hourly chunks to avoid rate limiting.
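
The routing rule in the list above can be sketched as a small lookup. The SLA values are the ones listed, and the topic names match the two-topic split this article arrives at later; the type keys and the 5-second cutoff for the critical lane are assumptions.

```python
# SLA targets in seconds, taken from the list above.
SLA_SECONDS = {"otp": 1, "push": 5, "email": 300}

CRITICAL_TOPIC = "critical-notifications"
BULK_TOPIC = "bulk-notifications"
CRITICAL_CUTOFF_S = 5  # assumption: anything due within 5 s takes the critical lane

def topic_for(notification_type: str) -> str:
    """Pick the Kafka topic for a notification type based on its SLA."""
    sla = SLA_SECONDS.get(notification_type)
    if sla is None:
        # Refuse unknown types instead of silently defaulting to a lane.
        raise ValueError(f"no SLA defined for {notification_type!r}")
    return CRITICAL_TOPIC if sla <= CRITICAL_CUTOFF_S else BULK_TOPIC
```

Failing loudly on an unknown type is a deliberate choice here: a new notification type should get an explicit SLA before it ships.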
Production Insight
At peak load, mixing OTP retries with promotional emails caused the same worker pool to back up, delaying OTPs by minutes.
The fix was to separate priorities into distinct worker pools with autoscaling thresholds.
Rule: never let low-priority queue depth starve high-priority processing.
Key Takeaway
Define SLAs per notification type.
Route critical messages to a dedicated high-priority pipeline.
Otherwise a marketing campaign will kill your OTP delivery.

Capacity Estimation: Daily Volume, Peak QPS, and Storage

Before building, estimate the load. For 10 million daily active users (DAU) with an average of 5 notifications per user per day, the total daily volume is 50 million notifications. Assume a channel mix of 60% email, 25% push, and 15% SMS.

Peak traffic occurs during business hours (8 AM–10 PM = 14 hours), but marketing campaigns spike activity. Assume the peak hour is 2x the average: average hourly volume = 50M / 24 ≈ 2.08M/hour; peak hour ≈ 4.16M/hour ≈ 1,156 notifications/second. For critical notifications (OTPs), peak QPS might double again during flash events, to roughly 2,300 QPS.

Storage: each notification_events record (event metadata) is ~500 bytes, and each delivery_receipts record (status, timestamps) is ~200 bytes per attempt. With 3 retry attempts on failures (~15% of all deliveries), daily storage ingestion is:
  • notification_events: 50M × 500 B = 25 GB
  • delivery_receipts: 50M × (1 + 0.15 × 3) × 200 B = 50M × 1.45 × 200 B = 14.5 GB

Total ~40 GB/day, or ~1.2 TB/month. Plan for hot data retention (30 days) in MySQL/PostgreSQL (~1.2 TB), with cold archival to S3 for anything older. Also cache user preferences in Redis at ~100 bytes per user: 10M × 100 B = 1 GB, which fits easily. The key takeaway: storage grows fast; use time-series partitioning and consider data lifecycle policies.

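capacity_estimation.py (Python)
# Back-of-envelope storage estimate for 10M DAU at 5 notifications per user per day.
DAU = 10_000_000
notif_per_user = 5
daily_volume = DAU * notif_per_user  # 50M notifications/day

failure_rate = 0.15  # ~15% of deliveries fail and are retried
retry_attempts = 3   # retry attempts per failed delivery
attempts_per_notification = 1 + failure_rate * retry_attempts  # 1.45

event_size_bytes = 500
delivery_size_bytes = 200
BYTES_PER_GB = 1_000_000_000

daily_event_storage = daily_volume * event_size_bytes  # 25 GB, held in bytes
daily_delivery_storage = daily_volume * attempts_per_notification * delivery_size_bytes  # ~14.5 GB, held in bytes
total_gb = (daily_event_storage + daily_delivery_storage) / BYTES_PER_GB
print(f"Daily storage: {total_gb:.1f} GB")  # Daily storage: 39.5 GB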
Peak QPS Rule of Thumb
For a notification system serving web/mobile users, peak QPS is often 5–10x the average. Use the peak-to-average ratio from your existing traffic logs. If none exist, start with 5x and plan for autoscaling.
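
To see what the peak factor does to the numbers, here is the arithmetic for the 50M/day volume used in this section. The per-second average is computed directly from the daily total, so the 2x figure comes out one higher than the 1,156 derived from the rounded hourly volume.

```python
SECONDS_PER_DAY = 24 * 60 * 60
daily_volume = 10_000_000 * 5                  # 50M notifications/day
average_qps = daily_volume / SECONDS_PER_DAY   # about 578.7 notifications/s

def peak_qps(factor: float) -> int:
    """Average QPS scaled by a peak-to-average factor, rounded to a whole number."""
    return round(average_qps * factor)

for factor in (2, 5, 10):
    print(f"{factor}x peak: {peak_qps(factor):,} notifications/s")
# 2x peak: 1,157 notifications/s
# 5x peak: 2,894 notifications/s
# 10x peak: 5,787 notifications/s
```

The gap between the 2x and 10x rows is the headroom question: sizing for 2x leaves you several thousand requests per second short if traffic follows the 5–10x rule of thumb.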
Production Insight
Our initial estimation assumed a flat 2x peak factor. Marketing bursts of 500k emails in 5 minutes caused worker CPU to max at 80%, leading to 3-minute delays. We added a sliding window rate limiter at the broker level to smooth spikes.
Rule: build slack into capacity for unplanned bursts — autoscale with headroom.
Key Takeaway
Estimate daily volume from DAU × notifications/user. Peak QPS is typically 5–10x average. Storage grows ~40 GB/day for 50M notifications; plan for hot/cold tiering.

High-Level Architecture: Event Ingestion to Delivery

A notification system can be broken into four layers: ingestion, processing, delivery, and tracking. Ingestion receives an event from any service (e.g., PaymentService sends a 'payment_confirmed' event). The event enters a message broker like Kafka. The processing layer reads events, enriches them with user preferences and templates, decides which channels to use, and applies rate limiting. Then it creates individual notification tasks per channel and pushes them to channel-specific queues. The delivery layer consumes those tasks, calls third-party providers (Twilio for SMS, SendGrid for email, Firebase for push), and stores the result. Tracking layer captures delivery receipts, opens, clicks, and bounces. This architecture decouples producers from delivery, allows independent scaling of each layer, and provides a single place to add monitoring, retries, and compliance.

NotificationEvent.java (Java)
package io.thecodeforge.notification;

import java.time.Instant;
import java.util.UUID;

public record NotificationEvent(
    UUID eventId,
    String userId,
    String type, // OTP, PAYMENT_CONFIRMATION, PROMOTIONAL
    String channel, // EMAIL, SMS, PUSH
    String templateId,
    String payload,
    Instant createdAt
) {
    public NotificationEvent {
        if (userId == null || type == null) {
            throw new IllegalArgumentException("userId and type are required");
        }
    }
}
Production Insight
One team stored all notifications in a single Kafka topic. A spike in bulk email lagged OTP processing by 12 minutes.
We split into two topics: 'critical-notifications' and 'bulk-notifications' with different retention and consumer groups.
Rule: isolation at the broker level is cheaper than complex priority logic in workers.
Key Takeaway
Ingestion, processing, delivery, tracking — four independent layers.
Isolate critical and bulk traffic at the broker level.
Events must be idempotent from the moment they enter the queue.

End-to-End Architecture Diagram

The diagram below shows the four layers of the notification system and the data flow between them. Each layer is independently scalable. The ingestion layer accepts events from any internal service and publishes them to a Kafka topic. The processing layer consumes events, enriches them (user preferences, templates), and fans out to channel-specific queues. The delivery layer workers pick up tasks and call third-party providers. Finally, the tracking layer receives delivery receipts via callbacks and updates the database. A monitoring dashboard aggregates metrics from all layers.

Production Insight
We initially had a single diagram with one Kafka topic, which misled new team members into thinking all notifications had the same priority. The two-topic architecture with separate worker pools became the standard after the OTP delay incident.
Rule: your architecture diagram should reflect the actual deployment, not an idealized version.
Key Takeaway
The four-layer architecture (ingestion, processing, delivery, tracking) provides clear separation of concerns. Use two Kafka topics for critical vs bulk traffic.
Notification System Architecture
[Diagram. Layers: Ingestion, Processing, Delivery, Tracking. Components, in the order shown: Event Producer (publishes to) Kafka: critical-notifications and Kafka: bulk-notifications; Critical Worker Pool; Bulk Worker Pool; User Preference Cache (Redis); Template Engine; Fan-Out Router; Channel Queue: Email; Channel Queue: SMS; Channel Queue: Push; Email Workers (SendGrid); SMS Workers (Twilio); Push Workers (FCM/APNs); Delivery Receipt DB; Monitoring & Alerts.]

Database Schema: notification_events and delivery_receipts

Two core tables store notification data. The notification_events table captures the original event before fan-out. It includes a unique event_id (UUID), user identifier, notification type, channel, template ID, payload (JSON), and creation timestamp. This table is useful for auditing, replay, and deduplication. The delivery_receipts table records the outcome of each delivery attempt per channel. It links to notification_events via event_id, stores the channel, provider response code, delivery status (sent, delivered, bounced, failed), provider-specific message ID, attempt number, and timestamps. Index on event_id and user_id for fast lookups. For 50M daily events, partition delivery_receipts by created_at (monthly) to keep query performance high. Use a write-optimized engine (RocksDB / InnoDB with proper sizing). Also consider a separate table for user notification preferences with columns: user_id, channel, opt_in, last_suppression_window. Cache preferences in Redis.

schema.sql (SQL)
-- PostgreSQL dialect (JSONB, declarative partitioning).
-- On a partitioned table the primary key must include the partition key,
-- and indexes are created with separate CREATE INDEX statements.
CREATE TABLE notification_events (
    event_id UUID NOT NULL,
    user_id VARCHAR(64) NOT NULL,
    notification_type VARCHAR(50) NOT NULL,
    channel VARCHAR(20) NOT NULL,
    template_id VARCHAR(100),
    payload JSONB,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    PRIMARY KEY (event_id, created_at)
) PARTITION BY RANGE (created_at);

CREATE INDEX idx_events_user_id ON notification_events (user_id);
CREATE INDEX idx_events_created_at ON notification_events (created_at);

CREATE TABLE delivery_receipts (
    receipt_id UUID NOT NULL,
    -- Logical reference to notification_events.event_id. A FOREIGN KEY on event_id
    -- alone is not possible here because the parent's unique key includes created_at.
    event_id UUID NOT NULL,
    channel VARCHAR(20) NOT NULL,
    provider_response_code VARCHAR(10),
    delivery_status VARCHAR(20) NOT NULL, -- sent, delivered, bounced, failed
    provider_message_id VARCHAR(100),
    attempt_number INT DEFAULT 1,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (receipt_id, created_at)
) PARTITION BY RANGE (created_at);

CREATE INDEX idx_receipts_event_id ON delivery_receipts (event_id);
CREATE INDEX idx_receipts_status_created ON delivery_receipts (delivery_status, created_at);
Partitioning Strategy
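For 50M daily events, each monthly partition of notification_events holds roughly 1.5B rows (50M × 30 days); delivery_receipts partitions run larger, at about 1.45 rows per event once retries are counted. Use a cron job to create partitions 3 months ahead. Drop partitions older than the retention period (e.g., 6 months) or move them to cold storage.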
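
The "create partitions ahead of time" job can be as small as a script that emits the DDL. The sketch below assumes PostgreSQL declarative partitioning and a hypothetical naming scheme of `<table>_YYYY_MM`; the function names are illustrative, and a real job would execute the statements through your migration tooling.

```python
from datetime import date

def month_start(d: date, offset: int) -> date:
    """First day of the month that is `offset` months after d's month."""
    index = d.year * 12 + (d.month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)

def partition_ddl(table: str, today: date, months_ahead: int = 3) -> list:
    """Build CREATE TABLE statements for the next `months_ahead` monthly partitions."""
    statements = []
    for offset in range(1, months_ahead + 1):
        start = month_start(today, offset)
        end = month_start(today, offset + 1)
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {table}_{start:%Y_%m} "
            f"PARTITION OF {table} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
    return statements

for statement in partition_ddl("delivery_receipts", date(2026, 11, 15)):
    print(statement)
```

Because the statements use `IF NOT EXISTS`, the job can run daily without failing when a partition is already in place.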
Production Insight
We initially stored delivery receipts in the same table as events, violating normalization. Queries for event details became slow as the table grew to 2B rows. Separating into delivery_receipts with proper indexing reduced reads from 800ms to 5ms.
Rule: normalize logs (events vs receipts) even if you join often — query profile matters.
Key Takeaway
Use two tables: notification_events (event metadata) and delivery_receipts (attempt outcomes). Partition by time and index on event_id + user_id.

Fan-Out Strategies: Direct, Batch, and Tiered

Fan-out is how one event becomes many deliveries. The simplest approach is direct fan-out: for each event, the processing layer queries the user preference store and creates one notification task per subscribed channel. This works for low volumes but becomes expensive at scale, because an event addressed to N users costs N preference lookups. Batch fan-out groups users by channel and preference, reducing lookups. For example, a 'promotion' event might define a target segment (e.g., all users in tier 'premium'), and the system generates email tasks in bulk without individual lookups. Tiered fan-out combines both: critical events use direct (fast, per-user), bulk events use batch. The trade-off is latency vs. throughput. Direct fan-out costs one lookup per recipient, so O(N) for N recipients; batch fan-out costs one query per segment. Choose based on the notification type's SLA.

FanOutService.java (Java)
package io.thecodeforge.notification.fanout;

import io.thecodeforge.notification.NotificationEvent;
import java.util.concurrent.CompletableFuture;

public class FanOutService {

    // Tiered routing: critical events fan out per user, everything else in bulk.
    public CompletableFuture<Void> fanOut(NotificationEvent event) {
        return isCritical(event) ? directFanOut(event) : batchFanOut(event);
    }

    // NotificationEvent has no priority field, so criticality is derived from its type.
    // Treating only OTP as critical is an assumption; adjust it to your own SLA tiers.
    private static boolean isCritical(NotificationEvent event) {
        return "OTP".equals(event.type());
    }

    private CompletableFuture<Void> directFanOut(NotificationEvent event) {
        // lookup user preferences, create tasks per channel
        return CompletableFuture.runAsync(() -> {
            // ...
        });
    }

    private CompletableFuture<Void> batchFanOut(NotificationEvent event) {
        // query segment, generate tasks in bulk
        return CompletableFuture.runAsync(() -> {
            // ...
        });
    }
}
Performance Insight
Batch fan-out can reduce database load by 90% for bulk mailings. Instead of 10 million individual lookups, you do 1 segment query. However, batch adds latency (minutes vs seconds). Use direct fan-out for critical notifications and batch for everything else.
Production Insight
A batch fan-out for a flash sale sent 2 million SMS tasks within 5 minutes. The SMS provider rate limited us after 10,000 requests, causing hours of retry storms.
We added per-batch rate limiting that splits the batch into chunks over time.
Rule: always rate limit at the batch level, not just per-request.
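
Splitting a batch into chunks over time can be sketched as a scheduler that assigns each chunk a release offset. The per-second cap is an assumption you would take from your provider contract; the function name is illustrative.

```python
from typing import Iterator, Sequence, Tuple

def schedule_chunks(tasks: Sequence, max_per_second: int) -> Iterator[Tuple[float, Sequence]]:
    """Yield (release_offset_seconds, chunk) pairs so no second exceeds max_per_second tasks."""
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    for second, start in enumerate(range(0, len(tasks), max_per_second)):
        # One chunk per second, each at most max_per_second tasks long.
        yield float(second), tasks[start:start + max_per_second]

plan = list(schedule_chunks(list(range(25)), max_per_second=10))
# plan holds three chunks of 10, 10, and 5 tasks at offsets 0.0, 1.0, and 2.0 seconds
```

A worker would sleep until each offset before enqueuing the chunk, so the provider sees a steady rate instead of the whole batch at once.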
Key Takeaway
Direct fan-out is low-latency, high-cost per event.
Batch fan-out is high-throughput, higher latency.
Use tiered routing: critical = direct, bulk = batch.

Reliability, Retries, and Dead-Letter Queues

Third-party providers fail: they return 5xx, throttle you, or go down. Your notification system must handle these failures gracefully. The standard pattern is exponential backoff with jitter (e.g., 1s, 4s, 16s, capped at 64s), with configurable max retries per channel and per priority. After exhausting retries, move the task to a dead-letter queue (DLQ). The DLQ stores failed notifications for manual inspection or later reprocessing. Crucially, retries must be idempotent: the same retry should not send two SMS messages. Use a unique idempotency key (e.g., eventId + channel) stored in Redis with a TTL longer than the retry window. For example, if exhausting all retries takes 5 minutes, set the Redis TTL to 30 minutes. Also, the DLQ must trigger an alert, because a silent DLQ means you're losing notifications.

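RetryService.java (Java)
package io.thecodeforge.notification;

import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// NotificationTask is assumed to expose attemptCount(), the number of attempts already made.
public class RetryService {
    private static final int[] BACKOFF_SECONDS = {1, 4, 16, 64};
    private static final long MAX_JITTER_MILLIS = 1000;

    public boolean shouldRetry(NotificationTask task) {
        if (task.attemptCount() >= BACKOFF_SECONDS.length) {
            moveToDLQ(task); // retry budget exhausted
            return false;
        }
        return true;
    }

    // Call only after shouldRetry(task) returned true, so the index is in range.
    public Duration nextDelay(NotificationTask task) {
        // Jitter is added in milliseconds (up to 1 s), not seconds.
        long jitterMillis = ThreadLocalRandom.current().nextLong(MAX_JITTER_MILLIS);
        return Duration.ofSeconds(BACKOFF_SECONDS[task.attemptCount()]).plusMillis(jitterMillis);
    }

    private void moveToDLQ(NotificationTask task) {
        // store failed task for human review
    }
}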
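
The idempotency key described in this section (eventId + channel, with a TTL longer than the retry window) can be sketched as below. To keep the example runnable without a server it uses an in-memory stand-in for Redis; in Redis the claim step would correspond to `SET key value NX EX ttl`. The class and function names are hypothetical.

```python
import time
from typing import Callable, Optional

class InMemoryKeyStore:
    """Stand-in for Redis: set-if-absent with a TTL, plus delete."""

    def __init__(self) -> None:
        self._expires_at = {}

    def set_if_absent(self, key: str, ttl_s: float, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # key is still claimed
        self._expires_at[key] = now + ttl_s
        return True

    def delete(self, key: str) -> None:
        self._expires_at.pop(key, None)

def idempotency_key(event_id: str, channel: str) -> str:
    return f"idem:{event_id}:{channel}"

def send_once(store: InMemoryKeyStore, event_id: str, channel: str,
              send: Callable[[], None], ttl_s: float = 1800.0) -> bool:
    """Send at most once per (event, channel). Returns False for a duplicate."""
    key = idempotency_key(event_id, channel)
    if not store.set_if_absent(key, ttl_s):
        return False
    try:
        send()
    except Exception:
        store.delete(key)  # release the claim so a later retry can send
        raise
    return True
```

Releasing the key on failure is the design choice that lets retries proceed while still blocking a second send after a success. The 1800-second default mirrors the 30-minute TTL example above.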
Production Insight
A developer set max retries to 10 with 1-second delay. A provider outage caused millions of retries, saturating the worker pool and blocking new notifications.
We capped retries at 4 and introduced a DLQ with PagerDuty alert.
Rule: retry budgets protect the system from cascading failures.
Key Takeaway
Exponential backoff with jitter — never linear.
Idempotency keys prevent duplicates.
Dead-letter queues must trigger alerts, not silence.

Production Gotchas: Rate Limiting, Deduplication, and Channel Failures

  1. Rate limiting at the provider level: SMS providers like Twilio impose per-second and per-day limits. Exceeding them triggers soft bans (HTTP 200 but no delivery). Solution: implement a token bucket rate limiter per provider and per channel. Monitor the token consumption against the limit.
  2. Deduplication across retries and chains: If a user receives a notification that triggers another notification (e.g., 'payment received' triggers 'transaction alert'), you need cross-event deduplication. Use a configurable suppression window (e.g., don't send same type within 5 minutes). Store suppression keys in Redis with TTL.
  3. Channel failures: Email domain may have strict SPF/DKIM, causing bounces. Push notification certificates expire. SMS aggregator may be down in a region. Solution: have a fallback channel strategy (e.g., if email fails, send SMS). Define fallback rules per notification type. Monitor channel health with synthetic probes.
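
The token bucket from the first item can be sketched in a few lines. This is a single-process illustration with assumed parameter names; a shared limiter across workers would keep the bucket state in a store such as Redis, and the real rate and capacity come from your provider's published limits.

```python
import time
from typing import Optional

class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `rate_per_s` sends."""

    def __init__(self, rate_per_s: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._tokens = capacity  # start full
        self._last = time.monotonic() if now is None else now

    def try_acquire(self, tokens: float = 1.0, now: Optional[float] = None) -> bool:
        """Take `tokens` if available; never blocks."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self._last)
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False

# One bucket per (provider, channel) pair, as suggested above. The numbers are placeholders.
buckets = {("twilio", "sms"): TokenBucket(rate_per_s=2, capacity=2, now=0.0)}
```

When `try_acquire` returns False, the task should be delayed and requeued, not dropped, so the limiter smooths traffic without losing messages.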
Silent Drop Warning
Email providers (Gmail, Outlook) silently throttle senders who exceed spam thresholds. Your logs show 'delivered' but the email never arrives. The only way to detect this is by monitoring open rates and bounce rates — a sudden drop in open rate indicates a reputation issue.
Production Insight
A push notification campaign to 100k users failed silently because the Firebase API key had expired — no errors logged.
We added a daily cron that sends a test push to a dedicated test device and verifies arrival.
Rule: synthetic monitoring is the only reliable way to catch channel outages.
Key Takeaway
Provider rate limits can be silent — use token buckets.
Suppression windows prevent notification storms.
Synthetic probes detect channel failures before users complain.

Notification Channel Comparison: Push, Email, SMS

Choosing the right channel for each notification type depends on cost, latency, and delivery reliability. Push notifications are cheap (volume pricing from FCM/APNs), low latency (< 1s), but require an installed app and user opt-in. Email is moderate cost (e.g., SendGrid ~$0.03 per 1000), latency of seconds to minutes due to SMTP delays, but offers rich formatting and tracking. SMS is expensive ($0.0075–$0.02 per message), low latency (< 5s), but high reliability since almost every phone receives SMS. Each channel has different provider options and SLAs. The table below summarizes key characteristics.

Production Insight
We once sent a critical password reset notification via email only. Attackers flooded the user's inbox, delaying the reset email — support tickets spiked. We added SMS as a fallback for security-critical events.
Rule: always have a fallback channel for high-priority notifications.
Key Takeaway
Push is cheapest but requires app install; email balances cost and richness; SMS is expensive but most reliable. Use fallback chains for critical messages.
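
A fallback chain can be expressed as an ordered list of channels per notification type. In this sketch the email-then-SMS chain for password resets follows the incident above; the other entry, the `Sender` signature, and the function name are assumptions for illustration.

```python
from typing import Callable, Mapping, Optional

# A sender takes (user_id, message) and returns True when the provider accepted it.
Sender = Callable[[str, str], bool]

FALLBACK_CHAINS = {
    "password_reset": ["email", "sms"],  # security-critical: SMS backs up email
    "promotional": ["email"],            # assumption: no fallback for marketing
}

def send_with_fallback(notification_type: str, user_id: str, message: str,
                       senders: Mapping[str, Sender]) -> Optional[str]:
    """Try each channel in order; return the channel that succeeded, or None."""
    for channel in FALLBACK_CHAINS.get(notification_type, []):
        sender = senders.get(channel)
        if sender is None:
            continue  # channel not configured
        try:
            if sender(user_id, message):
                return channel
        except Exception:
            pass  # treat an exception as a channel failure and move on
    return None
```

Note that provider acceptance is the only success signal used here. Combining the chain with delivery-receipt tracking would let it also fall back when a message is accepted but never delivered.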

APNs vs FCM: Push Notification Provider Comparison

For push notifications, two major providers dominate: Apple Push Notification service (APNs) for iOS and Firebase Cloud Messaging (FCM) for Android (and also supported on iOS). While both provide HTTP/2 endpoints for sending, key differences exist in delivery semantics, payload size limits, and topic/subscription models. APNs requires a TLS connection with a certificate or token-based authentication; each notification has a unique device token. FCM uses a JSON request with a target (device token, topic, or condition) and can batch messages. APNs provides immediate feedback about invalid tokens; FCM delivers feedback via Cloud Messaging API responses. Payload limits: APNs 4 KB for regular notifications (up to 5 KB for VoIP), FCM 4 KB. Both support silent data notifications, but APNs throttles background updates. For production systems, you'll likely send through both using a unified gateway that routes based on OS.

Production Insight
We used a single FCM server key for both iOS and Android initially. When Apple upgraded APNs authentication requirements, our older token expired, and iOS users stopped receiving pushes — no errors logged since FCM silently dropped iOS pushes with invalid credentials.
Rule: always test provider credentials separately per platform, and rotate tokens/certificates automatically.
Key Takeaway
APNs and FCM differ in auth, payload limits, token validation feedback, and batching. Use a unified push service that routes to the correct provider and handles per-platform credential rotation.
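
The routing step of a unified push gateway can be sketched as below. It only decides which provider handles a target and enforces the 4 KB payload limits quoted above; the type and function names are hypothetical, and the actual provider calls are out of scope.

```python
from dataclasses import dataclass

APNS_MAX_BYTES = 4096  # 4 KB limit for regular APNs notifications
FCM_MAX_BYTES = 4096   # 4 KB limit for FCM messages

@dataclass(frozen=True)
class PushTarget:
    platform: str      # "ios" or "android"
    device_token: str

def route_push(target: PushTarget, payload_json: str) -> str:
    """Return the provider name for this target, rejecting oversized payloads."""
    if target.platform == "ios":
        provider, limit = "apns", APNS_MAX_BYTES
    elif target.platform == "android":
        provider, limit = "fcm", FCM_MAX_BYTES
    else:
        raise ValueError(f"unsupported platform: {target.platform!r}")
    size = len(payload_json.encode("utf-8"))  # limits apply to bytes, not characters
    if size > limit:
        raise ValueError(f"payload is {size} bytes, limit for {provider} is {limit}")
    return provider
```

Checking the size before the provider call turns a remote rejection into a local, loggable error.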

Rate Limiter, Notification Handler, and User Preferences: The Holy Trinity of Surviving Peak Load

Your fan-out engine does the heavy lifting, but without a rate limiter and a preferences filter you're just a spam cannon. Rate limiting isn't just about API quotas—it’s per-user, per-channel, per-rule. Five push notifications per customer per day, period. Otherwise your e-commerce platform gets banned by FCM and your user uninstalls everything.

The Notification Handler applies both checks before any channel touches the wire, and it reads user preferences first. That user who opted out of promotional SMS at 2 AM? Respect it. You don't send. You don't retry. You log and move on. This reduces channel costs by 18-30% in production.

Why this order? Checking preferences before the rate limiter means an opted-out message never consumes the user's daily quota, and no compute is wasted counting messages that will be discarded anyway. Every millisecond counts at 100K QPS. Cache preferences in Redis with a TTL of five minutes. If Redis is down, default to send, on the view that downtime is worse than a minor preference miss.

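RateLimiterWithPrefs.py (Python)
# io.thecodeforge: system-design tutorial

import queue
import time
import redis

# Production: Redis cluster, keyed by user_id:channel:day
rate_limiter = redis.Redis(connection_pool=redis.ConnectionPool(max_connections=50))

# Placeholders so the listing is self-contained; swap in your real DLQ, cache, and providers.
dead_letter_queue = queue.Queue()

def request_user_prefs_from_cache(user_id: str) -> dict:
    return {}  # empty means "cache unavailable or no preferences stored"

def log_skip(user_id: str, channel: str, reason: str) -> None:
    print(f"skip user={user_id} channel={channel} reason={reason}")

def dispatch_to_channel(channel: str, payload: dict) -> bool:
    return True  # stands in for the provider call

def can_send(user_id: str, channel: str, max_per_day: int = 5) -> bool:
    key = f"rate:{user_id}:{channel}:{time.strftime('%Y%m%d')}"
    count = rate_limiter.incr(key)
    if count == 1:
        # The date in the key rolls the counter over daily; the 24 h TTL only cleans up old keys.
        rate_limiter.expire(key, 86400)
    return count <= max_per_day

def filter_by_preferences(user_id: str, channel: str) -> bool:
    prefs = request_user_prefs_from_cache(user_id)
    if not prefs:
        return True  # If cache down, send to be safe
    return prefs.get(channel, {}).get("enabled", True)

def handler(user_id: str, channel: str, payload: dict) -> bool:
    # Preferences first, so opted-out messages never consume rate-limit quota.
    if not filter_by_preferences(user_id, channel):
        log_skip(user_id, channel, "preference_blocked")
        return False
    if not can_send(user_id, channel):
        dead_letter_queue.put(payload)
        return False
    return dispatch_to_channel(channel, payload)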
Output
True if dispatched, False if rate-limited or blocked by prefs. Rate-limited messages go to dead-letter queue for review.
Production Trap:
Never use a single global counter per channel. That kills legitimate high-priority OTPs when promos spike. Per-user, per-channel counters only.
Key Takeaway
Rate limit per user per channel per day. Cache preferences. Dead-letter rate-limited messages. Never assume the user wants everything.

Notification Validator and Prioritizer: The Bouncer That Saves Your Queue

Most architects skip validation—they assume every incoming notification is well-formed. Then a developer accidentally sends an empty payload with a mismatched user ID, and your entire pipeline chokes trying to parse garbage JSON. A notification validator sits at the ingest boundary. It checks: does the message have a template ID? Is the recipient list non-empty? Are channel flags valid? If not, reject it in under 5ms with a clear error code. Never let bad data touch your fan-out.

The prioritizer runs right after validation. Not all notifications are equal. An OTP has a hard SLA of 30 seconds. A promotional email can wait three hours. Tag each message with a priority level—P0, P1, P2. Route P0 messages to a dedicated high-priority queue (Redis Streams with consumer groups). P2 goes to Kafka with longer batch intervals. This ensures critical messages skip the backlog when there's a traffic spike.

In production, I've seen a 40% drop in P0 SLA violations just by adding a fast validation layer and a dedicated priority queue. Cheap win. Do it.

ValidatorPrioritizer.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
// io.thecodeforge — system-design tutorial

import json
from enum import Enum

class Priority(Enum):
    P0 = 0  # OTP, security alerts
    P1 = 1  # order confirmations, password resets
    P2 = 2  # promos, newsletters

def validate_notification(raw: dict) -> bool:
    required = {"user_ids", "template_id", "channel"}
    if not required.issubset(raw.keys()):
        raise ValueError(f"Missing fields: {required - raw.keys()}")
    if not isinstance(raw["user_ids"], list) or len(raw["user_ids"]) == 0:
        raise ValueError("user_ids must be a non-empty list")
    if raw["channel"] not in ["push", "email", "sms"]:
        raise ValueError(f"Unknown channel: {raw['channel']}")
    return True

def tag_priority(raw: dict) -> Priority:
    template_id = raw.get("template_id", "")
    if template_id.startswith("otp_") or template_id.startswith("security_"):
        return Priority.P0
    elif template_id.startswith("transactional_"):
        return Priority.P1
    else:
        return Priority.P2

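def queue_for_dispatch(payload: dict, priority: Priority) -> None:
    # Placeholder (assumption): swap in your broker client here,
    # e.g. a Redis Stream for P0 and a Kafka topic for P2.
    print(f"queued {payload['template_id']} as {priority.name}")

def ingest(raw: str) -> None:
    # json.JSONDecodeError subclasses ValueError, so callers can catch
    # ValueError for both unparseable and invalid payloads.
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    validate_notification(payload)
    priority = tag_priority(payload)
    queue_for_dispatch(payload, priority)
Output
Rejects malformed messages with ValueError and tags the rest P0, P1, or P2. In the full system, queue_for_dispatch routes P0 to a Redis Stream with immediate consumption.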
Senior Shortcut:
Don't build your own priority queue from scratch. Use Kafka with partitions per priority, or Redis Streams with consumer groups for P0. Off-the-shelf queueing handles backpressure better than a hand-rolled implementation.
Key Takeaway
Validate at the boundary. Tag priority before queuing. P0 gets a dedicated fast lane.
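
One way to picture the fast lane is a dispatcher that always drains P0 before it touches the lower lanes. The sketch below is illustrative only: it uses in-memory deques as stand-ins for the Redis Stream and Kafka topics, and `PriorityLanes`, `enqueue`, and `next_message` are hypothetical names, not part of any library.

```python
from collections import deque
from enum import IntEnum


class Priority(IntEnum):
    P0 = 0  # OTP, security alerts
    P1 = 1  # order confirmations, password resets
    P2 = 2  # promos, newsletters


class PriorityLanes:
    """In-memory stand-in for one queue per priority level (assumption)."""

    def __init__(self):
        self.lanes = {p: deque() for p in Priority}

    def enqueue(self, message: dict, priority: Priority) -> None:
        self.lanes[priority].append(message)

    def next_message(self):
        # Strict priority: scan lanes from P0 down, FIFO within a lane.
        for p in sorted(self.lanes):
            if self.lanes[p]:
                return p, self.lanes[p].popleft()
        return None  # nothing queued


lanes = PriorityLanes()
lanes.enqueue({"template_id": "promo_spring"}, Priority.P2)
lanes.enqueue({"template_id": "otp_login"}, Priority.P0)
first = lanes.next_message()
print(first[0].name, first[1]["template_id"])  # P0 otp_login
```

Strict priority like this can starve P2 during a long P0 burst, which is one reason separate consumers per lane are usually a better fit than a single draining loop.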

Why Pipelines Beat Monolithic Handlers

A monolithic notification handler does everything: templating, batching, rate limiting, and sending. When one step fails, the entire message is dropped. A pipeline decomposes processing into ordered stages connected by queues. Each stage runs independently, scales separately, and can fail without losing the message. The scheduler pushes raw events into the first stage, a templater merges in user data, a prioritizer reorders the batch, a rate limiter throttles per channel, and a sender completes delivery. If the email channel slows down, the sender stage backs up but the templater keeps working. A dead-letter queue captures only the messages that failed at a given stage, so one bad stage does not take down the whole pipeline.

Why this matters: pipelines let you add or remove stages, such as a spam filter or a deduplication step, without rewriting the system. You also get natural backpressure: a growing downstream queue signals upstream producers to slow down.

In production, never tie the number of pipeline stages to the number of channels. Use a configurable topology in which stages are pluggable workers. Have each stage emit metrics: input rate, output rate, and error count. A single dashboard of those metrics tells you exactly where the system is choking.

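pipeline_stages.py (Python)
# io.thecodeforge: system-design tutorial

from queue import Queue
from threading import Thread

STOP = object()  # sentinel that shuts a stage down


class Stage:
    def __init__(self, name, worker, downstream=None):
        self.name = name
        self.input = Queue()
        self.worker = worker
        self.downstream = downstream  # next Stage, or None for the last one

    def run(self):
        while True:
            msg = self.input.get()
            if msg is STOP:
                # Propagate shutdown so downstream stages exit too.
                if self.downstream:
                    self.downstream.input.put(STOP)
                break
            try:
                result = self.worker(msg)
            except Exception as e:
                # In production, push msg to this stage's dead-letter queue.
                print(f"{self.name} failed: {e}")
                continue
            if self.downstream:
                self.downstream.input.put(result)


def template(msg):
    msg["body"] = f"Hi {msg['user']}, {msg['text']}"
    return msg


def send(msg):
    print(f"Send: {msg['body']}")
    return msg


sender = Stage("send", send)
templater = Stage("template", template, downstream=sender)

threads = [Thread(target=s.run) for s in (templater, sender)]
for t in threads:
    t.start()

templater.input.put({"user": "alice", "text": "your code is ready"})
templater.input.put(STOP)
for t in threads:
    t.join()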
Output
Send: Hi alice, your code is ready
Production Trap:
Never hardcode stage order. Define the pipeline DAG in a YAML config. Otherwise, adding a new stage such as content filtering forces a full deploy.
Key Takeaway
Pipelines isolate failure and scale per stage, not per message.
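
To make the config-driven idea concrete, here is a minimal sketch in which stage order comes from a list rather than from code. It is an assumption-laden illustration: stages run synchronously in one process instead of behind queues, the topology is a linear list rather than a full DAG, and `REGISTRY`, `PIPELINE_CONFIG`, `content_filter`, and `run_pipeline` are hypothetical names. In a real deployment the list would be loaded from YAML or another config store.

```python
from collections import Counter


def template(msg):
    msg["body"] = f"Hi {msg['user']}, {msg['text']}"
    return msg


def content_filter(msg):
    # Hypothetical filter stage added purely via config.
    if "forbidden" in msg["text"]:
        raise ValueError("blocked by content filter")
    return msg


def send(msg):
    msg["sent"] = True  # stand-in for a provider call
    return msg


# Registry of available stage workers, keyed by the name used in config.
REGISTRY = {"template": template, "content_filter": content_filter, "send": send}

# Stage order lives in config; reordering or inserting a stage needs no code change.
PIPELINE_CONFIG = ["content_filter", "template", "send"]


def run_pipeline(msg, config, metrics, dead_letters):
    """Run msg through the configured stages, recording per-stage metrics."""
    for name in config:
        metrics[f"{name}.input"] += 1
        try:
            msg = REGISTRY[name](msg)
        except Exception as exc:
            metrics[f"{name}.errors"] += 1
            # Only the failing stage dead-letters the message.
            dead_letters.append({"stage": name, "error": str(exc), "msg": msg})
            return None
        metrics[f"{name}.output"] += 1
    return msg


metrics, dead_letters = Counter(), []
ok = run_pipeline({"user": "alice", "text": "your code is ready"},
                  PIPELINE_CONFIG, metrics, dead_letters)
bad = run_pipeline({"user": "bob", "text": "forbidden offer"},
                   PIPELINE_CONFIG, metrics, dead_letters)
print(ok["body"], metrics["content_filter.errors"], dead_letters[0]["stage"])
```

The per-stage `input`, `output`, and `errors` counters are the raw material for the dashboard described above: a stage whose input count outruns its output count is where the pipeline is choking.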

The Scheduler Is Your Demand Governor

Notification systems fail not because they cannot send, but because they send too much at the wrong time. The scheduler sits before all processing pipelines, not after. Its job is to delay, batch, and reorder events based on three signals: user timezone, channel capacity, and event priority. Take each in turn. If a user is asleep (say, 3 AM local time), hold the push notification until 8 AM. If the email API has a 10 TPS limit, batch pending emails into 100-message groups and release them at no more than 10 per second. If a password-reset alert has priority 10 and a marketing blast has priority 1, the scheduler always picks the password reset first.

The scheduler uses a priority queue keyed by scheduled_timestamp rather than arrival time, with priority deciding between events that are due at the same moment. Every new event runs through a UserPreferences lookup that returns timezone + quiet_hours. The scheduler then computes the earliest safe delivery time, and that timestamp becomes the sort key. Workers pull only from the head of the queue, and only once its timestamp is due, so the system never sends during quiet hours.

In production, measure scheduler queue depth per channel. A growing queue means either capacity is too low or you need to throttle upstream producers. Never let the scheduler buffer more than 10 minutes of peak traffic; beyond that, memory usage blows up.

scheduler_queue.py (Python)
# io.thecodeforge: system-design tutorial

import heapq
import itertools
import time

def schedule(events, user_prefs):
    q = []
    counter = itertools.count()  # tie-breaker so equal timestamps never compare dicts
    for e in events:
        tz = user_prefs.get(e["user_id"], {}).get("timezone", "UTC")
        # Toy rule: hold New York users for one hour, send everyone else now.
        # A real scheduler derives the delay from timezone + quiet_hours.
        delay = 3600 if tz == "America/New_York" else 0
        send_at = e["created_at"] + delay
        heapq.heappush(q, (send_at, next(counter), e))
    return q

events = [{"user_id": "u1", "created_at": time.time()}]
user_prefs = {"u1": {"timezone": "America/New_York"}}
q = schedule(events, user_prefs)
print(f"Next send at: {q[0][0]:.2f}")
Output
Next send at: 1678900000.00 (illustrative; the real value is the current Unix time plus 3600)
Production Trap:
Scheduler offsets are additive. If you delay by timezone AND by channel backpressure, events pile up. Cap total delay to 24 hours max.
Key Takeaway
The scheduler orders by timezone and priority, not FIFO, to avoid waking users and burning API limits.
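
The "earliest safe delivery time" computation can be sketched as follows. This is a simplified illustration under stated assumptions: the user's timezone is modelled as a fixed UTC offset (a production system would more likely store an IANA name and use `zoneinfo` so daylight saving is handled), the quiet window is assumed to wrap past midnight, and `earliest_safe_send` with its 22:00 to 08:00 defaults is a hypothetical helper.

```python
from datetime import datetime, time, timedelta, timezone


def earliest_safe_send(now_utc, utc_offset_hours,
                       quiet_start=time(22, 0), quiet_end=time(8, 0)):
    """Return the earliest UTC datetime at which the user is outside quiet hours.

    Assumes the quiet window wraps past midnight (quiet_start > quiet_end).
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = now_utc.astimezone(tz)
    t = local.time()
    if t >= quiet_start:
        # Late evening: hold until quiet_end on the next local day.
        target = datetime.combine(local.date() + timedelta(days=1), quiet_end, tzinfo=tz)
    elif t < quiet_end:
        # Early morning: hold until quiet_end on the same local day.
        target = datetime.combine(local.date(), quiet_end, tzinfo=tz)
    else:
        return now_utc  # outside quiet hours, send immediately
    return target.astimezone(timezone.utc)


# 08:00 UTC is 03:00 at UTC-5, so the event is held until 08:00 local (13:00 UTC).
now = datetime(2026, 3, 6, 8, 0, tzinfo=timezone.utc)
send_at = earliest_safe_send(now, utc_offset_hours=-5)
print(send_at.isoformat())  # 2026-03-06T13:00:00+00:00
```

The returned timestamp is what would become the heap's sort key, so quiet-hours handling stays in one place instead of leaking into every sender.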
● Production incident · POST-MORTEM · severity: high

The Silent SMS Block: How Unlimited Retries Brought Down a Payment Pipeline

Symptom
Users reported not receiving OTP SMS messages. Logs showed notification service marked each attempt as 'sent' (HTTP 200 from Twilio). However, Twilio had silently soft-banned the account due to exceeding per-second rate limits, returning success but dropping messages.
Assumption
Teams assumed that since api.thecodeforge.io/notification returned 200, the SMS was delivered. They monitored HTTP status codes but not carrier-level delivery receipts.
Root cause
The notification service had an aggressive retry policy: retry every 2 seconds for up to 5 minutes. This caused a burst of requests far exceeding the SMS provider's rate limit. After the soft ban, all subsequent requests returned fake success.
Fix
Implemented sliding window rate limiting per provider and per channel. Added a dead-letter queue for messages that couldn't be delivered after 3 retries. Introduced a delivery receipt callback that compared 'sent' vs 'delivered' counts.
Key lesson
  • HTTP 200 from a third-party provider does NOT mean delivery — always monitor delivery receipts.
  • Retry policies must respect provider rate limits and include exponential backoff with jitter.
  • Silent failures are the most dangerous: create synthetic probes that actually verify end-to-end delivery.
Production debug guide: diagnose and fix common notification pipeline failures in production (4 entries)
Symptom · 01
User claims they never received a notification, but system shows 'sent'
Fix
Check delivery receipt callbacks (SNS delivery logs, Twilio status callbacks). If receipts show 'delivered', check client-side (app push notification permissions, spam folder). If no receipt after 15 minutes, likely throttled or soft-banned by provider.
Symptom · 02
Notifications are arriving with 10+ minutes delay
Fix
Inspect the message broker lag (Kafka consumer lag / RabbitMQ queue depth). Check if a downstream channel provider (e.g., email SMTP) is slow or rate-limited. Verify the notification worker's concurrency hasn't been exhausted.
Symptom · 03
Duplicate notifications sent to the same user within seconds
Fix
Check for missing idempotency keys. Look for retries on events that were already processed (e.g., webhook retry from upstream). Verify the deduplication store (Redis) is online and TTL is longer than the retry window.
Symptom · 04
One channel (e.g., push) stops delivering, others work fine
Fix
Verify the channel provider's API key hasn't expired or been revoked. Check rate limit counters — you may have hit a daily quota. Review provider status pages for outages. Test the channel with a manual curl using the same credentials.
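
The receipt check from the first entry can be automated with a small reconciliation job. The sketch below is a hedged illustration: it assumes you record when the provider accepted each message and that a status callback fills in a receipts map. `find_unconfirmed`, the dictionary shapes, and the status strings are hypothetical and not tied to any specific provider's API; the 15-minute timeout comes from the guidance above.

```python
def find_unconfirmed(sent, receipts, now, timeout_s=15 * 60):
    """Return message IDs the provider accepted but never confirmed as delivered.

    sent:     {message_id: accepted_at_epoch_seconds}
    receipts: {message_id: status}, populated by the provider's status callback
    """
    stale = []
    for message_id, accepted_at in sent.items():
        status = receipts.get(message_id)
        if status == "delivered":
            continue
        if status is None and now - accepted_at < timeout_s:
            continue  # no receipt yet, but still within the grace period
        stale.append(message_id)
    return stale


sent = {"m1": 0, "m2": 0, "m3": 800, "m4": 0}
receipts = {"m1": "delivered", "m4": "undelivered"}
print(find_unconfirmed(sent, receipts, now=1000))  # ['m2', 'm4']
```

Alerting on the ratio of unconfirmed to sent messages is what would have surfaced the soft ban in the incident above, since every request still returned HTTP 200.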
★ Quick Debug Cheat Sheet: immediate actions for notification system emergencies
Notifications stuck in queue, not being consumed
Immediate action
Restart notification worker pods. Check if DB connection pool is exhausted.
Commands
kubectl rollout restart -n notifications deployment/notification-worker
curl localhost:8080/actuator/health | jq .
Fix now
Scale up workers and increase DB pool size if needed.
SMS provider returning 429 Too Many Requests
Immediate action
Enable rate limiting in notification service config. Reduce retry count to 3.
Commands
kubectl edit configmap notification-service-config
curl -X POST http://localhost:8081/actuator/refresh
Fix now
Set 'provider.twilio.rate-limit=10/s' and 'retry.max-attempts=3'.
Duplicate notifications observed in metrics
Immediate action
Check Redis for idempotency keys. Increase TTL to 1 hour.
Commands
redis-cli TTL notification:dedup:<eventId>
redis-cli SET notification:dedup:<eventId> 1 EX 3600
Fix now
Switch to using unique event ID as idempotency key with 1-hour TTL.
Email open rate dropped to zero
Immediate action
Check email sending logs for bounces. Verify DKIM/SPF records.
Commands
grep 'bounce' /var/log/email/outbound.log | tail -20
dig TXT _spf.thecodeforge.io | grep v=spf1
Fix now
Reconfigure SMTP provider to use dedicated IP and proper authentication.
Fan-Out Strategy Comparison
Strategy | Latency | Database Load | Best For
Direct Fan-Out | Low (milliseconds) | High (N lookups per event) | Critical notifications (OTP, alerts)
Batch Fan-Out | High (minutes) | Low (1 lookup per segment) | Bulk mailings, promotions
Tiered Fan-Out | Low for critical, High for bulk | Medium (mixed) | Mixed workload systems
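
A tiered strategy amounts to choosing between the first two rows per event. The following sketch shows that decision in its simplest form; `plan_fan_out`, the `critical` flag, and the `segment_of` mapping are hypothetical names chosen for illustration, and real systems would resolve subscribers and segments from a database rather than from in-memory dictionaries.

```python
def plan_fan_out(event, subscribers, segment_of):
    """Return work items: one per user for critical events, one per segment for bulk."""
    if event["critical"]:
        # Direct fan-out: lowest latency, one lookup and one task per recipient.
        return [{"kind": "direct", "user_id": u} for u in subscribers]
    # Batch fan-out: group recipients so each segment is handled as one task.
    segments = {}
    for u in subscribers:
        segments.setdefault(segment_of[u], []).append(u)
    return [{"kind": "batch", "segment": s, "user_ids": ids}
            for s, ids in sorted(segments.items())]


subscribers = ["u1", "u2", "u3"]
segment_of = {"u1": "eu", "u2": "us", "u3": "eu"}
otp_plan = plan_fan_out({"critical": True}, subscribers, segment_of)
promo_plan = plan_fan_out({"critical": False}, subscribers, segment_of)
print(len(otp_plan), len(promo_plan))  # 3 2
```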

Key takeaways

1. A notification system decouples event producers from delivery channels: never let a service block on third-party API calls.
2. Isolate critical and bulk notifications into separate queues to prevent starvation.
3. Rate limiting must be per channel and per provider; token buckets work well.
4. Idempotency keys (stored in Redis) are the only reliable way to prevent duplicates across retries.
5. Monitoring delivery receipts and synthetic probes are essential: HTTP 200 does not mean delivered.
6. Always define SLAs per notification type and design your architecture to meet them.
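
For the rate-limiting takeaway, a token bucket can be sketched in a few lines. This is a minimal single-process illustration, not a distributed limiter: the `TokenBucket` class, the injected `now` clock (used so the demo is deterministic; production code would typically read `time.monotonic()`), and the 10-per-second figure for the `("twilio", "sms")` key are all assumptions.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s, capacity, now=0.0):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per (provider, channel), as the takeaway recommends.
buckets = {("twilio", "sms"): TokenBucket(rate_per_s=10, capacity=10)}
bucket = buckets[("twilio", "sms")]
burst = [bucket.allow(now=0.0) for _ in range(12)]
print(burst.count(True), bucket.allow(now=0.5))  # 10 True
```

Messages that `allow` rejects should be delayed or re-queued rather than retried immediately, otherwise the limiter just moves the retry storm one layer up.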

Common mistakes to avoid

4 patterns

Assuming HTTP 200 from provider means delivery

Symptom
Users report missing notifications even though logs show successful API calls. Provider soft-ban or carrier filtering causes silent drops.
Fix
Always monitor delivery receipts (callbacks, webhooks). Compare 'sent' vs 'delivered' counts. Implement synthetic probes that verify end-to-end.

No rate limiting per channel

Symptom
Provider throttles your account, returning 429 or soft-banning. Retry storms exacerbate the problem.
Fix
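Implement per-channel rate limiting (token bucket or sliding window). Respect each provider's documented limits; they vary by provider, sender type, and plan, so treat figures such as 1 SMS per second or 100 emails per second as illustrative rather than universal.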

Using a single queue for all notification types

Symptom
A burst of promotional notifications delays critical OTPs by minutes. Hard to diagnose because all logs look the same.
Fix
Use separate queues/topics for different priorities. Configure independent consumer groups and autoscaling policies.

Missing idempotency on retries

Symptom
Users receive duplicate SMS, email, or push notifications during provider failures and retries.
Fix
Add idempotency key (eventId + channel) stored in Redis with TTL greater than retry window. Check before sending.
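
The "check before sending" step can be sketched with an in-memory stand-in for Redis. The `DedupStore` class below mimics the semantics of Redis `SET key value NX EX ttl` (claim the key only if it is absent, with an expiry); it is an assumption for illustration, as are `send_once`, the injected `now` clock, and the key layout, which extends the `notification:dedup:<eventId>` pattern from the cheat sheet with the channel.

```python
class DedupStore:
    """In-memory stand-in for Redis `SET key value NX EX ttl` (assumption)."""

    def __init__(self):
        self._expires = {}

    def claim(self, key, ttl_s, now):
        """Return True if the key was unclaimed (or expired) and is now claimed."""
        expires_at = self._expires.get(key)
        if expires_at is not None and expires_at > now:
            return False  # someone already sent this within the TTL
        self._expires[key] = now + ttl_s
        return True


def send_once(store, event_id, channel, now, ttl_s=3600):
    """Return True only for the first send of (event_id, channel) within the TTL."""
    key = f"notification:dedup:{event_id}:{channel}"
    return store.claim(key, ttl_s, now)


store = DedupStore()
results = [
    send_once(store, "evt-1", "sms", now=0),      # first attempt
    send_once(store, "evt-1", "sms", now=10),     # retry inside the TTL
    send_once(store, "evt-1", "email", now=10),   # same event, other channel
    send_once(store, "evt-1", "sms", now=3601),   # after the TTL expired
]
print(results)  # [True, False, True, True]
```

Claiming the key and setting its expiry in one atomic operation is the important part; a separate "check, then set" pair leaves a window in which two workers both decide to send.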
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
How would you design a notification system that handles 10 million daily...
Q02 · SENIOR
How would you handle duplicate notifications if a user receives the same...
Q03 · SENIOR
What happens when an email provider returns a 500 error? Walk through th...
Q01 of 03 · SENIOR

How would you design a notification system that handles 10 million daily active users across SMS, email, and push? Discuss the key architectural components and trade-offs.

ANSWER
Start with requirements: must support multiple channels, respect user preferences, deliver within SLA per type, handle spikes, and provide delivery tracking. Architecture: event ingestion via Kafka (high throughput, durability) -> processing layer that enriches with user prefs and templates -> channel-specific queues -> delivery workers that call third-party APIs. Use a message broker to decouple producers from delivery. For fan-out, use tiered strategy: direct for critical, batch for bulk. For reliability, use exponential backoff with jitter, idempotency keys (eventId + channel) in Redis, and dead-letter queues. Rate limit per channel using token buckets. Monitor delivery receipts and use synthetic probes. Trade-offs: throughput vs latency (batch vs direct), cost vs reliability (more retries vs higher provider bills).
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
What is a notification system in simple terms?
02
Why should I use a message broker like Kafka for notifications?
03
How do you handle user preferences (opt-in/out) in a notification system?
04
What's the worst production incident you've seen with notification systems?