AWS SQS vs SNS — Silent Loss Under Lambda Throttling
SNS retries throttled Lambda twice then drops messages permanently — no DLQ, no alert.
- SQS = queue. One message, one consumer. Durable storage (up to 14 days). Use for async task processing, rate limiting, back-pressure.
- SNS = pub/sub topic. One message, all subscribers. No storage. Use for event broadcasting, fan-out, decoupling producers from consumers.
- SNS+SQS fan-out = production standard. SNS broadcasts to multiple SQS queues. Each queue durably stores its copy. Never subscribe Lambda directly to SNS in production — SQS in between absorbs throttling.
- Long polling: always set WaitTimeSeconds=20 in receive_message. On a quiet queue this cuts empty-receive API calls by roughly 95%, which lowers cost.
- Dead-letter queue (DLQ): maxReceiveCount=3. Messages that fail processing go to DLQ, not infinite retry. Monitor DLQ depth → that's your bug signal.
- Reliability trap: SNS to Lambda subscriptions retry twice then drop messages on throttle. SQS queues hold messages safely for days.
Imagine a busy pizza restaurant. SNS is the manager who shouts 'Order 42 is ready!' — every station (kitchen, cashier, delivery) that cares about that announcement hears it at the same time. SQS is the ticket rail above the grill — each chef grabs one ticket, works through it at their own pace, and the ticket is gone once it's done. One broadcasts, one queues. That's the whole mental model.
Modern applications rarely do one thing at a time. A user places an order and suddenly you need to charge their card, send a confirmation email, update inventory, notify the warehouse, and log an audit trail — all reliably, even if your email service crashes at 2 a.m. That's the problem AWS SQS and SNS were built to solve. They decouple the parts of your system so that a failure in one place doesn't cascade everywhere.
Without messaging services like these, you'd wire services together with direct HTTP calls. Service A calls Service B, which calls Service C. If B is slow, A waits. If C is down, the whole chain breaks. SQS introduces a buffer — a durable queue that holds messages until a consumer is ready to process them. SNS takes a different angle: it lets one event instantly fan out to dozens of subscribers without the publisher needing to know who they are.
By the end of this article you'll know the architectural difference between a queue and a publish-subscribe topic, how to wire SQS and SNS together for a real fan-out pattern, what dead-letter queues are and why you desperately need them, and exactly when to reach for each service in your next cloud project.
SQS — The Durable Message Queue That Saves Your System at 2 A.M.
SQS (Simple Queue Service) is a fully managed message queue. A producer drops a message into the queue, and one or more consumers poll the queue and process messages at their own pace. The key word is 'one' — by default each message is delivered to exactly one consumer. This is point-to-point messaging.
Why does that matter? Because it gives you back-pressure handling for free. If your order-processing service is overwhelmed, messages just pile up in the queue safely. The queue acts as a shock absorber between the part of your system that generates work and the part that does the work.
There are two flavours. Standard queues give you maximum throughput with at-least-once delivery and best-effort ordering, meaning a message might appear twice (rare, but plan for it). FIFO queues guarantee exactly-once processing and strict order within a message group, but by default cap you at 300 API calls per second per action, or 3,000 messages per second with batching (high-throughput mode raises this). Choose FIFO when order actually matters: financial transactions, state machines. Choose Standard everywhere else.
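Because a Standard queue can deliver the same message more than once, the consumer has to tolerate duplicates. The following is a minimal sketch of one way to do that, assuming deduplication on the SQS `MessageId` and an in-memory set of seen IDs. The class name `IdempotentHandler` is hypothetical, and a real deployment would need a durable store shared across workers.

```python
from typing import Callable, Dict, Set


class IdempotentHandler:
    """Wraps a handler so a redelivered message is processed only once.

    Assumption: the seen-ID store is an in-memory set, which only works for a
    single process. A durable store is needed across workers or restarts.
    """

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()

    def handle(self, message: Dict[str, str]) -> bool:
        """Process one SQS message dict; return False if it was a duplicate."""
        message_id = message["MessageId"]
        if message_id in self._seen:
            return False  # duplicate delivery, skip the side effect
        self._handler(message["Body"])
        self._seen.add(message_id)  # record only after the handler succeeds
        return True


if __name__ == "__main__":
    processed = []
    wrapper = IdempotentHandler(processed.append)  # stand-in for real work
    msg = {"MessageId": "m-1", "Body": "order-42"}
    wrapper.handle(msg)
    wrapper.handle(msg)  # simulated duplicate delivery
    print(processed)
```

Note that `MessageId` only identifies redeliveries of the same SQS message. If a producer sends the same logical event twice, each send gets its own ID, so a business key is a better choice in that case.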
Messages live in the queue for up to 14 days (the default retention period is 4 days). The visibility timeout is the other critical setting: after a consumer picks up a message, it becomes invisible to other consumers for that window (30 seconds by default). If your Lambda or EC2 worker crashes mid-process, the message reappears once the timeout expires and gets retried. That's your built-in retry mechanism.
Long polling is the single most impactful cost-saving setting. Always set WaitTimeSeconds=20 in your receive_message calls. Without it, your consumer uses short polling, which returns immediately even if no messages exist. You pay per API call, so a quiet queue will cost you thousands of empty calls. Long polling holds the connection for up to 20 seconds, waiting for a message. As an illustration, assume a consumer that short polls once per second: that is 60 calls a minute. On a queue that gets one message per minute, 20-second long polling needs about 3 calls a minute, which cuts your API calls by roughly 95%.
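A long-polling consumer can be sketched as follows. This is an illustration rather than a definitive implementation: `poll_once` and `handler` are hypothetical names, and `sqs` is assumed to be a Boto3 SQS client created by the caller with `boto3.client("sqs")`.

```python
from typing import Any, Callable, Dict


def poll_once(sqs: Any, queue_url: str, handler: Callable[[str], None]) -> int:
    """Long-poll the queue once and process whatever arrives.

    Returns the number of messages successfully processed and deleted.
    """
    response: Dict[str, Any] = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # largest batch SQS returns per call
        WaitTimeSeconds=20,      # long polling: wait up to 20 s for messages
    )
    processed = 0
    # The "Messages" key is absent when the poll ends with nothing to read.
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Do not delete: the message becomes visible again after the
            # visibility timeout and SQS redelivers it.
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed
```

A worker would call `poll_once` in a loop. Deleting only after the handler returns is what turns the visibility timeout into a retry mechanism.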
SNS — The Pub/Sub Megaphone That Notifies Everyone at Once
SNS (Simple Notification Service) works on the publish-subscribe model. You publish one message to a Topic, and SNS fans it out simultaneously to every subscriber — SQS queues, Lambda functions, HTTP endpoints, email addresses, mobile push notifications. The publisher has zero knowledge of who's listening. Adding a new subscriber doesn't touch the publisher at all.
This is the architectural superpower. Imagine your user-signup event needs to trigger a welcome email, a CRM record creation, a Slack notification to your growth team, and an analytics event. With SNS, your Auth service publishes one 'UserRegistered' message and walks away. Four independent services consume it in parallel.
Message filtering is what takes SNS from useful to essential. Instead of every subscriber receiving every message on a topic, you attach a filter policy to a subscription. Your EU payments service can subscribe to the 'transactions' topic but only receive messages where region=EU. This keeps each service focused on what it actually cares about.
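The region=EU case can be sketched with Boto3 keyword arguments. The helper names and the ARNs passed to them are hypothetical. The sketch assumes the publisher sets a `region` message attribute, since a filter policy matches message attributes unless its scope is changed.

```python
import json
from typing import Any, Dict


def eu_subscription_args(topic_arn: str, queue_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for sns.subscribe() with an attribute filter."""
    # Deliver only messages whose 'region' attribute equals "EU".
    filter_policy = {"region": ["EU"]}
    return {
        "TopicArn": topic_arn,
        "Protocol": "sqs",
        "Endpoint": queue_arn,
        "Attributes": {"FilterPolicy": json.dumps(filter_policy)},
    }


def eu_publish_args(topic_arn: str, body: str) -> Dict[str, Any]:
    """Build keyword arguments for sns.publish() that the filter accepts."""
    return {
        "TopicArn": topic_arn,
        "Message": body,
        "MessageAttributes": {
            "region": {"DataType": "String", "StringValue": "EU"},
        },
    }
```

With a client from `boto3.client("sns")`, these would be used as `sns.subscribe(**eu_subscription_args(topic_arn, queue_arn))` and `sns.publish(**eu_publish_args(topic_arn, body))`.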
SNS does not store messages. If a subscriber is down when the message arrives, that message is gone unless the subscriber is an SQS queue (which durably stores it). That's the most important SNS limitation to internalise — and it leads directly to the most powerful pattern: SNS + SQS fan-out.
SNS delivery logging is a critical debugging tool that most teams don't enable. It publishes delivery attempts, failures, and throttling events to CloudWatch Logs. When messages go missing, this is the first place to look. Enable it on every production SNS topic.
Dead-Letter Queues and the SNS+SQS Fan-Out Pattern — Production Essentials
Two patterns separate a toy cloud setup from a production-grade one: dead-letter queues (DLQs) and the SNS+SQS fan-out architecture. You need to understand both.
A DLQ is just another SQS queue. You configure it on your main queue and set maxReceiveCount — say, 3. If a message fails processing 3 times, SQS automatically moves it to the DLQ instead of retrying forever or silently dropping it. Your team gets alerted, investigates the poisoned message, and the rest of your queue keeps flowing normally. Without a DLQ, one bad message can block your queue or create an infinite retry storm.
The fan-out pattern solves SNS's biggest weakness — no durability. The rule is: never subscribe a Lambda directly to SNS in production if message loss is unacceptable. Instead, subscribe an SQS queue to the SNS topic. The queue durably catches every message. Your Lambda then polls the queue. You get SNS's broadcasting power AND SQS's durability and retry logic together. This is the architectural backbone of most event-driven AWS systems.
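On the consuming side, a Lambda function fed by the queue receives SQS records whose bodies are SNS notifications. Here is a hedged sketch of such a handler. It assumes raw message delivery is off (so each body is the SNS JSON envelope), that the publisher sends JSON, and that `ReportBatchItemFailures` is enabled on the event source mapping. `process_order` is a hypothetical stand-in for business logic.

```python
import json
from typing import Any, Dict, List


def process_order(payload: Dict[str, Any]) -> None:
    """Hypothetical business logic; raises on payloads it cannot handle."""
    if "order_id" not in payload:
        raise ValueError("missing order_id")


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    """Handle a batch of SQS records that wrap SNS notifications."""
    failures: List[Dict[str, str]] = []
    for record in event["Records"]:
        try:
            envelope = json.loads(record["body"])      # SNS notification JSON
            payload = json.loads(envelope["Message"])  # the publisher's message
            process_order(payload)
        except Exception:
            # Report only this record as failed so SQS retries it alone
            # instead of redelivering the whole batch.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Records reported as failed stay in the queue and count toward `maxReceiveCount`, so a message that keeps failing ends up in the DLQ rather than being dropped.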
The code example below wires both patterns together with IaC-style Boto3 calls, showing exactly how a DLQ connects to a main queue.
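What follows is a minimal sketch of that wiring, not the original listing. The function names, the queue and topic names derived from `name`, and the choice to keep DLQ messages for 14 days are assumptions. `sqs` and `sns` are expected to be Boto3 clients created by the caller.

```python
import json
from typing import Any, Dict


def queue_arn(sqs: Any, queue_url: str) -> str:
    """Look up a queue's ARN from its URL."""
    attrs = sqs.get_queue_attributes(QueueUrl=queue_url, AttributeNames=["QueueArn"])
    return attrs["Attributes"]["QueueArn"]


def wire_fan_out(sqs: Any, sns: Any, name: str = "orders") -> Dict[str, str]:
    """Create topic -> queue -> DLQ and return their identifiers."""
    # 1. The DLQ is an ordinary queue, here kept for the maximum 14 days.
    dlq_url = sqs.create_queue(
        QueueName=f"{name}-dlq",
        Attributes={"MessageRetentionPeriod": "1209600"},
    )["QueueUrl"]
    dlq_arn = queue_arn(sqs, dlq_url)

    # 2. The main queue points at the DLQ through its redrive policy.
    redrive = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "3"}
    main_url = sqs.create_queue(
        QueueName=name,
        Attributes={"RedrivePolicy": json.dumps(redrive)},
    )["QueueUrl"]
    main_arn = queue_arn(sqs, main_url)

    # 3. The topic must be allowed to send to the main queue.
    topic_arn = sns.create_topic(Name=f"{name}-events")["TopicArn"]
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": main_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    sqs.set_queue_attributes(QueueUrl=main_url, Attributes={"Policy": json.dumps(policy)})

    # 4. Subscribe the queue; repeat steps 1-4 per consumer for wider fan-out.
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=main_arn)
    return {"topic_arn": topic_arn, "queue_url": main_url, "dlq_url": dlq_url}
```

Step 3 is the one that is easy to forget: without the queue policy the subscription exists but SNS deliveries to the queue are rejected.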
Monitoring the DLQ: Set a CloudWatch alarm on your DLQ for ApproximateNumberOfMessagesVisible > 0 (the CloudWatch metric that corresponds to the ApproximateNumberOfMessages queue attribute). A message in the DLQ means your consumer failed to process it after maxReceiveCount attempts. That's a bug, and it shouldn't be ignored or silently deleted. Your on-call should get a page.
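The DLQ alarm can be sketched as keyword arguments for CloudWatch's `put_metric_alarm`. In CloudWatch the queue-depth metric is named `ApproximateNumberOfMessagesVisible`. The helper name, the alarm name, the one-minute period, and the SNS topic ARN used for paging are all assumptions for illustration.

```python
from typing import Any, Dict, List


def dlq_alarm_args(dlq_name: str, alarm_actions: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for cloudwatch.put_metric_alarm()."""
    return {
        "AlarmName": f"{dlq_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 60,                # evaluate one-minute windows
        "EvaluationPeriods": 1,      # fire on the first breaching window
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # an idle DLQ is healthy
        "AlarmActions": alarm_actions,       # e.g. an SNS topic that pages on-call
    }
```

With a client from `boto3.client("cloudwatch")`, this would be applied as `cloudwatch.put_metric_alarm(**dlq_alarm_args("orders-dlq", [paging_topic_arn]))`.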
SQS vs SNS vs EventBridge — A Three-Way Decision Table
When you need more than basic pub/sub or queuing, AWS EventBridge enters the picture. It's not a replacement for SQS or SNS — it sits above them, offering a central event bus with advanced routing, schema registry, and integration with third-party SaaS events. Here's a decision table to clarify when to pick each:
| Feature | SQS | SNS | EventBridge |
|---|---|---|---|
| Messaging model | Point-to-point queue | Pub/sub topic | Event bus (pub/sub + routing) |
| Durability | Yes, up to 14 days | No (unless subscriber is SQS) | No queue; retries delivery to targets for up to 24 hours by default |
| Throughput | Unlimited (Standard) / 3,000 msg/s with batching (FIFO) | Region-dependent soft quota, 300 publishes/s in the smallest Regions (adjustable) | 5,000 events/s per bus (adjustable) |
| Filtering | Consumer-side | Server-side subscription filter policies (message attributes by default, message body when the policy is scoped to it) | Rich content-based filtering (JSONPath, prefix, suffix, anything, exists) |
| Ordering | Best-effort / FIFO | Best-effort (Standard) / FIFO topics | No ordering (use replay or custom) |
| Payload size | Up to 256 KB | Up to 256 KB | Up to 256 KB |
| Pricing | Pay per request & data transfer | Pay per publish & delivery attempts | Pay per event ingested & delivered (higher per-event cost) |
| Third-party integrations | None native | Email, SMS, mobile push | SaaS apps (Zendesk, Datadog, PagerDuty, 200+ built-in sources) |
| Schema registry | No | No | Yes — schema discovery & code generation |
| Replay events | No | No | Yes, archive events with a configurable retention period and replay them |
When to pick EventBridge over SNS:
- You need complex content-based filtering (e.g., "order.total > 100 and order.region != 'US'")
- You want to ingest events from third-party SaaS providers (GitHub, Shopify, etc.)
- You need event replay for debugging or disaster recovery
- You want automatic schema discovery to generate strongly typed code
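The content-based filtering case in the list above can be written as an EventBridge event pattern. This is a sketch under assumptions: the publisher is taken to nest the order under `detail.order`, and the rule and bus names are hypothetical.

```python
import json
from typing import Any, Dict

# Event pattern for: order.total > 100 and order.region != "US".
# Assumption: the publisher puts the order under detail.order of the event.
HIGH_VALUE_NON_US: Dict[str, Any] = {
    "detail": {
        "order": {
            "total": [{"numeric": [">", 100]}],
            "region": [{"anything-but": "US"}],
        }
    }
}


def rule_args(rule_name: str, bus_name: str) -> Dict[str, str]:
    """Build keyword arguments for events.put_rule()."""
    return {
        "Name": rule_name,
        "EventBusName": bus_name,
        "EventPattern": json.dumps(HIGH_VALUE_NON_US),
    }
```

With a client from `boto3.client("events")`, the rule would be created with `events.put_rule(**rule_args("high-value-non-us", "orders-bus"))` and then given targets separately.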
When to stick with SNS+SQS:
- You need FIFO ordering or exactly-once processing (EventBridge doesn't support FIFO)
- Your throughput is very high and you want the lowest per-message cost
- You need 14-day message retention (EventBridge is not a queue; by default it retries delivery for up to 24 hours)
- You need the simplicity of a direct queue (SQS) without event bus complexity
The rule of thumb: SNS+SQS covers 80% of event-driven use cases. EventBridge is worth the extra cost and complexity when you need its advanced routing, third-party integration, or replay capabilities.
Pros and Cons of SQS and SNS
Every service has trade-offs. Here's a clear-eyed look at what SQS and SNS do well, and where they fall short.
SQS advantages:
- Durable 14-day message storage with automatic retries via visibility timeout
- At-least-once delivery (Standard) or exactly-once (FIFO)
- Unlimited throughput with Standard queues
- Built-in dead-letter queue support
- Low cost per API request, especially with long polling
- Supports batch operations (up to 10 messages per receive, 10 per send)
SQS disadvantages:
- No built-in fan-out one-to-many (you need SNS for that)
- Consumer must poll the queue, adding latency and cost if not optimized
- Max message size 256 KB (need S3 large-payload solution for bigger)
- Ordering only guaranteed with FIFO (limited throughput)
- No content-based server-side filtering
SNS advantages:
- Instant pub/sub fan-out: one message reaches all subscribers simultaneously
- Multiple subscriber types (SQS, Lambda, HTTP, email, SMS, mobile push)
- Server-side filter policies reduce unnecessary deliveries
- No polling overhead for subscribers (push-based)
- Simple pricing per publish, not per subscriber
SNS disadvantages:
- Messages are NOT stored: if a subscriber is down, the message is lost
- Limited retries (3 attempts only for HTTP/Lambda subscribers)
- No ordering guarantees on Standard topics
- Max message size 256 KB
- No DLQ for failed deliveries by default (attach a redrive policy to the subscription, or use an SQS subscriber to get retries)
- Filtering matches message attributes by default (matching on body content requires changing the filter policy scope)
The real insight: SNS's biggest disadvantage (no durability) disappears when it is paired with SQS. The combination covers each service's weakness. Never use SNS alone for critical events; always buffer with SQS.
Pricing Comparison — Standard vs FIFO, Per-Million Request Costs
Understanding cost at scale is critical. Here's the pricing breakdown for SQS, SNS, and the common patterns.
SQS Pricing (as of 2026)
| Queue Type | Request Pricing | Data Transfer Pricing | Free Tier |
|---|---|---|---|
| Standard | $0.40 per million requests | $0.09 per GB after first 1 GB/month | 1 million requests free per month |
| FIFO | $0.50 per million requests | $0.09 per GB after first 1 GB/month | 1 million requests free per month |
Notes on SQS requests:
- A “request” is any API call: SendMessage, ReceiveMessage, DeleteMessage, ChangeMessageVisibility, etc.
- Long polling (WaitTimeSeconds=20) counts as one request per 20-second call, even if the response is empty.
- Batch operations count as one request per batch of up to 10 messages.
SNS Pricing (as of 2026)
| Topic Type | Publish Pricing | Delivery Pricing | Free Tier |
|---|---|---|---|
| Standard | $0.50 per million publishes | $0.50 per million deliveries across all subscribers | 1 million publishes free per month |
| FIFO | $1.10 per million publishes | $0.50 per million deliveries across all subscribers | Not included in free tier (pay per use) |
Notes on SNS delivery:
- Each subscriber receives a copy of the message. If you have 5 SQS subscribers and publish 1 million messages, you pay for 1 million publishes + 5 million deliveries = $3.00 ($0.50 + $2.50).
- For FIFO topics, the higher publish price reflects the ordering guarantee.
Comparison Scenario: Fan-out to 3 SQS queues
Assume 10 million messages per month.
- SNS+SQS fan-out: 10M publishes = $5.00. 30M deliveries = $15.00. SQS request cost: 10M sends to each queue = 30M send requests = $12.00. Consumer polling: ~3.6M receive calls with long polling (10M messages / 10 batch size * 3.6 polls per message) = $1.44. Total: ~$33.44.
- Direct Lambda subscriptions: 10M publishes = $5.00. SNS does not charge for deliveries to Lambda beyond the publish cost, although Lambda's own invocation charges still apply. But you risk message loss (see the Lambda throttling incident described in this article). Not recommended for critical data.
- EventBridge bus: 10M events ingested = $10.00. 30M deliveries to 3 rules = $30.00. Total: $40.00.
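The scenario arithmetic can be reproduced with a short script. It uses only the per-million prices and the 3.6M receive-call estimate quoted in this article, not live AWS pricing. The function names and the $1.00 per million EventBridge rate (implied by the figures above) are assumptions for the sketch.

```python
MILLION = 1_000_000


def cost(requests: int, price_per_million: float) -> float:
    """Dollar cost of a number of requests at a per-million price."""
    return requests / MILLION * price_per_million


def fan_out_cost(messages: int, queues: int, receive_calls: int) -> float:
    """SNS+SQS fan-out cost using the article's quoted prices."""
    publishes = cost(messages, 0.50)            # SNS publish
    deliveries = cost(messages * queues, 0.50)  # SNS delivery per subscriber
    sends = cost(messages * queues, 0.40)       # SQS request per queued copy
    receives = cost(receive_calls, 0.40)        # long-poll receive calls
    return round(publishes + deliveries + sends + receives, 2)


def eventbridge_cost(messages: int, rules: int) -> float:
    """EventBridge cost at the $1.00 per million implied by the scenario."""
    return round(cost(messages, 1.00) + cost(messages * rules, 1.00), 2)


if __name__ == "__main__":
    print(fan_out_cost(10 * MILLION, 3, 3_600_000))
    print(eventbridge_cost(10 * MILLION, 3))
```

Changing the message volume or the number of queues shows how quickly the delivery and per-queue request terms come to dominate the total.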
Cost-saving tips:
1. Always use long polling on SQS consumers to reduce empty receive requests.
2. Batch send messages to SQS (up to 10 per request) to cut send costs by 90%.
3. Use SNS standard for high-volume fan-out; FIFO only when ordering is required.
4. Monitor SQS API usage with CloudWatch metrics. Set budgets and alarms.
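Batched sending can be sketched like this. `send_in_batches` is a hypothetical helper, and `sqs` is assumed to be a Boto3 SQS client supplied by the caller.

```python
from typing import Any, List


def send_in_batches(sqs: Any, queue_url: str, bodies: List[str]) -> int:
    """Send message bodies in groups of up to 10; return the request count."""
    requests = 0
    for start in range(0, len(bodies), 10):
        chunk = bodies[start:start + 10]
        # Entry IDs only need to be unique within a single batch request.
        entries = [
            {"Id": str(index), "MessageBody": body}
            for index, body in enumerate(chunk)
        ]
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        requests += 1
        # A batch call can succeed overall while individual entries fail.
        failed = response.get("Failed", [])
        if failed:
            raise RuntimeError(f"{len(failed)} messages failed to send")
    return requests
```

Sending 25 messages this way takes 3 requests instead of 25. Checking the `Failed` list matters because a successful batch call does not imply that every entry was accepted.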
Pricing is subject to change. Check the official AWS pricing page for the latest figures.
When to Consider Amazon MQ Instead of SQS or SNS
Amazon MQ is a fully managed message broker service that supports industry-standard APIs and protocols, including JMS, AMQP, MQTT, STOMP, and OpenWire. It runs managed Apache ActiveMQ and RabbitMQ brokers. When should you reach for it instead of SQS/SNS?
Amazon MQ is the right choice when:
- You're migrating an existing on-premises application that already uses JMS, AMQP, or MQTT. Rewriting everything to use SQS/SNS would be too risky or time-consuming.
- You need advanced message routing beyond what SNS offers, like topics with wildcards, virtual topics, or message selectors (JMS).
- You need transactional messaging across multiple queues (e.g., send to queue A and queue B atomically).
- You need lower latency for real-time communication. SQS has a polling model that introduces latency; Amazon MQ's push-based delivery can be faster for time-sensitive workloads.
- Your application requires specific features like scheduled messages, delayed delivery, message groups with flexible ordering, or custom dead-letter strategies at the broker level.
Amazon MQ is NOT the right choice when:
- You want fully serverless, no infrastructure management. Amazon MQ requires you to manage broker instances (though it automates patching and failover). SQS and SNS are fully serverless.
- Your throughput needs are extremely high. SQS Standard scales to unlimited. Amazon MQ is limited by the broker instance size (max 1000+ connections per broker, but instance types have limits).
- Your payload size is small (under 256 KB). SQS and SNS handle this natively without needing message chunking.
- You need exactly-once processing. SQS FIFO provides this; Amazon MQ requires idempotent consumers and broker-level deduplication, which is not as straightforward.
Cost comparison: Amazon MQ instances start at around $30/month for a small broker and increase with size. In contrast, SQS and SNS pay-per-use costs are negligible for low volume but grow linearly. Above about 100 million messages per month, Amazon MQ may become cheaper than SQS's API request costs, but only if you use the broker efficiently.
Decision rule: Use SQS/SNS for cloud-native applications where serverless is a priority. Use Amazon MQ for migrations, when you need JMS/AMQP compatibility, or when you require advanced broker-level features. If you're starting from scratch and don't have a legacy protocol requirement, SQS+SNS is almost always the simpler, more cost-effective choice.
Infrastructure as Code — Terraform SNS+SQS Fan-Out Setup
Lambda Throttling + SNS = 10,000 Lost Order Events
ApproximateAgeOfOldestMessage > 5 minutes on the SQS queue
Rule: If you can't lose the message, never subscribe Lambda directly to SNS in production. Always use SQS as the durable buffer.
- SNS to Lambda = at-most-once delivery when throttling hits.
- SNS to SQS = at-least-once delivery + durable storage.
- Add DLQ to every SQS queue. maxReceiveCount=3.
- Monitor DLQ depth. A message in DLQ is a service bug.
- SQS ApproximateAgeOfOldestMessage alarm = early warning system.
delete_message() call. Also check VisibilityTimeout: if processing takes longer than timeout, message reappears. Increase timeout or send heartbeat.
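A heartbeat can be sketched by extending the visibility timeout before each unit of work. This is one possible shape under stated assumptions: the work is split into callable `steps`, `process_with_heartbeat` is a hypothetical helper, and `sqs` is a Boto3 SQS client supplied by the caller.

```python
from typing import Any, Callable, Dict, Iterable


def process_with_heartbeat(
    sqs: Any,
    queue_url: str,
    message: Dict[str, Any],
    steps: Iterable[Callable[[], None]],
    extend_seconds: int = 60,
) -> None:
    """Run each step, resetting the visibility timeout before it starts."""
    receipt = message["ReceiptHandle"]
    for step in steps:
        # Heartbeat: the message stays hidden for extend_seconds from now.
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt,
            VisibilityTimeout=extend_seconds,
        )
        step()
    # Delete only after every step succeeded; an exception above leaves the
    # message in the queue so it is redelivered.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt)
```

Each step should finish well within `extend_seconds`. If a step raises, the message is not deleted and reappears once the last extension expires.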