Backpressure: Stop Your System From Drowning in Requests
Backpressure explained with real production patterns.
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
Backpressure is how a system tells its upstream, 'I'm full, stop sending.' It's implemented via bounded queues, reactive streams (e.g., Java Flow API, ReactiveX), or explicit throttling signals (e.g., HTTP 429, TCP window scaling). Without it, you get OOM errors, connection pool exhaustion, and silent data loss.
Imagine you're a bartender pouring shots as fast as you can. If customers order faster than you can pour, you either spill drinks everywhere (crash) or you tell them to wait. Backpressure is that 'wait' signal. It's the bartender saying, 'Hold up, I'm still on the last round.' In software, it's the downstream service telling upstream, 'I'm at capacity, back off.'
You've seen it happen. A sudden traffic spike hits your service. Latency climbs. Then throughput flatlines. Then the whole thing falls over with an OOM killer message at 3am. The root cause? No backpressure. Your system kept accepting work it couldn't handle, buffers grew unbounded, and the JVM choked. This isn't a theory problem — it's the #1 cause of cascading failures in microservices.
Backpressure is the mechanism that prevents this. It's not optional in any system that processes async data streams — message queues, event pipelines, HTTP servers, database connection pools. Without it, you're gambling that your peak load never exceeds your capacity. That's a bet you'll lose.
By the end of this, you'll know how to implement backpressure in your async pipelines, what patterns actually work in production, and — more importantly — which ones will burn you. You'll be able to diagnose backpressure failures from logs and metrics, and you'll have a mental model for designing systems that degrade gracefully instead of catastrophically.
Why Backpressure Exists: The Problem of Unbounded Buffers
Every async system has a buffer somewhere — a queue, a channel, a buffer in memory. Buffers smooth out load spikes. But they also hide the fact that downstream can't keep up. When the buffer is unbounded, it grows until it eats all memory. Then the process OOMs. Then the load shifts to the next service, which OOMs too. That's a cascading failure.
Without backpressure, you have two failure modes: either you drop data silently (if you have a bounded buffer with no backpressure signal) or you crash (if the buffer is unbounded). Both are bad. Backpressure gives you a third option: slow down the producer so the system stays stable.
In production, the most common symptom of missing backpressure is the 'hockey stick' latency graph. Latency stays flat until some threshold, then shoots to infinity. That's the buffer filling up. The fix isn't more memory — it's backpressure.
Executors.newFixedThreadPool() uses an unbounded LinkedBlockingQueue. Under load, it will OOM. Always use a bounded queue with a rejection policy. The error message is 'Java heap space' — not 'queue full'.Bounded Queues: The First Line of Defense
The simplest backpressure mechanism is a bounded queue. You set a maximum capacity. When the queue is full, the producer must either block, drop, or throw. This forces the producer to slow down.
In Java, ArrayBlockingQueue is your friend. It's a fixed-size array-based queue. When full, the put() method blocks until space is available. That blocking is the backpressure signal — it propagates upstream, eventually slowing the source.
But blocking has trade-offs. If the producer is a network thread, blocking it can starve other connections. That's why you need to think about where the backpressure propagates. In a web server, blocking the request thread is fine — the client will wait. In a Kafka consumer, blocking the poll loop will cause rebalances. Know your context.
Reactive Streams: The Demand-Driven Approach
Bounded queues are imperative — they push back when full. Reactive Streams (ReactiveX, Java Flow API) are declarative: the consumer tells the producer how much it can handle. This is called 'demand signaling'.
In Reactive Streams, the Subscriber calls request(n) on the Subscription to indicate it's ready for n items. The Publisher must not send more than requested. This is backpressure built into the protocol.
This pattern shines in data pipelines where you have multiple stages. Each stage requests only what it can process. The backpressure propagates all the way to the source. No buffers overflow because no stage sends more than the next stage can consume.
The downside? Complexity. Reactive code is harder to debug. Stack traces are useless. And if any stage forgets to request(), the pipeline stalls silently. I've seen production incidents where a misconfigured buffer caused a 30-minute data delay because a downstream subscriber requested too few items.
Backpressure in HTTP: 429 Too Many Requests
In HTTP services, backpressure is often implemented as rate limiting with 429 status code. The server tells the client 'slow down' by returning a Retry-After header. This is explicit backpressure at the application layer.
But 429 is a blunt instrument. It works well for external APIs. For internal microservices, you want something more nuanced — like circuit breakers or bulkheads. 429 can cause clients to retry aggressively, making things worse. Always implement exponential backoff with jitter on the client side.
A better pattern for internal services is to use a bounded queue with a rejection policy that returns 503 (Service Unavailable) when the queue is full. This signals the caller to back off, and the load balancer can route to another instance. Combine with circuit breakers to prevent cascading.
Backpressure in Message Queues: Kafka, RabbitMQ, SQS
Message brokers handle backpressure differently. Kafka consumers control their own pace — they poll at their own rate. The broker doesn't push. So backpressure is implicit: if you poll slowly, you consume slowly. The problem is that the consumer's internal processing pipeline might still overflow if it buffers messages internally.
RabbitMQ uses consumer prefetch. Set a prefetch count to limit how many unacknowledged messages a consumer can have. This is explicit backpressure. If your consumer processes slowly, RabbitMQ stops sending more. But if your consumer crashes, messages can be redelivered, causing duplicates.
SQS has no built-in backpressure. You must implement it yourself. The consumer polls messages, processes them, and deletes them. If processing is slow, you can reduce the polling frequency or use a circuit breaker. But SQS will keep delivering messages as long as you poll. This is a common source of unbounded growth in serverless architectures.
When Backpressure Breaks: Anti-Patterns and Gotchas
Backpressure isn't a silver bullet. Here are the ways it fails in production.
Deadlock with blocking queues. If your producer and consumer share the same thread pool, blocking can cause deadlock. Example: a web server thread submits a task to a bounded queue, and the task tries to submit another task to the same queue. If the queue is full, the first task blocks, consuming a thread, and the second task never runs. Fix: use separate thread pools or non-blocking patterns.
Starvation with priority inversion. If a low-priority task holds a resource that a high-priority task needs, and the low-priority task is blocked by backpressure, the high-priority task starves. This is rare but nasty.
Backpressure amplification. If every service in a chain applies backpressure independently, the system can become overly conservative. A transient slowdown at the tail can cause the head to stall completely. Use circuit breakers with timeouts to break the chain.
Monitoring blind spots. Backpressure hides problems. If the queue is always full, you might think the system is healthy because it's not crashing. But latency is high. Monitor queue depth and processing latency, not just throughput.
Backpressure in Distributed Systems: Circuit Breakers and Bulkheads
In distributed systems, backpressure must be combined with circuit breakers and bulkheads. A circuit breaker monitors failure rates and opens when too many requests fail, preventing calls to a downstream that's already struggling. This is a form of backpressure — it stops the flow of requests to a failing service.
Bulkheads isolate resources. If one service's thread pool is exhausted, it doesn't affect others. This limits the blast radius of backpressure. For example, separate thread pools for different downstream services.
Together, these patterns create a system that degrades gracefully. When a downstream service slows down, the circuit breaker opens, requests are rejected fast (fail-fast), and the upstream doesn't accumulate work. This is better than letting backpressure propagate and stall everything.
The trade-off is complexity. You need to tune timeouts, thresholds, and pool sizes. Get it wrong and you'll have false positives (circuit breaker opens when it shouldn't) or false negatives (doesn't open when it should). Monitor and adjust.
Monitoring Backpressure: What to Watch For
You can't fix what you don't measure. Here are the key metrics for backpressure:
Queue depth. How many items are waiting? If it's consistently near capacity, you're at risk. Alert on queue depth > 80% of capacity.
Processing latency. The time from item arrival to processing start. If this grows, backpressure is building.
Rejection rate. How many requests are rejected due to full queues? A non-zero rate is okay — it means backpressure is working. But if it's high, you need more capacity.
Thread pool utilization. Are all threads busy? If yes, and queue is growing, you need more threads or better backpressure.
Circuit breaker state. Monitor how often circuits open and close. Frequent toggling indicates instability.
In production, I've seen teams ignore queue depth until it hits the limit and starts rejecting. By then, latency is already terrible. Set proactive alerts.
The 4GB Container That Kept Dying
- Unbounded queues are a ticking time bomb.
- Always bound your queues and decide what happens when they're full — blocking, dropping, or rejecting.
jcmd <pid> VM.native_memory summaryjstack <pid> | grep -A 20 'pool-'ThreadPoolExecutor.CallerRunsPolicy()Key takeaways
Interview Questions on This Topic
How does backpressure prevent cascading failures in a microservices architecture?
Frequently Asked Questions
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
That's Async & Data Processing. Mark it forged?
6 min read · try the examples if you haven't