Senior 6 min · June 25, 2026

Backpressure: Stop Your System From Drowning in Requests

Q: What's the difference between backpressure and rate limiting?

Rate limiting is a proactive policy that restricts the rate of requests from a client, often based on a fixed quota. Backpressure is a reactive signal from the consumer to the producer when the consumer is overloaded. Rate limiting is configured ahead of time; backpressure emerges from system load. Use rate limiting for external APIs and backpressure for internal pipelines.

Q: How do I implement backpressure in a Kafka consumer?

Set max.poll.records to a low value (e.g., 10) to limit how many records are fetched per poll. Process records within the poll loop and commit after processing. Avoid putting records into an unbounded internal queue. If you must use an internal queue, use a bounded BlockingQueue with blocking put() to throttle the poll loop.

Q: Can backpressure cause deadlocks?

Yes, if a producer and consumer share the same thread pool and the queue is bounded. If a task running in the pool submits another task to the same pool and then blocks (on a blocking put() into the full queue, or while waiting for the second task's result), it holds its thread while it waits. If all threads are blocked this way, no tasks can complete, causing deadlock. Use separate thread pools or non-blocking patterns to avoid this.

Backpressure explained with real production patterns.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren


⚡Quick Answer

Backpressure is how a system tells its upstream, 'I'm full, stop sending.' It's implemented via bounded queues, reactive streams (e.g., Java Flow API, ReactiveX), or explicit throttling signals (e.g., HTTP 429, the TCP receive window). Without it, you get OOM errors, connection pool exhaustion, and silent data loss.

✦ Definition · ~90s read

What is Backpressure?

Backpressure is a flow control mechanism where a downstream component signals upstream to slow down or stop sending data when it can't keep up. It prevents buffer overflows, resource exhaustion, and cascading failures in distributed systems.

Plain-English First

Imagine you're a bartender pouring shots as fast as you can. If customers order faster than you can pour, you either spill drinks everywhere (crash) or you tell them to wait. Backpressure is that 'wait' signal. It's the bartender saying, 'Hold up, I'm still on the last round.' In software, it's the downstream service telling upstream, 'I'm at capacity, back off.'

You've seen it happen. A sudden traffic spike hits your service. Latency climbs. Then throughput flatlines. Then the whole thing falls over with an OOM killer message at 3am. The root cause? No backpressure. Your system kept accepting work it couldn't handle, buffers grew unbounded, and the JVM ran out of heap. This isn't a theory problem: it's one of the most common causes of cascading failures in microservices.

Backpressure is the mechanism that prevents this. It's not optional in any system that processes async data streams — message queues, event pipelines, HTTP servers, database connection pools. Without it, you're gambling that your peak load never exceeds your capacity. That's a bet you'll lose.

By the end of this, you'll know how to implement backpressure in your async pipelines, what patterns actually work in production, and — more importantly — which ones will burn you. You'll be able to diagnose backpressure failures from logs and metrics, and you'll have a mental model for designing systems that degrade gracefully instead of catastrophically.

Why Backpressure Exists: The Problem of Unbounded Buffers

Every async system has a buffer somewhere — a queue, a channel, a buffer in memory. Buffers smooth out load spikes. But they also hide the fact that downstream can't keep up. When the buffer is unbounded, it grows until it eats all memory. Then the process OOMs. Then the load shifts to the next service, which OOMs too. That's a cascading failure.

Without backpressure, you have two failure modes: either you drop data silently (if you have a bounded buffer with no backpressure signal) or you crash (if the buffer is unbounded). Both are bad. Backpressure gives you a third option: slow down the producer so the system stays stable.

In production, the most common symptom of missing backpressure is the 'hockey stick' latency graph. Latency stays flat until some threshold, then shoots to infinity. That's the buffer filling up. The fix isn't more memory — it's backpressure.

UnboundedQueueExample.java

// io.thecodeforge — System Design tutorial

import java.util.concurrent.*;

public class UnboundedQueueExample {
    // BAD: unbounded queue — will OOM under load
    private static final ExecutorService executor = Executors.newFixedThreadPool(10);
    
    public static void main(String[] args) {
        // Simulate fast producer, slow consumer
        for (int i = 0; ; i++) {
            final int taskId = i;
            executor.submit(() -> {
                try {
                    Thread.sleep(1000); // slow consumer
                } catch (InterruptedException e) {}
                System.out.println("Processed " + taskId);
            });
        }
    }
}
// Output: eventually java.lang.OutOfMemoryError: Java heap space

Output

Eventually: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Production Trap: Unbounded Queues

Executors.newFixedThreadPool() uses an unbounded LinkedBlockingQueue. Under load, it will OOM. Always use a bounded queue with a rejection policy. The error message is 'Java heap space' — not 'queue full'.

[Figure: Backpressure Flow: From Unbounded Load to System Stability]

Bounded Queues: The First Line of Defense

The simplest backpressure mechanism is a bounded queue. You set a maximum capacity. When the queue is full, the producer must either block, drop, or throw. This forces the producer to slow down.

In Java, ArrayBlockingQueue is your friend. It's a fixed-size array-based queue. When full, the put() method blocks until space is available. That blocking is the backpressure signal — it propagates upstream, eventually slowing the source.

But blocking has trade-offs. If the producer is a network thread, blocking it can starve other connections. That's why you need to think about where the backpressure propagates. In a web server, blocking the request thread is fine because the client will wait. In a Kafka consumer, blocking the poll loop for longer than max.poll.interval.ms gets the consumer removed from the group and triggers a rebalance. Know your context.
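The three choices a producer has when the queue is full (block, drop, or throw) map onto three methods of BlockingQueue. The sketch below is illustrative only; the class and helper names (FullQueueOptions, tryDrop, tryThrow) are made up for this example and are not part of the tutorial's code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class FullQueueOptions {
    /** Drop: offer() returns false immediately when the queue is full. */
    static boolean tryDrop(BlockingQueue<String> queue, String item) {
        return queue.offer(item);
    }

    /** Throw: add() raises IllegalStateException when the queue is full. */
    static boolean tryThrow(BlockingQueue<String> queue, String item) {
        try {
            return queue.add(item);
        } catch (IllegalStateException full) {
            System.out.println("add() threw: " + full.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.put("first"); // capacity is 1, so the queue is now full

        System.out.println("offer accepted: " + tryDrop(queue, "second"));
        System.out.println("add accepted: " + tryThrow(queue, "third"));

        // Block: put() on a full queue waits until a consumer frees a slot.
        Thread consumer = new Thread(() -> {
            try {
                Thread.sleep(100); // slow consumer
                queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        long start = System.nanoTime();
        queue.put("fourth"); // waits here until the consumer calls take()
        long waitedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("put() waited about " + waitedMillis + " ms");
        consumer.join();
    }
}
```

Only put() slows the producer down. offer() and add() return control at once, so the caller has to decide what to do with the item it could not enqueue.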

BoundedQueueExample.java

// io.thecodeforge — System Design tutorial

import java.util.concurrent.*;

public class BoundedQueueExample {
    // GOOD: bounded queue with blocking backpressure
    private static final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(100);
    private static final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            10, 10, 0L, TimeUnit.MILLISECONDS, queue);
    
    public static void main(String[] args) throws InterruptedException {
        // Worker threads are normally created by execute()/submit(). This example
        // feeds the work queue directly, so the core threads must be started up
        // front or nothing would ever drain the queue.
        executor.prestartAllCoreThreads();
        // Producer blocks when queue is full
        for (int i = 0; ; i++) {
            final int taskId = i;
            // put() blocks until space available
            queue.put(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {}
                System.out.println("Processed " + taskId);
            });
        }
    }
}
// Output: tasks processed at rate of consumer, producer blocks

Output

Processed 0

Processed 1

... (steady rate, no OOM)

Senior Shortcut: ThreadPoolExecutor with Bounded Queue

Use ThreadPoolExecutor constructor directly instead of Executors factory. You control queue type and rejection policy. The CallerRunsPolicy is often the best — it runs the task on the producer thread, naturally throttling the producer.
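A minimal sketch of that shortcut, assuming a deliberately tiny pool (2 threads, 2 queue slots) so the overflow is easy to see. CallerRunsDemo and run are hypothetical names used only for this illustration.

```java
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallerRunsDemo {
    /** Submits taskCount slow tasks and returns the names of the threads that ran them. */
    static Set<String> run(int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Set<String> threads = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < taskCount; i++) {
            // When both workers are busy and the queue is full, execute() runs the
            // task right here on the submitting thread, which pauses this loop.
            pool.execute(() -> {
                threads.add(Thread.currentThread().getName());
                try {
                    Thread.sleep(50); // slow consumer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threads;
    }

    public static void main(String[] args) {
        // The returned set includes the calling thread alongside the pool workers.
        System.out.println("Tasks ran on: " + run(20));
    }
}
```

The calling thread showing up in the result is the throttle at work: while it runs an overflow task it cannot submit the next one.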

[Figure: Bounded Queue Backpressure Flow]

Reactive Streams: The Demand-Driven Approach

Bounded queues push back only once they are full. Reactive Streams (the specification behind the Java Flow API, also implemented by libraries such as Project Reactor and RxJava) invert this: the consumer tells the producer up front how much it can handle. This is called 'demand signaling'.

In Reactive Streams, the Subscriber calls request(n) on the Subscription to indicate it's ready for n items. The Publisher must not send more than requested. This is backpressure built into the protocol.

This pattern shines in data pipelines where you have multiple stages. Each stage requests only what it can process. The backpressure propagates all the way to the source. No buffers overflow because no stage sends more than the next stage can consume.

The downside? Complexity. Reactive code is harder to debug. Stack traces are useless. And if any stage forgets to request(), the pipeline stalls silently. I've seen production incidents where a misconfigured buffer caused a 30-minute data delay because a downstream subscriber requested too few items.

ReactiveBackpressureExample.java

// io.thecodeforge — System Design tutorial

import java.util.concurrent.Flow.*;
import java.util.concurrent.SubmissionPublisher;

public class ReactiveBackpressureExample {
    public static void main(String[] args) throws InterruptedException {
        SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>();
        
        Subscriber<Integer> subscriber = new Subscriber<>() {
            private Subscription subscription;
            private static final int REQUEST_SIZE = 5;
            
            @Override
            public void onSubscribe(Subscription subscription) {
                this.subscription = subscription;
                subscription.request(REQUEST_SIZE); // demand signal
            }
            
            @Override
            public void onNext(Integer item) {
                System.out.println("Processing " + item);
                try { Thread.sleep(200); } catch (InterruptedException e) {}
                subscription.request(1); // request one more after processing
            }
            
            @Override
            public void onError(Throwable t) { t.printStackTrace(); }
            
            @Override
            public void onComplete() { System.out.println("Done"); }
        };
        
        publisher.subscribe(subscriber);
        for (int i = 0; i < 20; i++) {
            publisher.submit(i);
        }
        publisher.close();
        Thread.sleep(5000);
    }
}
// Output: processes at consumer rate; outstanding demand never exceeds REQUEST_SIZE
// (undelivered items wait in SubmissionPublisher's own bounded buffer)

Output

Processing 0

Processing 1

... (steady rate, no overflow)

The Classic Bug: Forgetting to request()
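A small reproduction of the stall, sketched with the JDK's SubmissionPublisher. The subscriber asks for an initial batch and then never calls request() again, so delivery stops without any error. ForgottenRequestDemo and deliveredCount are hypothetical names for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ForgottenRequestDemo {
    /** Publishes total items to a subscriber that only ever requests initialDemand. */
    static int deliveredCount(int total, int initialDemand) {
        AtomicInteger delivered = new AtomicInteger();
        CountDownLatch initialBatch = new CountDownLatch(initialDemand);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                @Override
                public void onSubscribe(Flow.Subscription subscription) {
                    subscription.request(initialDemand); // initial demand only
                }

                @Override
                public void onNext(Integer item) {
                    delivered.incrementAndGet();
                    initialBatch.countDown();
                    // BUG: no subscription.request(1) here, so demand is never renewed
                }

                @Override
                public void onError(Throwable t) { t.printStackTrace(); }

                @Override
                public void onComplete() { System.out.println("Done"); }
            });
            for (int i = 0; i < total; i++) {
                publisher.submit(i); // held in the publisher's buffer until requested
            }
            initialBatch.await(2, TimeUnit.SECONDS);
            Thread.sleep(300); // leave time for any further deliveries to show up
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return delivered.get();
    }

    public static void main(String[] args) {
        System.out.println("Delivered " + deliveredCount(10, 2) + " of 10 items");
    }
}
```

Nothing is logged and nothing throws. The only symptom is a count that stops growing, which is why this failure is usually found through metrics and not through logs.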

[Figure: Reactive Streams vs Bounded Queues]

Backpressure in HTTP: 429 Too Many Requests

In HTTP services, backpressure is often implemented as rate limiting with a 429 (Too Many Requests) status code. The server tells the client to slow down, and can say for how long with a Retry-After header. This is explicit backpressure at the application layer.

But 429 is a blunt instrument. It works well for external APIs. For internal microservices, you want something more nuanced — like circuit breakers or bulkheads. 429 can cause clients to retry aggressively, making things worse. Always implement exponential backoff with jitter on the client side.
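One common way to compute the client-side delay is "full jitter": pick a random wait between zero and an exponentially growing ceiling. The sketch below assumes that variant. The class name, the 100 ms base, and the 10 s cap are illustrative values, not recommendations from this article.

```java
import java.util.concurrent.ThreadLocalRandom;

class Backoff {
    /**
     * Full-jitter delay: a random value in [0, min(capMillis, baseMillis * 2^attempt)].
     * Assumes baseMillis and capMillis are small positive values (milliseconds to minutes).
     */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.min(Math.max(attempt, 0), 30); // clamp so the shift cannot overflow
        long ceiling = Math.min(capMillis, baseMillis << shift);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            long delay = delayMillis(attempt, 100, 10_000);
            System.out.println("attempt " + attempt + ": wait " + delay + " ms before retrying");
        }
    }
}
```

The randomness matters as much as the exponent. Without it, every client that was rejected at the same moment retries at the same moment too.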

A better pattern for internal services is to use a bounded queue with a rejection policy that returns 503 (Service Unavailable) when the queue is full. This signals the caller to back off, and the load balancer can route to another instance. Combine with circuit breakers to prevent cascading.

HttpBackpressureExample.java

// io.thecodeforge — System Design tutorial

import com.sun.net.httpserver.*;
import java.io.*;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.*;

public class HttpBackpressureExample {
    // Bounded worker pool: 10 threads plus at most 100 queued requests
    private static final ExecutorService workers = new ThreadPoolExecutor(
            10, 10, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(100),
            new ThreadPoolExecutor.AbortPolicy()); // throws RejectedExecutionException
    
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // No server.setExecutor(workers): the handler runs on the server's own
        // thread and only hands work off, so the pool never submits to itself.
        server.createContext("/process", exchange -> {
            try {
                workers.execute(() -> {
                    try {
                        Thread.sleep(1000); // simulate work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    respond(exchange, 200, "Processed");
                });
            } catch (RejectedExecutionException e) {
                respond(exchange, 503, "Too many requests"); // queue is full
            }
        });
        server.start();
        System.out.println("Server started on port 8080");
    }
    
    // Writes the body and always closes the exchange so the client gets a complete response
    private static void respond(HttpExchange exchange, int status, String body) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        try {
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        } catch (IOException e) {
            e.printStackTrace(); // client went away or the write failed
        } finally {
            exchange.close();
        }
    }
}
// Output: returns 503 when queue full

Output

HTTP/1.1 503 Service Unavailable

Content-Length: 17

Too many requests

Interview Gold: 429 vs 503 for Backpressure

Backpressure in Message Queues: Kafka, RabbitMQ, SQS

Message brokers handle backpressure differently. Kafka consumers control their own pace — they poll at their own rate. The broker doesn't push. So backpressure is implicit: if you poll slowly, you consume slowly. The problem is that the consumer's internal processing pipeline might still overflow if it buffers messages internally.

RabbitMQ uses consumer prefetch. Set a prefetch count to limit how many unacknowledged messages a consumer can have. This is explicit backpressure. If your consumer processes slowly, RabbitMQ stops sending more. But if your consumer crashes, messages can be redelivered, causing duplicates.

SQS has no built-in backpressure. You must implement it yourself. The consumer polls messages, processes them, and deletes them. If processing is slow, you can reduce the polling frequency or use a circuit breaker. But SQS will keep delivering messages as long as you poll. This is a common source of unbounded growth in serverless architectures.

KafkaConsumerBackpressure.java

// io.thecodeforge — System Design tutorial

import org.apache.kafka.clients.consumer.*;
import java.time.Duration;
import java.util.*;

public class KafkaConsumerBackpressure {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("enable.auto.commit", "false");
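        props.put("max.poll.records", "10"); // backpressure: limit per poll
        // Deserializers are required; without them the KafkaConsumer constructor fails
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");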
        
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));
        
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                process(record);
            }
            consumer.commitSync(); // commit after processing
        }
    }
    
    private static void process(ConsumerRecord<String, String> record) {
        // simulate processing
        try { Thread.sleep(100); } catch (InterruptedException e) {}
        System.out.println("Processed: " + record.value());
    }
}
// Output: processes at consumer pace, never more than 10 records in flight

Output

Processed: message-0

Processed: message-1

... (steady rate)

Never Do This: Unbounded Internal Buffer in Kafka Consumer
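If records do have to be handed to another thread, a bounded queue with a blocking put() keeps the hand-off from growing without limit. The sketch below uses plain JDK classes to stand in for the poll loop, so it runs without a broker. BoundedHandoffDemo, the record strings, and the stop marker are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedHandoffDemo {
    private static final String STOP = "__stop__"; // hypothetical end-of-stream marker

    /** Returns {records processed, deepest queue size seen by the producer}. */
    static int[] run(int total, int capacity) {
        BlockingQueue<String> handoff = new ArrayBlockingQueue<>(capacity);
        int[] processed = {0};
        Thread worker = new Thread(() -> {
            try {
                for (String r = handoff.take(); !r.equals(STOP); r = handoff.take()) {
                    Thread.sleep(2); // slow downstream work
                    processed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        int maxDepth = 0;
        try {
            for (int i = 0; i < total; i++) {  // stands in for the poll loop
                handoff.put("record-" + i);     // blocks while the worker is behind
                maxDepth = Math.max(maxDepth, handoff.size());
            }
            handoff.put(STOP);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {processed[0], maxDepth};
    }

    public static void main(String[] args) {
        int[] result = run(50, 5);
        System.out.println("processed=" + result[0] + ", max queue depth=" + result[1]);
    }
}
```

With a real KafkaConsumer, a put() that blocks for longer than max.poll.interval.ms triggers a rebalance, so the queue capacity and that timeout need to be sized together. The consumer's pause() and resume() methods are the usual alternative when the loop must keep polling while the worker catches up.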

When Backpressure Breaks: Anti-Patterns and Gotchas

Backpressure isn't a silver bullet. Here are the ways it fails in production.

Deadlock with blocking queues. If your producer and consumer share the same thread pool, blocking can cause deadlock. Example: a task running on a pool thread submits a second task to the same pool and then blocks, either on put() into the full queue or on the second task's result. The first task holds its thread while it waits, and once every thread is held this way the queued tasks never run. Fix: use separate thread pools or non-blocking patterns.

Starvation with priority inversion. If a low-priority task holds a resource that a high-priority task needs, and the low-priority task is blocked by backpressure, the high-priority task starves. This is rare but nasty.

Backpressure amplification. If every service in a chain applies backpressure independently, the system can become overly conservative. A transient slowdown at the tail can cause the head to stall completely. Use circuit breakers with timeouts to break the chain.

Monitoring blind spots. Backpressure hides problems. If the queue is always full, you might think the system is healthy because it's not crashing. But latency is high. Monitor queue depth and processing latency, not just throughput.

DeadlockExample.java

// io.thecodeforge — System Design tutorial

import java.util.concurrent.*;

public class DeadlockExample {
    private static final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1);
    private static final ExecutorService executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, queue);
    
    public static void main(String[] args) throws InterruptedException {
        executor.submit(() -> {
            System.out.println("Task 1 started");
            try {
                // Task 2 is accepted into the queue, but the pool's only thread is
                // busy running Task 1, which now waits for Task 2's result.
                executor.submit(() -> System.out.println("Task 2")).get();
            } catch (Exception e) {
                System.out.println("Unexpected: " + e.getMessage());
            }
        });
        // No shutdown() here: calling it immediately would race with Task 1 and
        // reject Task 2 instead of demonstrating the hang.
    }
}
// Output: deadlock, Task 1 waits forever for a task that never gets a thread

Output

Task 1 started

(program hangs)

Never Do This: Submitting to Same Executor Inside Task
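The usual way out is to give nested work its own pool, so the outer task never waits on a thread it is occupying itself. A minimal sketch, assuming two small bounded pools. SeparatePoolsDemo and its helper are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SeparatePoolsDemo {
    private static ExecutorService boundedPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));
    }

    static String run() {
        ExecutorService outer = boundedPool(1, 1);
        ExecutorService inner = boundedPool(1, 1); // nested work gets its own threads
        try {
            Future<String> result = outer.submit(() -> {
                Future<String> nested = inner.submit(() -> "Task 2 done");
                // Always wait with a timeout so a stuck dependency cannot hold this thread forever
                return "Task 1 saw: " + nested.get(1, TimeUnit.SECONDS);
            });
            return result.get(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException | TimeoutException e) {
            return "failed: " + e;
        } finally {
            outer.shutdownNow();
            inner.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // Task 1 saw: Task 2 done
    }
}
```

The timeout on get() is a second line of defense. Even with separate pools, an unbounded wait on another task is how a slow dependency turns into a pool of stuck threads.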

Backpressure in Distributed Systems: Circuit Breakers and Bulkheads

In distributed systems, backpressure must be combined with circuit breakers and bulkheads. A circuit breaker monitors failure rates and opens when too many requests fail, preventing calls to a downstream that's already struggling. This is a form of backpressure — it stops the flow of requests to a failing service.

Bulkheads isolate resources. If one service's thread pool is exhausted, it doesn't affect others. This limits the blast radius of backpressure. For example, separate thread pools for different downstream services.
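Separate thread pools are one way to build a bulkhead. A lighter variant, sketched here as an alternative and not as this article's method, caps concurrent calls with a Semaphore and fails fast when the compartment is full. The Bulkhead class and the "payments" name are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit is free; otherwise returns the fallback immediately. */
    <T> T call(Supplier<T> operation, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // compartment is full: fail fast, do not queue
        }
        try {
            return operation.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        Bulkhead payments = new Bulkhead(1); // one concurrent call allowed
        // The inner call is attempted while the outer call still holds the only permit.
        String result = payments.call(
                () -> "outer ok, inner " + payments.call(() -> "ok", "rejected"),
                "rejected");
        System.out.println(result); // outer ok, inner rejected
    }
}
```

Using one Bulkhead instance per downstream keeps a slow dependency from consuming more than its share of callers, without dedicating threads to it.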

Together, these patterns create a system that degrades gracefully. When a downstream service slows down, the circuit breaker opens, requests are rejected fast (fail-fast), and the upstream doesn't accumulate work. This is better than letting backpressure propagate and stall everything.

The trade-off is complexity. You need to tune timeouts, thresholds, and pool sizes. Get it wrong and you'll have false positives (circuit breaker opens when it shouldn't) or false negatives (doesn't open when it should). Monitor and adjust.

CircuitBreakerExample.java

// io.thecodeforge — System Design tutorial

import java.util.concurrent.atomic.*;

public class CircuitBreakerExample {
    enum State { CLOSED, OPEN, HALF_OPEN }
    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    private final AtomicInteger failureCount = new AtomicInteger(0);
    private final int threshold = 5;
    private final long timeoutMillis = 10000;
    private volatile long lastFailureTime;
    
    public boolean call(Runnable operation) {
        if (state.get() == State.OPEN) {
            if (System.currentTimeMillis() - lastFailureTime > timeoutMillis) {
                state.compareAndSet(State.OPEN, State.HALF_OPEN);
            } else {
                return false; // fast fail
            }
        }
        try {
            operation.run();
            // Any success clears the failure streak, so only consecutive
            // failures can trip the breaker.
            failureCount.set(0);
            state.compareAndSet(State.HALF_OPEN, State.CLOSED);
            return true;
        } catch (Exception e) {
            failureCount.incrementAndGet();
            lastFailureTime = System.currentTimeMillis();
            if (failureCount.get() >= threshold) {
                state.set(State.OPEN);
            }
            return false;
        }
    }
}
// Usage: circuitBreaker.call(() -> downstreamService.process(data));

Output

Returns false when circuit is open, preventing calls to downstream

Senior Shortcut: Use Resilience4j, Don't Roll Your Own

Monitoring Backpressure: What to Watch For

You can't fix what you don't measure. Here are the key metrics for backpressure:

Queue depth. How many items are waiting? If it's consistently near capacity, you're at risk. Alert on queue depth > 80% of capacity.

Processing latency. The time from item arrival to processing start. If this grows, backpressure is building.

Rejection rate. How many requests are rejected due to full queues? A non-zero rate is okay — it means backpressure is working. But if it's high, you need more capacity.

Thread pool utilization. Are all threads busy? If yes, and queue is growing, you need more threads or better backpressure.

Circuit breaker state. Monitor how often circuits open and close. Frequent toggling indicates instability.

In production, I've seen teams ignore queue depth until it hits the limit and starts rejecting. By then, latency is already terrible. Set proactive alerts.

MonitoringExample.java

// io.thecodeforge — System Design tutorial

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class MonitoringExample {
    // ThreadPoolExecutor keeps no rejection counter, so count in the handler
    private static final AtomicLong rejected = new AtomicLong();
    private static final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            10, 10, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(100),
            (task, pool) -> rejected.incrementAndGet());
    
    public static void main(String[] args) {
        // Simulate load: 10 tasks start running, 100 queue up, the rest are rejected
        for (int i = 0; i < 200; i++) {
            executor.execute(() -> {
                try { Thread.sleep(1000); } catch (InterruptedException e) {}
            });
        }
        
        // Monitor (getActiveCount() is documented as approximate)
        System.out.println("Queue size: " + executor.getQueue().size());
        System.out.println("Active threads: " + executor.getActiveCount());
        System.out.println("Completed tasks: " + executor.getCompletedTaskCount());
        System.out.println("Rejected tasks: " + rejected.get());
        
        executor.shutdown();
    }
}
// Output: shows queue depth, active threads, completed and rejected counts

Output

Queue size: 100

Active threads: 10

Completed tasks: 0

Rejected tasks: 90

Production Trap: Ignoring Queue Depth
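The 80% alert suggested above can be computed from any BlockingQueue without extra bookkeeping. This is a minimal sketch: QueueDepthCheck and its method names are hypothetical, and wiring the result into an actual alerting system is left out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueDepthCheck {
    /** Fraction of the queue's capacity currently in use, from 0.0 to 1.0. */
    static double utilization(BlockingQueue<?> queue) {
        long used = queue.size();
        long capacity = used + queue.remainingCapacity();
        // size() and remainingCapacity() are read separately, so under concurrent
        // use this is a close estimate, not an exact snapshot.
        return capacity == 0 ? 1.0 : (double) used / capacity;
    }

    static boolean shouldAlert(BlockingQueue<?> queue, double threshold) {
        return utilization(queue) > threshold;
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(100);
        for (int i = 0; i < 85; i++) {
            queue.offer(() -> { });
        }
        System.out.println("utilization: " + utilization(queue));  // 0.85
        System.out.println("alert: " + shouldAlert(queue, 0.80));  // true
    }
}
```

Sampling this on a timer and exporting it as a gauge gives a warning before the first rejection, which is the point of alerting on depth and not on rejections.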

● Production incident · Post-mortem · Severity: high

The 4GB Container That Kept Dying

Symptom

A payment processing service would run fine for hours, then suddenly OOM. Heap dumps showed a LinkedBlockingQueue with millions of pending transaction objects. CPU was low, memory was gone.

Assumption

Team assumed a memory leak in the transaction object. Spent weeks profiling object allocations.

Root cause

The upstream Kafka consumer had no backpressure. It polled messages as fast as possible and offered them to an unbounded ExecutorService queue. When the downstream payment gateway slowed down (latency spikes), the queue grew unbounded until heap exhausted. The thread pool's work queue was a LinkedBlockingQueue with Integer.MAX_VALUE capacity.

Fix

Switched to a bounded queue with ArrayBlockingQueue(1000). Set RejectedExecutionHandler to CallerRunsPolicy, which throttles the Kafka consumer by running overflow tasks on the poll thread, so it cannot poll again until that task finishes. Also added a circuit breaker on the payment gateway client.

Key lesson

Unbounded queues are a ticking time bomb.
Always bound your queues and decide what happens when they're full — blocking, dropping, or rejecting.

Production debug guide

Systematic recovery paths for the failure modes engineers actually hit. 3 entries

Symptom · 01

java.lang.OutOfMemoryError: Java heap space in async processor

→

Fix

1. Check thread pool queue type — is it bounded? 2. Dump heap and look for large collections (LinkedBlockingQueue, ArrayList). 3. Set a hard limit on queue size. 4. Add rejection policy (CallerRunsPolicy or AbortPolicy). 5. Restart with new config.

Symptom · 02

Latency spikes then throughput drops to zero

→

Fix

1. Check thread pool queue depth via JMX or /actuator/metrics. 2. Check if threads are blocked (jstack). 3. Look for deadlock between producer and consumer threads. 4. Increase queue capacity or add more threads temporarily. 5. Implement backpressure with blocking put.

Symptom · 03

Circuit breaker toggling open/closed frequently

→

Fix

1. Check downstream latency and error rate. 2. Increase circuit breaker threshold or timeout. 3. Add bulkhead (separate thread pool) for that downstream. 4. If downstream is healthy, check for false positives due to transient spikes. 5. Tune sliding window size.

★ Backpressure Triage Cheat Sheet

First-response commands for when things go wrong, copy-paste ready.

OOM: `java.lang.OutOfMemoryError: Java heap space`

Immediate action

Check if thread pool queue is unbounded

Commands

jcmd <pid> GC.class_histogram

jstack <pid> | grep -A 20 'pool-'

Fix now

Set queue capacity: new ArrayBlockingQueue<>(1000). Set rejection policy: new ThreadPoolExecutor.CallerRunsPolicy()


Feature / Aspect	Bounded Queue (Blocking)	Reactive Streams	HTTP 429/503
Backpressure mechanism	Block producer when full	Demand signal (request(n))	Status code + Retry-After
Complexity	Low	High	Medium
Latency impact	Increases with queue depth	Controlled by demand	Immediate rejection
Best for	In-process async, thread pools	Data pipelines, streaming	HTTP APIs, external clients
Failure mode	Deadlock if not careful	Stall if `request()` forgotten	Client retry storms if no backoff

Key takeaways

Backpressure is not optional in async systems: unbounded buffers are a ticking time bomb that will OOM under load.

Always bound your queues and decide what happens when they're full: block, drop, or reject. Blocking is the simplest backpressure signal.

Reactive streams give fine-grained demand control but add complexity and debugging difficulty. Use them only when you need non-blocking pipelines.

Combine backpressure with circuit breakers and bulkheads to prevent cascading failures in distributed systems.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How does backpressure prevent cascading failures in a microservices arch...

Q02SENIOR

When would you choose a bounded blocking queue over reactive streams for...

Q03SENIOR

What happens when a Kafka consumer's internal processing queue is unboun...

Q04JUNIOR

What is backpressure and why is it important in async systems?

Q05SENIOR

You notice a service's latency is spiking and throughput drops to zero. ...

Q06SENIOR

Design a system that processes a high-volume event stream with backpress...

Q01 of 06SENIOR

How does backpressure prevent cascading failures in a microservices architecture?

ANSWER

Backpressure limits the amount of work in flight. When a downstream service slows down, backpressure propagates upstream, causing the upstream to also slow down or reject requests. This prevents buffers from growing unbounded and avoids OOM. Combined with circuit breakers, it isolates failures so they don't cascade.

FAQ · 4 QUESTIONS

Frequently Asked Questions

What is backpressure in system design?

What's the difference between backpressure and rate limiting?

How do I implement backpressure in a Kafka consumer?

Can backpressure cause deadlocks?
