Mid 9 min · May 23, 2026

Spring Boot At 10x Load: The Patterns That Survive Production

Stop guessing at Spring Boot performance.

Naren · Founder
Quick Answer
  • Thread pool sizing is a trap; measure blocking, not CPU cores
  • Connection pools must be tuned for your specific DB latency, not defaults
  • Reactive isn't always faster; it shifts the bottleneck, doesn't remove it
  • Caching is a hot path invariant, not an afterthought
  • Metrics without action are just expensive log files
What is Spring Boot At 10x Load?

High traffic handling in Spring Boot isn't about scaling out to 50 pods. That's a band-aid. It's about making each pod handle 10x more. It's thread pool management, connection pooling, reactive vs imperative trade-offs, and knowing exactly where your blocking calls live. The JVM is fast. Your code is the bottleneck. Identify it. Kill it. Repeat.

You can't fix performance by adding hardware. You fix it by removing waste. Every lock, every DB round trip, every serialization step — they all add milliseconds. At 10,000 RPS, those milliseconds become seconds of queue time. The difference between a Senior and a Junior is knowing which milliseconds to fight for and which to accept.

This isn't theory. I've seen the same patterns fail repeatedly. The default Tomcat thread pool of 200 will make your DB fall over. The default HikariCP of 10 will keep you waiting. And calling a REST API synchronously inside a request will turn your throughput into a single-lane road. Let's fix it.

Plain-English First

Imagine a busy kitchen. If you have one chef doing everything, orders pile up. Spring Boot is like that kitchen. High traffic handling is about having the right number of chefs (threads), the right ovens (databases), and knowing when to prep food in advance (caching) vs cooking on demand. Get it wrong, and customers leave angry. Get it right, and you serve thousands without breaking a sweat.

Thursday, 2:47 AM. PagerDuty screaming. 500 errors flooding in. Your customers can't check out. Your boss is calling. Your hands are sweating. Welcome to the club.

I've been there more times than I care to count. Every time, the root cause is the same: someone assumed default configuration would handle production load. It never does. Spring Boot defaults are for getting started, not for getting paid.

The worst part? The fix is usually small. A config change. A thread pool limit. A missing index. But those small things compound into catastrophic failures when traffic spikes. Black Friday. Product launch. A tweet from an influencer. 10x load in 30 seconds. Your app melts.

Here's the hard truth: most performance problems aren't bugs. They're design flaws exposed by load. Your code works fine at 100 RPS. At 1000 RPS, every sin shows up. Blocking calls on the main thread. Lazy initialization in request paths. Connection pools that assume 100ms queries but get 2-second ones under load.

You can't wing this. You need a strategy. You need to know your numbers. What's your average response time at idle? What's your P99 at 80% CPU? If you don't know those numbers, you're not engineering. You're hoping.

I'm writing this because I've seen too many teams burn out on performance fires that were predictable and preventable. This isn't a "best practices" list. This is a survival guide. Patterns that actually work in production. Trade-offs you need to make. Incidents that taught me painful lessons. Read it. Apply it. Stop getting paged.

Thread Pools: The First Thing That Breaks

Every junior thinks more threads = more speed. Wrong. Threads are not free. Each platform thread reserves stack memory (1MB by default on a 64-bit JVM), so 200 threads can claim up to 200MB for stacks alone, before any object allocations. The real sin is contention. When your DB connection pool is 10 and you have 200 threads, up to 190 of them are parked waiting for a connection. They burn almost no CPU while parked, but each one holds an in-flight request, its memory, and a place in line, so latency climbs while throughput stays flat.
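Here is a minimal sketch of that mismatch. It uses a plain Semaphore as a stand-in for the connection pool, which is an assumption made for illustration (HikariCP's internals are different), and the class and method names are hypothetical. It shows that however many request threads you add, the number doing real work never exceeds the number of connections.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model: many request threads sharing a small "connection pool" (a Semaphore). */
final class PoolContentionDemo {

    /** Outcome of one simulated run. */
    record Result(int completed, int maxInFlight) {}

    static Result run(int threads, int connections, int requests, long queryMillis) {
        Semaphore pool = new Semaphore(connections);      // stands in for the connection pool
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        ExecutorService requestThreads = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < requests; i++) {
                requestThreads.execute(() -> {
                    try {
                        pool.acquire();                   // threads without a permit park here
                        try {
                            int now = inFlight.incrementAndGet();
                            maxInFlight.accumulateAndGet(now, Math::max);
                            Thread.sleep(queryMillis);    // simulated query
                            completed.incrementAndGet();
                        } finally {
                            inFlight.decrementAndGet();
                            pool.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            requestThreads.shutdownNow();
        }
        return new Result(completed.get(), maxInFlight.get());
    }

    public static void main(String[] args) {
        Result r = run(200, 10, 400, 5);
        System.out.println("completed=" + r.completed() + " maxInFlight=" + r.maxInFlight());
    }
}
```

With 200 threads and 10 permits, maxInFlight cannot go above 10. The other 190 threads add waiting, not throughput.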

I once diagnosed a service where the P99 was 30 seconds. The team added more threads. It got worse. The fix: drop threads to 50, increase connection pool to 30. P99 dropped to 200ms. The lesson: measure queue depth, not thread count. Use Micrometer's tomcat.threads.busy metric. If it's close to tomcat.threads.config.max, more threads won't help. You're not thread-starved. You're downstream-starved. The threads are waiting on something else (DB, API, cache). Adding more threads just lengthens the line in front of that thing.

Virtual threads (Project Loom) change this equation. They're lightweight enough to have thousands. But they're not magic. On JDK 21 through 23, a virtual thread that blocks inside a synchronized block pins its carrier thread (JDK 24 removed most of this pinning). Monitor it with the jdk.VirtualThreadPinned JFR event. Virtual threads don't fix bad queries. They just let you wait more efficiently.

Rule of thumb: size your request thread pool at roughly 1.5x-2x your connection pool size. Don't go far beyond that ratio, because every extra thread is just another waiter. And always use a bounded queue. An unbounded queue in front of a thread pool keeps accepting work until the heap runs out. I've seen it. It's not pretty.

Production Trap:
Never set server.tomcat.threads.max far above your HikariCP maximum-pool-size (more than about 2x). You'll create thread starvation disguised as DB slowness. Monitor tomcat.threads.busy: if it sits at max while the connection pool is also maxed out, you've found your bottleneck.
Production Insight
Thread pool tuning is a lever, not a knob. Moving it without understanding the downstream load just shifts the bottleneck.
Key Takeaway
Threads are a proxy for concurrency, not parallelism. Tune for the bottleneck downstream.

Connection Pooling: The Silent Killer

HikariCP is the default. It's fast. But defaults will burn you. maximum-pool-size=10 is fine for a toy app. For production, you need to know your DB's max connections and your query latency. Formula: pool size per instance = (peak TPS × average query duration in seconds) / (number of app instances). For example: 1000 TPS × 0.05s avg query = 50 concurrent queries. If you have 5 instances, each needs at least 10 connections. But real life isn't that clean. Add buffer for spikes. I usually target 1.5x the calculated value.
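The arithmetic above is easy to get wrong under pressure, so here it is as a small helper. This is a sketch, not a HikariCP API: the class name, method name, and the headroom parameter are all hypothetical, and the result is a starting point to validate under load.

```java
/** Estimates connections per app instance from peak load and query latency. */
final class PoolSizing {

    /**
     * @param peakTps        peak transactions per second across the whole service
     * @param avgQueryMillis average query duration in milliseconds
     * @param instances      number of app instances sharing the load
     * @param headroom       multiplier for spikes, e.g. 1.5
     */
    static int connectionsPerInstance(int peakTps, long avgQueryMillis,
                                      int instances, double headroom) {
        if (peakTps <= 0 || avgQueryMillis <= 0 || instances <= 0 || headroom < 1.0) {
            throw new IllegalArgumentException("inputs must be positive and headroom >= 1.0");
        }
        // Concurrent queries in flight = arrival rate x time each one takes.
        double concurrentQueries = (double) peakTps * avgQueryMillis / 1000.0;
        double perInstance = concurrentQueries / instances;
        return (int) Math.ceil(perInstance * headroom);
    }

    public static void main(String[] args) {
        System.out.println(connectionsPerInstance(1000, 50, 5, 1.0)); // prints 10
        System.out.println(connectionsPerInstance(1000, 50, 5, 1.5)); // prints 15
    }
}
```

The first call reproduces the worked example (1000 TPS, 50ms queries, 5 instances). The second applies the 1.5x buffer.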

Here's the gotcha: connection pools are per-datasource. If you have read replicas, don't pool them the same way. Read replicas often have headroom for more concurrent connections, so you may be able to pool higher there. But the sum of all pools across all app instances must stay below the DB's max_connections. Otherwise, you'll get PostgreSQL's dreaded FATAL: sorry, too many clients already. Fix: set spring.datasource.hikari.maximum-pool-size=30 and spring.datasource.hikari.minimum-idle=5. The idle connections absorb small bursts without paying connection setup cost. The max prevents DB overload.

leak-detection-threshold is your friend. Set it to 60 seconds. If a connection is out of the pool longer than that, HikariCP logs a stack trace of the code that borrowed it. You'll catch bugs like "forgot to close the Connection" or "transaction never committed." I caught a connection leak in legacy code this way. 30 minutes of investigation saved an outage.

Connection timeout is critical. Don't set it too high. 30 seconds is the default. Under load, threads pile up waiting for connections that never come. That becomes a thread pool problem. Lower spring.datasource.hikari.connection-timeout (HikariCP's connectionTimeout, in milliseconds) to 5-10 seconds. Fail fast. Let the client retry. Don't let threads queue up waiting for a connection that's not coming.
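Pulled together, the settings from this section look roughly like this in application.properties. The values are the ones quoted above, not universal recommendations: run your own numbers through the sizing formula first.

```properties
# Upper bound per app instance; the sum across instances must stay below the DB's max_connections
spring.datasource.hikari.maximum-pool-size=30
# Keep a few warm connections ready for bursts
spring.datasource.hikari.minimum-idle=5
# Fail fast instead of the 30000 ms default (value in milliseconds)
spring.datasource.hikari.connection-timeout=5000
# Log a stack trace when a connection is held longer than 60 s (value in milliseconds)
spring.datasource.hikari.leak-detection-threshold=60000
```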

Senior Shortcut:
Set spring.datasource.hikari.leak-detection-threshold=60000. It logs a stack trace when a connection is held too long. You'll find your slow queries and missing close() calls fast.
Production Insight
Connection pool sizing is a math problem, not an opinion. Calculate based on TPS and query latency. Guess and you lose.
Key Takeaway
Default pool sizes are for demos. Calculate yours based on actual load patterns.

Reactive vs Imperative: Pick Your Poison

Reactive (WebFlux) isn't faster. It's different. It trades thread-per-request for event-loop-driven processing. This makes sense when you have many I/O-bound operations (DB calls, HTTP calls) and you're hitting thread limits. But reactive has a cost: debugging is harder, stack traces are useless, and you need to be reactive all the way down. One blocking call in a reactive pipeline ruins everything.

I've seen teams adopt reactive because "it's more scalable." Then they spend weeks debugging why their reactive chain hangs. The root cause? A Thread.sleep() in a flatMap. Or a synchronized block. Or a legacy library that uses blocking I/O. Reactive is not a performance upgrade. It's a programming model shift. Do it for the right reasons: high concurrency with limited resources.

For most CRUD apps, imperative with virtual threads is the sweet spot. Virtual threads let you write blocking code without tying up a carrier thread. You get performance parity with reactive for 10% of the complexity. But beware: on JDK 21 through 23, a virtual thread is pinned to its carrier while it sits inside a synchronized block or a native frame. Profile with -Djdk.tracePinnedThreads=short on those JDK versions. If you see pinned threads, refactor those synchronized blocks to ReentrantLock or use the concurrency utilities from java.util.concurrent.
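A sketch of that refactor: a blocking section guarded by ReentrantLock instead of synchronized. The LockedInventory class is hypothetical and the sleep stands in for real I/O. To keep the example compiling on older JDKs, main uses a fixed platform-thread pool. On JDK 21+ you would pass Executors.newVirtualThreadPerTaskExecutor() instead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Guards a blocking critical section with ReentrantLock rather than synchronized. */
final class LockedInventory {
    private final ReentrantLock lock = new ReentrantLock();
    private int stock;

    LockedInventory(int initialStock) { this.stock = initialStock; }

    /** Reserves one unit; returns false when nothing is left. */
    boolean reserve() {
        lock.lock();                       // a virtual thread waiting here can unmount
        try {
            if (stock == 0) return false;
            pretendToCallTheDatabase();    // blocking work while holding the lock
            stock--;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int remaining() {
        lock.lock();
        try { return stock; } finally { lock.unlock(); }
    }

    private static void pretendToCallTheDatabase() {
        try { Thread.sleep(1); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    /** Runs the given number of reservations on the executor and returns how many succeeded. */
    static int reserveConcurrently(LockedInventory inventory, ExecutorService executor, int attempts) {
        AtomicInteger succeeded = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(attempts);
        for (int i = 0; i < attempts; i++) {
            executor.execute(() -> {
                try { if (inventory.reserve()) succeeded.incrementAndGet(); }
                finally { done.countDown(); }
            });
        }
        try { done.await(30, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        finally { executor.shutdown(); }
        return succeeded.get();
    }

    public static void main(String[] args) {
        LockedInventory inventory = new LockedInventory(10);
        int ok = reserveConcurrently(inventory, Executors.newFixedThreadPool(8), 25);
        System.out.println("reserved=" + ok + " remaining=" + inventory.remaining());
        // prints: reserved=10 remaining=0
    }
}
```

The lock still serializes the critical section, so keep blocking work inside it as short as you can.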

Here's my rule: if your request handler makes more than 3 I/O calls, reactive might win. If it's 1-2 calls, virtual threads are simpler and faster to debug. If it's CPU-bound, neither helps — you need better algorithms. Measure, don't guess.

Interview Gold:
"Reactive is not faster. It's more concurrent per thread. The right choice depends on your bottleneck profile. Virtual threads blur the line." Use this answer. It shows depth.
Production Insight
Virtual threads made most reactive migrations unnecessary. I've deprecated two WebFlux services in favor of virtual threads. Same perf, half the bug count.
Key Takeaway
Reactive is a tool, not a religion. Choose based on bottleneck type, not hype.

Caching: The Only Free Lunch

Caching is the cheapest performance optimization you'll ever make. But most teams do it wrong. They throw a Redis cache in front of everything and hope for the best. That creates a new bottleneck: the cache itself. The real trick is caching at the right level. Data that changes rarely and is read often? Yes. Data that changes every request? No. And never cache without a TTL. Infinite TTL is infinite stale data.

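Cache stampede is the production horror story I see most often. Multiple threads compute the same value simultaneously when a cache entry expires. This multiplies load on your DB or API at exactly the wrong moment. Spring's @Cacheable(sync=true) addresses this. It asks the underlying cache provider to lock the entry while the value is being computed, so only one thread computes and the others wait for the cached value. How the lock works depends on the provider, and it only coordinates threads inside one JVM, not across app instances. Simple. Effective.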
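To make the "one thread computes, others wait" idea concrete, here is a JDK-only sketch built on ConcurrentHashMap.computeIfAbsent. It is not how Spring or any particular cache provider implements sync=true. SingleFlightCache and its methods are hypothetical names, and there is no TTL or eviction policy here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Loads each missing key once, even when many threads ask for it at the same time. */
final class SingleFlightCache<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final Function<K, V> loader;   // must not return null

    SingleFlightCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        // computeIfAbsent runs the loader at most once per absent key;
        // concurrent callers for the same key block until it finishes.
        return values.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    void evict(K key) { values.remove(key); }

    int loadCount() { return loads.get(); }

    /** Fires concurrent get() calls for one key and returns how many loads actually ran. */
    static int loadsUnderStampede(int callers) {
        SingleFlightCache<String, String> cache = new SingleFlightCache<>(key -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "value-for-" + key;     // stands in for a slow DB or API call
        });
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(callers);
        for (int i = 0; i < callers; i++) {
            pool.execute(() -> {
                try { start.await(); cache.get("price-list"); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                finally { done.countDown(); }
            });
        }
        start.countDown();                 // release all callers at once
        try { done.await(30, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        pool.shutdown();
        return cache.loadCount();
    }

    public static void main(String[] args) {
        System.out.println("loads=" + loadsUnderStampede(20)); // prints loads=1
    }
}
```

Twenty callers, one load. Without the single-flight guard, the slow loader could run up to twenty times.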

But `sync=true` has a downside: it serializes access to that cache key. If your computation takes 2 seconds, every other thread asking for that key blocks for up to 2 seconds. Solution: pre-warm your cache. Compute the value before requests arrive, or use a shorter TTL with background refresh. Redis has no built-in background refresh; you need a scheduled job or a separate thread pool. Caffeine supports it natively via refreshAfterWrite.

Here's a trick for high-volume endpoints: use a local cache for data that's the same for all users (e.g., configuration, lookup tables). A Caffeine in-memory cache with a short TTL (seconds) avoids network round trips to Redis. Combine it with Redis as a second level for consistency. But be careful: local caches don't invalidate across instances. The TTL must be short enough that you can tolerate the inconsistency.
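The short-TTL idea in miniature, using only the JDK. This is a stand-in for illustration, not Caffeine: TtlCache is a hypothetical class with no size limit and no background refresh, and the clock is injected so expiry can be tested without sleeping.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal in-memory cache whose entries expire a fixed time after they are written. */
final class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;      // e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it when it is missing or expired. */
    V get(K key, Supplier<V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, current) -> {
            if (current != null && now < current.expiresAtMillis()) {
                return current;            // still fresh: no reload
            }
            return new Entry<>(loader.get(), now + ttlMillis);
        });
        return entry.value();
    }

    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(10_000, System::currentTimeMillis);
        System.out.println(cache.get("countries", () -> "loaded from DB"));
        System.out.println(cache.get("countries", () -> "not called within the TTL"));
        // prints "loaded from DB" twice
    }
}
```

Each app instance holds its own copy, so another instance can serve a value up to one TTL older than yours. That is the inconsistency window you are signing up for.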

Always measure cache hit ratio. A 50% hit ratio means half your requests still hit the DB. That's a waste of memory. Aim for >95%. If you can't get there, your caching strategy is wrong.

Never Do This:
Using @Cacheable without sync=true on a hot key. Under load, multiple threads will compute the same value simultaneously, thrashing your DB. Always use sync=true for hot entries that are expensive to compute.
Production Insight
I reduced DB load by 90% on a landing page by adding a 10-second local Caffeine cache for a lookup table. Redis wasn't even involved.
Key Takeaway
Cache the hot path. Measure hit ratio. Use sync=true. Pre-warm. Everything else is decoration.

Asynchronous Processing: Not Just For Eventual Consistency

Synchronous request processing is simple. But it's also a throughput killer. Every request ties up a thread until the response is sent. If you can defer work to later, do it. Sending emails, generating reports, processing images: these should never block a user's request. Use @Async with a bounded executor. Never rely on the default executor. Plain Spring falls back to SimpleAsyncTaskExecutor, which creates a new thread per task and will run the JVM out of memory under load. Spring Boot auto-configures a pooled executor instead, but its queue is unbounded unless you set spring.task.execution.pool.queue-capacity.

Configure a proper thread pool for async tasks. Name it. Monitor it. Set rejection policies. If your async queue fills up, do you drop tasks or block the caller? The answer depends on your use case. For logging or metrics, dropping is fine. For order processing, you need to block or persist to a dead-letter queue. Use ThreadPoolTaskExecutor with the JDK's CallerRunsPolicy (or Spring Integration's CallerBlocksPolicy) for critical work. But beware: making the caller run or wait for the task defeats the purpose of async. Better to use a message broker (RabbitMQ, Kafka) for work that must not be lost.

Spring's @Async works by proxying the bean. If you call an @Async method from within the same class, it doesn't work — the proxy isn't invoked. Dependency-inject the bean and call it from another bean. That's a common mistake. I've debugged it half a dozen times. Now I always test async behavior with a simple log statement.
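You can reproduce the self-invocation trap without Spring. The sketch below uses a JDK dynamic proxy as a stand-in for Spring's AOP proxy (an assumption for illustration; Spring may also use CGLIB subclasses). ReportService and its methods are hypothetical. The proxy records every call it intercepts, and the internal this.sendEmail() call never shows up.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Shows why a call from inside the same object never reaches the proxy. */
final class SelfInvocationDemo {

    interface ReportService {
        void generateAll();
        void sendEmail();      // imagine this one carries @Async
    }

    static final class ReportServiceImpl implements ReportService {
        @Override public void generateAll() {
            sendEmail();       // self-invocation: goes to "this", not to the proxy
        }
        @Override public void sendEmail() {
            // real work would happen here
        }
    }

    /** Wraps the target in a proxy that records the name of every intercepted method. */
    static ReportService proxied(ReportService target, List<String> intercepted) {
        return (ReportService) Proxy.newProxyInstance(
                ReportService.class.getClassLoader(),
                new Class<?>[] {ReportService.class},
                (proxy, method, args) -> {
                    intercepted.add(method.getName());   // where async dispatch would hook in
                    return method.invoke(target, args);
                });
    }

    /** Returns the calls the proxy saw for one of the two entry points. */
    static List<String> interceptedCalls(boolean callThroughOuterMethod) {
        List<String> intercepted = new ArrayList<>();
        ReportService service = proxied(new ReportServiceImpl(), intercepted);
        if (callThroughOuterMethod) {
            service.generateAll();
        } else {
            service.sendEmail();
        }
        return intercepted;
    }

    public static void main(String[] args) {
        System.out.println(interceptedCalls(true));    // prints [generateAll]
        System.out.println(interceptedCalls(false));   // prints [sendEmail]
    }
}
```

When the caller goes through generateAll(), the proxy never sees sendEmail, so nothing would hand it to an executor. Called directly on the proxy, it is intercepted.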

For long-running, non-critical tasks, use a TaskExecutor with a bounded queue and a discard policy. The JDK's DiscardPolicy drops tasks silently, so use a RejectedExecutionHandler that logs and counts each discard. Then alert on it. If you're discarding tasks under load, you have a capacity problem. Async doesn't make capacity infinite. It just makes delays less visible. Address the root cause: scale out workers or reduce work per task.
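Here is what a bounded, named executor that counts its discards can look like with plain java.util.concurrent. BoundedAsyncExecutor is a hypothetical wrapper. In a Spring app you would set the same things on ThreadPoolTaskExecutor (setQueueCapacity, setThreadNamePrefix, setRejectedExecutionHandler) and publish the counter as a metric.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Fixed-size pool with a bounded queue, named threads, and a counted discard policy. */
final class BoundedAsyncExecutor {
    private final AtomicLong discarded = new AtomicLong();
    private final ThreadPoolExecutor executor;

    BoundedAsyncExecutor(String namePrefix, int threads, int queueCapacity) {
        AtomicInteger sequence = new AtomicInteger();
        ThreadFactory named = runnable -> {
            Thread t = new Thread(runnable, namePrefix + sequence.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        // Drop the task, but leave a trace: count it and log it.
        RejectedExecutionHandler countAndDrop = (task, pool) -> {
            long n = discarded.incrementAndGet();
            System.err.println("discarded task #" + n + " (queue size " + pool.getQueue().size() + ")");
        };
        this.executor = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), named, countAndDrop);
    }

    void submit(Runnable task) { executor.execute(task); }

    long discardedCount() { return discarded.get(); }

    void shutdown() { executor.shutdownNow(); }

    /** One worker, queue of two, five stuck tasks: one runs, two queue, two are discarded. */
    static long discardedWhenOverloaded() {
        BoundedAsyncExecutor email = new BoundedAsyncExecutor("email-async-", 1, 2);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 5; i++) {
            email.submit(() -> {
                try { release.await(); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        long dropped = email.discardedCount();
        release.countDown();
        email.shutdown();
        return dropped;
    }

    public static void main(String[] args) {
        System.out.println("discarded=" + discardedWhenOverloaded()); // prints discarded=2
    }
}
```

The worker thread shows up as email-async-1 in a thread dump, and the discard counter is the number to alert on.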

Senior Shortcut:
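Always name your async executors and give each one a thread name prefix. Use @Async("emailExecutor") and set the executor's prefix to something like email-async-. This makes debugging trivial. You see "email-async-1" in a thread dump and know exactly which task is stuck.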
Production Insight
The worst async bug I've seen: @Async method called from the same class. It runs synchronously. No error. Just slower. Took 4 hours to find.
Key Takeaway
Async is for deferring work, not eliminating it. Monitor queue depth and rejection rates.

Monitoring: If It's Not Measured, It's Not Optimized

You can't fix what you don't see. Micrometer is your single pane of glass. Expose metrics via /actuator/prometheus. Grafana dashboards. Alerts on P99 latency, thread pool busy, connection pool active, GC pause time. If you're not measuring P99, you don't know how your users feel. Average latency hides pain. P99 reveals it.
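A quick illustration of how an average hides tail pain. The sample is made up for the example (98 requests at 100ms, 2 at 5 seconds), LatencyStats is a hypothetical helper, and the percentile uses the simple nearest-rank method, which is not how Micrometer computes its published percentiles.

```java
import java.util.Arrays;

/** Compares the average of a latency sample with its percentiles. */
final class LatencyStats {

    static double average(long[] millis) {
        return Arrays.stream(millis).average().orElse(0.0);
    }

    /** Nearest-rank percentile: the smallest sample with at least percent% of samples at or below it. */
    static long percentile(long[] millis, int percent) {
        if (millis.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need samples and a percent between 1 and 100");
        }
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100;   // integer ceil(percent / 100 * n)
        return sorted[rank - 1];
    }

    /** 98 fast requests and 2 very slow ones. */
    static long[] sample() {
        long[] latencies = new long[100];
        Arrays.fill(latencies, 100L);
        latencies[98] = 5000L;
        latencies[99] = 5000L;
        return latencies;
    }

    public static void main(String[] args) {
        long[] latencies = sample();
        System.out.println("avg=" + average(latencies));          // prints avg=198.0
        System.out.println("p50=" + percentile(latencies, 50));   // prints p50=100
        System.out.println("p99=" + percentile(latencies, 99));   // prints p99=5000
    }
}
```

An average of 198ms looks healthy. The P99 says 1 in 100 users waited five seconds.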

The four golden signals: latency, traffic, errors, saturation. For Spring Boot, that translates to:
  • Latency: http.server.requests (Micrometer timer)
  • Traffic: request rate from the http.server.requests count, with tomcat.threads.busy as a view of concurrent requests
  • Errors: http.server.requests filtered by its status tag (5xx count)
  • Saturation: hikaricp.connections.active, jvm.memory.used

Set up alerts for P99 exceeding 80% of your SLO. Alert on thread pool usage > 70%. Alert on connection pool usage > 80%. These are leading indicators of failure. By the time you get 5xx errors, you're already down. Catch the saturation before it breaks.
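Those thresholds written out as code, purely to make the rules unambiguous. In practice they would live in your alerting system (for example, Prometheus alert rules) rather than in the application. SaturationAlerts and the alert names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Evaluates the leading-indicator thresholds: P99 vs SLO, thread pool usage, connection pool usage. */
final class SaturationAlerts {

    static List<String> evaluate(double p99Millis, double sloMillis,
                                 int busyThreads, int maxThreads,
                                 int activeConnections, int maxConnections) {
        List<String> alerts = new ArrayList<>();
        if (p99Millis > 0.8 * sloMillis) {
            alerts.add("p99-near-slo");                  // P99 above 80% of the SLO
        }
        if (busyThreads > 0.7 * maxThreads) {
            alerts.add("thread-pool-saturating");        // thread pool usage above 70%
        }
        if (activeConnections > 0.8 * maxConnections) {
            alerts.add("connection-pool-saturating");    // connection pool usage above 80%
        }
        return alerts;
    }

    public static void main(String[] args) {
        // SLO 500 ms, 50 Tomcat threads, 30 connections
        System.out.println(evaluate(450, 500, 20, 50, 10, 30)); // prints [p99-near-slo]
        System.out.println(evaluate(100, 500, 40, 50, 29, 30));
        // prints [thread-pool-saturating, connection-pool-saturating]
    }
}
```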

A war story: We had a service that spiked every hour during a scheduled job. The job queried all users. The P99 went from 200ms to 10 seconds. No 5xx errors. Users didn't complain because it was internal. But the latency triggered my P99 alert. We discovered the job was running on the main thread pool, blocking user requests. Fixed by running the job on a separate executor. If we hadn't had that alert, we'd have had a full outage within weeks as the system saturated.

Distributed tracing (Micrometer Tracing, the successor to Spring Cloud Sleuth in Spring Boot 3) is non-negotiable for anything with inter-service calls. Without it, you can't tell if the 2-second latency is in your service or downstream. I once chased a DB query for 3 hours. Turned out the downstream API was slow. Tracing showed it in 5 minutes. Use it.

The Classic Bug:
Measuring average latency instead of percentiles. Averages hide P99 spikes. Always publish percentiles. Percentiles are not published unless you ask for them, so make Micrometer's publishPercentiles(0.5, 0.95, 0.99) your baseline.
Production Insight
I've never regretted having too many metrics. I've regretted having too few exactly once: the day we couldn't explain why the system was slow.
Key Takeaway
Monitor the leading indicators of saturation, not just the trailing indicators of failure.
Production incident · Post-mortem · Severity: high

The Thread Pool That Ate Our DB

Symptom
Gradually increasing response times, then sudden 500 errors. DB CPU at 100%. Connection pool timeout exceptions in logs.
Assumption
First thought: DB is slow. Maybe bad query. Maybe missing index. Double-checked query plans — all efficient.
Root cause
Default Tomcat thread pool of 200 threads all trying to acquire connections from a HikariCP pool of 10. Threads stack up waiting. Each waiting thread holds resources. Eventually, thread pool queue fills, Tomcat rejects requests. DB is fine. The thread pool is the bottleneck.
Fix
1. Set server.tomcat.threads.max=50 to bring the thread count in line with connection capacity.
2. Set spring.datasource.hikari.maximum-pool-size=30, enough for 50 threads with some buffer.
3. Added metrics: tomcat.threads.busy and hikaricp.connections.active (hikaricp_connections_active in Prometheus).
4. Tested under load to verify no more errors.
Key lesson
  • Thread pool size must be sized against connection pool capacity.
  • More threads doesn't mean more throughput. It means more contention.
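For reference, the fix from this incident as an application.properties fragment. The first two values are the ones from the post-mortem. The actuator exposure line is an assumption about how the metrics were made reachable, so adjust it to your own security setup.

```properties
# Fewer request threads, so they stop piling up in front of the pool
server.tomcat.threads.max=50
# Larger connection pool, sized for the threads above
spring.datasource.hikari.maximum-pool-size=30
# Assumed: expose metrics so tomcat.threads.busy and hikaricp.connections.active can be scraped
management.endpoints.web.exposure.include=health,metrics,prometheus
```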
Production debug guide
Symptom → root cause → fix for the failures that actually happen
Symptom · 01
Gradually increasing response times under load, then 503s
Fix
Check Tomcat thread pool metrics: tomcat.threads.busy vs tomcat.threads.config.max. If busy == max, you're thread-starved. Check HikariCP active connections. If that's also at max, your DB queries are too slow or your pool is too small. Increase pool size or optimize queries. Never increase threads without increasing connections proportionally.
Symptom · 02
Intermittent 500s with 'Connection is not available, request timed out after 30000ms'
Fix
That's HikariCP timeout. Your threads are waiting for a connection longer than connectionTimeout. Check DB query performance under load. Look for slow queries (pg_stat_activity, SHOW PROCESSLIST). Also check if connection pool is too small. Increase maximum-pool-size or shorten query time. Don't just increase timeout — that hides the problem.
Symptom · 03
App crashes with OutOfMemoryError, but heap seems fine
Fix
Check off-heap memory. Netty direct buffers are a common culprit if you're using WebClient or reactive. Also check thread stack sizes: 200 threads at 1MB each is 200MB just for stacks. Use jcmd <pid> VM.native_memory summary to see off-heap allocations (the JVM must be started with -XX:NativeMemoryTracking=summary for this to work). Reduce thread count or switch to virtual threads.
Symptom · 04
Redis or other cache slow under load, app performance degrades
Fix
Check cache hit ratio. If it's low, your caching strategy is wrong. Check for cache stampede — multiple threads computing the same cache value simultaneously. Use @Cacheable(sync=true) for synchronized cache computation. Also check Redis INFO commandstats for slow commands. Use bulk operations (pipeline) instead of individual gets/sets.
Debug Cheat Sheet
Commands for fast diagnosis in production
Thread pool exhaustion
Immediate action
Check Tomcat thread metrics
Commands
curl -s localhost:8080/actuator/metrics/tomcat.threads.busy
curl -s localhost:8080/actuator/metrics/tomcat.threads.config.max
Fix now
Set server.tomcat.threads.max=100 in application.yml, match to connection pool size
Connection pool timeout
Immediate action
Check HikariCP active connections
Commands
curl -s localhost:8080/actuator/metrics/hikaricp.connections.active
curl -s localhost:8080/actuator/metrics/hikaricp.connections.timeout.total
Fix now
Increase spring.datasource.hikari.maximum-pool-size=30 or optimize slow queries
High GC pauses under load
Immediate action
Check GC logs and heap usage
Commands
jstat -gcutil <pid> 1000 10
jmap -histo <pid> | head -20
Fix now
Increase heap with -Xmx, or reduce object creation. Use thread-local pools for ephemeral objects.
Thread Model Comparison
Attribute | Imperative (Platform Threads) | Reactive (WebFlux) | Virtual Threads (Loom)
Thread per request | Yes, one OS thread per request | No, event loop handles many requests | Yes, one virtual thread per request
Max concurrent requests (4GB heap) | ~100-200 (limited by OS threads) | 10,000+ (limited by memory) | 10,000+ (limited by memory)
Debugging | Easy, stack traces are linear | Hard, stack traces are async | Easy, stack traces are linear
Blocking I/O | Natural, just call the method | Must be wrapped in Mono.fromCallable | Natural, just call the method
Pinned threads | N/A, all threads are carrier | N/A, no carrier threads | Yes, synchronized blocks pin carrier
Complexity | Low | High, requires non-blocking everything | Low, same as imperative
Library support | All libraries | Must be reactive-compatible | All libraries (with caveats)
Best for | Low concurrency, CPU-bound | High concurrency, I/O-bound | High concurrency, I/O-bound

Key takeaways

1. Thread pool and connection pool sizes must be balanced. More threads doesn't mean more throughput.
2. Cache the hot path, measure hit ratio, and use sync=true to prevent stampede.
3. Virtual threads simplify high concurrency but watch for synchronized blocks that pin carrier threads.
4. Reactive is not universally faster. It's a different trade-off. Choose based on your bottleneck profile.
5. Monitor leading indicators (pool saturation, GC pauses), not just trailing ones (errors). Measure P99, not average.

Common mistakes to avoid


Using default Tomcat thread pool size (200) with default HikariCP pool size (10)

Symptom
High connection pool timeout errors, thread starvation, increasing response times under load
Fix
Reduce server.tomcat.threads.max to 50-100. Increase spring.datasource.hikari.maximum-pool-size to 20-40. Match thread count to connection capacity.

Using `@Cacheable` without `sync=true` on hot cache keys

Symptom
DB CPU spikes at TTL expiry, cache stampede, intermittent latency spikes
Fix
Add sync=true to @Cacheable. Use Caffeine with refreshAfterWrite for background population.

Calling `@Async` method from within the same class

Symptom
Method executes synchronously, no error, expected performance gain never materializes
Fix
Inject the service into a different bean. @Async only works through Spring AOP proxy. Self-invocation bypasses the proxy.

Leaving `spring.jpa.open-in-view=true` (the default) in production

Symptom
Database connections held open for entire HTTP request cycle, connection pool exhaustion under load
Fix
Set spring.jpa.open-in-view=false. Use @Transactional explicitly where needed. Spring Boot enables open-in-view by default in 3.x as well as 2.x (it logs a startup warning about it), so you have to turn it off explicitly.

Not setting `leak-detection-threshold` on HikariCP

Symptom
Gradual connection leak that surfaces as pool exhaustion under load after days of uptime
Fix
Set spring.datasource.hikari.leak-detection-threshold=60000. Investigate leaked connections logged with stack trace.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): How does the default Tomcat thread pool interact with the HikariCP conne...
Q02 (Senior): Explain the difference between `@Cacheable` with sync=true vs sync=false...
Q03 (Senior): What is the impact of setting `spring.jpa.open-in-view=true` in a high-t...
Q04 (Senior): How do virtual threads in Project Loom change the performance characteri...
Q05 (Senior): A Spring Boot service is experiencing increasing P99 latency under load,...
Q06 (Senior): What is the purpose of `leak-detection-threshold` in HikariCP, and how d...
Q07 (Senior): Compare and contrast using a local cache (Caffeine) vs a distributed cac...
Q08 (Senior): What is the danger of using `@Async` without a custom TaskExecutor? What...
Q01 of 08 (Senior)

How does the default Tomcat thread pool interact with the HikariCP connection pool, and what happens if they are mismatched under load?

ANSWER
If Tomcat's max threads exceeds HikariCP's max pool size, threads will queue up waiting for connections. This creates thread starvation: requests are accepted but sit idle waiting for DB connections. Instead of 200 threads all trying to acquire 10 connections (190 threads blocked), you should reduce threads to match available connections. The metric to watch is tomcat.threads.busy vs hikaricp.connections.active — when busy hits max and active hits max, you've found the mismatch.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01. What is the best thread pool size for a Spring Boot application?
02. Should I use WebFlux for all new Spring Boot services?
03. How do I debug a connection leak in HikariCP?
04. What is cache stampede and how do I prevent it?
05. How do I monitor Spring Boot performance in production?