Spring Boot At 10x Load: The Patterns That Survive Production
Stop guessing at Spring Boot performance.
- Thread pool sizing is a trap; measure blocking, not CPU cores
- Connection pools must be tuned for your specific DB latency, not defaults
- Reactive isn't always faster; it shifts the bottleneck, doesn't remove it
- Caching is a hot path invariant, not an afterthought
- Metrics without action are just expensive log files
Imagine a busy kitchen. If you have one chef doing everything, orders pile up. Spring Boot is like that kitchen. High traffic handling is about having the right number of chefs (threads), the right ovens (databases), and knowing when to prep food in advance (caching) vs cooking on demand. Get it wrong, and customers leave angry. Get it right, and you serve thousands without breaking a sweat.
Thursday, 2:47 AM. PagerDuty screaming. 500 errors flooding in. Your customers can't check out. Your boss is calling. Your hands are sweating. Welcome to the club.
I've been there more times than I care to count. Every time, the root cause is the same: someone assumed default configuration would handle production load. It never does. Spring Boot defaults are for getting started, not for getting paid.
The worst part? The fix is usually small. A config change. A thread pool limit. A missing index. But those small things compound into catastrophic failures when traffic spikes. Black Friday. Product launch. A tweet from an influencer. 10x load in 30 seconds. Your app melts.
Here's the hard truth: most performance problems aren't bugs. They're design flaws exposed by load. Your code works fine at 100 RPS. At 1000 RPS, every sin shows up. Blocking calls on the main thread. Lazy initialization in request paths. Connection pools that assume 100ms queries but get 2-second ones under load.
You can't wing this. You need a strategy. You need to know your numbers. What's your average response time at idle? What's your P99 at 80% CPU? If you don't know those numbers, you're not engineering. You're hoping.
I'm writing this because I've seen too many teams burn out on performance fires that were predictable and preventable. This isn't a "best practices" list. This is a survival guide. Patterns that actually work in production. Trade-offs you need to make. Incidents that taught me painful lessons. Read it. Apply it. Stop getting paged.
Thread Pools: The First Thing That Breaks
Every junior thinks more threads = more speed. Wrong. Threads are not free. Each thread reserves stack memory (default 1MB on 64-bit JVM). 200 threads = 200MB just for stacks. And that's before any object allocations. The real sin: threads fighting over a scarce resource. When your DB connection pool is 10 and you have 200 threads, up to 190 threads are doing nothing but waiting for a connection. Parked threads burn almost no CPU, but they are not free either: they hold memory, add scheduling overhead when they wake, and make every queued request slower.
I once diagnosed a service where the P99 was 30 seconds. The team added more threads. It got worse. The fix: drop threads to 50, increase connection pool to 30. P99 dropped to 200ms. The lesson: measure queue depth, not thread count. Use Micrometer's tomcat.threads.busy metric. If it's close to config.max, you're not thread-starved. You're downstream-starved. The threads are waiting on something else (DB, API, cache). Adding more threads just makes that thing wait harder.
Virtual threads (Project Loom) change this equation. They're lightweight enough to have thousands. But they're not magic. If you block a virtual thread on a synchronized block, it pins the carrier thread. Monitor this with jdk.VirtualThreadPinned events. Virtual threads don't fix bad queries. They just let you wait more efficiently.
Rule of thumb: size your thread pool at roughly 1.5x-2x your connection pool size, and no further. Not every request holds a connection for its whole lifetime, so a small surplus is fine; a 20x surplus just builds a queue in front of the pool. And always use a bounded queue. Unbounded queues in thread pools will OOM your heap. I've seen it. It's not pretty.
Avoid setting server.tomcat.threads.max far above your HikariCP maximum-pool-size. You'll create thread starvation disguised as DB slowness. Monitor tomcat.threads.busy: if it hits max, you've found your bottleneck.
Connection Pooling: The Silent Killer
HikariCP is the default. It's fast. But defaults will burn you. maximum-pool-size=10 is fine for a toy app. For production, you need to know your DB's max connections and your query latency. Formula: pool size per instance = (peak TPS × average query duration in seconds) / (number of app instances). For example: 1000 TPS × 0.05s avg query = 50 concurrent queries. If you have 5 instances, each needs at least 10 connections. But real life isn't that clean. Add buffer for spikes. I usually target 1.5x the calculated value.
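The arithmetic above fits in a few lines of Java. This is a minimal sketch, not a HikariCP API: the class name PoolSizing, the method name, and the headroom parameter are all made up for illustration, and the query time is passed in milliseconds to keep the math exact.

```java
final class PoolSizing {
    /**
     * Per-instance pool size: concurrent queries = peak TPS x average query time,
     * split across instances, then multiplied by a headroom factor for spikes.
     * (Hypothetical helper, not part of HikariCP or Spring.)
     */
    static int perInstancePoolSize(int peakTps, int avgQueryMillis,
                                   int instances, double headroom) {
        double concurrentQueries = peakTps * avgQueryMillis / 1000.0;
        double perInstance = concurrentQueries / instances;
        return (int) Math.ceil(perInstance * headroom);
    }

    public static void main(String[] args) {
        // The example from the text: 1000 TPS, 50 ms queries, 5 instances.
        System.out.println("bare minimum: " + perInstancePoolSize(1000, 50, 5, 1.0));
        System.out.println("with 1.5x buffer: " + perInstancePoolSize(1000, 50, 5, 1.5));
    }
}
```

Whatever number comes out, multiply it by your instance count and compare it against the database's connection limit before you deploy.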
Here's the gotcha: connection pools are per-datasource. If you have read replicas, don't pool them the same way. Read replicas handle more concurrent connections, so you can pool higher. But the pools of all your app instances added together must stay below the DB's max_connections. Otherwise, you'll get the dreaded FATAL: sorry, too many clients already. Fix: set spring.datasource.hikari.maximum-pool-size=30 and spring.datasource.hikari.minimum-idle=5. The idle connections keep startup fast. The max prevents DB overload.
leak-detection-threshold is your friend. Set it to 60 seconds. If a connection is held longer than that, HikariCP logs a stack trace. You'll catch bugs like "forgot to close PreparedStatement" or "transaction never committed." I caught a memory leak in legacy code this way. 30 minutes of investigation saved an outage.
Connection timeout is critical. Don't set it too high. 30 seconds is the default. Under load, threads pile up waiting for connections that never come. That becomes a thread pool problem. Lower connectionTimeout to 5-10 seconds. Fail fast. Let the client retry. Don't let threads queue up waiting for a connection that's not coming.
Set spring.datasource.hikari.leak-detection-threshold=60000. It logs a stack trace when a connection is held too long. You'll find your slow queries and missing close() calls fast.
Reactive vs Imperative: Pick Your Poison
Reactive (WebFlux) isn't faster. It's different. It trades thread-per-request for event-loop-driven processing. This makes sense when you have many I/O-bound operations (DB calls, HTTP calls) and you're hitting thread limits. But reactive has a cost: debugging is harder, stack traces are useless, and you need to be reactive all the way down. One blocking call in a reactive pipeline ruins everything.
I've seen teams adopt reactive because "it's more scalable." Then they spend weeks debugging why their reactive chain hangs. The root cause? A Thread.sleep() in a flatMap. Or a synchronized block. Or a legacy library that uses blocking I/O. Reactive is not a performance upgrade. It's a programming model shift. Do it for the right reasons: high concurrency with limited resources.
For most CRUD apps, imperative with virtual threads is the sweet spot. Virtual threads let you write blocking code without blocking a carrier thread. You get performance parity with reactive for 10% of the complexity. But beware: on JDK 21, virtual threads are pinned by synchronized blocks and native frames. Profile with -Djdk.tracePinnedThreads=short. If you see pinned threads, refactor those synchronized blocks to ReentrantLock or use the concurrency utilities from java.util.concurrent.
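The synchronized-to-ReentrantLock refactor is mechanical. Here is a minimal sketch using a hypothetical Inventory class (the class and its methods are invented for illustration). The behavior is the same as the synchronized version; the difference is that a virtual thread blocked on a ReentrantLock can unmount from its carrier.

```java
import java.util.concurrent.locks.ReentrantLock;

final class Inventory {
    private final ReentrantLock lock = new ReentrantLock();
    private int stock;

    Inventory(int initialStock) {
        this.stock = initialStock;
    }

    /** Previously: synchronized boolean reserve(int qty) { ... } */
    boolean reserve(int qty) {
        lock.lock();
        try {
            if (stock < qty) {
                return false; // not enough stock, nothing reserved
            }
            stock -= qty;
            return true;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    int stock() {
        lock.lock();
        try {
            return stock;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        Inventory inventory = new Inventory(3);
        System.out.println(inventory.reserve(2)); // enough stock
        System.out.println(inventory.reserve(2)); // only 1 left
        System.out.println(inventory.stock());
    }
}
```

The try/finally is the part people forget. A synchronized block releases on exit automatically; a ReentrantLock does not.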
Here's my rule: if your request handler makes more than 3 I/O calls, reactive might win. If it's 1-2 calls, virtual threads are simpler and faster to debug. If it's CPU-bound, neither helps — you need better algorithms. Measure, don't guess.
Caching: The Only Free Lunch
Caching is the cheapest performance optimization you'll ever make. But most teams do it wrong. They throw a Redis cache in front of everything and hope for the best. That creates a new bottleneck: the cache itself. The real trick is caching at the right level. Data that changes rarely and is read often? Yes. Data that changes every request? No. And never cache without a TTL. Infinite TTL is infinite stale data.
Cache stampede is the production horror story I see most often. Multiple threads compute the same value simultaneously when a cache expires. This doubles or triples load on your DB or API. Spring's @Cacheable(sync=true) solves this. It asks the cache provider to lock the entry while the value is being computed, so only one thread computes and the others wait for the cached value. Check that your provider supports it. Simple. Effective.
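To see what "only one thread computes" means, here is a minimal sketch in plain Java. It is not how Spring implements sync=true; it leans on ConcurrentHashMap.computeIfAbsent, which runs the loader at most once per absent key. SingleFlightCache and loadsUnderContention are hypothetical names, and there is no TTL here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class SingleFlightCache<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();

    V get(K key, Function<K, V> loader) {
        // The loader runs at most once per absent key; concurrent callers
        // for the same key block until the first one has stored the value.
        return values.computeIfAbsent(key, loader);
    }

    /** Fires `callers` threads at one cold key and returns how often the loader ran. */
    static int loadsUnderContention(int callers) {
        SingleFlightCache<String, String> cache = new SingleFlightCache<>();
        AtomicInteger loads = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < callers; i++) {
            pool.submit(() -> {
                start.await(); // line everyone up, then stampede
                return cache.get("price:42", key -> {
                    loads.incrementAndGet();
                    return "9.99";
                });
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println("loader invocations: " + loadsUnderContention(16));
    }
}
```

Sixteen callers, one load. Without the per-key guard, every caller that arrives during the miss would hit the DB.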
But `sync=true` has a downside: it serializes access to that cache key. If your computation takes 2 seconds, all other threads waiting on that key block for those 2 seconds. Solution: pre-warm your cache. Compute the value before requests arrive, or use a shorter TTL with background refresh. Redis has no built-in background refresh; you need a scheduled job or a separate thread pool. Caffeine supports it natively via refreshAfterWrite.
Here's a trick for high-volume endpoints: use a local cache for data that's the same for all users (e.g., configuration, lookup tables). A Caffeine in-memory cache with a short TTL (seconds) avoids network round trips to Redis. Combine it with Redis as a second level for consistency. But be careful: local caches don't invalidate across instances. TTL must be short enough to tolerate inconsistency.
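Here is what a short-TTL local cache boils down to, as a minimal sketch in plain Java. In a real service I'd reach for Caffeine; TtlCache below is a hypothetical stand-in that only shows the expiry logic, with an injectable clock so the TTL can be tested.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttl;
    private final LongSupplier clock; // same time unit as ttl

    TtlCache(long ttl, LongSupplier clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && now < old.expiresAt) {
                return old; // still fresh: serve from memory
            }
            // Missing or expired: reload and restart the TTL.
            return new Entry<>(loader.apply(k), now + ttl);
        });
        return entry.value;
    }

    public static void main(String[] args) {
        // 5-second TTL on a lookup value that is the same for every user.
        TtlCache<String, String> flags = new TtlCache<>(5_000, System::currentTimeMillis);
        System.out.println(flags.get("checkout.enabled", k -> "true"));
        System.out.println(flags.get("checkout.enabled", k -> "false")); // still cached
    }
}
```

Each instance keeps its own copy, so two instances can disagree for up to one TTL. That is the inconsistency you are signing up for.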
Always measure cache hit ratio. A 50% hit ratio means half your requests still hit the DB. That's a waste of memory. Aim for >95%. If you can't get there, your caching strategy is wrong.
Avoid @Cacheable without sync=true on a hot key. Under load, multiple threads will compute the same value simultaneously, thrashing your DB. Always use sync=true for mutable cache entries.
Asynchronous Processing: Not Just For Eventual Consistency
Synchronous request processing is simple. But it's also a throughput killer. Every request ties up a thread until the response is sent. If you can defer work to later, do it. Sending emails, generating reports, processing images: these should never block a user's request. Use @Async with a bounded executor. Never rely on the default. Plain Spring falls back to SimpleAsyncTaskExecutor, which creates a new thread per task, and Spring Boot's auto-configured executor queues tasks without limit unless you set spring.task.execution.pool.queue-capacity. Either one will exhaust memory under sustained load.
Configure a proper thread pool for async tasks. Name it. Monitor it. Set rejection policies. If your async queue fills up, do you drop tasks or block the caller? The answer depends on your use case. For logging or metrics, dropping is fine. For order processing, you need to block or persist to a dead-letter queue. Use ThreadPoolTaskExecutor with a CallerBlocksPolicy for critical work. But beware: blocking the caller defeats the purpose of async. Better to use a message broker (RabbitMQ, Kafka) for work that must not be lost.
Spring's @Async works by proxying the bean. If you call an @Async method from within the same class, it doesn't work — the proxy isn't invoked. Dependency-inject the bean and call it from another bean. That's a common mistake. I've debugged it half a dozen times. Now I always test async behavior with a simple log statement.
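You can reproduce the self-invocation trap without Spring. The sketch below uses a plain JDK dynamic proxy as a stand-in for Spring's AOP proxy; Mailer, PlainMailer, and SelfInvocationDemo are hypothetical names. The proxy records every call it intercepts, which is where Spring would hand the call to an executor.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class SelfInvocationDemo {
    /** Wraps the target in a proxy that records each method it intercepts. */
    static Mailer proxied(Mailer target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName()); // Spring would go async here
            return method.invoke(target, args);
        };
        return (Mailer) Proxy.newProxyInstance(
                Mailer.class.getClassLoader(), new Class<?>[] {Mailer.class}, handler);
    }

    public static void main(String[] args) {
        List<String> intercepted = new ArrayList<>();
        Mailer mailer = proxied(new PlainMailer(), intercepted);
        mailer.sendAll();
        // sendOne() never shows up: it was called on `this`, not on the proxy.
        System.out.println("intercepted: " + intercepted);
    }
}

interface Mailer {
    void sendAll();

    void sendOne();
}

final class PlainMailer implements Mailer {
    @Override
    public void sendAll() {
        this.sendOne(); // self-invocation: goes straight to the raw object
    }

    @Override
    public void sendOne() {
        // imagine this one carries @Async
    }
}
```

The outer call goes through the proxy. The inner one does not, so whatever the proxy adds (async dispatch, transactions, caching) silently never happens.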
For long-running tasks, use TaskExecutor with a bounded queue and DiscardPolicy for non-critical tasks. Log the discard. Then alert on it. If you're discarding tasks under load, you have a capacity problem. Async doesn't make capacity infinite. It just makes delays less visible. Address the root cause: scale out workers or reduce work per task.
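Here is a minimal sketch of a bounded executor that counts and logs what it drops, using the JDK's ThreadPoolExecutor directly rather than Spring's ThreadPoolTaskExecutor. BoundedAsync, its fields, and the "email-async-" prefix are illustrative names. In a real service the counter would be a Micrometer counter you alert on.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedAsync {
    final AtomicInteger discarded = new AtomicInteger();
    final ThreadPoolExecutor executor;

    BoundedAsync(String namePrefix, int threads, int queueCapacity) {
        AtomicInteger seq = new AtomicInteger();
        this.executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),          // bounded queue
                task -> new Thread(task, namePrefix + seq.incrementAndGet()),
                (task, pool) -> {                                  // drop, but never silently
                    int n = discarded.incrementAndGet();
                    System.err.println("discarded task, total so far: " + n);
                });
    }

    /** One worker, queue of two, five blocked tasks: how many get dropped? */
    static int discardsWhenSaturated() {
        BoundedAsync async = new BoundedAsync("email-async-", 1, 2);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 5; i++) {
            async.executor.execute(() -> {
                try {
                    release.await(); // simulate slow work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int dropped = async.discarded.get();
        release.countDown();
        async.executor.shutdown();
        return dropped;
    }

    public static void main(String[] args) {
        System.out.println("dropped: " + discardsWhenSaturated());
    }
}
```

One task runs, two wait in the queue, and the rest are rejected. That rejected count is your capacity signal.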
Name your executor and reference it explicitly: @Async("emailExecutor"). This makes debugging trivial. You see "email-async-1" in a thread dump and know exactly which task is stuck.
@Async method called from the same class. It runs synchronously. No error. Just slower. Took 4 hours to find.
Monitoring: If It's Not Measured, It's Not Optimized
You can't fix what you don't see. Micrometer is your single pane of glass. Expose metrics via /actuator/prometheus. Grafana dashboards. Alerts on P99 latency, thread pool busy, connection pool active, GC pause time. If you're not measuring P99, you don't know how your users feel. Average latency hides pain. P99 reveals it.
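If "average hides pain" sounds abstract, run the numbers. This is a minimal sketch with made-up latencies, using a nearest-rank percentile; LatencyStats is a hypothetical helper, not a Micrometer class.

```java
import java.util.Arrays;

final class LatencyStats {
    static double mean(long[] millis) {
        return Arrays.stream(millis).average().orElse(0);
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] millis, int p) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // ceil(p/100 * n) in integer math
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up sample: 98 requests at 50 ms, 2 requests at 5 seconds.
        long[] samples = new long[100];
        Arrays.fill(samples, 50);
        samples[98] = 5000;
        samples[99] = 5000;

        System.out.println("mean = " + mean(samples) + " ms");
        System.out.println("p50  = " + percentile(samples, 50) + " ms");
        System.out.println("p99  = " + percentile(samples, 99) + " ms");
    }
}
```

The mean lands at 149 ms and looks healthy. The P99 is 5 seconds, and that is what 2 out of every 100 users actually felt.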
The four golden signals: latency, traffic, errors, saturation. For Spring Boot, that translates to:
- Latency: http.server.requests (Micrometer timer)
- Traffic: tomcat.threads.busy (concurrent requests)
- Errors: http.server.requests filtered by its status tag (5xx count)
- Saturation: hikaricp.connections.active, jvm.memory.used
Set up alerts for P99 exceeding 80% of your SLO. Alert on thread pool usage > 70%. Alert on connection pool usage > 80%. These are leading indicators of failure. By the time you get 5xx errors, you're already down. Catch the saturation before it breaks.
A war story: We had a service that spiked every hour during a scheduled job. The job queried all users. The P99 went from 200ms to 10 seconds. No 5xx errors. Users didn't complain because it was internal. But the latency triggered my P99 alert. We discovered the job was running on the main thread pool, blocking user requests. Fixed by running the job on a separate executor. If we hadn't had that alert, we'd have had a full outage within weeks as the system saturated.
Distributed tracing (Spring Cloud Sleuth -> Micrometer Tracing) is non-negotiable for anything with inter-service calls. Without it, you can't tell if the 2-second latency is in your service or downstream. I once chased a DB query for 3 hours. Turned out the downstream API was slow. Tracing showed it in 5 minutes. Use it.
publishPercentiles(0.5, 0.95, 0.99) is your default.
The Thread Pool That Ate Our DB
1. Set server.tomcat.threads.max=50 to match thread count to connection pool size.
2. Set spring.datasource.hikari.maximum-pool-size=30, enough for 50 threads with some buffer.
3. Added metrics: tomcat.threads.busy and hikaricp.connections.active.
4. Tested under load to verify no more errors.
- Thread pool size must match connection pool size.
- More threads doesn't mean more throughput.
- It means more contention.
- Check tomcat.threads.busy vs tomcat.threads.config.max. If busy == max, you're thread-starved. Check HikariCP active connections. If that's also max, your DB queries are too slow or your pool is too small. Increase pool size or optimize queries. Never increase threads without increasing connections proportionally.
- connectionTimeout. Check DB query performance under load. Look for slow queries (pg_stat_activity, SHOW PROCESSLIST). Also check if the connection pool is too small. Increase maximum-pool-size or shorten query time. Don't just increase the timeout; that hides the problem.
- Run jcmd <pid> VM.native_memory summary to see off-heap allocations. Reduce thread count or switch to virtual threads.
- Use @Cacheable(sync=true) for synchronized cache computation. Also check Redis INFO commandstats for slow commands. Use bulk operations (pipeline) instead of individual gets/sets.
curl -s localhost:8080/actuator/metrics/tomcat.threads.busy
curl -s localhost:8080/actuator/metrics/tomcat.threads.config.max
server.tomcat.threads.max=100 in application.yml, match to connection pool size
Key takeaways
Common mistakes to avoid
5 patterns
Using default Tomcat thread pool size (200) with default HikariCP pool size (10)
Set server.tomcat.threads.max to 50-100. Increase spring.datasource.hikari.maximum-pool-size to 20-40. Match thread count to connection capacity.
Using `@Cacheable` without `sync=true` on hot cache keys
Add sync=true to @Cacheable. Use Caffeine with refreshAfterWrite for background population.
Calling `@Async` method from within the same class
@Async only works through the Spring AOP proxy. Self-invocation bypasses the proxy.
Setting `spring.jpa.open-in-view=true` in production
Set spring.jpa.open-in-view=false. Use @Transactional explicitly where needed. Spring Boot still enables open-in-view by default and only logs a warning at startup, so set it explicitly, especially on services upgraded from 2.x.
Not setting `leak-detection-threshold` on HikariCP
Set spring.datasource.hikari.leak-detection-threshold=60000. Investigate leaked connections logged with a stack trace.
Interview Questions on This Topic
How does the default Tomcat thread pool interact with the HikariCP connection pool, and what happens if they are mismatched under load?
Compare tomcat.threads.busy vs hikaricp.connections.active: when busy hits max and active hits max, you've found the mismatch.
Frequently Asked Questions