Advanced 8 min · May 23, 2026

High Traffic Handling in Spring Boot

Spring Boot At 10x Load: The Patterns That Survive Production

Q: What is the best thread pool size for a Spring Boot application?

There's no single number. Size your thread pool against your connection pool, not your CPU cores. A common starting point is 50 threads for a HikariCP pool of 30, roughly 1.5x-2x the connection count. Monitor `tomcat.threads.busy` against `tomcat.threads.config.max`: if busy stays below 80% of max at peak, you're safe. If it hits 100%, you need more threads or, more likely, you need to fix a downstream bottleneck.

Q: Should I use WebFlux for all new Spring Boot services?

No. WebFlux adds complexity for marginal gain in most CRUD apps. Use virtual threads for simplicity (Java 21+ and Spring Boot 3.2+, enabled with `spring.threads.virtual.enabled=true`). WebFlux is only justified when you need very high concurrency (10k+ concurrent requests) on limited hardware, or when integrating with other reactive systems like Spring Cloud Gateway or Reactor Kafka. For most line-of-business apps, virtual threads with imperative code are the simpler choice.

Q: How do I debug a connection leak in HikariCP?

Set `spring.datasource.hikari.leak-detection-threshold=60000`. Under load, any connection held longer than 60 seconds logs a full stack trace. Analyze the stack trace to find which code path failed to close the connection. Common causes: missing `finally` blocks, `@Transactional` methods that call external APIs, or `Open Session In View` staying open during view rendering.

Q: What is cache stampede and how do I prevent it?

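Cache stampede happens when multiple threads simultaneously try to compute the same expired cache entry. The DB or API gets hit with N concurrent requests instead of 1. Prevent it with `@Cacheable(sync=true)` in Spring, which asks the cache provider to let only one thread per key compute the value while the others wait. The guarantee is per JVM, not across instances. For Caffeine, use `refreshAfterWrite`: once an entry is older than the refresh interval, the next read returns the old value and triggers an asynchronous reload, so callers don't block on the recompute.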

Q: How do I monitor Spring Boot performance in production?

Use Micrometer with a Prometheus registry. Expose metrics at `/actuator/prometheus`. Set up Grafana dashboards for the four golden signals: latency (P50, P95, P99), traffic (requests per second), errors (5xx rate), and saturation (thread pool usage, connection pool usage, GC pauses). Alert on P99 exceeding 80% of your SLO and saturation metrics exceeding 70%.

Stop guessing at Spring Boot performance.

Naren, Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Lessons pulled from things that broke in production.

✓ Production tested · Last updated July 04, 2026 · 1,697 articles, all by Naren

Before you start (⏱ 30 min)

✓ Deep production experience
✓ Understanding of internals and trade-offs
✓ Experience debugging complex systems

⚡Quick Answer

Thread pool sizing is a trap; measure blocking, not CPU cores
Connection pools must be tuned for your specific DB latency, not defaults
Reactive isn't always faster; it shifts the bottleneck, doesn't remove it
Caching is a hot path invariant, not an afterthought
Metrics without action are just expensive log files

✦ Definition (~90s read)

What is High Traffic Handling in Spring Boot?

High traffic handling in Spring Boot isn't about scaling out to 50 pods. That's a band-aid. It's about making each pod handle 10x more. It's thread pool management, connection pooling, reactive vs imperative trade-offs, and knowing exactly where your blocking calls live. The JVM is fast. Your code is the bottleneck. Identify it. Kill it. Repeat.

You can't fix performance by adding hardware. You fix it by removing waste. Every lock, every DB round trip, every serialization step — they all add milliseconds. At 10,000 RPS, those milliseconds become seconds of queue time. The difference between a Senior and a Junior is knowing which milliseconds to fight for and which to accept.

This isn't theory. I've seen the same patterns fail repeatedly. The default Tomcat thread pool of 200 will make your DB fall over. The default HikariCP of 10 will keep you waiting. And calling a REST API synchronously inside a request will turn your throughput into a single-lane road. Let's fix it.

Plain-English First

Imagine a busy kitchen. If you have one chef doing everything, orders pile up. Spring Boot is like that kitchen. High traffic handling is about having the right number of chefs (threads), the right ovens (databases), and knowing when to prep food in advance (caching) vs cooking on demand. Get it wrong, and customers leave angry. Get it right, and you serve thousands without breaking a sweat.

Thursday, 2:47 AM. PagerDuty screaming. 500 errors flooding in. Your customers can't check out. Your boss is calling. Your hands are sweating. Welcome to the club.

I've been there more times than I care to count. Every time, the root cause is the same: someone assumed default configuration would handle production load. It never does. Spring Boot defaults are for getting started, not for getting paid.

The worst part? The fix is usually small. A config change. A thread pool limit. A missing index. But those small things compound into catastrophic failures when traffic spikes. Black Friday. Product launch. A tweet from an influencer. 10x load in 30 seconds. Your app melts.

Here's the hard truth: most performance problems aren't bugs. They're design flaws exposed by load. Your code works fine at 100 RPS. At 1000 RPS, every sin shows up. Blocking calls on the main thread. Lazy initialization in request paths. Connection pools that assume 100ms queries but get 2-second ones under load.

You can't wing this. You need a strategy. You need to know your numbers. What's your average response time at idle? What's your P99 at 80% CPU? If you don't know those numbers, you're not engineering. You're hoping.

I'm writing this because I've seen too many teams burn out on performance fires that were predictable and preventable. This isn't a "best practices" list. This is a survival guide. Patterns that actually work in production. Trade-offs you need to make. Incidents that taught me painful lessons. Read it. Apply it. Stop getting paged.

Thread Pools: The First Thing That Breaks

Every junior thinks more threads = more speed. Wrong. Threads are not free. Each platform thread reserves stack memory (1MB by default on 64-bit Linux JVMs). 200 threads = up to 200MB just for stacks. And that's before any object allocations. The real sin: threads queuing for a scarce resource. When your DB connection pool is 10 and you have 200 threads, up to 190 threads are parked waiting for a connection. They burn no CPU while parked, but each one pins a request, its memory, and a client that is watching a timeout tick down.

I once diagnosed a service where the P99 was 30 seconds. The team added more threads. It got worse. The fix: drop threads to 50, increase connection pool to 30. P99 dropped to 200ms. The lesson: measure queue depth, not thread count. Use Micrometer's tomcat.threads.busy metric. If it's close to tomcat.threads.config.max, you're not thread-starved. You're downstream-starved. The threads are waiting on something else (DB, API, cache). Adding more threads just makes that thing wait harder.

Virtual threads (Project Loom, final in Java 21) change this equation. They're lightweight enough to have thousands. But they're not magic. On JDK 21-23, blocking inside a synchronized block pins the virtual thread to its carrier thread; JDK 24 removes most of that pinning (JEP 491), though native frames still pin. Monitor this with the jdk.VirtualThreadPinned JFR event. Virtual threads don't fix bad queries. They just let you wait more efficiently.

Rule of thumb: size your request thread pool at roughly 1.5x-2x your connection pool size. Don't go far beyond that: every extra thread is just one more waiter in the connection queue. And always use a bounded queue on any executor you create. An unbounded queue keeps accepting work it can't finish until the heap runs out. I've seen it. It's not pretty.
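A minimal sketch of the bounded-queue point above, using only the JDK so it runs without Spring. The class name `BoundedPoolDemo` and the sizes (2 threads, 3 queue slots, 10 tasks) are illustrative assumptions, not values from a real service. Every worker is held on a latch to mimic threads stuck on a slow downstream call:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolDemo {

    /** Submits {@code tasks} blocking tasks and returns how many the pool refused. */
    static int submitAndCountRejections(int threads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),   // bounded: excess work is refused, not hoarded
                new ThreadPoolExecutor.AbortPolicy());     // throws RejectedExecutionException when full
        CountDownLatch release = new CountDownLatch(1);    // holds workers as if waiting on a slow DB
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;                            // fail fast instead of queueing forever
                }
            }
        } finally {
            release.countDown();                           // let the held tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected=" + submitAndCountRejections(2, 3, 10));
    }
}
```

With 2 workers busy and 3 tasks queued, the remaining 5 submissions are refused immediately instead of piling up in memory. The same idea applies to any executor you define in a Spring Boot app: set a queue capacity and decide up front what a rejection should mean.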

Production Trap:

Never leave server.tomcat.threads.max at many multiples of your HikariCP maximum-pool-size. You'll create thread starvation disguised as DB slowness. Monitor tomcat.threads.busy: if it hits tomcat.threads.config.max, you've found your bottleneck.

Production Insight

Thread pool tuning is a lever, not a knob. Moving it without understanding the downstream load just shifts the bottleneck.

Key Takeaway

Threads are a proxy for concurrency, not parallelism. Tune for the bottleneck downstream.

Connection Pooling: The Silent Killer

HikariCP is the default. It's fast. But defaults will burn you. maximum-pool-size=10 is fine for a toy app. For production, you need to know your DB's max connections and your query latency. Formula: pool size per instance = (peak TPS × average query duration in seconds) / (number of app instances). For example: 1000 TPS × 0.05s avg query = 50 concurrent queries. If you have 5 instances, each needs at least 10 connections. But real life isn't that clean. Add buffer for spikes. I usually target 1.5x the calculated value, which is 15 per instance here.
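The sizing formula above is small enough to keep as code next to your capacity notes. This is a sketch only: `PoolSizing`, its method names, and the millisecond-based signature are assumptions made for the example, and the 1.5 headroom factor is the buffer mentioned in the text.

```java
final class PoolSizing {

    /**
     * Little's law: concurrent queries = arrival rate x time each query holds a connection.
     * The result is divided across app instances and padded with headroom for spikes.
     */
    static int connectionsPerInstance(int peakTps, int avgQueryMillis, int instances, double headroom) {
        double concurrentQueries = (long) peakTps * avgQueryMillis / 1000.0;
        double perInstance = concurrentQueries / instances;
        return (int) Math.ceil(perInstance * headroom);
    }

    /** Total connections the database sees once every instance fills its pool. */
    static int totalConnections(int perInstance, int instances) {
        return perInstance * instances;
    }

    public static void main(String[] args) {
        int perInstance = connectionsPerInstance(1000, 50, 5, 1.5);
        System.out.println("per instance: " + perInstance);
        System.out.println("database total: " + totalConnections(perInstance, 5));
    }
}
```

For 1000 TPS, 50 ms queries, and 5 instances this gives 15 connections per instance and 75 in total. Compare that total with the database's own connection limit before you deploy it.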

Here's the gotcha: connection pools are per-datasource. If you have read replicas, give them their own pool and size it separately from the primary, based on the replica's own connection limit and query mix. And the sum of every pool on every app instance must stay below the DB's max_connections. Otherwise, you'll get the dreaded FATAL: sorry, too many clients already (that's PostgreSQL's wording). Fix: set spring.datasource.hikari.maximum-pool-size=30 and spring.datasource.hikari.minimum-idle=5. The idle connections keep a warm minimum ready so a burst doesn't start with connection handshakes. The max prevents DB overload.

leak-detection-threshold is your friend. Set it to 60 seconds (60000 ms). If a connection is out of the pool longer than that, HikariCP logs a stack trace showing where it was borrowed. You'll catch bugs like "forgot to close the Connection" or "transaction never committed." I caught a leak in legacy code this way. 30 minutes of investigation saved an outage.

Connection timeout is critical. Don't set it too high. 30 seconds is the default. Under load, threads pile up waiting for connections that never come. That becomes a thread pool problem. Lower connectionTimeout to 5-10 seconds. Fail fast. Let the client retry. Don't let threads queue up waiting for a connection that's not coming.

Senior Shortcut:

Set spring.datasource.hikari.leak-detection-threshold=60000. It logs a stack trace when a connection is held too long. You'll find your slow queries and missing close() calls fast.

Production Insight

Connection pool sizing is a math problem, not an opinion. Calculate based on TPS and query latency. Guess and you lose.

Key Takeaway

Default pool sizes are for demos. Calculate yours based on actual load patterns.

Reactive vs Imperative: Pick Your Poison

Reactive (WebFlux) isn't faster. It's different. It trades thread-per-request for event-loop-driven processing. This makes sense when you have many I/O-bound operations (DB calls, HTTP calls) and you're hitting thread limits. But reactive has a cost: debugging is harder, stack traces are useless, and you need to be reactive all the way down. One blocking call in a reactive pipeline ruins everything.

I've seen teams adopt reactive because "it's more scalable." Then they spend weeks debugging why their reactive chain hangs. The root cause? A Thread.sleep() in a flatMap. Or a synchronized block. Or a legacy library that uses blocking I/O. Reactive is not a performance upgrade. It's a programming model shift. Do it for the right reasons: high concurrency with limited resources.

For most CRUD apps, imperative with virtual threads is the sweet spot. Virtual threads let you write blocking code without blocking a carrier thread. You get performance close to reactive for a fraction of the complexity. But beware: a virtual thread gets pinned to its carrier when it blocks inside synchronized (on JDK 21-23) or inside a native frame. On JDK 21-23, profile with -Djdk.tracePinnedThreads=short; on newer JDKs, use the jdk.VirtualThreadPinned JFR event. If you see pinned threads, refactor those synchronized blocks to ReentrantLock or use the concurrency utilities from java.util.concurrent.
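The synchronized-to-ReentrantLock refactor mentioned above looks roughly like this. It is a sketch under assumptions: `LockedCounter` is a made-up class, and the demo runs on a fixed platform-thread pool so it compiles on older JDKs too.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockedCounter {

    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    // Was: synchronized void increment() { value++; }
    void increment() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock();   // always release in finally, or the next caller waits forever
        }
    }

    long value() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    /** Runs {@code tasks} increments on the given executor and returns the final count. */
    static long runIncrements(ExecutorService executor, int tasks) {
        LockedCounter counter = new LockedCounter();
        for (int i = 0; i < tasks; i++) {
            executor.execute(counter::increment);
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.value();
    }

    public static void main(String[] args) {
        System.out.println(runIncrements(Executors.newFixedThreadPool(8), 1_000));
    }
}
```

On Java 21+ you can pass `Executors.newVirtualThreadPerTaskExecutor()` to `runIncrements` instead. The lock code itself does not change, which is the point of the refactor.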

Here's my rule: if your request handler makes more than 3 I/O calls, reactive might win. If it's 1-2 calls, virtual threads are simpler and faster to debug. If it's CPU-bound, neither helps — you need better algorithms. Measure, don't guess.

Interview Gold:

"Reactive is not faster. It's more concurrent per thread. The right choice depends on your bottleneck profile. Virtual threads blur the line." Use this answer. It shows depth.

Production Insight

Virtual threads made most reactive migrations unnecessary. I've deprecated two WebFlux services in favor of virtual threads. Same perf, half the bug count.

Key Takeaway

Reactive is a tool, not a religion. Choose based on bottleneck type, not hype.

Caching: The Only Free Lunch

Caching is the cheapest performance optimization you'll ever make. But most teams do it wrong. They throw a Redis cache in front of everything and hope for the best. That creates a new bottleneck: the cache itself. The real trick is caching at the right level. Data that changes rarely and is read often? Yes. Data that changes every request? No. And never cache without a TTL. Infinite TTL is infinite stale data.

Cache stampede is the production horror story I see most often. Multiple threads compute the same value simultaneously when a cache entry expires. With N threads missing at once, your DB or API sees N identical expensive calls instead of one. Spring's @Cacheable(sync=true) solves this within a single JVM. It delegates to the cache provider's locking load, so how the lock is held depends on the provider. Only one thread computes, others wait for the cached value. Simple. Effective.

But `sync=true` has a downside: it serializes access to that cache key. If your computation takes 2 seconds, every other thread asking for that key waits up to 2 seconds. Solution: pre-warm your cache. Compute the value before requests arrive, or use a shorter TTL with background refresh. Redis has no built-in background refresh; you need a scheduled job or a separate thread pool. Caffeine supports it natively via refreshAfterWrite on a LoadingCache.
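To make the "only one thread computes" behaviour concrete, here is a JDK-only sketch of the idea. It is not Spring's implementation and it has no TTL: `SingleFlightCache` is a hypothetical class that leans on `ConcurrentHashMap.computeIfAbsent`, which runs the loader at most once per absent key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Function;

final class SingleFlightCache<K, V> {

    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

    /** Concurrent callers for the same missing key wait for a single loader run. */
    V get(K key, Function<K, V> loader) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Fires {@code threads} simultaneous misses on one key and returns how often the loader ran. */
    static int loadsForConcurrentMisses(int threads) {
        SingleFlightCache<String, String> cache = new SingleFlightCache<>();
        AtomicInteger loads = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();                                            // line everyone up
                    cache.get("pricing", key -> {
                        loads.incrementAndGet();                              // stands in for the DB call
                        LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(20)); // pretend it is slow
                        return "value-for-" + key;
                    });
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println("loader runs: " + loadsForConcurrentMisses(16));
    }
}
```

Sixteen threads miss at the same moment, yet the loader runs once. The waiting threads are also the downside described above: they all sit behind the slow load, which is why pre-warming and background refresh matter.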

Here's a trick for high-volume endpoints: use a local cache for data that's the same for all users (e.g., configuration, lookup tables). Caffeine in-memory cache with a short TTL (seconds). This avoids network round trips to Redis. Combined with Redis as a second level for consistency. But be careful — local caches don't invalidate across instances. TTL must be short enough to tolerate inconsistency.

Always measure cache hit ratio. A 50% hit ratio means half your requests still hit the DB. That's a waste of memory. Aim for >95%. If you can't get there, your caching strategy is wrong.

Never Do This:

Using @Cacheable without sync=true on a hot key. Under load, multiple threads will compute the same value simultaneously, thrashing your DB. Always use sync=true for hot keys whose value is expensive to compute.

Production Insight

I reduced DB load by 90% on a landing page by adding a 10-second local Caffeine cache for a lookup table. Redis wasn't even involved.

Key Takeaway

Cache the hot path. Measure hit ratio. Use sync=true. Pre-warm. Everything else is decoration.

Asynchronous Processing: Not Just For Eventual Consistency

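Synchronous request processing is simple. But it's also a throughput killer. Every request ties up a thread until the response is sent. If you can defer work to later, do it. Sending emails, generating reports, processing images: these should never block a user's request. Use @Async with a bounded executor. Spring Boot auto-configures a ThreadPoolTaskExecutor, but its queue is unbounded by default; plain Spring without that auto-configuration falls back to SimpleAsyncTaskExecutor, which creates a new thread per task. Either default will exhaust memory under a burst: one through the queue, the other through threads.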

Configure a proper thread pool for async tasks. Name it. Monitor it. Set rejection policies. If your async queue fills up, do you drop tasks or block the caller? The answer depends on your use case. For logging or metrics, dropping is fine. For order processing, you need to block or persist to a dead-letter queue. Use ThreadPoolTaskExecutor with a caller-blocking rejection policy for critical work: the JDK's ThreadPoolExecutor.CallerRunsPolicy runs the task on the submitting thread, and Spring Integration ships a CallerBlocksPolicy that waits for queue space. But beware: blocking the caller defeats the purpose of async. Better to use a message broker (RabbitMQ, Kafka) for work that must not be lost.

Spring's @Async works by proxying the bean. If you call an @Async method from within the same class, it doesn't work — the proxy isn't invoked. Dependency-inject the bean and call it from another bean. That's a common mistake. I've debugged it half a dozen times. Now I always test async behavior with a simple log statement.
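The self-invocation trap is easier to believe once you see it with a plain JDK proxy. This sketch does not use Spring at all: `ReportService` and its methods are hypothetical, and the invocation handler only counts calls where Spring's interceptor would hand the work to an executor.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

final class SelfInvocationDemo {

    interface ReportService {
        void generate();      // imagine this one carries @Async
        void generateAll();   // calls generate() internally
    }

    static final class ReportServiceImpl implements ReportService {
        @Override
        public void generate() {
            // the slow work would happen here
        }

        @Override
        public void generateAll() {
            this.generate();  // "this" is the raw object, so the proxy never sees this call
        }
    }

    /** Returns the intercepted generate() count after a self-invocation, then after an external call. */
    static int[] countInterceptedCalls() {
        AtomicInteger intercepted = new AtomicInteger();
        ReportService target = new ReportServiceImpl();
        ReportService proxy = (ReportService) Proxy.newProxyInstance(
                ReportService.class.getClassLoader(),
                new Class<?>[] {ReportService.class},
                (self, method, args) -> {
                    if (method.getName().equals("generate")) {
                        intercepted.incrementAndGet();   // where async dispatch would happen
                    }
                    return method.invoke(target, args);
                });

        proxy.generateAll();                             // generate() runs, but via "this"
        int afterSelfInvocation = intercepted.get();
        proxy.generate();                                // goes through the proxy
        int afterExternalCall = intercepted.get();
        return new int[] {afterSelfInvocation, afterExternalCall};
    }

    public static void main(String[] args) {
        int[] counts = countInterceptedCalls();
        System.out.println("after self-invocation: " + counts[0]);
        System.out.println("after external call: " + counts[1]);
    }
}
```

The interceptor sees nothing when `generateAll()` calls `generate()` on `this`, and sees the call only when it arrives from outside through the proxy. That is the same reason an @Async method called from its own class runs synchronously.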

For long-running, non-critical tasks, use a ThreadPoolTaskExecutor with a bounded queue and a discarding rejection policy. The JDK's DiscardPolicy drops silently, so use your own RejectedExecutionHandler that logs the discard. Then alert on it. If you're discarding tasks under load, you have a capacity problem. Async doesn't make capacity infinite. It just makes delays less visible. Address the root cause: scale out workers or reduce work per task.
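A sketch of "discard, but log and count it", again with the plain JDK executor so it runs standalone. `CountingDiscardPolicy` and the `email-async-` prefix are illustrative names; in a Spring app you would pass a handler like this to `ThreadPoolTaskExecutor.setRejectedExecutionHandler` and export the counter as a metric.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class CountingDiscardPolicy implements RejectedExecutionHandler {

    private final AtomicLong discarded = new AtomicLong();

    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        long total = discarded.incrementAndGet();
        // Drop the task, but leave a trace you can alert on.
        System.err.println("Discarded async task, total discarded=" + total);
    }

    long discardedCount() {
        return discarded.get();
    }

    /** Builds a fixed-size pool with a bounded queue and named threads. */
    static ThreadPoolExecutor newNamedExecutor(String prefix, int threads, int queueCapacity,
                                               RejectedExecutionHandler handler) {
        AtomicInteger sequence = new AtomicInteger();
        ThreadFactory named = task -> new Thread(task, prefix + sequence.incrementAndGet());
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), named, handler);
    }

    /** Submits a burst of blocked tasks and returns how many were discarded. */
    static long discardsUnderBurst(int threads, int queueCapacity, int tasks) {
        CountingDiscardPolicy policy = new CountingDiscardPolicy();
        ThreadPoolExecutor executor = newNamedExecutor("email-async-", threads, queueCapacity, policy);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                try {
                    release.await();                     // simulate a stuck mail server
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        release.countDown();
        executor.shutdown();
        return policy.discardedCount();
    }

    public static void main(String[] args) {
        System.out.println("discarded=" + discardsUnderBurst(1, 2, 10));
    }
}
```

One worker plus two queue slots absorbs three tasks, so a burst of ten discards seven. A non-zero counter in production is the capacity alarm described above.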

Senior Shortcut:

Always name your Async executors and give each one a thread name prefix. Use @Async("emailExecutor") with a prefix like "email-async-". This makes debugging trivial. You see "email-async-1" in a thread dump and know exactly which task is stuck.

Production Insight

The worst async bug I've seen: @Async method called from the same class. It runs synchronously. No error. Just slower. Took 4 hours to find.

Key Takeaway

Async is for deferring work, not eliminating it. Monitor queue depth and rejection rates.

Monitoring: If It's Not Measured, It's Not Optimized

You can't fix what you don't see. Micrometer is your single pane of glass. Expose metrics via /actuator/prometheus. Grafana dashboards. Alerts on P99 latency, thread pool busy, connection pool active, GC pause time. If you're not measuring P99, you don't know how your users feel. Average latency hides pain. P99 reveals it.

The four golden signals: latency, traffic, errors, saturation. For Spring Boot, that translates to:
- Latency: http.server.requests (Micrometer timer)
- Traffic: request rate, taken from the count of http.server.requests
- Errors: http.server.requests filtered by the status tag (5xx count)
- Saturation: tomcat.threads.busy, hikaricp.connections.active, jvm.memory.used

Set up alerts for P99 exceeding 80% of your SLO. Alert on thread pool usage > 70%. Alert on connection pool usage > 80%. These are leading indicators of failure. By the time you get 5xx errors, you're already down. Catch the saturation before it breaks.

A war story: We had a service that spiked every hour during a scheduled job. The job queried all users. The P99 went from 200ms to 10 seconds. No 5xx errors. Users didn't complain because it was internal. But the latency triggered my P99 alert. We discovered the job was running on the main thread pool, blocking user requests. Fixed by running the job on a separate executor. If we hadn't had that alert, we'd have had a full outage within weeks as the system saturated.

Distributed tracing (Micrometer Tracing, which replaced Spring Cloud Sleuth in Spring Boot 3) is non-negotiable for anything with inter-service calls. Without it, you can't tell if the 2-second latency is in your service or downstream. I once chased a DB query for 3 hours. Turned out the downstream API was slow. Tracing showed it in 5 minutes. Use it.

The Classic Bug:

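Measuring average latency instead of percentiles. Averages hide P99 spikes. Always publish percentiles. Micrometer's Timer builder has publishPercentiles(0.5, 0.95, 0.99); in Spring Boot, set management.metrics.distribution.percentiles.http.server.requests=0.5,0.95,0.99. Make that your default.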
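A quick worked example of why the average lies. The latencies are made up for illustration (98 requests at 100 ms, 2 at 5000 ms), and `LatencyStats` uses a simple nearest-rank percentile rather than whatever your metrics backend computes.

```java
import java.util.Arrays;

final class LatencyStats {

    static double mean(long[] latenciesMillis) {
        long sum = 0;
        for (long latency : latenciesMillis) {
            sum += latency;
        }
        return (double) sum / latenciesMillis.length;
    }

    /** Nearest-rank percentile; {@code percentile} is a whole number such as 50, 95 or 99. */
    static long percentile(long[] latenciesMillis, int percentile) {
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (percentile * sorted.length + 99) / 100;   // integer ceil(p * n / 100)
        return sorted[Math.max(rank, 1) - 1];
    }

    /** 98 fast requests and 2 slow ones: the shape of a typical tail-latency problem. */
    static long[] sampleLatencies() {
        long[] samples = new long[100];
        Arrays.fill(samples, 100L);
        samples[98] = 5000L;
        samples[99] = 5000L;
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = sampleLatencies();
        System.out.println("mean=" + mean(samples) + "ms");
        System.out.println("p50=" + percentile(samples, 50) + "ms");
        System.out.println("p99=" + percentile(samples, 99) + "ms");
    }
}
```

The mean comes out at 198 ms and the median at 100 ms, both of which look healthy. The P99 is 5000 ms, which is what two out of every hundred users actually experienced.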

Production Insight

I've never regretted having too many metrics. I've regretted having too few exactly once: the day we couldn't explain why the system was slow.

Key Takeaway

Monitor the leading indicators of saturation, not just the trailing indicators of failure.

Retries and Circuit Breakers: Stop Feeding the Dying

When your service starts dropping requests, the worst thing you can do is retry immediately. That's a stampede. You're just hammering an already overwhelmed system. This is why every high-traffic Spring Boot service needs a circuit breaker. It's not optional.

Here's the why: a circuit breaker monitors failure rates. Once you cross a threshold, it opens. No more calls pass through. The service gets a chance to recover. Spring Cloud Circuit Breaker with Resilience4j is your tool. Don't rely on Spring Retry's @Retryable alone for distributed calls; on its own it keeps calling a dependency that is already failing.

Instead, configure a circuit breaker with a sliding window, a failure threshold (say 50% of the last 100 calls), and a wait duration before the half-open probe. Couple this with exponential backoff for retries while the breaker is closed. This prevents cascading failures and saves your database from a death spiral. Without it, you're not handling traffic; you're amplifying a disaster.

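CircuitBreakerConfiguration.java (Java)

// io.thecodeforge - java tutorial
import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;
import io.github.resilience4j.timelimiter.TimeLimiterConfig;
import org.springframework.cloud.circuitbreaker.resilience4j.Resilience4JCircuitBreakerFactory;
import org.springframework.cloud.circuitbreaker.resilience4j.Resilience4JConfigBuilder;
import org.springframework.cloud.client.circuitbreaker.Customizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Named CircuitBreakerConfiguration to avoid clashing with Resilience4j's CircuitBreakerConfig.
@Configuration
public class CircuitBreakerConfiguration {

    // Customizes the auto-configured factory instead of constructing one by hand.
    @Bean
    public Customizer<Resilience4JCircuitBreakerFactory> defaultCircuitBreakerCustomizer() {
        return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
            .circuitBreakerConfig(CircuitBreakerConfig.custom()
                .slidingWindowType(SlidingWindowType.COUNT_BASED)
                .slidingWindowSize(100)                          // judge the last 100 calls
                .failureRateThreshold(50.0f)                     // open at 50% failures
                .waitDurationInOpenState(Duration.ofSeconds(30)) // stay open 30s before probing
                .permittedNumberOfCallsInHalfOpenState(10)       // probe with 10 trial calls
                .build())
            .timeLimiterConfig(TimeLimiterConfig.custom()
                .timeoutDuration(Duration.ofSeconds(5))          // each call gets at most 5s
                .build())
            .build());
    }
}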

Output

Circuit breaker opens after 50% failures in last 100 requests. Waits 30s before probing.

Production Trap:

Never use @Retryable on remote calls without a circuit breaker. You'll retry into a full outage. Always pair retries with a breaker and exponential backoff.
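Exponential backoff itself is only a few lines. The sketch below is JDK-only: `Backoff`, the 100 ms base, and the 5-second cap are assumptions chosen for the example. The jittered variant is an addition that the text above does not ask for; it is a common way to stop many clients from retrying in lockstep.

```java
import java.util.concurrent.ThreadLocalRandom;

final class Backoff {

    /** Delay before retry number {@code attempt} (1-based): base, 2x base, 4x base, up to the cap. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.min(Math.max(attempt - 1, 0), 20);   // clamp so the shift cannot overflow
        return Math.min(capMillis, baseMillis << shift);
    }

    /** Same schedule with full jitter: a random delay between 0 and the computed value. */
    static long jitteredDelayMillis(int attempt, long baseMillis, long capMillis) {
        long ceiling = delayMillis(attempt, baseMillis, capMillis);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 8; attempt++) {
            System.out.println("attempt " + attempt + ": wait up to "
                    + delayMillis(attempt, 100, 5_000) + "ms");
        }
    }
}
```

With a 100 ms base the waits grow 100, 200, 400, 800 and so on until they hit the 5-second cap. The breaker decides whether to call at all; the backoff decides how long to wait between the calls that are still allowed.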

Key Takeaway

Circuit breakers prevent cascading failures. Retries without backoff are just denial-of-service attacks on your own database.

Graceful Degradation: Don't Serve 503s When Caches Fail

Your Redis goes down. Your cache is empty. What happens to your endpoint? If you wrote it right, it serves stale data or a default. If you wrote it wrong, it throws 503 errors and your users see a spinning wheel. Graceful degradation is not a nice-to-have; it's a production requirement for high-traffic systems. The why: users tolerate slightly stale data far more than a broken page.

Implement this at every integration point. Register a CacheErrorHandler so a failed cache read is logged and treated as a miss instead of an exception, and use unless on @Cacheable to keep fallback values out of the cache. For external API calls, wrap them in a try-catch that returns the last known good response from a local cache. Your database is your source of truth, but your cache is your lifeline. If the cache dies, serve from DB with a degraded SLA.

Prioritize reads over writes during partial outages. Set timeouts aggressively on all remote calls. A slow external service shouldn't take down your entire endpoint. This is defensive design, and it keeps your site up when everything else is on fire.

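GracefulDegradationService.java (Java)

// io.thecodeforge - java tutorial
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class GracefulDegradationService {

    // ProductRepository and Product stand in for your own types (assumed, not shown here).
    private final ProductRepository database;

    public GracefulDegradationService(ProductRepository database) {
        this.database = database;
    }

    // Nulls and fallback placeholders stay out of the cache, so an outage
    // is not replayed from cache after the database recovers.
    // isFallback() is an assumed helper on Product.
    @Cacheable(value = "product", unless = "#result == null || #result.isFallback()")
    public Product findProduct(String id) {
        try {
            return database.find(id);
        } catch (Exception e) {
            // Database failed: fall back to stale data or a default instead of a 5xx
            return fallbackProduct(id);
        }
    }

    private Product fallbackProduct(String id) {
        // Serve last known good or empty DTO
        return new Product(id, "Currently Unavailable", 0.0);
    }
}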

Output

When database fails, returns 'Currently Unavailable' product instead of 503. Users see something, not an error.

Production Experience:

During a Redis outage, we served 12 million requests from a degraded path without a single 503. The engineering team didn't even notice until the alert fired.

Key Takeaway

Degrade gracefully. Serve stale data or defaults. A 200 with stale data beats a 503 any day.

● Production incident · Post-mortem · Severity: high

The Thread Pool That Ate Our DB

Symptom

Gradually increasing response times, then sudden 500 errors. DB CPU at 100%. Connection pool timeout exceptions in logs.

Assumption

First thought: DB is slow. Maybe bad query. Maybe missing index. Double-checked query plans — all efficient.

Root cause

Default Tomcat thread pool of 200 threads all trying to acquire connections from a HikariCP pool of 10. Threads stack up waiting. Each waiting thread holds resources. Eventually, thread pool queue fills, Tomcat rejects requests. DB is fine. The thread pool is the bottleneck.

Fix

1. Set server.tomcat.threads.max=50 to bring the thread count in line with the connection pool.
2. Set spring.datasource.hikari.maximum-pool-size=30, enough for 50 threads with some buffer.
3. Added metrics: tomcat.threads.busy and hikaricp.connections.active (hikaricp_connections_active in Prometheus).
4. Tested under load to verify no more errors.

Key lesson

Thread pool size must be set against connection pool size.
More threads doesn't mean more throughput.
It means more contention.

Production debug guide: symptom → root cause → fix for the failures that actually happen (4 entries)

Symptom · 01

Gradually increasing response times under load, then 503s

Fix

Check Tomcat thread pool metrics: tomcat.threads.busy vs tomcat.threads.config.max. If busy == max, every request thread is occupied. Check HikariCP active connections. If that's also at max, your DB queries are too slow or your pool is too small. Increase pool size or optimize queries. Never increase threads without increasing connections proportionally.

Symptom · 02

Intermittent 500s with 'Connection is not available, request timed out after 30000ms'

Fix

That's HikariCP timeout. Your threads are waiting for a connection longer than connectionTimeout. Check DB query performance under load. Look for slow queries (pg_stat_activity, SHOW PROCESSLIST). Also check if connection pool is too small. Increase maximum-pool-size or shorten query time. Don't just increase timeout — that hides the problem.

Symptom · 03

App crashes with OutOfMemoryError, but heap seems fine

Fix

Check off-heap memory. Netty direct buffers are a common culprit if you're using WebClient or reactive. Also check thread stack sizes: 200 threads at 1MB each is up to 200MB just for stacks. Start the JVM with -XX:NativeMemoryTracking=summary, then use jcmd <pid> VM.native_memory summary to see off-heap allocations. Reduce thread count or switch to virtual threads.

Symptom · 04

Redis or other cache slow under load, app performance degrades

Fix

Check cache hit ratio. If it's low, your caching strategy is wrong. Check for cache stampede — multiple threads computing the same cache value simultaneously. Use @Cacheable(sync=true) for synchronized cache computation. Also check Redis INFO commandstats for slow commands. Use bulk operations (pipeline) instead of individual gets/sets.

★ Debug Cheat Sheet: commands for fast diagnosis in production

Thread pool exhaustion

Immediate action

Check Tomcat thread metrics

Commands

curl -s localhost:8080/actuator/metrics/tomcat.threads.busy

curl -s localhost:8080/actuator/metrics/tomcat.threads.config.max

Fix now

Set server.tomcat.threads.max=100 in application.yml, match to connection pool size

Thread Model Comparison

Attribute	Imperative (Platform Threads)	Reactive (WebFlux)	Virtual Threads (Loom)
Thread per request	Yes: one OS thread per request	No: event loop handles many requests	Yes: one virtual thread per request
Max concurrent requests	~200 by default (Tomcat threads.max; each platform thread reserves about 1MB of stack)	10,000+ (limited by memory)	10,000+ (limited by memory)
Debugging	Easy: stack traces are linear	Hard: stack traces are async	Easy: stack traces are linear
Blocking I/O	Natural: just call the method	Must be offloaded, e.g. Mono.fromCallable with subscribeOn(Schedulers.boundedElastic())	Natural: just call the method
Pinned threads	N/A (no carrier threads)	N/A (no carrier threads)	Yes: synchronized blocks (JDK 21-23) and native frames pin the carrier
Complexity	Low	High: requires non-blocking everything	Low: same as imperative
Library support	All libraries	Must be reactive-compatible	All libraries (with caveats)
Best for	Low concurrency, CPU-bound	High concurrency, I/O-bound	High concurrency, I/O-bound

Key takeaways

Thread pool and connection pool sizes must be balanced. More threads doesn't mean more throughput.

Cache the hot path, measure hit ratio, and use sync=true to prevent stampede.

Virtual threads simplify high concurrency but watch for synchronized blocks that pin carrier threads.

Reactive is not universally faster. It's a different trade-off. Choose based on your bottleneck profile.

Monitor leading indicators (pool saturation, GC pauses), not just trailing ones (errors). Measure P99, not average.

Common mistakes to avoid

5 patterns

Using default Tomcat thread pool size (200) with default HikariCP pool size (10)

Symptom

High connection pool timeout errors, thread starvation, increasing response times under load

Fix

Reduce server.tomcat.threads.max to 50-100. Increase spring.datasource.hikari.maximum-pool-size to 20-40. Match thread count to connection capacity.

Using `@Cacheable` without `sync=true` on hot cache keys

Symptom

DB CPU spikes at TTL expiry, cache stampede, intermittent latency spikes

Fix

Add sync=true to @Cacheable. Use Caffeine with refreshAfterWrite for background population.

Calling `@Async` method from within the same class

Symptom

Method executes synchronously, no error, expected performance gain never materializes

Fix

Inject the service into a different bean. @Async only works through Spring AOP proxy. Self-invocation bypasses the proxy.

Setting `spring.jpa.open-in-view=true` in production

Symptom

Database connections held open for entire HTTP request cycle, connection pool exhaustion under load

Fix

Set spring.jpa.open-in-view=false. Use @Transactional explicitly where needed. Spring Boot still enables open-in-view by default (it only logs a startup warning), so set the property explicitly rather than relying on the default.

Not setting `leak-detection-threshold` on HikariCP

Symptom

Gradual connection leak that surfaces as pool exhaustion under load after days of uptime

Fix

Set spring.datasource.hikari.leak-detection-threshold=60000. Investigate leaked connections logged with stack trace.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): How does the default Tomcat thread pool interact with the HikariCP conne...
Q02 (Senior): Explain the difference between `@Cacheable` with sync=true vs sync=false...
Q03 (Senior): What is the impact of setting `spring.jpa.open-in-view=true` in a high-t...
Q04 (Senior): How do virtual threads in Project Loom change the performance characteri...
Q05 (Senior): A Spring Boot service is experiencing increasing P99 latency under load,...
Q06 (Senior): What is the purpose of `leak-detection-threshold` in HikariCP, and how d...
Q07 (Senior): Compare and contrast using a local cache (Caffeine) vs a distributed cac...
Q08 (Senior): What is the danger of using `@Async` without a custom TaskExecutor? What...

Q01 of 08 (Senior)

How does the default Tomcat thread pool interact with the HikariCP connection pool, and what happens if they are mismatched under load?

ANSWER

If Tomcat's max threads exceeds HikariCP's max pool size, threads will queue up waiting for connections. This creates thread starvation: requests are accepted but sit idle waiting for DB connections. Instead of 200 threads all trying to acquire 10 connections (190 threads blocked), you should reduce threads to match available connections. The metric to watch is tomcat.threads.busy vs hikaricp.connections.active — when busy hits max and active hits max, you've found the mismatch.
