
Spring Boot Actuator and Monitoring: Production-Grade Observability

Master Spring Boot Actuator and Monitoring.
⚙️ Intermediate — basic Java knowledge assumed
In this tutorial, you'll learn
  • Actuator is the bridge between application code and operational visibility — it is non-negotiable for any Spring Boot service running in production.
  • Use readiness probes to control load balancer traffic and liveness probes to signal Kubernetes when a pod needs a restart. Never put external dependency checks in liveness probes — the cascading restart pattern it creates has taken down production systems at companies with mature engineering teams.
  • Micrometer is the metrics engine. Use Counters for total counts, Gauges for current values, and Timers for latency percentiles. Mean latency is a liar — always instrument and alert on p95 or p99.
Quick Answer
  • Spring Boot Actuator exposes operational endpoints (/health, /metrics, /prometheus) that turn your running JVM from a black box into an observable system
  • Health groups (liveness, readiness, startup) map directly to Kubernetes probe types — never put external dependency checks in liveness probes
  • Micrometer is the metrics engine: Counters for totals, Gauges for current values, Timers for latency percentiles (p50/p95/p99)
  • Prometheus scrapes /actuator/prometheus every 15s — using /actuator/health for scraping creates a thundering herd at scale
  • Dynamic log level changes via /actuator/loggers turn a 30-minute 'add logging and redeploy' cycle into a 10-second API call
  • management.metrics.tags.application=${spring.application.name} is the one line most teams forget — without it, metrics from different services collide in Prometheus
  • The full stack is Actuator (sensors) + Prometheus (recorder) + Grafana (visualizer) + Alertmanager (alerter) — Actuator alone gives you endpoints, not observability
🚨 START HERE
Actuator Debug Cheat Sheet — Commands That Save Hours
Real commands for real production debugging. Run these when something is broken and you have Actuator configured.
🟡 Is the app healthy? Need a fast status check
Immediate Action: Hit the health endpoint to see component-level status
Commands
curl -s http://localhost:8080/actuator/health | jq .
curl -s http://localhost:8080/actuator/health/liveness | jq .status
Fix Now: If liveness is DOWN, the JVM is unresponsive — restart the pod. If readiness is DOWN but liveness is UP, a dependency is failing — check database or cache connectivity. Never restart a pod when only readiness is failing — let the dependency recover.
🟡 Need to see what version is running without checking CI/CD
Immediate Action: Query the info endpoint for build and Git metadata
Commands
curl -s http://localhost:8080/actuator/info | jq .git.commit.id
curl -s http://localhost:8080/actuator/info | jq .build.version
Fix Now: If the commit hash does not match what your pipeline deployed, the container is running old code — force a pod restart or re-pull the image explicitly. This is the fastest way to catch Docker layer cache issues.
🟡 Need DEBUG logs for a specific package without restarting
Immediate Action: Enable DEBUG logging dynamically via the loggers endpoint
Commands
curl -u admin:password -X POST -H 'Content-Type: application/json' -d '{"configuredLevel":"DEBUG"}' http://localhost:8080/actuator/loggers/io.thecodeforge.order
tail -f /var/log/app.log | grep 'io.thecodeforge.order'
Fix Now: After capturing the issue, reset immediately: curl -u admin:password -X POST -H 'Content-Type: application/json' -d '{"configuredLevel":null}' http://localhost:8080/actuator/loggers/io.thecodeforge.order — DEBUG logging in production generates gigabytes per minute if left running.
🟡 Prometheus is not scraping — target shows as DOWN in Prometheus UI
Immediate Action: Verify the Prometheus endpoint is reachable and returns data
Commands
curl -s http://localhost:8080/actuator/prometheus | head -20
curl -u prometheus:password -s http://localhost:8080/actuator/prometheus | wc -l
Fix Now: If empty or 404: check the micrometer-registry-prometheus dependency in pom.xml and verify management.endpoints.web.exposure.include contains prometheus. If 403: check that the Spring Security config permits the MONITORING role. If timeout: check whether scrape_interval is too aggressive for the instance count.
🟡 Docker container marked unhealthy but app seems fine from inside
Immediate Action: Test the HEALTHCHECK command manually inside the container
Commands
docker exec <container_id> wget --quiet --tries=1 --spider http://localhost:8080/actuator/health/liveness
docker inspect --format='{{json .State.Health}}' <container_id> | jq .
Fix Now: If wget is missing, you are using a distroless image. Switch to eclipse-temurin:17-jre-alpine which includes wget, or remove the Docker HEALTHCHECK entirely and use Kubernetes probes exclusively. Do not add wget to a distroless image — that defeats the purpose of using it.
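If you keep the Docker HEALTHCHECK, a working setup on a shell-equipped base image looks roughly like this — a sketch assuming an alpine JRE image, port 8080, and a jar named app.jar (all illustrative):

```dockerfile
FROM eclipse-temurin:17-jre-alpine
COPY target/app.jar /app/app.jar
# wget ships with alpine's busybox, so no extra packages are needed.
# start-period gives the JVM time to boot before failures count.
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
  CMD wget --quiet --tries=1 --spider http://localhost:8080/actuator/health/liveness || exit 1
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```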
Production Incident: The $40,000 Connection Pool Exhaustion — Flying Blind Without Metrics
A payment service ran silently for three weeks before latency spiked to 15 seconds at 2 AM on a Saturday. No metrics, no health checks, no visibility. Four hours of blind debugging.
Symptom: Payment service latency spiked from 50ms to 15,000ms per request. Customers reported timeouts. Revenue dropped as transactions queued and eventually failed.
Assumption: The team assumed it was a downstream payment gateway outage or a network issue. Two hours were spent checking external dependencies and firewall rules before anyone looked at the connection pool.
Root cause: HikariCP connection pool was exhausted. A slow database query caused by a missing index on the orders table held connections for 30 seconds or longer. Under Saturday peak load, all 10 pool connections were consumed. New requests waited in the queue until timeout. No Micrometer gauge was configured to track active connections versus pool size. The metric that would have solved this in 30 seconds — hikaricp_connections_active — was available but not enabled because the micrometer-registry-prometheus dependency was missing.
Fix: Added the micrometer-registry-prometheus dependency, which auto-enables HikariCP metrics: hikaricp_connections_active, hikaricp_connections_idle, hikaricp_connections_pending, hikaricp_connections_timeout_total. Set up a Grafana alert when active connections exceed 80% of pool size. Added the missing database index. Increased pool size from 10 to 25 with connection timeout tuning. Added management.metrics.tags.application=${spring.application.name} so the metrics were attributable to this specific service in Prometheus.
Key Lesson
  • HikariCP metrics are auto-configured when micrometer-registry-prometheus is on the classpath — you do not write a single line of instrumentation code, you just add the dependency
  • A single gauge (hikaricp_connections_active) would have diagnosed the issue in 30 seconds instead of 4 hours
  • If you cannot answer 'is the connection pool saturated?' from your dashboard, you are flying blind
  • Every production service needs at minimum: request rate, error rate, latency percentiles, connection pool saturation, and JVM heap usage
  • management.metrics.tags.application is not optional — without it, metrics from multiple services are indistinguishable in Prometheus
Production Debug Guide
When Actuator is configured, here is how to go from observable symptom to resolution.
Kubernetes pods restarting in a loop every 60-90 seconds
Check the liveness probe configuration — if it hits /actuator/health (full) instead of /actuator/health/liveness, a transient DB blip triggers cascading restarts. Switch to the liveness health group immediately. The fix is one line in your Kubernetes deployment YAML.
Prometheus shows gaps in metric data — scraping appears to stop intermittently
Check if /actuator/health is used as the scrape path instead of /actuator/prometheus. Heavy health checks can time out under load, causing Prometheus to mark the target as down. Switch metrics_path to /actuator/prometheus and verify the micrometer-registry-prometheus dependency is present.
Grafana dashboard shows a flat latency line, then a sudden spike — no gradual degradation visible
You are looking at mean latency, not percentiles. Mean hides tail latency completely. Check p95 and p99 — the degradation will be visible there well before the mean moves. Add publishPercentiles(0.5, 0.95, 0.99) to your Timer metrics and alert on p99, not the mean.
Intermittent payment failures but logs show nothing useful at INFO level
POST to /actuator/loggers/io.thecodeforge.order with {"configuredLevel": "DEBUG"}. Reproduce the issue. Read the detailed logs. Reset to null after. No restart needed. If the issue is SQL-related, enable TRACE on org.hibernate.SQL to see exact queries and bind parameters.
Deployed a new version but behavior has not changed — suspect old code is still running
curl /actuator/info and verify git.commit.id matches your pipeline deployment. If it does not match, the Docker image cache served the old image or the Helm rollout did not apply. This is the most common 'ghost deployment' pattern.
Prometheus query returns duplicate time series for the same metric from different services
You are missing management.metrics.tags.application=${spring.application.name} in your application.yml. Without a global application tag, metrics from different services use identical names and collide in Prometheus. Add this to every service's base config.
/actuator/prometheus returns 404 even though the endpoint is in exposure.include
The micrometer-registry-prometheus dependency is missing from your pom.xml or build.gradle. spring-boot-starter-actuator does not include it. Add io.micrometer:micrometer-registry-prometheus explicitly. The endpoint only exists when the registry is on the classpath.
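Wiring the liveness and readiness health groups into Kubernetes looks roughly like this — a minimal deployment fragment sketch (port 8080, timings, and thresholds are illustrative, not prescriptive):

```yaml
# Kubernetes deployment fragment: wire each probe to its dedicated health group.
# Liveness stays dependency-free; readiness gates traffic on dependencies.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3        # ~30s of unresponsiveness before a restart
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3        # pod is removed from Service endpoints, not restarted
```

Note the asymmetry: a failing readiness probe only stops traffic, while a failing liveness probe kills the pod — which is exactly why dependency checks belong only in readiness.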

In the world of microservices, 'it works on my machine' is not enough. You need to know if it works in production at 3:00 AM under peak load. Spring Boot Actuator is the industry-standard framework that turns your application from an opaque black box into an observable system by exposing HTTP and JMX endpoints that reveal the inner state of your running JVM.

I learned this the hard way. In 2021, our team deployed a Spring Boot payment service to production. It ran fine for three weeks. Then one Saturday at 2 AM, latency spiked to 15 seconds per request. We had no metrics, no health checks beyond a basic /ping, and no idea what was wrong. It took four hours to diagnose a connection pool exhaustion issue that a single Micrometer gauge would have caught in 30 seconds. That incident cost us $40,000 in lost transactions and a very uncomfortable Monday morning post-mortem.

After that, we instrumented everything. Every service got Actuator, Micrometer, Prometheus, and Grafana before it ever touched production. We have not had a mystery outage since.

This guide moves beyond basic dependency injection and covers the operational side of Java development. We will explore how to monitor application health, track custom business metrics, expose build traceability info, manage log levels dynamically, secure sensitive endpoints, wire everything into a Prometheus/Grafana stack, and build custom endpoints for operational control — all with the application.yml configuration you can actually copy into a project.

The Three Pillars of Observability

Before diving into Actuator, understand the framework that every production monitoring system is built on. Observability rests on three pillars, and knowing which pillar answers which question is the difference between a 4-hour incident and a 10-minute resolution.

  1. Metrics — Numeric measurements over time. Request rate, error rate, latency percentiles, JVM heap usage, connection pool saturation. These are what Prometheus scrapes and Grafana displays. Metrics answer 'how much' and 'how fast' and 'how often.' They are the first signal that something is wrong.
  2. Logs — Discrete events with context. A log entry says 'order #4521 failed with NullPointerException at PaymentService.java:87 for user abc123.' Logs answer 'what happened' and 'why.' Spring Boot's structured logging in JSON format feeds into ELK, Loki, or Datadog. They answer the question that the metric alert raised.
  3. Traces — A request's journey across services. In a microservice architecture, a single user action might touch 8 services. A trace connects those dots — showing that the 2-second delay happened in the inventory service on the third hop, not in the API gateway. Spring Boot integrates with Micrometer Tracing and OpenTelemetry for distributed tracing.

Spring Boot Actuator primarily addresses the metrics and health pillars. But a production-grade observability stack needs all three working together. The order of implementation matters more than most teams realize.

io/thecodeforge/monitoring/observability-stack.txt · TEXT
# io.thecodeforge: Production Observability Stack
#
#  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
#  │ Spring Boot │────▶│  Prometheus │────▶│   Grafana   │
#  │   Actuator  │     │  (Scraper)  │     │ (Dashboard) │
#  └─────────────┘     └─────────────┘     └─────────────┘
#         │                                        │
#         │              ┌─────────────┐           │
#         └─────────────▶│ Alertmanager│◀──────────┘
#                        │  (PagerDuty)│
#                        └─────────────┘
#
#  ┌─────────────┐     ┌─────────────┐
#  │ Structured  │────▶│  Loki / ELK │
#  │    Logs     │     │  (Log Agg.) │
#  └─────────────┘     └─────────────┘
#
#  ┌─────────────┐     ┌─────────────┐
#  │  Micrometer │────▶│  Tempo /    │
#  │   Tracing   │     │  Jaeger     │
#  └─────────────┘     └─────────────┘
#
# What each layer answers:
#   Metrics: Is something wrong right now?
#   Logs:    What exactly happened and why?
#   Traces:  Which service in the chain caused it?
Mental Model
You Do Not Need All Three on Day One
Observability is a maturity curve, not a checklist. Implement the pillars in the order they prevent outages, not the order they look impressive on an architecture diagram.
  • Start with metrics and health checks (Actuator + Prometheus) — this catches 80% of production issues before users notice
  • Add structured logging next when you need to debug 'what exactly happened' after an alert fires
  • Add distributed tracing when you have 3 or more microservices and need to find which service in the chain introduced latency
  • Trying to implement all three simultaneously leads to implementation paralysis and all three done badly
  • The order matters: metrics catch problems before logs do, logs explain problems before traces do
📊 Production Insight
Most teams over-engineer observability on day one and under-instrument by day 30. The pattern that consistently works: metrics plus health first — they prevent mystery outages. Add structured logging when you cannot debug from metrics alone. Add tracing when microservice count justifies the overhead. Do not skip the sequence.
🎯 Key Takeaway
Observability is a maturity curve, not a checklist. Start with metrics and health checks — they prevent 80% of production mystery outages. Implement the pillars in the order they prevent outages, not the order they look impressive on a resume.

The Anatomy of Actuator: Observability in Action

Spring Boot Actuator exists because manual health checks are a recipe for failure. Instead of writing custom endpoints to check if your database is alive or if your disk space is full, Actuator provides these out of the box. By adding the starter dependency and configuring application.yml, you instantly gain access to the /actuator base path with standardized operational endpoints.

However, most endpoints are hidden by default for security. The real power lies in the /health and /prometheus endpoints. One critical detail that trips up almost every team the first time: the /actuator/prometheus endpoint does not exist unless micrometer-registry-prometheus is on the classpath. The actuator starter and the Prometheus registry are separate dependencies. Adding spring-boot-starter-actuator without micrometer-registry-prometheus gives you health and info but not Prometheus metrics.

One thing worth saying explicitly: Actuator endpoints are not your application REST API. They are operational endpoints meant for internal monitoring infrastructure. Your security model should reflect this — your monitoring stack gets access through specific roles, your customers never touch these endpoints.
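As a sketch of that security model — assuming Spring Boot 3 with Spring Security 6, and the MONITORING/ADMIN role names used elsewhere in this guide — a dedicated filter chain for Actuator might look like:

```java
package io.thecodeforge.config;

import org.springframework.boot.actuate.autoconfigure.security.servlet.EndpointRequest;
import org.springframework.boot.actuate.health.HealthEndpoint;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

/**
 * Illustrative filter chain covering only Actuator endpoints.
 * Health stays open for Kubernetes probes; everything else needs a role.
 */
@Configuration
public class ActuatorSecurityConfig {

    @Bean
    public SecurityFilterChain actuatorSecurity(HttpSecurity http) throws Exception {
        http
            .securityMatcher(EndpointRequest.toAnyEndpoint())
            .authorizeHttpRequests(auth -> auth
                // Kubernetes probes must reach health unauthenticated
                .requestMatchers(EndpointRequest.to(HealthEndpoint.class)).permitAll()
                // Prometheus scraper and read-only metadata
                .requestMatchers(EndpointRequest.to("prometheus", "info")).hasRole("MONITORING")
                // loggers POST, custom write operations, anything else
                .anyRequest().hasRole("ADMIN"))
            .httpBasic(Customizer.withDefaults());
        return http.build();
    }
}
```

EndpointRequest matchers track the configured base path automatically, so this chain keeps working if you move /actuator elsewhere.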

io/thecodeforge/monitoring/application.yml · YAML
# io.thecodeforge: Canonical Actuator Configuration
# This is the base configuration every production Spring Boot service should start with.
# Copy this, adjust the exposure list to match your needs, and ship it.

management:
  endpoints:
    web:
      exposure:
        # Whitelist only what your monitoring stack actually needs.
        # Never use * in production.
        include: health,info,prometheus,loggers
        exclude: heapdump,env,threaddump,shutdown
      base-path: /actuator

  endpoint:
    health:
      # when_authorized: only show component details to authenticated users
      # never: hide component details entirely (use for public-facing services)
      # always: NEVER use in production — exposes DB versions, pool sizes, API keys
      show-details: when_authorized
      show-components: when_authorized
      probes:
        # Enables /actuator/health/liveness and /actuator/health/readiness
        # Required for Kubernetes probe integration
        enabled: true

  # Global tag applied to every metric this service emits.
  # Without this, metrics from different services collide in Prometheus.
  # This is the one line most teams forget. Do not skip it.
  metrics:
    tags:
      application: ${spring.application.name}
    export:
      prometheus:
        # Explicitly enable the Prometheus registry.
        # Redundant if micrometer-registry-prometheus is on the classpath,
        # but makes intent explicit and prevents confusion.
        # Note: on Spring Boot 3.x this key moved to
        # management.prometheus.metrics.export.enabled
        enabled: true

  # Security: never expose stack traces via the error endpoint
  server:
    add-application-context-header: false

server:
  error:
    # Never expose stack traces to clients or monitoring scrapers
    include-stacktrace: never
    include-message: never

spring:
  application:
    # This name flows into management.metrics.tags.application above.
    # Set it explicitly — never rely on the default.
    name: order-service
▶ Output
# With this configuration:
# GET /actuator/health → component-level health (authorized users only)
# GET /actuator/health/liveness → JVM-only liveness check (Kubernetes liveness probe)
# GET /actuator/health/readiness → dependency check (Kubernetes readiness probe)
# GET /actuator/prometheus → Prometheus metrics with application=order-service tag
# GET /actuator/loggers → current log levels (read)
# POST /actuator/loggers/{pkg} → change log level dynamically (ADMIN role required)
# GET /actuator/info → build + git metadata
#
# /actuator/heapdump, /actuator/env, /actuator/threaddump → blocked
⚠ Never Set exposure.include=* in Production
Setting management.endpoints.web.exposure.include=* exposes /actuator/env (which returns AWS keys, database passwords, and API tokens in plaintext), /actuator/heapdump (which dumps every object in JVM memory including user sessions and PII), and /actuator/threaddump to anyone who can reach the endpoint. An automated scanner will find this within hours. I have seen a production incident where this exact mistake exposed AWS_SECRET_ACCESS_KEY. The attacker spun up 200 GPU instances for crypto mining. The bill was $12,000 before anyone noticed. Whitelist only what your monitoring stack requires.
📊 Production Insight
A team used /actuator/health as the Prometheus scrape_path instead of /actuator/prometheus. The full health check hit the database, external APIs, and disk on every scrape — 100 instances at 15-second intervals equaled 400 heavy health checks per minute. Response times for the health endpoint started spiking, causing Prometheus to mark targets as down, which triggered false alerts at 3 AM. The fix: use /actuator/prometheus for metrics scraping (lightweight counter and gauge reads, sub-millisecond) and reserve /actuator/health for Kubernetes probes exclusively.
🎯 Key Takeaway
Actuator endpoints are operational infrastructure, not application REST APIs. The security model must reflect this. Never set exposure.include=* in production — whitelist only what Prometheus and Kubernetes actually need. The canonical base configuration is in this section — copy it and adjust from there.
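On the Prometheus side, the matching scrape job is only a few lines — a sketch with placeholder job name, credentials, and target (adjust to your service discovery):

```yaml
# prometheus.yml fragment — scrape the metrics endpoint, never /actuator/health
scrape_configs:
  - job_name: 'order-service'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
    basic_auth:
      username: 'prometheus'
      password: 'changeme'        # placeholder — use password_file in production
    static_configs:
      - targets: ['order-service:8080']
```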

Custom Metrics with Micrometer — Beyond Health Checks

Health checks tell you if the system is alive. Metrics tell you how it is performing. Spring Boot uses Micrometer, a dimensional metrics instrumentation facade — think SLF4J but for metrics. It lets you track business-relevant things like order count, payment latency percentiles, and cart abandonment rate rather than just JVM garbage collection statistics.

Micrometer supports several meter types. Choosing the right one matters:

  • Counter — Monotonically increasing value. Use for total requests, total errors, and orders placed. Never goes down. Query with rate() in Prometheus to get events per second.
  • Gauge — A value that fluctuates. Use for current queue depth, active connections, and temperature. You report the current value; Micrometer samples it on each scrape.
  • Timer — Measures duration and rate simultaneously. Use for request latency, database query time, and external API call duration. Gives you percentiles (p50, p95, p99) automatically.
  • Distribution Summary — Like a Timer but for arbitrary values, not time. Use for payload sizes, batch sizes, and record counts.
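To see why percentiles matter more than the mean, here is a tiny stdlib-only sketch (no Micrometer involved, all values synthetic) comparing the two on the same latency sample:

```java
import java.util.Arrays;

public class TailLatencyDemo {

    /** Arithmetic mean of the sample. */
    static double mean(long[] latencies) {
        return Arrays.stream(latencies).average().orElse(0);
    }

    /** Nearest-rank p99: the value at the ceil(0.99 * n)-th position when sorted. */
    static long p99(long[] latencies) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.99 * sorted.length);  // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // 98 requests at 40 ms, two pathological requests at 2000 ms
        long[] latencies = new long[100];
        Arrays.fill(latencies, 0, 98, 40L);
        latencies[98] = 2000L;
        latencies[99] = 2000L;

        System.out.println("mean = " + mean(latencies) + " ms"); // 79.2 — looks fine
        System.out.println("p99  = " + p99(latencies) + " ms");  // 2000 — the real story
    }
}
```

A dashboard showing only the mean would report 79 ms while 2% of users wait two full seconds — which is exactly the failure mode the Timer's publishPercentiles output is designed to expose.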

The @Timed annotation is the fastest way to instrument a method, but note the gotcha: @Timed on an arbitrary bean method only produces metrics once a TimedAspect bean is registered, and Spring Boot does not create one automatically. For custom business logic, inject MeterRegistry directly and build meters explicitly.
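If @Timed produces no metrics, the usual fix is registering the TimedAspect bean explicitly — a minimal registration sketch (assumes spring-boot-starter-aop is on the classpath; class and package names are illustrative):

```java
package io.thecodeforge.config;

import io.micrometer.core.aop.TimedAspect;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Registers the AspectJ aspect that makes @Timed work on bean methods.
 * Requires spring-boot-starter-aop in addition to the actuator starter.
 */
@Configuration
public class TimedAspectConfig {

    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
}
```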

io/thecodeforge/monitoring/OrderMetricsService.java · JAVA
package io.thecodeforge.monitoring;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * io.thecodeforge: Custom Business Metrics for Order Processing.
 *
 * Two patterns shown here:
 * 1. @Timed on controller methods — zero-boilerplate method-level timing
 * 2. MeterRegistry injection — explicit control for business counters and gauges
 */
@Service
public class OrderMetricsService {

    private final Counter ordersPlaced;
    private final Counter ordersFailed;
    private final Timer paymentLatency;
    private final AtomicInteger activeCarts = new AtomicInteger(0);

    public OrderMetricsService(MeterRegistry registry) {
        // Counter: Total orders placed (monotonically increasing)
        // Query in Prometheus: rate(orders_placed_total[5m]) → orders per second
        this.ordersPlaced = Counter.builder("orders.placed.total")
            .description("Total number of orders successfully placed")
            .tag("service", "order-service")
            .register(registry);

        // Counter: Total orders failed
        // Alert when rate(orders_failed_total[5m]) / rate(orders_placed_total[5m]) > 0.01
        this.ordersFailed = Counter.builder("orders.failed.total")
            .description("Total number of orders that failed processing")
            .tag("service", "order-service")
            .register(registry);

        // Timer: Payment processing latency with percentiles
        // publishPercentiles exposes p50, p95, p99 as separate Prometheus labels
        // Alert on p99 exceeding your SLA threshold — the mean will hide tail latency
        this.paymentLatency = Timer.builder("payment.processing.duration")
            .description("Time taken to process payment end to end")
            .tag("service", "order-service")
            .publishPercentiles(0.5, 0.95, 0.99)
            .register(registry);

        // Gauge: Active shopping carts — value goes up and down
        // Prometheus samples this on every scrape — you report the current value
        Gauge.builder("carts.active.current", activeCarts, AtomicInteger::get)
            .description("Number of active shopping carts in session")
            .register(registry);
    }

    public void recordSuccessfulOrder(long paymentDurationNanos) {
        ordersPlaced.increment();
        paymentLatency.record(paymentDurationNanos, TimeUnit.NANOSECONDS);
    }

    public void recordFailedOrder() {
        ordersFailed.increment();
    }

    public void cartOpened() { activeCarts.incrementAndGet(); }
    public void cartClosed() { activeCarts.decrementAndGet(); }
}

// --- @Timed annotation pattern: zero-boilerplate controller instrumentation ---

package io.thecodeforge.controller;

import io.micrometer.core.annotation.Timed;
import io.thecodeforge.dto.OrderDto;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/orders")
public class OrderController {

    /**
     * @Timed auto-instruments this method with a Timer named 'order.create.duration'.
     * Tracks: request count, total time, and percentiles.
     * Requires a TimedAspect bean registered in your configuration (plus
     * spring-boot-starter-aop) — Spring Boot does not create one automatically.
     * extraTags: adds fixed labels to the metric for filtering in Grafana.
     */
    @Timed(
        value = "order.create.duration",
        description = "Time taken to create an order",
        extraTags = {"endpoint", "/api/orders", "method", "POST"},
        percentiles = {0.5, 0.95, 0.99},
        histogram = true
    )
    @PostMapping
    public OrderDto createOrder(@RequestBody OrderDto dto) {
        // Method execution is automatically timed.
        // A Timer named order_create_duration_seconds is created in Prometheus.
        return dto;
    }
}
▶ Output
# GET /actuator/prometheus (relevant excerpt)

# HELP orders_placed_total Total number of orders successfully placed
# TYPE orders_placed_total counter
orders_placed_total{service="order-service",} 1542.0

# HELP orders_failed_total Total number of orders that failed processing
# TYPE orders_failed_total counter
orders_failed_total{service="order-service",} 23.0

# HELP payment_processing_duration_seconds Time taken to process payment end to end
# TYPE payment_processing_duration_seconds summary
payment_processing_duration_seconds{service="order-service",quantile="0.5",} 0.045
payment_processing_duration_seconds{service="order-service",quantile="0.95",} 0.180
payment_processing_duration_seconds{service="order-service",quantile="0.99",} 0.420

# HELP carts_active_current Number of active shopping carts in session
# TYPE carts_active_current gauge
carts_active_current{application="order-service",} 87.0

# HELP order_create_duration_seconds Time taken to create an order (@Timed)
# TYPE order_create_duration_seconds summary
order_create_duration_seconds{endpoint="/api/orders",method="POST",quantile="0.99",} 0.312

# HikariCP auto-metrics (no code required — just micrometer-registry-prometheus on classpath)
hikaricp_connections_active{pool="HikariPool-1",} 8.0
hikaricp_connections_idle{pool="HikariPool-1",} 2.0
hikaricp_connections_pending{pool="HikariPool-1",} 0.0
hikaricp_connections_timeout_total{pool="HikariPool-1",} 0.0
⚠ Never Create High-Cardinality Metrics
A Counter tagged with userId creates one unique time series per user. With 100,000 users that is 100,000 time series in Prometheus — it will exhaust memory and crash your monitoring stack. The rule: keep tag values to low-cardinality dimensions — service name, region, HTTP method, HTTP status code, error type. Never use request IDs, user IDs, order IDs, or timestamps as tag values. If you need per-user visibility, that belongs in your log aggregation layer, not your metrics layer.
📊 Production Insight
A team tracked payment latency but only looked at the mean — 45ms looked healthy and no alert fired. Meanwhile p99 was silently climbing to 2 seconds for a subset of requests with large order payloads, causing intermittent timeout complaints that the team kept dismissing as user error. Adding publishPercentiles(0.5, 0.95, 0.99) to the Timer revealed the tail latency the mean was masking. The alert was set on p99 and the issue was diagnosed within one day of the next occurrence. Mean latency is a liar — always instrument and alert on p95 or p99.
🎯 Key Takeaway
Counters answer 'how many total,' Gauges answer 'how much right now,' Timers answer 'how long and how spread out.' Pick the wrong meter type and you will miss the actual problem. Use @Timed on controller methods for zero-boilerplate instrumentation. HikariCP metrics are free — just add the dependency. If you can only track five metrics, track: request rate, error rate, latency p99, connection pool saturation, and one business metric.
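The alert rules implied in this section can be expressed in PromQL — a sketch using the metric names from the examples above (all thresholds are illustrative):

```promql
# Error rate: alert when more than 1% of orders fail over 5 minutes
rate(orders_failed_total[5m]) / rate(orders_placed_total[5m]) > 0.01

# Tail latency: the p99 series published by publishPercentiles(0.5, 0.95, 0.99)
payment_processing_duration_seconds{quantile="0.99"} > 0.5

# Pool saturation: alert when active connections exceed 80% of the pool
hikaricp_connections_active / hikaricp_connections_max > 0.8
```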

Custom Actuator Endpoints — Operational Control Beyond Health

The built-in Actuator endpoints cover most operational needs. But there are legitimate cases where you need operational endpoints specific to your application: runtime feature flag status, current circuit breaker state, deployment metadata that goes beyond what /actuator/info provides, or a cache invalidation trigger that ops can call without a full deploy.

Spring Boot makes this straightforward with the @Endpoint annotation. Annotate a Spring component with @Endpoint(id='yourEndpoint') and methods with @ReadOperation (HTTP GET), @WriteOperation (HTTP POST), or @DeleteOperation (HTTP DELETE). Spring Boot automatically exposes the endpoint at /actuator/yourEndpoint and applies the same security model as built-in endpoints — whatever your SecurityFilterChain says applies here too.

The design rule: @ReadOperation methods should have no side effects. @WriteOperation methods change state and must be secured with ADMIN role — they are essentially a surgical tool for ops teams, not a general API. I have seen @WriteOperation endpoints used to toggle circuit breakers, clear Redis caches, and reload configuration from external sources — all without a redeploy.

io/thecodeforge/monitoring/DeploymentEndpoint.java · JAVA
package io.thecodeforge.monitoring;

import org.springframework.boot.actuate.endpoint.annotation.*;
import org.springframework.stereotype.Component;
import java.time.Instant;
import java.util.Map;

/**
 * io.thecodeforge: Custom Actuator Endpoint for deployment metadata and cache control.
 *
 * Exposed at: GET  /actuator/deployment  → returns deployment info
 *             POST /actuator/deployment  → triggers cache invalidation (ADMIN only)
 *
 * Spring Boot automatically applies your SecurityFilterChain to this endpoint.
 * Secure @WriteOperation methods with ADMIN role — they modify production state.
 */
@Component
@Endpoint(id = "deployment")
public class DeploymentEndpoint {

    private final String gitCommit;
    private final String buildVersion;
    private final Instant startedAt = Instant.now();
    private volatile Instant lastCacheInvalidation = null;
    private volatile String lastInvalidationReason = "none";

    public DeploymentEndpoint(
            @org.springframework.beans.factory.annotation.Value("${git.commit.id.abbrev:unknown}") String gitCommit,
            @org.springframework.beans.factory.annotation.Value("${build.version:unknown}") String buildVersion) {
        this.gitCommit = gitCommit;
        this.buildVersion = buildVersion;
    }

    /**
     * @ReadOperation → HTTP GET /actuator/deployment
     * Returns deployment metadata and cache state.
     * Safe to expose to MONITORING role — no side effects.
     */
    @ReadOperation
    public Map<String, Object> deploymentInfo() {
        return Map.of(
            "gitCommit", gitCommit,
            "buildVersion", buildVersion,
            "startedAt", startedAt,  // captured once at bean creation, not at request time
            "lastCacheInvalidation", lastCacheInvalidation != null ? lastCacheInvalidation.toString() : "never",
            "lastInvalidationReason", lastInvalidationReason
        );
    }

    /**
     * @WriteOperation → HTTP POST /actuator/deployment/{reason}
     * Triggers a cache invalidation without a redeploy.
     * Secure this with ADMIN role in your SecurityFilterChain.
     * @Selector binds the trailing path segment,
     * e.g. POST /actuator/deployment/stale-pricing-data
     */
    @WriteOperation
    public Map<String, String> invalidateCache(@Selector String reason) {
        this.lastCacheInvalidation = Instant.now();
        this.lastInvalidationReason = reason != null ? reason : "manual trigger";
        // In real code: inject CacheManager and call cache.invalidateAll()
        return Map.of(
            "status", "cache invalidated",
            "reason", lastInvalidationReason,
            "timestamp", lastCacheInvalidation.toString()
        );
    }
}
▶ Output
# GET /actuator/deployment
{
  "gitCommit": "a1b2c3d",
  "buildVersion": "2.4.1",
  "startedAt": "2026-04-18T10:00:00Z",
  "lastCacheInvalidation": "never",
  "lastInvalidationReason": "none"
}

# POST /actuator/deployment/stale-pricing-data (ADMIN role required)
{
  "status": "cache invalidated",
  "reason": "stale-pricing-data",
  "timestamp": "2026-04-18T10:05:22Z"
}

# The endpoint appears automatically in the /actuator index:
# GET /actuator
# {
#   "_links": {
#     "deployment": { "href": "/actuator/deployment" },
#     "health": { "href": "/actuator/health" },
#     ...
#   }
# }
⚠ @WriteOperation Changes Production State — Secure It
A @WriteOperation endpoint is essentially a surgical tool for production. It can clear caches, toggle features, or reload config — all without a deploy. That power requires strict access control. Always secure @WriteOperation methods with ADMIN role in your SecurityFilterChain. Log every invocation with the caller's identity so you have an audit trail. Use @WebEndpoint instead of @Endpoint if you want the endpoint to be web-only and not exposed via JMX.
📊 Production Insight
Custom endpoints are where Actuator becomes genuinely powerful beyond health checks. I have used @WriteOperation endpoints to toggle circuit breakers manually during dependency degradation events, clear pricing caches after database migrations, and reload feature flag configuration from an external source. All of these would have required a redeploy otherwise. The key discipline: document every custom endpoint in your runbook with the exact curl command ops should run and the expected response — your 3 AM on-call engineer should not be reading source code.
🎯 Key Takeaway
@Endpoint with @ReadOperation and @WriteOperation lets you build operational control surfaces specific to your application. @ReadOperation is HTTP GET with no side effects. @WriteOperation is HTTP POST that changes state and must be secured with ADMIN role. Custom endpoints inherit the same security model as built-in ones — no extra configuration needed. Document every custom endpoint in your runbook with the exact command to run.

Securing Actuator Endpoints with Spring Security

By default in Spring Boot 2.x and later, Actuator endpoints sit behind Spring Security if it is on the classpath. But behind security does not mean secure. The default configuration often allows all authenticated users to access all endpoints — including ones that dump environment variables, heap contents, and thread states.

The production pattern: restrict Actuator endpoints to a dedicated monitoring role or internal network. Your Prometheus scraper authenticates with a service account. Your developers get read-only access to health and info. Nobody outside the internal network touches env or heapdump. The 15-line SecurityFilterChain in this section has prevented two credential exfiltration incidents on teams I have worked with.

One detail most guides miss: when you add a dedicated SecurityFilterChain for /actuator/**, you need @Order(1) to give it higher priority than your application's main SecurityFilterChain. Without the ordering, Spring applies your main chain first, which may have different rules than what you intend for Actuator.

io/thecodeforge/monitoring/ActuatorSecurityConfig.java · JAVA
package io.thecodeforge.monitoring;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.annotation.Order;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

/**
 * io.thecodeforge: Actuator Security Configuration.
 *
 * @Order(1) gives this chain higher priority than your application's main chain.
 * Without it, your application's SecurityFilterChain may apply different rules
 * to /actuator/** than you intend.
 *
 * Role mapping:
 *   No role    → health, info (load balancers and Kubernetes probes)
 *   MONITORING → prometheus (Prometheus scraper service account)
 *   ADMIN      → loggers, env, heapdump, threaddump (ops team only)
 *   DENY       → shutdown, everything else not whitelisted
 */
@Configuration
public class ActuatorSecurityConfig {

    @Bean
    @Order(1)
    public SecurityFilterChain actuatorSecurityFilterChain(HttpSecurity http) throws Exception {
        http
            .securityMatcher("/actuator/**")
            .authorizeHttpRequests(auth -> auth
                // Public: health checks for load balancers and Kubernetes probes
                // These must be accessible without authentication
                .requestMatchers("/actuator/health/**").permitAll()
                .requestMatchers("/actuator/info").permitAll()

                // MONITORING role: Prometheus scraper authenticates via HTTP Basic
                .requestMatchers("/actuator/prometheus").hasRole("MONITORING")

                // ADMIN role: endpoints that expose or modify sensitive state
                // Log level changes can generate gigabytes/min if abused
                .requestMatchers(
                    "/actuator/loggers/**",
                    "/actuator/env",
                    "/actuator/heapdump",
                    "/actuator/threaddump"
                ).hasRole("ADMIN")

                // Deny everything not explicitly permitted above.
                // This includes /actuator/shutdown — never expose it.
                .anyRequest().denyAll()
            )
            // HTTP Basic: Prometheus natively supports basic_auth in scrape config
            .httpBasic(basic -> {})
            // Disable CSRF: Actuator endpoints are called by automated systems, not browsers
            .csrf(csrf -> csrf.disable());

        return http.build();
    }
}
▶ Output
# application.yml complement to the SecurityFilterChain above
# Defines the in-memory admin user; grant the Prometheus scraper its own
# MONITORING-role account rather than reusing this one.
spring:
  security:
    user:
      name: admin
      password: ${ACTUATOR_ADMIN_PASSWORD}  # inject via environment variable, never hardcode
      roles: ADMIN

# In Kubernetes, use Spring Security with LDAP or OAuth2 service accounts.
# For simple setups, environment-variable-injected credentials are acceptable
# as long as the password is rotated and not committed to source control.
⚠ Never Set show-details to 'always' in Production
When show-details=always, your /actuator/health endpoint returns database versions, JDBC connection pool sizes, external API response times, and disk usage to anyone who hits it — including unauthenticated requests if your SecurityFilterChain permits health/* publicly. An attacker uses this to fingerprint your infrastructure and find known CVEs for your specific database version. Always set it to when_authorized or never. Kubernetes probes only need the status field, not the details.
📊 Production Insight
An automated scanner found /actuator/env exposed on a staging server within 6 hours of deployment. AWS_SECRET_ACCESS_KEY was returned in plaintext in the JSON response. The attacker spun up 200 GPU instances for crypto mining. The AWS bill was $12,000 before anyone noticed. The application had a SecurityFilterChain — but it was missing @Order(1), so the main chain ran first and permitted all authenticated requests including monitoring users who had broad access. A 15-line SecurityFilterChain with correct ordering would have blocked this entirely.
🎯 Key Takeaway
If your security audit has not flagged your Actuator endpoints, you are not looking hard enough. Use @Order(1) on the Actuator SecurityFilterChain to ensure it takes priority. Never expose env, heapdump, or threaddump publicly. Restrict them to ADMIN role. Never set show-details=always in production.

Prometheus and Grafana Integration — Full Stack Setup

Actuator exposes the /actuator/prometheus endpoint in Prometheus exposition format — a text-based, human-readable format that Prometheus understands natively. But Prometheus needs to be told where to scrape. This is where most tutorials end and most teams get stuck.
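For orientation, here is what a scrape of /actuator/prometheus returns — plain text, one sample per line. The metric names below follow Micrometer's standard naming conventions; the label values are illustrative, not from a real service:

```text
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{application="order-service",area="heap",id="G1 Eden Space",} 1.2345678E7
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{application="order-service",method="GET",status="200",uri="/api/orders",} 4021.0
http_server_requests_seconds_sum{application="order-service",method="GET",status="200",uri="/api/orders",} 112.48
```

Note the application label on every sample — that is what management.metrics.tags.application provides, and why metrics from different services do not collide in Prometheus.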

On the Spring Boot side: add micrometer-registry-prometheus to your dependencies and management.endpoints.web.exposure.include=prometheus to your application.yml. That is it. The endpoint auto-configures.
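Concretely, the dependency is one block in your pom.xml — the version is managed by the Spring Boot BOM, so none is needed here:

```xml
<!-- Registry that adapts Micrometer metrics to the Prometheus exposition format.
     Combined with management.endpoints.web.exposure.include=prometheus,
     this auto-configures the /actuator/prometheus endpoint. -->
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```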

On the Prometheus side: add a scrape_config block pointing at your application. The two details most engineers get wrong: using /actuator/health as the metrics_path instead of /actuator/prometheus (heavy versus lightweight), and using static_configs in Kubernetes where pods get new IPs on every restart.

On the Grafana side: import dashboard ID 4701 (JVM Micrometer) from grafana.com/dashboards for a production-ready Spring Boot dashboard. It covers heap usage, GC pause time, HTTP request rates, and database connection pool saturation out of the box — no PromQL required to get started.

io/thecodeforge/monitoring/prometheus.yml · YAML
# io.thecodeforge: Prometheus Scrape Configuration
# This file tells Prometheus where to find your Spring Boot metrics.

global:
  scrape_interval: 15s      # How often Prometheus scrapes all targets
  evaluation_interval: 15s  # How often alerting rules are evaluated

scrape_configs:

  # Static targets: works for a fixed number of servers or local development.
  # In production Kubernetes, replace this with kubernetes_sd_configs below.
  - job_name: 'spring-boot-order-service'
    # IMPORTANT: Use /actuator/prometheus, NOT /actuator/health.
    # /actuator/prometheus: lightweight metric reads, sub-millisecond.
    # /actuator/health: hits the DB, external APIs, disk — thundering herd at scale.
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s  # Override global for this job
    basic_auth:
      username: 'prometheus'
      # Use password_file in production — never inline passwords in prometheus.yml
      password_file: '/etc/prometheus/secrets/prometheus_password'
    static_configs:
      - targets: ['order-service-01:8080', 'order-service-02:8080']
        labels:
          environment: 'production'
          team: 'platform'

  # Kubernetes service discovery: use this instead of static_configs in K8s.
  # Pods annotated with prometheus.io/scrape: 'true' are scraped automatically.
  # No manual target management — works with rolling deploys and autoscaling.
  - job_name: 'spring-boot-k8s'
    metrics_path: '/actuator/prometheus'
    kubernetes_sd_configs:
      - role: pod
    basic_auth:
      username: 'prometheus'
      password_file: '/etc/prometheus/secrets/prometheus_password'
    relabel_configs:
      # Only scrape pods with prometheus.io/scrape: 'true' annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Rewrite the scrape address using the pod's prometheus.io/port annotation.
      # Keep the pod IP and swap only the port — replacing __address__ with just
      # the port number (a common mistake) breaks scraping entirely.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
      # Add pod name and namespace as labels for Grafana filtering
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
▶ Output
# Add these annotations to your Kubernetes pod spec to enable auto-discovery:
#
# metadata:
#   annotations:
#     prometheus.io/scrape: 'true'
#     prometheus.io/port: '8080'
#     prometheus.io/path: '/actuator/prometheus'
#
#
# Prometheus will automatically discover and scrape this pod.
# When the pod is replaced (rolling deploy), Prometheus picks up the new IP.
# No prometheus.yml restart required.
💡Import Grafana Dashboard 4701 Before Building Your Own
Grafana dashboard ID 4701 (JVM Micrometer) is a production-ready Spring Boot dashboard maintained by the Micrometer team. It covers JVM heap, GC pauses, HTTP request rates, error rates, and HikariCP connection pool metrics out of the box. Import it first and use it for at least one sprint before building custom dashboards. You will learn which metrics actually matter during incidents before investing time in custom PromQL.
📊 Production Insight
Scrape intervals compound with instance count in ways most teams do not calculate until it is too late. One hundred instances scraped every 15 seconds is 400 scrapes per minute; with a 500ms health check, that is 200 seconds of health check work every minute — and that is before you account for the database calls inside each health check. Use /actuator/prometheus for Prometheus scraping (lightweight reads) and health groups for Kubernetes probes. In Kubernetes, static_configs break on every rolling deploy — pods get new IPs and Prometheus stops scraping the new instances. Use kubernetes_sd_configs with pod annotations from day one.
🎯 Key Takeaway
Prometheus scrapes metrics, not health — using /actuator/health as the scrape path is the thundering herd waiting to happen. In Kubernetes, use service discovery with pod annotations instead of static targets. Import Grafana dashboard 4701 before writing custom PromQL — it covers the metrics that actually matter during incidents.

Kubernetes Probes — Liveness, Readiness, and Startup

If you are deploying Spring Boot to Kubernetes, Actuator's health groups become the backbone of your pod lifecycle management. Misconfiguring these probes is the single most common cause of cascading restarts in Spring Boot Kubernetes deployments — and it is entirely preventable.

Spring Boot 2.3 introduced health groups — separate health endpoints for different probe types. This is critical because liveness and readiness must check different things:

  • Liveness ('Is the app alive?'): Checks only internal JVM state. If this fails, Kubernetes restarts the pod. Keep it lightweight — no database calls, no external API calls. A temporary DB blip should never restart your pods.
  • Readiness ('Can the app accept traffic?'): Checks dependencies — database reachable, cache warm, broker connected. If this fails, Kubernetes removes the pod from the load balancer but does not restart it. This is the correct behavior for a transient dependency failure.
  • Startup ('Has the app finished booting?'): For slow-starting apps. Kubernetes waits for this to pass before running liveness and readiness probes. The math matters: failureThreshold × periodSeconds = maximum startup window. A Spring Boot app with 60 seconds of startup time needs failureThreshold: 12 with periodSeconds: 10 for a 120-second safety window — always give at least 2x your measured startup time.
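The startup-window arithmetic above can be sanity-checked with a few lines of shell, using the 60-second example:

```shell
#!/bin/sh
# Startup window = failureThreshold × periodSeconds.
# Rule of thumb: window >= 2 × measured startup time.
MEASURED_STARTUP=60   # seconds, from observing real boots — measure, never guess
PERIOD=10             # periodSeconds on the startupProbe

# Ceiling division so the window is never short of the 2× target
FAILURE_THRESHOLD=$(( (2 * MEASURED_STARTUP + PERIOD - 1) / PERIOD ))
WINDOW=$(( FAILURE_THRESHOLD * PERIOD ))

echo "failureThreshold: $FAILURE_THRESHOLD"   # prints: failureThreshold: 12
echo "startup window:   ${WINDOW}s"           # prints: startup window:   120s
```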

I once debugged a 20-replica production deployment restarting every 90 seconds. The liveness probe was checking database connectivity. A 30-second DB connection spike triggered liveness failures across all 20 pods simultaneously. They all restarted, reconnected at once, overwhelmed the DB, the DB connection time spiked again, and the cycle repeated. The fix was one line in the Kubernetes deployment YAML: change the liveness probe path from /actuator/health to /actuator/health/liveness.

io/thecodeforge/monitoring/k8s-deployment.yaml · YAML
# io.thecodeforge: Kubernetes Deployment with Actuator Probes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  template:
    metadata:
      annotations:
        # Enable Prometheus auto-discovery (works with kubernetes_sd_configs above)
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8080'
        prometheus.io/path: '/actuator/prometheus'
    spec:
      containers:
        - name: order-service
          image: io.thecodeforge/order-service:latest
          ports:
            - containerPort: 8080

          # STARTUP PROBE: Prevents liveness from killing a slow-starting app.
          # Math: failureThreshold (30) × periodSeconds (10) = 300 seconds max startup.
          # Measure your actual startup time and set this to at least 2× that value.
          # If your app starts in 45 seconds, use failureThreshold: 12, periodSeconds: 10
          # for a 120-second window. Never guess — measure.
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            failureThreshold: 30      # 30 × 10s = 300 seconds max startup window
            periodSeconds: 10
            timeoutSeconds: 3

          # LIVENESS PROBE: JVM-only — no external dependency checks.
          # If this fails, Kubernetes RESTARTS the pod.
          # Never check DB or external APIs here — a transient blip restarts all pods.
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 0    # startupProbe handles the startup window
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3       # Restart after 3 consecutive failures (30 seconds)

          # READINESS PROBE: Checks dependencies — DB, cache, broker.
          # If this fails, pod is REMOVED from load balancer but NOT restarted.
          # This is the correct behavior for a transient dependency failure.
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3       # Remove from LB after 15 seconds of failures
▶ Output
# Spring Boot application.yml configuration required for health groups:
#
# management:
#   endpoint:
#     health:
#       probes:
#         enabled: true          # ← enables /actuator/health/liveness and /readiness
#       show-details: when_authorized
#   health:
#     livenessstate:
#       enabled: true
#     readinessstate:
#       enabled: true
#
# Without probes.enabled=true, /actuator/health/liveness and /readiness return 404.
# This is the most common Kubernetes probe misconfiguration in Spring Boot.
⚠ Never Put External Dependency Checks in Liveness Probes
If your liveness probe checks the database and the database has a 30-second connection spike, Kubernetes restarts every pod in your deployment simultaneously. They all reconnect to the DB at once, creating a connection storm that makes the DB spike worse. This cascading restart pattern has taken down production systems at companies with mature engineering teams. The rule is absolute: liveness probes check only JVM internal state. External dependencies belong in readiness probes.
📊 Production Insight
A 20-replica deployment was restarting every 90 seconds. The liveness probe was set to /actuator/health instead of /actuator/health/liveness. A 30-second DB connection spike triggered liveness failures across all 20 pods simultaneously. They all restarted, reconnected at once, overwhelmed the DB, which caused another spike, which failed liveness again. The cascading loop ran for 40 minutes before anyone understood what was happening. The fix was changing one YAML field. The investigation took 2 hours. The actual fix took 5 seconds.
🎯 Key Takeaway
Liveness checks the JVM — failure triggers a pod restart. Readiness checks dependencies — failure removes the pod from the load balancer without restarting it. startupProbe math: failureThreshold × periodSeconds = maximum startup window — measure your actual startup time and set this to at least 2×. Never put external dependency checks in liveness probes — the cascading restart pattern is catastrophic at scale.
Kubernetes Probe Selection
If: App is slow to start — Spring context takes 30 or more seconds to load
Use: Add a startupProbe using /actuator/health/liveness. Set failureThreshold × periodSeconds to at least 2× your measured startup time. Without a startupProbe, liveness kills the pod during initialization.
If: External dependency (DB, cache, broker) is temporarily unreachable
Use: Readiness probe fails — the pod is removed from the load balancer but NOT restarted. Traffic stops going to it. When the dependency recovers, the pod passes readiness and rejoins the load balancer automatically.
If: JVM is unresponsive — deadlock, OOM, or GC thrashing
Use: Liveness probe fails — Kubernetes restarts the pod. This is the correct behavior — the process is genuinely broken and needs a fresh start.
If: Liveness probe checks database connectivity
Use: Immediate risk — a DB blip restarts ALL pods simultaneously, a cascading failure. Change liveness to /actuator/health/liveness and move the DB check to readiness immediately.

The /actuator/info Endpoint — Deployment Traceability

The /actuator/info endpoint is the most underused feature in the Actuator suite. It lets you expose build information — Git commit hash, build timestamp, artifact version — directly from your running application. When something breaks in production, the first question is always 'what version is running?' Without this endpoint configured, answering that question means digging through CI/CD pipeline logs, which takes minutes you do not have during an active incident.

The setup requires two Maven plugins: the spring-boot-maven-plugin with the build-info goal, and the git-commit-id-plugin to embed Git metadata. Once configured, every build automatically embeds its own DNA into the artifact. Every deployment automatically reports exactly what code is running.

In production setups, every Grafana dashboard should have a deployment panel that queries /actuator/info across instances. If instances report different commit hashes during a rolling deploy, you can see the split-brain state in real time. If a rollback happened silently, it shows up immediately in this panel.

io/thecodeforge/monitoring/pom.xml · XML
<!-- io.thecodeforge: Maven plugins for /actuator/info enrichment -->
<!-- Add these inside your <build><plugins> block -->
<build>
  <plugins>

    <!-- Plugin 1: Generates build-info.properties at compile time.
         Embeds: artifact name, version, build timestamp.
         Appears under "build" key in /actuator/info response. -->
    <plugin>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-maven-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <goal>build-info</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

    <!-- Plugin 2: Embeds Git metadata at build time.
         Embeds: commit hash, branch, commit time, tags.
         Appears under "git" key in /actuator/info response.
         failOnNoGitDirectory=false: prevents build failure in CI environments
         where the .git directory may not be present (Docker build layers). -->
    <plugin>
      <groupId>pl.project13.maven</groupId>
      <artifactId>git-commit-id-plugin</artifactId>
      <version>4.9.10</version>
      <executions>
        <execution>
          <goals>
            <goal>revision</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <generateGitPropertiesFile>true</generateGitPropertiesFile>
        <!-- Prevents build failure when .git is not present (shallow clones, CI) -->
        <failOnNoGitDirectory>false</failOnNoGitDirectory>
        <!-- Only embed the abbreviated commit hash, not the full 40-char hash -->
        <abbrevLength>7</abbrevLength>
      </configuration>
    </plugin>

  </plugins>
</build>
▶ Output
# GET /actuator/info
{
  "build": {
    "artifact": "order-service",
    "name": "Order Service",
    "version": "2.4.1",
    "time": "2026-04-18T14:22:00Z"
  },
  "git": {
    "branch": "main",
    "commit": {
      "id": "a1b2c3d",
      "time": "2026-04-18T14:18:32Z"
    }
  }
}

# CI/CD post-deploy verification step:
# DEPLOYED_COMMIT=$(git rev-parse --short HEAD)
# RUNNING_COMMIT=$(curl -s http://service:8080/actuator/info | jq -r '.git.commit.id')
# if [ "$DEPLOYED_COMMIT" != "$RUNNING_COMMIT" ]; then
#   echo "ERROR: Container is running stale code. Expected $DEPLOYED_COMMIT, got $RUNNING_COMMIT"
#   exit 1
# fi
💡Use Info in Your CI/CD Verification Step
After deploying, curl /actuator/info on the new instance and verify git.commit.id matches the commit your pipeline just built. If it does not match, your deployment did not actually apply — the Docker image cache served the old layer, or the Helm rollout did not complete. This 2-second verification step has caught the 'ghost deployment' failure mode more times than I can count. Add it as a required step in your deployment pipeline before marking the deploy as successful.
📊 Production Insight
A team deployed v2.4.1 to fix a critical pricing bug but customer reports kept coming in. The app was still behaving like v2.3.9. Fifteen minutes into the incident, someone thought to curl /actuator/info. The git.commit.id showed the old version's hash. Docker had cached the intermediate image layer and the registry served the old image despite the pipeline showing green. Without the info endpoint, this debugging path would have taken an hour. With it, the ghost deployment was identified in 30 seconds.
🎯 Key Takeaway
When something breaks, the first question is 'what version is running?' The info endpoint answers that in one API call. Configure it once with the two Maven plugins — it requires no ongoing maintenance. Add a CI/CD post-deploy verification step that compares the running commit hash against what was just deployed.
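The comparison logic can live in a small reusable shell function. verify_deploy is a name invented here for illustration; in your pipeline, wire its two arguments to git rev-parse and the curl/jq call against /actuator/info:

```shell
#!/bin/sh
# Hypothetical helper: compare the commit the pipeline built against
# the commit the running instance reports via /actuator/info.
verify_deploy() {
  expected="$1"   # e.g. $(git rev-parse --short HEAD)
  running="$2"    # e.g. $(curl -s "$HOST/actuator/info" | jq -r '.git.commit.id')
  if [ "$expected" != "$running" ]; then
    echo "ERROR: running stale code (expected $expected, got $running)" >&2
    return 1      # fail the pipeline — the deploy did not actually apply
  fi
  echo "OK: running $running"
}

verify_deploy a1b2c3d a1b2c3d   # prints: OK: running a1b2c3d
```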

Dynamic Log Level Management — Debug Without Redeploying

One of Actuator's most powerful day-two features: changing log levels at runtime without restarting the application. Need to enable DEBUG logging for a specific package to troubleshoot a production issue? Hit /actuator/loggers, change the level, reproduce the issue, read the logs, then change it back. No restart. No downtime. No redeploy cycle.

This converts 'we need to add more logging and redeploy' — a 30-minute process minimum in most CI/CD pipelines — into a 10-second API call. During incident response, this is the difference between a 10-minute resolution and a 45-minute resolution while customers are actively impacted.

The endpoint supports GET to read current levels and POST to change them. Spring Security should restrict POST to ADMIN role — TRACE logging in production can generate gigabytes of log data per minute, fill disk, and cascade into other failures. Always reset the log level after you have captured what you need. Build that reset command into your debugging runbook as a mandatory step.

For Hibernate SQL debugging specifically, enable TRACE on org.hibernate.SQL to see the exact SQL being generated and org.hibernate.type.descriptor.sql to see the actual bind parameter values (those are the Hibernate 5.x logger names — Hibernate 6 moved bind-value logging to org.hibernate.orm.jdbc.bind). This combination has diagnosed more mysterious data bugs than any other technique I know.

io/thecodeforge/monitoring/loglevel_commands.sh · BASH
#!/bin/bash
# io.thecodeforge: Dynamic Log Level Management
# Run these during incident response — no restart, no redeploy required.

# --- Read current log level for a package ---
curl -s -u admin:${ACTUATOR_PASSWORD} \
  http://localhost:8080/actuator/loggers/io.thecodeforge.order | jq .
# Response: { "configuredLevel": "INFO", "effectiveLevel": "INFO" }

# --- Enable DEBUG for order package (troubleshooting) ---
curl -s -u admin:${ACTUATOR_PASSWORD} \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"configuredLevel": "DEBUG"}' \
  http://localhost:8080/actuator/loggers/io.thecodeforge.order

# --- Enable TRACE for Hibernate SQL (see exact SQL + bind parameters) ---
# Logger names below are for Hibernate 5.x; on Hibernate 6, use org.hibernate.orm.jdbc.bind for bind values.
# WARNING: This generates enormous log volume. Reset immediately after capturing the query.
curl -s -u admin:${ACTUATOR_PASSWORD} \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"configuredLevel": "TRACE"}' \
  http://localhost:8080/actuator/loggers/org.hibernate.SQL

curl -s -u admin:${ACTUATOR_PASSWORD} \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"configuredLevel": "TRACE"}' \
  http://localhost:8080/actuator/loggers/org.hibernate.type.descriptor.sql

# --- ALWAYS reset after capturing logs (mandatory step in your runbook) ---
curl -s -u admin:${ACTUATOR_PASSWORD} \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"configuredLevel": null}' \
  http://localhost:8080/actuator/loggers/io.thecodeforge.order

# null = inherit from parent logger (returns to default)
# Setting to null is different from setting to INFO explicitly —
# null respects future configuration changes, INFO overrides them.
▶ Output
# After enabling DEBUG for io.thecodeforge.order:
2026-04-18 14:30:12 DEBUG io.thecodeforge.order.OrderService - Processing order #4521
2026-04-18 14:30:12 DEBUG io.thecodeforge.order.OrderService - Payment method: CREDIT_CARD
2026-04-18 14:30:13 DEBUG io.thecodeforge.order.OrderService - Inventory reserved: 3 items
2026-04-18 14:30:13 ERROR io.thecodeforge.order.OrderService - Payment declined: insufficient funds for order #4521
2026-04-18 14:30:13 DEBUG io.thecodeforge.order.OrderService - Rollback initiated for order #4521

# With org.hibernate.SQL at TRACE:
2026-04-18 14:30:12 TRACE org.hibernate.SQL - select * from orders where user_id=? and status=?
2026-04-18 14:30:12 TRACE org.hibernate.type.descriptor.sql - binding parameter [1] as [BIGINT] - [10045]
2026-04-18 14:30:12 TRACE org.hibernate.type.descriptor.sql - binding parameter [2] as [VARCHAR] - [PENDING]

# After resetting to null: DEBUG logs disappear immediately, zero application restart.
💡Build a Runbook Entry for Every Common Incident
Document which packages to enable DEBUG for during common incident types. Example: 'Payment processing failures: enable DEBUG on io.thecodeforge.order and io.thecodeforge.payment. If SQL is suspected, add TRACE on org.hibernate.SQL. Reset all three within 10 minutes.' Your on-call engineer at 3 AM should be running a documented command, not guessing package names from memory. The runbook entry takes 5 minutes to write and saves 20 minutes per incident.
📊 Production Insight
A team was debugging intermittent payment failures that occurred roughly once every 200 transactions. Reproducing the issue was unreliable. They enabled DEBUG on the payment package for 15 minutes, captured 4 failed transactions with full context in the logs, identified a race condition in the idempotency key generation, and disabled DEBUG before anyone else noticed the log volume. The old workflow — add a log statement, build, deploy to staging, reproduce, deploy to production, check logs — would have been 45 minutes minimum and required a deployment window approval. Dynamic log levels turned it into a 15-minute debugging session with no deployment.
🎯 Key Takeaway
Dynamic log levels are the most underused Actuator feature in production incident response. They turn debugging from a deployment cycle into a 10-second API call. Secure POST with ADMIN role — TRACE logging can fill disks in minutes. Setting configuredLevel to null (inherit) is different from setting it to INFO explicitly — null respects future configuration changes.

Micrometer Integration and Docker Deployment

To make this production-ready, you containerize the application with a focus on how Docker handles the health signal from Actuator and how the JVM is tuned for container environments.

The Docker HEALTHCHECK instruction tells the Docker daemon whether the container is healthy. By pointing it at /actuator/health/liveness, you get the same lightweight JVM-only health logic that Kubernetes uses at the liveness probe level. This matters most for Docker Compose deployments, which do not have Kubernetes probe support.

Two things to watch for in every containerized deployment. First, the HEALTHCHECK command needs wget or curl to be present in the base image. Distroless images include neither — you will need to switch to a slim base image or remove the Docker HEALTHCHECK and rely exclusively on Kubernetes probes. Second, the JVM memory flags: without -XX:+UseContainerSupport, older JDK versions read the host machine's memory rather than the container's memory limit and allocate too large a heap, causing OOM kills. Container awareness has been enabled by default since JDK 10, so JDK 17 handles this automatically, but the flag is harmless and makes the intent explicit.

io/thecodeforge/monitoring/Dockerfile · DOCKERFILE
# io.thecodeforge: Multi-stage Dockerfile with Actuator Health Integration

# Stage 1: Build
FROM eclipse-temurin:17-jdk-alpine AS build
WORKDIR /app
COPY . .
RUN ./mvnw clean package -DskipTests

# Stage 2: Runtime
# eclipse-temurin:17-jre-alpine includes wget — required for HEALTHCHECK
# Do NOT use distroless if you need Docker HEALTHCHECK
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar

# Security: run as non-root user
RUN addgroup -S forgegroup && adduser -S forgeuser -G forgegroup
USER forgeuser

# Docker HEALTHCHECK: uses liveness endpoint for lightweight JVM-only check.
# interval: how often Docker checks (30s is conservative — adjust for your SLA)
# timeout: how long Docker waits for a response before marking unhealthy
# retries: consecutive failures before marking UNHEALTHY
# Uses /actuator/health/liveness — NOT /actuator/health
# Full health check would hit DB on every HEALTHCHECK — unnecessary for Docker daemon
HEALTHCHECK \
  --interval=30s \
  --timeout=3s \
  --retries=3 \
  CMD wget --quiet --tries=1 --spider http://localhost:8080/actuator/health/liveness || exit 1

EXPOSE 8080

# JVM flags for container environments:
# -XX:+UseContainerSupport: read container memory limits, not host memory
# -XX:MaxRAMPercentage=75.0: allocate 75% of container memory to the heap
#   leaving 25% for Metaspace, thread stacks, GC overhead, and OS
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-jar", "app.jar"]
▶ Output
# Build and verify health:
# docker build -t io.thecodeforge/order-service:latest .
# docker run -p 8080:8080 io.thecodeforge/order-service:latest
# docker ps (HEALTHY appears after 30s)
#
# CONTAINER ID IMAGE STATUS
# a1b2c3d4e5f6 io.thecodeforge/order-service Up 2 minutes (healthy)
#
# Verify the health endpoint from inside the container:
# docker exec a1b2c3d4e5f6 wget -q -O - http://localhost:8080/actuator/health/liveness
# {"status":"UP"}
⚠ Watch Out for Distroless Images
Distroless base images (gcr.io/distroless/java17) do not include wget, curl, or a shell. Docker HEALTHCHECK requires one of these. If you switch to distroless for security, remove the Docker HEALTHCHECK instruction entirely and rely on Kubernetes probes exclusively. Kubernetes probes are HTTP checks performed by the kubelet from outside the container — they do not need wget inside the container. In Kubernetes, Docker HEALTHCHECK and Kubernetes probes are independent mechanisms; you do not need both.
📊 Production Insight
A team migrated to a distroless image for the reduced attack surface — a legitimate security improvement. Docker HEALTHCHECK started failing because wget was missing. Docker marked the container UNHEALTHY, Docker Compose restarted it, which caused a restart loop. The ops team spent two hours before identifying that HEALTHCHECK was the issue, not the application. The fix: remove Docker HEALTHCHECK and rely on Kubernetes probes exclusively. Kubernetes probes are more sophisticated anyway — they support failure thresholds, initial delays, and startup protection that Docker HEALTHCHECK does not.
🎯 Key Takeaway
Docker HEALTHCHECK requires wget or curl — distroless images include neither. In Kubernetes, remove Docker HEALTHCHECK and use Kubernetes probes exclusively. Set -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0 in your ENTRYPOINT to prevent the JVM from over-allocating heap based on host memory rather than container limits.
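For reference, the Kubernetes probe configuration that replaces Docker HEALTHCHECK can be sketched as follows (the port and timing values here are illustrative assumptions — tune periods and thresholds to your SLA):

```yaml
# Sketch: Kubernetes probes for this service's pod spec.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness    # JVM-only check; failure restarts the pod
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness   # dependency checks; failure removes pod from the load balancer
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```

The kubelet performs these HTTP checks from outside the container, so no wget or curl is needed in the image — which is what makes this pattern compatible with distroless bases.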
🗂 Legacy vs. Modern Monitoring Approaches
How Spring Boot Actuator replaces ad-hoc monitoring patterns with standardized, production-grade observability.
Monitoring Aspect · Legacy Approach (Manual) · Modern Approach (Actuator)

Health Checks
  Legacy: Custom /status endpoints with inconsistent JSON structure. Each developer implements their own format. You get 'OK' or nothing. No component-level detail, no Kubernetes probe integration.
  Modern: Standardized /health endpoint with nested component status, auto-aggregation (any DOWN component makes overall status DOWN), and health groups (liveness, readiness) that map directly to Kubernetes probe types.

Metrics Gathering
  Legacy: Log parsing, manual JMX MBean registration, or custom counters in Redis. Fragile, hard to query, impossible to aggregate across instances, and requires per-service implementation.
  Modern: Micrometer integration with dimensional metrics. Auto-instrumentation of JVM, HTTP requests, and HikariCP. One dependency addition exports everything to Prometheus, Datadog, InfluxDB, or New Relic.

Runtime Management
  Legacy: Requires application restart to change log levels. 'Add a log statement and redeploy' is a 30-minute process with a deployment window approval. Debugging in production is a full release cycle.
  Modern: Dynamic log level updates via /actuator/loggers — change any package's log level in 10 seconds without restarting. View environment variables and system properties via /actuator/env (secured).

Security
  Legacy: Ad-hoc security filters, often left unprotected or protected inconsistently. Actuator endpoints exposed with default Spring Security or no security at all — common source of credential exfiltration.
  Modern: Integrated with Spring Security. Fine-grained access control per endpoint using an @Order SecurityFilterChain. Role-based restrictions. HTTP Basic for the Prometheus scraper. CSRF disabled for stateless endpoints.

Deployment Traceability
  Legacy: Check CI/CD logs, SSH into the server, run git log. No programmatic way to verify which code version is running. Ghost deployments go undetected for hours during incidents.
  Modern: /actuator/info exposes Git commit hash, build timestamp, and artifact version from the running process. CI/CD post-deploy verification catches ghost deployments in 2 seconds.

Operational Control
  Legacy: Any operational action (cache clear, config reload, feature toggle) requires a code change, pull request, code review, build, and deploy. A 30-minute minimum for a surgical change.
  Modern: Custom @Endpoint with @WriteOperation provides surgical operational control without a deploy. Cache invalidation, circuit breaker toggles, and config reloads become API calls restricted to ADMIN role.

🎯 Key Takeaways

  • Actuator is the bridge between application code and operational visibility — it is non-negotiable for any Spring Boot service running in production.
  • Use readiness probes to control load balancer traffic and liveness probes to signal Kubernetes when a pod needs a restart. Never put external dependency checks in liveness probes — the cascading restart pattern it creates has taken down production systems at companies with mature engineering teams.
  • Micrometer is the metrics engine. Use Counters for total counts, Gauges for current values, and Timers for latency percentiles. Mean latency is a liar — always instrument and alert on p95 or p99.
  • The /actuator/info endpoint is the fastest way to answer 'what version is running?' during an incident. Configure it once with git-commit-id-plugin and build-info goal. Add post-deploy verification to your CI/CD pipeline that compares the running commit hash against what was just deployed.
  • Dynamic log level management via /actuator/loggers turns a 30-minute debugging deployment cycle into a 10-second API call. Build the reset command into your runbook as a mandatory step — TRACE logging left running fills disks.
  • Always secure Actuator endpoints with a dedicated SecurityFilterChain using @Order(1). Whitelist only what you need. env, heapdump, and threaddump are the three most dangerous — restrict to ADMIN role and consider never exposing them at all.
  • Add management.metrics.tags.application=${spring.application.name} to every service's application.yml — it is the one line most teams forget. Without it, metrics from different services are indistinguishable in Prometheus.
  • Never create high-cardinality metrics with dynamic tag values like userId or requestId. Low-cardinality dimensions only: service name, region, HTTP method, status code, error type.
  • The observability stack is Actuator (sensors) + Prometheus (recorder) + Grafana (visualizer) + Alertmanager (alerter). All four work together — Actuator alone gives you endpoints, not observability.

⚠ Common Mistakes to Avoid

    Exposing all endpoints via management.endpoints.web.exposure.include=* in production
    Symptom

    Automated scanners find /actuator/env within hours. AWS credentials, database passwords, and API keys are returned in JSON plaintext. /actuator/heapdump dumps all in-memory data including user sessions and PII.

    Fix

    Whitelist only what your monitoring stack needs: health,info,prometheus,loggers. Explicitly exclude heapdump,env,threaddump,shutdown. Add a SecurityFilterChain with @Order(1) and role-based access control.
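In application.yml, a whitelist along these lines (a sketch — adjust the endpoint list to what your monitoring stack actually uses):

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,loggers    # whitelist only what you need
        exclude: heapdump,env,threaddump,shutdown  # explicit exclusion as defense in depth
  endpoint:
    health:
      show-details: when_authorized
```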

    Missing micrometer-registry-prometheus dependency — /actuator/prometheus returns 404
    Symptom

    /actuator/prometheus is listed in exposure.include but returns 404. Prometheus shows the target as having no metrics. Teams spend hours checking security configuration before realizing the dependency is missing.

    Fix

    Add io.micrometer:micrometer-registry-prometheus to pom.xml or build.gradle. This is separate from spring-boot-starter-actuator. Without it, the /actuator/prometheus endpoint does not exist regardless of what exposure.include contains.

    Performing heavy, blocking I/O inside a Custom Health Indicator
    Symptom

    Health checks take 2 to 5 seconds because they call external APIs synchronously. Kubernetes probes timeout. Prometheus marks the target as down. CPU spikes every 10 seconds when probes fire simultaneously across all instances.

    Fix

    Keep health indicators under 200ms. Use cached results with periodic background refresh via a scheduled task instead of live calls on every probe. Set connectTimeout and readTimeout to 2000ms maximum in your health indicator HTTP client.
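The cached-result pattern can be sketched in plain Java, independent of Spring (the class name and the checkDependency stub here are illustrative; in a real service this logic would sit inside a HealthIndicator bean):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: probes read a cached status instantly; a background task
// refreshes it on its own schedule, so no probe ever blocks on live I/O.
public class CachedHealthCheck {
    private final AtomicReference<String> cachedStatus = new AtomicReference<>("UNKNOWN");
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(long refreshSeconds) {
        refresh(); // prime the cache synchronously so health() is meaningful immediately
        scheduler.scheduleAtFixedRate(this::refresh, refreshSeconds, refreshSeconds, TimeUnit.SECONDS);
    }

    private void refresh() {
        cachedStatus.set(checkDependency() ? "UP" : "DOWN");
    }

    // Illustrative stub — in production this would be an HTTP or JDBC call
    // with connectTimeout/readTimeout capped at ~2000ms.
    protected boolean checkDependency() {
        return true;
    }

    // What the probe handler returns: a sub-millisecond cached read.
    public String health() {
        return cachedStatus.get();
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        CachedHealthCheck check = new CachedHealthCheck();
        check.start(30);
        System.out.println(check.health()); // prints "UP"
        check.stop();
    }
}
```

The key property: probe latency is decoupled from dependency latency. A slow downstream call delays the next background refresh, not the probe response.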

    Leaving management.endpoint.health.show-details set to 'always'
    Symptom

    Anyone who hits /actuator/health sees database versions, connection pool sizes, external API response times, and disk usage. Attackers use this to fingerprint infrastructure and identify known CVEs for specific database or framework versions.

    Fix

    Set show-details to when_authorized or never in production. Only MONITORING or ADMIN roles should see component-level health details. Kubernetes probes only need the status field — they do not need details.

    Using /actuator/health as the Kubernetes liveness probe instead of /actuator/health/liveness
    Symptom

    A transient database blip causes the full health check to fail. Kubernetes restarts every pod simultaneously. They all reconnect at once, creating a connection storm that worsens the original blip into a cascading restart loop.

    Fix

    Use /actuator/health/liveness for liveness probes — it checks only JVM internal state. Move dependency checks to /actuator/health/readiness. Enable health groups with management.endpoint.health.probes.enabled=true.

    Not setting management.metrics.tags.application globally
    Symptom

    Prometheus shows http_server_requests_seconds from five different services all mixed together. Grafana dashboards cannot filter by service. Queries return combined meaningless numbers across the entire fleet.

    Fix

    Set management.metrics.tags.application=${spring.application.name} in every service's application.yml. Every metric emitted gets this tag automatically. Prometheus can then filter by application label in every query.
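The corresponding application.yml fragment (the service name is illustrative):

```yaml
spring:
  application:
    name: order-service          # illustrative service name
management:
  metrics:
    tags:
      application: ${spring.application.name}  # every emitted metric gets this label
```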

    Creating high-cardinality metrics by using dynamic values as tags
    Symptom

    A Counter tagged with userId creates one unique time series per user. With 100,000 users, Prometheus runs out of memory. Scrape failures increase. TSDB storage fills up in days. Prometheus restarts frequently.

    Fix

    Keep tags to low-cardinality dimensions: service name, region, HTTP method, HTTP status code, error type. Never use request IDs, user IDs, or timestamps as tag values. User-level detail belongs in log aggregation, not metrics.

    Ignoring the /actuator/info endpoint entirely
    Symptom

    During a production incident the team cannot answer 'what version is running?' without checking CI/CD logs. Ghost deployments — containers running old code despite a green pipeline — go undetected for hours.

    Fix

    Configure spring-boot-maven-plugin build-info goal and git-commit-id-plugin in pom.xml. Add a post-deploy verification step in CI/CD that curls /actuator/info and compares git.commit.id against the deployed commit hash.

Interview Questions on This Topic

  • Q: What is the difference between 'Liveness' and 'Readiness' probes in the context of Spring Boot 2.3+ Actuator? What should each probe check, and what happens when each one fails in Kubernetes? (Mid-level)
    Liveness probes check whether the application is alive — is the JVM responsive, is the event loop running? They use /actuator/health/liveness and must check only internal JVM state, never external dependencies. When a liveness probe fails, Kubernetes restarts the pod. Readiness probes check whether the application can accept traffic — is the database reachable, is the cache warm, is the broker connected? They use /actuator/health/readiness and include dependency checks. When a readiness probe fails, Kubernetes removes the pod from the service load balancer endpoints but does not restart it. The critical production rule: never put external dependency checks in liveness probes. A temporary database blip would fail liveness across all pods simultaneously, triggering a coordinated restart that creates a connection storm as all pods reconnect at once. This cascading pattern has taken down production deployments at teams with mature engineering practices. The entire architecture of the two health groups exists to prevent this one failure mode.
  • Q: How would you implement a custom metric to track the number of successful vs failed login attempts using Micrometer's MeterRegistry? What meter type would you use and why? (Mid-level)
    Use Counter meters for both successful and failed login attempts. Counters track monotonically increasing values — exactly what cumulative login counts are. Create two counters with a shared name and a differentiating tag: Counter.builder("login.attempts").tag("outcome", "success").register(registry) and Counter.builder("login.attempts").tag("outcome", "failure").register(registry). In Prometheus, query rate(login_attempts_total{outcome='failure'}[5m]) for the failure rate and calculate the ratio of failures to total attempts for the error percentage. Do not use a Gauge — login counts only go up, and Gauges are for values that fluctuate. Do not use a Timer — you are counting events, not measuring duration. Do not create two separately named counters; using the same name with a differentiating tag lets you aggregate them with a single PromQL query. The production caveat: do not tag with userId — that creates one time series per user and will exhaust Prometheus memory. The 'outcome' tag has only two possible values, making it safely low-cardinality.
  • Q: In a high-security environment, how do you restrict access to the /actuator/prometheus endpoint to only the Prometheus scraper's service account? (Senior)
    Create a dedicated SecurityFilterChain with @Order(1) that matches /actuator/**. The @Order(1) is critical — without it, your main application SecurityFilterChain may take precedence and apply different rules than intended. Configure the chain to require the MONITORING role for /actuator/prometheus. Use HTTP Basic authentication since Prometheus natively supports basic_auth in its scrape config — no custom token handling needed. In application.yml, create a monitoring service account with the MONITORING role. In the Prometheus scrape config, add basic_auth with the service account credentials using password_file rather than inline passwords. For defense in depth: use management.endpoints.web.exposure.include to whitelist only health, info, and prometheus — never use wildcard. Add Kubernetes NetworkPolicy to restrict which pods can reach the actuator port, limiting access to the Prometheus pod's IP range or namespace. In managed Kubernetes environments, this network-level restriction is more reliable than authentication alone because it prevents the endpoint from being reachable at all from unauthorized pods.
  • Q: Explain the 'Thundering Herd' problem that can occur if monitoring systems scrape /actuator/health simultaneously. How would you mitigate it? (Senior)
    The thundering herd occurs when multiple systems — Prometheus, Kubernetes probes, Docker HEALTHCHECK, uptime monitors — all hit the full /actuator/health endpoint at overlapping intervals. The full health check performs expensive operations: database queries, external API calls, disk space checks. With 100 instances scraped every 15 seconds and a 500ms health check, that is 400 connection checks and 200 seconds of health-check time per minute, plus the corresponding database load. Mitigation strategy: use /actuator/prometheus for Prometheus scraping (lightweight counter and gauge reads, sub-millisecond). Use /actuator/health/liveness for Kubernetes liveness probes (JVM-only check, no I/O). Use /actuator/health/readiness for readiness probes — it carries the dependency checks, but only the kubelet polls it, on its own per-pod period, rather than every monitoring system hitting the full endpoint. For custom health indicators that must check external dependencies, cache the result with a background refresh thread and return the cached status on every probe call rather than performing a live check each time. Set connectTimeout and readTimeout to 2000ms maximum to prevent slow health checks from compounding.
  • Q: How does the @WriteOperation annotation work in a custom @Endpoint, and what are the safety implications compared to a @ReadOperation? When would you create a custom endpoint? (Senior)
    @ReadOperation maps to HTTP GET and must have no side effects. It returns operational state — current config, deployment info, feature flag values. @WriteOperation maps to HTTP POST and changes application state — clearing a cache, toggling a feature, reloading configuration. @DeleteOperation maps to HTTP DELETE and removes something — clearing a specific cache key, deregistering a resource. The safety implication: @WriteOperation methods modify production state without a code deploy. This is powerful but requires strict access control. Always secure @WriteOperation methods with ADMIN role in your SecurityFilterChain. Log every invocation with the caller's identity for audit trail purposes. Create a custom endpoint when built-in Actuator endpoints do not cover your operational needs. Common legitimate cases: a deployment endpoint that returns richer metadata than /actuator/info, a features endpoint for runtime feature flag management and toggling, a circuit-breaker endpoint that shows the state of each circuit and allows manual open/close operations, or a cache endpoint that shows cache statistics and allows targeted invalidation without a full restart. Use @WebEndpoint instead of @Endpoint when you want the endpoint web-only with no JMX exposure.
  • Q: What is the difference between a Counter, Gauge, and Timer in Micrometer? Give a real-world example of when you would use each one in a payment processing service. (Mid-level)
    A Counter tracks a monotonically increasing value — it only ever goes up. In a payment service, use Counter for total payments processed (payments.completed.total) or total payment failures (payments.failed.total). Query with rate() in Prometheus to get payments per second or failures per second. Counter is wrong for values that can decrease. A Gauge tracks a value that fluctuates up and down. Use Gauge for the current number of pending payment approvals waiting for 3DS authentication, or the current HikariCP active connection count. You report the current value and Micrometer samples it on each Prometheus scrape. Gauge is wrong for counts that only accumulate. A Timer measures both duration and count simultaneously. Use Timer for end-to-end payment processing latency. It automatically calculates p50, p95, and p99 percentiles and the request rate. Add publishPercentiles(0.5, 0.95, 0.99) to expose them as separate Prometheus labels for alerting. Timer is the right choice for anything where both 'how long did it take' and 'how often does it happen' matter. Choosing wrong means missing the signal: using a Gauge for total payments loses the rate information entirely. Using a Counter for HikariCP active connections loses the current value. Using a Timer but only looking at the mean hides the tail latency where slow transactions actually live.
  • Q: How would you use the /actuator/loggers endpoint to troubleshoot a production issue without restarting the application? Walk through the exact steps. (Mid-level)
    Step 1: Identify the relevant package from the stack trace or component you suspect — for example io.thecodeforge.order for the order processing path. Step 2: Read the current level: GET /actuator/loggers/io.thecodeforge.order. Response shows configuredLevel and effectiveLevel — if effectiveLevel is INFO, DEBUG messages are suppressed. Step 3: Enable DEBUG: POST /actuator/loggers/io.thecodeforge.order with body {"configuredLevel":"DEBUG"}. Takes effect immediately — no restart, no propagation delay. Step 4: Reproduce the issue or wait for it to recur. Monitor your log aggregator (Loki, ELK) for the detailed DEBUG messages. Step 5: Capture what you need from the logs and identify the root cause. Step 6: Reset immediately: POST /actuator/loggers/io.thecodeforge.order with body {"configuredLevel":null}. Null means inherit from parent — returns to whatever the default configuration specifies. This is different from setting to INFO explicitly; null respects future configuration changes. For Hibernate SQL debugging: enable TRACE on org.hibernate.SQL to see exact SQL statements and on org.hibernate.type.descriptor.sql to see the actual bind parameter values. Reset both immediately after capturing — TRACE generates enormous log volume that can fill disk in minutes.
  • Q: Explain how the /actuator/info endpoint can be enriched with Git commit information. What Maven plugins are required, and how would you use this in a CI/CD verification step? (Mid-level)
    Two Maven plugins are required: the spring-boot-maven-plugin with the build-info goal, which generates build-info.properties at compile time and embeds artifact name, version, and build timestamp; and the git-commit-id-plugin, which embeds Git metadata including commit hash (abbreviated), branch name, and commit time into git.properties. Both files are read by Actuator at startup and exposed under the build and git keys in /actuator/info. In CI/CD, add a mandatory post-deploy verification step after the deployment completes: extract the git commit hash that was just built (git rev-parse --short HEAD), curl /actuator/info on the newly deployed instance, and compare git.commit.id from the response against the expected hash. If they do not match, fail the pipeline — the deployment applied the wrong image. This catches Docker image cache issues where the registry served a cached layer, Helm chart misconfigurations where the image tag was not updated, and failed rolling updates where some pods are still on the old version. This verification costs 2 seconds and prevents the entire class of ghost deployment incidents where the team believes a fix is deployed but the old code is still running.
  • Q: What is the risk of setting management.endpoints.web.exposure.include=* in production? What specific endpoints are the most dangerous and why? (Mid-level)
    Wildcard exposure makes every Actuator endpoint reachable by anyone who can reach the application's management port. The three most dangerous endpoints: /actuator/env returns all environment variables and system properties in plaintext, including AWS_SECRET_ACCESS_KEY, database connection strings, API tokens, and OAuth client secrets. Automated scanners probe /actuator/env within hours of an application being reachable. This is the most commonly exploited Actuator misconfiguration and the one with the most severe business impact. /actuator/heapdump triggers a full JVM heap dump download — a binary file containing every object currently in memory including user session data, cached database records, in-memory PII, and decrypted credential values. Downloading this file gives an attacker a complete snapshot of your application's runtime memory state. /actuator/threaddump reveals the current state of every JVM thread including stack traces that expose internal application structure, class names, and timing information useful for targeted attacks. Additionally, /actuator/shutdown can terminate the application process entirely if enabled and exposed. The fix: whitelist only the endpoints your monitoring stack actually needs — health, info, prometheus, loggers. Restrict each to the appropriate role in a dedicated SecurityFilterChain with @Order(1). Use Kubernetes NetworkPolicy to prevent these endpoints from being reachable outside the cluster's internal network.

Frequently Asked Questions

What is the difference between /actuator/health, /actuator/health/liveness, and /actuator/health/readiness?

The base /actuator/health endpoint returns the aggregated status of all health indicators — database connectivity, disk space, external APIs, message brokers, and any custom indicators. It is the full-picture health check.

Spring Boot 2.3 introduced health groups: /actuator/health/liveness checks only whether the JVM is responsive with no external dependency checks, and /actuator/health/readiness checks whether the application can serve traffic and includes dependency checks.

In Kubernetes, use liveness for the liveness probe — if it fails, Kubernetes restarts the pod. Use readiness for the readiness probe — if it fails, the pod is removed from the load balancer but not restarted. Never use the full /actuator/health for liveness probes — a temporary database blip would restart every pod simultaneously.

Requires management.endpoint.health.probes.enabled=true in application.yml — without this, the liveness and readiness paths return 404.
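A minimal application.yml sketch enabling the probe groups:

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true   # exposes /actuator/health/liveness and /actuator/health/readiness
```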

How do I create a custom Actuator endpoint beyond health checks?

Annotate a Spring component with @Endpoint(id = "yourEndpoint") and annotate methods with @ReadOperation (GET), @WriteOperation (POST), or @DeleteOperation (DELETE). Spring Boot automatically exposes the endpoint at /actuator/yourEndpoint.

For web-only endpoints with no JMX exposure, use @WebEndpoint instead of @Endpoint.

Custom endpoints inherit the same security model as built-in ones — they appear in your exposure.include list and respect the SecurityFilterChain you define. Secure @WriteOperation methods with ADMIN role since they change production state. Common use cases: deployment metadata richer than /actuator/info, runtime feature flag management, cache invalidation triggers, and circuit breaker state display and control.

How does Prometheus scrape Spring Boot Actuator metrics?

Prometheus pulls metrics from your application's /actuator/prometheus endpoint at a configured interval (commonly 15 seconds; Prometheus's global default is one minute). On the Spring Boot side: add the micrometer-registry-prometheus dependency and include prometheus in management.endpoints.web.exposure.include — this auto-configures the endpoint. On the Prometheus side: add a scrape_config block with metrics_path: '/actuator/prometheus' and your application's host and port.

For Kubernetes, use kubernetes_sd_configs with pod annotations instead of static targets. Pods annotated with prometheus.io/scrape: 'true' are auto-discovered. Each scrape returns all current metric values in Prometheus exposition text format. Prometheus stores these as time series in its TSDB.

Common issue: /actuator/prometheus returns 404 despite being in exposure.include — check that micrometer-registry-prometheus is in your pom.xml. It is a separate dependency from spring-boot-starter-actuator.
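A minimal scrape_config sketch (the job name, target, and interval are illustrative):

```yaml
scrape_configs:
  - job_name: 'order-service'             # illustrative job name
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']       # replace with your host:port
    # If the endpoint is secured with HTTP Basic, add:
    # basic_auth:
    #   username: monitoring
    #   password_file: /etc/prometheus/actuator_password
```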

Can I change log levels in a running Spring Boot application without restarting?

Yes. POST to /actuator/loggers/{package.name} with a JSON body of {"configuredLevel": "DEBUG"}. The change takes effect immediately — no restart, no redeploy. To reset to default, POST with {"configuredLevel": null}. Null means inherit from the parent logger — this is different from explicitly setting INFO, which overrides any future configuration changes.

For Hibernate SQL debugging: enable TRACE on org.hibernate.SQL for query text and org.hibernate.type.descriptor.sql for bind parameter values.

Always secure POST with Spring Security restricting it to ADMIN role — TRACE logging generates gigabytes of output per minute and can fill disk quickly. Build the reset command into your incident runbook as a mandatory step, not an optional one.

What information does the /actuator/info endpoint show, and how do I populate it?

By default /actuator/info returns an empty JSON object — it must be explicitly populated. Two sources of data:

Build metadata: add the spring-boot-maven-plugin's build-info execution goal. This generates build-info.properties at compile time containing artifact name, version, and build timestamp. Appears under the build key in the response.

Git metadata: add git-commit-id-plugin to your Maven build. This embeds the commit hash (abbreviated), branch name, and commit time into git.properties at build time. Appears under the git key.

You can also add custom info via application.properties: info.app.description=Order processing service. These appear under the app key. Note that since Spring Boot 2.6, exposing these env-derived info.* properties requires management.info.env.enabled=true.

Use this in CI/CD post-deploy verification: compare git.commit.id from the running instance against the commit hash your pipeline just built to catch ghost deployments.

How do I secure Actuator endpoints in production?

Create a dedicated SecurityFilterChain with @Order(1) that matches /actuator/**. The @Order(1) ensures this chain has higher priority than your main application chain.

Permit /actuator/health/** and /actuator/info — needed by load balancers and Kubernetes probes without authentication. Restrict /actuator/prometheus to MONITORING role with HTTP Basic authentication. Restrict /actuator/loggers, /actuator/env, /actuator/heapdump, and /actuator/threaddump to ADMIN role. Deny everything else.

Also set management.endpoints.web.exposure.include to an explicit whitelist — health, info, prometheus, loggers — never the * wildcard. Disable CSRF for the actuator security chain, since these endpoints are called by automated systems, not browsers. Finally, set management.endpoint.health.show-details to when_authorized or never in application.yml.
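An application.yml sketch of that exposure and health-detail policy (a fragment, not a complete config):

```yaml
# application.yml fragment (illustrative)
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,loggers   # explicit whitelist, never "*"
  endpoint:
    health:
      show-details: when_authorized   # anonymous callers see only UP/DOWN
```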

What is the difference between a Counter, Gauge, and Timer in Micrometer?

A Counter tracks a monotonically increasing value — it only goes up. Use it for total requests, total errors, and orders placed. Query with rate(counter_total[5m]) in Prometheus to get events per second. A Counter is wrong for any value that can decrease.

A Gauge tracks a value that goes up and down. Use it for current queue depth, active connections, and heap usage. You report the current value on demand and Micrometer samples it on each Prometheus scrape. A Gauge is wrong for cumulative counts.

A Timer measures duration and count simultaneously. Use it for request latency, database query time, and payment processing duration. It automatically calculates count, total time, and percentiles. Add publishPercentiles(0.5, 0.95, 0.99) to expose p50, p95, and p99 as quantile-tagged series in Prometheus for alerting.
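Micrometer computes percentiles internally, but the tail-latency point is easy to see with plain Java. A minimal nearest-rank percentile sketch over synthetic numbers (class name and sample values are illustrative):

```java
import java.util.Arrays;

// Plain-Java sketch of the nearest-rank percentile math a Timer performs,
// showing why mean latency hides the tail. All numbers are synthetic.
public class PercentileDemo {

    // Nearest-rank percentile over an ascending-sorted array of latencies (ms).
    static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // 95 requests at 100ms, 5 outliers at 5000ms (already sorted ascending).
        double[] latencies = new double[100];
        Arrays.fill(latencies, 100);
        for (int i = 95; i < 100; i++) latencies[i] = 5000;

        double mean = Arrays.stream(latencies).average().orElse(0);
        System.out.println("mean = " + mean);                        // 345.0 -- looks tolerable
        System.out.println("p50  = " + percentile(latencies, 0.50)); // 100.0
        System.out.println("p99  = " + percentile(latencies, 0.99)); // 5000.0 -- the real signal
    }
}
```

The mean of 345ms looks tolerable while 5% of users are actually waiting 5 full seconds, which is exactly why the alerting advice above targets p95/p99 rather than averages.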

Choose the wrong type and you miss the signal: a Gauge for total orders loses the rate information, a Counter for connection pool size loses the current value, a Timer without percentiles hides the tail latency where the real problems live.

How do I handle Actuator in a Kubernetes environment with multiple replicas?

Configure all three probe types: startupProbe using /actuator/health/liveness with failureThreshold × periodSeconds set to at least 2× your measured startup time. livenessProbe using /actuator/health/liveness with no external dependency checks — JVM state only. readinessProbe using /actuator/health/readiness with dependency checks.
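A deployment-spec sketch of those three probes (port and timings are illustrative; size the startup budget against your measured boot time):

```yaml
# Pod spec fragment (illustrative): probe paths map to Actuator health groups
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  failureThreshold: 30    # 30 x 10s = 300s budget; keep this at least 2x measured startup
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /actuator/health/liveness   # JVM state only, no external dependencies
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness  # includes dependency checks
    port: 8080
  periodSeconds: 10
```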

Enable health groups in application.yml with management.endpoint.health.probes.enabled=true — without this, the liveness and readiness paths return 404.

For metrics scraping, use Prometheus kubernetes_sd_configs with pod annotations rather than static targets. Pods get new IPs on every restart — static targets break on every rolling deploy. Add prometheus.io/scrape: 'true' and prometheus.io/port: '8080' annotations to your pod spec.

Set management.metrics.tags.application=${spring.application.name} so metrics from different service replicas are identifiable in Prometheus and can be filtered or aggregated by service name in Grafana dashboards.

Naren, Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
