Senior 5 min · March 05, 2026

Circuit Breaker Pattern — Timeouts Alone Kill Thread Pools

Thread pool hit 100% in 2 minutes when payment gateway leaked connections.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Circuit Breaker Pattern: a state machine that stops requests to a failing dependency
  • Closed: requests pass, failure counter increments on each failure
  • Open: all requests fail immediately, no network call made, threads freed
  • Half-Open: after timeout, limited probes test if service has recovered
  • Performance insight: failing fast frees threads immediately instead of holding them for the full timeout, which sharply reduces thread pool exhaustion under high failure rates
  • Production insight: thread pool starvation is silent until timeout — circuit breaker prevents it
Plain-English First

Imagine your house has a fuse box. When too many appliances run at once and the wiring gets dangerously hot, the fuse trips and cuts power before your house burns down. You don't keep plugging things in — you wait, fix the problem, then carefully flip the switch back on. A Circuit Breaker in software does exactly this: when a downstream service keeps failing, it 'trips' and stops sending it requests so the whole system doesn't catch fire. It then quietly tests the water before fully reconnecting.

Distributed systems fail in ways that monoliths never do. A single slow database call can hold a thread. A hundred slow calls can hold a thread pool. At that point your entire service — which is otherwise perfectly healthy — is completely unavailable, brought down not by its own bugs but by something it was talking to. This is called cascading failure, and it's responsible for some of the most spectacular production outages in the industry.

The Circuit Breaker pattern exists to break that cascade. Instead of letting your service hammer a failing dependency indefinitely, it interposes a state machine between your code and the remote call. When failures breach a threshold, the breaker opens and subsequent calls fail fast — immediately, without touching the network — giving the downstream system breathing room to recover and protecting your own thread pool from exhaustion.

By the end of this article you'll understand exactly how the three-state machine works under the hood, how to tune failure thresholds and timeout windows without guessing, how to implement a production-grade breaker in Java from scratch, and the real-world gotchas that bite teams even when they think they've set it up correctly. We'll also compare the two dominant counting strategies — count-based and time-based sliding windows — so you can choose the right one for your traffic pattern.

What Is the Circuit Breaker Pattern?

The Circuit Breaker pattern is a state machine that monitors remote calls and opens when failures exceed a threshold. Its primary job is to make calls fail fast, rather than slowly, when a dependency is unhealthy, and to give that dependency time to recover without being flooded with requests.

Think of it as a safety valve. In a closed state, all requests pass through normally. Each failure increments a counter. When the counter hits the configured threshold, the breaker trips to open, and subsequent requests are rejected immediately with an exception. After a recovery timeout, the breaker transitions to half-open, allowing a limited number of probe requests. If these succeed, the breaker closes again. If they fail, it reopens.

The pattern decouples error handling from business logic. You don't have to write try-catch blocks in every method that calls an external service. Instead, the circuit breaker centralises failure detection and recovery.
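A minimal sketch of that centralisation follows. It assumes a simple consecutive-failure breaker with no OPEN-state recovery: every remote call goes through one `execute` method that records the outcome, so call sites need no try/catch of their own. `GuardedCall` and its fallback parameter are hypothetical names, not part of the article's code or any library.

```java
import java.util.function.Supplier;

// Hypothetical minimal breaker: trips after `threshold` consecutive failures.
// No OPEN -> HALF_OPEN recovery here; the sketch only shows centralised failure handling.
final class GuardedCall {
    private final int threshold;
    private int consecutiveFailures = 0;

    GuardedCall(int threshold) {
        this.threshold = threshold;
    }

    synchronized boolean isOpen() {
        return consecutiveFailures >= threshold;
    }

    // Every remote call goes through here, so failure detection lives in one place.
    <T> T execute(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();                            // fail fast: remoteCall is never invoked
        }
        try {
            T result = remoteCall.get();
            synchronized (this) { consecutiveFailures = 0; }  // a success resets the count
            return result;
        } catch (RuntimeException e) {
            synchronized (this) { consecutiveFailures++; }    // record the failure centrally
            return fallback.get();
        }
    }
}
```

At a call site this reads as `guard.execute(() -> client.fetchProfile(id), () -> Profile.EMPTY)`, where `client` and `Profile.EMPTY` are placeholders for your own types.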

io/thecodeforge/circuitbreaker/CircuitBreaker.java (Java)
package io.thecodeforge.circuitbreaker;

import java.time.Duration;
import java.time.Instant;

// Package-private: a .java file may declare only one public top-level type.
enum CircuitBreakerState {
    CLOSED,
    OPEN,
    HALF_OPEN
}

public class CircuitBreaker {
    private final int failureThreshold;        // consecutive failures that trip the breaker
    private final long recoveryTimeoutMs;      // how long to stay OPEN before probing
    private CircuitBreakerState state = CircuitBreakerState.CLOSED;
    private int failureCount = 0;
    private Instant lastFailureTime = Instant.EPOCH;
    private boolean probeInFlight = false;     // HALF_OPEN admits one probe at a time

    public CircuitBreaker(int failureThreshold, long recoveryTimeoutMs) {
        this.failureThreshold = failureThreshold;
        this.recoveryTimeoutMs = recoveryTimeoutMs;
    }

    public synchronized boolean isRequestAllowed() {
        if (state == CircuitBreakerState.CLOSED) {
            return true;
        }
        if (state == CircuitBreakerState.OPEN) {
            if (Duration.between(lastFailureTime, Instant.now()).toMillis() < recoveryTimeoutMs) {
                return false;                  // still cooling down: fail fast
            }
            state = CircuitBreakerState.HALF_OPEN;
        }
        // HALF_OPEN: admit exactly one probe; reject everyone else until it reports back
        if (probeInFlight) {
            return false;
        }
        probeInFlight = true;
        return true;
    }

    public synchronized void recordFailure() {
        failureCount++;
        lastFailureTime = Instant.now();       // (re)starts the recovery timer
        probeInFlight = false;
        // A failed probe reopens immediately; in CLOSED, open once the threshold is hit
        if (state == CircuitBreakerState.HALF_OPEN || failureCount >= failureThreshold) {
            state = CircuitBreakerState.OPEN;
        }
    }

    public synchronized void recordSuccess() {
        failureCount = 0;                      // reset so old failures never accumulate
        probeInFlight = false;
        if (state == CircuitBreakerState.HALF_OPEN) {
            state = CircuitBreakerState.CLOSED;   // simplified: one good probe closes the breaker
        }
    }

    public synchronized CircuitBreakerState getState() { return state; }
}
Output
The state machine tracks failures and transitions between CLOSED, OPEN, and HALF_OPEN. Simplified version for illustration — production implementations often use sliding windows and concurrent probes.
Why it's called a circuit breaker
  • Failures = current overload
  • Open state = tripped breaker, no current flows
  • Half-open state = attempt to reset breaker
  • Closed state = normal flow after reset
Production Insight
Circuit breakers protect your service's thread pool, not just the downstream system.
A thread pool that's 100% blocked on slow calls recovers slowly even after the downstream recovers — because all threads must complete their blocking calls first.
Always set a separate thread pool for the circuit-breaker-protected call to avoid cross-contamination.
Rule: isolate each dependency's circuit breaker into its own thread pool.
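A minimal sketch of that isolation rule using plain `java.util.concurrent` rather than Resilience4j's Bulkhead: each dependency gets its own small `ThreadPoolExecutor` with a bounded queue, so a hung dependency exhausts only its own pool. `DependencyPools`, the dependency names, and the pool sizes are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// One bounded pool per downstream dependency (a hand-rolled bulkhead).
final class DependencyPools {
    private final Map<String, ThreadPoolExecutor> pools = new ConcurrentHashMap<>();
    private final int threadsPerDependency;
    private final int queuePerDependency;

    DependencyPools(int threadsPerDependency, int queuePerDependency) {
        this.threadsPerDependency = threadsPerDependency;
        this.queuePerDependency = queuePerDependency;
    }

    private ThreadPoolExecutor poolFor(String dependency) {
        return pools.computeIfAbsent(dependency, name -> new ThreadPoolExecutor(
                threadsPerDependency, threadsPerDependency,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queuePerDependency),
                new ThreadPoolExecutor.AbortPolicy()));   // busy threads + full queue => reject at once
    }

    // Throws RejectedExecutionException when this dependency's pool is saturated;
    // pools for other dependencies are unaffected.
    <T> Future<T> submit(String dependency, Callable<T> call) {
        return poolFor(dependency).submit(call);
    }

    void shutdownNow() {
        pools.values().forEach(ThreadPoolExecutor::shutdownNow);
    }
}
```

The rejection is the point of the design: a saturated "payments" pool refuses work immediately instead of queuing it without bound, while calls to "inventory" keep their own threads.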
Key Takeaway
A circuit breaker centralises failure detection into a state machine.
Fail-fast beats fail-slow every time in production.
The breaker is a resource protector, not a retry mechanism.
When to use a circuit breaker
  • If the service calls a remote dependency that may fail intermittently: use a circuit breaker to fail fast and protect resources.
  • If failures are transient and short-lived (e.g., a network blip): use retry with exponential backoff instead; a circuit breaker alone is too coarse.
  • If the dependency is an internal microservice with SLAs: a circuit breaker is still a good safety net even with retries. Combine both.
  • If failures are due to downstream unavailability (e.g., a crash): a circuit breaker plus a fallback (e.g., a cached response) provides the best UX.

The Three States and Their Transitions

CLOSED — Normal operation. All requests pass through. Each failure increments an internal counter. When the counter reaches the threshold, the breaker transitions to OPEN. In a count-based window, failures are counted within a fixed number of requests (e.g., 5 failures out of the last 10 requests). In time-based windows, failures are counted within a time window (e.g., 5 failures in the last 10 seconds).

OPEN — Requests are rejected immediately without calling the downstream service. The breaker remains open for a configurable recovery timeout. After this timeout, it transitions to HALF_OPEN.
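The OPEN-state timer is easiest to reason about, and to test, when the clock is injected. A minimal sketch under that assumption: the timer takes a `LongSupplier` of milliseconds instead of reading the system clock directly. `OpenStateTimer` is a hypothetical name, not a library type, and it models only the OPEN to HALF_OPEN timing (probe limiting is a separate concern).

```java
import java.util.function.LongSupplier;

// Tracks only the OPEN -> HALF_OPEN timing; the clock is injected for testability.
final class OpenStateTimer {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long recoveryTimeoutMs;
    private final LongSupplier clockMs;
    private State state = State.CLOSED;
    private long openedAtMs;

    OpenStateTimer(long recoveryTimeoutMs, LongSupplier clockMs) {
        this.recoveryTimeoutMs = recoveryTimeoutMs;
        this.clockMs = clockMs;
    }

    // Called when the failure threshold is reached (or a half-open probe fails).
    synchronized void trip() {
        state = State.OPEN;
        openedAtMs = clockMs.getAsLong();   // (re)start the recovery timer
    }

    // Rejects calls while OPEN; moves to HALF_OPEN once the timeout has elapsed.
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clockMs.getAsLong() - openedAtMs >= recoveryTimeoutMs) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN;
    }

    synchronized State state() { return state; }
}
```

In production the supplier would simply be `System::currentTimeMillis`; in tests it can be a fake clock advanced by hand.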

HALF_OPEN — A limited number of probe requests are allowed through. If the probes succeed, the breaker transitions back to CLOSED (and resets the failure count). If a probe fails, the breaker returns to OPEN and restarts the recovery timeout. The number of probes and the success threshold are configurable.

The transition from HALF_OPEN to CLOSED should require a minimum number of consecutive successes (e.g., 3) to prevent flaps. A single success is not enough — one probe could succeed by luck while the downstream is still degraded.
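A minimal sketch of that HALF_OPEN rule, assuming a fixed number of permitted probes and a required count of consecutive successes. `HalfOpenGate` and its parameters are hypothetical; Resilience4j, for comparison, lets through `permittedNumberOfCallsInHalfOpenState` calls and evaluates their failure rate.

```java
// Decides the outcome of one HALF_OPEN phase; create a fresh gate each time the breaker enters HALF_OPEN.
final class HalfOpenGate {
    enum Decision { KEEP_PROBING, CLOSE, REOPEN }

    private final int permittedProbes;       // how many callers may reach the downstream
    private final int requiredSuccesses;     // consecutive successes needed to close
    private int probesIssued = 0;
    private int consecutiveSuccesses = 0;

    HalfOpenGate(int permittedProbes, int requiredSuccesses) {
        if (requiredSuccesses > permittedProbes) {
            throw new IllegalArgumentException("requiredSuccesses must not exceed permittedProbes");
        }
        this.permittedProbes = permittedProbes;
        this.requiredSuccesses = requiredSuccesses;
    }

    // Only the first `permittedProbes` callers are let through; the rest fail fast.
    synchronized boolean tryAcquireProbe() {
        if (probesIssued >= permittedProbes) return false;
        probesIssued++;
        return true;
    }

    // Any failed probe reopens the breaker, so successes counted here are always consecutive.
    synchronized Decision onProbeResult(boolean success) {
        if (!success) return Decision.REOPEN;
        consecutiveSuccesses++;
        return consecutiveSuccesses >= requiredSuccesses ? Decision.CLOSE : Decision.KEEP_PROBING;
    }
}
```

With `new HalfOpenGate(3, 3)`, one lucky probe is not enough: the breaker closes only after three good probes in a row.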

io/thecodeforge/circuitbreaker/StateMachineTransition.java (Java)
package io.thecodeforge.circuitbreaker;

import java.util.Optional;

public class StateMachineTransition {
    public enum Transition {
        CLOSED_TO_OPEN,
        OPEN_TO_HALF_OPEN,
        HALF_OPEN_TO_CLOSED,
        HALF_OPEN_TO_OPEN
    }

    // Returns the transition to apply, or Optional.empty() if the breaker stays in its current state.
    public Optional<Transition> evaluate(CircuitBreakerState current, int failureCount, int threshold,
                                         long elapsedSinceLastFailure, long timeout) {
        switch (current) {
            case CLOSED:
                if (failureCount >= threshold) return Optional.of(Transition.CLOSED_TO_OPEN);
                return Optional.empty();   // below threshold: stay CLOSED
            case OPEN:
                if (elapsedSinceLastFailure >= timeout) return Optional.of(Transition.OPEN_TO_HALF_OPEN);
                return Optional.empty();   // still cooling down: stay OPEN
            case HALF_OPEN:
                // Simplified: failureCount counts failed probes since entering HALF_OPEN.
                // Production code would also require N consecutive probe successes before closing.
                return Optional.of(failureCount == 0 ? Transition.HALF_OPEN_TO_CLOSED
                                                     : Transition.HALF_OPEN_TO_OPEN);
            default:
                throw new IllegalStateException("Unhandled state: " + current);
        }
    }
}
Output
Transitions are deterministic based on failure counts and timers. The HALF_OPEN state acts as a liveness check.
Common mistake: probing with a different payload
When in HALF_OPEN, the probe request must be identical to a real request — including authentication headers, payload, and routing. A lightweight health endpoint does not test the actual service path. This leads to false positives: the breaker closes, but real requests fail.
Production Insight
The OPEN state often catches operators off guard because requests fail with an exception, not a timeout.
During an outage, the sudden 100% failure rate can seem worse than the original slow degradation — but it's actually protecting the system.
Alert on OPEN transitions to detect downstream failures early.
Rule: every OPEN transition should trigger a PagerDuty alert.
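One way to wire that alerting rule is to have the breaker publish every state transition to listeners and let an alerting listener react only to transitions into OPEN. A minimal, single-threaded sketch; `TransitionPublisher` and the `pageOnCall` callback are hypothetical stand-ins for your breaker and paging integration (Resilience4j exposes state-transition events through its own event publisher).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Publishes (from, to) state transitions to listeners. Not thread-safe; kept minimal.
final class TransitionPublisher {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final List<BiConsumer<State, State>> listeners = new ArrayList<>();
    private State state = State.CLOSED;

    void onTransition(BiConsumer<State, State> listener) {
        listeners.add(listener);
    }

    // Called by the breaker whenever its state machine moves.
    void transitionTo(State next) {
        State previous = state;
        if (previous == next) return;        // no event for a non-transition
        state = next;
        for (BiConsumer<State, State> l : listeners) l.accept(previous, next);
    }

    // Listener that pages only when the breaker opens (from CLOSED or from a failed HALF_OPEN probe).
    static BiConsumer<State, State> alertOnOpen(String breakerName, Consumer<String> pageOnCall) {
        return (from, to) -> {
            if (to == State.OPEN) {
                pageOnCall.accept("Circuit breaker '" + breakerName + "' opened (was " + from + ")");
            }
        };
    }
}
```

Because HALF_OPEN to OPEN also pages, a flapping breaker can generate repeated alerts; deduplicating in the paging system keeps that manageable.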
Key Takeaway
Three states, two transitions that matter: OPEN→HALF_OPEN is time-based, HALF_OPEN→CLOSED is success-based.
The half-open probe must mirror real traffic.
Don't flip back to CLOSED on a single success — require a minimum of 2–3 consecutive successes.
Choosing recovery timeout
  • If the downstream service restarts in ~10 seconds: set the recovery timeout to 15–20 seconds to allow full startup.
  • If the downstream is a database that might need slow query recovery: use a recovery timeout of at least 30 seconds to allow query cache warmup.
  • If the downstream is a third-party API with unpredictable recovery: start with 60 seconds and tune based on historical recovery data.
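These examples line up with the rule of thumb stated later in this article: a recovery timeout of at least 1.5x the downstream's P99 recovery time. A small sketch that derives the value from historical recovery durations; the nearest-rank percentile method, the floor parameter, and the class name are assumptions for illustration.

```java
import java.time.Duration;
import java.util.Arrays;

final class RecoveryTimeout {
    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    static long percentileMs(long[] samplesMs, double p) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);   // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    // Recovery timeout = 1.5 x P99 of observed recovery times, never below a floor.
    static Duration suggest(long[] observedRecoveryMs, Duration floor) {
        long p99 = percentileMs(observedRecoveryMs, 99.0);
        long suggested = (long) Math.ceil(p99 * 1.5);
        return Duration.ofMillis(Math.max(suggested, floor.toMillis()));
    }
}
```

With a P99 restart time of 10 seconds this yields 15 seconds, the low end of the first bullet above.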

Implementing a Circuit Breaker in Java: Production-Grade Approach

Building a circuit breaker from scratch is educational, but for production you should use a battle-tested library. Two popular choices in Java are Resilience4j and Spring Cloud Circuit Breaker (an abstraction that can delegate to Resilience4j, among other implementations). The following example uses Resilience4j, whose circuit breaker provides sliding-window failure counting and event listeners; thread pool isolation comes from its companion Bulkhead module.

Resilience4j's circuit breaker supports two counting strategies:
  • Count-based: failures in the last N calls (e.g., the last 10 calls)
  • Time-based: failures within a time window (e.g., the last 10 seconds)

Each strategy has its own sliding window implementation. According to the Resilience4j documentation, the count-based window is a circular array of N measurements, while the time-based window is a circular array of partial aggregations, one bucket per second of the window. Both update running totals incrementally, so recording a call is O(1), and memory grows with the window size rather than with request volume.
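A minimal sketch of the count-based approach: a fixed-size ring of outcomes plus a running failure total, so recording a call and reading the failure rate are both O(1). It is modeled loosely on the design described above, not copied from Resilience4j's source; `CountWindow` and its methods are hypothetical names.

```java
// Count-based sliding window over the last `size` calls.
final class CountWindow {
    private final boolean[] failed;   // ring buffer: true = failed call
    private int next = 0;             // slot the next outcome overwrites
    private int recorded = 0;         // calls seen so far, capped at size
    private int failures = 0;         // running total for the slots currently in the window

    CountWindow(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.failed = new boolean[size];
    }

    synchronized void record(boolean callFailed) {
        if (recorded == failed.length && failed[next]) {
            failures--;               // evict the oldest outcome before overwriting it
        }
        failed[next] = callFailed;
        if (callFailed) failures++;
        next = (next + 1) % failed.length;
        if (recorded < failed.length) recorded++;
    }

    // Failure rate in percent over the calls currently in the window.
    synchronized double failureRatePercent() {
        return recorded == 0 ? 0.0 : 100.0 * failures / recorded;
    }

    synchronized int recordedCalls() { return recorded; }
}
```

With a window of 4, two failures followed by two successes give 50%; each further success pushes one old failure out of the window.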

io/thecodeforge/circuitbreaker/PaymentServiceWithBreaker.java (Java)
package io.thecodeforge.circuitbreaker;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class PaymentServiceWithBreaker {

    private final CircuitBreaker circuitBreaker;
    // PaymentClient, PaymentRequest and PaymentResult are your application's own types (not shown)
    private final PaymentClient paymentClient;

    public PaymentServiceWithBreaker(PaymentClient paymentClient) {
        this.paymentClient = paymentClient;
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
                .slidingWindowSize(10)                  // evaluate the last 10 calls
                .minimumNumberOfCalls(5)                // no decision before 5 calls
                .failureRateThreshold(50)               // >= 50% failures -> OPEN
                // Only these exception types (and subclasses) are recorded as failures;
                // check which types your client actually throws.
                .recordExceptions(TimeoutException.class, IOException.class)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .permittedNumberOfCallsInHalfOpenState(3)
                .build();

        this.circuitBreaker = CircuitBreakerRegistry.ofDefaults()
                .circuitBreaker("paymentService", config);
    }

    public PaymentResult processPayment(PaymentRequest request) {
        // While OPEN, get() throws CallNotPermittedException without calling the gateway
        Supplier<PaymentResult> decorated = CircuitBreaker.decorateSupplier(
            circuitBreaker, () -> callPaymentGateway(request));
        return decorated.get();
    }

    private PaymentResult callPaymentGateway(PaymentRequest request) {
        // actual HTTP call
        return paymentClient.charge(request);
    }
}
Output
Resilience4j covers the production features: sliding windows, half-open probes, event streaming for monitoring, and (through its separate Bulkhead module) thread pool isolation.
Why minimumNumberOfCalls matters
The circuit breaker only evaluates the failure rate after at least minimumNumberOfCalls have been recorded. This prevents a small sample (e.g., first 2 requests both fail) from tripping the breaker too early. Set this to at least 5 for a medium-traffic service.
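The gating rule reduces to a small predicate. A sketch assuming a failure-rate threshold in percent, as in the Resilience4j configuration above; `TripRule.shouldOpen` is a hypothetical helper, not a library method.

```java
final class TripRule {
    // Open only when enough calls have been seen AND the failure rate meets the threshold.
    static boolean shouldOpen(int failedCalls, int totalCalls,
                              int minimumNumberOfCalls, double failureRateThresholdPercent) {
        if (totalCalls == 0 || totalCalls < minimumNumberOfCalls) {
            return false;             // too few samples: 2 failures out of 2 proves little
        }
        double failureRate = 100.0 * failedCalls / totalCalls;
        return failureRate >= failureRateThresholdPercent;
    }
}
```

With `minimumNumberOfCalls = 5` and a 50% threshold, two failures on a cold start keep the breaker closed, while 3 failures out of 5 calls open it.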
Production Insight
Resilience4j's CircuitBreaker has no thread pool of its own: the decorated call runs on whatever thread invokes it. If calls to several dependencies share one executor, a slow dependency can tie up the threads the others need.
Create a separate thread pool per downstream dependency (or per group) to isolate failures.
Monitor the thread pool queue depth: if it builds up, the downstream is slow even before the circuit breaker opens.
Rule: one thread pool + circuit breaker pair per unique dependency.
Key Takeaway
Use a library like Resilience4j for production — don't write your own.
Sliding window choice affects how fast the breaker reacts to failure patterns.
Always configure minimumNumberOfCalls to avoid premature opening on cold start.
Resilience4j sliding window type selection
  • If traffic is constant and predictable (steady rate): COUNT_BASED, which is simpler with lower memory overhead.
  • If traffic has bursts or lulls (e.g., batch jobs, spikes): TIME_BASED, which is more accurate because it considers recent history regardless of request rate.
  • If you have low request volume (< 1 req/sec): TIME_BASED with a window of at least 60 seconds to gather enough samples.

Count-Based vs Time-Based Sliding Windows: The Right Strategy for Your Traffic

The sliding window strategy determines how failures are aggregated. Count-based windows consider the last N requests. Time-based windows consider all requests within the last T duration. Both have trade-offs that matter in production.

Count-based is simple: keep a circular buffer of the last N call results. Each new call overwrites the oldest, and failure rate = failures / N. This works well when the request rate is roughly constant. During low traffic, however, the window fills slowly and can hold results from long ago, and a burst of recent failures may not trigger the breaker if older successes dilute the rate.

Time-based windows evict results once they are older than the window duration, so they adapt naturally to traffic variations: during a spike the window fills quickly; during a lull it decays. A naive implementation that keeps a timestamp for every call needs memory proportional to the number of requests in the window (O(requestsInWindow)), versus O(windowSize) for count-based. Resilience4j avoids this by aggregating calls into one bucket per second, so its memory depends on the window duration, not the request rate.
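A minimal sketch of a bucketed time-based window, assuming one-second buckets (the approach Resilience4j documents for TIME_BASED) and an injected clock so it can be tested. `BucketedTimeWindow` is a hypothetical name; memory is one small bucket per second of window, regardless of request rate.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Time-based window of `seconds` one-second buckets: memory is O(seconds), not O(requests).
final class BucketedTimeWindow {
    private final long[] bucketSecond;   // which epoch second each bucket currently holds
    private final int[] calls;
    private final int[] failures;
    private final LongSupplier clockMs;

    BucketedTimeWindow(int seconds, LongSupplier clockMs) {
        this.bucketSecond = new long[seconds];
        this.calls = new int[seconds];
        this.failures = new int[seconds];
        this.clockMs = clockMs;
        Arrays.fill(bucketSecond, Long.MIN_VALUE);   // all buckets start empty
    }

    synchronized void record(boolean callFailed) {
        long second = Math.floorDiv(clockMs.getAsLong(), 1000L);
        int i = (int) Math.floorMod(second, (long) calls.length);
        if (bucketSecond[i] != second) {             // bucket holds an older second: reuse it
            bucketSecond[i] = second;
            calls[i] = 0;
            failures[i] = 0;
        }
        calls[i]++;
        if (callFailed) failures[i]++;
    }

    // Failure rate in percent over buckets whose second is still inside the window.
    synchronized double failureRatePercent() {
        long now = Math.floorDiv(clockMs.getAsLong(), 1000L);
        int totalCalls = 0;
        int totalFailures = 0;
        for (int i = 0; i < calls.length; i++) {
            if (bucketSecond[i] != Long.MIN_VALUE && now - bucketSecond[i] < calls.length) {
                totalCalls += calls[i];
                totalFailures += failures[i];
            }
        }
        return totalCalls == 0 ? 0.0 : 100.0 * totalFailures / totalCalls;
    }
}
```

Failures recorded at second 0 of a 10-second window stop counting at second 10, however many or few calls arrived in between.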

Which one should you use? If your traffic is uniform (e.g., 100 req/s constantly), count-based is fine. If your traffic is bursty (e.g., periodic batch jobs that drive request spikes), time-based is more accurate because it measures real time, not request count.

Production Insight
A common production mistake: using count-based with a small window (e.g., 5) on a low-traffic service. If only one request arrives every couple of minutes, the window can hold results from 10 minutes ago, which is stale data. With a failure-rate threshold of, say, 80%, the breaker does not open even if the last 3 requests failed, because 3 out of 5 is only 60% while 2 stale successes remain in the window. Use time-based with at least 60 seconds for low-traffic services.
Another trap: a naive time-based implementation that stores every request's timestamp until eviction can consume significant memory at very high request rates (e.g., 10k req/s) with a long window. Resilience4j's per-second buckets avoid this, but hand-rolled windows often do not.
Rule: for high-volume systems, prefer count-based with a large enough window (100+); for variable traffic, use time-based.
Key Takeaway
Count-based is cheaper, time-based is more accurate under variable traffic.
Choose based on your request arrival distribution, not dogma.
Always test the window choice with production traffic replay before deploying.
Sliding window strategy decision
  • If the request rate is stable (> 10 req/sec): count-based, window size 20–100.
  • If the request rate varies by a factor of 10 or more: time-based, window duration 10–60 seconds.
  • If throughput is very high (> 1000 req/sec): count-based, window size 100–1000 to limit memory.
  • If throughput is low (< 1 req/sec): time-based, window of at least 60 seconds to collect a meaningful sample.

Production Gotchas: What Bites Teams That Think They've Set It Up Correctly

Even with a working circuit breaker, teams hit common pitfalls that cause outages. Here are the six most dangerous ones.

1. Circuit breaker on timeout only, not on exception type
Many configurations only count timeouts as failures. But network errors, 5xx responses, and even 429 rate limits should be counted too. If you only count timeouts, a service that quickly returns 503 errors will never trip the breaker.

2. Half-open probes that don't match real traffic
The probe request is often a simple health check, but the real failure could be a specific endpoint that's slow. Make the probe a representative call, or better, let real calls act as probes by wrapping the same method in a decorator that records success or failure on every call (including in HALF_OPEN).

3. Not isolating thread pools per circuit breaker
If all protected calls share one thread pool, a single slow dependency can tie up its threads before its breaker opens, starving calls to every other dependency. Separate thread pools per dependency (for example, Resilience4j's ThreadPoolBulkhead) prevent this.

4. Recovery timeout too short
Setting the open state duration to 5 seconds on a database that takes 30 seconds to restart causes continuous open/half-open flapping. Recovery timeout should be at least the P99 recovery time of the downstream service, plus 50%.

5. Forgetting to reset failures on success
Some custom implementations never reset the failure count on a successful call while in CLOSED state. This causes the breaker to open after X total failures, even if they occurred days apart. Always reset the failure count after a successful call if you're using a count-based approach (or rely on a sliding window).

6. No fallback mechanism
Circuit breakers reject requests when open. If you don't provide a fallback (e.g., a cached response or a default value), the user gets an error. Combine the circuit breaker with a fallback method for a better user experience.
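Gotcha 6 can be sketched as a fallback that serves the last successful response when the breaker rejects a call (or the call fails), and a fixed default when nothing is cached yet. The breaker is reduced to a `BooleanSupplier` so the example stays self-contained, and outcome recording is omitted; `CachedFallback` is a hypothetical name. Serving cached data suits reads; a write such as a payment charge should return a clear "try again later" message instead of a replayed result.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class CachedFallback<T> {
    private final BooleanSupplier breakerAllowsCall;   // stand-in for breaker.isRequestAllowed()
    private final T defaultValue;
    private final AtomicReference<T> lastGood = new AtomicReference<>();

    CachedFallback(BooleanSupplier breakerAllowsCall, T defaultValue) {
        this.breakerAllowsCall = breakerAllowsCall;
        this.defaultValue = defaultValue;
    }

    T get(Supplier<T> remoteCall) {
        if (breakerAllowsCall.getAsBoolean()) {
            try {
                T fresh = remoteCall.get();
                lastGood.set(fresh);                   // remember the last good answer
                return fresh;
            } catch (RuntimeException e) {
                // fall through to the degraded response below
            }
        }
        T cached = lastGood.get();
        return cached != null ? cached : defaultValue;
    }
}
```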

io/thecodeforge/circuitbreaker/GotchaExample.java (Java)
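package io.thecodeforge.circuitbreaker;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.io.IOException;
import java.util.concurrent.TimeoutException;

public class GotchaExample {

    // Hypothetical exception your HTTP client wrapper throws for 5xx and 429 responses.
    // The breaker only sees exceptions here, so HTTP error statuses must be turned into one.
    public static class DownstreamHttpException extends RuntimeException {
        private final int statusCode;
        public DownstreamHttpException(int statusCode) {
            super("Downstream returned HTTP " + statusCode);
            this.statusCode = statusCode;
        }
        public int statusCode() { return statusCode; }
    }

    // CORRECT: records I/O errors, timeouts, and HTTP error responses
    CircuitBreakerConfig config = CircuitBreakerConfig.custom()
        .recordExceptions(IOException.class, TimeoutException.class, DownstreamHttpException.class)
        .build();

    // WRONG: only records timeouts; a fast 503 never counts as a failure
    CircuitBreakerConfig wrongConfig = CircuitBreakerConfig.custom()
        .recordExceptions(TimeoutException.class)
        .build();
}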
Output
Record all failure signals: exceptions, HTTP error codes, and rate limits. A 503 is a failure even if it returns in 5ms.
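An HTTP client typically returns a 503 as an ordinary response rather than an exception, so something has to decide which status codes count against the breaker. A minimal sketch of that classification following the article's list (5xx and 429 count); treating other 4xx responses as the caller's fault is this sketch's policy choice, and `FailureClassifier` is a hypothetical name.

```java
final class FailureClassifier {
    // True for responses that should count against the breaker: any 5xx, plus 429 rate limiting.
    static boolean isBreakerFailure(int statusCode) {
        return (statusCode >= 500 && statusCode <= 599) || statusCode == 429;
    }

    // Failure rate (percent) of a batch of status codes, using the rule above.
    static double failureRatePercent(int... statusCodes) {
        if (statusCodes.length == 0) return 0.0;
        int failures = 0;
        for (int code : statusCodes) {
            if (isBreakerFailure(code)) failures++;
        }
        return 100.0 * failures / statusCodes.length;
    }
}
```

A client wrapper would call `isBreakerFailure` right after each response and throw the exception type the breaker is configured to record.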
Production Insight
The worst production failure I've seen: a team spent two weeks tuning circuit breakers per microservice, then forgot to add a fallback. When the payment gateway circuit opened, users got a raw 500 with a stack trace. The fix was a 3-line fallback returning a cached 'service unavailable' message.
Another team set recovery timeout to 5 seconds on a database that runs checkpoints every 30 seconds. The breaker flapped open/closed 12 times per minute, generating thousands of alerts.
Rule: always pair a circuit breaker with a meaningful fallback, and set recovery timeout to at least 1.5x the expected recovery time.
Key Takeaway
Six gotchas, six rules: record all failures, probe real traffic, isolate thread pools, set long enough recovery, reset counts on success, and always provide a fallback.
A circuit breaker without a fallback is just a faster error.
Gotcha prevention checklist
  • Do you record all relevant failure signals? Include exceptions, HTTP error codes, and rate limits.
  • Does the half-open probe match real traffic? Use a decorator over the real method, not a separate health endpoint.
  • Is there a fallback for when the breaker is open? Provide a cached response, default value, or degraded experience.
  • Is the recovery timeout long enough? Make it at least 1.5x the P99 recovery time of the downstream.
● Production incident · POST-MORTEM · severity: high

The Day the Thread Pool Died

Symptom
All checkout requests time out after 30 seconds. No obvious error in the payment gateway logs. Thread pool metrics show 100% active threads, all waiting on the payment service.
Assumption
The payment gateway is slow but still processing — maybe we just need to bump the timeout. The database and other services are fine.
Root cause
The payment gateway had a connection pool leak, causing all connections to hang for 60 seconds. Without a circuit breaker, every incoming request took a worker thread that then blocked on the same downstream call. The thread pool was exhausted in under 2 minutes.
Fix
Added a circuit breaker with a 5-failure threshold and 30-second recovery timeout. After the breaker opens, calls fail instantly, and the thread pool stays available for other operations. Configured a separate thread pool for payment calls to isolate failures.
Key lesson
  • Always wrap every remote call in a circuit breaker — even "reliable" internal services fail
  • Thread pool exhaustion is a silent killer; monitor thread pool usage with alerts at 80%
  • Timeouts alone are not enough — they just make the failure slower
Production debug guide: symptoms, actions, and commands to diagnose circuit breaker issues in production
Symptom · 01
Error rate spikes to 100% on a specific endpoint
Fix
Check if circuit breaker is open by inspecting logs for 'circuit breaker open' messages. If open, check health of downstream service. If closed, check failure counter and threshold.
Symptom · 02
Requests timeout after a consistent delay (e.g., 30s)
Fix
Verify the circuit breaker timeout window. A half-open state with a long timeout can cause all requests to wait for the probe result.
Symptom · 03
Thread pool metrics show high active threads but low CPU
Fix
Look for blocked I/O calls. Circuit breaker should be open — if not, the failure threshold may be too high or the counting window too large.
Symptom · 04
Intermittent failures even though downstream is healthy
Fix
Check if the half-open probe request is failing due to missing auth or payload mismatch. The probe path must exactly mirror a real request.
Symptom · 05
Circuit breaker toggles rapidly between open and closed
Fix
The recovery timeout may be too short (breaker reopens immediately). Increase it to at least the downstream service's average recovery time plus buffer.
★ Cheat Sheet: Debugging Circuit Breaker Failures Fast
Quick commands and checks for common circuit breaker problems in production.
Circuit breaker never opens
Immediate action
Check the failure count and threshold configuration. Ensure failures are being recorded correctly.
Commands
kubectl logs -l app=checkout --tail=100 | grep -i "circuit\|breaker"
curl localhost:8080/actuator/health | jq '.components.circuitBreakers'
Fix now
Lower the failure threshold if it is too lenient for the observed error rate, and check that the exception types your client actually throws are mapped as recorded failures.
Circuit breaker stays open indefinitely
Immediate action
Verify the half-open probe configuration. The recovery timeout may be unreachable or the probe request format wrong.
Commands
curl http://localhost:8080/actuator/circuitbreakers
cat config/application.yml | grep -A5 "circuitbreaker"
Fix now
Reduce recovery timeout if downstream recovers quickly, or ensure probe request includes authentication headers.
Metric shows many half-open failures
Immediate action
Check if the downstream service is intermittently failing. The recovery time may be insufficient.
Commands
kubectl exec -it deploy/payment-gateway -- curl localhost:8081/health
promql: rate(circuit_breaker_half_open_failures_total[5m])
Fix now
Increase the number of probe requests (e.g., 3 probes before closing) and monitor success rate.
Circuit Breaker Strategies vs Retry vs Timeout
| Pattern | Primary Goal | When to Use | Risk |
|---|---|---|---|
| Circuit Breaker | Fail fast, protect resources | Remote calls with intermittent failures | Premature opening, false positives |
| Retry with Backoff | Handle transient failures | Network blips, temporary unavailability | Exacerbated load (thundering herd) |
| Timeout | Limit wait time | All remote calls | Thread pool exhaustion without a breaker |
| Fallback | Provide degraded response | When breaker open or retries exhausted | Stale data, user confusion |

Key takeaways

1. Circuit breaker is a state machine that opens when failures exceed a threshold, giving downstream time to recover.
2. Fail-fast protects your thread pool: a blocked thread is worse than a quick error.
3. Half-open probes must mirror real traffic, not a separate health endpoint.
4. Choose sliding window type based on traffic pattern: count-based for steady, time-based for variable.
5. Always provide a fallback when the breaker is open.
6. Resilience4j (or equivalent) is production-ready; don't hand-roll for critical paths.

Common mistakes to avoid


Setting failure threshold too high

Symptom
Circuit breaker never opens; thread pool exhausts before breaker trips
Fix
Set threshold to 50% failure rate over the last 10–20 requests. Adjust based on normal error rate.

Using a generic health endpoint for half-open probes

Symptom
Half-open probe succeeds, but real request fails — breaker closes prematurely
Fix
Use the same method call with a decorator that records success/failure even in half-open state. Never use a separate health check.

Sharing thread pool across all circuit breakers

Symptom
One slow dependency starves the shared pool, affecting all other services
Fix
Use Resilience4j's Bulkhead to create a separate thread pool per circuit breaker group.

Forgetting to reset failure count on success in custom implementations

Symptom
Breaker opens after X cumulative failures, even if they happened weeks apart
Fix
Use a sliding window implementation (Resilience4j's built-in) that naturally ages out old failures.

Not providing a fallback

Symptom
Users see raw 500 errors when breaker opens
Fix
Always implement a fallback method that returns a cached reply, default value, or degraded message.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): Explain the three states of a circuit breaker and how transitions happen.
Q02 (Senior): What's the difference between count-based and time-based sliding windows...
Q03 (Senior): How would you debug a circuit breaker that never opens despite downstrea...

Q01 of 03 (Senior)

Explain the three states of a circuit breaker and how transitions happen.

ANSWER
The three states are CLOSED (normal operation, failures counted), OPEN (fail fast, no calls to downstream), and HALF_OPEN (probing for recovery after timeout). Transition: CLOSED → OPEN when failure threshold is reached; OPEN → HALF_OPEN after recovery timeout expires; HALF_OPEN → CLOSED after a configurable number of consecutive probe successes; HALF_OPEN → OPEN if a probe fails. Transitions are enforced by a state machine to ensure deterministic behaviour.
Frequently Asked Questions

01. What is a circuit breaker pattern in simple terms?
02. Is circuit breaker the same as timeout?
03. When should you not use a circuit breaker?
04. How do I test a circuit breaker implementation?
05. Can multiple circuit breakers share a thread pool?