Advanced 7 min · March 06, 2026

CountDownLatch Deadlock — Missing countDown() After Crash

One unchecked exception before countDown() hangs your service forever.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • CountDownLatch is a one-shot gate: thread(s) wait until the count reaches zero, after which the latch stays open forever.
  • CyclicBarrier is a reusable meeting point: all threads wait for each other, then reset for next cycle.
  • CountDownLatch uses AQS with a single CAS per countDown(); CyclicBarrier uses ReentrantLock + Condition, higher overhead.
  • In production, always use the timed await(long, TimeUnit) with CountDownLatch; without it, a single hung or crashed worker blocks the waiting thread indefinitely.
  • CyclicBarrier's broken barrier state is active failure detection; CountDownLatch just stays stuck silently.
  • Biggest mistake: using CountDownLatch when you need reuse, or CyclicBarrier when participants are dynamic.

Modern Java applications rarely run a single task at a time. Whether you're loading config from three different microservices before serving the first request, running parallel test suites, or coordinating phases in a data-processing pipeline, you need threads to wait for each other in a controlled, predictable way. Get this wrong and you end up with race conditions, deadlocks, or — the sneaky worst case — a service that silently produces incomplete results because one thread raced ahead before the others were ready.

Both CountDownLatch and CyclicBarrier live in java.util.concurrent and solve the 'threads waiting for each other' problem, but they solve subtly different flavours of it. CountDownLatch is about one or more threads waiting until a set of operations performed by other threads completes — think dependencies. CyclicBarrier is about a fixed group of threads all waiting until every member of that group is ready to proceed together — think synchronisation points in iterative work.

By the end of this article you'll understand the internal mechanics of both primitives, know exactly which one to reach for in a given situation, be able to explain their trade-offs in an interview without hesitation, and have production-ready patterns you can drop straight into your codebase.

CountDownLatch — Internals, Lifecycle and When to Reach for It

CountDownLatch is built on an AbstractQueuedSynchronizer (AQS): an internal AQS subclass keeps the count in the AQS state integer. When you call new CountDownLatch(n), that state is initialised to n. Every countDown() call decrements the state with compareAndSet, retrying if another thread changed it first, so no lock is ever taken. When the state hits 0, all threads parked in await() are released through AQS's shared-release mechanism. That's it. There is no reset path in the API. The latch is a one-way gate.
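A minimal sketch of that lifecycle, assuming three plain threads stand in for real workers (LatchLifecycle and runWorkers are illustrative names, not library API). It also shows that a countDown() on an already-open latch changes nothing.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative class: the one-way lifecycle of a CountDownLatch.
final class LatchLifecycle {

    static long runWorkers(int workers) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(workers); // internal count starts at `workers`
        for (int i = 0; i < workers; i++) {
            int id = i;
            new Thread(() -> {
                System.out.println("worker " + id + " finished");
                done.countDown();                          // lock-free decrement
            }).start();
        }
        done.await();           // parks until the count reaches zero
        done.countDown();       // count is already zero: no effect
        return done.getCount(); // still 0; the API offers no way to raise it again
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("count after release: " + runWorkers(3));
    }
}
```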

This single-use nature is a feature, not a limitation. It makes CountDownLatch perfect for start-up sequencing (wait for N services to register before opening traffic), test coordination (wait for N worker threads to complete before asserting results), and event broadcasting (all waiting threads unblock simultaneously the moment the count hits zero).

The key mental model: the thread calling await() is the dependent — it needs work done. The threads calling countDown() are the producers — they signal completion. These roles can overlap; a thread can countDown() and then await() on a different latch, which is exactly how two-phase startup coordination is built.
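One way to sketch that two-phase pattern, with hypothetical class and latch names: each worker counts down a ready latch and then awaits a shared start latch, which the coordinator opens only once every worker is initialised. A third latch, done, lets the sketch wait for the workers to finish, and the coordinator's waits use timeouts.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative two-phase start: workers signal "ready", then wait for one "start" signal.
final class TwoPhaseStart {

    static int run(int workers) throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(workers); // workers -> coordinator
        CountDownLatch start = new CountDownLatch(1);       // coordinator -> all workers
        CountDownLatch done = new CountDownLatch(workers);  // workers -> coordinator
        AtomicInteger passedGate = new AtomicInteger();

        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                try {
                    ready.countDown();        // phase 1: "I am initialised"
                    start.await();            // phase 2: wait for the shared start signal
                    passedGate.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();         // always signal, even if interrupted
                }
            }).start();
        }

        if (!ready.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not initialise in time");
        }
        start.countDown();                    // releases every worker at once
        if (!done.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return passedGate.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4) + " workers passed the start gate");
    }
}
```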

CyclicBarrier: Reusable Phases, the Barrier Action, and Its Lock-Based Internals

CyclicBarrier is built differently from CountDownLatch. It uses an internal ReentrantLock and a Condition to park threads rather than AQS directly. The critical state is a 'generation' object that gets replaced each time the barrier trips (resets). This generation mechanism is precisely what makes the barrier cyclic — each trip through the barrier starts a fresh generation, so the same CyclicBarrier instance coordinates an unbounded number of phases.

The constructor accepts an optional Runnable barrierAction. This action runs exactly once per cycle, in the last thread to arrive at the barrier, before any of the waiting threads are released. This is incredibly useful for aggregating results from the phase that just completed (e.g., merging partial sums) before the next phase begins — all without an external synchronisation step.
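A small sketch of phase aggregation with a barrier action, assuming a made-up workload in which worker w contributes (w + 1) * (p + 1) in phase p (PhasedSum is an illustrative name). Each worker writes into its own array slot and the action sums the slots once per phase. The CyclicBarrier Javadoc's memory-consistency guarantee (actions before await() happen-before the barrier action, which happens-before the return from await() in the other threads) is what makes the plain array safe here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Illustrative: per-phase aggregation in the barrier action.
final class PhasedSum {

    static List<Long> run(int workers, int phases) throws InterruptedException {
        long[] partial = new long[workers];          // one slot per worker
        List<Long> totals = new ArrayList<>();       // only the barrier action writes this

        // Runs once per phase, in the last thread to arrive, before anyone is released.
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            long sum = 0;
            for (long v : partial) {
                sum += v;
            }
            totals.add(sum);
        });

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            int id = w;
            Thread t = new Thread(() -> {
                try {
                    for (int p = 0; p < phases; p++) {
                        partial[id] = (long) (id + 1) * (p + 1); // hypothetical workload
                        barrier.await();                        // same instance every phase
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // another party failed: abandon the remaining phases
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return totals;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("per-phase totals: " + run(3, 2));
    }
}
```

With three workers and two phases, run(3, 2) returns [6, 12]: 1 + 2 + 3 in phase 0 and 2 + 4 + 6 in phase 1.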

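Broken barrier state is a crucial concept you must understand. If any thread waiting at a barrier is interrupted or times out, the barrier enters a broken state. The interrupted thread gets InterruptedException and the timed-out thread gets TimeoutException; every other thread currently waiting, and every thread that calls await() on that barrier later, gets a BrokenBarrierException. The barrier also breaks if the barrier action throws, or if reset() is called while threads are waiting. A broken barrier stays broken until someone calls reset(), but coordinating that reset safely after a failure is awkward (all parties must first stop and agree on who resets), so building a new CyclicBarrier is often the simpler recovery. This failure mode is intentional: a partially-completed phase in iterative work produces corrupt results, so it's better to fail loudly.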
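The demo below makes the broken state visible with a three-party barrier where only two parties ever arrive, one of them with a 200 ms timeout (the class name and the timeout value are arbitrary choices for illustration).

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative demo: a 3-party barrier where only 2 parties ever arrive.
final class BrokenBarrierDemo {

    static String run() throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(3);
        String[] patientSaw = new String[1];

        // Party 1 waits with no timeout.
        Thread patient = new Thread(() -> {
            try {
                barrier.await();
                patientSaw[0] = "tripped";
            } catch (BrokenBarrierException e) {
                patientSaw[0] = "BrokenBarrierException";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                patientSaw[0] = "InterruptedException";
            }
        });
        patient.start();

        // Party 2 (this thread) gives up after 200 ms; party 3 never arrives.
        String mainSaw;
        try {
            barrier.await(200, TimeUnit.MILLISECONDS);
            mainSaw = "tripped";
        } catch (TimeoutException e) {
            mainSaw = "TimeoutException";        // the timeout also breaks the barrier
        } catch (BrokenBarrierException e) {
            mainSaw = "BrokenBarrierException";
        }
        patient.join();

        boolean brokenBeforeReset = barrier.isBroken();
        barrier.reset();                          // back to the initial, unbroken state
        return patientSaw[0] + "/" + mainSaw + "/" + brokenBeforeReset + "/" + barrier.isBroken();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

run() returns "BrokenBarrierException/TimeoutException/true/false": the patient thread sees the broken barrier, the timed-out caller sees the timeout, and reset() clears the broken flag.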

Use CyclicBarrier for parallel iterative algorithms (matrix multiplication phases, parallel merge sort stages), simulation loops where N agent threads must sync before each tick, and multi-stage data-processing pipelines where every worker must finish stage N before any starts stage N+1.

Head-to-Head Comparison — Choosing the Right Tool Under Pressure

The single most important question to ask yourself is: 'Is the wait one-directional (waiters depend on workers) or mutual (everyone waits for everyone)?' CountDownLatch is one-directional. CyclicBarrier is mutual.

The second question is: 'Does this pattern repeat?' If threads need to sync once and move on independently, use CountDownLatch. If threads must sync at the end of every phase in a loop, CyclicBarrier's automatic reset is exactly what you need — recreating a CountDownLatch every iteration is wasteful and error-prone.

Performance considerations matter at scale. CountDownLatch.countDown() is a compareAndSet on an AQS integer, which is extremely cheap. CyclicBarrier.await() acquires a ReentrantLock and, unless the caller is the last arrival, waits on a Condition, which costs more. For very hot paths where many threads synchronise at high frequency, consider Phaser (the more flexible successor to both): phasers can be tiered into a tree of child phasers so that contention is spread across several synchronisation points instead of one. For most application-level coordination (tens of threads, not thousands), both primitives are fast enough that design clarity matters far more than the performance difference.

Error propagation also differs sharply. A countDown() that never happens (e.g., because a worker thread crashed before reaching it) simply leaves the latch stuck, which is why the timeout overload of await() is non-negotiable in production. CyclicBarrier's broken-barrier state at least actively notifies the other waiting threads when a party is interrupted or times out, making a mid-cycle fault easier to detect. A worker that dies without ever calling await() does not break the barrier by itself, though; the others only find out when one of their timed awaits expires.
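A minimal illustration, with a simulated crash standing in for a real bug (StuckLatch is a hypothetical name): one of two workers throws before it reaches countDown(), so the latch never opens, and await(long, TimeUnit) reports that by returning false instead of blocking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative: a worker that dies before countDown() leaves the latch stuck.
final class StuckLatch {

    static boolean run() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);

        new Thread(latch::countDown).start();           // healthy worker

        Thread crashing = new Thread(() -> {
            throw new IllegalStateException("simulated crash before countDown()");
        });
        crashing.setUncaughtExceptionHandler((t, e) ->
                System.err.println(t.getName() + " died: " + e.getMessage()));
        crashing.start();

        // Returns false if the count has not reached zero within the timeout.
        return latch.await(300, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        boolean opened = run();
        System.out.println(opened ? "all workers signalled" : "timed out: a worker never counted down");
    }
}
```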

Production Decision Framework — How to Pick the Right Primitive

You've seen the internals. Now here's a concrete decision tree you can apply in code reviews or on the whiteboard. Start with these three questions:

1. Roles: Are there distinct 'waiters' and 'workers', or does every thread play both roles?
   - Distinct roles → CountDownLatch. One thread (or group) waits for others to finish.
   - All threads equal → CyclicBarrier. Everyone waits for everyone.

2. Repeatability: Will this coordination point be used exactly once, or multiple times?
   - Once → CountDownLatch.
   - Multiple times → CyclicBarrier (auto-reset) or Phaser (if participants change). Creating a new CountDownLatch each time also works, but it is fragile.

3. Failure semantics: What should happen if a worker fails?
   - Detecting the failure by timeout is acceptable → CountDownLatch with a timed await().
   - Waiting threads should be actively notified when a party is interrupted or times out → CyclicBarrier, whose BrokenBarrierException reaches all of them.

| Scenario | Best choice | Why |
|---|---|---|
| Start-up sequencing (wait for N services) | CountDownLatch | One-shot, distinct roles |
| Parallel algorithm with phases | CyclicBarrier | Reusable, all threads equal |
| Test coordination (wait for threads to finish) | CountDownLatch | Simple, one-time |
| Dynamic worker pool for iterative processing | Phaser | Participants can join/leave |
| Event broadcasting (fire when all ready) | CountDownLatch | All waiters unblock simultaneously |
| Simulation ticks where each tick is a phase | CyclicBarrier | Auto-reset, barrier action for aggregation |

In production, apply the rule of least surprise: pick the primitive whose name and contract clearly communicate the intent. Your future self — and your colleagues — will thank you.

Common Pitfalls and How to Avoid Them

Even experienced developers make these mistakes. Here's what to watch for.

Pitfall 1: Missing countDown() guarantee
If your worker code throws an unchecked exception before calling countDown(), the latch never reaches zero. Always wrap the body in try-finally and call countDown() in the finally block. This ensures the latch is decremented even on failure.
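A sketch of the try-finally fix, assuming a hypothetical doWork() in which worker 0 always fails. It also counts failures, because a latch that opens only tells the waiter that every worker finished, not that every worker succeeded.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative: countDown() in finally, plus a failure counter for the waiter.
final class FinallyCountDown {

    static int run(int workers) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(workers);
        AtomicInteger failures = new AtomicInteger();

        for (int i = 0; i < workers; i++) {
            int id = i;
            new Thread(() -> {
                try {
                    doWork(id);
                } catch (RuntimeException e) {
                    failures.incrementAndGet();   // record the failure for the waiter
                } finally {
                    done.countDown();             // runs on success and on failure
                }
            }).start();
        }

        if (!done.await(5, TimeUnit.SECONDS)) {   // keep the timeout as a second safety net
            return -1;
        }
        return failures.get();
    }

    // Hypothetical work: worker 0 always fails.
    private static void doWork(int id) {
        if (id == 0) {
            throw new IllegalStateException("worker " + id + " failed");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("failed workers: " + run(3));
    }
}
```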

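Pitfall 2: Forgetting to restore the interrupt flag
When a blocking method throws InterruptedException, the thread's interrupt status has already been cleared, so you must call Thread.currentThread().interrupt() to reassert it. If you don't, the cancellation request is lost: code further up the stack (a worker loop checking isInterrupted(), or an executor being shut down with shutdownNow()) never learns the thread was asked to stop, so it keeps running. The barrier itself still breaks either way: an interrupt during CyclicBarrier.await() breaks the barrier before the exception reaches your catch block.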
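A small demo of the pattern, using an illustrative class name: a worker parked on a two-party barrier is interrupted, restores its flag in the catch block, and the barrier is broken regardless of what the catch block does.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Illustrative: restore the interrupt flag after await() throws InterruptedException.
final class RestoreInterrupt {

    static boolean[] run() throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(2);    // the second party never arrives
        boolean[] flagAfterCatch = new boolean[1];

        Thread worker = new Thread(() -> {
            try {
                barrier.await();
            } catch (InterruptedException e) {
                // The interrupt status was cleared when the exception was thrown.
                // Restore it so code further up this thread's stack can see it.
                Thread.currentThread().interrupt();
                flagAfterCatch[0] = Thread.currentThread().isInterrupted();
                // A real worker loop would now stop instead of starting another phase.
            } catch (BrokenBarrierException e) {
                // Not expected here: this thread is the one that breaks the barrier.
            }
        });
        worker.start();

        while (barrier.getNumberWaiting() == 0) {        // wait until the worker is parked
            Thread.sleep(10);
        }
        worker.interrupt();
        worker.join();
        return new boolean[] { flagAfterCatch[0], barrier.isBroken() };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("flag restored: " + r[0] + ", barrier broken: " + r[1]);
    }
}
```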

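Pitfall 3: Reusing a CountDownLatch by creating a new one in a loop
You create a new CountDownLatch(n) each iteration, but any thread that still holds a reference to the previous iteration's latch is working with an exhausted gate: its await() returns immediately, and its countDown() calls are silently ignored instead of counting toward the current round. If the latch lives in a shared field that you reassign, a slow worker from the previous round can read the field after it has changed and decrement the new round's latch, releasing the waiter too early. Switch to CyclicBarrier or Phaser if you need reuse.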
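If you must stay with CountDownLatch, one way to avoid stale references is to keep each round's latch in a local variable that only that round's tasks capture, rather than reassigning a shared field. The sketch below assumes a fixed thread pool and uses a counter increment as stand-in work; LatchPerRound is an illustrative name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative: one local latch per round, handed only to that round's tasks.
final class LatchPerRound {

    static int run(int rounds, int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int r = 0; r < rounds; r++) {
                // Local and effectively final: every task captures the latch of its own
                // round, so no task can touch an older (exhausted) or newer latch.
                CountDownLatch roundDone = new CountDownLatch(workers);
                for (int w = 0; w < workers; w++) {
                    pool.execute(() -> {
                        try {
                            completed.incrementAndGet();   // stand-in for real work
                        } finally {
                            roundDone.countDown();
                        }
                    });
                }
                if (!roundDone.await(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("round " + r + " timed out");
                }
            }
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("tasks completed: " + run(3, 4));
    }
}
```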

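Pitfall 4: Calling countDown() after the latch has reached zero
It's a no-op, but it can mask bugs. If you accidentally call countDown() 5 times on a latch initialised with 3, the last two calls do nothing, and nothing tells you a worker signalled more than once. The worse case is a duplicate call that happens before the count reaches zero: the latch opens while another worker is still running, releasing the waiter too early. Add assertions or per-worker bookkeeping if you suspect over-counting.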
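Because duplicate countDown() calls are silent, one option is a thin wrapper that tracks which workers have already signalled. PerWorkerLatch and its countDown(String workerId) method are hypothetical, not part of java.util.concurrent; the sketch fails fast on a duplicate from the same worker, which also catches the premature-release case.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper: fails fast when the same worker counts down twice.
final class PerWorkerLatch {
    private final CountDownLatch latch;
    private final Set<String> signalled = ConcurrentHashMap.newKeySet();

    PerWorkerLatch(int count) {
        this.latch = new CountDownLatch(count);
    }

    void countDown(String workerId) {
        // add() on a concurrent set is atomic and returns false if this worker already signalled.
        if (!signalled.add(workerId)) {
            throw new IllegalStateException(workerId + " called countDown() twice");
        }
        latch.countDown();
    }

    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        return latch.await(timeout, unit);
    }

    long getCount() {
        return latch.getCount();
    }

    public static void main(String[] args) {
        PerWorkerLatch latch = new PerWorkerLatch(3);
        latch.countDown("loader");
        try {
            latch.countDown("loader");   // bug: same worker signals again
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("remaining: " + latch.getCount());
    }
}
```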

Pitfall 5: Using CyclicBarrier with more threads than the party count
The barrier waits for exactly its party count, not for however many threads you submit. If 5 threads each call await() once on a 4-party barrier, the first 4 arrivals trip the barrier and proceed; the 5th starts the next generation on its own and waits forever (or until its timeout) for 3 partners that never come. Ensure the number of threads calling await() matches the barrier's party count exactly, or use a secondary coordination mechanism.
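A deterministic miniature of the same problem, scaled down to a 2-party barrier and three callers (ExtraParty is an illustrative name): two threads trip the first generation, and the third caller ends up waiting alone until its timeout fires.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative: a third caller on a 2-party barrier ends up waiting alone.
final class ExtraParty {

    static String run() throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(2);

        Runnable party = () -> {
            try {
                barrier.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (BrokenBarrierException e) {
                // not expected in this demo
            }
        };
        Thread a = new Thread(party);
        Thread b = new Thread(party);
        a.start();
        b.start();
        a.join();                        // both return: two arrivals tripped generation 1
        b.join();

        // The extra caller opens generation 2 alone and waits for a partner
        // that never comes; only the timeout gets it out.
        try {
            barrier.await(200, TimeUnit.MILLISECONDS);
            return "tripped";
        } catch (TimeoutException e) {
            return "timed out";
        } catch (BrokenBarrierException e) {
            return "broken";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("extra party: " + run());
    }
}
```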

Pitfall 6: Not handling BrokenBarrierException
If you ignore BrokenBarrierException and continue, you risk processing garbage data. Always abort the current phase and restart with a fresh barrier.
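A sketch of that handling pattern with two hypothetical helpers: awaitPhase() turns a broken or timed-out barrier into a false return, and the worker loop stops instead of starting the next phase on incomplete data. The timeout value and class name are assumptions for illustration.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative helpers: treat a broken or timed-out phase as fatal for the run.
final class PhaseGuard {

    // Returns true if every party arrived, false if this phase's data is unusable.
    static boolean awaitPhase(CyclicBarrier barrier, long timeoutMs) throws InterruptedException {
        try {
            barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (BrokenBarrierException e) {
            return false;   // another party failed; its partial results are missing
        } catch (TimeoutException e) {
            return false;   // we gave up, and that has just broken the barrier for everyone
        }
    }

    // Worker loop: abort instead of starting the next phase on partial data.
    static int runPhases(CyclicBarrier barrier, int phases, long timeoutMs) throws InterruptedException {
        int completed = 0;
        for (int p = 0; p < phases; p++) {
            // ... compute this phase's partial result here ...
            if (!awaitPhase(barrier, timeoutMs)) {
                System.err.println("phase " + p + " aborted; discarding partial results");
                return completed;
            }
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(2);   // the second party never shows up
        System.out.println("phases completed: " + runPhases(barrier, 3, 200));
        System.out.println("barrier broken: " + barrier.isBroken());
    }
}
```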

CountDownLatch vs CyclicBarrier — Side-by-Side
| Feature / Aspect | CountDownLatch | CyclicBarrier |
|---|---|---|
| Reusable after tripping | No, single use only | Yes, resets automatically each cycle |
| Who waits | One or more designated waiter threads | All participant threads wait for each other |
| Internal synchroniser | AQS (compareAndSet on state integer) | ReentrantLock + Condition + generation object |
| Barrier/trip action | Not supported | Optional Runnable runs in last-arriving thread |
| Failed-thread behaviour | Latch stays stuck (timeout is your safety net) | An interrupt or timeout breaks the barrier; BrokenBarrierException thrown to all other waiting threads |
| Dynamic participant count | Not supported | Not supported (use Phaser instead) |
| Primary use case | Start-up sequencing, one-time event signalling | Iterative phase synchronisation, parallel algorithms |
| Thread roles | Waiters vs. workers (distinct roles) | All threads are both workers and waiters |
| Performance overhead | Very low: one CAS per countDown() | Higher: ReentrantLock acquisition per await() |
| Available since | Java 5 (java.util.concurrent) | Java 5 (java.util.concurrent) |

Key Takeaways

  • CountDownLatch is a one-shot gate: once the count hits zero it stays open forever and cannot be reset — design your code around this or pick a different tool.
  • CyclicBarrier's broken-barrier state is a feature: it actively poisons all waiting threads when one fails, preventing silent partial-phase execution that would corrupt iterative results.
  • The barrier action in CyclicBarrier runs in the last-arriving thread before any waiting threads are released; exploit this for phase aggregation without an extra synchronisation step, but keep it short, since every party is blocked while it runs.
  • When participant count is unknown at construction time, or you need tiered phase callbacks, Phaser is the right upgrade path — mentioning it unprompted in an interview signals genuine production experience.
  • Always use timeout-based await() variants in production: the one-shot nature of CountDownLatch makes infinite blocking dangerous, and with CyclicBarrier a timeout is what turns a party that never arrives into a broken barrier instead of an indefinite wait.

Common Mistakes to Avoid

  • Not guaranteeing countDown() in a finally block
    Symptom: A crashed worker thread never calls countDown(). The latch never reaches zero, so the awaiting main thread either hangs indefinitely (no timeout) or always waits out the full timeout and reports failure (timeout used).
    Fix: Always wrap the body of the worker Runnable in try-finally, placing latch.countDown() in the finally block. Even if the worker throws an unchecked exception, the latch is decremented.
  • Catching InterruptedException without restoring the interrupt flag
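    Symptom: Because the interrupt status is cleared when InterruptedException is thrown, swallowing the exception leaves the thread looking un-interrupted. Later blocking calls and loop checks no longer respond to cancellation, so the thread keeps running after it was asked to stop. With CyclicBarrier, the interrupted await() still breaks the barrier and the other parties still get BrokenBarrierException; what is lost is the interrupt signal for the rest of this thread's code.
    Fix: Always call Thread.currentThread().interrupt() in the catch block for InterruptedException. For CountDownLatch, still call countDown() (ideally in finally) so waiters are not left hanging. For CyclicBarrier, restore the flag and exit the phase loop; the barrier has already been broken for the other parties.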
  • Reusing a CountDownLatch by allocating a new one in a loop
    Symptom: Each iteration creates a fresh latch, but if any thread from a previous iteration still holds a reference, that old latch is already at zero. The new latch may not be used by all participants, causing premature completion or hang.
    Fix: If you need a reusable countdown mechanism, switch to CyclicBarrier or Phaser. If you must use CountDownLatch, ensure no stale references exist before creating a new instance.
  • Calling CyclicBarrier.await() without handling BrokenBarrierException
    Symptom: If a barrier breaks (due to timeout or interrupt), subsequent await() calls throw BrokenBarrierException. If the code catches Exception generically or ignores it, the phase may continue with partial data.
    Fix: Always catch BrokenBarrierException separately and abort the current computation. Do not proceed with the next phase. Log the error and create a new CyclicBarrier if needed.
  • Using CyclicBarrier with a thread pool that has dynamic size
    Symptom: If the number of threads calling await() varies (e.g., cached thread pool), the barrier may never trip because the party count is fixed. Extra threads calling await() exceed the intended count, or too few threads arrive.
    Fix: Use Phaser instead of CyclicBarrier when the participant count is dynamic. Alternatively, ensure your thread pool size exactly matches the barrier party count and no extra threads call await().

Interview Questions on This Topic

  • Q: Can you explain the difference between CountDownLatch and CyclicBarrier, and give a concrete production scenario where you'd pick one over the other? (Mid-level)
    CountDownLatch is a one-shot gate: one or more threads wait until a count reaches zero. CyclicBarrier is a reusable meeting point where a fixed number of threads wait for each other, then reset. In production, I'd use CountDownLatch for service startup sequencing — wait for database, cache, and message broker to signal readiness before opening HTTP traffic. I'd use CyclicBarrier for a parallel image processing pipeline where each processor must finish filtering before the next stage begins, and this repeats for every batch.
  • Q: What happens to a CyclicBarrier if one of the waiting threads is interrupted? How does that affect the other threads, and how would you recover? (Senior)
    The interrupted thread's await() throws InterruptedException, and the barrier immediately enters a broken state. Every other thread blocked in await() wakes up with a BrokenBarrierException, and any later call to await() throws it straight away. The barrier stays broken until reset() is called. Because a safe reset requires every party to stop and agree on who performs it, the usual recovery is to catch BrokenBarrierException, log the failure, and restart the phase with a fresh barrier instance.
  • Q: If CountDownLatch.countDown() is called more times than the initial count (say the latch was created with count 3 and countDown() is called 5 times), what happens? And how does that differ from CyclicBarrier.await() being called by more threads than the barrier's party count? (Senior)
    The first three countDown() calls bring the count to zero and release the waiters; the remaining two are no-ops. They do not throw or have any effect. For CyclicBarrier, if more threads call await() than the party count, the barrier trips as soon as the exact number of parties has arrived and immediately starts a new generation. The extra threads are counted toward that next generation and wait there forever (or until their timeout) for partners that never arrive, which looks like a deadlock. Always ensure the number of threads calling await() equals the barrier's party count exactly.
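  • Q: Explain the generation mechanism inside CyclicBarrier. Why is it important for correctness? (Senior)
    CyclicBarrier uses an object called 'generation' to track which cycle the barrier is in. Each time the barrier trips, a new generation is created. Every thread that enters await() captures the current generation, and when it wakes up it checks that captured object: if it is broken, the thread throws BrokenBarrierException; if it has been replaced, the barrier tripped and the thread returns normally. This is what lets a woken thread tell a genuine trip apart from a spurious wakeup, and keeps threads from one cycle from being confused with fast threads that have already arrived for the next. When a barrier breaks (e.g., due to timeout), the current generation is marked as broken, so any thread that calls await() afterwards immediately sees the flag and throws BrokenBarrierException. A new generation is installed only when the barrier trips or reset() is called, so later cycles are unaffected by past failures.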
  • Q: How would you implement a reusable countdown latch using CyclicBarrier? What are the limitations? (Junior)
    You could implement a reusable countdown by using a CyclicBarrier with party count = N+1 (N workers + 1 controller). Workers call await() after completing their task; the controller calls await() to wait for them. When all N workers and the controller have arrived, the barrier trips, the controller knows all tasks are done, and the barrier resets for the next round. However, this approach is convoluted; you're bending CyclicBarrier's design. The main limitation is that await() blocks the workers too, whereas countDown() lets a worker signal and move on. N is also fixed at construction, and a timeout or interrupt anywhere breaks the barrier for everyone. Phaser, whose arrive() signals without blocking, or simply a new CountDownLatch per iteration, is usually the better fit.
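To make the N + 1 idea concrete, here is a rough sketch with illustrative names, assuming each round's "task" is just incrementing a counter. Each worker's await() blocks it until the controller and every other worker arrive, whereas countDown() would let it signal and move on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative: a CyclicBarrier with N + 1 parties as a reusable "all tasks done" signal.
final class BarrierAsReusableLatch {

    static int run(int workers, int rounds) throws Exception {
        CyclicBarrier roundDone = new CyclicBarrier(workers + 1); // N workers + 1 controller
        AtomicInteger tasksDone = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        tasksDone.incrementAndGet(); // stand-in for this round's task
                        roundDone.await();           // the worker blocks here too
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // the controller or another worker gave up; stop working
                }
            });
            threads.add(t);
            t.start();
        }

        for (int r = 0; r < rounds; r++) {
            // Controller: returns once all N workers have finished round r.
            roundDone.await(5, TimeUnit.SECONDS);
            System.out.println("round " + r + " complete");
        }
        for (Thread t : threads) {
            t.join();
        }
        return tasksDone.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("tasks completed: " + run(3, 2));
    }
}
```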

Frequently Asked Questions

Can a CountDownLatch count back up after reaching zero?

No. Once a CountDownLatch reaches zero it is permanently open. There is no reset or increment method in the API — this is by design. If you need a resettable gate, use CyclicBarrier (fixed participant count) or Phaser (dynamic participant count).

What is the BrokenBarrierException in Java and when does it get thrown?

BrokenBarrierException is thrown to any thread calling CyclicBarrier.await() when the barrier is in a broken state. The barrier breaks if any waiting thread is interrupted, if any waiting thread times out via the await(long, TimeUnit) overload, if the barrier action throws an exception, or if reset() is called while threads are waiting. Once broken, the barrier stays broken until reset() is called; in practice, constructing a new instance is often the simpler way to recover.

Is it safe to call CountDownLatch.countDown() from multiple threads at the same time?

Yes, completely. countDown() decrements the internal AQS state with a lock-free compareAndSet, retried if another thread wins the race, so it is thread-safe without any locks. Multiple threads can call it simultaneously without any external synchronisation.

Can CyclicBarrier be reused after a BrokenBarrierException?

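Not automatically. Once a CyclicBarrier enters a broken state, it stays broken: every await() throws BrokenBarrierException until someone calls reset(), which returns the barrier to its initial state. The Javadoc notes that resetting after a failure can be complicated, because the threads must first resynchronise some other way and pick one of them to perform the reset, so creating a new CyclicBarrier instance is often the cleaner way to continue coordination.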

What is the difference between CyclicBarrier and Phaser?

Phaser is a more flexible version of CyclicBarrier. Key differences: Phaser supports dynamic participant registration (parties can join/leave at runtime), per-phase callbacks, and can be terminated or deregistered. CyclicBarrier requires a fixed number of parties at construction and has only one optional barrier action. Use Phaser when the number of threads is unknown or changes over time.
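A minimal Phaser sketch, assuming workers join at runtime and leave after a fixed number of phases (PhaserSketch is an illustrative name). It follows the usual pattern of registering the coordinating thread first so that phase 0 cannot advance before every worker has registered, and it overrides onAdvance() as the per-phase callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative: workers register with a Phaser at runtime and deregister when done.
final class PhaserSketch {

    static int run(int workers, int phases) throws InterruptedException {
        AtomicInteger completedPhases = new AtomicInteger();

        // The coordinating thread registers itself as the first party.
        Phaser phaser = new Phaser(1) {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                // Per-phase callback. It also runs when the last party
                // deregisters (registeredParties == 0), which is not a real phase.
                if (registeredParties > 0) {
                    completedPhases.incrementAndGet();
                }
                return registeredParties == 0;   // terminate once everyone has left
            }
        };

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            phaser.register();                    // a new party joins at runtime
            Thread t = new Thread(() -> {
                for (int p = 0; p < phases; p++) {
                    // ... work for phase p ...
                    phaser.arriveAndAwaitAdvance();
                }
                phaser.arriveAndDeregister();     // leave without waiting for the others
            });
            threads.add(t);
            t.start();
        }

        phaser.arriveAndDeregister();             // coordinator steps out; workers continue
        for (Thread t : threads) {
            t.join();
        }
        return completedPhases.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("phases completed: " + run(3, 4));
    }
}
```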
