Junior 9 min · March 06, 2026

CountDownLatch Deadlock — Missing countDown() After Crash

One unchecked exception before countDown() hangs your service forever.

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Drawn from code that ran under real load.

Last updated: May 24, 2026
Quick Answer
  • CountDownLatch is a one-shot gate: threads wait until the count reaches zero, and the gate then stays open forever.
  • CyclicBarrier is a reusable meeting point: all threads wait for each other, then reset for next cycle.
  • CountDownLatch uses AQS with a single CAS per countDown(); CyclicBarrier uses ReentrantLock + Condition, higher overhead.
  • In production, always use await(timeout) with CountDownLatch; a hung worker blocks indefinitely.
  • CyclicBarrier's broken barrier state is active failure detection; CountDownLatch just stays stuck silently.
  • Biggest mistake: using CountDownLatch when you need reuse, or CyclicBarrier when participants are dynamic.
Definition
What is CountDownLatch and CyclicBarrier?

CountDownLatch and CyclicBarrier are Java's go-to concurrency synchronizers for coordinating threads, but they solve fundamentally different problems. CountDownLatch is a one-shot gate: you set a count, threads decrement it via countDown(), and one or more threads block on await() until the count hits zero.

It's designed for scenarios like waiting for N services to start, N tasks to complete, or a single signal to fire. Once the latch reaches zero, it's permanently open — no reset. CyclicBarrier, by contrast, is a reusable rendezvous point: N threads call await(), and when all arrive, they all proceed simultaneously.

It optionally runs a barrier action (a Runnable) at that point, then resets for the next cycle. This makes it ideal for phased computations like iterative algorithms, batch processing, or multi-stage simulations where threads synchronize at phase boundaries.

Under the hood, both ultimately rest on AbstractQueuedSynchronizer (AQS), but in different ways. CountDownLatch uses AQS directly in shared acquire/release mode, with the AQS state holding the count. CyclicBarrier is built on a ReentrantLock and a Condition (which are themselves AQS-based) to manage the generation and reset logic. The critical difference in practice: if a worker crashes before calling countDown(), the latch never reaches zero and every thread blocked in await() hangs forever. Strictly speaking this is an indefinite wait rather than a lock-cycle deadlock, but the symptom is the same: a hung service.

CyclicBarrier handles thread failure more gracefully via BrokenBarrierException, but misuse (e.g., letting the barrier action throw) leaves it broken until you call reset() or replace it. Choose CountDownLatch for one-shot coordination where you control the count precisely; choose CyclicBarrier for repeated synchronization phases where you need reset capability and fault detection.

Plain-English First

Imagine a rocket launch. The countdown — 10, 9, 8 … 1, 0 — happens once, and when it hits zero, the rocket fires. That's a CountDownLatch: a one-shot gate that opens when a count reaches zero. Now imagine a relay race where all four runners must reach the exchange zone before anyone passes the baton. Once they're all there, they all go — and the next lap can repeat the same wait. That's a CyclicBarrier: a reusable meeting point that resets after every group finishes.

Modern Java applications rarely run a single task at a time. Whether you're loading config from three different microservices before serving the first request, running parallel test suites, or coordinating phases in a data-processing pipeline, you need threads to wait for each other in a controlled, predictable way. Get this wrong and you end up with race conditions, deadlocks, or — the sneaky worst case — a service that silently produces incomplete results because one thread raced ahead before the others were ready.

Both CountDownLatch and CyclicBarrier live in java.util.concurrent and solve the 'threads waiting for each other' problem, but they solve subtly different flavours of it. CountDownLatch is about one or more threads waiting until a set of operations performed by other threads completes — think dependencies. CyclicBarrier is about a fixed group of threads all waiting until every member of that group is ready to proceed together — think synchronisation points in iterative work.

By the end of this article you'll understand the internal mechanics of both primitives, know exactly which one to reach for in a given situation, be able to explain their trade-offs in an interview without hesitation, and have production-ready patterns you can drop straight into your codebase.

CountDownLatch vs CyclicBarrier — Two Synchronizers, One Critical Difference

CountDownLatch and CyclicBarrier are both Java synchronizers that coordinate multiple threads, but they solve fundamentally different problems. CountDownLatch is a one-shot gate: one or more threads block until a fixed number of countDown() calls have been made. CyclicBarrier is a reusable rendezvous point: a fixed number of threads all wait for each other to arrive, then proceed together.

CountDownLatch is not reusable: once the count reaches zero, the latch is permanently open. CyclicBarrier resets automatically after all parties trip it, and can optionally run a barrier action. Both do a constant amount of bookkeeping per call: CountDownLatch sits directly on AQS (AbstractQueuedSynchronizer), while CyclicBarrier goes through a ReentrantLock and Condition, which are themselves built on AQS. The practical difference: CountDownLatch signals an event; CyclicBarrier synchronizes a phase.

Use CountDownLatch when you need to wait for N operations to complete before proceeding — e.g., waiting for N services to start, or N parallel tasks to finish. Use CyclicBarrier when you have a fixed-size group of threads that must meet at a common point repeatedly — e.g., in parallel simulations or multi-phase computations. The wrong choice leads to deadlocks or wasted threads.

One-Shot vs Reusable
CountDownLatch cannot be reset. If you need to synchronize multiple phases, use CyclicBarrier — or you'll be creating a new latch each time, which is error-prone.
Production Insight
A countDown() skipped because a thread crashed or threw leaves the latch permanently closed, so every waiting thread blocks forever.
Symptom: application hangs at startup or during shutdown, thread dumps show threads parked at CountDownLatch.await().
Rule: always wrap countDown() in a finally block, and consider a timeout on await() to detect stuck latches.
Key Takeaway
CountDownLatch is a one-shot event signal; CyclicBarrier is a reusable thread rendezvous.
Always decrement CountDownLatch in a finally block to prevent deadlock on exception.
Use CyclicBarrier for multi-phase parallelism; use CountDownLatch for one-time coordination.
[Figure: CountDownLatch vs CyclicBarrier: Key Differences. A comparison of synchronizer mechanics, lifecycle, and use cases. Source: thecodeforge.io]

CountDownLatch — Internals, Lifecycle and When to Reach for It

CountDownLatch wraps an AbstractQueuedSynchronizer (AQS) state integer. When you call new CountDownLatch(n), the AQS state is initialised to n. Every countDown() call performs a compareAndSet that decrements the state by 1 — atomically, without a lock. When the state hits 0, all threads parked in await() are unblocked via AQS's release mechanism. That's it. There is no reset path in the API. The latch is a one-way gate.

This single-use nature is a feature, not a limitation. It makes CountDownLatch perfect for start-up sequencing (wait for N services to register before opening traffic), test coordination (wait for N worker threads to complete before asserting results), and event broadcasting (all waiting threads unblock simultaneously the moment the count hits zero).

The key mental model: the thread calling await() is the dependent — it needs work done. The threads calling countDown() are the producers — they signal completion. These roles can overlap; a thread can countDown() and then await() on a different latch, which is exactly how two-phase startup coordination is built.
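
The overlapping roles described above can be sketched with two latches: a start gate that workers wait on, and a done signal that the coordinator waits on. This is a minimal illustration rather than the article's own code; the class name `TwoLatchHandshake`, the method `runWorkers`, and the 5-second timeout are assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: each worker is a waiter on one latch and a signaller on another.
final class TwoLatchHandshake {

    /** Starts workerCount threads behind a start gate; returns how many finished, or -1 on timeout. */
    static int runWorkers(int workerCount) {
        CountDownLatch startGate = new CountDownLatch(1);             // opened once by the coordinator
        CountDownLatch doneSignal = new CountDownLatch(workerCount);  // one count per worker
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < workerCount; i++) {
            new Thread(() -> {
                try {
                    startGate.await();            // here the worker is the dependent
                    completed.incrementAndGet();  // stand-in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    doneSignal.countDown();       // here the worker is the producer
                }
            }).start();
        }

        startGate.countDown(); // release every worker at once
        try {
            if (!doneSignal.await(5, TimeUnit.SECONDS)) {
                return -1; // at least one worker never signalled
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Workers completed: " + runWorkers(4)); // prints "Workers completed: 4"
    }
}
```

The workers use the untimed await() only because the coordinator opens the gate unconditionally on the next line; if opening the gate could fail, a timed wait would be the safer choice there too.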

io.thecodeforge.concurrent.ServiceStartupCoordinator.java (Java)
package io.thecodeforge.concurrent;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Simulates a service that must wait for three dependent sub-services
 * (database, cache, and message broker) to finish initialising before
 * it opens its own HTTP listener.
 */
public class ServiceStartupCoordinator {

    // Three sub-services must signal readiness before the main service starts.
    private static final int DEPENDENCY_COUNT = 3;

    public static void main(String[] args) throws InterruptedException {

        CountDownLatch readinessLatch = new CountDownLatch(DEPENDENCY_COUNT);
        ExecutorService startupPool = Executors.newFixedThreadPool(DEPENDENCY_COUNT);

        // Each Runnable simulates a sub-service initialising and then
        // decrementing the latch to signal it is ready.
        startupPool.submit(new SubServiceInitialiser("DatabasePool",   1200, readinessLatch));
        startupPool.submit(new SubServiceInitialiser("RedisCache",      400, readinessLatch));
        startupPool.submit(new SubServiceInitialiser("MessageBroker",   800, readinessLatch));

        System.out.println("[MainThread] Waiting for all dependencies to become ready...");

        // await() parks the main thread inside AQS until the internal state reaches 0.
        // Using a timeout is critical in production — never block forever.
        boolean allReady = readinessLatch.await(5, TimeUnit.SECONDS);

        if (allReady) {
            System.out.println("[MainThread] All dependencies ready. Opening HTTP listener on :8080");
        } else {
            // This branch fires if one sub-service hangs past the timeout.
            System.err.println("[MainThread] Startup timeout! Shutting down safely.");
        }

        startupPool.shutdown();
    }

    static class SubServiceInitialiser implements Runnable {
        private final String serviceName;
        private final long   initDelayMs;    // Simulates different boot times
        private final CountDownLatch latch;

        SubServiceInitialiser(String serviceName, long initDelayMs, CountDownLatch latch) {
            this.serviceName  = serviceName;
            this.initDelayMs  = initDelayMs;
            this.latch        = latch;
        }

        @Override
        public void run() {
            try {
                System.out.printf("[%s] Initialising...%n", serviceName);
                Thread.sleep(initDelayMs);  // Simulate IO-bound startup work
                System.out.printf("[%s] Ready. Counting down.%n", serviceName);
            } catch (InterruptedException e) {
                // Restore the interrupt flag; never swallow InterruptedException silently.
                Thread.currentThread().interrupt();
                System.err.printf("[%s] Interrupted during initialisation.%n", serviceName);
            } finally {
                // countDown() is atomic and never throws; calling it at zero is a no-op.
                // Placing it in finally means an interrupt or an unchecked exception
                // cannot skip it and leave the main thread waiting forever.
                latch.countDown();
            }
        }
    }
}
Output
[MainThread] Waiting for all dependencies to become ready...
[DatabasePool] Initialising...
[RedisCache] Initialising...
[MessageBroker] Initialising...
[RedisCache] Ready. Counting down.
[MessageBroker] Ready. Counting down.
[DatabasePool] Ready. Counting down.
[MainThread] All dependencies ready. Opening HTTP listener on :8080
Watch Out: Always Use the Timeout Overload of await()
await() with no arguments blocks forever. In production, a crashed sub-service will never call countDown(), parking your main thread indefinitely. Use await(long timeout, TimeUnit unit) and handle the false return — it's the difference between a graceful degradation and a hung service with no stack trace to diagnose.
Production Insight
A single crashed worker can turn a CountDownLatch into a permanent deadlock.
Always wrap countDown() in finally — even exceptions during init must not skip it.
Rule: use timeout on await() and handle the false return every single time.
Key Takeaway
CountDownLatch is a one-way gate.
Once open, it never closes.
Guarantee countDown() with finally, and never await() without a timeout.
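
To make the failure mode concrete, here is a small sketch that runs one healthy worker and one worker that crashes during initialisation, first with countDown() in a finally block and then without. The class `CrashSafeLatchSketch`, the helper `failingInit()`, and the timeout values are hypothetical names and numbers chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: compares a guarded and an unguarded countDown() when a worker crashes.
final class CrashSafeLatchSketch {

    static void failingInit() {
        throw new IllegalStateException("simulated crash during init");
    }

    /** Runs one healthy and one crashing worker; returns true if the latch opened in time. */
    static boolean awaitWorkers(boolean countDownInFinally, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(() -> latch.countDown());   // healthy worker
            pool.submit(() -> {                     // worker that crashes mid-init
                try {
                    failingInit();
                    if (!countDownInFinally) {
                        latch.countDown();          // never reached: the crash skips it
                    }
                } finally {
                    if (countDownInFinally) {
                        latch.countDown();          // runs even though failingInit() threw
                    }
                }
            });
            // The exception is captured by the discarded Future, so nothing is logged.
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("finally variant opened:   " + awaitWorkers(true, 2000)); // true
        System.out.println("unguarded variant opened: " + awaitWorkers(false, 200)); // false
    }
}
```

Note that counting down in finally only prevents the hang. It does not tell the waiter that a dependency failed, so a real service would also record the failure (for example in a shared flag) and check it after await() returns.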

CyclicBarrier — Reusable Phases, the Barrier Action, and Its AQS Internals

CyclicBarrier is built differently from CountDownLatch. It uses an internal ReentrantLock and a Condition to park threads rather than AQS directly. The critical state is a 'generation' object that gets replaced each time the barrier trips (resets). This generation mechanism is precisely what makes the barrier cyclic — each trip through the barrier starts a fresh generation, so the same CyclicBarrier instance coordinates an unbounded number of phases.

The constructor accepts an optional Runnable barrierAction. This action runs exactly once per cycle, in the last thread to arrive at the barrier, before any of the waiting threads are released. This is incredibly useful for aggregating results from the phase that just completed (e.g., merging partial sums) before the next phase begins — all without an external synchronisation step.

Broken barrier state is a crucial concept you must understand. If any thread waiting at a barrier is interrupted or times out, the barrier enters a broken state. Every thread currently waiting, and every thread that later calls await() on that barrier, gets a BrokenBarrierException until the barrier is reset. Recovery means calling reset() once the surviving threads have re-synchronised some other way, or, often simpler, building a new CyclicBarrier. This failure mode is intentional: a partially completed phase in iterative work produces corrupt results, so it's better to fail loudly.

Use CyclicBarrier for parallel iterative algorithms (matrix multiplication phases, parallel merge sort stages), simulation loops where N agent threads must sync before each tick, and multi-stage data-processing pipelines where every worker must finish stage N before any starts stage N+1.

io.thecodeforge.concurrent.ParallelMatrixRowProcessor.java (Java)
package io.thecodeforge.concurrent;

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.Arrays;

/**
 * Demonstrates a two-phase parallel computation.
 * Phase 1: Each worker thread computes the row-sum for its assigned matrix row.
 * Phase 2: After all row-sums are ready, each worker uses the global total
 *          to normalise its own row-sum.
 *
 * The CyclicBarrier ensures Phase 2 never starts until Phase 1 is 100% done.
 */
public class ParallelMatrixRowProcessor {

    private static final int ROW_COUNT    = 4;  // One worker thread per row
    private static final int COLUMN_COUNT = 5;

    // Shared result arrays — workers write here, the barrier action reads here.
    private static final int[]    rowSums     = new int[ROW_COUNT];
    private static       int      globalTotal = 0;  // Set by barrier action

    // Sample matrix — in practice this would be loaded from a data source.
    private static final int[][] matrix = {
        {1,  2,  3,  4,  5},
        {6,  7,  8,  9, 10},
        {11, 12, 13, 14, 15},
        {16, 17, 18, 19, 20}
    };

    public static void main(String[] args) throws InterruptedException {

        // The barrier action runs in the LAST thread to arrive.
        // It aggregates all row-sums into a global total before Phase 2 starts.
        Runnable aggregateRowSums = () -> {
            globalTotal = Arrays.stream(rowSums).sum();
            System.out.printf("%n[BarrierAction] All row sums computed. Global total = %d. Releasing Phase 2.%n%n", globalTotal);
        };

        // One barrier separates Phase 1 from Phase 2. It resets automatically after
        // tripping, so the same instance could guard further phase boundaries as well.
        CyclicBarrier phaseBarrier = new CyclicBarrier(ROW_COUNT, aggregateRowSums);

        ExecutorService workerPool = Executors.newFixedThreadPool(ROW_COUNT);

        for (int rowIndex = 0; rowIndex < ROW_COUNT; rowIndex++) {
            workerPool.submit(new RowProcessor(rowIndex, phaseBarrier));
        }

        workerPool.shutdown();
    }

    static class RowProcessor implements Runnable {
        private final int           rowIndex;
        private final CyclicBarrier phaseBarrier;

        RowProcessor(int rowIndex, CyclicBarrier phaseBarrier) {
            this.rowIndex     = rowIndex;
            this.phaseBarrier = phaseBarrier;
        }

        @Override
        public void run() {
            try {
                // ── PHASE 1: Compute this row's sum ──────────────────────────────────
                int sum = 0;
                for (int col = 0; col < COLUMN_COUNT; col++) {
                    sum += matrix[rowIndex][col];
                }
                rowSums[rowIndex] = sum;  // Write result to shared array
                System.out.printf("[Row-%d] Phase 1 done. Row sum = %d. Waiting at barrier.%n", rowIndex, sum);

                // await() decrements the internal count. When the last thread
                // arrives, the barrier action fires, THEN all threads are released.
                phaseBarrier.await();

                // ── PHASE 2: Normalise using the global total ─────────────────────────
                // At this point, globalTotal is guaranteed to be fully populated
                // because the barrier action completed before this line runs.
                double normalisedShare = (double) rowSums[rowIndex] / globalTotal * 100.0;
                System.out.printf("[Row-%d] Phase 2 done. Share of total = %.2f%%%n", rowIndex, normalisedShare);

            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                System.err.printf("[Row-%d] Interrupted.%n", rowIndex);
            } catch (BrokenBarrierException e) {
                // Another thread was interrupted or timed out — the barrier is broken.
                // Log and exit cleanly; do NOT proceed with partial data.
                System.err.printf("[Row-%d] Barrier broken — aborting phase processing.%n", rowIndex);
            }
        }
    }
}
Output
[Row-0] Phase 1 done. Row sum = 15. Waiting at barrier.
[Row-1] Phase 1 done. Row sum = 40. Waiting at barrier.
[Row-2] Phase 1 done. Row sum = 65. Waiting at barrier.
[Row-3] Phase 1 done. Row sum = 90. Waiting at barrier.
[BarrierAction] All row sums computed. Global total = 210. Releasing Phase 2.
[Row-3] Phase 2 done. Share of total = 42.86%
[Row-0] Phase 2 done. Share of total = 7.14%
[Row-2] Phase 2 done. Share of total = 30.95%
[Row-1] Phase 2 done. Share of total = 19.05%
Pro Tip: The Barrier Action Runs in the Last-Arriving Thread
If your barrier action throws an unchecked exception, it propagates to the last-arriving thread, the barrier is marked broken, and all other waiting threads receive a BrokenBarrierException. Keep the barrier action lightweight and exception-safe: do the phase aggregation there, but wrap it in a try-catch and record the failure somewhere the workers can see. A crashed barrier action poisons the entire cycle, and it is easy to miss if nobody logs the exception.
Production Insight
A barrier action that throws breaks the barrier for everyone.
The last-arriving thread pays the price, but all threads get BrokenBarrierException.
Rule: keep barrier actions simple and wrap them in try-catch to avoid poisoning the cycle.
Key Takeaway
CyclicBarrier resets automatically after each trip.
BrokenBarrierException is active failure detection.
Use the barrier action for phase aggregation, but keep it exception-safe.
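
One way to keep a barrier action exception-safe is to wrap it so that a failure is recorded instead of thrown into the barrier. The sketch below uses a single-party barrier purely to keep the behaviour deterministic; `GuardedBarrierAction`, `guarded`, and `tripOnce` are hypothetical names, not part of the JDK.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: an unguarded barrier action breaks the barrier, a guarded one does not.
final class GuardedBarrierAction {

    /** Wraps an action so that a failure is recorded rather than thrown into the barrier. */
    static Runnable guarded(Runnable action, AtomicReference<Throwable> failure) {
        return () -> {
            try {
                action.run();
            } catch (RuntimeException e) {
                failure.set(e); // parties should inspect this after await() returns
            }
        };
    }

    /** Trips a single-party barrier once and reports whether it ended up broken. */
    static boolean tripOnce(Runnable barrierAction) {
        CyclicBarrier barrier = new CyclicBarrier(1, barrierAction);
        try {
            barrier.await(); // the only party is also the last to arrive, so it runs the action
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (BrokenBarrierException | RuntimeException e) {
            // A throwing action surfaces here, in the last-arriving thread.
        }
        return barrier.isBroken();
    }

    public static void main(String[] args) {
        Runnable failing = () -> { throw new IllegalStateException("aggregation failed"); };
        AtomicReference<Throwable> failure = new AtomicReference<>();

        System.out.println("Unguarded action breaks barrier: " + tripOnce(failing));                   // true
        System.out.println("Guarded action breaks barrier:   " + tripOnce(guarded(failing, failure))); // false
        System.out.println("Recorded failure: " + failure.get().getMessage());                         // aggregation failed
    }
}
```

Guarding is a trade-off: the barrier stays usable, but the workers are released as if the phase succeeded, so each of them must check the recorded failure before using the aggregated result. Where a failed aggregation should stop the whole computation, letting the barrier break may be the better signal.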

Head-to-Head Comparison — Choosing the Right Tool Under Pressure

The single most important question to ask yourself is: 'Is the wait one-directional (waiters depend on workers) or mutual (everyone waits for everyone)?' CountDownLatch is one-directional. CyclicBarrier is mutual.

The second question is: 'Does this pattern repeat?' If threads need to sync once and move on independently, use CountDownLatch. If threads must sync at the end of every phase in a loop, CyclicBarrier's automatic reset is exactly what you need — recreating a CountDownLatch every iteration is wasteful and error-prone.

Performance considerations matter at scale. CountDownLatch.countDown() is a single CAS on an AQS integer, which is extremely cheap. CyclicBarrier.await() acquires a ReentrantLock, which involves more overhead. For ultra-hot paths with thousands of threads syncing per second, consider Phaser (the more flexible successor to both), which supports tiering: phasers can be arranged in a tree so that groups of parties synchronise on child phasers and contention on any single one drops. For most application-level coordination (tens of threads, not thousands), both primitives are fast enough that the design clarity matters far more than the performance difference.

Error propagation also differs sharply. A missed countDown() (e.g., from a crashed thread that never calls it) simply leaves the latch stuck, which is why the timeout overload of await() is non-negotiable in production. CyclicBarrier's broken-barrier state at least actively notifies waiting threads that something went wrong, making it somewhat easier to detect a fault mid-cycle.

io.thecodeforge.concurrent.PhaserMigrationHint.java (Java)
package io.thecodeforge.concurrent;

import java.util.concurrent.Phaser;

/**
 * Quick illustration of Phaser as the flexible upgrade path.
 * Unlike CyclicBarrier, Phaser supports dynamic participant registration
 * and per-phase arrival tracking — useful when the number of workers
 * isn't known at construction time.
 *
 * This is NOT a replacement example — it's a pointer for when you've
 * outgrown both CountDownLatch and CyclicBarrier.
 */
public class PhaserMigrationHint {

    public static void main(String[] args) throws InterruptedException {

        // Phaser starts with 1 participant: the main thread (the 'overseer').
        Phaser overseerPhaser = new Phaser(1);
        Thread[] workers = new Thread[3];

        for (int workerIndex = 0; workerIndex < 3; workerIndex++) {
            final int id = workerIndex;

            // Each worker is registered dynamically, before its thread starts.
            // No fixed count is needed at construction.
            overseerPhaser.register();

            workers[id] = new Thread(() -> {
                System.out.printf("[Worker-%d] Arriving at phase %d%n", id, overseerPhaser.getPhase());

                // arriveAndAwaitAdvance is the CyclicBarrier.await() equivalent.
                // Returns the phase number AFTER the barrier trips.
                overseerPhaser.arriveAndAwaitAdvance();

                System.out.printf("[Worker-%d] Phase advanced. Continuing.%n", id);

                // Deregister when done: reduces participant count for future phases.
                overseerPhaser.arriveAndDeregister();
            });
            workers[id].start();
        }

        // Main arrives and deregisters without waiting. Phase 0 advances once
        // every worker has arrived as well, whichever party happens to be last.
        overseerPhaser.arriveAndDeregister();

        // Join the workers so that all of them have deregistered before we inspect
        // the phaser. A phaser terminates when its registered party count hits zero.
        for (Thread worker : workers) {
            worker.join();
        }

        System.out.println("[Main] All workers released. Phaser terminated: " + overseerPhaser.isTerminated());
    }
}
Output
[Worker-0] Arriving at phase 0
[Worker-1] Arriving at phase 0
[Worker-2] Arriving at phase 0
[Worker-0] Phase advanced. Continuing.
[Worker-1] Phase advanced. Continuing.
[Worker-2] Phase advanced. Continuing.
[Main] All workers released. Phaser terminated: true
Interview Gold: Know When to Mention Phaser
Interviewers love it when you voluntarily bring up Phaser after explaining CyclicBarrier. Say: 'If the participant count is dynamic, or if I need per-phase callbacks with richer lifecycle control, I'd reach for Phaser instead.' It signals you actually use these APIs in anger, not just read about them.
Production Insight
Choosing the wrong primitive leads to awkward workarounds or outright bugs.
A CountDownLatch reused in a loop is a code smell; switch to CyclicBarrier or Phaser.
Performance overhead only matters at thousands of threads per second — clarity wins at normal scale.
Key Takeaway
Ask: one-directional or mutual? One-time or repeating?
CountDownLatch for dependencies, CyclicBarrier for phases.
When participant count is dynamic, reach for Phaser.

Production Decision Framework — How to Pick the Right Primitive

You've seen the internals. Now here's a concrete decision tree you can apply in code reviews or on the whiteboard. Start with these three questions:

1. Roles: Are there distinct 'waiters' and 'workers', or does every thread play both roles?
  • Distinct roles → CountDownLatch. One thread (or group) waits for others to finish.
  • All threads equal → CyclicBarrier. Everyone waits for everyone.

2. Repeatability: Will this coordination point be used exactly once, or multiple times?
  • Once → CountDownLatch (or create a new one each time, but that's fragile).
  • Multiple times → CyclicBarrier (auto-reset) or Phaser (if participants change).

3. Failure semantics: What should happen if a worker fails?
  • Silent stuck latch? Use CountDownLatch with timeout.
  • Active failure notification? Use CyclicBarrier: BrokenBarrierException tells all threads.

Scenario | Best Choice | Why
Start-up sequencing (wait for N services) | CountDownLatch | One-shot, distinct roles
Parallel algorithm with phases | CyclicBarrier | Reusable, all threads equal
Test coordination (wait for threads to finish) | CountDownLatch | Simple, one-time
Dynamic worker pool for iterative processing | Phaser | Participants can join/leave
Event broadcasting (fire when all ready) | CountDownLatch | All waiters unblock simultaneously
Simulation ticks where each tick is a phase | CyclicBarrier | Auto-reset, barrier action for aggregation

In production, apply the rule of least surprise: pick the primitive whose name and contract clearly communicate the intent. Your future self — and your colleagues — will thank you.

Mental Model: The Gate vs The Round Table
  • Gate: Opens once. Once open, nothing stops it. Perfect for one-time dependencies.
  • Round table: Everyone sits, the barrier action runs (like a toast), then they get up and the table resets for the next course.
  • Phaser extends the round table: chairs can be added or removed between courses.
Production Insight
I've seen teams force CountDownLatch into iterative patterns by creating a new instance each loop. That's a stale-reference bug waiting to happen: one thread still holding last iteration's latch sails straight through await().
I've also seen CyclicBarrier misused for startup sequencing where the main thread never participates, causing confusing await() hangs.
Rule: map roles and repeatability first; let API choices follow naturally.
Key Takeaway
Match the primitive to the coordination pattern.
CountDownLatch = one-way gate.
CyclicBarrier = reusable round table.
If participants are dynamic, Phaser is your only real option.

Common Pitfalls and How to Avoid Them

Even experienced developers make these mistakes. Here's what to watch for.

Pitfall 1: Missing countDown() guarantee
If your worker code throws an unchecked exception before calling countDown(), the latch never reaches zero. Always wrap the body in try-finally and call countDown() in the finally block. This ensures the latch is decremented even on failure.

Pitfall 2: Forgetting to restore the interrupt flag
When you catch InterruptedException, call Thread.currentThread().interrupt() to reassert the interrupt. Otherwise code further up the stack (the executor, your own cancellation logic) never sees that the thread was asked to stop, and it may carry on into work it should have abandoned. Note that this does not affect the barrier itself: a thread interrupted inside CyclicBarrier.await() breaks the barrier whether or not you restore the flag.

Pitfall 3: Reusing a CountDownLatch by creating a new one in a loop
You create a new CountDownLatch(n) each iteration, but if another thread still holds a reference from a previous iteration, that latch is already at zero and its await() returns immediately. Switch to CyclicBarrier or Phaser if you need reuse.

Pitfall 4: Calling countDown() after the latch has reached zero
It's a no-op, but it can mask bugs. For example, if you accidentally call countDown() 5 times on a latch initialised with 3, the extra calls do nothing, so you never find out that a worker which was supposed to run once ran more often. Add assertions if you suspect over-counting.

Pitfall 5: Using CyclicBarrier with more threads than the party count
The barrier trips each time exactly its party count has arrived. If 5 threads call await() on a 4-party barrier, the first four trip it and proceed; the fifth becomes the first arrival of the next generation and waits forever unless three more threads show up. Ensure the number of participating threads equals the party count exactly, or use a secondary coordination mechanism.

Pitfall 6: Not handling BrokenBarrierException
If you ignore BrokenBarrierException and continue, you risk processing garbage data. Always abort the current phase, then restart with a reset or fresh barrier.
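
The barrier-related pitfalls are easier to reason about once you have seen the broken state directly. The sketch below has a single thread wait alone at a two-party barrier, so the timed await() has to time out; it then shows that later arrivals fail fast and that reset() clears the state. `BrokenBarrierRecovery` and `awaitAlone` are hypothetical names, and the 100 ms timeout is arbitrary.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: observing and clearing the broken state of a CyclicBarrier.
final class BrokenBarrierRecovery {

    /** Calls the timed await() and reports how it ended. */
    static String awaitAlone(CyclicBarrier barrier, long timeoutMs) {
        try {
            barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
            return "tripped";
        } catch (TimeoutException e) {
            return "timeout";      // this thread gave up, which breaks the barrier
        } catch (BrokenBarrierException e) {
            return "broken";       // somebody else already broke it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        CyclicBarrier barrier = new CyclicBarrier(2);  // two parties, but only one thread ever arrives

        System.out.println(awaitAlone(barrier, 100));  // timeout
        System.out.println(barrier.isBroken());        // true
        System.out.println(awaitAlone(barrier, 100));  // broken (fails fast, no waiting)

        barrier.reset();                               // start a fresh generation
        System.out.println(barrier.isBroken());        // false
    }
}
```

In a real multi-threaded phase, only one thread should perform the reset, and only after the others have stopped using the barrier; replacing the barrier is often less error-prone.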

A Silent Deadly Mistake: InterruptedException in CyclicBarrier
If a thread is interrupted while inside CyclicBarrier.await(), the barrier breaks: that thread gets InterruptedException and every other waiter gets BrokenBarrierException. Restoring the interrupt flag does not repair the barrier, but skipping it hides the interruption from the rest of your code. Always call Thread.currentThread().interrupt() in the catch block, and treat the barrier as unusable until it is reset or replaced.
Production Insight
In one production incident, a cached thread pool caused a CyclicBarrier to never trip because an extra thread from a previous task was still in the pool and called await() from an abandoned phase.
Always align thread count with barrier parties, and use Phaser if the pool size is dynamic.
Rule: keep coordination simple; test with thread dumps to verify no one is left waiting.
Key Takeaway
CountDownLatch: finally block for countDown().
CyclicBarrier: never ignore BrokenBarrierException.
Both: always use timeout on await().

Why CountDownLatch Cares About Tasks, Not Threads

That distinction kills more junior engineers than null pointers. CountDownLatch tracks a counter you decrement. It has zero interest in who does the decrementing. One thread can call countDown() five times. Five threads can each call it once. The latch doesn't care. It only watches that counter hit zero. This matters in production because you might have a thread pool of three workers needing to complete eight pre-flight checks. With CountDownLatch, you set the initial count to eight, each check calls countDown(), and your coordinator thread waits on await(). The pool size is irrelevant. CyclicBarrier would force you to match thread count to barrier count, which is the wrong abstraction when you're tracking units of work, not thread rendezvous. That's the root cause I've debugged at 2 AM: someone treated the barrier count like a task counter and wondered why their pipeline deadlocked. Don't be that engineer.

PreflightCoordinator.java · JAVA
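// io.thecodeforge
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PreflightCoordinator {
    public void run() throws InterruptedException {
        var latch = new CountDownLatch(8); // eight checks, not eight threads
        var executor = Executors.newFixedThreadPool(3);
        try {
            for (int i = 0; i < 8; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(200); // stand-in for a real preflight check
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve the interrupt
                    } finally {
                        latch.countDown(); // always decrement, even if the check fails
                    }
                });
            }
            // Bounded wait: a lost countDown() must not hang the coordinator forever
            if (latch.await(10, TimeUnit.SECONDS)) {
                System.out.println("All preflight checks passed. Launching.");
            } else {
                System.out.println("Preflight timed out, checks pending: " + latch.getCount());
            }
        } finally {
            executor.shutdown();
        }
    }
}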
Output
All preflight checks passed. Launching.
Production Trap:
Never let thread pool size dictate your latch count. Treat CountDownLatch as a counter of pending tasks, not pending threads. Mix them up and you'll block forever on await().
Key Takeaway
CountDownLatch counts tasks. Thread identity is irrelevant. Set the initial count to the number of work items, not the number of workers.

Reusability Is Where CyclicBarrier Earns Its Paycheck

CountDownLatch is a one-shot. Once you hit zero, it's a corpse. You cannot reset it. CyclicBarrier resets implicitly when all parties trip the barrier, or explicitly via reset(). That reusability defines when you reach for it. Think phased computations: map stage, then reduce stage, then output stage. Each phase needs all threads to sync before the next. You create one CyclicBarrier sized to the number of participating threads (the parties, not the number of phases), call await() at the end of each phase, and optionally run a barrier action (like shuffling data) between phases. The barrier handles reset automatically. In production, I've used this for partitioned cache refresh jobs: each partition loads fresh data, threads rendezvous, then the barrier action publishes the combined update. Without CyclicBarrier, you'd be allocating a fresh CountDownLatch for every phase, handing each one to every worker, and praying nobody holds a stale reference. That's fragile. CyclicBarrier is built for that rhythm.

PhasedCacheRefresh.java · JAVA
// io.thecodeforge
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PhasedCacheRefresh {
    private static final int PARTITIONS = 4;
    private final CyclicBarrier barrier;
    private final String[] partitionData = new String[PARTITIONS];

    public PhasedCacheRefresh() {
        barrier = new CyclicBarrier(PARTITIONS, () -> {
            System.out.println("Merging partition data across all threads");
            // barrier action: publish combined cache update
        });
    }

    public void refreshPartition(int partitionId) {
        partitionData[partitionId] = "fresh-data-" + partitionId;
        try {
            barrier.await(30, TimeUnit.SECONDS); // sync point with a bounded wait
            // after barrier, all partitions have fresh data
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag for callers
        } catch (BrokenBarrierException | TimeoutException e) {
            // another partition failed or never arrived: abort this phase
            throw new IllegalStateException("Cache refresh phase aborted", e);
        }
    }
}
Output
Merging partition data across all threads
Design Principle:
CyclicBarrier shines in phased workflows where all threads must synchronize repeatedly. If you only need one synchronization point, use CountDownLatch. Don't force reusability where you don't need it.
Key Takeaway
Reach for CyclicBarrier when your algorithm has multiple synchronization phases. For a single barrier, CountDownLatch is simpler and safer.
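To make the reuse concrete, here is a minimal sketch that drives one barrier instance through several phases. The class name BarrierReuseDemo and the worker/phase counts are illustrative assumptions, not part of the cache refresh job described above. The barrier action increments a counter, so the return value shows how many times the same barrier tripped.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierReuseDemo {

    /** Runs the workers through every phase and returns how often the barrier tripped. */
    static int run(int workers, int phases) {
        AtomicInteger trips = new AtomicInteger();
        // The barrier action runs once per phase, in the last thread to arrive.
        CyclicBarrier barrier = new CyclicBarrier(workers, trips::incrementAndGet);

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                try {
                    for (int phase = 0; phase < phases; phase++) {
                        // ... phase work would go here ...
                        barrier.await(); // same barrier instance, reused every phase
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // another worker failed: stop instead of continuing with partial data
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("demo itself was interrupted", e);
            }
        }
        return trips.get();
    }

    public static void main(String[] args) {
        // prints: Barrier tripped 3 times
        System.out.println("Barrier tripped " + run(4, 3) + " times");
    }
}
```

Note that the barrier is sized with the worker count (4), while the phase count (3) only controls how many times each worker calls await().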
● Production incident · POST-MORTEM · severity: high

Startup Hang: Missing countDown() After Worker Crash

Symptom
Service hangs during startup. Health checks fail, deployments stall. No error logged – just a thread stuck in CountDownLatch.await().
Assumption
All workers will always complete successfully. countDown() calls are lightweight and never fail.
Root cause
A worker thread threw an unchecked NullPointerException before reaching countDown(). The main thread used await() with no timeout. Latch count stayed at 1 forever.
Fix
Always use await(long, TimeUnit) with a reasonable timeout. Wrap worker code in try-finally to guarantee countDown() even on failure. Log and handle the false return from await().
Key lesson
  • Never call await() without a timeout in production.
  • Put countDown() in a finally block – every time.
  • If a worker crashes before decrementing, your latch becomes a deadlock trap.
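The fix can be sketched as follows. This is an illustration under assumptions, not the code from the incident: SafeStartupDemo and initialise() are hypothetical names, and worker 1 throws a simulated NullPointerException. The finally block still releases the latch, and the coordinator uses a bounded await and checks a separate failure counter, because an open latch only means every worker finished, not that every worker succeeded.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SafeStartupDemo {

    /** Starts three workers, one of which crashes, and reports how the latch behaved. */
    static String run() {
        int workers = 3;
        CountDownLatch ready = new CountDownLatch(workers);
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < workers; i++) {
                final int id = i;
                pool.execute(() -> {
                    try {
                        initialise(id);
                    } catch (RuntimeException e) {
                        failures.incrementAndGet(); // record the crash instead of losing it
                    } finally {
                        ready.countDown();          // runs on success and on failure
                    }
                });
            }
            // Bounded wait: returns false instead of hanging if a countDown() is lost.
            boolean released = ready.await(5, TimeUnit.SECONDS);
            return "released=" + released + " failures=" + failures.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    /** Hypothetical worker initialisation; worker 1 simulates the crash. */
    private static void initialise(int id) {
        if (id == 1) {
            throw new NullPointerException("simulated crash in worker " + id);
        }
    }

    public static void main(String[] args) {
        // prints: released=true failures=1
        System.out.println(run());
    }
}
```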
Production debug guide
Symptom → Action guide for the three most common production failures · 3 entries
Symptom · 01
Thread stuck in CountDownLatch.await()
Fix
Take a thread dump (jstack <pid>). Look for threads blocked in LockSupport.park. Check latch count via debugger or add count logging. Verify all workers have called countDown() or crashed.
Symptom · 02
CyclicBarrier throws BrokenBarrierException repeatedly
Fix
Check which thread timed out or was interrupted. The barrier stays broken until reset() is called; coordinating a reset across threads is awkward, so creating a new instance is usually the simpler option. Identify the root cause: network timeout, resource starvation, or a bug in one worker.
Symptom · 03
await() returns immediately after first use on a reused barrier
Fix
The barrier may have already tripped. Call getNumberWaiting() and isBroken() to confirm. If you expect a fresh cycle, ensure all threads have completed the previous cycle and called await() again. A broken barrier must be reset() or replaced with a new instance before it can be used again.
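The broken state is easy to observe directly. The sketch below is illustrative (BrokenBarrierRecoveryDemo is a hypothetical name): a single thread times out on a two-party barrier, which breaks it; the next await() fails fast; and, per the CyclicBarrier Javadoc, reset() returns the barrier to its initial state. The same Javadoc notes that creating a new barrier is often preferable when other threads are still involved, because they need to re-synchronise somehow.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BrokenBarrierRecoveryDemo {

    /** Breaks a barrier via timeout, then shows fail-fast behaviour and reset(). */
    static String run() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        StringBuilder log = new StringBuilder();
        try {
            // Only one of the two parties arrives, so the timed await gives up
            // and leaves the barrier in the broken state.
            barrier.await(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            log.append("timeout;");
        } catch (BrokenBarrierException e) {
            log.append("unexpected-broken;");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
        log.append("broken=").append(barrier.isBroken()).append(';');

        try {
            barrier.await(); // does not block: a broken barrier rejects new arrivals
        } catch (BrokenBarrierException e) {
            log.append("failsFast;");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }

        barrier.reset(); // back to the initial, unbroken state
        log.append("afterReset=").append(barrier.isBroken());
        return log.toString();
    }

    public static void main(String[] args) {
        // prints: timeout;broken=true;failsFast;afterReset=false
        System.out.println(run());
    }
}
```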
★ Quick Debug Cheat Sheet: Thread Synchronisation Issues
Use these commands and checks when you suspect a CountDownLatch or CyclicBarrier problem in production
Service startup hangs
Immediate action
Take thread dump and search for CountDownLatch.await
Commands
jstack $(pgrep -f 'your-app') | grep -A 20 'CountDownLatch.await'
Check latch count: log latch.getCount(), or capture a heap dump (jcmd <pid> GC.heap_dump <file>) and inspect the latch's internal state
Fix now
Restart the service with await(timeout, unit) in the coordinating thread; ensure all workers use try-finally for countDown()
Iterative algorithm fails mid-phase
Immediate action
Check for BrokenBarrierException in logs
Commands
grep -r 'BrokenBarrierException' /var/log/app/ | tail -50
jstack <pid> | grep -A 15 'CyclicBarrier.dowait'
Fix now
Restart the phase with a fresh CyclicBarrier instance; add timeout to await() to prevent indefinite hang
Phaser participants never advance
Immediate action
Check if a worker never arrives
Commands
jstack <pid> | grep -A 10 'Phaser'
Verify registered parties: add Phaser.getRegisteredParties() log
Fix now
Ensure all workers call arrive() or arriveAndDeregister(); wait with awaitAdvanceInterruptibly(phase, timeout, unit), since plain awaitAdvance() has no timeout
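A bounded Phaser wait with diagnostic output might look like the sketch below. PhaserTimeoutDemo is a hypothetical name, and the missing party is simulated by simply never arriving. The calls used are Phaser.arrive(), awaitAdvanceInterruptibly(int, long, TimeUnit), getRegisteredParties() and getUnarrivedParties().

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PhaserTimeoutDemo {

    /** Waits for a phase that cannot complete and reports the phaser's party counts. */
    static String run() {
        Phaser phaser = new Phaser(2);   // two registered parties
        int phase = phaser.arrive();     // this party arrives; the other never does
        try {
            // Timed, interruptible wait: gives up instead of parking forever.
            phaser.awaitAdvanceInterruptibly(phase, 100, TimeUnit.MILLISECONDS);
            return "advanced";
        } catch (TimeoutException e) {
            // These counts are what you would log to find the missing worker.
            return "stuck: registered=" + phaser.getRegisteredParties()
                    + " unarrived=" + phaser.getUnarrivedParties();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // prints: stuck: registered=2 unarrived=1
        System.out.println(run());
    }
}
```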
CountDownLatch vs CyclicBarrier — Side-by-Side
| Feature / Aspect | CountDownLatch | CyclicBarrier |
|---|---|---|
| Reusable after tripping | No: single use only | Yes: resets automatically each cycle |
| Who waits | One or more designated waiter threads | All participant threads wait for each other |
| Internal synchroniser | AQS (compareAndSet on state integer) | ReentrantLock + Condition + generation object |
| Barrier/trip action | Not supported | Optional Runnable runs in last-arriving thread |
| Failed-thread behaviour | Latch stays stuck (timeout is your safety net) | Barrier enters broken state; BrokenBarrierException thrown to all |
| Dynamic participant count | Not supported | Not supported (use Phaser instead) |
| Primary use case | Start-up sequencing, one-time event signalling | Iterative phase synchronisation, parallel algorithms |
| Thread roles | Waiters vs. workers (distinct roles) | All threads are both workers and waiters |
| Performance overhead | Very low: single CAS per countDown() | Higher: ReentrantLock acquisition per await() |
| Available since | Java 5 (java.util.concurrent) | Java 5 (java.util.concurrent) |

Key takeaways

1. CountDownLatch is a one-shot gate: once the count hits zero it stays open forever and cannot be reset. Design your code around this or pick a different tool.
2. CyclicBarrier's broken-barrier state is a feature: it actively poisons all waiting threads when one fails, preventing silent partial-phase execution that would corrupt iterative results.
3. The barrier action in CyclicBarrier runs in the last-arriving thread before any waiting threads are released: use this for phase aggregation without an extra synchronisation step.
4. When participant count is unknown at construction time, or you need tiered phase callbacks, Phaser is the right upgrade path: mentioning it unprompted in an interview signals genuine production experience.
5. Always use timeout-based await() variants in production: a CountDownLatch whose count is never decremented blocks forever, and a CyclicBarrier timeout turns a party that never arrives into a TimeoutException and a broken barrier instead of an indefinite wait.

Common mistakes to avoid

5 patterns
×

Not guaranteeing countDown() in a finally block

Symptom
A crashed worker thread never calls countDown(). The latch never reaches zero, and the awaiting main thread hangs indefinitely (with no timeout) or burns the full timeout before failing (with one).
Fix
Always wrap the body of the worker Runnable in try-finally, placing latch.countDown() in the finally block. Even if the worker throws an unchecked exception, the latch is decremented.
×

Catching InterruptedException without restoring the interrupt flag

Symptom
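If InterruptedException is caught but the flag is not restored, the interruption is lost: code further up the stack never sees it, and later blocking calls will not respond to the cancellation request. In CyclicBarrier, the interrupt still breaks the barrier and the other waiters still get BrokenBarrierException, but the interrupted worker itself may carry on as if nothing happened.
Fix
Always call Thread.currentThread().interrupt() in the catch block for InterruptedException, then stop the current work. For CountDownLatch, still call countDown() (ideally in a finally block) to unblock others. For CyclicBarrier, no extra signalling is needed: the interrupted await() has already broken the barrier, so the other parties are released with BrokenBarrierException.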
×

Reusing a CountDownLatch by allocating a new one in a loop

Symptom
Each iteration creates a fresh latch, but if any thread from a previous iteration still holds a reference, that old latch is already at zero. The new latch may not be used by all participants, causing premature completion or hang.
Fix
If you need a reusable countdown mechanism, switch to CyclicBarrier or Phaser. If you must use CountDownLatch, ensure no stale references exist before creating a new instance.
×

Calling CyclicBarrier.await() without handling BrokenBarrierException

Symptom
If a barrier breaks (due to timeout or interrupt), subsequent await() calls throw BrokenBarrierException. If the code catches Exception generically or ignores it, the phase may continue with partial data.
Fix
Always catch BrokenBarrierException separately and abort the current computation. Do not proceed with the next phase. Log the error and create a new CyclicBarrier if needed.
×

Using CyclicBarrier with a thread pool that has dynamic size

Symptom
If the number of threads calling await() varies (e.g., cached thread pool), the barrier may never trip because the party count is fixed. Extra threads calling await() exceed the intended count, or too few threads arrive.
Fix
Use Phaser instead of CyclicBarrier when the participant count is dynamic. Alternatively, ensure your thread pool size exactly matches the barrier party count and no extra threads call await().
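For the dynamic case, a Phaser lets each worker register itself at runtime. The following is a minimal sketch under assumptions: DynamicPartiesDemo is a hypothetical name and the worker body is left empty. The coordinator registers itself first so the phase cannot advance while workers are still being added, each worker calls arriveAndDeregister() in a finally block, and the coordinator waits with a timeout.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DynamicPartiesDemo {

    /** Registers a runtime-determined number of workers and waits for all of them. */
    static String run(int workers) {
        Phaser phaser = new Phaser(1); // party 1 is the coordinating thread itself
        for (int i = 0; i < workers; i++) {
            phaser.register(); // add one party per worker, decided at runtime
            new Thread(() -> {
                try {
                    // ... worker logic would go here ...
                } finally {
                    phaser.arriveAndDeregister(); // arrive and leave, even on failure
                }
            }).start();
        }

        int phase = phaser.arrive(); // the coordinator arrives without blocking
        try {
            phaser.awaitAdvanceInterruptibly(phase, 5, TimeUnit.SECONDS);
            return "advanced to phase " + phaser.getPhase()
                    + ", parties left=" + phaser.getRegisteredParties();
        } catch (TimeoutException e) {
            return "timed out, unarrived=" + phaser.getUnarrivedParties();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // prints: advanced to phase 1, parties left=1
        System.out.println(run(5));
    }
}
```

Only the coordinator remains registered afterwards, so the same Phaser can take a different number of workers in the next phase, which a fixed-party CyclicBarrier cannot do.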
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR: Can you explain the difference between CountDownLatch and CyclicBarrier,...
Q02 · SENIOR: What happens to a CyclicBarrier if one of the waiting threads is interru...
Q03 · SENIOR: If CountDownLatch.countDown() is called more times than the initial coun...
Q04 · SENIOR: Explain the generation mechanism inside CyclicBarrier. Why is it importa...
Q05 · JUNIOR: How would you implement a reusable countdown latch using CyclicBarrier? ...
Q01 of 05 · SENIOR

Can you explain the difference between CountDownLatch and CyclicBarrier, and give a concrete production scenario where you'd pick one over the other?

ANSWER
CountDownLatch is a one-shot gate: one or more threads wait until a count reaches zero. CyclicBarrier is a reusable meeting point where a fixed number of threads wait for each other, after which the barrier resets for the next cycle. In production, I'd use CountDownLatch for service startup sequencing: wait for database, cache, and message broker to signal readiness before opening HTTP traffic. I'd use CyclicBarrier for a parallel image processing pipeline where each processor must finish filtering before the next stage begins, and this repeats for every batch.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01. Can a CountDownLatch count back up after reaching zero?
02. What is the BrokenBarrierException in Java and when does it get thrown?
03. Is it safe to call CountDownLatch.countDown() from multiple threads at the same time?
04. Can CyclicBarrier be reused after a BrokenBarrierException?
05. What is the difference between CyclicBarrier and Phaser?