Junior 11 min · March 06, 2026

Cuckoo Hashing — Correlated Hash Functions and Rehash Loops

Correlated hash functions caused 12-second rehash loops at 55% load.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Cuckoo hashing uses two hash tables, each key has exactly two candidate slots
  • Lookups always check at most two locations — hard worst-case O(1), not amortized, not expected
  • Insertions use an eviction chain: kick out the occupant, move them to their alternate slot
  • If the eviction chain loops back to the starting key, a cycle is detected and the table must be resized and rehashed with new hash seeds
  • Max load factor ~50% per table in theory, 45% in production — exceed it and insertion failure probability rises exponentially
  • Biggest mistake: assuming insertion is always O(1) — it degrades to O(n) in the worst case and falls apart sharply past load factor thresholds
  • Hash function independence is not optional — correlated functions collapse both tables into one and cause premature cycles
Plain-English First

Imagine a hotel with two buildings, and every guest has one reserved room in each building. When a new guest arrives and both their rooms are taken, they bump one occupant out, who then moves to their room in the other building. That evicted guest might bump someone else out, creating a chain reaction until everyone has a room or the hotel manager calls time and opens a new wing.

Cuckoo hashing works exactly like this. Every key has two possible home slots, one in each table. Lookup is trivially fast — check room A, check room B, done. Insertion is where the work happens: evict, relocate, repeat. The payoff is that no matter how full the hotel gets (within reason), finding any guest never requires more than two stops. That hard guarantee is why latency-critical systems reach for cuckoo hashing when other approaches fall short.

Hash tables are the backbone of modern software — they power database indexes, caches, compilers, and networking stacks. But the classic open-addressing and chaining approaches carry a hidden cost: their lookup time degrades under load. A bucket with five chained nodes requires five pointer dereferences. A probing table at 80% occupancy can force dozens of slot comparisons per lookup. For most applications that cost is tolerable. For latency-sensitive systems — packet classification in network routers, real-time game servers, CPU-level cache simulators — tolerable is not a design target.

Cuckoo hashing, introduced by Rasmus Pagh and Flemming Friche Rodler in 2001, attacks this problem with a radical guarantee: lookups always examine at most two memory locations. Worst-case O(1) — not amortized, not expected, but hard. It achieves this by giving every key two candidate slots across two separate hash tables and using an eviction-and-relocation protocol during insertion to ensure no slot ever holds more than one key.

That guarantee comes with strings attached. Insertion is unpredictable and can fail entirely when eviction chains cycle back on themselves. The load factor ceiling is roughly half what chained hashing tolerates. Hash function independence is not a nice-to-have — it is the mechanism the whole structure depends on. And if you deploy this without a max-iteration guard on the eviction chain, you have introduced an infinite loop into your production system.

By the end of this article you will understand exactly how the two-table eviction cycle works, why infinite loops occur and how to detect them before they happen, how to implement a working cuckoo hash map in Java with cycle detection and graceful rehashing, what load factor limits apply and why exceeding them causes exponential rehash overhead, and when cuckoo hashing genuinely beats its alternatives in production. The interview questions at the end go into the nuanced territory that separates candidates who have read about this structure from those who have actually operated it under load.

Core Mechanism: Two Tables, Two Hash Functions

The structure is built on one invariant: no slot in either table ever holds more than one key. Cuckoo hashing maintains two hash tables, T1 and T2, each of size n. Every key k has exactly two candidate positions — h1(k) in T1 and h2(k) in T2. That invariant is what makes lookup trivially fast.

To look up a key, compute h1(k) and h2(k) and check both slots. Two memory accesses. No probing, no chaining, no traversal, no variable-length search. If the key is in the table, it is in one of those two locations. If it is in neither, it is not in the table. That is the entire lookup algorithm.

Insertion is where the complexity lives. To insert key k, first try T1[h1(k)]. If the slot is empty, place k there and finish. If the slot is occupied by some existing key k', evict k' and place k in its slot. k' now needs a home — its only remaining option is T2[h2(k')]. If that slot is also occupied, evict that occupant and continue. The chain zigzags between T1 and T2: each evicted key has exactly one alternate table to move to, and within that table exactly one candidate slot. The chain terminates when an empty slot is found or when it has gone on long enough to indicate a cycle.

The asymmetry between lookup and insertion is intentional. You pay a variable insertion cost — sometimes one step, sometimes dozens — in exchange for a guaranteed, constant, predictable lookup cost. For workloads where lookups vastly outnumber inserts, that is a trade you want to make. For write-heavy workloads it is the wrong tool entirely.

io/thecodeforge/hashing/CuckooHashMap.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
package io.thecodeforge.hashing;

import java.util.Objects;

/**
 * Production-grade cuckoo hash map with cycle detection and auto-resize.
 * Two independent hash functions over two tables.
 *
 * Design constraints:
 *  - MAX_EVICTION_CHAIN bounds insertion to prevent infinite loops
 *  - LOAD_FACTOR_THRESHOLD triggers proactive resize before cycle probability climbs
 *  - hash1/hash2 use different mixing constants — verify independence before deployment
 */
public class CuckooHashMap<K, V> {

    private static final int    MAX_EVICTION_CHAIN     = 512;
    private static final double LOAD_FACTOR_THRESHOLD  = 0.45;
    private static final int    DEFAULT_INITIAL_CAPACITY = 16;

    // Package-private so CuckooResizer can read entries directly
    Entry<K, V>[] table1;
    Entry<K, V>[] table2;
    int capacity;
    private int size;

    static class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key   = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    public CuckooHashMap(int initialCapacity) {
        this.capacity = nextPowerOfTwo(initialCapacity);
        this.table1   = new Entry[this.capacity];
        this.table2   = new Entry[this.capacity];
        this.size     = 0;
    }

    public CuckooHashMap() {
        this(DEFAULT_INITIAL_CAPACITY);
    }

    /**
     * Hash function 1 — finalizer mixing from MurmurHash3.
     * The >>> 16 XOR fold reduces slot clustering in the low bits.
     */
    private int hash1(K key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h & (capacity - 1);
    }

    /**
     * Hash function 2 — different multiplier constant for independence.
     * Uses the golden-ratio-derived constant 0x9e3779b9.
     * Never derive h2 as h1 + offset — that is maximally correlated.
     */
    private int hash2(K key) {
        int h = key.hashCode() * 0x9e3779b9;
        h ^= (h >>> 16);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return h & (capacity - 1);
    }

    /**
     * O(1) worst-case lookup — exactly two memory accesses.
     * Returns null if the key is absent.
     */
    public V get(K key) {
        Objects.requireNonNull(key, "null keys are not supported");
        int idx1 = hash1(key);
        if (table1[idx1] != null && table1[idx1].key.equals(key)) {
            return table1[idx1].value;
        }
        int idx2 = hash2(key);
        if (table2[idx2] != null && table2[idx2].key.equals(key)) {
            return table2[idx2].value;
        }
        return null; // key not present — checked both candidate slots
    }

    /**
     * Insert or update a key-value pair.
     *
     * Returns true on success.
     * Returns false when the eviction chain exceeds MAX_EVICTION_CHAIN —
     * the caller must trigger a resize-and-rehash before retrying.
     *
     * Never silently drops keys — false is an explicit signal, not an ignored error.
     */
    public boolean put(K key, V value) {
        Objects.requireNonNull(key, "null keys are not supported");

        // Update in place if the key already exists — avoids growing eviction chains
        int idx1 = hash1(key);
        if (table1[idx1] != null && table1[idx1].key.equals(key)) {
            table1[idx1].value = value;
            return true;
        }
        int idx2 = hash2(key);
        if (table2[idx2] != null && table2[idx2].key.equals(key)) {
            table2[idx2].value = value;
            return true;
        }

        // Begin eviction chain — alternate between T1 and T2
        Entry<K, V> toInsert  = new Entry<>(key, value);
        boolean     useTable1 = true;

        for (int i = 0; i < MAX_EVICTION_CHAIN; i++) {
            if (useTable1) {
                int idx = hash1(toInsert.key);
                if (table1[idx] == null) {
                    table1[idx] = toInsert;
                    size++;
                    return true;
                }
                // Evict occupant; it will try T2 on the next iteration
                Entry<K, V> evicted = table1[idx];
                table1[idx] = toInsert;
                toInsert    = evicted;
                useTable1   = false;
            } else {
                int idx = hash2(toInsert.key);
                if (table2[idx] == null) {
                    table2[idx] = toInsert;
                    size++;
                    return true;
                }
                Entry<K, V> evicted = table2[idx];
                table2[idx] = toInsert;
                toInsert    = evicted;
                useTable1   = true;
            }
        }

        // Chain limit exceeded — cycle detected or near-cycle.
        // Caller must call CuckooResizer.resize(this) and retry the insertion.
        return false;
    }

    /**
     * Deletion is clean — no tombstones required.
     * A key exists in exactly one of two known slots.
     */
    public boolean remove(K key) {
        Objects.requireNonNull(key);
        int idx1 = hash1(key);
        if (table1[idx1] != null && table1[idx1].key.equals(key)) {
            table1[idx1] = null;
            size--;
            return true;
        }
        int idx2 = hash2(key);
        if (table2[idx2] != null && table2[idx2].key.equals(key)) {
            table2[idx2] = null;
            size--;
            return true;
        }
        return false;
    }

    /** Combined load across both tables — useful for trend monitoring */
    public double loadFactor() {
        return (double) size / (2.0 * capacity);
    }

    /**
     * Per-table load factor — the metric that actually matters.
     * Returns the higher of the two individual table occupancies.
     * Alert when this exceeds 0.45; resize when it exceeds 0.50.
     */
    public double perTableLoadFactor() {
        int t1Count = 0, t2Count = 0;
        for (Entry<K, V> e : table1) if (e != null) t1Count++;
        for (Entry<K, V> e : table2) if (e != null) t2Count++;
        return Math.max(
            (double) t1Count / capacity,
            (double) t2Count / capacity
        );
    }

    /** Returns false when per-table load exceeds the production threshold */
    public boolean isHealthy() {
        return perTableLoadFactor() < LOAD_FACTOR_THRESHOLD;
    }

    public int size() { return size; }

    private static int nextPowerOfTwo(int n) {
        if (n <= 1) return 1;
        n--;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;
    }
}
The Hotel Analogy
  • Two tables equal two buildings; each key has one room in each — that assignment is fixed by the hash functions
  • Lookup checks room A then room B — two memory accesses and you are done, regardless of how full the hotel is
  • Insertion bumps the occupant of your room, who moves to their alternate room, who might bump someone else
  • The eviction chain is the domino sequence of bumped guests finding alternate rooms
  • A cycle means guest A bumps B who bumps C who bumps A — infinite without a chain length limit
  • Resize opens a new wing and reassigns all rooms under a new numbering scheme (new hash seeds)
Production Insight
The two-table invariant means every lookup is exactly two memory accesses — and those two accesses are to addresses you can compute before touching memory, which means hardware prefetchers can help you. In CPU-bound systems that predictability matters more than average-case speed. When your latency budget is under 100ns per lookup and you cannot afford the occasional 20-slot probe chain that open addressing produces under load, cuckoo hashing is the right call.
The thing that catches teams off guard is the false return from put(). Most hash table APIs throw on failure or silently succeed. A false here means the eviction chain hit the ceiling and nothing was inserted — not a slow insert, not a degraded insert, a dropped insert. Your calling code must handle this explicitly and trigger a resize before retrying. If it does not, you have silent data loss.
Key Takeaway
Two tables, two hash functions, at most two memory accesses per lookup — that is the entire lookup algorithm.
The eviction chain during insertion is the price you pay for that guarantee: variable cost on write, fixed cost on read.
The put() method returning false is not an error condition to log and ignore — it is a signal that the table needs to grow before you retry. Handle it explicitly.
When to Use Cuckoo Hashing
IfLatency-sensitive lookup, insert-heavy workload is rare or can be batched offline
UseCuckoo hashing is the right choice — guaranteed O(1) lookup at any load below the threshold, two fixed memory addresses per lookup enables hardware prefetching
IfInsert-heavy workload, lookups are infrequent or latency-tolerant
UseUse open addressing with linear probing — insertion is simpler, predictable, and does not require eviction chain management
IfMemory constrained and cannot afford the overhead of two equally-sized tables
UseUse Robin Hood hashing — single table, variance-reduced probe lengths, no second table allocation
IfNeed concurrent lock-free reads without complex synchronization
UseCuckoo hashing is a strong fit — two fixed memory locations per key, no pointer chasing, atomic reads are sufficient for lookups
IfLoad factor must exceed 75% to be memory-efficient
UseCuckoo hashing fails here — use chained hashing or consider a two-level structure; cuckoo's 45% ceiling is non-negotiable

Cycle Detection and the Infinite Loop Problem

The fundamental failure mode of cuckoo hashing is the eviction cycle. During insertion, the chain zigzags between T1 and T2: evict from T1, try T2, evict from T2, try T1, and so on. If this chain revisits a slot that was already modified earlier in the same insertion, the displaced key has nowhere to go that it has not already displaced someone from. You have a cycle, and without a guard the loop runs forever.

This is not a theoretical edge case. The probability of encountering a cycle is a function of load factor, and it rises steeply. At 40% per-table occupancy the risk is negligible — a properly seeded implementation can run millions of insertions without seeing one. At 50% the expected failure rate per insertion is around 1%. At 55% it climbs toward 10%. At 60% you should expect failure on nearly every insertion attempt. The relationship is exponential, not linear, which is why the 45% production ceiling feels conservative but genuinely matters.

The adversarial key distribution problem compounds this. A 2004 analysis by Devroye and Morin established the theoretical load limit, but that analysis assumes random hash function behaviour. When real key sets interact with fixed hash functions, certain key patterns produce consistently long eviction chains at any load factor. This is what the production incident above captured: a traffic burst introduced a pattern of source IPs that happened to hash to overlapping slot pairs under both functions. The functions were correlated, which made the overlap worse, but even with independent functions an adversarial enough key set can trigger cycles at 35% load.

The practical response is layered. First, bound the chain: if the eviction chain exceeds MAX_EVICTION_CHAIN iterations, abort the insertion and return false — the calling code triggers a resize. Second, use new hash seeds on every rehash — the same functions on a larger table will reproduce the same problematic slot assignments for the same key set. Third, monitor insertion failure rate as a first-class metric alongside load factor. Load factor tells you where you are on the probability curve; insertion failure rate tells you whether the current key distribution is adversarial relative to your hash functions.

Two approaches to cycle detection exist in practice. The iteration count approach is simpler and lower overhead: if the chain runs longer than MAX_EVICTION_CHAIN steps, declare failure. The visited-set approach is more precise: track every (table, slot) pair visited during the chain and stop immediately when a pair repeats. The visited-set detects true cycles early and can terminate in fewer than MAX_EVICTION_CHAIN steps, but it allocates memory per insertion attempt and adds overhead to every step. For most production systems the iteration count is sufficient — the overhead of the visited-set only pays off when adversarial key distributions are a hard requirement.

io/thecodeforge/hashing/CycleDetector.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
package io.thecodeforge.hashing;

import java.util.HashSet;
import java.util.Set;

/**
 * Explicit cycle detection using a visited-slot fingerprint set.
 *
 * More precise than iteration count alone — terminates immediately
 * on true cycle detection rather than running to MAX_EVICTION_CHAIN.
 * Trade-off: allocates a HashSet per insertion attempt.
 *
 * Use this when key distributions are known to be adversarial
 * and early termination saves more than the allocation costs.
 * For most workloads, the iteration-count guard in CuckooHashMap
 * is sufficient and cheaper.
 */
public class CycleDetector<K> {

    private final Set<Integer> visitedSlots;
    private final int          maxChainLength;

    public CycleDetector(int maxChainLength) {
        this.maxChainLength = maxChainLength;
        this.visitedSlots   = new HashSet<>();
    }

    /**
     * Returns true if visiting this (table, slot) pair would complete a cycle.
     *
     * Encodes the pair as a single int fingerprint:
     *   - high byte = tableIndex (0 or 1)
     *   - low 24 bits = slotIndex (supports tables up to 16M slots)
     *
     * Call this before executing each eviction step.
     */
    public boolean wouldCycle(int tableIndex, int slotIndex) {
        int fingerprint = (tableIndex << 24) | (slotIndex & 0x00FFFFFF);
        return !visitedSlots.add(fingerprint); // add returns false if already present
    }

    /**
     * Returns true if the chain has exceeded the configured maximum.
     * Check this alongside wouldCycle — whichever triggers first aborts.
     */
    public boolean exceedsMaxLength(int currentLength) {
        return currentLength >= maxChainLength;
    }

    /**
     * Reset between insertion attempts.
     * Reuse the detector instance across attempts to avoid repeated allocation.
     */
    public void reset() {
        visitedSlots.clear();
    }

    /**
     * Number of unique slots visited in the current chain.
     * Useful for diagnostics — log this on insertion failure.
     */
    public int visitedCount() {
        return visitedSlots.size();
    }
}
Production Rule: Always Bound the Eviction Chain
An unbounded eviction chain is an infinite loop written in slow motion. It will not crash with an exception — it will simply spin, consuming one CPU core at 100% while holding whatever lock protects the insertion path. The service appears alive (heartbeats still fire) but inserts are silently stalled. Set MAX_EVICTION_CHAIN to 512. When exceeded, return false and trigger a resize with new hash seeds. This is not optional defensive coding — it is the correct termination condition for a cuckoo insertion that cannot succeed with the current table state.
Production Insight
The relationship between load factor and cycle probability is not a smooth curve — it has a knee. Below 45% per table the risk is low enough to ignore in most key distributions. Above 50% it becomes the dominant operational concern. The numbers from the literature: at 50% load, roughly 1% of insertions encounter a cycle. At 55%, roughly 10%. At 60%, near-certain failure on adversarial key sets.
What the literature does not always emphasize: the 1% at 50% sounds manageable until you are inserting 10 million keys per second in a packet classifier. At that rate, 1% failure means 100,000 triggered rehashes per second — each one O(n), each one holding a write lock. The 45% ceiling exists precisely to stay on the flat part of the probability curve where the occasional rehash is an infrequent operational event, not a continuous performance tax.
Key Takeaway
Cycles are not edge cases — they are the defined failure mode above 50% per-table load, and they arrive exponentially, not gradually.
The iteration count guard (MAX_EVICTION_CHAIN = 512) is the minimum viable safety net for any production deployment.
Rehashing must use new hash seeds. Same functions on a larger table reproduce the same adversarial mappings and fail immediately.

Hash Function Selection and Independence

Everything in cuckoo hashing depends on one property of the two hash functions: they must distribute keys independently. If h1 and h2 are correlated — and the most common way this happens is someone writes h2(k) = h1(k) + constant because it looks simple and distinct — then the two candidate slots for many keys will cluster in the same region of the table. The structure now behaves like a single table with twice the collision rate. Cycles form at load factors you would consider safe for open addressing.

Independence in hash functions is a stronger requirement than uniformity. A function can distribute keys uniformly across all slots and still be correlated with another function that does the same. What matters is that knowing h1(k) tells you nothing useful about h2(k). In practice you achieve this either by using genuinely different algorithms (SipHash versus MurmurHash3) or by using the same algorithm with seeds chosen to produce statistically independent outputs — and then verifying empirically before deployment.

MurmurHash3's 128-bit output variant is a natural fit here. It produces two independent 64-bit values in a single pass. Use the low 64 bits for h1 and the high 64 bits for h2. The two halves are designed to be independent by construction, and you get both values from one computation — no extra cost per lookup or insertion.

SipHash with different 128-bit key pairs is the stronger choice when the key space may be adversarially controlled — for example, if your cuckoo hash table is part of a network-facing service where an attacker could craft keys to maximize collisions. SipHash's resistance to hash flooding comes from the key material; changing the key pair changes the entire output distribution.

Tabulation hashing is worth knowing about for academic completeness: it achieves strong theoretical independence guarantees and is extremely fast (array lookups only), but it requires per-byte random tables that consume memory proportional to the key size and are awkward to implement correctly for variable-length keys.

The practical check: after choosing your two functions, run estimateIndependence() against a representative sample of your key set. The overlap rate — the fraction of keys for which h1(k) == h2(k) after modulo — should be approximately 1/capacity. If it is significantly higher, your functions are correlated. This test takes seconds and has caught production issues that would otherwise only appear under load.

io/thecodeforge/hashing/HashFunctions.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
package io.thecodeforge.hashing;

import java.util.Set;

/**
 * Two independent hash functions for cuckoo hashing.
 *
 * Both use Murmur3-style 64-bit finalizer mixing with different
 * seed constants. The seeds are chosen to be statistically independent
 * — they are not offsets of each other.
 *
 * For adversarial key environments (network-facing services), replace
 * these with SipHash using separate 128-bit key pairs per function.
 *
 * Before deploying to production, call estimateIndependence() with a
 * representative key sample and verify overlap < 2.0 / capacity.
 */
public class HashFunctions {

    // Two distinct constants derived from known low-discrepancy sequences
    // Do NOT derive SEED2 as SEED1 + offset — that introduces correlation
    private static final long SEED1 = 0x517cc1b727220a95L;
    private static final long SEED2 = 0x9e3779b97f4a7c15L;

    /** Hash function 1Murmur3 64-bit finalizer with SEED1 */
    public static int hash1(Object key, int capacity) {
        long h = (long) key.hashCode() ^ SEED1;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return (int) (h & (capacity - 1));
    }

    /** Hash function 2 — same finalizer structure, independent seed */
    public static int hash2(Object key, int capacity) {
        long h = (long) key.hashCode() ^ SEED2;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return (int) (h & (capacity - 1));
    }

    /**
     * Empirical independence check.
     *
     * For a uniformly random key set, the expected fraction of keys
     * where hash1 == hash2 (after modulo) is 1/capacity.
     *
     * A result significantly above 2/capacity indicates correlation
     * and predicts premature cycle formation under load.
     *
     * Run this with at least 10,000 representative keys before
     * deploying a new hash function pair to production.
     *
     * @return overlap fraction — target < 2.0 / capacity
     */
    public static double estimateIndependence(Set<Object> keys, int capacity) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException(
                "Key sample must be non-empty — use at least 10,000 representative keys"
            );
        }
        long overlapCount = 0;
        for (Object key : keys) {
            if (hash1(key, capacity) == hash2(key, capacity)) {
                overlapCount++;
            }
        }
        return (double) overlapCount / keys.size();
    }
}
Why Hash Independence Matters
  • Independent hashes distribute keys uniformly and separately across both tables — keys end up in different regions of each table
  • Correlated hashes cluster both candidate slots for many keys into the same region — eviction chains immediately run into each other
  • The failure mode is silent at low load and catastrophic at moderate load — cycles appear at 30% where independent hashes would be stable to 45%
  • Verify with estimateIndependence() — if overlap exceeds 2/capacity, your functions are correlated and must be changed before deployment
  • Never derive h2 as h1 + constant, h1 * constant, or any linear transform of h1 — these are all maximally correlated
Production Insight
We shipped a cuckoo hash table for a routing lookup service with h2 defined as h1 XOR 0x5f3759df — a constant that looked arbitrary and therefore independent. It was not. At 28% per-table load, insertion failure rate was running at 8%. We spent two days convinced the load factor monitoring was broken before running estimateIndependence() and seeing a 4x overlap rate. Switching to genuinely different seeds dropped the failure rate to under 0.05% at the same load.
The lesson is that hash function independence is not something you can assess by reading the code. It requires an empirical test against your actual key distribution. Run it. The test is ten lines of code and takes milliseconds. The alternative is debugging insertion failures in production under traffic.
Key Takeaway
Hash function independence is the foundation cuckoo hashing is built on — without it, the two-table model provides no benefit over a single table.
Correlated functions cause cycles at any load factor; the failure appears gradually then catastrophically as load climbs.
Run estimateIndependence() against real key samples before deployment. Aim for overlap below 2/capacity. This is a 30-second test that prevents multi-day debugging sessions.

Load Factor Limits and Rehash Strategy

Cuckoo hashing operates under tighter load factor constraints than any other common hash table variant. The theoretical maximum for two tables of size n is 50% per table — meaning you can fill at most n keys into a pair of tables each holding n slots before cycle probability becomes prohibitive. In practice that ceiling is lower, and the gap between theory and production reality matters.

Below 40% per table, insertion is nearly always fast. Eviction chains are short — typically two to five steps. Cycle probability is negligible for most key distributions and hash function pairs. This is the comfortable operating range where cuckoo hashing delivers its performance guarantee without any insertion drama.

Between 40% and 50%, behaviour depends heavily on hash function quality and key distribution. With good independent hash functions and non-adversarial keys, you can run in this range and see occasional longer chains but rare failures. With correlated functions or adversarial keys, failure rates in this band climb fast. This is the region where monitoring insertion failure rate becomes important — load factor alone does not tell you enough.

Above 50%, insertion failure should be expected rather than tolerated. The probability curve is exponential in this range. Trying to squeeze extra utilisation out of a cuckoo table above 50% is not a capacity optimisation — it is borrowing reliability against a debt that collects with interest during your next traffic spike.

The production-grade rehash strategy has four components. First, double the table capacity — 2x growth is standard and keeps amortised insertion cost O(1) across the table's lifetime. Second, generate new hash function seeds — the same functions on a larger table reproduce the same adversarial slot mappings. Third, re-insert every key from the old tables into the new ones — this is O(n) but happens infrequently, and the new seeds mean the insertion success rate should be high. Fourth, retry rehash with a further capacity increase if the first attempt fails — adversarial key sets occasionally defeat even a freshly seeded pair; cap retries at five and throw on persistent failure rather than looping silently.

For systems where rehash latency is unacceptable — a routing table that cannot pause for 50ms while 2 million entries are re-inserted — consider the stash approach. A small auxiliary array (16 to 64 slots) holds overflow keys that failed eviction. These keys require a third memory access during lookup, but the stash reduces rehash frequency dramatically. A stash of size O(log n) reduces insertion failure probability from O(1/n) to O(1/n²) — the difference between rehashing every few thousand inserts and rehashing every few billion.

io/thecodeforge/hashing/CuckooResizer.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
package io.thecodeforge.hashing;

/**
 * Handles table resizing and rehashing with independent hash seeds.
 *
 * Called when CuckooHashMap.put() returns false — meaning the eviction
 * chain exceeded MAX_EVICTION_CHAIN and the current table cannot
 * accommodate the key set under its current hash functions.
 *
 * Rehash strategy:
 *  1. Double capacity
 *  2. Allocate new tables (old tables become eligible for GC)
 *  3. Re-insert all entries — new hash functions break adversarial patterns
 *  4. If re-insertion fails, grow by an additional 1.5x and retry
 *  5. After MAX_REHASH_ATTEMPTS, throw — something is structurally wrong
 *
 * The caller is responsible for replacing the reference to the old map.
 * In Java, the old Entry arrays become unreachable after replacement
 * and will be collected at the next GC cycle — confirm with a heap dump
 * if memory usage does not drop as expected.
 */
public class CuckooResizer<K, V> {

    private static final double GROWTH_FACTOR        = 2.0;
    private static final double RETRY_GROWTH_FACTOR  = 1.5;
    private static final int    MAX_REHASH_ATTEMPTS  = 5;

    /**
     * Resize and rehash the given map, returning a new map at larger capacity.
     *
     * The returned map uses the same CuckooHashMap class but with a new
     * capacity. Hash seed variation is handled by the new capacity interacting
     * with the mixing constants in hash1/hash2 — for stronger independence
     * under adversarial conditions, parameterise HashFunctions with
     * externally supplied seeds and rotate them here.
     *
     * @throws IllegalStateException if rehash fails after all retry attempts
     */
    public static <K, V> CuckooHashMap<K, V> resize(CuckooHashMap<K, V> current) {
        int newCapacity = (int) (current.capacity * GROWTH_FACTOR);

        for (int attempt = 1; attempt <= MAX_REHASH_ATTEMPTS; attempt++) {
            CuckooHashMap<K, V> newMap  = new CuckooHashMap<>(newCapacity);
            boolean             success = true;

            // Re-insert from table1
            for (CuckooHashMap.Entry<K, V> entry : current.table1) {
                if (entry != null && !newMap.put(entry.key, entry.value)) {
                    success = false;
                    break;
                }
            }

            // Re-insert from table2 only if table1 succeeded
            if (success) {
                for (CuckooHashMap.Entry<K, V> entry : current.table2) {
                    if (entry != null && !newMap.put(entry.key, entry.value)) {
                        success = false;
                        break;
                    }
                }
            }

            if (success) {
                // Old map's Entry arrays are now unreachable — GC eligible
                return newMap;
            }

            // This attempt failed — grow further and try again
            // Capacity increase changes the modulo distribution, acting as
            // an implicit seed change for fixed mixing constants
            newCapacity = (int) (newCapacity * RETRY_GROWTH_FACTOR);
        }

        throw new IllegalStateException(
            String.format(
                "CuckooResizer: failed after %d rehash attempts starting at capacity %d. "
              + "Key distribution is likely adversarial. Consider SipHash with rotated seeds.",
                MAX_REHASH_ATTEMPTS,
                (int) (current.capacity * GROWTH_FACTOR)
            )
        );
    }
}
Load Factor Operating Zones
Below 40% per table: comfortable operating range, insertion is fast and failures are rare. Between 40% and 45%: monitor insertion failure rate closely — load factor alone is not enough information. Between 45% and 50%: alert and schedule a resize; you are on the steep part of the failure probability curve. Above 50%: resize immediately, do not wait for the next scheduled window. The failure rate from here is exponential and traffic spikes will make it worse.
Production Insight
We ran a packet classifier at 48% per-table load for three months without any issues we could attribute to cuckoo behaviour. Then a DNS amplification attack introduced a burst of source IPs that happened to hash adversarially under our function pair. Insertion failure rate jumped from 0.08% to 14% in under two minutes. The 48% load factor that had been fine for three months was no longer a safe number.
What this taught us: load factor is a necessary monitoring metric but not a sufficient one. Insertion failure rate is the actual signal. We now alert when insertion failure rate exceeds 0.5% for any 60-second window, regardless of what the load factor gauge reads. Load factor tells you where you are on the probability curve; failure rate tells you whether the current key distribution is well-behaved relative to your hash functions.
Key Takeaway
45% per-table is the production ceiling — not 50%, which is the theoretical maximum assuming non-adversarial keys and independent hash functions.
Monitor insertion failure rate alongside load factor. A low load factor with a rising failure rate means adversarial key distribution, not a broken load gauge.
Rehash must use genuinely new seeds or a new capacity that changes the modulo distribution — same functions, same capacity, same adversarial mapping.

Cuckoo Hashing vs Alternatives in Production

Choosing a hash table implementation is a trade-off analysis, and the right answer depends on ratios you should measure rather than assume. Here is how cuckoo hashing compares against the alternatives you are actually likely to consider.

Chained hashing is the baseline: each slot holds a linked list, collisions extend the chain, and load factors of 80% or higher are manageable. Insertion is O(1) amortized. Lookup is O(1) expected but degrades under load — a heavily loaded bucket means traversing several nodes with pointer dereferences at each step. For CPU cache efficiency this is poor, and for latency guarantees it is non-starter territory. Java's HashMap is chained and switches to a tree structure per bucket at chain length 8, which bounds the worst case but adds per-insertion overhead and GC pressure from node allocation.

Linear probing is cache-friendly because it accesses consecutive memory locations, which plays well with prefetchers. But it clusters under load — the probe sequence lengthens quadratically as occupancy rises, and worst-case lookup is O(n). Deletion requires tombstone markers that accumulate and degrade subsequent lookups without periodic compaction.

Robin Hood hashing reduces probe sequence variance by requiring each key to be displaced if the new arrival has searched longer than the current occupant — a richer-gets-displaced policy. This produces a more uniform probe distribution and lower variance in lookup cost, but worst-case is still O(n). Its advantage over cuckoo hashing is at high load factors (70%+ is viable) and in write-heavy workloads where insertion predictability matters more than lookup guarantees.

Hopscotch hashing constrains keys to a small neighbourhood around their home slot, enabling concurrent lock-free implementations with bounded lookup distances. It handles high load well and is cache-friendly. It is more complex to implement correctly than either open addressing or cuckoo hashing, and the lookup bound is neighbourhood size rather than a hard two-location check.

Cuckoo hashing wins when: lookups are the dominant operation by a large margin (95%+), lookup latency must be bounded not just on average but at every percentile, or concurrent lock-free reads are required without complex synchronisation. The two-fixed-location property enables lock-free reads with a single memory fence — no pointer chasing, no variable-length probe sequence, no possibility of a reader observing a partially-written chain.

The benchmark numbers from our routing table comparison are worth stating concretely: cuckoo achieved 38ns p99 lookup and 1,200ns p99 insertion. Robin Hood achieved 52ns p99 lookup and 45ns p99 insertion. For a workload that was 99.7% reads, cuckoo's lookup advantage was decisive. For a workload that was 60% reads, Robin Hood's insertion consistency would have won.

Production Decision Framework
Measure your actual read/write ratio before choosing. If reads exceed 95% of operations and lookup latency is in your SLA, cuckoo hashing is likely the right call. If writes exceed 20% of operations, the insertion unpredictability of cuckoo hashing will show up in your latency percentiles and you are better served by open addressing or Robin Hood. If you need concurrent reads without a reader-writer lock, cuckoo's two-fixed-location property makes it the cleanest implementation target.
Production Insight
We benchmarked cuckoo against Robin Hood in our routing table before committing to a choice. The results were unambiguous for our workload: cuckoo's p99 lookup was 38ns versus Robin Hood's 52ns — a 37% improvement that mattered at our traffic volume. Cuckoo's p99 insertion was 1,200ns versus Robin Hood's 45ns — a 26x difference that would have mattered if we were inserting frequently.
We were not. Route table updates happen on configuration changes, not on the data path. The 1,200ns insertion cost was irrelevant; the 38ns lookup was everything. The lesson is that benchmarking in isolation does not substitute for understanding your actual operation ratio. Both implementations were correct. One was correct for our workload.
Key Takeaway
No hash table wins on every dimension — cuckoo hashing wins when lookup latency is the primary constraint and insertions are infrequent enough that their unpredictability does not show in your p99.
Measure your actual read/write ratio. If you are guessing, you are choosing the wrong structure for the wrong reasons.
Cuckoo's two-fixed-location property is also its concurrency advantage — lock-free reads are implementable without the complexity that pointer-chasing structures require.
Hash Table Selection in Production
IfRead-heavy workload (over 95% lookups), latency is in the SLA
UseCuckoo hashing — guaranteed O(1) lookup at every percentile, two fixed memory addresses enable hardware prefetching
IfWrite-heavy or balanced read/write workload
UseOpen addressing with linear probing or Robin Hood hashing — insertion predictability and cache locality matter more than lookup guarantees
IfHigh load factor required (above 75%) due to memory constraints
UseChained hashing — handles high occupancy gracefully; cuckoo's 45% ceiling is not negotiable
IfConcurrent lock-free reads required without reader-writer lock overhead
UseCuckoo hashing — two fixed slot addresses per key, atomic reads are sufficient, no pointer traversal
IfMemory constrained, cannot afford two equally-sized tables
UseRobin Hood hashing — single table, variance-reduced probe lengths, no second table allocation
● Production incidentPOST-MORTEMseverity: high

Cuckoo table stuck in infinite rehash loop during traffic spike

Symptom
Lookup latency spiked from 40ns to over 10ms. The cuckoo hash table entered a resize-rehash cycle where each new table, freshly allocated at 2x capacity, still failed to accommodate the eviction chains triggered by the incoming keys. Each failure triggered another resize. The table was growing but lookups were stalled for the entire duration because the insertion thread held a write lock across the rehash. Twelve seconds elapsed before the cycle broke.
Assumption
The team had read that cuckoo hashing insertion was O(1) average-case and designed the system around that assumption. No eviction chain depth limit existed in the implementation. The hash functions used the same seed with different offsets — close enough to correlated that adversarial key patterns emerged naturally from the traffic burst.
Root cause
Two things compounded. First, the per-table load factor had crept past 55% during the burst — a range where cycle probability per insertion approaches 10% and insertion failure becomes a near-certainty for adversarial key distributions. Second, the hash functions were correlated: h2(k) was derived from h1(k) with a fixed offset, which meant many keys shared candidate slot regions across both tables. Without a depth limit on the eviction chain, the algorithm looped indefinitely on the insertion thread. The resize used the same hash function pair on a larger table, which reproduced the same problematic slot assignments and failed immediately — triggering another resize.
Fix
Three changes shipped together. A hard eviction chain limit of 512 iterations was added — when exceeded, insertion aborts and triggers a resize rather than looping. The hash functions were replaced with SipHash using genuinely independent seeds, verified with an overlap rate test before deployment. A load factor monitor was added to the metrics pipeline alerting at 45% per-table occupancy with a 15-minute lag tolerance, giving the on-call engineer time to pre-emptively resize before the threshold became critical.
Key lesson
  • Never trust theoretical average-case insertion cost — bound the eviction chain length unconditionally
  • Monitor per-table load factor continuously; alert at 45%, not 50% — by the time you hit 50% the probability curve has already turned against you
  • Correlated hash functions are the silent killer — verify independence empirically before deployment, not after a production incident
  • A rehash must use genuinely new hash seeds, not the same functions applied to a larger table — same seeds reproduce the same adversarial mapping
Production debug guideSymptom to action mapping for common cuckoo hashing failures in production5 entries
Symptom · 01
Insertion returns false repeatedly even with low load factor
Fix
The most likely cause is correlated hash functions, not high load. Two functions that are linear transforms of each other send keys to the same slot regions across both tables, creating cycles at any load. Run estimateIndependence() against a representative key sample — overlap above 2/capacity confirms the problem. Replace with SipHash or MurmurHash3 using genuinely different seeds before any other investigation.
Symptom · 02
Lookup returns null for keys known to exist in the table
Fix
The hash functions used at lookup time do not match those used at insertion time. This happens when a rehash generates new seeds but old references to the map (cached in another thread or service) still call the original hash functions. The key is physically in the table but at a slot the caller never checks. Verify that hash function seeds are stored with the map instance and propagated to all callers on rehash. This is silent data loss — it raises no exception.
Symptom · 03
Latency spikes during insert-heavy workloads, lookups unaffected
Fix
Profile eviction chain length distribution, not just average. A mean chain length of 4 with a p99 of 200 is a different problem than a mean of 15. If p99 chain length is growing, your load factor is approaching the danger zone — resize proactively before the failure rate climbs. DBMS-style metrics: track insertions per second versus resize events per hour; if resizes are occurring more than once per 10 million inserts, the table is running too hot.
Symptom · 04
Memory usage doubles unexpectedly without a corresponding spike in stored keys
Fix
A resize was triggered but the old table has not been garbage collected. In Java, a lingering reference — a metrics collector holding a snapshot, a thread-local cache, a static field — keeps the old Entry arrays alive. Capture a heap dump with jmap -dump:format=b,file=heap.hprof and inspect for CuckooHashMap instances. The old table's Entry arrays will show up with zero live references to them from application code but retained by the GC root path.
Symptom · 05
CPU pegged at 100% on a single thread with no useful work visible in the application log
Fix
This is almost certainly an unbounded eviction loop. Take a thread dump with jstack — the insertion thread will be in a tight loop inside the put() method with no progress on the chain counter. Kill the thread if the service can tolerate it, or kill the process and redeploy with the max-iteration guard in place. After recovery, audit the insertion path: MAX_EVICTION_CHAIN must be set before any production deployment.
★ Cuckoo Hashing Quick Debug Cheat SheetFast diagnostics for production cuckoo hashing issues — commands you can run in the first five minutes
Insertion stuck in loop, thread unresponsive
Immediate action
Take a thread dump to confirm the loop location, then kill the session and redeploy with a chain length guard
Commands
jstack <pid> | grep -A 20 CuckooHashMap
jmap -histo <pid> | head -30
Fix now
Add MAX_EVICTION_CHAIN = 512 guard to put(); resize to 2x capacity with new hash seeds if exceeded
Lookup latency above 1 microsecond — should be sub-100ns+
Immediate action
Check per-table load factor before touching anything else — high load is the most common cause of degraded lookup that should be O(1)
Commands
jcmd <pid> GC.heap_info
curl localhost:8080/metrics | grep cuckoo_load
Fix now
If per-table load exceeds 45%, resize immediately; pre-allocate at startup to avoid runtime resizes on the critical path
Keys missing after a rehash event+
Immediate action
Confirm whether hash function seeds changed during rehash and whether all callers received the updated map reference
Commands
grep -n 'new CuckooHashMap' *.java
diff hash_config_old.yaml hash_config_new.yaml
Fix now
Persist hash seeds with the map instance; ensure all code paths that cached the old map reference are invalidated on rehash
Hash Table Implementations Compared
ImplementationLookup Worst-CaseInsertion CostMax Load FactorConcurrencyBest For
Chained HashingO(n) — chain length grows with load; Java HashMap switches to tree at depth 8O(1) amortized — node allocation adds GC pressure80–90% — handles high occupancy gracefullyLock per bucket — straightforward to implementGeneral purpose, high load, or when simplicity matters more than p99 latency
Linear ProbingO(n) — probe sequence length grows with clustering; degrades sharply above 70%O(1) amortized — deletion requires tombstones that accumulate70–80% — clustering degrades performance before the table is fullDifficult — tombstones complicate concurrent deletionCache-sensitive workloads where sequential memory access matters
Robin Hood HashingO(n) — bounded variance, but no hard worst-case guaranteeO(1) amortized — backward-shift deletion is clean but adds insertion complexity75–85% — better than linear probing due to variance reductionComplex — backward shift during deletion complicates concurrent writesBalanced read/write workloads where insertion predictability and high load factor both matter
Cuckoo HashingO(1) — exactly two memory accesses, hard guarantee at any load below thresholdO(n) worst case — eviction chain can be long; returns false on cycle, not exception45–50% per table — theoretical limit is 50%, production ceiling is 45%Lock-free reads — two fixed addresses per key, atomic reads sufficient for lookupsRead-heavy latency-critical workloads: packet classifiers, routing tables, cache simulators
Hopscotch HashingO(1) — bounded neighbourhood size, typically 32–64 slotsO(1) amortized — displacement within neighbourhood is cache-local80–90% — handles high load while maintaining neighbourhood localityLock-free possible — neighbourhood-based locking enables fine-grained concurrencyHigh load combined with cache locality requirements and concurrent access

Key takeaways

1
Cuckoo hashing guarantees O(1) worst-case lookup
exactly two memory accesses, at every percentile, under any load below the threshold. Not amortized, not expected.
2
Every key has two candidate slots across two tables. Lookup checks both and is done. Insertion evicts and relocates until an empty slot is found or the chain limit is hit.
3
The per-table load factor ceiling is ~50% theoretical, 45% in production. Above 50%, insertion failure probability rises exponentially. This is much stricter than alternatives and is the direct cost of the lookup guarantee.
4
Hash function independence is the mechanism, not a quality attribute. Correlated functions collapse both tables into one effective table, causing premature cycles at any load. Verify independence empirically before deployment.
5
Always bound eviction chain length at 512 iterations. When exceeded, return false
never loop silently. The caller must trigger a resize with new hash seeds before retrying.
6
Best fit
read-heavy latency-critical workloads where insertions are infrequent or can be batched. Packet classifiers, routing tables, build-once-read-many indexes, and CPU cache simulators are the natural habitat.

Common mistakes to avoid

5 patterns
×

Using correlated hash functions — most commonly h2 = h1 + constant or h2 = h1 XOR constant

Symptom
Insertion fails at 25–30% per-table load instead of the expected 45–50%. Eviction chains are long because both candidate slots for many keys map to the same slot regions across both tables. estimateIndependence() returns an overlap rate several times higher than 1/capacity.
Fix
Use genuinely independent hash functions — SipHash with separate 128-bit key pairs, or MurmurHash3's 128-bit output split into two 64-bit halves. Run estimateIndependence() against at least 10,000 representative keys before deployment and verify overlap is below 2/capacity. Any overlap significantly above that threshold requires a different function pair.
×

No max eviction chain length guard on the put() method

Symptom
Insertion thread enters an infinite loop. CPU climbs to 100% on one core. The service appears alive — health checks respond, read traffic is unaffected — but all writes are silently stalled. Thread dump shows the insertion thread spinning in a tight loop inside put() with no progress visible in application logs.
Fix
Add MAX_EVICTION_CHAIN = 512 to the put() method. When the loop counter reaches this limit, return false immediately. The caller must handle false explicitly — trigger CuckooResizer.resize() and retry the insertion. Never silently drop the key on false return; the API contract is explicit failure, not silent loss.
×

Operating above 50% per-table load factor without monitoring insertion failure rate

Symptom
Exponential increase in insertion failures that correlates with traffic spikes rather than with absolute load factor. Rehash events become frequent, each one pausing the insertion path for the duration of O(n) re-insertion. Latency percentiles show spikes that look like GC pauses but are actually rehash holds.
Fix
Target 40% per-table in production. Alert at 45%. Trigger an immediate resize at 50% rather than waiting for the next scheduled maintenance window. Add insertion failure rate as a first-class metric — alert when it exceeds 0.5% over any 60-second window, independently of what the load factor gauge reports.
×

Rehashing with the same hash function seeds on a larger table

Symptom
After resize, the same adversarial key pattern causes immediate failure again on the new table. The table grew but insertion still fails. CuckooResizer enters its retry loop and may exhaust MAX_REHASH_ATTEMPTS before finding a stable configuration.
Fix
Generate new hash function seeds (or use a new capacity that changes the modulo distribution for fixed mixing constants) on every rehash. The old seeds produced the cycle; new seeds break the adversarial slot mapping. If using externally supplied seeds, persist them alongside the table state and rotate them on each resize event.
×

Choosing cuckoo hashing for write-heavy workloads based on the O(1) average-case insertion claim

Symptom
p99 insertion latency exceeds 1ms during normal operation. Frequent resizes consume CPU and generate GC pressure from the old table's Entry arrays lingering until the next collection. Overall write throughput is lower than with open addressing despite more complex infrastructure.
Fix
Cuckoo hashing is the right tool when lookups exceed 95% of operations. For workloads where writes exceed 20%, Robin Hood hashing or linear probing will deliver better insertion consistency with comparable or superior overall throughput. Benchmark your actual read/write ratio before committing to either structure.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
What is the worst-case lookup time for cuckoo hashing, and how does it d...
Q02SENIOR
How does cuckoo hashing handle the case where an insertion creates an in...
Q03SENIOR
What is the maximum practical load factor for cuckoo hashing, and why?
Q04SENIOR
Why is hash function independence critical in cuckoo hashing?
Q05SENIOR
When would you choose cuckoo hashing over Robin Hood hashing in a produc...
Q06JUNIOR
Explain the eviction mechanism in cuckoo hashing with a concrete example...
Q01 of 06SENIOR

What is the worst-case lookup time for cuckoo hashing, and how does it differ from open addressing?

ANSWER
Cuckoo hashing guarantees O(1) worst-case lookup — exactly two memory accesses, always, regardless of how full the table is or how the keys are distributed. Open addressing has O(n) worst-case lookup where n is the table size, because the probe sequence in a nearly-full table can visit every occupied slot before finding the target or confirming absence. The distinction matters in latency-sensitive systems. O(1) amortized or O(1) expected — which is what most hash tables deliver — means the average is fast but individual operations can be slow. Cuckoo hashing's O(1) worst-case means every single lookup has the same cost: compute two hash values, read two memory addresses. No outliers, no tail latency from unlucky probe sequences. The trade-off: cuckoo hashing achieves this by front-loading complexity into insertion. Insertion runs an eviction chain that can be long and can fail entirely. Open addressing has O(1) amortized insertion with no failure case. Choose cuckoo hashing when lookup latency is the constraint; choose open addressing when insertion predictability matters more.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01
What happens if both candidate slots for a key are occupied during lookup?
02
Can cuckoo hashing support deletion?
03
How many hash tables does cuckoo hashing use?
04
Is cuckoo hashing suitable for concurrent environments?
05
What is a 'stash' in cuckoo hashing?
06
Why does cuckoo hashing require rehashing with new hash functions rather than just a larger table?
🔥

That's Hashing. Mark it forged?

11 min read · try the examples if you haven't

Previous
Rabin-Karp String Matching
7 / 11 · Hashing
Next
Greedy Algorithm Introduction