Universal Hashing and Perfect Hashing
- Fixed hash functions are attackable β adversary can force O(n) operations.
- 2-universal family: Pr[h(x)=h(y)] β€ 1/m for random h, any xβ y.
- Python 3.3+ randomises hash seeds β this is universal hashing in practice.
In 2011, the same hash collision vulnerability hit Python, PHP, Ruby, Java, and several other languages simultaneously. Attackers discovered that submitting carefully crafted HTTP POST data β with parameter names all hashing to the same slot β could cause web servers to spend seconds processing a single request. The fix was not changing the hash function; it was randomising the hash function at startup. This is universal hashing in practice.
Universal hashing is the theoretical foundation for this defence: if the hash function is chosen randomly from a 2-universal family, no adversary can reliably cause collisions without knowing which function was selected. The adversary's best attack strategy β crafting inputs that collide under the fixed function β is defeated.
Why Fixed Hash Functions Fail
Any deterministic hash function h can be attacked. Given h, an adversary can find n keys that all hash to the same slot β making every operation O(n). This is a real attack: hash DoS attacks against web servers were widely exploited before randomisation became standard (Crosby & Wallach, 2003).
Universal Hash Family Construction
The polynomial hash family for strings is 2-universal: choose random a,b modulo prime p > m: h_{a,b}(x) = (ax + b) mod p mod m
import random class UniversalHash: """2-universal hash family for integers.""" def __init__(self, m: int, prime: int = 10**9+7): self.m = m self.p = prime self.a = random.randint(1, prime - 1) self.b = random.randint(0, prime - 1) def hash(self, x: int) -> int: return (self.a * x + self.b) % self.p % self.m def rehash(self): """Pick a new random function from the family.""" self.a = random.randint(1, self.p - 1) self.b = random.randint(0, self.p - 1) # Demonstrate: collision probability β 1/m h = UniversalHash(m=100) collisions = sum(1 for _ in range(10000) if h.hash(42) == h.hash(99)) print(f'Collision probability β {collisions/10000:.3f} (expected β€ {1/100:.3f})')
Python's Hash Randomisation
Since Python 3.3, Python randomises its string/bytes hash function at startup (PYTHONHASHSEED). This is exactly universal hashing in practice β prevents hash DoS attacks.
import os # Python uses random seed at startup print(hash('hello')) # Different every run unless PYTHONHASHSEED set print(os.environ.get('PYTHONHASHSEED', 'random')) # 'random' = randomised # Stable hashing when needed (e.g., caching) import hashlib stable = int(hashlib.md5(b'hello').hexdigest(), 16) % (10**9) print(stable) # Same every run
random
294702384
Perfect Hashing β O(1) Worst Case
For a static set of n keys, FKS (Fredman-KomlΓ³s-SzemerΓ©di) perfect hashing achieves O(1) worst-case lookup with O(n) space: 1. Hash n keys into n/2 buckets (first level, some collisions expected) 2. For each bucket with k_i keys, use a second-level table of size k_iΒ² β guarantees no collisions with high probability
Total space: O(n) expected. Build time: O(n) expected. Lookup: O(1) worst case.
Used in compilers (keyword lookup), databases (static index), and network packet classification.
π― Key Takeaways
- Fixed hash functions are attackable β adversary can force O(n) operations.
- 2-universal family: Pr[h(x)=h(y)] β€ 1/m for random h, any xβ y.
- Python 3.3+ randomises hash seeds β this is universal hashing in practice.
- Perfect hashing: O(1) worst-case lookup for static key sets using two-level scheme.
- FKS hashing achieves O(n) space and O(1) worst case for static sets.
Interview Questions on This Topic
- QWhat is a universal hash family and why is it important?
- QHow does Python prevent hash DoS attacks?
- QExplain the two-level perfect hashing scheme.
- QWhat is the collision probability guarantee for a 2-universal hash family?
Frequently Asked Questions
Is Python's dict vulnerable to hash collision attacks?
Not since Python 3.3 β PYTHONHASHSEED randomises string/bytes hashing at startup. Integers still hash deterministically (hash(n)=n for small n), but integer-keyed dict DoS is harder to exploit in practice.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.