Hamming Code — Double-Bit Error Mis-Correction
A stuck bit in ECC RAM is silently corrected; a second cosmic-ray flip causes Hamming mis-correction and undetected corruption.
- Hamming code adds parity bits at power-of-2 positions to detect and correct single-bit errors.
- Each parity bit covers a specific set of data bits; XOR checks produce a syndrome that points to the error location.
- Minimum Hamming distance of 3 allows 1-bit correction; distance d corrects ⌊(d-1)/2⌋ errors.
- Real-world use: ECC RAM corrects memory bit flips silently, preventing crashes.
- Common mistake: assuming Hamming(7,4) handles burst errors — it cannot; it only fixes one bit per block.
When data travels over a network or is stored on disk, bits can flip. Hamming code adds a few extra 'parity' bits that together act like a GPS for errors — they can pinpoint exactly which bit flipped and correct it. Richard Hamming invented this in 1950 after getting frustrated with punch card machines that crashed on single-bit errors.
In 1947, Richard Hamming was working at Bell Labs on the Model V relay computer. Every time the computer hit an error while he was away for the weekend, it would simply stop and wait for an operator to fix it — losing his entire Saturday batch job. Frustrated, he thought: 'If the machine can detect an error, why can't it correct it?' Over the next two years he developed Hamming codes — the first error-correcting codes.
Today, ECC (Error Correcting Code) RAM uses Hamming codes on every memory operation. Server-grade machines will not pass certification without ECC RAM. Every time you access memory on a server, a Hamming code is silently verifying and potentially correcting the read. The DRAM inside your laptop likely performs billions of Hamming-protected reads per second.
Hamming Distance
The Hamming distance between two binary strings is the number of positions where they differ. Hamming(7,4) has minimum distance 3 — any two valid codewords differ in at least 3 positions. This means: - Any 1-bit error produces an invalid codeword (detectable) - The invalid codeword is closest to exactly one valid codeword (correctable)
General rule: minimum distance d allows detecting d-1 errors and correcting ⌊(d-1)/2⌋ errors.
- Distance 3 means no two codewords share a nearest neighbour for single-bit errors.
- If a received word has one flipped bit, it's still closer to the original than to any other codeword.
- Two-bit errors can be detected (land halfway between two lights) but not corrected.
Hamming(7,4) Encoding
Place data bits at positions 3,5,6,7 (non-powers-of-2). Parity bits at positions 1,2,4 (powers of 2). Each parity bit covers a specific set of positions. The encoding process uses XOR to compute parity values that make the total number of 1s in each parity's coverage set even (even parity).
Syndrome Decoding and Error Correction
Syndrome decoding is the core of Hamming code correction. After receiving a codeword, we recompute the parity checks. If all three parity checks pass (syndrome = 0), no single-bit error occurred. If one or more fail, the binary number formed by the failed parity checks (p4,p2,p1) points directly to the bit position (1-indexed) that flipped.
For example, if parity check for p1 fails (s1=1), p2 passes (s2=0), p4 passes (s4=0), then syndrome = 001 binary = 1 -> bit 1 (the p1 bit itself) is erroneous. If s1=1, s2=1, s4=0 -> 011 = 3 -> bit 3 (the first data bit) flipped.
This works because each data bit is covered by exactly three of the parity checks — the pattern of 'which parity checks cover this bit' is unique for each bit position. A single-bit error causes the parity checks it participates in to fail, forming a unique syndrome.
Generalization to Larger Codes
Hamming codes exist for any r ≥ 2: Hamming(2^r - 1, 2^r - 1 - r). For 64-bit words (common in modern memory), we need r such that 2^r ≥ 64 + r + 1. r = 7 gives 2^7 = 128 ≥ 64 + 7 + 1 = 72, so Hamming(127,120) is possible, but in practice memory controllers use a smaller block size with interleaving. Common variant: (72,64) SECDED code with 8 parity bits per 64-bit word — that uses a Hamming(127,120) derived code with an extra parity bit for double-error detection.
The generalization follows the same parity coverage pattern: parity bit i covers all positions where the i-th bit of the position's binary representation is 1. This allows systematic construction for any block size.
Applications
ECC RAM: Error Correcting Code memory adds Hamming-like parity bits to each 64-bit word. Used in servers, workstations, and anywhere data corruption is unacceptable.
RAID-2: Uses Hamming codes across disk drives (largely superseded by RAID-5/6).
QR Codes: Use Reed-Solomon codes (a generalisation) allowing correction of up to 30% data loss.
Satellite/Deep Space: Voyager probes use Golay codes (24,12) — correcting 3 errors per 24-bit block across billions of kilometres.
ECC Memory Internals
Modern ECC memory controllers implement a (72,64) SECDED code. For each 64-bit data word, 8 check bits are stored. The extra bit enables detection of two-bit errors (but correction of single-bit only). The controller computes the syndrome on every read; if the syndrome is non-zero and the extra parity check fails, a double-bit error is flagged.
The memory subsystem is organized into multiple banks and ranks; errors may affect one chip (chipkill) or multiple chips. Advanced features like Chipkill Correct and Memory Scrubbing use Hamming-based codes to handle partial failures.
Silent Data Corruption from ECC-Protected Memory Failure
- ECC corrects single-bit errors but cannot correct double-bit errors; it only detects them (if the code has distance 3 or more).
- Silent correction hides failing hardware — monitor corrected error counts even if they're 'harmless'.
- Use SECDED (Single Error Correction, Double Error Detection) variants (e.g., Hamming with extra parity) for production systems.
Key takeaways
Common mistakes to avoid
4 patternsUsing Hamming(7,4) for burst error correction
Assuming ECC RAM prevents all data corruption
Miscalculating required parity bits for a given data size
Ignoring corrected error counters in production
Interview Questions on This Topic
How many parity bits are needed to correct a single-bit error in an n-bit codeword?
Frequently Asked Questions
That's Hashing. Mark it forged?
3 min read · try the examples if you haven't