Hamming Code — Double-Bit Error Mis-Correction
- Hamming distance: bits that differ between two strings — minimum distance 3 enables 1-bit correction.
- Parity bits at power-of-2 positions each cover a specific subset of data positions.
- Syndrome decoding: XOR parity checks form a binary number pointing to the error position.
- Hamming code adds parity bits at power-of-2 positions to detect and correct single-bit errors.
- Each parity bit covers a specific set of data bits; XOR checks produce a syndrome that points to the error location.
- Minimum Hamming distance of 3 allows 1-bit correction; distance d corrects ⌊(d-1)/2⌋ errors.
- Real-world use: ECC RAM corrects memory bit flips silently, preventing crashes.
- Common mistake: assuming Hamming(7,4) handles burst errors — it cannot; it only fixes one bit per block.
ECC RAM Diagnostic Commands
Check if ECC is enabled on the system
sudo dmidecode -t memory | grep -E 'Type:|Error Correction|Supported'dmesg | grep -i edacHigh corrected error count on one DIMM
sudo edac-util --report=allgrep . /sys/devices/system/edac/mc/mc*/csrow*/ce_countUncorrectable memory error (UE) reported
sudo mcelog --client --record | grep -i 'uncorrected'sudo ras-mc-ctl --errorsProduction Incident
Production Debug GuideDiagnosing single-bit and multi-bit errors on Linux
In 1947, Richard Hamming was working at Bell Labs on the Model V relay computer. Every time the computer hit an error while he was away for the weekend, it would simply stop and wait for an operator to fix it — losing his entire Saturday batch job. Frustrated, he thought: 'If the machine can detect an error, why can't it correct it?' Over the next two years he developed Hamming codes — the first error-correcting codes.
Today, ECC (Error Correcting Code) RAM uses Hamming codes on every memory operation. Server-grade machines will not pass certification without ECC RAM. Every time you access memory on a server, a Hamming code is silently verifying and potentially correcting the read. The DRAM inside your laptop likely performs billions of Hamming-protected reads per second.
Hamming Distance
The Hamming distance between two binary strings is the number of positions where they differ. Hamming(7,4) has minimum distance 3 — any two valid codewords differ in at least 3 positions. This means: - Any 1-bit error produces an invalid codeword (detectable) - The invalid codeword is closest to exactly one valid codeword (correctable)
General rule: minimum distance d allows detecting d-1 errors and correcting ⌊(d-1)/2⌋ errors.
- Distance 3 means no two codewords share a nearest neighbour for single-bit errors.
- If a received word has one flipped bit, it's still closer to the original than to any other codeword.
- Two-bit errors can be detected (land halfway between two lights) but not corrected.
Hamming(7,4) Encoding
Place data bits at positions 3,5,6,7 (non-powers-of-2). Parity bits at positions 1,2,4 (powers of 2). Each parity bit covers a specific set of positions. The encoding process uses XOR to compute parity values that make the total number of 1s in each parity's coverage set even (even parity).
def hamming_encode(data: list[int]) -> list[int]: """Encode 4 data bits into 7-bit Hamming code.""" # Positions 1-7 (1-indexed): p1,p2,d1,p4,d2,d3,d4 d = data # [d1, d2, d3, d4] # Build 7-bit codeword (0-indexed internally) code = [0] * 7 code[2] = d[0] # position 3 code[4] = d[1] # position 5 code[5] = d[2] # position 6 code[6] = d[3] # position 7 # Compute parity bits # p1 (pos 1): covers positions 1,3,5,7 code[0] = code[2] ^ code[4] ^ code[6] # p2 (pos 2): covers positions 2,3,6,7 code[1] = code[2] ^ code[5] ^ code[6] # p4 (pos 4): covers positions 4,5,6,7 code[3] = code[4] ^ code[5] ^ code[6] return code def hamming_decode(code: list[int]) -> tuple[list[int], int]: """Decode and correct single-bit errors. Returns (data, error_pos).""" s1 = code[0] ^ code[2] ^ code[4] ^ code[6] # p1 check s2 = code[1] ^ code[2] ^ code[5] ^ code[6] # p2 check s4 = code[3] ^ code[4] ^ code[5] ^ code[6] # p4 check syndrome = s4 * 4 + s2 * 2 + s1 # error position (1-indexed) if syndrome: code[syndrome - 1] ^= 1 # flip the erroneous bit data = [code[2], code[4], code[5], code[6]] return data, syndrome # Encode 1011 original = [1, 0, 1, 1] codeword = hamming_encode(original) print(f'Data: {original} → Codeword: {codeword}') # Introduce error at position 5 erroneous = codeword[:] erroneous[4] ^= 1 print(f'Received (error at pos 5): {erroneous}') decoded, err_pos = hamming_decode(erroneous) print(f'Corrected: {decoded}, error at position {err_pos}')
Received (error at pos 5): [0, 1, 1, 1, 1, 1, 1]
Corrected: [1, 0, 1, 1], error at position 5
Syndrome Decoding and Error Correction
Syndrome decoding is the core of Hamming code correction. After receiving a codeword, we recompute the parity checks. If all three parity checks pass (syndrome = 0), no single-bit error occurred. If one or more fail, the binary number formed by the failed parity checks (p4,p2,p1) points directly to the bit position (1-indexed) that flipped.
For example, if parity check for p1 fails (s1=1), p2 passes (s2=0), p4 passes (s4=0), then syndrome = 001 binary = 1 -> bit 1 (the p1 bit itself) is erroneous. If s1=1, s2=1, s4=0 -> 011 = 3 -> bit 3 (the first data bit) flipped.
This works because each data bit is covered by exactly three of the parity checks — the pattern of 'which parity checks cover this bit' is unique for each bit position. A single-bit error causes the parity checks it participates in to fail, forming a unique syndrome.
Generalization to Larger Codes
Hamming codes exist for any r ≥ 2: Hamming(2^r - 1, 2^r - 1 - r). For 64-bit words (common in modern memory), we need r such that 2^r ≥ 64 + r + 1. r = 7 gives 2^7 = 128 ≥ 64 + 7 + 1 = 72, so Hamming(127,120) is possible, but in practice memory controllers use a smaller block size with interleaving. Common variant: (72,64) SECDED code with 8 parity bits per 64-bit word — that uses a Hamming(127,120) derived code with an extra parity bit for double-error detection.
The generalization follows the same parity coverage pattern: parity bit i covers all positions where the i-th bit of the position's binary representation is 1. This allows systematic construction for any block size.
Applications
ECC RAM: Error Correcting Code memory adds Hamming-like parity bits to each 64-bit word. Used in servers, workstations, and anywhere data corruption is unacceptable.
RAID-2: Uses Hamming codes across disk drives (largely superseded by RAID-5/6).
QR Codes: Use Reed-Solomon codes (a generalisation) allowing correction of up to 30% data loss.
Satellite/Deep Space: Voyager probes use Golay codes (24,12) — correcting 3 errors per 24-bit block across billions of kilometres.
ECC Memory Internals
Modern ECC memory controllers implement a (72,64) SECDED code. For each 64-bit data word, 8 check bits are stored. The extra bit enables detection of two-bit errors (but correction of single-bit only). The controller computes the syndrome on every read; if the syndrome is non-zero and the extra parity check fails, a double-bit error is flagged.
The memory subsystem is organized into multiple banks and ranks; errors may affect one chip (chipkill) or multiple chips. Advanced features like Chipkill Correct and Memory Scrubbing use Hamming-based codes to handle partial failures.
# Simulating (72,64) SECDED syndrome decoding # In hardware, this happens in parallel over all 64+8 bits def check_syndromes(code_bits, parity_bits): # Simplified: compute 8 parity checks for 64-bit data # For demo, we show only 3 checks for Hamming(7,4) style pass # Real implementation uses XOR trees and banks of parity checkers. # Latency is typically 1-2 CPU cycles.
| Code | Block Size (data+parity) | Min Distance | Corrects | Detects | Overhead |
|---|---|---|---|---|---|
| Hamming(7,4) | 7 bits | 3 | 1 bit | 2 bits | 75% data, 25% parity |
| Extended Hamming (SECDED) | 8 bits (7+1) | 4 | 1 bit | 2 bits | ~14% overhead on 64-bit |
| BCH (63,51) | 63 bits | 5 | 2 bits | 3+ bits | ~23% overhead |
| Reed-Solomon (255,223) | 255 bytes | 33 | 16 bytes | 32 bytes | ~14% overhead at byte level |
🎯 Key Takeaways
- Hamming distance: bits that differ between two strings — minimum distance 3 enables 1-bit correction.
- Parity bits at power-of-2 positions each cover a specific subset of data positions.
- Syndrome decoding: XOR parity checks form a binary number pointing to the error position.
- Hamming(7,4): encode 4 bits in 7 — 3 parity bits overhead for single-bit correction.
- ECC RAM, QR codes, RAID, satellite comm all use Hamming or generalised codes (Reed-Solomon).
- Production systems use SECDED (Hamming with extra parity) to detect but not correct double-bit errors.
- Monitor corrected error counts — they reveal hardware failing before catastrophic crashes.
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- QHow many parity bits are needed to correct a single-bit error in an n-bit codeword?JuniorReveal
- QExplain syndrome decoding — how does the syndrome identify the error position?Mid-levelReveal
- QWhat is the relationship between minimum Hamming distance and error correction capability?SeniorReveal
- QHow does ECC RAM differ from regular RAM?Mid-levelReveal
- QCan Hamming code correct a two-bit error?SeniorReveal
Frequently Asked Questions
How many parity bits does Hamming code need for n data bits?
You need r parity bits where 2^r ≥ n + r + 1. For 4 data bits: 2^3=8 ≥ 4+3+1=8 → 3 parity bits (Hamming(7,4)). For 8 data bits: 2^4=16 ≥ 8+4+1=13 → 4 parity bits (Hamming(12,8)). For 64 data bits: 2^7=128 ≥ 64+7+1=72 → 7 parity bits but used in ECC as (72,64) with 8 bits for SECDED.
What is the difference between parity check and Hamming code?
A single parity bit can only detect an odd number of errors; it cannot correct. Hamming code uses multiple parity bits, each covering a different subset of bits, to both detect and locate a single-bit error. The parity coverage sets are designed so that the syndrome uniquely identifies the error position.
Why is ECC RAM slower than regular RAM?
ECC memory controllers require extra cycles to compute syndrome and, if necessary, correct the data before delivering it to the CPU. The overhead is usually 1-2 additional CPU cycles per read (~2-5ns added latency). Some high-end controllers pipeline this to hide the latency. The trade-off is acceptable for data integrity.
Can I use Hamming code for error correction in wireless communication?
Wireless channels typically experience burst errors (multiple consecutive bits flipped). Hamming code is not ideal for bursts because it assumes random single-bit errors. Interleaving can spread bursts across multiple Hamming blocks, but convolutional codes or Reed-Solomon are more efficient for wireless.
What happens if the syndrome points to a parity bit?
If the syndrome points to a parity bit (e.g., position 1, 2, or 4 in Hamming(7,4)), then that parity bit itself was corrupted. The correction flips that parity bit, which does not affect the data bits. The data is read as correct — the parity bit was only used for detection/correction.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.