MD5 Hash Algorithm — How It Works and Why It's Broken
- MD5 produces a 128-bit digest. It is usually faster than SHA-256 but cryptographically broken.
- Collision attacks: two different inputs with the same MD5 hash can now be found in seconds.
- NEVER use MD5 for passwords, digital signatures, or certificates.
In 2008, a team of researchers created a rogue HTTPS certificate authority using MD5 collisions. They crafted two certificates with the same MD5 hash: an ordinary website certificate and a CA certificate. They had the ordinary one signed by a real CA, then moved that signature onto the CA certificate. Any browser that trusted the real CA would have accepted certificates issued by the rogue CA for any site. This was not a theoretical attack. It was demonstrated at the Chaos Communication Congress (CCC) in December 2008 and pushed certificate authorities to stop issuing MD5-signed certificates.
MD5 has been considered unsafe for security use since 2004, when Wang and colleagues demonstrated practical collisions. Yet in 2026, it remains widely used for non-security checksums. Understanding why MD5 is broken for security but fine for checksums requires understanding which cryptographic property failed and why that property matters.
MD5 vs SHA-256 — Quick Comparison
```python
import hashlib
import time

data = b'Hello, TheCodeForge!'
md5_hash = hashlib.md5(data).hexdigest()
sha256_hash = hashlib.sha256(data).hexdigest()
# Each hex character encodes 4 bits
print(f'MD5 ({len(md5_hash)*4} bits): {md5_hash}')
print(f'SHA256 ({len(sha256_hash)*4} bits): {sha256_hash}')

# Speed comparison: MD5 is often around 30% faster than SHA-256 in software,
# but the gap depends on the CPU and the OpenSSL build
big_data = b'x' * 10_000_000
for name, fn in [('MD5', hashlib.md5), ('SHA256', hashlib.sha256)]:
    t = time.perf_counter()
    fn(big_data).hexdigest()
    print(f'{name}: {(time.perf_counter()-t)*1000:.1f}ms')
```
```
SHA256 (256 bits): 9f6f3b2e4a8c1d5e7b9a2f4c6d8e0a1b...
MD5: 18.3ms
SHA256: 24.1ms
```
Why MD5 is Broken
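MD5's fatal flaw is collision attacks: collision resistance is the property that failed. A collision is two different inputs with the same hash. Finding an input that matches a given hash (a pre-image attack) is still impractical against MD5, which is why non-adversarial checksums are unaffected.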
- 2004: Wang and colleagues publish the first practical MD5 collisions.
- 2005: Improved attacks find collisions within hours on a single notebook.
- 2008: Researchers create a rogue HTTPS certificate authority using MD5 collisions, a real-world attack.
- Today: MD5 collisions can be found in seconds on consumer hardware.
This breaks digital signatures (an attacker can swap a signed document for a colliding one), certificate validation, and any other application that requires collision resistance.
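To make the signature problem concrete, here is a minimal sketch. It does not attack real MD5. It assumes a deliberately weakened `toy_hash` (the first 3 bytes of MD5) so that a plain birthday search finds a collision in a few thousand attempts, and it uses HMAC as a stand-in for a real signature scheme. The names `toy_hash`, `find_collision`, `sign`, `verify`, and `SIGNING_KEY` are all hypothetical and exist only for this illustration.

```python
import hashlib
import hmac
import itertools

TOY_BYTES = 3  # 24-bit toy digest; real MD5 is 16 bytes (128 bits)


def toy_hash(data: bytes) -> bytes:
    """Deliberately weak stand-in for a broken hash: first 3 bytes of MD5."""
    return hashlib.md5(data).digest()[:TOY_BYTES]


def find_collision() -> tuple[bytes, bytes, int]:
    """Birthday search: hash distinct messages until two digests match."""
    seen: dict[bytes, bytes] = {}
    for attempts in itertools.count(1):
        msg = b"invoice #%d" % attempts  # every message is different
        digest = toy_hash(msg)
        if digest in seen:
            return seen[digest], msg, attempts
        seen[digest] = msg


# Hypothetical signer. HMAC stands in for a real signature scheme here.
SIGNING_KEY = b"demo-key-not-a-real-secret"


def sign(document: bytes) -> bytes:
    # Like most signature schemes, this signs the hash, not the document.
    return hmac.new(SIGNING_KEY, toy_hash(document), hashlib.sha256).digest()


def verify(document: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(document), signature)


doc_a, doc_b, attempts = find_collision()
signature = sign(doc_a)  # the signer only ever sees doc_a
print(f"collision after {attempts} attempts: {doc_a!r} vs {doc_b!r}")
print("signature for doc_a also verifies doc_b:", verify(doc_b, signature))
```

The signer never saw `doc_b`, yet its signature validates because the signature covers only the hash. For a sound n-bit hash, a generic search like this needs roughly 2^(n/2) attempts, which is out of reach at 128 bits. MD5's structural weaknesses let attackers skip that cost, which is what made the certificate attack feasible.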
Where MD5 is Still Acceptable
MD5 remains appropriate when collision resistance is not a security requirement:
- File checksums (non-adversarial): Verifying a download wasn't corrupted in transit (not tampered with by an attacker).
- Hash tables / data deduplication: When adversarial collisions aren't a concern.
- Non-cryptographic fingerprinting: Database row hashing for quick equality checks.
- Legacy protocol compatibility: MD5 is still in some network protocols where migration is impractical.
Rule: if an attacker controls the input, never use MD5.
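```python
import hashlib

# Acceptable: verify file download integrity (not adversarial)
def verify_download(filepath: str, expected_md5: str) -> bool:
    digest = hashlib.md5()
    # Read in chunks so large files are not loaded into memory at once
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            digest.update(chunk)
    # Published checksums may be upper- or lowercase hex
    return digest.hexdigest() == expected_md5.strip().lower()

# Acceptable: fast deduplication key (not security-critical)
def dedup_key(content: bytes) -> str:
    return hashlib.md5(content).hexdigest()

# NOT acceptable: password hashing
# bad = hashlib.md5(password.encode()).hexdigest()  # Never do this!
```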
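For the cases this section rules out (attacker-controlled input and passwords), the following is a rough sketch of what the replacements might look like using only the standard library. The helper names (`sha256_file`, `verify_untrusted_download`, `hash_password`, `check_password`) are hypothetical, and the PBKDF2 iteration count is an assumed work factor, not a recommendation from this article. Tune it for your hardware, or use a dedicated password-hashing library.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def sha256_file(filepath: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_untrusted_download(filepath: str, expected_sha256: str) -> bool:
    """Integrity check for files an attacker could have tampered with."""
    expected = expected_sha256.strip().lower()
    return hmac.compare_digest(sha256_file(filepath), expected)


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using a slow, salted key-derivation function."""
    salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, key


def check_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse battery staple")
print("right password:", check_password("correct horse battery staple", salt, key))
print("wrong password:", check_password("guess", salt, key))
```

Note that a plain fast hash, even SHA-256, is still a poor fit for passwords. The point of PBKDF2 here is the salt plus the deliberate slowness, which makes bulk guessing expensive.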
🎯 Key Takeaways
- MD5 produces a 128-bit digest. It is usually faster than SHA-256 but cryptographically broken.
- Collision attacks: two different inputs with the same MD5 hash can now be found in seconds.
- NEVER use MD5 for passwords, digital signatures, or certificates.
- MD5 is acceptable for non-security checksums, deduplication, and hash table keys.
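- Migration path: replace MD5 with SHA-256. In Python's hashlib the API is the same and only the function name changes, but the hex digest grows from 32 to 64 characters, so stored hashes and column widths need updating.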
Interview Questions on This Topic
- Q: Why is MD5 considered broken? What specific property failed?
- Q: Where is it still acceptable to use MD5?
- Q: What is the difference between a pre-image attack and a collision attack?
- Q: What should you use instead of MD5 for password hashing?
Frequently Asked Questions
If MD5 is broken, why is it still everywhere?
Legacy systems and inertia keep it in place, and many uses don't require collision resistance. Older package managers, FTP servers, and internal tools often still use MD5 for non-security checksums, where it's perfectly fine. Security-critical uses have largely migrated to SHA-256.