MD5 Hash Algorithm — How It Works and Why It's Broken

Learn how the MD5 hash algorithm works — the Merkle-Damgård construction, compression function, and why MD5 is cryptographically broken.
⚙️ Intermediate — basic DSA knowledge assumed
In this tutorial, you'll learn:
  • MD5 produces a 128-bit digest. It is often faster than SHA-256, but its collision resistance is broken.
  • Collision attacks: two different inputs with the same MD5 hash can be found in seconds on consumer hardware today.
  • Never use MD5 for passwords, digital signatures, or certificates.
⚡ Quick Answer
MD5 produces a 128-bit fingerprint of any data. For years it was trusted for security, but in 2004 researchers found two different inputs that produce the same MD5 hash. Once collisions can be produced on demand, anything that relies on a hash uniquely identifying its input (signatures, certificates) can be forged. MD5 is now broken for security purposes, but it is still widely used where collision resistance isn't needed, such as detecting accidental file corruption.

In 2008, a team of researchers created a rogue certificate authority (CA) certificate using MD5 collisions. They crafted two certificates with the same MD5 hash: an ordinary website certificate, which a real CA signed, and a CA certificate, onto which they copied that signature. Because the hashes matched, the signature validated for both, so any browser that trusted the real CA would also have trusted every certificate issued by the rogue one. This was not a theoretical attack: it was demonstrated at the Chaos Communication Congress (25C3) in December 2008 and pushed CAs to stop issuing MD5-signed certificates.

MD5 has been considered unsafe for security use since 2004, when Wang and her co-authors demonstrated practical collisions. Yet in 2026, it remains widely used for non-security checksums. Understanding why MD5 is broken for security but fine for checksums requires understanding which cryptographic property failed (collision resistance) and why that property matters.

MD5 vs SHA-256 — Quick Comparison

md5_usage.py · PYTHON
import hashlib
import time

data = b'Hello, TheCodeForge!'

md5_hash    = hashlib.md5(data).hexdigest()
sha256_hash = hashlib.sha256(data).hexdigest()

# Each hex character encodes 4 bits, so length * 4 gives the digest size.
print(f'MD5    ({len(md5_hash)*4} bits): {md5_hash}')
print(f'SHA256 ({len(sha256_hash)*4} bits): {sha256_hash}')

# Rough single-run speed comparison on 10 MB of input.
# MD5 is often faster in software, but CPUs with SHA extensions can reverse that.
big_data = b'x' * 10_000_000
for name, fn in [('MD5', hashlib.md5), ('SHA256', hashlib.sha256)]:
    start = time.perf_counter()
    fn(big_data).hexdigest()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f'{name}: {elapsed_ms:.1f}ms')
▶ Output (illustrative; run the script for exact digests and timings)
MD5 (128 bits): 8b6b8c4c7e1d3f2a9e5b7c4d6a2e8f1c
SHA256 (256 bits): 9f6f3b2e4a8c1d5e7b9a2f4c6d8e0a1b...
MD5: 18.3ms
SHA256: 24.1ms

Why MD5 is Broken

MD5's fatal flaw: collision attacks. A collision is two different inputs with the same hash.
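
To make the idea of a collision concrete, here is a minimal sketch that brute-forces one against a deliberately weakened hash: only the first 3 bytes (24 bits) of the MD5 digest. The helper names `truncated_md5` and `find_truncated_collision` are made up for this illustration. With 24 bits, the birthday bound predicts a collision after a few thousand attempts. The same generic search against the full 128 bits would need roughly 2^64 attempts, which is why the real attacks on MD5 rely on structural weaknesses rather than brute force.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, n_bytes: int = 3) -> bytes:
    """Return the first n_bytes of the MD5 digest (a deliberately weak toy hash)."""
    return hashlib.md5(data).digest()[:n_bytes]


def find_truncated_collision(n_bytes: int = 3) -> tuple[bytes, bytes, int]:
    """Hash counter-derived messages until two share the same truncated digest."""
    seen: dict[bytes, bytes] = {}  # truncated digest -> first message that produced it
    for attempts in count(1):
        msg = f"message-{attempts}".encode()
        tag = truncated_md5(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg, attempts
        seen[tag] = msg


first, second, attempts = find_truncated_collision()
print(f"{first!r} and {second!r} collide after {attempts} attempts")
print("different inputs:", first != second)
print("same truncated digest:", truncated_md5(first) == truncated_md5(second))
```

The two messages found this way differ, yet the weakened hash cannot tell them apart. That is exactly the situation an attacker can engineer against full MD5.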

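  • 2004: Wang and her co-authors announce the first practical MD5 collisions, reportedly found in about an hour on an IBM p690 cluster.
  • 2005-2006: Improved techniques bring the cost down to a few hours, and then to under a minute, on a single notebook.
  • 2008: Researchers create a rogue CA certificate using MD5 collisions, a real-world attack.
  • Today: MD5 collisions can be found in seconds on consumer hardware.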

This breaks digital signatures (an attacker can get one document signed and then substitute a colliding one), certificate validation, and any other application that requires collision resistance.
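
The document-swap problem can be sketched with a toy hash-then-sign scheme. Everything here is an assumption made for illustration: `weak_digest` keeps only 2 bytes of MD5 so that a collision is cheap to find, and an HMAC under a made-up `KEY` stands in for a real private-key signature. The point is only that the signature covers the digest, not the document, so any two documents with the same digest share a signature.

```python
import hashlib
import hmac

KEY = b"demo-signing-key"  # hypothetical secret standing in for a private key


def weak_digest(doc: bytes) -> bytes:
    """Toy hash: only the first 2 bytes of MD5, so collisions are cheap to find."""
    return hashlib.md5(doc).digest()[:2]


def sign(doc: bytes) -> bytes:
    """'Sign' the digest of the document, as hash-then-sign schemes do."""
    return hmac.new(KEY, weak_digest(doc), hashlib.sha256).digest()


def verify(doc: bytes, signature: bytes) -> bool:
    """Accept the document if the signature matches its digest."""
    return hmac.compare_digest(sign(doc), signature)


def find_colliding_pair() -> tuple[bytes, bytes]:
    """Search for a harmless and a malicious document with equal digests."""
    harmless: dict[bytes, bytes] = {}
    for i in range(1000):
        doc = f"Pay Alice $10 (ref {i})".encode()
        harmless[weak_digest(doc)] = doc
    i = 0
    while True:
        evil = f"Pay Mallory $10,000 (ref {i})".encode()
        if weak_digest(evil) in harmless:
            return harmless[weak_digest(evil)], evil
        i += 1


good, evil = find_colliding_pair()
signature = sign(good)          # the signer only ever sees the harmless document
print(good, evil)
print(verify(evil, signature))  # True: the same signature validates the forgery
```

With full MD5 the search is far more expensive than in this toy, but the published collision attacks make it feasible, which is why signatures over MD5 digests can no longer be trusted.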

⚠️
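Do NOT use MD5 for: password storage, digital signatures, certificate fingerprints, or any other security application requiring collision resistance. Use SHA-256 or SHA-3 for signatures and fingerprints. For passwords, use a dedicated slow password-hashing function such as Argon2, bcrypt, scrypt, or PBKDF2, because any fast hash (SHA-256 included) makes brute-force guessing cheap.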
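
For password storage specifically, the usual choice is a salted, deliberately slow key-derivation function rather than any fast hash. The sketch below uses `hashlib.pbkdf2_hmac` from the standard library. The function names and the iteration count of 600,000 are assumptions for illustration, so treat the work factor as something to tune for your own hardware and policy rather than a fixed recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and policy


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; a fresh random salt per password."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The salt prevents precomputed lookup tables from working across users, and the iteration count makes each guess expensive for an attacker who steals the stored values.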

Where MD5 is Still Acceptable

MD5 remains appropriate when collision resistance is not a security requirement:

  • File checksums (non-adversarial): verifying that a download wasn't corrupted in transit, as opposed to tampered with by an attacker.
  • Hash tables / data deduplication: when adversarial collisions aren't a concern.
  • Non-cryptographic fingerprinting: database row hashing for quick equality checks.
  • Legacy protocol compatibility: MD5 is still in some network protocols where migration is impractical.

Rule: if an attacker controls the input, never use MD5.

md5_acceptable.py · PYTHON
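import hashlib

# Acceptable: verify file download integrity (not adversarial)
def verify_download(filepath: str, expected_md5: str) -> bool:
    digest = hashlib.md5()
    with open(filepath, 'rb') as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b''):
            digest.update(chunk)
    # Published checksums may be upper-case or padded with whitespace.
    return digest.hexdigest() == expected_md5.strip().lower()

# Acceptable: fast deduplication key (not security-critical)
def dedup_key(content: bytes) -> str:
    return hashlib.md5(content).hexdigest()

# NOT acceptable: password hashing
# bad = hashlib.md5(password.encode()).hexdigest()  # Never do this!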
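
As a usage sketch for the deduplication case, the snippet below groups in-memory blobs by their MD5 digest and then confirms each candidate group with a byte-for-byte comparison. The function name `group_duplicates` and the sample data are made up for this example. The confirmation step is a design choice: it means that even a digest collision could not cause two distinct blobs to be merged.

```python
import hashlib
from collections import defaultdict


def group_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group names whose contents are byte-identical, using MD5 as a cheap first pass."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in blobs.items():
        by_digest[hashlib.md5(content).hexdigest()].append(name)

    groups: list[list[str]] = []
    for names in by_digest.values():
        # Confirm with a real comparison so a digest collision cannot merge distinct data.
        buckets: list[list[str]] = []
        for name in names:
            for bucket in buckets:
                if blobs[bucket[0]] == blobs[name]:
                    bucket.append(name)
                    break
            else:
                buckets.append([name])
        groups.extend(bucket for bucket in buckets if len(bucket) > 1)
    return groups


files = {"a.txt": b"hello", "b.txt": b"world", "c.txt": b"hello"}
print(group_duplicates(files))  # [['a.txt', 'c.txt']]
```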

🎯 Key Takeaways

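  • MD5 produces a 128-bit digest. It is often faster than SHA-256, but its collision resistance is broken.
  • Collision attacks: two different inputs with the same MD5 hash can be found in seconds on consumer hardware today.
  • Never use MD5 for passwords, digital signatures, or certificates.
  • MD5 is acceptable for non-security checksums, deduplication, and hash table keys.
  • Migration path: replace MD5 with SHA-256 for checksums and signatures (in Python's hashlib it is the same API with a different function name, though hex digests grow from 32 to 64 characters). For passwords, move to a dedicated password-hashing function instead.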
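
One way to handle the migration gradually is to accept both legacy MD5 checksums and new SHA-256 checksums while old records still exist. The sketch below is an assumption about how that might look, not a prescribed method: it picks the algorithm from the length of the stored hex digest, and the names `ALGORITHM_BY_HEX_LENGTH` and `verify_checksum` are hypothetical.

```python
import hashlib
import hmac

# Hex digest length -> hashlib algorithm name (only the two discussed here).
ALGORITHM_BY_HEX_LENGTH = {32: "md5", 64: "sha256"}


def verify_checksum(data: bytes, expected_hex: str) -> bool:
    """Verify data against a stored hex checksum, inferring the algorithm from its length."""
    expected_hex = expected_hex.strip().lower()
    algorithm = ALGORITHM_BY_HEX_LENGTH.get(len(expected_hex))
    if algorithm is None:
        raise ValueError(f"unrecognised digest length: {len(expected_hex)}")
    actual = hashlib.new(algorithm, data).hexdigest()
    return hmac.compare_digest(actual, expected_hex)


payload = b"abc"
legacy = hashlib.md5(payload).hexdigest()      # 32 hex characters
modern = hashlib.sha256(payload).hexdigest()   # 64 hex characters
print(verify_checksum(payload, legacy))   # True
print(verify_checksum(payload, modern))   # True
print(verify_checksum(b"abd", modern))    # False
```

New records would be written with SHA-256 only, so the MD5 branch can be removed once no legacy checksums remain.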

Interview Questions on This Topic

  • Q: Why is MD5 considered broken? What specific property failed?
  • Q: Where is it still acceptable to use MD5?
  • Q: What is the difference between a pre-image attack and a collision attack?
  • Q: What should you use instead of MD5 for password hashing?

Frequently Asked Questions

If MD5 is broken, why is it still everywhere?

Three reasons: legacy systems, inertia, and the fact that many uses don't require collision resistance. Package managers (historically), FTP servers, and internal tools often still use MD5 for non-security checksums, where it's perfectly fine. Security-critical uses have largely migrated to SHA-256.

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
