Mid-level 7 min · March 24, 2026

MD5 Hash — Forged HTTPS Certificate via MD5 Collision

Q: If MD5 is broken, why is it still everywhere?

Legacy systems, inertia, and many uses don't require collision resistance. Package managers (historical), FTP servers, and internal tools often still use MD5 for non-security checksums where it's perfectly fine. Security-critical uses have largely migrated to SHA-256.

Q: Can MD5 collisions be found quickly today?

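Yes. On a modern laptop, an identical-prefix MD5 collision can be found in under a minute using tools like `md5coll` or `fastcoll`. Chosen-prefix collisions, the kind used in the 2008 certificate attack, cost more but are also within practical reach.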

Q: Is MD5 safe for verifying file integrity in a CI/CD pipeline?

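Only if the hash is computed on a trusted machine and the pipeline input is not attacker-controlled. MD5's practical weakness is collisions, not second preimages: nobody can yet craft a file matching an MD5 you computed from a file they never touched. But if an attacker can influence the artifact before its hash is recorded, they can prepare a colliding pair, get the benign file hashed and approved, and later substitute the malicious file with the same MD5. Use SHA-256 for CI/CD.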

In 2008, an MD5 collision allowed forging a CA certificate, accepted by browsers.

Naren Founder & Principal Engineer

20+ years shipping performance-critical code where algorithms decide the bill. Everything here is grounded in real deployments.


Last updated: May 23, 2026


⚡Quick Answer

MD5 is a 128-bit cryptographic hash function using the Merkle-Damgård construction.
Key components: compression function, message padding, initialization vectors.
Collision resistance is broken: collisions found in seconds on consumer hardware.
Performance: usually faster than SHA-256 in plain software (often quoted as ~30%), but the gap depends on the CPU, and the speed comes at the cost of security.
Production insight: Do not use MD5 for security-critical applications; an attacker can forge digital signatures and certificates.
Biggest mistake: Treating MD5 as secure for any purpose involving adversarial input.

Definition

What is the MD5 Hash Algorithm?

MD5 (Message Digest 5) is a 128-bit cryptographic hash function designed by Ron Rivest in 1991, intended to produce a fixed-size fingerprint from arbitrary data. It was widely adopted for file integrity checks, password storage, and digital signatures due to its speed and simplicity.


However, MD5 is catastrophically broken: by 2004, researchers demonstrated practical collisions (two different inputs producing the same hash), and by 2008, security researchers exploited this to forge a certificate for a rogue Certificate Authority (CA) that browsers would have trusted. That attack used a cluster of about 200 PlayStation 3 consoles to generate the required collision in under two days, proving MD5 offers no meaningful collision resistance against determined adversaries.

You should never use MD5 for any security-sensitive purpose; its only remaining acceptable uses are non-cryptographic checksums (e.g., duplicate detection in non-hostile environments) or backward compatibility with legacy systems where collision risk is explicitly accepted. For digital signatures and integrity verification, use SHA-256 or SHA-3 instead; for password storage, use a dedicated password-hashing function such as Argon2id, scrypt, or bcrypt.

The 2008 CA forgery attack remains the definitive real-world demonstration of why collision resistance is non-negotiable in public-key infrastructure.

Plain-English First

MD5 produces a 128-bit fingerprint of any data. For years it was trusted for security, but in 2004 researchers found two different inputs that produce the same MD5 hash. Once collisions can be found, the security foundation collapses. MD5 is now broken for security purposes, but still widely used where collision resistance isn't needed, such as detecting accidental file corruption.

In 2008, a team of researchers created a rogue HTTPS certificate authority using MD5 collisions. They arranged for two certificates to share one MD5 hash: the ordinary website certificate a real CA would issue them, and a rogue CA certificate of their own design. They had the CA sign the first, then reused that signature on the second. Every major browser of the time would have trusted certificates issued by that rogue CA. This was not a theoretical attack: it was demonstrated at the 25th Chaos Communication Congress in December 2008, and CAs still signing with MD5 moved quickly to stop.

MD5 was deprecated for security use in 2004 when Wang and Yu demonstrated practical collisions. Yet in 2026, it remains widely used for non-security checksums. Understanding why MD5 is broken for security but fine for checksums requires understanding which cryptographic property failed and why that property matters.

Why MD5 Is Not a Hash Function You Should Trust

MD5 (Message Digest 5) is a 128-bit cryptographic hash function that maps arbitrary-length input to a fixed digest, usually written as 32 hexadecimal characters. It pads the message, splits it into 512-bit blocks, and applies a Merkle–Damgård construction whose compression function runs four rounds of bitwise operations, modular additions, and non-linear functions. Each block is mixed into an internal state, and the final state is the fingerprint. Because inputs are unbounded and outputs are 128 bits, different inputs with the same fingerprint must exist; the design goal was to make them infeasible to find, and that is the goal MD5 no longer meets.

In practice, MD5 runs in O(n) time with high throughput, typically several hundred MB/s per core on modern hardware. Its key properties are determinism (the same input always yields the same output), the avalanche effect (flipping a single input bit changes about half of the output bits on average), and preimage resistance (given a hash, finding an input that produces it should take about 2^128 operations; the best published attack still needs roughly 2^123). Collision resistance, the property that no one can find two different inputs with the same hash, was broken in 2004, and follow-up work cut the cost to seconds on a standard PC. Today, tools such as fastcoll and HashClash generate collisions on demand.
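To see the avalanche property in action, a short sketch can flip one input bit at a time and count how many of MD5's 128 output bits change. It uses only Python's standard `hashlib`; the input string and the helper names (`hamming_distance`, `bit_flip_distances`) are illustrative choices, not part of any library.

```python
import hashlib


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def bit_flip_distances(data: bytes) -> list[int]:
    """For every bit in `data`, flip it and measure how many MD5 output bits change."""
    base = hashlib.md5(data).digest()
    distances = []
    for i in range(len(data) * 8):
        flipped = bytearray(data)
        flipped[i // 8] ^= 1 << (i % 8)  # flip exactly one input bit
        distances.append(hamming_distance(base, hashlib.md5(bytes(flipped)).digest()))
    return distances


if __name__ == "__main__":
    d = bit_flip_distances(b"Hello, TheCodeForge!")
    avg = sum(d) / len(d)
    print(f"{len(d)} single-bit flips, average {avg:.1f} of 128 output bits changed")
```

An average near 64 (half of 128) is what a well-mixing hash should show. Good avalanche behavior says nothing about collision resistance, which is exactly the property MD5 lost.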

Despite its broken collision resistance, MD5 is still used in legacy systems for checksums, file integrity, and non-security deduplication. It is acceptable only when collision attacks have no security impact—for example, verifying a downloaded file against a trusted publisher's checksum over HTTPS. Never use MD5 for digital signatures, certificate validation, password storage, or any context where an attacker can influence input data. The real-world consequence is catastrophic: in 2008, researchers forged a rogue Certificate Authority certificate by exploiting an MD5 collision, breaking the entire HTTPS trust model for affected CAs.

Collision Resistance Is Dead

MD5 collisions are not theoretical; they are practical. An identical-prefix collision can be generated in seconds on a laptop. Treat any system relying on MD5 for security as already compromised.

Production Insight

Consider a CI/CD pipeline that uses MD5 for artifact integrity and accepts uploads from multiple contributors. A malicious contributor can prepare two colliding files, get the benign one reviewed and its MD5 recorded, and then swap in the malicious one, which the pipeline accepts because the checksum still matches. The symptom is not a build failure: everything passes, and the deployed binary behaves differently in production. Rule of thumb: if an attacker can influence input data, use SHA-256 or better; MD5 is only safe for internal, non-adversarial checksums.

Key Takeaway

MD5 is a fast, deterministic hash with broken collision resistance—do not use it where an adversary can control inputs.

The 2008 CA forgery attack proves MD5 can break HTTPS trust; never use it for certificates or signatures.

For non-security use (e.g., duplicate detection, non-adversarial checksums), MD5 is acceptable but prefer SHA-256 to avoid future confusion.

Figure: MD5 Collision Attack on HTTPS Certificates

MD5 vs SHA-256 — Quick Comparison

MD5 and SHA-256 both belong to the MD4 family of hash functions, but while MD5 was designed for 32-bit architectures and speed, SHA-256 was built with security margins from the start. The biggest practical difference is the output length: MD5 produces 128 bits, SHA-256 produces 256 bits. That alone makes SHA-256 2^128 times harder to brute-force for preimage attacks.

The speed advantage of MD5 (often quoted as ~30%) is not worth the security risk in any adversarial scenario. Hardware acceleration has narrowed the gap further: on CPUs with SHA extensions, SHA-256 can match or even outrun MD5.

md5_usage.pyPYTHON

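import hashlib
import time

data = b'Hello, TheCodeForge!'

md5_hash    = hashlib.md5(data).hexdigest()
sha256_hash = hashlib.sha256(data).hexdigest()

# Each hex character encodes 4 bits: 32 chars = 128 bits, 64 chars = 256 bits
print(f'MD5    ({len(md5_hash)*4} bits): {md5_hash}')
print(f'SHA256 ({len(sha256_hash)*4} bits): {sha256_hash}')

# Rough speed comparison on 10 MB. The gap depends on the CPU:
# with SHA hardware extensions, SHA-256 can match or beat MD5.
big_data = b'x' * 10_000_000
for name, fn in [('MD5', hashlib.md5), ('SHA256', hashlib.sha256)]:
    timings = []
    for _ in range(5):  # best of 5 runs reduces noise from a single timing
        t = time.perf_counter()
        fn(big_data).hexdigest()
        timings.append(time.perf_counter() - t)
    print(f'{name}: {min(timings)*1000:.1f}ms')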

Output

MD5    (128 bits): <32 hex characters>

SHA256 (256 bits): <64 hex characters>

MD5: 18.3ms

SHA256: 24.1ms

(Timings are from one sample machine and will differ on yours.)

Security vs Speed Trade-off

Collision resistance: SHA-256 provides 2^128 security level (birthday bound); MD5 provides none since 2004.
Preimage resistance: MD5 has effective security of ~2^123, but collisions are the real threat.
Performance: MD5 is faster but not by enough to justify risk in most cases.

Production Insight

In production, the choice isn't between MD5 and SHA-256 — it's between security and liability.

If you use MD5 where an attacker can provide input, you are inviting a collision attack.

Rule: When in doubt, use SHA-256. The performance gain is never worth the breach.

Key Takeaway

SHA-256 is the safe default. Use it everywhere unless you have a non-adversarial use case.

MD5's speed is a trap — it lures you into a false sense of efficiency.

Choose SHA-256. Always.

Why MD5 is Broken

MD5's fatal flaw: collision attacks. A collision is two different inputs with the same hash.

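2004: Wang and Yu publish practical MD5 collisions.
2005–2006: Improved techniques (such as Klíma's tunnels) cut collision search on a notebook PC to under a minute.
2007: Stevens, Lenstra, and de Weger publish chosen-prefix collisions, letting two inputs with different, meaningful prefixes collide.
2008: Researchers create a rogue HTTPS certificate authority using a chosen-prefix MD5 collision, a real-world attack.
Today: MD5 collisions can be found in seconds on consumer hardware.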

This breaks digital signatures (an attacker can swap a signed document for a colliding one), certificate validation, and any application requiring collision resistance.

Do NOT use MD5 for:

Password storage, digital signatures, certificate fingerprints, or any security application requiring collision resistance. Use SHA-256 or SHA-3 for signatures and integrity, and a dedicated password hash (Argon2id, scrypt, or bcrypt) for passwords.

Production Insight

The 2008 CA attack proved that broken collision resistance isn't academic.

Attackers can weaponize it quickly: the chosen-prefix technique behind the 2008 attack had been published less than two years earlier.

If you see MD5 in a security context, assume it's already compromised.

Key Takeaway

Collision resistance is the property that matters most for security hashing.

MD5 lost it in 2004. Treat all MD5 hashes as untrusted in adversarial contexts.

When security matters, SHA-256 is the minimum acceptable hash.

How MD5 Works (Merkle-Damgård Construction)

MD5 processes messages in 512-bit blocks using the Merkle-Damgård construction. The message is padded to a multiple of 512 bits: a single 1 bit, then zeros until the length is 448 mod 512, then the original message length in bits as a 64-bit little-endian integer. Each block then goes through a compression function that mixes four 32-bit registers (A, B, C, D) using non-linear functions (F, G, H, I) and constant tables.
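The padding rule can be written as a small function. This is an illustrative sketch (the name `md5_pad` is ours): it appends the single 1 bit as the byte `0x80`, zero-fills until the length is 56 bytes modulo 64 (448 bits modulo 512), then appends the original bit length as a 64-bit little-endian integer.

```python
def md5_pad(message: bytes) -> bytes:
    """Apply MD5's Merkle-Damgard padding so the result is a multiple of 64 bytes."""
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF  # length in bits, mod 2^64
    padded = message + b"\x80"                             # the single '1' bit
    padded += b"\x00" * ((56 - len(padded)) % 64)          # zeros up to 448 mod 512 bits
    padded += bit_length.to_bytes(8, "little")             # 64-bit little-endian length
    return padded


if __name__ == "__main__":
    for n in (0, 3, 55, 56, 64):
        print(n, "bytes ->", len(md5_pad(b"a" * n)), "bytes after padding")
```

A 55-byte message still fits in one block, while a 56-byte message spills into a second block because the 8-byte length field no longer fits after the `0x80` byte.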

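The core loop runs 64 steps per block (four rounds of 16), using left rotations and modular additions. After each block, the registers are added back into the running state; the final output is A, B, C, D written out in little-endian byte order, 128 bits total.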
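The per-block loop, plus the final step that writes A, B, C, D out in little-endian byte order, fits in a short Python sketch. To stay compact, this version (the function name `md5_single_block` is ours) only accepts messages shorter than 56 bytes, so padding yields exactly one 512-bit block, and it is checked against `hashlib.md5`. A full implementation would run the same compression over every block, carrying the four registers forward.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Per-step left-rotation amounts: four rounds of 16 steps each
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Additive constants: floor(|sin(i + 1)| * 2^32)
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_single_block(message: bytes) -> str:
    if len(message) >= 56:
        raise ValueError("this sketch only handles messages that pad to one block")
    block = message + b"\x80" + b"\x00" * (55 - len(message))
    block += (8 * len(message)).to_bytes(8, "little")
    M = struct.unpack("<16I", block)  # sixteen 32-bit little-endian words

    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    A, B, C, D = a0, b0, c0, d0
    for i in range(64):
        if i < 16:
            F, g = (B & C) | (~B & D), i
        elif i < 32:
            F, g = (D & B) | (~D & C), (5 * i + 1) % 16
        elif i < 48:
            F, g = B ^ C ^ D, (3 * i + 5) % 16
        else:
            F, g = C ^ (B | ~D), (7 * i) % 16
        F &= MASK  # Python's ~x is negative; mask back to 32 bits
        # Rotate the registers: A <- D, D <- C, C <- B, B <- B + rotl(...)
        A, D, C, B = D, C, B, (B + rotl((A + F + K[i] + M[g]) & MASK, S[i])) & MASK

    # Add the block result into the chaining state, then emit each register little-endian
    state = [(x + y) & MASK for x, y in zip((a0, b0, c0, d0), (A, B, C, D))]
    return struct.pack("<4I", *state).hex()


if __name__ == "__main__":
    for msg in (b"", b"abc", b"TheCodeForge"):
        ok = md5_single_block(msg) == hashlib.md5(msg).hexdigest()
        print(msg, md5_single_block(msg), "matches hashlib:", ok)
```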

Understanding this construction is key to seeing why collision resistance fails: the compression function has differential paths that can be exploited, and the 128-bit output provides only 64-bit collision security (birthday bound).
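The birthday bound can be checked numerically with the standard approximation p ≈ 1 − exp(−k²/2^(n+1)) for the chance that k random n-bit digests contain at least one collision. The sketch below (function name ours) uses `math.expm1` so the result stays accurate when the probability is tiny.

```python
import math


def collision_probability(k: float, n_bits: int) -> float:
    """Approximate chance that k random n-bit hashes contain a collision.

    Uses p ~= 1 - exp(-k^2 / 2^(n+1)); expm1 keeps precision when p is tiny.
    """
    return -math.expm1(-(k * k) / 2.0 ** (n_bits + 1))


if __name__ == "__main__":
    for n in (128, 256):
        k = 2.0 ** (n // 2)  # 2^64 hashes for a 128-bit output, 2^128 for 256-bit
        print(f"n={n}: k=2^{n // 2} hashes -> p ~ {collision_probability(k, n):.3f}")
    # The same 2^64 hashes barely dent a 256-bit output
    print(f"n=256, k=2^64 -> p ~ {collision_probability(2.0**64, 256):.3e}")
```

At k = 2^(n/2) the chance is roughly 39% for either output size, which is why generic collision security is quoted as half the output length. MD5's real problem is that cryptanalysis finds collisions far more cheaply than even this 2^64 generic bound.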

Merkle-Damgård Construction

Message is split into 512-bit blocks; last block includes padding and length.
Each block updates an internal state (four 32-bit registers).
Final state becomes the hash output (128 bits).
Weakness: once two messages reach the same internal state at a block boundary, appending any identical suffix to both preserves the collision.
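The weakness noted in the last point can be demonstrated with a deliberately weak toy. The sketch below is not MD5: it builds a Merkle-Damgård hash around a hypothetical 16-bit compression function (SHA-256 truncated to two bytes, chosen only so a collision can be found instantly), searches for two different first blocks that reach the same internal state, and checks that any common suffix keeps the collision. The block size, IV, and all function names are illustrative.

```python
import hashlib

BLOCK = 4      # toy block size in bytes
IV = 0x1234    # toy 16-bit initial state


def compress(state: int, block: bytes) -> int:
    """Hypothetical toy compression function with a 16-bit state (NOT MD5)."""
    digest = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")


def toy_md_hash(message: bytes) -> int:
    """Merkle-Damgard iteration: pad with 0x80, zeros, and the length, then chain blocks."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 4) % BLOCK)
    padded += len(message).to_bytes(4, "big")  # toy length field, in bytes
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_first_block_collision() -> tuple[bytes, bytes]:
    """Birthday search for two distinct blocks that map IV to the same state."""
    seen = {}
    counter = 0
    while True:  # terminates: only 2^16 possible states
        block = counter.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        counter += 1


if __name__ == "__main__":
    m1, m2 = find_first_block_collision()
    print("colliding blocks:", m1.hex(), m2.hex())
    for suffix in (b"", b"pay $10", b"any common suffix at all"):
        same = toy_md_hash(m1 + suffix) == toy_md_hash(m2 + suffix)
        print(repr(suffix), "-> still collides:", same)
```

Because both messages enter the suffix with the same chaining state and the same length, every later block is processed identically. The same reasoning is what lets real MD5 collision blocks sit inside longer files that share everything else.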

Production Insight

Merkle-Damgård is still used by SHA-256, but SHA-256 uses stronger compression with larger state (256 bits) and better diffusion.

MD5's simplified round functions and 128-bit state make it vulnerable to differential cryptanalysis.

Rule: Longer output width does not guarantee security, but it raises the bar for birthday attacks.

Key Takeaway

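MD5's internals were state-of-the-art in 1991, but differential cryptanalysis has since found exploitable paths through its compression function.

The Merkle-Damgård construction is not what broke MD5's collision resistance (though it does allow length-extension attacks); the compression function is what failed.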

SHA-256 proves that Merkle-Damgård can be secure with proper design.

The 2008 CA Forgery Attack — Real-World Collision Exploitation

At the 25th Chaos Communication Congress (CCC) in December 2008, researchers Alexander Sotirov, Marc Stevens, and others demonstrated what many had feared: they used MD5 collisions to create a rogue Certificate Authority that browsers would trust.

They requested an ordinary website certificate from a real CA (RapidSSL at the time) and crafted the request so that the certificate RapidSSL would issue had the same MD5 hash as a rogue CA certificate they had built themselves. The CA signed the legitimate certificate. Because the signature covers the MD5 hash of the certificate contents, the same signature was also valid on the rogue certificate, which had the CA flag set and a public key the attackers controlled.

The attack required a cluster of about 200 PlayStation 3 consoles (about $10k in hardware) and depended on predicting exactly what the CA would put in the certificate, which was possible because RapidSSL issued sequential serial numbers and predictable validity periods. It pushed CAs still signing with MD5 to stop issuing MD5-signed certificates.
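One signature covered both certificates because the CA signs a digest of the certificate body, not the body itself. The sketch below models that with stand-ins: HMAC-SHA256 plays the role of the CA's private-key signature (the real CA used RSA), and a hypothetical 16-bit truncated hash plays the role of a broken digest so a collision between two differently-prefixed bodies can be found by brute force. The key and the certificate-like strings are made up for illustration.

```python
import hashlib
import hmac

CA_KEY = b"hypothetical CA signing key"  # stand-in for an RSA private key


def weak_digest(data: bytes) -> bytes:
    """Hypothetical broken digest: SHA-256 truncated to 16 bits (collides easily)."""
    return hashlib.sha256(data).digest()[:2]


def sign(tbs: bytes) -> bytes:
    """Hash-then-sign: the 'signature' depends only on the digest of the body."""
    return hmac.new(CA_KEY, weak_digest(tbs), hashlib.sha256).digest()


def verify(tbs: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(tbs), signature)


def find_colliding_pair() -> tuple[bytes, bytes]:
    """Vary a filler field in a benign and a rogue body until their digests match."""
    benign, rogue = {}, {}
    i = 0
    while True:
        b = b"CN=shop.example, CA=FALSE, filler=%d" % i
        r = b"CN=Rogue CA, CA=TRUE, filler=%d" % i
        benign[weak_digest(b)] = b
        rogue[weak_digest(r)] = r
        if weak_digest(b) in rogue:
            return b, rogue[weak_digest(b)]
        if weak_digest(r) in benign:
            return benign[weak_digest(r)], r
        i += 1


if __name__ == "__main__":
    good_cert, rogue_cert = find_colliding_pair()
    sig = sign(good_cert)  # the CA only ever signs the benign body
    print("signed:", good_cert)
    print("rogue :", rogue_cert)
    print("signature valid on rogue body:", verify(rogue_cert, sig))
```

Nothing in `verify` is buggy; the forgery works only because two different bodies share a digest.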

Production Insight

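This attack did not require the CA to break any rule: the CA followed its normal issuance process. The flaw was in the hash function itself, made exploitable by predictable certificate fields.

Detection: If any certificate in your chain is signed with an MD5-based signature algorithm (such as md5WithRSAEncryption), or your PKI identifies certificates by MD5 fingerprints, treat that chain as vulnerable.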

Lesson: Never rely on a hash function whose collision resistance has been publicly broken.

Key Takeaway

The 2008 attack proved MD5 collisions could be weaponized within four years of the first published collisions.

If your system relies on a broken hash function, you are one attack away from a breach.

Audit your TLS certificate policy today — ensure no MD5-signed certificates remain.

Where MD5 is Still Acceptable

MD5 remains appropriate when collision resistance is not a security requirement:

File checksums (non-adversarial): verifying a download wasn't corrupted in transit (not tampered with by an attacker).
Hash tables / data deduplication: when adversarial collisions aren't a concern.
Non-cryptographic fingerprinting: database row hashing for quick equality checks.
Legacy protocol compatibility: MD5 is still used in some network protocols where migration is impractical.

Rule: if an attacker controls the input, never use MD5.

md5_acceptable.pyPYTHON

import hashlib

# Acceptable: detect accidental corruption of a download (not adversarial tampering)
def verify_download(filepath: str, expected_md5: str) -> bool:
    md5 = hashlib.md5()
    # 'with' closes the file; reading in chunks avoids loading large files into memory
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 16), b''):
            md5.update(chunk)
    return md5.hexdigest() == expected_md5.strip().lower()

# Acceptable: fast deduplication key (not security-critical)
def dedup_key(content: bytes) -> str:
    return hashlib.md5(content).hexdigest()

# NOT acceptable: password hashing
# bad = hashlib.md5(password.encode()).hexdigest()  # Never do this!

Acceptable Use Cases

MD5 is safe for internal, non-adversarial checksums. The key is threat modeling: who controls the input? If it's you or your trusted systems, MD5 is fine. If an external party can influence the input, switch to SHA-256.

Production Insight

Many package managers (like old Debian repos) still serve MD5 checksums. It's fine for verifying network corruption.

But if your CI pipeline records MD5 hashes of artifacts and an attacker can push to your repo, they can commit one file of a colliding pair and later swap in the other.

Rule: MD5 is acceptable only when the input is trusted and collisions have no security impact.

Key Takeaway

MD5 for non-security checksums: yes. MD5 for security: no.

The line is drawn by adversarial control.

When in doubt about who might craft input, use SHA-256.

Migrating from MD5 to SHA-256

Replacing MD5 with SHA-256 is straightforward in most codebases. The API is identical in most languages (Python's hashlib, Java's MessageDigest, OpenSSL). The migration involves:

Identify all MD5 usage in your codebase.
Determine if each use case is security-sensitive.
For security-sensitive: replace the hash function and potentially re-issue certificates, re-sign documents.
For non-sensitive: still consider migration for future-proofing and consistency.
Update documentation to explicitly forbid MD5 in security contexts.

If you need backward compatibility with systems that only accept MD5, consider offering both hashes during a transition period.
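One way to run both hashes during a transition is to store a SHA-256 digest next to the legacy MD5 value and let SHA-256 win whenever it is present. The sketch below assumes a hypothetical record shape (a dict with optional `md5` and `sha256` fields); adapt it to whatever your schema actually stores.

```python
import hashlib
import hmac
from typing import Optional


def make_record(data: bytes) -> dict:
    """New records carry both digests during the transition window."""
    return {
        "md5": hashlib.md5(data).hexdigest(),        # for legacy consumers only
        "sha256": hashlib.sha256(data).hexdigest(),  # the authoritative check
    }


def verify_record(data: bytes, record: dict) -> Optional[bool]:
    """Prefer SHA-256; fall back to MD5 only for records created before migration.

    Returns None when the record has no usable digest at all.
    """
    if record.get("sha256"):
        return hmac.compare_digest(hashlib.sha256(data).hexdigest(), record["sha256"])
    if record.get("md5"):
        # Legacy path: catches accidental corruption, not deliberate collisions
        return hmac.compare_digest(hashlib.md5(data).hexdigest(), record["md5"])
    return None


def backfill(data: bytes, record: dict) -> dict:
    """Once a legacy record verifies, add SHA-256 so MD5 can be dropped later."""
    if not record.get("sha256") and verify_record(data, record):
        record = {**record, "sha256": hashlib.sha256(data).hexdigest()}
    return record
```

When every record has a `sha256` field and no client reads `md5` anymore, the fallback branch and the MD5 column can be removed.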

MD5toSHA256.javaJAVA

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashUtils {
    // Old: MD5 (deprecated for security use)
    public static String md5(String data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // Encode explicitly: before Java 18, no-argument getBytes() used the platform charset
        byte[] digest = md.digest(data.getBytes(StandardCharsets.UTF_8));
        return bytesToHex(digest);
    }

    // New: SHA-256 (only the algorithm name changes)
    public static String sha256(String data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(data.getBytes(StandardCharsets.UTF_8));
        return bytesToHex(digest);
    }

    // Lowercase hex, two characters per byte
    private static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}

Migration Strategy

Phase 1: Audit — grep your source for 'MD5', 'MessageDigest.getInstance("MD5")', 'hashlib.md5'.
Phase 2: Categorize — security vs non-security use cases.
Phase 3: Replace — one function call change for most cases.
Phase 4: Test — verify new hashes match expected values (if deterministic).
Phase 5: Sunset — remove MD5 support after transition period.
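Phase 1 can start as a small script. The sketch below walks a source tree and reports lines matching a few common MD5 call patterns. The pattern list and file extensions are illustrative assumptions: they will miss indirect uses (for example, an algorithm name read from config) and are not a substitute for a proper static-analysis tool.

```python
import re
from pathlib import Path

# Illustrative patterns only; extend them for your languages and libraries
MD5_PATTERNS = [
    re.compile(r"hashlib\.md5\s*\("),                          # Python
    re.compile(r'MessageDigest\.getInstance\(\s*"MD5"\s*\)'),  # Java
    re.compile(r"\bmd5\b", re.IGNORECASE),                     # generic mentions
]
SOURCE_SUFFIXES = {".py", ".java", ".kt", ".go", ".js", ".ts", ".rb", ".php"}


def scan_text(path: str, text: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line text) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in MD5_PATTERNS):
            hits.append((path, lineno, line.strip()))
    return hits


def find_md5_usage(root: str) -> list[tuple[str, int, str]]:
    """Scan every source file under `root` and collect the matches."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(scan_text(str(path), text))
    return hits
```

Calling `find_md5_usage(".")` from a repository root gives a list to feed into Phase 2, where each hit is classified as security or non-security use.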

Production Insight

In Java, MessageDigest.getInstance("MD5") still works with no deprecation warning from the JVM. This lull is dangerous.

Static analysis tools (for example, Bandit for Python, SonarQube, or CodeQL) can flag MD5 usage. Treat those findings as high severity.

Rule: Proactively migrate before a security audit forces you to scramble.

Key Takeaway

Migrating from MD5 to SHA-256 is typically a one-line change in code.

The hard part is auditing all usages and ensuring no collisions are exploited during transition.

Do it now. Not after the breach.

Implementation of MD5: Why You'd Write It, and Why You Won't Ship It

You're not here because you want to use MD5 in production. You're here because understanding a broken hash teaches you how secure ones work. The MD5 algorithm is simple enough to trace by hand, which makes it the perfect autopsy subject. Every step — padding, append length, initialize registers, process 16-word blocks, produce digest — is the exact same skeleton SHA-256 uses. The difference? Rounds, constants, and output width. Implement MD5 once, and you'll never confuse 'cryptographic hash' with a checksum again. The code below runs a single message through the raw algorithm. No libraries. No shortcuts. Just the Merkle-Damgård construction in its most readable form. Watch the 128-bit digest come out as 32 hex characters. Then delete this code from your project. You're done learning. You're not done deploying.

HandCraftedMd5.javaJAVA

// io.thecodeforge — dsa tutorial

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HandCraftedMd5 {
    // Per-step left-rotation amounts: four rounds of 16 steps
    private static final int[] S = { 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
                                     5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
                                     4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
                                     6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 };
    // Additive constants: T[i] = floor(|sin(i + 1)| * 2^32)
    private static final int[] T = new int[64];
    static {
        for (int i = 0; i < 64; i++) {
            T[i] = (int)(long)(Math.abs(Math.sin(i + 1)) * 4294967296L);
        }
    }

    public static String hash(String message) {
        byte[] originalBytes = message.getBytes(StandardCharsets.UTF_8);
        long bitLength = (long) originalBytes.length * 8;

        // Padding: append 0x80 then zeros until length ≡ 448 mod 512 bits
        int paddedLength = ((originalBytes.length + 8) / 64 + 1) * 64;
        byte[] padded = new byte[paddedLength];
        System.arraycopy(originalBytes, 0, padded, 0, originalBytes.length);
        padded[originalBytes.length] = (byte) 0x80;

        // Append bit length as 64-bit little-endian
        for (int i = 0; i < 8; i++) {
            padded[padded.length - 8 + i] = (byte) (bitLength >>> (8 * i));
        }

        // Initialize registers
        int a0 = 0x67452301;
        int b0 = 0xefcdab89;
        int c0 = 0x98badcfe;
        int d0 = 0x10325476;

        // Process each 512-bit block
        for (int blockStart = 0; blockStart < padded.length; blockStart += 64) {
            // Read sixteen 32-bit little-endian words
            int[] M = new int[16];
            for (int i = 0; i < 16; i++) {
                M[i] = ((padded[blockStart + i*4] & 0xff)) |
                       ((padded[blockStart + i*4 + 1] & 0xff) << 8) |
                       ((padded[blockStart + i*4 + 2] & 0xff) << 16) |
                       ((padded[blockStart + i*4 + 3] & 0xff) << 24);
            }

            int A = a0, B = b0, C = c0, D = d0;
            for (int i = 0; i < 64; i++) {
                int F, g;
                if (i < 16) {
                    F = (B & C) | (~B & D);
                    g = i;
                } else if (i < 32) {
                    F = (D & B) | (~D & C);
                    g = (5 * i + 1) % 16;
                } else if (i < 48) {
                    F = B ^ C ^ D;
                    g = (3 * i + 5) % 16;
                } else {
                    F = C ^ (B | ~D);
                    g = (7 * i) % 16;
                }

                int temp = D;
                D = C;
                C = B;
                B = B + Integer.rotateLeft(A + F + T[i] + M[g], S[i]);
                A = temp;
            }
            a0 += A;
            b0 += B;
            c0 += C;
            d0 += D;
        }

        // Produce 32-char hex digest. MD5 outputs each register in little-endian
        // byte order, so reverse the bytes before formatting them as hex.
        return String.format("%08x%08x%08x%08x",
                Integer.reverseBytes(a0), Integer.reverseBytes(b0),
                Integer.reverseBytes(c0), Integer.reverseBytes(d0));
    }

    public static void main(String[] args) throws Exception {
        String input = "TheCodeForge";
        String mine = hash(input);

        // Cross-check against the JDK's built-in MD5 implementation
        byte[] ref = MessageDigest.getInstance("MD5").digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder refHex = new StringBuilder();
        for (byte b : ref) refHex.append(String.format("%02x", b));

        System.out.println(mine);
        System.out.println("matches MessageDigest: " + mine.equals(refHex.toString()));
    }
}

Output

<32 hex characters: the MD5 digest of "TheCodeForge">

matches MessageDigest: true

Never Use This in Production

This code exists to teach you the Merkle-Damgård structure. The moment you copy-paste it into a real system, you inherit every collision vulnerability from 2008 onward. Java's java.security.MessageDigest.getInstance("MD5") is equally broken. Stop rolling your own crypto. Use SHA-256 from the standard library.

Key Takeaway

MD5 implementation is a learning tool, not a deployable asset. Understand it once, then use SHA-256 everywhere.

Where MD5 Still Works: Integrity Checks, Not Security Guarantees

Stop treating MD5 like a password hasher or signature foundation. It's not. But don't throw it out entirely. MD5 survives in three specific, low-stakes niches: non-cryptographic checksums for file integrity, duplicate detection in blob storage, and toolchain identifiers (like ETags). These use cases share one trait: an attacker who crafts a collision gains nothing you care about. If MD5 says two files are the same and they're not, your backup dedup might silently skip storing one of them; in a store no hostile party can write to, that's a bug, not a breach. The rule: MD5 is fine when speed matters more than malice. Never put it between a user and sensitive data. Never sign a certificate with it. But using it to check whether a downloaded ISO got truncated? Go ahead. You may save some CPU cycles over SHA-256, and the risk matches the reward. The code below shows a production-friendly pattern: compute MD5 and SHA-256 in a single pass over the file, report MD5 for legacy consumers that still expect it, and treat only the SHA-256 digest as the security check.

DualHashChecksum.javaJAVA

// io.thecodeforge — dsa tutorial

import java.io.*;
import java.security.*;

public class DualHashChecksum {
    public static void main(String[] args) throws Exception {
        File targetFile = new File("downloaded_firmware.bin");
        if (!targetFile.exists()) {
            System.out.println("File not found. Create 'downloaded_firmware.bin' with some content.");
            return;
        }

        // MD5: fast first-pass integrity check (non-security)
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // SHA-256: cryptographic verification
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");

        try (FileInputStream fis = new FileInputStream(targetFile);
             DigestInputStream dis = new DigestInputStream(fis, md5)) {

            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = dis.read(buffer)) != -1) {
                // Feed the same bytes into SHA-256 as we read for MD5
                sha256.update(buffer, 0, bytesRead);
            }
        }

        byte[] md5Digest = md5.digest();
        byte[] sha256Digest = sha256.digest();

        // Convert to hex
        StringBuilder md5Hex = new StringBuilder();
        for (byte b : md5Digest) md5Hex.append(String.format("%02x", b));
        StringBuilder sha256Hex = new StringBuilder();
        for (byte b : sha256Digest) sha256Hex.append(String.format("%02x", b));

        System.out.println("MD5 (fast check):   " + md5Hex);
        System.out.println("SHA-256 (verified): " + sha256Hex);
    }
}

Output

MD5 (fast check):   <32 hex characters, depends on the file>

SHA-256 (verified): <64 hex characters, depends on the file>

Senior Shortcut: Dual-Hash for Legacy Systems

When migrating from MD5 to SHA-256, run both during a transition period. Compute MD5 for backward-compatible lookups and SHA-256 for new records. Once no client requests MD5, drop it. No downtime, no data loss.

Key Takeaway

MD5 is for speed in non-hostile environments. SHA-256 is for defense. Know the difference, use both when appropriate.

Alternatives to MD5 in Modern Cryptography

MD5 is broken for security-critical use. Replace it with hash functions designed for collision resistance and preimage resistance. SHA-256 (from the SHA-2 family) is the industry standard: 256-bit output, no known practical collisions, and FIPS-approved. SHA-3 (Keccak) uses a different, sponge-based construction that is immune to the length-extension attacks affecting Merkle-Damgård hashes such as MD5, SHA-1, and SHA-256. For password storage, never use MD5; use bcrypt, scrypt, or Argon2id, which incorporate salting and tunable work factors (memory-hard in the case of scrypt and Argon2id) to resist brute-force and ASIC attacks. BLAKE2 (especially BLAKE2b) is typically faster than SHA-256 in software on 64-bit CPUs, with no known practical attacks, which makes it suitable for high-performance integrity checks. The why is simple: MD5's 128-bit output and broken collision resistance leave it open to chosen-prefix attacks. Modern algorithms prioritize resistance to real-world threats: collision, preimage, length-extension, and side-channel leakage.
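Python's standard `hashlib` already exposes the integrity-oriented alternatives named above, so swapping out MD5 is usually a one-line change. The sketch below is illustrative: it computes SHA-256, SHA3-256, and a 32-byte BLAKE2b digest of the same data, and uses keyed BLAKE2b as a MAC for the case where a checksum must also prove who produced it (the key is a placeholder). Password storage is a separate problem and needs a slow, salted function instead.

```python
import hashlib
import hmac


def modern_digests(data: bytes) -> dict[str, str]:
    """Hex digests from hash functions with no known practical collisions."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha3_256": hashlib.sha3_256(data).hexdigest(),  # sponge construction
        "blake2b_256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }


def keyed_tag(key: bytes, data: bytes) -> str:
    """Keyed BLAKE2b acts as a MAC: the tag cannot be recomputed without the key."""
    return hashlib.blake2b(data, key=key, digest_size=32).hexdigest()


def tag_matches(key: bytes, data: bytes, tag: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(keyed_tag(key, data), tag)


if __name__ == "__main__":
    for name, digest in modern_digests(b"Hello, TheCodeForge!").items():
        print(f"{name:12} {digest}")
    key = b"placeholder secret key"  # hypothetical; load real keys from a secret store
    tag = keyed_tag(key, b"release contents")
    print("tag verifies:", tag_matches(key, b"release contents", tag))
```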

Sha256Example.javaJAVA

// io.thecodeforge — dsa tutorial

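import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Sha256Example {
    public static String hash(String input) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // Encode explicitly so the digest doesn't depend on the platform charset
        byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));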
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hash("hello"));
    }
}

Output

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Production Trap:

Don't roll your own crypto. Use java.security.MessageDigest for SHA-256; avoid MD5 for any new system.

Key Takeaway

Replace MD5 with SHA-256 for integrity, Argon2id for passwords.

Disadvantages of MD5

MD5's fatal disadvantage is broken collision resistance. In 2004, researchers published the first practical collisions, and within two years collision search took under a minute on a notebook PC. By 2008, researchers forged a rogue CA certificate using a chosen-prefix collision: a practical exploit, not a theoretical one. The 128-bit output is also short: even a generic birthday attack needs only about 2^64 hash computations, within reach of well-resourced attackers. Preimage resistance is degraded too (about 2^123.4 instead of the ideal 2^128), although that is still far beyond any attacker's reach. Like other Merkle-Damgård hashes, MD5 is vulnerable to length-extension attacks, one more reason not to build a MAC from it.

Once broken, backward compatibility becomes a liability. Certificates, digital signatures, software integrity checks: all can be spoofed. The cost to patch is often higher than the cost to migrate early. Standard audit frameworks (PCI DSS, NIST guidance) do not accept MD5 for security purposes. The why: cryptanalysis has made MD5 collisions cheap to produce in practice, even though the function still cannot practically be inverted.

Md5Collision.javaJAVA

// io.thecodeforge — dsa tutorial

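import java.security.MessageDigest;
import java.util.Arrays;

public class Md5Collision {
    // Parse a hex string (two characters per byte) into bytes
    static byte[] hexStringToByteArray(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Widely published 128-byte collision pair (Wang et al., 2004); differs in six bytes
        String hex1 = "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
                    + "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
                    + "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
                    + "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70";
        String hex2 = "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
                    + "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
                    + "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
                    + "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70";
        byte[] a = hexStringToByteArray(hex1), b = hexStringToByteArray(hex2);
        MessageDigest md = MessageDigest.getInstance("MD5");
        // digest() resets md; Arrays.equals compares contents (byte[].equals compares references)
        System.out.println("inputs differ: " + !Arrays.equals(a, b));
        System.out.println("same MD5:      " + Arrays.equals(md.digest(a), md.digest(b)));
    }
}

Output

inputs differ: true

same MD5:      true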

Production Trap:

MD5 collision pairs can be generated in seconds. Never use MD5 for signatures, certificates, or password storage.

Key Takeaway

MD5 is broken for security — avoid it in any cryptographic context.

Production incident · post-mortem · severity: high

Rogue HTTPS Certificate via MD5 Collision

Symptom

A forged certificate authority appeared in the X.509 chain of trust: its certificate carried a real CA's signature but an attacker-controlled public key. Browsers accepted it without warning.

Assumption

CAs still signing with MD5 in 2008 assumed it was collision-resistant enough for certificate signing, and that a collision attack against a live CA could not be mounted cost-effectively.

Root cause

MD5's collision resistance was broken. The researchers crafted two different certificates with the same MD5 hash: a benign one that the CA would issue in response to their request, and a malicious CA certificate. The CA signed the benign one, and because the signature covers only the MD5 hash, it applied equally to both.

Fix

VeriSign (which owned RapidSSL) and other CAs still using MD5 stopped issuing MD5-signed certificates. Browser vendors later stopped accepting MD5-signed certificates altogether.

Key lesson

Never use a hash function with broken collision resistance for digital signatures or certificates.
Collision attacks are not theoretical; once published, they tend to be weaponized within a few years.
Always prefer SHA-256 or SHA-3 for security-critical hashing.

Production debug guide · How to audit legacy systems for insecure MD5 usage and safely migrate to SHA-256

Symptom · 01

Password storage uses MD5


Fix

Replace with bcrypt, Argon2id, or PBKDF2. Use a migration strategy like rehashing on login.
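Rehashing on login works because the plaintext password is only available while the user is signing in. The sketch below assumes a hypothetical stored-hash format (`md5$<hex>` for legacy rows, `scrypt$<salt>$<hash>` for upgraded ones) and uses `hashlib.scrypt` from the standard library so it runs without extra packages; Argon2id or bcrypt through a maintained library are equally valid targets. The scrypt cost parameters are illustrative and should be tuned for your hardware.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (about 16 MiB of memory per hash)
N, R, P = 2**14, 8, 1


def scrypt_hash(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P,
                          maxmem=64 * 1024 * 1024, dklen=32)


def make_scrypt_record(password: str) -> str:
    salt = os.urandom(16)  # unique random salt per user
    return f"scrypt${salt.hex()}${scrypt_hash(password, salt).hex()}"


def login(password: str, stored: str) -> tuple[bool, str]:
    """Verify a password; if the stored hash is legacy MD5, upgrade it on success.

    Returns (ok, record_to_store).
    """
    scheme, _, rest = stored.partition("$")
    if scheme == "md5":
        ok = hmac.compare_digest(hashlib.md5(password.encode()).hexdigest(), rest)
        return (True, make_scrypt_record(password)) if ok else (False, stored)
    if scheme == "scrypt":
        salt_hex, _, hash_hex = rest.partition("$")
        candidate = scrypt_hash(password, bytes.fromhex(salt_hex)).hex()
        return hmac.compare_digest(candidate, hash_hex), stored
    raise ValueError(f"unknown hash scheme: {scheme!r}")
```

Accounts that never log in again keep their MD5 hashes, so pair this with a deadline after which remaining legacy hashes are invalidated and those users must reset their passwords.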

Symptom · 02

Digital signatures use MD5


Fix

Switch to SHA-256 with RSA or ECDSA. Re-sign all existing documents after verifying origin.

Symptom · 03

File integrity checksums use MD5 (adversarial environment)


Fix

If attackers can modify files, replace with SHA-256 or SHA-512. For non-adversarial checksums, MD5 is still OK but document the risk.

Symptom · 04

Certificate or CRL fingerprint uses MD5


Fix

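Revoke and reissue certificates signed with SHA-256, and use SHA-256 fingerprints wherever certificates are pinned or identified. The CA/Browser Forum Baseline Requirements do not permit MD5 signatures on publicly trusted TLS certificates.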

MD5 Collision Detection & Migration Quick Reference

Use when you suspect MD5 is being used in a security context or need to verify whether a given hash is used in a collision-sensitive way.

Two different files produce the same MD5 hash

Immediate action

Assume they are adversarial collisions. Investigate source. Do not trust either file.

Commands

md5sum file1 file2

sha256sum file1 file2

Fix now

Replace MD5 with SHA-256 in your validation pipeline immediately.
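The two commands above can be folded into one triage check. Here is a minimal sketch, assuming the files are local paths. If the MD5 digests match but the SHA-256 digests differ, the pair is almost certainly a deliberate MD5 collision. The `triage` name and its return labels are hypothetical.

```python
import hashlib


def _digests(path: str) -> tuple[str, str]:
    """Compute MD5 and SHA-256 in a single pass over the file."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def triage(path_a: str, path_b: str) -> str:
    md5_a, sha_a = _digests(path_a)
    md5_b, sha_b = _digests(path_b)
    if md5_a != md5_b:
        return "different"
    if sha_a == sha_b:
        return "identical"
    # Same MD5, different SHA-256: treat both files as hostile.
    return "md5-collision"
```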

Legacy system uses MD5 for password hashing

API expects MD5 checksum for verification

MD5 vs SHA-256 Comparison

| Property | MD5 | SHA-256 |
|---|---|---|
| Output size | 128 bits | 256 bits |
| Collision security (birthday bound) | 2^64 (broken: actual cost < 2^30) | 2^128 (secure) |
| Preimage security | 2^123 (weakened but not broken) | 2^256 |
| Relative speed (modern CPU) | ~18ms per 10MB | ~24ms per 10MB |
| Hardware acceleration | None | SHA extensions on x86-64, ARMv8 |
| Adoption in TLS 1.3 | Not allowed | Required |
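The speed row varies a lot with CPU and implementation. On CPUs with SHA extensions, a hardware-accelerated SHA-256 can match or beat MD5. Below is a rough way to check on your own machine, sketched with Python's `hashlib`. `throughput` is a hypothetical helper, and timing a single in-memory buffer best-of-N is a simplification, not a rigorous benchmark.

```python
import hashlib
import time


def throughput(name: str, data: bytes, rounds: int = 5) -> float:
    """Best-of-`rounds` throughput in MB/s for a hashlib algorithm."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


data = b"\x00" * (10 * 1024 * 1024)  # 10 MiB buffer, echoing the table's workload
for name in ("md5", "sha256"):
    digest_bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {digest_bits}-bit digest, {throughput(name, data):.0f} MB/s")
```

If SHA-256 comes out ahead, your OpenSSL build is likely using the CPU's SHA extensions.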

Key takeaways

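MD5 produces a 128-bit digest. It is typically faster than SHA-256 in software, but cryptographically broken.

Collision attacks: two different inputs with the same MD5 hash can be found in seconds today.

NEVER use MD5 for passwords, digital signatures, or certificates.

MD5 is acceptable for non-security checksums, deduplication, and hash table keys, provided the inputs are not attacker-controlled.

Migration path: for checksums and signatures, replace MD5 with SHA-256. Most libraries use the same API (just change the function name), but the hex digest grows from 32 to 64 characters, so stored values and column sizes must change too. For passwords, use bcrypt, Argon2id, or PBKDF2 instead of a plain hash.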
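During a migration, stored checksums are usually a mix of old and new values. Here is a minimal sketch of a dual-read check that tells them apart by hex length (32 characters for MD5, 64 for SHA-256). `verify_checksum` is a hypothetical helper. The MD5 branch is only appropriate for non-adversarial data, and only until those values are re-stored as SHA-256.

```python
import hashlib


def verify_checksum(data: bytes, stored_hex: str) -> bool:
    """Verify data against a stored MD5 (legacy) or SHA-256 hex digest."""
    stored_hex = stored_hex.lower()
    if len(stored_hex) == 64:
        return hashlib.sha256(data).hexdigest() == stored_hex
    if len(stored_hex) == 32:
        # Legacy MD5 value: accept only where inputs are not attacker-controlled,
        # and re-store the SHA-256 digest once verified.
        return hashlib.md5(data).hexdigest() == stored_hex
    raise ValueError(f"unexpected digest length: {len(stored_hex)}")


print(len(hashlib.md5(b"abc").hexdigest()), len(hashlib.sha256(b"abc").hexdigest()))  # 32 64
```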

Common mistakes to avoid


Using MD5 for password hashing

Symptom

User database exposed; attacker recovers plaintext passwords (MD5 is fast to brute-force).

Fix

Switch to bcrypt, Argon2id, or PBKDF2. Rehash passwords on next login.
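The symptom above follows directly from two facts. An unsalted MD5 hash is a deterministic lookup key, and MD5 is fast, so one pass over a wordlist tests every account at once. Here is a minimal illustration with a hypothetical leaked table and a four-word stand-in for a real wordlist. Real attacks do the same thing on GPUs at far higher guess rates.

```python
import hashlib

# Hypothetical leaked table of unsalted MD5 password hashes.
leaked = {
    "alice": hashlib.md5(b"password123").hexdigest(),
    "bob": hashlib.md5(b"letmein").hexdigest(),
    "carol": hashlib.md5(b"correct horse battery staple").hexdigest(),
}
wordlist = ["123456", "letmein", "password123", "qwerty"]  # tiny stand-in wordlist


def crack(leaked: dict, wordlist: list) -> dict:
    # No salt: hash each guess once, then compare it against every account.
    lookup = {hashlib.md5(word.encode()).hexdigest(): word for word in wordlist}
    return {user: lookup[h] for user, h in leaked.items() if h in lookup}


print(crack(leaked, wordlist))  # {'alice': 'password123', 'bob': 'letmein'}
```

A per-user salt and a slow KDF such as PBKDF2, bcrypt, or Argon2id change the economics. Each guess must then be recomputed for every account, and each computation is deliberately expensive.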

Using MD5 for digital signatures

Symptom

Signed document can be swapped with a different document having the same MD5 hash, and signature still validates.

Fix

Use SHA-256 or SHA-512 with the signing algorithm. Re-sign all documents.

Assuming MD5 is secure because it's 'cryptographic'

Symptom

Security audits fail. Compliance violations.

Fix

Educate team: MD5 is only suitable for non-adversarial checksums. Block MD5 in code review policies.
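One way to make the code-review policy enforceable is a CI check that flags MD5 mentions for a human to justify. Here is a minimal sketch, assuming a plain word match is good enough as a first pass. It will flag comments and miss aliased imports. `find_md5_usage` and `main` are hypothetical names.

```python
import re

# Matches "md5" as a standalone word: hashlib.md5(...), getInstance("MD5"), etc.
# Does not match "md5sum" or "Md5Hasher" (assumption: acceptable for a first pass).
MD5_WORD = re.compile(r"\bmd5\b", re.IGNORECASE)


def find_md5_usage(text: str) -> list[int]:
    """Return 1-based line numbers that mention MD5 as a standalone word."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if MD5_WORD.search(line)]


def main(paths: list[str]) -> int:
    """Print each hit and return a non-zero exit code if any file mentions MD5."""
    hits = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for n in find_md5_usage(f.read()):
                print(f"{path}:{n}: MD5 usage; justify as non-security or switch to SHA-256")
                hits += 1
    return 1 if hits else 0
```

Wire it into CI with `sys.exit(main(sys.argv[1:]))`. A non-zero exit then fails the build until someone justifies or removes the MD5 use.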

Using MD5 for certificate signatures or fingerprints after 2008

Symptom

Browsers reject or warn about the certificate. Revocation required.

Fix

Revoke and reissue certificates signed with SHA-256, and use SHA-256 fingerprints wherever certificates are pinned or identified.


Practice These on LeetCode


#387 · Easy

First Unique Character in a String

Uses hash-based counting to track character frequencies, exercising the core hashing concept of mapping keys to values for O(1) lookups.


#136 · Easy

Single Number

Demonstrates how hash-based algorithms can be used to detect duplicates and unique elements through key-value mapping.


#242 · Easy

Valid Anagram

Practices hash-based frequency counting to compare character distributions between two strings using key-value mappings.


#49 · Medium

Group Anagrams

Uses hashing to create unique signatures for anagrams, demonstrating how hash functions can group equivalent items efficiently.


#560 · Medium

Subarray Sum Equals K

Applies the hash-based prefix-sum technique to achieve O(n) time, illustrating how a hash map of earlier prefix sums replaces a nested search with constant-time lookups.


#3 · Medium

Longest Substring Without Repeating Characters

Employs hash-based sliding window to track character positions, demonstrating how hashing maintains state for O(1) access patterns.


#1044 · Hard

Longest Duplicate Substring

Uses rolling hash (Rabin-Karp) technique to detect duplicate substrings, directly practicing hash function design and collision handling.



Interview Questions on This Topic

Q01 · SENIOR

Why is MD5 considered broken? What specific property failed?

Q02 · SENIOR

Where is it still acceptable to use MD5?

Q03 · SENIOR

What is the difference between a pre-image attack and a collision attack...

Q04 · SENIOR

What should you use instead of MD5 for password hashing?

Q01 of 04 · SENIOR

Why is MD5 considered broken? What specific property failed?

ANSWER

MD5 is broken because its collision resistance was defeated. Collision resistance means it should be infeasible to find two different inputs with the same hash. In 2004, Xiaoyun Wang and colleagues demonstrated practical collisions using differential cryptanalysis. Once collisions are feasible, digital signatures can be forged (get one document signed, then swap in a colliding one). Certificates can be forged too, as the 2008 rogue CA certificate showed. Preimage resistance is also weakened (the best known attack costs about 2^123) but not practically broken.
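For scale, consider the generic birthday bound that collision resistance is measured against. It can be worked out with the standard approximation for an ideal n-bit hash after k random inputs:

$$
P_{\text{coll}}(k) \approx 1 - e^{-k^2 / 2^{n+1}}, \qquad
P_{\text{coll}}(k) = \tfrac{1}{2} \;\Rightarrow\; k \approx \sqrt{2\ln 2}\; 2^{n/2} \approx 1.18 \cdot 2^{n/2}.
$$

For MD5, n = 128 gives roughly 2^64 hash evaluations for a generic attack. The differential attacks cited in the comparison table need fewer than 2^30, which is why MD5's collision resistance counts as broken.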


Frequently Asked Questions

If MD5 is broken, why is it still everywhere?

Can MD5 collisions be found quickly today?

Is MD5 safe for verifying file integrity in a CI/CD pipeline?
