DSA Beginner

Run-Length Encoding — Lossless Compression

📅 March 24, 2026 ⏱ 8 min read 🎯 Beginner

Where developers are forged. · Structured learning · Free forever.

📍 Part of: Data Compression → Topic 1 of 4

Learn Run-Length Encoding (RLE) compression.

🧑‍💻 Beginner-friendly — no prior DSA experience needed

In this tutorial, you'll learn:

RLE replaces consecutive identical symbols with (count, symbol) — O(n) encode and decode.
Compression ratio depends entirely on run length — 720 bytes of fax → 6 bytes (120x).
Fails on alternating or random data — can expand the input.

✦ Plain-English analogy ✦ Real code with output ✦ Interview questions

⚡ Quick Answer

Run-Length Encoding is the simplest compression idea: instead of storing AAABBBBBCCCC, store 3A5B4C. Consecutive runs of the same character are replaced by a count and the character. It works brilliantly on data with long runs (fax images, simple graphics) and fails badly on random data (English text) where it can make things larger.

Run-Length Encoding is the first compression algorithm most people encounter, and it teaches the fundamental tension in all compression: the algorithm's effectiveness depends entirely on the structure of the data. For a fax transmission of a printed page — mostly white pixels with sparse black text — RLE achieves 50:1 compression. For a natural photograph, RLE can make the file larger.

This data-dependency is not a bug; it is the nature of compression. Shannon's source coding theorem says you can only compress data if it contains redundancy. RLE exploits one specific type of redundancy: runs of identical symbols. Understanding when your data has this structure — and when it doesn't — is the practical skill.

Basic RLE Implementation

rle.py · PYTHON

123456789101112131415161718192021222324252627282930313233343536

def rle_encode(data: str) -> list[tuple]:
    """Returns list of (count, char) pairs."""
    if not data:
        return []
    encoded = []
    count = 1
    for i in range(1, len(data)):
        if data[i] == data[i-1]:
            count += 1
        else:
            encoded.append((count, data[i-1]))
            count = 1
    encoded.append((count, data[-1]))
    return encoded

def rle_decode(encoded: list[tuple]) -> str:
    return ''.join(ch * count for count, ch in encoded)

def rle_ratio(data: str) -> float:
    encoded = rle_encode(data)
    # Each pair needs 1 byte count + 1 byte char = 2 bytes
    compressed_size = len(encoded) * 2
    return len(data) / compressed_size

# Good case: long runs
fax_line = 'W' * 400 + 'B' * 20 + 'W' * 300  # typical fax line
print(f'Fax: {len(fax_line)} → {len(rle_encode(fax_line))*2} bytes ({rle_ratio(fax_line):.1f}x)')

# Bad case: alternating
alternating = 'ABABABABABABABAB'
print(f'Alternating: {len(alternating)} → {len(rle_encode(alternating))*2} bytes ({rle_ratio(alternating):.2f}x)')

# Typical bitmap
bitmap = 'W'*50+'B'*3+'W'*47
enc = rle_encode(bitmap)
print(f'Encoded: {enc}')

▶ Output

Fax: 720 → 6 bytes (120.0x)
Alternating: 16 → 32 bytes (0.50x)
Encoded: [(50, 'W'), (3, 'B'), (47, 'W')]

Binary RLE and Escape Codes

For binary data, a compact format uses escape codes to handle runs of 1 in the literal representation.

binary_rle.py · PYTHON

123456789101112131415161718

def rle_encode_bytes(data: bytes) -> bytes:
    """Compact binary RLE: [count][byte] pairs, count in 1 byte (max 255)."""
    if not data:
        return b''
    result = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        count = 1
        while i + count < len(data) and data[i+count] == byte and count < 255:
            count += 1
        result.extend([count, byte])
        i += count
    return bytes(result)

raw = bytes([255]*100 + [0]*200 + [128]*50)
compressed = rle_encode_bytes(raw)
print(f'Raw: {len(raw)} bytes → Compressed: {len(compressed)} bytes ({len(raw)/len(compressed):.1f}x)')

▶ Output

Raw: 350 bytes → Compressed: 6 bytes (58.3x)

When RLE Helps and Hurts

Helps dramatically: - Fax transmission (CCITT T4 standard uses modified Huffman + RLE) - BMP images with large uniform regions - Simple pixel art / game sprites - PCX image format - Icon files

Hurts (expands data): - Natural language text (alternating characters) - Already-compressed data - Random/encrypted data - Photos (JPEG does not use RLE alone — uses DCT + Huffman)

Used as a building block: - Burrows-Wheeler Transform pipeline: BWT → MTF → RLE → Huffman (bzip2) - DEFLATE (gzip/zip): LZ77 + Huffman (implicitly handles runs)

🎯 Key Takeaways

RLE replaces consecutive identical symbols with (count, symbol) — O(n) encode and decode.
Compression ratio depends entirely on run length — 720 bytes of fax → 6 bytes (120x).
Fails on alternating or random data — can expand the input.
Used in fax (T4 standard), BMP, PCX, TIFF, and as a building block in bzip2.
Simplest possible compressor — good teaching introduction to the compression/entropy tradeoff.

Interview Questions on This Topic

QWhat type of data does RLE compress well, and what data does it expand?
QImplement RLE encode and decode for a string.
QHow is RLE used in the bzip2 compression pipeline?
QWhat is the worst-case expansion ratio for RLE?

Frequently Asked Questions

Does JPEG use RLE?

JPEG uses RLE for zero-coefficient runs in its zig-zag scan step (encoding runs of zeros after quantisation), combined with Huffman coding. It is not pure RLE — the DCT and quantisation stages do the heavy lifting.

🔥

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

About Naren Get in touch

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged