Run-Length Encoding β Lossless Compression
- RLE replaces consecutive identical symbols with (count, symbol) β O(n) encode and decode.
- Compression ratio depends entirely on run length β 720 bytes of fax β 6 bytes (120x).
- Fails on alternating or random data β can expand the input.
Run-Length Encoding is the first compression algorithm most people encounter, and it teaches the fundamental tension in all compression: the algorithm's effectiveness depends entirely on the structure of the data. For a fax transmission of a printed page β mostly white pixels with sparse black text β RLE achieves 50:1 compression. For a natural photograph, RLE can make the file larger.
This data-dependency is not a bug; it is the nature of compression. Shannon's source coding theorem says you can only compress data if it contains redundancy. RLE exploits one specific type of redundancy: runs of identical symbols. Understanding when your data has this structure β and when it doesn't β is the practical skill.
Basic RLE Implementation
def rle_encode(data: str) -> list[tuple]: """Returns list of (count, char) pairs.""" if not data: return [] encoded = [] count = 1 for i in range(1, len(data)): if data[i] == data[i-1]: count += 1 else: encoded.append((count, data[i-1])) count = 1 encoded.append((count, data[-1])) return encoded def rle_decode(encoded: list[tuple]) -> str: return ''.join(ch * count for count, ch in encoded) def rle_ratio(data: str) -> float: encoded = rle_encode(data) # Each pair needs 1 byte count + 1 byte char = 2 bytes compressed_size = len(encoded) * 2 return len(data) / compressed_size # Good case: long runs fax_line = 'W' * 400 + 'B' * 20 + 'W' * 300 # typical fax line print(f'Fax: {len(fax_line)} β {len(rle_encode(fax_line))*2} bytes ({rle_ratio(fax_line):.1f}x)') # Bad case: alternating alternating = 'ABABABABABABABAB' print(f'Alternating: {len(alternating)} β {len(rle_encode(alternating))*2} bytes ({rle_ratio(alternating):.2f}x)') # Typical bitmap bitmap = 'W'*50+'B'*3+'W'*47 enc = rle_encode(bitmap) print(f'Encoded: {enc}')
Alternating: 16 β 32 bytes (0.50x)
Encoded: [(50, 'W'), (3, 'B'), (47, 'W')]
Binary RLE and Escape Codes
For binary data, a compact format uses escape codes to handle runs of 1 in the literal representation.
def rle_encode_bytes(data: bytes) -> bytes: """Compact binary RLE: [count][byte] pairs, count in 1 byte (max 255).""" if not data: return b'' result = bytearray() i = 0 while i < len(data): byte = data[i] count = 1 while i + count < len(data) and data[i+count] == byte and count < 255: count += 1 result.extend([count, byte]) i += count return bytes(result) raw = bytes([255]*100 + [0]*200 + [128]*50) compressed = rle_encode_bytes(raw) print(f'Raw: {len(raw)} bytes β Compressed: {len(compressed)} bytes ({len(raw)/len(compressed):.1f}x)')
When RLE Helps and Hurts
Helps dramatically: - Fax transmission (CCITT T4 standard uses modified Huffman + RLE) - BMP images with large uniform regions - Simple pixel art / game sprites - PCX image format - Icon files
Hurts (expands data): - Natural language text (alternating characters) - Already-compressed data - Random/encrypted data - Photos (JPEG does not use RLE alone β uses DCT + Huffman)
Used as a building block: - Burrows-Wheeler Transform pipeline: BWT β MTF β RLE β Huffman (bzip2) - DEFLATE (gzip/zip): LZ77 + Huffman (implicitly handles runs)
π― Key Takeaways
- RLE replaces consecutive identical symbols with (count, symbol) β O(n) encode and decode.
- Compression ratio depends entirely on run length β 720 bytes of fax β 6 bytes (120x).
- Fails on alternating or random data β can expand the input.
- Used in fax (T4 standard), BMP, PCX, TIFF, and as a building block in bzip2.
- Simplest possible compressor β good teaching introduction to the compression/entropy tradeoff.
Interview Questions on This Topic
- QWhat type of data does RLE compress well, and what data does it expand?
- QImplement RLE encode and decode for a string.
- QHow is RLE used in the bzip2 compression pipeline?
- QWhat is the worst-case expansion ratio for RLE?
Frequently Asked Questions
Does JPEG use RLE?
JPEG uses RLE for zero-coefficient runs in its zig-zag scan step (encoding runs of zeros after quantisation), combined with Huffman coding. It is not pure RLE β the DCT and quantisation stages do the heavy lifting.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.