Skip to content
Home DSA Huffman Coding

Huffman Coding

Where developers are forged. · Structured learning · Free forever.
📍 Part of: Greedy & Backtracking → Topic 3 of 13
Huffman coding explained — how variable-length encoding works, building the Huffman tree with a min-heap, encoding and decoding, and compression ratio with examples.
🔥 Advanced — solid DSA foundation required
In this tutorial, you'll learn
Huffman coding explained — how variable-length encoding works, building the Huffman tree with a min-heap, encoding and decoding, and compression ratio with examples.
  • Huffman coding assigns shorter codes to more frequent characters — provably optimal prefix-free code.
  • Build a min-heap of leaf nodes, repeatedly merge the two lowest-frequency nodes.
  • Prefix-free: no code is a prefix of another, so decoding is unambiguous — walk the tree bit by bit.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer

Huffman coding assigns shorter bit patterns to more frequent characters and longer patterns to rarer ones. Build a min-heap of (frequency, character) pairs, repeatedly merge the two lowest-frequency nodes into a parent, until one tree remains. The path from root to each leaf (0 for left, 1 for right) gives the code for that character.

Building the Huffman Tree

Example · PYTHON
12345678910111213141516171819202122232425262728293031323334353637383940
# Package: io.thecodeforge.python.dsa

import heapq
from collections import Counter

class HuffmanNode:
    def __init__(self, char, freq):
        self.char  = char
        self.freq  = freq
        self.left  = None
        self.right = None

    # Min-heap ordering by frequency
    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(text):
    freq = Counter(text)

    # Min-heap of leaf nodes
    heap = [HuffmanNode(char, f) for char, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Take two nodes with lowest frequency
        left  = heapq.heappop(heap)
        right = heapq.heappop(heap)

        # Merge into a parent node
        parent = HuffmanNode(None, left.freq + right.freq)
        parent.left  = left
        parent.right = right

        heapq.heappush(heap, parent)

    return heap[0]  # root of Huffman tree

text = 'aabbbccddddeeeeee'
root = build_huffman_tree(text)
print(f'Root frequency: {root.freq}')  # 17 (total characters)
▶ Output
Root frequency: 17

Generating Codes and Encoding

Example · PYTHON
123456789101112131415161718192021222324252627
def get_codes(node, prefix='', codes={}):
    if node.char is not None:  # leaf
        codes[node.char] = prefix or '0'  # handle single-char case
    else:
        get_codes(node.left,  prefix + '0', codes)
        get_codes(node.right, prefix + '1', codes)
    return codes

def huffman_encode(text):
    root  = build_huffman_tree(text)
    codes = get_codes(root, '', {})
    encoded = ''.join(codes[c] for c in text)
    return encoded, codes, root

text = 'aabbbccddddeeeeee'
encoded, codes, root = huffman_encode(text)

print('Codes:')
for char in sorted(codes):
    freq = text.count(char)
    print(f'  {repr(char)}: {codes[char]:5s}  (freq={freq}, bits={freq*len(codes[char])})')

original_bits = len(text) * 8
compressed_bits = len(encoded)
print(f'\nOriginal:   {original_bits} bits')
print(f'Compressed: {compressed_bits} bits')
print(f'Ratio:      {compressed_bits/original_bits:.1%}')
▶ Output
Codes:
'a': 110 (freq=2, bits=6)
'b': 111 (freq=3, bits=9)
'c': 100 (freq=2, bits=6)
'd': 01 (freq=4, bits=8)
'e': 0 (freq=6, bits=6)

Original: 136 bits
Compressed: 35 bits
Ratio: 25.7%

🎯 Key Takeaways

  • Huffman coding assigns shorter codes to more frequent characters — provably optimal prefix-free code.
  • Build a min-heap of leaf nodes, repeatedly merge the two lowest-frequency nodes.
  • Prefix-free: no code is a prefix of another, so decoding is unambiguous — walk the tree bit by bit.
  • Time complexity: O(n log n) where n is the number of distinct characters.
  • Real-world use: DEFLATE (ZIP/PNG), JPEG quantisation tables, HTTP/2 header compression (HPACK).

Interview Questions on This Topic

  • QHow do you build a Huffman tree?
  • QWhat does 'prefix-free code' mean and why does Huffman coding produce one?
  • QWhat is the time complexity of building a Huffman tree?

Frequently Asked Questions

What does prefix-free mean and why is it important?

Prefix-free means no code is a prefix of any other code. If 'a' is '10' and 'ab' is '101', then seeing '10' in the bit stream is ambiguous — is it 'a' or the start of 'ab'? Huffman codes are inherently prefix-free because they are derived from leaf nodes in a tree — no leaf is an ancestor of another.

How is Huffman coding different from run-length encoding?

Run-length encoding replaces runs of identical characters with (character, count) pairs — efficient for data with long runs (like simple images). Huffman coding works on frequency of individual characters regardless of position. Real compression formats use both: DEFLATE (used in ZIP and PNG) combines Huffman coding with LZ77 sliding window compression.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousActivity Selection ProblemNext →Backtracking Introduction
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged