DSA Intermediate

Boyer-Moore String Search Algorithm — Bad Character and Good Suffix

📅 March 24, 2026 ⏱ 8 min read 🎯 Intermediate

Where developers are forged. · Structured learning · Free forever.

📍 Part of: Strings → Topic 4 of 8

Learn the Boyer-Moore string search algorithm.

⚙️ Intermediate — basic DSA knowledge assumed

In this tutorial, you'll learn:

Boyer-Moore compares right-to-left within the pattern — this is the key difference from KMP/Z. Right-to-left enables skipping forward in the text on mismatch.
Bad character rule: align rightmost occurrence of mismatched text char with current pattern position. If char not in pattern, skip entire pattern length past current position.
Boyer-Moore-Horspool (bad char only) is what most production implementations use — simpler than full Boyer-Moore, nearly as fast for typical text.

✦ Plain-English analogy ✦ Real code with output ✦ Interview questions

⚡ Quick Answer

Most string search algorithms compare left to right. Boyer-Moore does something counterintuitive — it compares the pattern right to left. When it finds a mismatch, it uses two powerful rules to skip ahead by many characters at once, sometimes jumping the entire pattern length. This is why Boyer-Moore gets faster as the pattern gets longer — the opposite of what you'd expect.

Open a terminal and run: grep -c 'the' /usr/share/dict/words. Grep finds all matches faster than you can think about it. The reason: Boyer-Moore (specifically Boyer-Moore-Horspool). The longer your search pattern, the faster it runs — counterintuitively, Boyer-Moore gets faster as patterns get longer because it skips more characters per comparison.

This sub-linear behaviour is the key insight. KMP guarantees O(n+m) worst case. Boyer-Moore achieves O(n/m) in practice on natural language text — for a 10-character pattern, it examines roughly 1 in 10 characters in the text. This is why Boyer-Moore is the default in text editors, grep, and most real-world text search implementations despite KMP being theoretically simpler.

Right-to-Left Matching

Boyer-Moore aligns the pattern at the left of the text and compares right-to-left — from the last character of the pattern backward. When it finds a mismatch, it applies whichever of the two heuristics gives the larger skip.

This right-to-left comparison is key: when we find a mismatch at pattern position j (text position i+j), we know nothing about text[i+j+1..i+m-1] — but we know the characters to the right are already matched. This asymmetry is what makes skipping possible.

The Bad Character Heuristic

When a mismatch occurs at text position i+j (pattern position j), look at the mismatched text character c = text[i+j]. The bad character rule says: shift the pattern right so that the rightmost occurrence of c in the pattern aligns with position i+j.

If c doesn't appear in the pattern at all, shift the entire pattern past position i+j — a shift of j+1. If c appears in the pattern to the left of j, shift to align the rightmost such occurrence. If c only appears to the right of j, the bad character rule gives a negative shift — ignore it (use 0 or the good suffix rule).

bad_char_table.py · PYTHON

12345678910111213

def bad_char_table(pattern: str) -> dict[str, int]:
    """
    Returns last occurrence index of each character in pattern.
    If char not in pattern, returns -1 (handled at call site).
    """
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = i
    return table

pattern = 'ABCBAB'
table = bad_char_table(pattern)
print(table)  # {'A': 4, 'B': 5, 'C': 2}

▶ Output

{'A': 4, 'B': 5, 'C': 2}

Boyer-Moore with Bad Character Only (Simplified)

A simplified version using only the bad character heuristic is called Boyer-Moore-Horspool and is often used in practice for its simplicity.

boyer_moore_simple.py · PYTHON

123456789101112131415161718192021222324252627282930

def boyer_moore_horspool(text: str, pattern: str) -> list[int]:
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Build bad character table
    bad_char = {ch: i for i, ch in enumerate(pattern)}
    matches = []
    s = 0  # shift of the pattern w.r.t. text

    while s <= n - m:
        j = m - 1  # start from rightmost character

        # Move left while characters match
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1

        if j < 0:
            matches.append(s)  # full match found
            # Shift by 1 (or use good suffix for full Boyer-Moore)
            s += 1
        else:
            # Bad character shift
            bc = bad_char.get(text[s + j], -1)
            s += max(1, j - bc)

    return matches

print(boyer_moore_horspool('ABAAABCD', 'ABC'))      # [4]
print(boyer_moore_horspool('AABCAABXAAB', 'AABX'))  # [4]

▶ Output

[4]
[4]

The Good Suffix Heuristic

When a mismatch occurs after matching t characters (the 'good suffix'), the good suffix rule shifts the pattern to align the next occurrence of that suffix. This is more complex to implement but provides better worst-case guarantees.

The full Boyer-Moore with both heuristics achieves: - Best case: O(n/m) — each character in text examined at most once every m steps - Worst case: O(nm) for the simple version, O(n) with the full good suffix rule - Typical English text: Much faster than KMP in practice

For most interview purposes, the bad-character-only version (Horspool) is sufficient to explain and implement.

🔥

Boyer-Moore in the Real WorldGNU grep uses a Boyer-Moore variant as its default algorithm. When you run 'grep pattern file', the longer your pattern, the faster grep runs — a direct consequence of Boyer-Moore's sub-linear behaviour.

Worked Example

Text: AAABBBCCCABC Pattern: ABC

Step 1: Align at position 0. Compare right-to-left: text[2]='A' vs pattern[2]='C' → mismatch. Bad char 'A' is at index 0 in pattern. Shift = max(1, 2-0) = 2. Move to s=2.

Step 2: s=2. Compare right-to-left: text[4]='B' vs pattern[2]='C' → mismatch. Bad char 'B' at index 1. Shift = max(1, 2-1) = 1. Move to s=3.

Step 3: s=3. Compare: text[5]='B' vs 'C' → mismatch. Bad char 'B' at 1. Shift=1. s=4.

... continues until s=9 where ABC matches. Total: only examined ~8 positions instead of 30 naive comparisons.

Property	Boyer-Moore-Horspool	Full Boyer-Moore	KMP
Preprocessing	O(m + σ)	O(m + σ)	O(m)
Search worst case	O(nm)	O(n)	O(n)
Search best case	O(n/m)	O(n/m)	O(n)
Implementation	Simple	Complex	Moderate
Best for	Long patterns	General purpose	Repetitive patterns

🎯 Key Takeaways

Boyer-Moore compares right-to-left within the pattern — this is the key difference from KMP/Z. Right-to-left enables skipping forward in the text on mismatch.
Bad character rule: align rightmost occurrence of mismatched text char with current pattern position. If char not in pattern, skip entire pattern length past current position.
Boyer-Moore-Horspool (bad char only) is what most production implementations use — simpler than full Boyer-Moore, nearly as fast for typical text.
O(n/m) in practice: longer patterns = faster search. A 20-character pattern in English text typically examines fewer than n/10 characters.
Choose KMP for: DNA sequences, binary data, repetitive patterns, guaranteed worst case. Choose Boyer-Moore for: English text, long patterns, when you can profile workload.

Interview Questions on This Topic

QWhy does Boyer-Moore compare right-to-left instead of left-to-right?
QExplain the bad character heuristic with an example.
QWhy is Boyer-Moore faster for longer patterns?
QWhat is the difference between Boyer-Moore and Boyer-Moore-Horspool?

Frequently Asked Questions

Which is faster in practice — KMP or Boyer-Moore?

Boyer-Moore is significantly faster in practice for long patterns in natural language text. KMP is preferred when worst-case guarantees are needed or patterns are short with lots of repetition.

Does Python's str.find() use Boyer-Moore?

CPython uses a two-way string matching algorithm combined with Boyer-Moore-Horspool heuristics for longer patterns. The exact implementation varies by Python version.

🔥

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

About Naren Get in touch

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged