
KMP String Matching — LPS Off-by-One Bug in Production

📍 Part of: Strings → Topic 2 of 8
A pattern of length 15 matched 97% of logs, but an LPS off-by-one sporadically missed overlapping matches.
⚙️ Intermediate — basic DSA knowledge assumed
In this tutorial, you'll learn
  • KMP avoids redundant comparisons by precomputing an LPS (failure function) array for the pattern.
  • LPS[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix.
  • On mismatch at pattern[j], set j = lps[j-1] instead of restarting from 0.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • KMP precomputes an LPS array to avoid re-examining matched characters
  • LPS[i] = length of longest proper prefix of pattern[0..i] that is also a suffix
  • On mismatch, fallback to lps[j-1] instead of restarting from index 0
  • Text pointer never moves backward — guaranteed O(n+m) runtime
  • Build LPS in O(m) using a two-pointer backtracking loop
  • Biggest mistake: using lps[j] instead of lps[j-1] — causes off-by-one silent failures
🚨 START HERE

KMP Quick Debug Cheat Sheet

If your KMP search just isn't working, run through these five checks in order.
🟡

No matches found (but pattern exists)

Immediate Action: Print the LPS array and verify it matches a known correct array for your pattern.
Commands
print(build_lps('ABCAB')) # Expected: [0,0,0,1,2]
print(kmp_search('ABABCABCABAB', 'ABCAB')) # Expected: [2,5]
Fix Now: If the LPS array is wrong, fix the build loop. If the LPS array is correct but the search still fails, check the fallback index (it must be lps[j-1]).
🟠

Infinite loop or very slow on small text

Immediate Action: Pause the debugger at the start of the search loop and check i and j after each iteration.
Commands
Add breakpoint at 'while i < n:' and step through with text='AAAB', pattern='AAB'
Verify that i increments at least once per cycle — if not, you're missing the else branch.
Fix Now: Add the missing 'else: i += 1' branch for the case where j == 0 and a mismatch occurs.
🟡

Matches found but at wrong positions

Immediate Action: Check the match index calculation: the match starts at i - j at the moment j == m, i.e., after both i and j have been incremented for the final matched character.
Commands
print(f'Match at {i-j}') just before appending — confirm against manual tracing.
Test with pattern 'AA' and text 'AAA' — expected matches at 0 and 1.
Fix Now: If matches are shifted, your order of operations is wrong. The reference order is: on a character match, increment i and j first; then check j == m and record the match at i - j; only then fall back with j = lps[j-1].
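All of these checks reduce to comparing your output against an obviously-correct baseline. Here is a minimal sketch — `naive_search` is our own helper name, not part of any library — that produces the expected outputs used in the commands above:

```python
def naive_search(text, pattern):
    # Brute-force reference: O(n*m), but trivially correct,
    # including overlapping matches.
    m = len(pattern)
    if m == 0:
        return []
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

print(naive_search('ABABCABCABAB', 'ABCAB'))  # [2, 5]
print(naive_search('AAA', 'AA'))              # [0, 1]
```

Whenever a cheat-sheet check fails, diff your KMP output against this reference on the same inputs.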
Production Incident

LPS Off-by-One: The Silent Match Fail in Production Log Parsing

A 200MB compressed log file processed every night — and once a week, a known error pattern was missed. The LPS array was built correctly, but the search logic used lps[j] instead of lps[j-1] on mismatch.
Symptom: A pattern of length 15 appeared in the log 97% of the time, but sporadically didn't match. No error was thrown — the search simply returned fewer matches than expected. The missed matches always occurred after a partial overlap (e.g., pattern 'ABABAB' would miss a match when the text had 'ABABABAB').
Assumption: The team assumed that since the LPS array values were correct, the search loop was also correct. They also assumed that unit tests covering the exact patterns from the spec were enough.
Root cause: In the search loop, when a mismatch occurred and j > 0, the code set j = lps[j] instead of j = lps[j-1]. The off-by-one caused the fallback to be one position too far, skipping potential matches. This bug only manifests when the pattern has overlapping prefix-suffix structures (e.g., 'ABAB', 'AAA').
Fix: Changed the fallback line from j = lps[j] to j = lps[j-1]. Added a randomised fuzz test that generates random patterns and texts and compares against a naive search — it catches exactly these edge cases.
Key Lesson
  • Always test string matching implementations against patterns with repeating prefixes and suffixes.
  • Use randomised testing to uncover algorithm bugs that simple unit tests miss.
  • The LPS array is indexed by position; after a mismatch at j, the fallback is lps[j-1], not lps[j] — write it down on a sticky note until it's muscle memory.
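The fuzz test mentioned in the fix can be sketched in a few lines. This is a sketch, not the team's actual test: the names `naive_search` and `fuzz_compare` are ours, and you pass in your own `kmp_search`:

```python
import random

def naive_search(text, pattern):
    # Trivially correct O(n*m) oracle, including overlapping matches.
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

def fuzz_compare(kmp_search, trials=2000, alphabet='AB', seed=0):
    # A 2-letter alphabet deliberately maximises overlapping
    # prefix-suffix structure -- the exact shape that hid this bug.
    rng = random.Random(seed)
    for _ in range(trials):
        text = ''.join(rng.choice(alphabet) for _ in range(rng.randint(1, 30)))
        pattern = ''.join(rng.choice(alphabet) for _ in range(rng.randint(1, 6)))
        got, want = kmp_search(text, pattern), naive_search(text, pattern)
        assert got == want, f'mismatch for text={text!r} pattern={pattern!r}: {got} != {want}'
```

Run it in CI; with a fixed seed, any failure is reproducible.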
Production Debug Guide

Diagnose a broken KMP implementation in minutes

Matches found are shifted by 1 or 2 positions: Check the fallback index in the search loop — ensure you use lps[j-1] when j>0, not lps[j]. Test with pattern 'AA' against text 'AAAA' — expected matches at 0, 1, 2.
Infinite loop in search (hangs on large texts): Verify that when j=0 and a mismatch occurs, you increment i. The logic must be 'if j>0 then j=lps[j-1] else i++' — missing the else branch means the loop never advances i.
LPS array values are all zero: Check the LPS build loop — when pattern[i] != pattern[len] and len>0, set len = lps[len-1] and do NOT increment i. If i increments without matching, you'll never build proper values.
Only matches at the first occurrence, then stops: After a match (j == m), set j = lps[j-1] to continue searching for overlapping matches. If you set j=0, you miss all overlapping matches.

KMP is one of those algorithms that every senior engineer has encountered in a 1am incident. You're processing a 500MB log file looking for a pattern — maybe a specific error signature or a token format — and your naive substring search is burning through CPU. KMP is the fix, and understanding why it works is what separates engineers who reach for the right tool from engineers who reach for regex and hope for the best.

Published in 1977 by Knuth, Morris, and Pratt, KMP solves string pattern matching in O(n+m) time — guaranteed, not amortised, not average-case. It works by never re-examining a character in the text. Once the text pointer moves right, it never goes left. This single invariant is why the algorithm runs in linear time regardless of what adversarial pattern you throw at it. The mechanism that enforces this invariant is the LPS array — and once you truly understand LPS, you'll see it hidden inside Z-algorithm, Aho-Corasick, and even suffix arrays.

How KMP Works — Plain English and Algorithm

The naive string search checks every position in the text and compares the pattern character by character — O(n*m) worst case. KMP (Knuth-Morris-Pratt) is faster because it never re-examines characters it has already matched. When a mismatch occurs, KMP uses a precomputed table to skip ahead intelligently instead of restarting from scratch.

The key insight: when a mismatch occurs at pattern[j], the substring pattern[0..j-1] was already matched. The 'failure function' (or LPS array — Longest Proper Prefix which is also Suffix) tells us the longest proper prefix of pattern[0..j-1] that is also a suffix. We can restart matching from that prefix instead of from the beginning.

Step 1 — Build the LPS (failure function) array for the pattern:
  • lps[0] = 0, always.
  • Use two pointers: len=0 (length of the previous longest prefix-suffix) and i=1.
  • If pattern[i] == pattern[len]: set lps[i] = len+1, then increment both i and len.
  • Else if len > 0: set len = lps[len-1] (try a shorter prefix — do NOT increment i).
  • Else: set lps[i] = 0 and increment i.
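Before trusting a hand-rolled build loop for Step 1, it helps to have an obviously-correct (if slow) reference to compare against. A sketch — `lps_bruteforce` is a hypothetical name — that checks every candidate prefix directly instead of using the two-pointer loop:

```python
def lps_bruteforce(pattern):
    # O(m^3) reference: for each prefix pattern[:i+1], try every
    # proper prefix length k (longest first) and keep the first
    # one that is also a suffix of that prefix.
    m = len(pattern)
    lps = [0] * m
    for i in range(m):
        for k in range(i, 0, -1):  # k <= i, so the prefix is proper
            if pattern[:k] == pattern[i - k + 1:i + 1]:
                lps[i] = k
                break
    return lps

print(lps_bruteforce('ABCAB'))    # [0, 0, 0, 1, 2]
print(lps_bruteforce('ABACABA'))  # [0, 0, 1, 0, 1, 2, 3]
```

Any disagreement between this and your fast build loop points straight at the two-pointer backtracking.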

Step 2 — Search the text using the LPS array:
  • Use two pointers: i (text index) and j (pattern index), both starting at 0.
  • If text[i] == pattern[j]: increment both i and j.
  • If j == len(pattern): a match starts at index i-j. Set j = lps[j-1] to continue searching (matches may overlap).
  • Else if i < len(text) and text[i] != pattern[j]: if j > 0, set j = lps[j-1] (skip via the failure function, do NOT move i); otherwise increment i.

Time complexity: O(n + m) where n = text length, m = pattern length. The LPS build is O(m) and the search is O(n).

Mental Model
The KMP Intuition
Think of the pattern as a rope tied to a fixed point. When you pull it forward and hit a mismatch, the rope doesn't snap back to the start — it only recoils to where it last overlapped with itself.
  • The LPS array measures the 'slack' in the rope — how far it can recoil.
  • If a length-k prefix of the pattern matches a suffix of the text you've already consumed, you can resume comparing at pattern position k without re-reading those k characters.
  • The text pointer never goes backward — it's a unidirectional scan with smart jumps on the pattern side.
  • KMP works because the pattern knows its own symmetry — it doesn't need to re-examine what it already learned.
📊 Production Insight
The two-pointer backtracking in the LPS build is the most common source of bugs.
A single off-by-one in the len backtrack leads to an incorrect LPS, which silently causes missed matches without any exception.
Always unit-test the LPS array against multiple patterns before relying on KMP in production.
🎯 Key Takeaway
KMP leverages precomputed self-overlap information to skip redundant comparisons.
The LPS array is built once and reused for all searches against the same pattern.
Correctness depends on the fallback index being lps[j-1], never lps[j].

Visual LPS Table Construction

The LPS array is built by walking through the pattern and determining the longest proper prefix that is also a suffix for each prefix of the pattern. The algorithm maintains a length variable that tracks the length of the previous longest prefix-suffix. At each step, the table shows the current state of the LPS array and the positions of pattern[i] and pattern[len]. This visual breaks down the construction for the pattern 'ABABCAB' (length 7) step by step.

Step-by-step
  • i=0: lps[0]=0 by definition. len=0.
  • i=1: pattern[1]='B' != pattern[0]='A' and len=0 → lps[1]=0.
  • i=2: pattern[2]='A' == pattern[0]='A' → len=1, lps[2]=1.
  • i=3: pattern[3]='B' == pattern[1]='B' → len=2, lps[3]=2.
  • i=4: pattern[4]='C' != pattern[2]='A' and len>0 → len=lps[1]=0; then pattern[4]='C' != pattern[0]='A' → lps[4]=0.
  • i=5: pattern[5]='A' == pattern[0]='A' → len=1, lps[5]=1.
  • i=6: pattern[6]='B' == pattern[1]='B' → len=2, lps[6]=2.

The diagram below shows the LPS array evolving after each comparison.

📊 Production Insight
When debugging an LPS array, manually compare it against a table like this for your pattern. The key is verifying that after a mismatch (pattern[i] != pattern[len]), the backtracking correctly uses lps[len-1] before trying again. Mistaking the order of operations here leads to wrong LPS values that cascade into search failures.
🎯 Key Takeaway
Trace through the LPS construction with a small pattern on paper until the backtracking logic becomes automatic. The moment a mismatch occurs, the algorithm never stops — it shrinks len and compares again.
LPS Construction for 'ABABCAB'
  • Step 1 — initial state: [0, 0, 0, 0, 0, 0, 0]. lps[0]=0 by definition.
  • Step 2 — after i=1: [0, 0, 0, 0, 0, 0, 0]. Mismatch 'B' vs 'A', len=0 → lps[1]=0.
  • Step 3 — after i=2 and i=3: [0, 0, 1, 2, 0, 0, 0]. i=2: match 'A' with pattern[0] → len=1, lps[2]=1. i=3: match 'B' with pattern[1] → len=2, lps[3]=2.
  • Step 4 — after i=4 through i=6: [0, 0, 1, 2, 0, 1, 2]. i=4: mismatch → backtrack len to 0, then lps[4]=0; i=5 and i=6 rebuild len to 2. Final LPS array.

Worked Example — KMP Search for 'ABCAB' in 'ABABCABCABAB'

Pattern: ABCAB Text: ABABCABCABAB

Step 1 — Build LPS for pattern 'ABCAB':
  • i=0: lps[0]=0 (always).
  • i=1: pattern[1]='B' vs pattern[0]='A'. No match. lps[1]=0.
  • i=2: pattern[2]='C' vs pattern[0]='A'. No match. lps[2]=0.
  • i=3: pattern[3]='A' vs pattern[0]='A'. Match! lps[3]=1, len=1.
  • i=4: pattern[4]='B' vs pattern[1]='B'. Match! lps[4]=2, len=2.
LPS = [0, 0, 0, 1, 2]

Step 2 — Search (i = text index, j = pattern index):
  • i=0,j=0: text[0]='A' == pattern[0]='A' → i=1, j=1.
  • i=1,j=1: text[1]='B' == pattern[1]='B' → i=2, j=2.
  • i=2,j=2: text[2]='A' != pattern[2]='C' → j=lps[1]=0.
  • i=2,j=0: text[2]='A' == pattern[0]='A' → i=3, j=1.
  • i=3,j=1: text[3]='B' == pattern[1]='B' → i=4, j=2.
  • i=4,j=2: text[4]='C' == pattern[2]='C' → i=5, j=3.
  • i=5,j=3: text[5]='A' == pattern[3]='A' → i=6, j=4.
  • i=6,j=4: text[6]='B' == pattern[4]='B' → i=7, j=5. Now j == 5 == len(pattern): MATCH at index i-j = 7-5 = 2. Set j=lps[4]=2 and continue from i=7, j=2.
  • i=7,j=2: text[7]='C' == pattern[2]='C' → i=8, j=3.
  • i=8,j=3: text[8]='A' == pattern[3]='A' → i=9, j=4.
  • i=9,j=4: text[9]='B' == pattern[4]='B' → i=10, j=5. MATCH at i-j = 10-5 = 5. Set j=lps[4]=2.
  • The remaining comparisons (over text 'AB') find no further full match, and the loop ends.

Matches found at indices 2 and 5 in the text.

Notice that after finding a match, KMP uses lps[j-1] to slide the pattern forward by the right amount — it does not restart from the beginning of the pattern.

📊 Production Insight
Manual tracing like this is exactly how you'll debug a production KMP failure.
When the LPS array has non-zero values (prefix-suffix overlaps), the fallback logic during search is the most likely bug source.
Run through a simple pattern manually before trusting it on a 500MB log file.
🎯 Key Takeaway
Trace at least one full search by hand to verify your implementation.
Matches can overlap — after a match, use lps[j-1] to find the next one.
The text index i only moves forward, which guarantees linear runtime.

KMP Implementation

The two-function structure (build_lps and kmp_search) matches the two-phase algorithm exactly. Study build_lps carefully — the len backtrack using lps[len-1] is the subtlest part.

kmp.py · PYTHON
def build_lps(pattern):
    m   = len(pattern)
    lps = [0] * m
    length = 0  # length of previous longest prefix-suffix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]  # try shorter prefix
            # do NOT increment i here
        else:
            lps[i] = 0
            i += 1
    return lps

def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0: return []
    lps = build_lps(pattern)
    matches = []
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1; j += 1
        if j == m:
            matches.append(i - j)  # match at this index
            j = lps[j - 1]
        elif i < n and text[i] != pattern[j]:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1
    return matches

text    = 'ABABCABCABAB'
pattern = 'ABCAB'
print('LPS:', build_lps(pattern))
print('Matches at indices:', kmp_search(text, pattern))
▶ Output
LPS: [0, 0, 0, 1, 2]
Matches at indices: [2, 5]
📊 Production Insight
The code looks simple, but edge cases like pattern length 1 or pattern with all same characters (e.g., 'AAA') are notorious for revealing bugs.
Always test with patterns that have repeated prefixes: 'AAAA', 'ABAB', 'ABACAB'.
The case m==0 must be handled separately — an empty pattern should return no matches, but many implementations incorrectly treat it as a match at every position.
🎯 Key Takeaway
Isolate the LPS build and search into separate functions.
Test patterns with overlapping prefixes — they expose the most bugs.
Handle empty pattern as a special case early.

C++ Implementation

The C++ implementation of KMP mirrors the Python version, with the LPS construction and search split into the same two functions. One type note: std::string::length() returns size_t, so the lengths are stored in int here — safe for typical input sizes, but watch for signed/unsigned comparison warnings. As in Python, the LPS build uses a while loop that may not increment i in every iteration, and the search function returns a vector of match positions. Below is a complete C++ implementation that follows the same logic as the Python reference.

kmp.cpp · CPP
#include <vector>
#include <string>
#include <iostream>

std::vector<int> build_lps(const std::string& pattern) {
    int m = pattern.length();
    std::vector<int> lps(m, 0);
    int len = 0;  // length of previous longest prefix-suffix
    int i = 1;
    while (i < m) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else if (len > 0) {
            len = lps[len - 1];  // backtrack without incrementing i
        } else {
            lps[i] = 0;
            i++;
        }
    }
    return lps;
}

std::vector<int> kmp_search(const std::string& text, const std::string& pattern) {
    int n = text.length();
    int m = pattern.length();
    if (m == 0) return {};
    std::vector<int> lps = build_lps(pattern);
    std::vector<int> matches;
    int i = 0, j = 0;
    while (i < n) {
        if (text[i] == pattern[j]) {
            i++; j++;
        }
        if (j == m) {
            matches.push_back(i - j);
            j = lps[j - 1];
        } else if (i < n && text[i] != pattern[j]) {
            if (j > 0)
                j = lps[j - 1];
            else
                i++;
        }
    }
    return matches;
}

int main() {
    std::string text = "ABABCABCABAB";
    std::string pattern = "ABCAB";
    std::vector<int> lps = build_lps(pattern);
    std::cout << "LPS: ";
    for (int v : lps) std::cout << v << " ";
    std::cout << std::endl;
    std::vector<int> matches = kmp_search(text, pattern);
    std::cout << "Matches at indices: ";
    for (int idx : matches) std::cout << idx << " ";
    std::cout << std::endl;
    return 0;
}
▶ Output
LPS: 0 0 0 1 2
Matches at indices: 2 5
📊 Production Insight
In C++ always use std::vector<int> for dynamic arrays rather than raw int* allocations. The same off-by-one bugs apply: double-check the fallback index lps[j-1]. Also note that C++ string indices are size_t, so casting to int for the LPS vector is safe for typical problem sizes, but avoid using signed/unsigned mismatches in larger loops.
🎯 Key Takeaway
The C++ version is a direct translation of the Python logic but requires explicit type handling. Always test with the same edge cases: empty pattern, single char, overlapping patterns.

Common KMP Mistakes and How to Debug Them

Even experienced engineers slip up on KMP's fallback logic. The algorithm has three high-risk areas: the LPS build loop, the mismatch fallback in the search, and the match-continuation handler.

  1. Off-by-one in LPS fallback: When len is not equal to 0 and pattern[i] != pattern[len], you must set len = lps[len-1]. Using lps[len] or lps[len-2] produces wrong values that cascade into missed matches.
  2. Confusing j and i in search: After a mismatch, you must NOT increment i when j > 0. This is the most common infinite-loop trigger. Only increment i when j == 0.
  3. After a match, setting j = 0 instead of j = lps[j-1]: This kills overlap detection. If your pattern is 'AA' and text is 'AAA', you'll miss the match at index 1.
  4. Forgetting that LPS lengths are always < j: lps[j-1] is always less than j. If your fallback index ever equals or exceeds j, you have a bug in LPS construction.
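Mistake 4 suggests a cheap runtime guard. The sketch below — `check_lps` is a hypothetical helper name — verifies the proper-prefix bound and the prefix-suffix property for every position (it checks validity, not maximality, which is enough to catch the bugs above):

```python
def check_lps(pattern, lps):
    # Validate an LPS array against its definition.
    for i, k in enumerate(lps):
        # lps[i] must be a proper prefix length: 0 <= lps[i] <= i.
        assert 0 <= k <= i, f"lps[{i}]={k} violates the proper-prefix bound"
        # pattern[:k] must also be a suffix of pattern[:i+1].
        assert pattern[:k] == pattern[i - k + 1:i + 1], \
            f"lps[{i}]={k} is not a prefix-suffix length"

check_lps('ABCAB', [0, 0, 0, 1, 2])      # passes silently
check_lps('ABABAB', [0, 0, 1, 2, 3, 4])  # passes silently
```

Dropping this into a debug build (or a unit test) turns a silent off-by-one into an immediate assertion failure.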
📊 Production Insight
In production, an incorrect KMP doesn't crash — it silently returns fewer matches.
A regression test that compares KMP's match count against the naive approach on a random set of pattern/text pairs catches nearly all implementation bugs.
Add a fuzz test that runs in your CI pipeline — it's five lines of code and saves hours of debugging.
🎯 Key Takeaway
The three most common bugs: fallback index off-by-one, missing else branch, and resetting j=0 after match.
Add a randomised fuzz test comparing to naive search to catch regressions.
When debugging, print the LPS array and verify it matches a known correct array.

KMP Variants, Complexity, and When to Use Alternatives

KMP is not always the fastest string matching algorithm for every scenario. Here's how it compares:

  • Boyer-Moore: Usually faster for large alphabets because it can skip more characters per mismatch (bad character and good suffix heuristics). Worst-case O(n*m) but rarely hits it.
  • Rabin-Karp: Uses rolling hash. Good for multiple pattern search (plagiarism detection), but O(n*m) worst-case from hash collisions.
  • Aho-Corasick: Builds a finite automaton from multiple patterns simultaneously. Ideal for virus signature detection or filtering many keywords.
  • Z-algorithm: Computes an array Z[i] = longest substring starting at i that matches prefix. Can be used to find all matches in O(n+m) with simpler code than KMP.

KMP remains the best choice when you need guaranteed linear time and low overhead for a single pattern search. It's also the foundation for understanding more complex string algorithms.

📊 Production Insight
Don't default to KMP for every pattern search. Boyer-Moore with a large alphabet (e.g., Unicode logs) can be 3-5x faster in practice.
But if you need worst-case predictable performance (e.g., real-time systems, security checks), KMP's O(n+m) guarantee is safer than Boyer-Moore's worst-case.
Measure, don't guess: profile both on your actual data distribution.
🎯 Key Takeaway
KMP is proven O(n+m) but not always fastest in practice.
Use Boyer-Moore for large alphabets, Aho-Corasick for multiple patterns, Rabin-Karp for rolling hash.
KMP's real value: it teaches the prefix-suffix idea that powers many advanced algorithms.

Formal O(n) Complexity Proof

The KMP search algorithm guarantees O(n) time because the outer while loop executes at most 2n times, where n is the text length. The key invariant: the text index i never decreases; it either increments (when a character matches, or when j=0 and a mismatch occurs) or stays the same (when j>0 and a mismatch occurs, causing j to fall back). Each time j falls back, j decreases by at least 1, and j can only increase when i increments. Since j is bounded by m (the pattern length), the total number of fallbacks across the entire search is bounded by the total number of i increments, which is at most n. Therefore the total number of operations in the search loop is O(n).

More formally, the search loop performs two types of operations:
  • Type A: i increases by 1 (and j may increase by at most 1). This can happen at most n times.
  • Type B: j decreases by at least 1 (via lps[j-1]). Since j starts at 0 and never exceeds m, and each increase of j (in Type A) is paired with an increase in i, the total number of Type B operations is at most the total number of Type A operations, i.e., at most n.
Hence total operations ≤ 2n = O(n).

The LPS construction has an analogous proof: the pointer i only moves forward (at most m times), and the variable len only decreases when backtracking. Each backtrack reduces len by at least 1, and each increase of len is paired with an increment of i. So total backtrack steps ≤ m, giving O(m) for the build phase. Thus overall O(n+m).
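The 2n bound can also be checked empirically by counting loop iterations. This sketch instruments the article's search loop; the `steps` counter is an addition for measurement only, not part of the algorithm:

```python
def build_lps(pattern):
    # Standard two-pointer LPS construction (same as the article's version).
    lps, length, i = [0] * len(pattern), 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]
        else:
            i += 1
    return lps

def kmp_search_counted(text, pattern):
    # The article's search loop, plus 'steps' counting outer-loop
    # iterations -- the quantity the proof bounds by 2n.
    n, m = len(text), len(pattern)
    if m == 0:
        return [], 0
    lps, matches = build_lps(pattern), []
    i = j = steps = 0
    while i < n:
        steps += 1
        if text[i] == pattern[j]:
            i += 1; j += 1
        if j == m:
            matches.append(i - j)
            j = lps[j - 1]
        elif i < n and text[i] != pattern[j]:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1
    return matches, steps

text, pattern = 'AB' * 500, 'ABAB'
matches, steps = kmp_search_counted(text, pattern)
print(len(matches), 'matches in', steps, 'iterations; 2n bound =', 2 * len(text))
```

Even on this heavily overlapping input, the iteration count stays within the proven bound.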

📊 Production Insight
Understanding the formal proof helps when you need to guarantee worst-case latency. In high-frequency trading or real-time log processing, KMP's O(n+m) bound means you can predict the maximum time per search regardless of input. This is not true for Boyer-Moore or Rabin-Karp, which can degenerate with adversarial inputs.
🎯 Key Takeaway
KMP's linear time is not just average case — it's a proven worst-case bound. The proof relies on the fact that the text pointer never moves backward and the pattern pointer's fallbacks are amortized by forward matches.

String Period and Compression with KMP

One powerful application of the LPS (prefix function) is computing the smallest period of a string. The period of a string s is the smallest integer p such that s[i] = s[i+p] for all i with i+p < |s| — equivalently, s consists of repetitions of a block of length p (the last repetition possibly truncated). Using the LPS array, let p = |s| - LPS[|s|-1]. If |s| % p == 0, then s is exactly |s|/p whole repetitions of its first p characters and p is the smallest such period; otherwise s cannot be written as whole repetitions of any shorter block.

More concretely: let π denote the prefix function (another name for the LPS array). The length of the smallest repeating block is n - π[n-1], valid as a whole-repetition period when n % (n - π[n-1]) == 0. For example, for s = "ABABAB": n=6, the last LPS value is 4 (since "ABAB" is both a proper prefix and a suffix), so the period is 6-4=2 — indeed "AB" repeated 3 times.

This property allows string compression: a periodic string can be stored as a base substring and a repeat count, reducing memory. It also helps in finding repeated substrings in DNA sequences or log patterns. In competitive programming, you can solve problems like "Find the smallest period of a string" or "Check if a string can be formed by repeating a substring" using this technique.
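The period formula translates directly into a few lines. A sketch — the helper name `smallest_period` is ours — reusing the build_lps routine from earlier in the article (assumes a non-empty string):

```python
def build_lps(pattern):
    # Standard two-pointer LPS construction (same as the article's version).
    lps, length, i = [0] * len(pattern), 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]
        else:
            i += 1
    return lps

def smallest_period(s):
    # p = n - lps[n-1]; s is whole repetitions of its first p
    # characters iff p divides n. Otherwise fall back to n itself.
    n = len(s)
    p = n - build_lps(s)[-1]
    return p if n % p == 0 else n

print(smallest_period('ABABAB'))  # 2  ('AB' * 3)
print(smallest_period('AAAA'))    # 1  ('A' * 4)
print(smallest_period('ABCAB'))   # 5  (no shorter repeating block)
```

The repeat-count for compression is then simply len(s) // smallest_period(s).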

📊 Production Insight
In production, this application is used for log compression: if logs follow a template like "ERROR: disk full at node X" with only X varying, the prefix function can identify the recurrent pattern and enable dictionary compression. It also appears in network packet inspection for detecting repeated signatures.
🎯 Key Takeaway
The LPS array directly gives the smallest period of a string via period = n - lps[n-1] (if divisible). Use this to compress repetitive data or detect repeated patterns in streams.

Practice Problems

Solidify your understanding of KMP by solving these problems that leverage the algorithm and its variants. They range from direct application to more creative uses of the prefix function. Attempt them in order of difficulty and test against edge cases like overlapping patterns.

📊 Production Insight
These problems mirror real-world tasks: pattern recognition in logs, compression of repetitive data, and palindrome detection for input validation. Building a library of KMP-based solutions speeds up incident response when you need to parse large text corpora.
🎯 Key Takeaway
KMP is not just one algorithm; its prefix function unlocks period detection, palindrome construction, and string compression. Practice these problems to internalize the pattern.
🗂 String Search Algorithm Comparison
Choose the right algorithm based on your pattern count, alphabet size, and performance requirements.
| Algorithm | Time Complexity | Space | Best For | Worst Case |
|---|---|---|---|---|
| KMP | O(n+m) | O(m) | Single pattern, guaranteed linear | Always O(n+m) |
| Boyer-Moore | O(n+m) average, O(n*m) worst | O(σ+m) | Large alphabet, long text | Adversarial patterns |
| Rabin-Karp | O(n+m) average, O(n*m) worst | O(1) | Multiple patterns, rolling hash | Hash collisions |
| Aho-Corasick | O(n + m + total matches) | O(σ × total pattern length) | Multiple patterns simultaneously | Very large automaton |
| Z-algorithm | O(n+m) | O(n+m) | Simple implementation, pattern matching | Same as KMP |

(σ = alphabet size; n = text length; m = pattern length.)

🎯 Key Takeaways

  • KMP avoids redundant comparisons by precomputing an LPS (failure function) array for the pattern.
  • LPS[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix.
  • On mismatch at pattern[j], set j = lps[j-1] instead of restarting from 0.
  • Time complexity: O(n+m). Space: O(m) for the LPS array.
  • KMP never moves the text pointer i backward — it only advances, guaranteeing linear time.

⚠ Common Mistakes to Avoid

    Using lps[j] instead of lps[j-1] on mismatch
    Symptom

    Matches are missed when the pattern has overlapping prefixes (e.g., 'ABAB'). Incorrect fallback causes the search to skip over a potential match.

    Fix

    Always use j = lps[j-1] when j > 0 and mismatch occurs. The LPS array is 0-indexed; the longest prefix-suffix for the matched prefix is stored at position j-1.

    Not incrementing i when j == 0 and mismatch occurs
    Symptom

    Infinite loop: the search gets stuck on the same character because j is already 0 and you either index lps[-1] (which in Python silently reads the last element, and in C++ is undefined behaviour) or simply never execute i += 1.

    Fix

    Add an else branch: after checking j > 0, if j == 0, increment i. The pattern must have at least one character, so if the first char doesn't match, move on in the text.

    Setting j = 0 after a match instead of j = lps[j-1]
    Symptom

    Overlapping matches are missed. For pattern 'AA' and text 'AAA', only the match at index 0 is found, not at index 1.

    Fix

    After match (j == m), set j = lps[j-1] to allow overlapping matches. The pattern may overlap with itself; the LPS array tells you how far back to slide.

    Incorrect LPS build when pattern[i] != pattern[len] and len == 0
    Symptom

    LPS array values are incorrectly zero or skipping one. This can cause later fallbacks to be out of range or incorrect.

    Fix

    In the LPS build loop, the sequence must be: if match → len++, lps[i]=len, i++; else if len>0 → len = lps[len-1] (no i++); else → lps[i]=0, i++. Do not skip the final else branch.

Interview Questions on This Topic

  • Q (Mid-level): Explain what the LPS array stores and how it is computed.
    LPS stands for Longest Proper Prefix which is also Suffix. For each position i in the pattern, LPS[i] gives the length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i] ("proper" means it cannot be the whole substring). It is computed with two pointers: len tracks the length of the previous longest prefix-suffix, and i iterates. When pattern[i] matches pattern[len], we extend: lps[i]=len+1, incrementing both. When it doesn't match and len>0, we backtrack: len = lps[len-1], keeping i unchanged. Otherwise, lps[i]=0 and i increments.
  • Q (Senior): Why does KMP run in O(n+m) rather than O(n*m)?
    Because the text pointer i never moves backward — it only ever increments. The pattern pointer j can decrease (via LPS fallback), but each increase of j is paired with an increase of i, so the total number of decrements is bounded by the total number of increments, which is at most n across the whole search. Total operations are therefore O(n + number of fallbacks) = O(n + m). The LPS build is similarly O(m) because its backtracking is bounded by the number of increments.
  • Q (Mid-level): How would you use KMP to check if one string is a rotation of another?
    To check if string B is a rotation of string A, first verify that the lengths are equal and non-zero, then concatenate A with itself (A+A) and use KMP to search for B in A+A. If KMP finds a match, B is a rotation of A. This works because every rotation of A appears as a contiguous substring of A+A.
  • Q (Senior): What is the LPS array for the pattern 'ABACABA'?
    The LPS array for 'ABACABA' is [0, 0, 1, 0, 1, 2, 3]. Computing: i=0: 0; i=1: 'B' vs 'A' -> 0; i=2: 'A' vs 'A' -> 1; i=3: 'C' vs 'B' -> backtrack len to lps[0]=0, then 'C' vs 'A' -> 0; i=4: 'A' vs 'A' -> 1; i=5: 'B' vs 'B' -> 2; i=6: 'A' vs 'A' -> 3.
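The rotation trick from the interview answer above is short enough to sketch. The helper name `is_rotation` is ours; any substring search works for the containment test — Python's `in` is used here for brevity, though plugging in KMP preserves the linear-time guarantee:

```python
def is_rotation(a, b):
    # b is a rotation of a iff the lengths are equal and non-zero
    # and b occurs as a contiguous substring of a+a.
    return len(a) == len(b) and len(a) > 0 and b in a + a

print(is_rotation('ABCDE', 'CDEAB'))  # True
print(is_rotation('ABCDE', 'CDEBA'))  # False
```

Swapping `b in a + a` for `kmp_search(a + a, b)` makes the whole check O(n).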

Frequently Asked Questions

What does the LPS array represent?

LPS[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]. 'Proper' means it cannot be the whole string itself. For pattern 'ABCAB', lps[4]=2 because 'AB' (length 2) is both a prefix and a suffix of 'ABCAB'. This tells KMP: after a mismatch at position j, restart matching from position lps[j-1] instead of 0.

Why is KMP O(n+m) when it seems like the inner loop could run multiple times?

The variable j can only be decremented via lps lookups, and it can only be incremented when a character matches (which also increments i). Since i only ever increases and is bounded by n, and j's total number of decrements is bounded by its total number of increments, the total work across the entire search is O(n). The LPS build is similarly O(m).

When should I use KMP vs other string matching algorithms?

Use KMP when searching for a single pattern in a long text — it is simple to implement and O(n+m). Use Aho-Corasick when searching for multiple patterns simultaneously. Use Rabin-Karp when you need rolling hash (e.g., plagiarism detection over sliding windows). Use Boyer-Moore when the alphabet is large — it skips more characters per mismatch.

How do I verify my KMP implementation is correct?

Create a fuzz test that generates random patterns and texts, then compare KMP's match results with a naive substring search. Test edge cases: empty pattern, single character pattern, pattern with all same characters (e.g., 'AAA'), pattern with overlapping prefixes (e.g., 'ABAB'), and text that is shorter than the pattern.

Naren — Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
