KMP String Matching — LPS Off-by-One Bug in Production
A pattern of length 15 matched 97% of logs, but an LPS off-by-one sporadically missed overlapping matches.
20+ years shipping performance-critical code where algorithms decide the bill. Drawn from code that ran under real load.
- KMP precomputes an LPS array to avoid re-examining matched characters
- LPS[i] = length of longest proper prefix of pattern[0..i] that is also a suffix
- On mismatch, fallback to lps[j-1] instead of restarting from index 0
- Text pointer never moves backward — guaranteed O(n+m) runtime
- Build LPS in O(m) using a two-pointer backtracking loop
- Biggest mistake: using lps[j] instead of lps[j-1] — causes off-by-one silent failures
Imagine you're reading a novel looking for the phrase 'ABCABD'. You start matching: A-B-C-A-B... then the next letter is wrong. A naive approach restarts from scratch. KMP says: wait — you already matched 'AB' at the start of the pattern, and the last two letters you matched were also 'AB'. So skip back to position 2, not position 0. That intelligence — knowing how much of the pattern is a prefix that's also a suffix — is the entire insight behind KMP.
KMP is one of those algorithms that every senior engineer has encountered in a 1am incident. You're processing a 500MB log file looking for a pattern — maybe a specific error signature or a token format — and your naive substr search is burning through CPU. KMP is the fix, and understanding why it works is what separates engineers who reach for the right tool versus engineers who reach for regex and hope for the best.
Published in 1977 by Knuth, Morris, and Pratt, KMP solves string pattern matching in O(n+m) time — guaranteed, not amortised, not average-case. It works by never re-examining a character in the text. Once the text pointer moves right, it never goes left. This single invariant is why the algorithm runs in linear time regardless of what adversarial pattern you throw at it. The mechanism that enforces this invariant is the LPS array — and once you truly understand LPS, you'll see it hidden inside Z-algorithm, Aho-Corasick, and even suffix arrays.
Why Naive String Search Fails at Scale
KMP (Knuth-Morris-Pratt) string matching is a linear-time algorithm that finds all occurrences of a pattern in a text without backtracking. Its core mechanic is the failure function (LPS array — Longest proper Prefix which is also Suffix), which tells the algorithm how many characters it can skip when a mismatch occurs. This eliminates the O(n*m) worst-case of naive search, guaranteeing O(n + m) time where n is text length and m is pattern length.
The algorithm precomputes the LPS array in O(m) time, then scans the text once. When a mismatch happens at position i in the pattern, instead of resetting the text pointer, it uses LPS[i-1] to shift the pattern intelligently. This means every character in the text is examined at most once — no backtracking, no wasted comparisons. The space overhead is just O(m) for the LPS array.
Use KMP when you need deterministic worst-case performance — for example, in intrusion detection systems scanning network payloads, or in DNA sequence alignment where patterns are long and texts are massive. It's also the foundation for more advanced algorithms like Aho-Corasick for multi-pattern matching. In production, KMP's predictability beats the average-case speed of Boyer-Moore when patterns are short or alphabets are small.
How KMP Works — Plain English and Algorithm
The naive string search checks every position in the text and compares the pattern character by character — O(n*m) worst case. KMP (Knuth-Morris-Pratt) is faster because it never re-examines characters it has already matched. When a mismatch occurs, KMP uses a precomputed table to skip ahead intelligently instead of restarting from scratch.
The key insight: when a mismatch occurs at pattern[j], the substring pattern[0..j-1] was already matched. The 'failure function' (or LPS array — Longest Proper Prefix which is also Suffix) tells us the longest proper prefix of pattern[0..j-1] that is also a suffix. We can restart matching from that prefix instead of from the beginning.
Step 1 — Build the LPS (failure function) array for the pattern: 1. lps[0] = 0 always. 2. Use two pointers: len=0 (length of previous longest prefix-suffix), i=1. 3. If pattern[i] == pattern[len]: lps[i] = len+1, increment both i and len. 4. Else if len > 0: len = lps[len-1] (try a shorter prefix — do NOT increment i). 5. Else: lps[i] = 0, increment i.
Step 2 — Search the text using the LPS array: 1. Use two pointers: i (text index), j (pattern index), both start at 0. 2. If text[i] == pattern[j]: increment both i and j. 3. If j == len(pattern): match found at index i-j. Set j = lps[j-1]. 4. Else if i < len(text) and text[i] != pattern[j]: - If j > 0: j = lps[j-1] (skip via failure function, do NOT move i). - Else: increment i.
Time complexity: O(n + m) where n = text length, m = pattern length. The LPS build is O(m) and the search is O(n).
- The LPS array measures the 'slack' in the rope — how far it can recoil.
- If pattern[0..k] matches the suffix of the matched prefix, you can skip k characters of the text.
- The text pointer never goes backward — it's a unidirectional scan with smart jumps on the pattern side.
- KMP works because the pattern knows its own symmetry — it doesn't need to re-examine what it already learned.
Visual LPS Table Construction
The LPS array is built by walking through the pattern and determining the longest proper prefix that is also a suffix for each prefix of the pattern. The algorithm maintains a length variable that tracks the length of the previous longest prefix-suffix. At each step, the table shows the current state of the LPS array and the positions of pattern[i] and pattern[len]. This visual breaks down the construction for the pattern 'ABABCAB' (length 7) step by step.
- i=0: lps[0]=0 by definition. len=0.
- i=1: pattern[1]='B' != pattern[0]='A' and len=0 → lps[1]=0.
- i=2: pattern[2]='A' == pattern[0]='A' → len=1, lps[2]=1.
- i=3: pattern[3]='B' == pattern[1]='B' → len=2, lps[3]=2.
- i=4: pattern[4]='C' != pattern[2]='A' and len>0 → len=lps[1]=0; then pattern[4]='C' != pattern[0]='A' → lps[4]=0.
- i=5: pattern[5]='A' == pattern[0]='A' → len=1, lps[5]=1.
- i=6: pattern[6]='B' == pattern[1]='B' → len=2, lps[6]=2.
The diagram below shows the LPS array evolving after each comparison.
Worked Example — KMP Search for 'ABCAB' in 'ABABCABCABAB'
Pattern: ABCAB Text: ABABCABCABAB
Step 1 — Build LPS for pattern 'ABCAB': i=0: lps[0]=0 (always) i=1: pattern[1]='B' vs pattern[0]='A'. No match. lps[1]=0. i=2: pattern[2]='C' vs pattern[0]='A'. No match. lps[2]=0. i=3: pattern[3]='A' vs pattern[0]='A'. Match! lps[3]=1, len=1. i=4: pattern[4]='B' vs pattern[1]='B'. Match! lps[4]=2, len=2. LPS = [0, 0, 0, 1, 2]
Step 2 — Search (i=text index, j=pattern index): i=0,j=0: text[0]='A'==pattern[0]='A'. i=1,j=1. i=1,j=1: text[1]='B'==pattern[1]='B'. i=2,j=2. i=2,j=2: text[2]='A'!=pattern[2]='C'. j=lps[1]=0. i=2,j=0: text[2]='A'==pattern[0]='A'. i=3,j=1. i=3,j=1: text[3]='B'==pattern[1]='B'. i=4,j=2. i=4,j=2: text[4]='C'==pattern[2]='C'. i=5,j=3. i=5,j=3: text[5]='A'==pattern[3]='A'. i=6,j=4. i=6,j=4: text[6]='B'==pattern[4]='B'. j==5==len(pattern)! MATCH at index i-j=6-5=1... wait, i-j=6-5=1? No: i=7 after increment, j=5 equals pattern length. Match at i-j=7-5=2. j=lps[4]=2. Continue search from i=7,j=2... i=7,j=2: text[7]='C'==pattern[2]='C'. i=8,j=3. i=8,j=3: text[8]='A'==pattern[3]='A'. i=9,j=4. i=9,j=4: text[9]='B'==pattern[4]='B'. Match at i-j=10-5=5. j=lps[4]=2.
Matches found at indices 2 and 5 in the text.
Notice that after finding a match, KMP uses lps[j-1] to slide the pattern forward by the right amount — it does not restart from the beginning of the pattern.
KMP Implementation
The two-function structure (build_lps and kmp_search) matches the two-phase algorithm exactly. Study build_lps carefully — the len backtrack using lps[len-1] is the subtlest part.
m==0 must be handled separately — an empty pattern should return no matches, but many implementations incorrectly treat it as a match at every position.C++ Implementation
The C++ implementation of KMP mirrors the Python version but requires careful handling of array indices and the std::vector container. The LPS construction and search functions are split similarly. Key differences: C++ uses size_t for indices, and the LPS array is built with a while loop that may not increment i in every iteration. The search function returns a vector of match positions. Below is a complete, production-ready C++ implementation that follows the same logic as the Python reference.
std::vector<int> for dynamic arrays rather than raw int* allocations. The same off-by-one bugs apply: double-check the fallback index lps[j-1]. Also note that C++ string indices are size_t, so casting to int for the LPS vector is safe for typical problem sizes, but avoid using signed/unsigned mismatches in larger loops.Common KMP Mistakes and How to Debug Them
Even experienced engineers slip up on KMP's fallback logic. The algorithm has three high-risk areas: the LPS backbuilding loop, the mismatch fallback in search, and the match-continuation handler.
- Off-by-one in LPS fallback: When len is not equal to 0 and pattern[i] != pattern[len], you must set len = lps[len-1]. Using lps[len] or lps[len-2] produces wrong values that cascade into missed matches.
- Confusing j and i in search: After a mismatch, you must NOT increment i when j > 0. This is the most common infinite-loop trigger. Only increment i when j == 0.
- After a match, setting j = 0 instead of j = lps[j-1]: This kills overlap detection. If your pattern is 'AA' and text is 'AAA', you'll miss the match at index 1.
- Forgetting that LPS lengths are always < j: lps[j-1] is always less than j. If your fallback index ever equals or exceeds j, you have a bug in LPS construction.
KMP Variants, Complexity, and When to Use Alternatives
KMP is not always the fastest string matching algorithm for every scenario. Here's how it compares:
- Boyer-Moore: Usually faster for large alphabets because it can skip more characters per mismatch (bad character and good suffix heuristics). Worst-case O(n*m) but rarely hits it.
- Rabin-Karp: Uses rolling hash. Good for multiple pattern search (plagiarism detection), but O(n*m) worst-case from hash collisions.
- Aho-Corasick: Builds a finite automaton from multiple patterns simultaneously. Ideal for virus signature detection or filtering many keywords.
- Z-algorithm: Computes an array Z[i] = longest substring starting at i that matches prefix. Can be used to find all matches in O(n+m) with simpler code than KMP.
KMP remains the best choice when you need guaranteed linear time and low overhead for a single pattern search. It's also the foundation for understanding more complex string algorithms.
Formal O(n) Complexity Proof
The KMP search algorithm guarantees O(n) time because the outer while loop runs at most n times where n is the text length. The key invariant: the text index i never decreases; it either increments (when a character matches or when j=0 and mismatch occurs) or stays the same (when j>0 and a mismatch occurs, causing j to fallback). However, each time j falls back, j decreases by at least 1, and j can only increase when i increments. Since j is bounded by m (pattern length), the total number of fallbacks across the entire search is also bounded by the total number of i increments, which is at most n. Therefore the total number of operations in the search loop is O(n).
More formally: let the search loop have two types of operations: - Type A: i increases by 1 (and j may increase at most 1). This can happen at most n times. - Type B: j decreases by at least 1 (via lps[j-1]). Since j starts at 0 and never exceeds m, and each increase of j (in Type A) is paired with an increase in i, the total number of Type B operations is at most the total number of Type A operations, i.e., at most n. Hence total operations ≤ 2n = O(n).
The LPS construction has an analogous proof: the pointer i only moves forward (at most m times), and the variable len only decreases when backtracking. Each backtrack reduces len by at least 1, and each increase of len is paired with an increment of i. So total backtrack steps ≤ m, giving O(m) for the build phase. Thus overall O(n+m).
String Period and Compression with KMP
One powerful application of the LPS (prefix function) is computing the smallest period of a string. The period of a string s is the smallest integer p such that s[i] = s[i+p] for all i where i+p < |s|. This is equivalent to saying that s is made up of repetitions of a substring of length p. Using the LPS array (or equivalently the prefix function), the smallest period of s is |s| - LPS[|s|-1] if |s| % (|s| - LPS[|s|-1]) == 0, otherwise the string is not periodic.
More concretely: let π denote the prefix function (same as LPS but defined for the whole string, not just prefixes). Then the length of the smallest period is n - π[n-1] if n % (n - π[n-1]) == 0. For example, for s = "ABABAB", n=6, LPS last value = 4 (since "ABAB" is prefix-suffix), period = 6-4=2. Indeed "AB" repeated 3 times.
This property allows string compression: a periodic string can be stored as a base substring and a repeat count, reducing memory. It also helps in finding repeated substrings in DNA sequences or log patterns. In competitive programming, you can solve problems like "Find the smallest period of a string" or "Check if a string can be formed by repeating a substring" using this technique.
Practice Problems
Solidify your understanding of KMP by solving these problems that leverage the algorithm and its variants. They range from direct application to more creative uses of the prefix function. Attempt them in order of difficulty and test against edge cases like overlapping patterns.
Naive Algorithm — The Baseline You Should Outgrow
The naive approach is simple: slide the pattern over the text and check every character. Start at index 0 of the text. Compare pat[0] with txt[0]. If they match, move both pointers. If they mismatch, reset the pattern pointer to 0 and slide the text pointer to the next starting position. Repeat until you've covered the text.
Why does this fail? Redundant re-scans. When you hit a mismatch, you throw away all the progress you made. If the pattern has repeated prefixes, you re-compare characters you already matched. Take pattern "AAAAB" and text "AAAAAAAAAB". The naive approach compares each position from scratch, yielding O(n*m) time. Wasteful.
You need to know this because it frames the problem KMP solves. KMP doesn't invent new magic — it just refuses to redo work it already did. The naive algorithm is the production trap every junior falls into: "It works on small data, so it's fine." Until your log files hit a gigabyte.
The complexity is O(n*m) worst-case. For n = 10^6 and m = 10^4, that's 10^10 operations — seconds become hours.
How Does KMP Actually Work? — The Mechanism That Eliminates Backtracking
KMP works because it exploits the pattern's own structure. When a mismatch occurs, the naive algorithm resets the pattern pointer to 0. KMP says: "I already matched some characters. Let me reuse that match."
The key is the LPS array (Longest Proper Prefix which is also Suffix). This array tells you, for each prefix of the pattern, the length of the longest proper prefix that is also a suffix of that prefix. Why should you care? When you encounter a mismatch, you don't start from zero. You jump to the length stored in LPS[j-1] and continue comparing from there. That's the entire trick.
Here's the flow: scan text with index i and pattern with index j. If characters match, increment both. If mismatch and j > 0, set j = lps[j-1] — don't move i. If mismatch and j == 0, increment i. That's it. The text pointer never goes backward. That's what gives you O(n).
Think of it as a state machine: the pattern is the state, the LPS table is the transition table. When you fail, you transition to an earlier state that still matches the suffix you've already seen. You never re-examine the text.
The LPS construction also runs in O(m) using a similar idea. It's self-referential — you build it by comparing the pattern to itself. Any overlap you find becomes the jump distance.
What the LPS Table Buys You — A Preprocessing Overview That Actually Matters
The LPS (Longest Prefix Suffix) table is the engine of KMP. Without it, you're just doing a clever backtrack. With it, you skip characters that you already know match, shifting the pattern without re-scanning the text. The preprocessing step constructs this table in O(m) time, where m is the pattern length. That's it — linear time to build, constant-time lookups per mismatch.
Here's the mental model: for each position in the pattern, you ask, "If I fail here, how far back can I jump?" The answer is the length of the longest proper prefix that is also a suffix of the substring ending at that position. Don't compute this on the fly — precompute it. Every production KMP implementation does this upfront. The table is small, cheap, and separates the amateurs from engineers who ship code that doesn't degrade on repeated searches.
Why this matters in production: if your pattern is fixed and you search millions of documents, preprocessing is a one-time cost. You amortize O(m) over O(n) per search. The naive alternative re-scans text, creating O(n*m) worst-case. That's the difference between a microservice handling 10k requests per second and one that dies under load.
Real-World KMP—Where String Matching Actually Pays the Bills
KMP isn't just an interview chestnut. It's baked into tools you use daily. grep? Uses Boyer-Moore mostly, but KMP variants pop up in libc strstr implementations on embedded systems where memory is tight and patterns are short. Plagiarism checkers? They tokenize documents into strings, then use KMP to find exact substring matches across submissions. Network intrusion detection systems (like Snort) scan packet payloads for attack signatures — KMP gives guaranteed O(n) per packet, no worst-case explosion.
Text editors. When you hit Ctrl+F in Vim or VS Code, the incremental search isn't simple linear scan. It uses variants like Knuth-Morris-Pratt or Two-Way string matching, which bounds backtracking to O(pattern length). If you're building a search feature for a database index or log aggregation tool, KMP is your safety net when the input isn't random — adversarial logs with repeated patterns won't tank your latency.
The real takeaway: use KMP when you need predictable latency on arbitrary input. For random text with average case good, Boyer-Moore is faster. But production systems must handle edge cases — someone WILL paste a million 'a's followed by 'b'. KMP handles that in O(n).
u_strstr() do exactly this — switch strategies based on input size.LPS Off-by-One: The Silent Match Fail in Production Log Parsing
j = lps[j] to j = lps[j-1]. Added a randomised fuzz test that generates random patterns and texts and compares against a naive search — catches exactly these edge cases.- Always test string matching implementations against patterns with repeating prefixes and suffixes.
- Use randomised testing to uncover algorithm bugs that simple unit tests miss.
- The LPS array is indexed by position; after mismatch at j, the fallback is lps[j-1], not lps[j] — write it down on a sticky note until it's muscle memory.
print(build_lps('ABCAB')) # Expected: [0,0,0,1,2]print(kmp_search('ABABCABCABAB', 'ABCAB')) # Expected: [2,5]Key takeaways
Common mistakes to avoid
4 patternsUsing lps[j] instead of lps[j-1] on mismatch
j = lps[j-1] when j > 0 and mismatch occurs. The LPS array is 0-indexed; the longest prefix-suffix for the matched prefix is stored at position j-1.Not incrementing i when j == 0 and mismatch occurs
Setting j = 0 after a match instead of j = lps[j-1]
Incorrect LPS build when pattern[i] != pattern[len] and len == 0
Practice These on LeetCode
Interview Questions on This Topic
Explain what the LPS array stores and how it is computed.
len tracks the length of the previous longest prefix-suffix, and i iterates. When pattern[i] matches pattern[len], we extend: lps[i]=len+1, increment both. When it doesn't match and len>0, we backtrack: len = lps[len-1], keeping i unchanged. Otherwise, lps[i]=0 and i increments.Frequently Asked Questions
20+ years shipping performance-critical code where algorithms decide the bill. Drawn from code that ran under real load.
That's Strings. Mark it forged?
13 min read · try the examples if you haven't