Z-Algorithm — Direct Match Lengths vs KMP
Replace KMP's failure function with Z-algorithm's Z-array for exact match lengths — see O(n+m) pattern matching in Python with clear examples.
Imagine a string like 'aabxaabaabx'. The Z-algorithm asks: at each position i, how long is the longest substring starting at i that matches the very beginning of the string? Position 4 gives 'aabx' which matches the start — so Z[4]=4. This single idea, computed efficiently for every position, is enough to find any pattern in any text.
The Z-algorithm is competitive programming's go-to string tool. It solves the same problem as KMP in O(n+m), but the Z-array is directly interpretable — you see match lengths explicitly rather than reasoning about failure functions. Engineers who learned through competitive programming reach for Z-algorithm first because it is memorisable under interview pressure.
The Z-box insight that makes it linear is the exact same structural trick as KMP's failure function: maintain a rightmost matched boundary, initialise from the mirror position, only expand what has not yet been proven. Once you see this pattern, you recognise it in KMP, Manacher, and suffix array construction.
How the Z-Algorithm Works — Plain English and Steps
The Z-algorithm computes a Z-array for a string S where Z[i] is the length of the longest substring starting at S[i] that is also a prefix of S. For example, in S='aabxaa', Z[4]=2 because S[4..5]='aa' matches the prefix 'aa'.
For pattern matching, concatenate pattern + '$' + text (the '$' is a separator not in either string). Compute the Z-array. Any position i in the text portion where Z[i] == len(pattern) is a match.
Algorithm to build the Z-array in O(n): 1. Z[0] is undefined (or set to n by convention). Maintain a window [l, r] — the rightmost Z-box found so far. 2. For each i from 1 to n-1: a. If i > r: compute Z[i] naively by matching S[i..] against S[0..]. Update l=i, r=i+Z[i]-1. b. Else (i is inside the current Z-box [l,r]): - Let k = i - l (position of i within the box, mirrored in prefix). - If Z[k] < r - i + 1: Z[i] = Z[k] (the match is entirely inside the box). - Else: extend naively from r+1 and update l and r.
The Z-box [l,r] allows us to reuse previous match results — this is what gives O(n) time instead of O(n^2).
Worked Example — Z-Array for 'aabcaab'
String S = 'aabcaab' (indices 0-6)
Z[0] = 7 by convention (entire string matches itself).
i=1: S[1..]='abcaab'. Match against prefix 'aabcaab'. S[1]='a'==S[0]='a', S[2]='b'==S[1]='a'? No. Z[1]=1. l=1,r=1. i=2: S[2..]='bcaab'. S[2]='b'!=S[0]='a'. Z[2]=0. i=3: S[3..]='caab'. S[3]='c'!=S[0]='a'. Z[3]=0. i=4: i>r(=1). S[4..]='aab'. S[4]='a'==S[0], S[5]='a'==S[1], S[6]='b'==S[2]? Yes! Z[4]=3. l=4,r=6. i=5: i=5 is inside [4,6]. k=5-4=1. Z[k]=Z[1]=1. Check: r-i+1=6-5+1=2. Z[1]=1 < 2. Z[5]=Z[1]=1. i=6: i=6 is inside [4,6]. k=6-4=2. Z[k]=Z[2]=0. Z[2]=0 < r-i+1=1. Z[6]=0.
Z-array: [7, 1, 0, 0, 3, 1, 0]
Pattern matching example: find 'aa' in text 'aabcaab'. Concatenated: 'aa$aabcaab' (length 10) Z-array: [10,1,0,2,1,0,0,3,1,0] Pattern length = 2. Look for Z[i] == 2 in the text portion (indices >= 3). Z[3]=2 -> match at text index 3-3=0. Z[7]=3 >= 2 -> match at text index 7-3=4. Matches at indices 0 and 4 in the original text.
Z-Algorithm Implementation
The implementation maintains the rightmost Z-box [l, r] to avoid redundant comparisons.
Key takeaways
Interview Questions on This Topic
Frequently Asked Questions
That's Strings. Mark it forged?
3 min read · try the examples if you haven't