Z-Algorithm — Direct Match Lengths vs KMP
- Z[i] = length of the longest prefix of S that matches S starting at position i.
- For pattern matching: concatenate pattern + '$' + text. Any position with Z[i] == m is a match.
- The Z-box [l,r] allows O(n) construction by reusing previously computed matches.
Imagine a string like 'aabxaabaabx'. The Z-algorithm asks: at each position i, how long is the longest substring starting at i that matches the very beginning of the string? Position 4 gives 'aabx' which matches the start — so Z[4]=4. This single idea, computed efficiently for every position, is enough to find any pattern in any text.
The Z-algorithm is competitive programming's go-to string tool. It solves the same problem as KMP in O(n+m), but the Z-array is directly interpretable — you see match lengths explicitly rather than reasoning about failure functions. Engineers who learned through competitive programming reach for Z-algorithm first because it is memorisable under interview pressure.
The Z-box insight that makes it linear is the exact same structural trick as KMP's failure function: maintain a rightmost matched boundary, initialise from the mirror position, only expand what has not yet been proven. Once you see this pattern, you recognise it in KMP, Manacher, and suffix array construction.
How the Z-Algorithm Works — Plain English and Steps
The Z-algorithm computes a Z-array for a string S where Z[i] is the length of the longest substring starting at S[i] that is also a prefix of S. For example, in S='aabxaa', Z[4]=2 because S[4..5]='aa' matches the prefix 'aa'.
For pattern matching, concatenate pattern + '$' + text (the '$' is a separator not in either string). Compute the Z-array. Any position i in the text portion where Z[i] == len(pattern) is a match.
Algorithm to build the Z-array in O(n): 1. Z[0] is undefined (or set to n by convention). Maintain a window [l, r] — the rightmost Z-box found so far. 2. For each i from 1 to n-1: a. If i > r: compute Z[i] naively by matching S[i..] against S[0..]. Update l=i, r=i+Z[i]-1. b. Else (i is inside the current Z-box [l,r]): - Let k = i - l (position of i within the box, mirrored in prefix). - If Z[k] < r - i + 1: Z[i] = Z[k] (the match is entirely inside the box). - Else: extend naively from r+1 and update l and r.
The Z-box [l,r] allows us to reuse previous match results — this is what gives O(n) time instead of O(n^2).
Worked Example — Z-Array for 'aabcaab'
String S = 'aabcaab' (indices 0-6)
Z[0] = 7 by convention (entire string matches itself).
i=1: S[1..]='abcaab'. Match against prefix 'aabcaab'. S[1]='a'==S[0]='a', S[2]='b'==S[1]='a'? No. Z[1]=1. l=1,r=1. i=2: S[2..]='bcaab'. S[2]='b'!=S[0]='a'. Z[2]=0. i=3: S[3..]='caab'. S[3]='c'!=S[0]='a'. Z[3]=0. i=4: i>r(=1). S[4..]='aab'. S[4]='a'==S[0], S[5]='a'==S[1], S[6]='b'==S[2]? Yes! Z[4]=3. l=4,r=6. i=5: i=5 is inside [4,6]. k=5-4=1. Z[k]=Z[1]=1. Check: r-i+1=6-5+1=2. Z[1]=1 < 2. Z[5]=Z[1]=1. i=6: i=6 is inside [4,6]. k=6-4=2. Z[k]=Z[2]=0. Z[2]=0 < r-i+1=1. Z[6]=0.
Z-array: [7, 1, 0, 0, 3, 1, 0]
Pattern matching example: find 'aa' in text 'aabcaab'. Concatenated: 'aa$aabcaab' (length 10) Z-array: [10,1,0,2,1,0,0,3,1,0] Pattern length = 2. Look for Z[i] == 2 in the text portion (indices >= 3). Z[3]=2 -> match at text index 3-3=0. Z[7]=3 >= 2 -> match at text index 7-3=4. Matches at indices 0 and 4 in the original text.
Z-Algorithm Implementation
The implementation maintains the rightmost Z-box [l, r] to avoid redundant comparisons.
def build_z_array(s): n = len(s) z = [0] * n z[0] = n l = r = 0 for i in range(1, n): if i < r: z[i] = min(r - i, z[i - l]) while i + z[i] < n and s[z[i]] == s[i + z[i]]: z[i] += 1 if i + z[i] > r: l, r = i, i + z[i] return z def z_search(text, pattern): if not pattern: return [] combined = pattern + '$' + text z = build_z_array(combined) m = len(pattern) return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] == m] text = 'aabcaab' pattern = 'aa' print('Z-array:', build_z_array('aabcaab')) print('Matches at:', z_search(text, pattern))
Matches at: [0, 4]
🎯 Key Takeaways
- Z[i] = length of the longest prefix of S that matches S starting at position i.
- For pattern matching: concatenate pattern + '$' + text. Any position with Z[i] == m is a match.
- The Z-box [l,r] allows O(n) construction by reusing previously computed matches.
- Time complexity: O(n+m). Space: O(n+m) for the combined string and Z-array.
- Conceptually simpler than KMP — one array instead of two phases.
Interview Questions on This Topic
- QWhat does Z[i] represent in the Z-array?
- QWhy is the Z-algorithm O(n) despite having a while loop inside a for loop?
- QHow do you use the Z-algorithm to find all occurrences of a pattern in a text?
- QWhat is the minimum period of 'abcabcabcabc' — find it using the Z-array.
Frequently Asked Questions
What is the Z-array used for besides pattern matching?
Z-arrays are used for: finding all occurrences of a pattern, computing string periods (the smallest period of S is the smallest p where Z[p] = n-p), string compression problems, and as a building block for suffix arrays.
How does the Z-algorithm compare to KMP?
Both are O(n+m) for pattern matching. Z-algorithm is arguably simpler to implement correctly once you understand the Z-box concept. KMP's failure function is more widely taught in interviews. For pattern matching, they are equivalent in performance. Z-algorithm generalizes more naturally to multiple pattern problems when combined with Aho-Corasick.
Why use '$' as a separator in pattern matching?
The separator prevents the Z-values at the junction of pattern and text from being inflated by matching across the boundary. The separator character must not appear in either the pattern or the text. Any character not in the alphabet works — '$' and '#' are common choices.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.