Z-Algorithm for Pattern Matching β Explained with Examples
- Z[i] = length of the longest substring starting at i that matches the prefix of the string.
- The Z-box [l,r) enables O(n) computation by reusing previous comparisons.
- Pattern matching: concatenate P + '$' + T, then positions where Z[i] == len(P) are matches.
The Z-algorithm is competitive programming's go-to string tool. It solves the same problem as KMP in O(n+m), but the Z-array is directly interpretable β you see match lengths explicitly rather than reasoning about failure functions. Engineers who learned through competitive programming reach for Z-algorithm first because it is memorisable under interview pressure.
The Z-box insight that makes it linear is the exact same structural trick as KMP's failure function: maintain a rightmost matched boundary, initialise from the mirror position, only expand what has not yet been proven. Once you see this pattern, you recognise it in KMP, Manacher, and suffix array construction.
The Z-Array
For a string S of length n, the Z-array Z has Z[i] = length of the longest substring starting at S[i] that matches a prefix of S. By convention, Z[0] is undefined (or set to n, the whole string).
Example for S = 'aabxaabxcaabx': - Z[0] = 13 (undefined / whole string) - Z[1] = 1 (a matches prefix 'a') - Z[2] = 0 ('b' β 'a') - Z[3] = 0 ('x' β 'a') - Z[4] = 4 ('aabx' matches prefix 'aabx') - Z[5] = 3 ('abx...' wait, 'a' then 'b'? No β 'abxc' β Z[5]=3)
The Z-algorithm computes this entire array in O(n) time using the Z-box concept.
def build_z_array(s: str) -> list[int]: n = len(s) z = [0] * n z[0] = n l, r = 0, 0 # current Z-box window [l, r) for i in range(1, n): if i < r: # i is inside the Z-box β use previously computed value z[i] = min(r - i, z[i - l]) # Try to extend the match while i + z[i] < n and s[z[i]] == s[i + z[i]]: z[i] += 1 # Update Z-box if extended if i + z[i] > r: l, r = i, i + z[i] return z print(build_z_array('aabxaabxcaabx')) # [13, 1, 0, 0, 8, 1, 0, 0, 0, 4, 1, 0, 0]
The Z-Box β Why it's O(n)
The key to O(n) is the Z-box [l, r) β a window representing the rightmost Z-match found so far. When we compute Z[i] and i is inside [l, r), we already know that s[i..r) matches s[i-l..r-l). So we can initialise Z[i] = min(r-i, Z[i-l]) without re-comparing those characters.
Characters are only compared when we try to extend beyond r. Each successful extension moves r rightward. Since r only moves forward and never back, the total number of character comparisons is O(n).
Pattern Matching with the Concatenation Trick
To find all occurrences of pattern P in text T, create the combined string S = P + '$' + T where '$' is a separator character not in either string. Build the Z-array of S. Any position i in the text portion where Z[i] == len(P) is a match.
The separator '$' ensures Z[i] never exceeds len(P) for positions in the text portion.
def z_search(text: str, pattern: str) -> list[int]: m = len(pattern) # Concatenate: pattern + sentinel + text combined = pattern + '$' + text z = build_z_array(combined) matches = [] for i in range(m + 1, len(combined)): if z[i] == m: matches.append(i - m - 1) # position in original text return matches print(z_search('AABAACAADAABAABA', 'AABA')) # [0, 9, 12] print(z_search('abcabcabc', 'abc')) # [0, 3, 6]
[0, 3, 6]
String Periodicity with Z-Array
The Z-array directly reveals a string's periodic structure. A string S of length n has a period p if S[i] == S[i % p] for all i. The minimum period can be found using the Z-array β the smallest p such that Z[p] == n-p (or Z[p] + p >= n).
def min_period(s: str) -> int: n = len(s) z = build_z_array(s) for p in range(1, n): if z[p] + p >= n: # s[p..] matches prefix of length n-p return p return n # no period shorter than n print(min_period('abababab')) # 2 print(min_period('abcabcabc')) # 3 print(min_period('abcdef')) # 6
3
6
Complexity and Comparison with KMP
Time: O(n) to build Z-array, O(n+m) total for pattern search Space: O(n+m) for the combined string and Z-array
Z vs KMP: - Same asymptotic complexity - Z-array is more direct β Z[i] directly tells you match length - KMP failure function requires more careful reasoning about what it encodes - Z uses more memory (O(n+m) vs O(m) for KMP) - For competitive programming: Z-algorithm often preferred for clarity - For production code: KMP preferred for lower memory
π― Key Takeaways
- Z[i] = length of the longest substring starting at i that matches the prefix of the string.
- The Z-box [l,r) enables O(n) computation by reusing previous comparisons.
- Pattern matching: concatenate P + '$' + T, then positions where Z[i] == len(P) are matches.
- Z-array directly reveals string periodicity β useful in many competitive programming problems.
- Z-algorithm and KMP are equivalent in power; Z is often simpler to implement from memory.
Interview Questions on This Topic
- QWhat does Z[i] represent in the Z-array?
- QWhy is the Z-algorithm O(n) despite having a while loop inside a for loop?
- QHow do you use the Z-algorithm to find all occurrences of a pattern in a text?
- QWhat is the minimum period of 'abcabcabcabc' β find it using the Z-array.
Frequently Asked Questions
Why use '$' as a separator in the concatenation trick?
The separator must be a character not present in either the pattern or text. This ensures the Z-value at any text position cannot exceed len(pattern) β preventing false matches that span the separator boundary.
What happens if we don't use a separator?
Without a separator, Z[i] might exceed len(pattern) by matching into the pattern portion, giving incorrect match positions.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.