Homeβ€Ί DSAβ€Ί Z-Algorithm for Pattern Matching β€” Explained with Examples

Z-Algorithm for Pattern Matching β€” Explained with Examples

Where developers are forged. Β· Structured learning Β· Free forever.
πŸ“ Part of: Strings β†’ Topic 3 of 8
Learn the Z-Algorithm for string pattern matching.
βš™οΈ Intermediate β€” basic DSA knowledge assumed
In this tutorial, you'll learn:
  • Z[i] = length of the longest substring starting at i that matches the prefix of the string.
  • The Z-box [l,r) enables O(n) computation by reusing previous comparisons.
  • Pattern matching: concatenate P + '$' + T, then positions where Z[i] == len(P) are matches.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
⚑ Quick Answer
Imagine a string like 'aabxaabaabx'. The Z-algorithm asks: at each position i, how long is the longest substring starting at i that matches the very beginning of the string? Position 4 gives 'aabx' which matches the start β€” so Z[4]=4. This single idea, computed efficiently for every position, is enough to find any pattern in any text.

The Z-algorithm is competitive programming's go-to string tool. It solves the same problem as KMP in O(n+m), but the Z-array is directly interpretable β€” you see match lengths explicitly rather than reasoning about failure functions. Engineers who learned through competitive programming reach for Z-algorithm first because it is memorisable under interview pressure.

The Z-box insight that makes it linear is the exact same structural trick as KMP's failure function: maintain a rightmost matched boundary, initialise from the mirror position, only expand what has not yet been proven. Once you see this pattern, you recognise it in KMP, Manacher, and suffix array construction.

The Z-Array

For a string S of length n, the Z-array Z has Z[i] = length of the longest substring starting at S[i] that matches a prefix of S. By convention, Z[0] is undefined (or set to n, the whole string).

Example for S = 'aabxaabxcaabx': - Z[0] = 13 (undefined / whole string) - Z[1] = 1 (a matches prefix 'a') - Z[2] = 0 ('b' β‰  'a') - Z[3] = 0 ('x' β‰  'a') - Z[4] = 4 ('aabx' matches prefix 'aabx') - Z[5] = 3 ('abx...' wait, 'a' then 'b'? No β€” 'abxc' β†’ Z[5]=3)

The Z-algorithm computes this entire array in O(n) time using the Z-box concept.

z_array.py Β· PYTHON
1234567891011121314151617181920
def build_z_array(s: str) -> list[int]:
    n = len(s)
    z = [0] * n
    z[0] = n
    l, r = 0, 0  # current Z-box window [l, r)

    for i in range(1, n):
        if i < r:
            # i is inside the Z-box β€” use previously computed value
            z[i] = min(r - i, z[i - l])
        # Try to extend the match
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Update Z-box if extended
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

print(build_z_array('aabxaabxcaabx'))
# [13, 1, 0, 0, 8, 1, 0, 0, 0, 4, 1, 0, 0]
β–Ά Output
[13, 1, 0, 0, 8, 1, 0, 0, 0, 4, 1, 0, 0]

The Z-Box β€” Why it's O(n)

The key to O(n) is the Z-box [l, r) β€” a window representing the rightmost Z-match found so far. When we compute Z[i] and i is inside [l, r), we already know that s[i..r) matches s[i-l..r-l). So we can initialise Z[i] = min(r-i, Z[i-l]) without re-comparing those characters.

Characters are only compared when we try to extend beyond r. Each successful extension moves r rightward. Since r only moves forward and never back, the total number of character comparisons is O(n).

πŸ”₯
Z-Box vs KMP Failure FunctionBoth the Z-box and KMP's failure function are mechanisms to reuse previously computed match information. The Z-array is often considered more intuitive β€” you directly see how far each position matches the prefix.

Pattern Matching with the Concatenation Trick

To find all occurrences of pattern P in text T, create the combined string S = P + '$' + T where '$' is a separator character not in either string. Build the Z-array of S. Any position i in the text portion where Z[i] == len(P) is a match.

The separator '$' ensures Z[i] never exceeds len(P) for positions in the text portion.

z_search.py Β· PYTHON
12345678910111213
def z_search(text: str, pattern: str) -> list[int]:
    m = len(pattern)
    # Concatenate: pattern + sentinel + text
    combined = pattern + '$' + text
    z = build_z_array(combined)
    matches = []
    for i in range(m + 1, len(combined)):
        if z[i] == m:
            matches.append(i - m - 1)  # position in original text
    return matches

print(z_search('AABAACAADAABAABA', 'AABA'))  # [0, 9, 12]
print(z_search('abcabcabc', 'abc'))           # [0, 3, 6]
β–Ά Output
[0, 9, 12]
[0, 3, 6]

String Periodicity with Z-Array

The Z-array directly reveals a string's periodic structure. A string S of length n has a period p if S[i] == S[i % p] for all i. The minimum period can be found using the Z-array β€” the smallest p such that Z[p] == n-p (or Z[p] + p >= n).

z_periodicity.py Β· PYTHON
1234567891011
def min_period(s: str) -> int:
    n = len(s)
    z = build_z_array(s)
    for p in range(1, n):
        if z[p] + p >= n:  # s[p..] matches prefix of length n-p
            return p
    return n  # no period shorter than n

print(min_period('abababab'))  # 2
print(min_period('abcabcabc')) # 3
print(min_period('abcdef'))    # 6
β–Ά Output
2
3
6

Complexity and Comparison with KMP

Time: O(n) to build Z-array, O(n+m) total for pattern search Space: O(n+m) for the combined string and Z-array

Z vs KMP: - Same asymptotic complexity - Z-array is more direct β€” Z[i] directly tells you match length - KMP failure function requires more careful reasoning about what it encodes - Z uses more memory (O(n+m) vs O(m) for KMP) - For competitive programming: Z-algorithm often preferred for clarity - For production code: KMP preferred for lower memory

🎯 Key Takeaways

  • Z[i] = length of the longest substring starting at i that matches the prefix of the string.
  • The Z-box [l,r) enables O(n) computation by reusing previous comparisons.
  • Pattern matching: concatenate P + '$' + T, then positions where Z[i] == len(P) are matches.
  • Z-array directly reveals string periodicity β€” useful in many competitive programming problems.
  • Z-algorithm and KMP are equivalent in power; Z is often simpler to implement from memory.

Interview Questions on This Topic

  • QWhat does Z[i] represent in the Z-array?
  • QWhy is the Z-algorithm O(n) despite having a while loop inside a for loop?
  • QHow do you use the Z-algorithm to find all occurrences of a pattern in a text?
  • QWhat is the minimum period of 'abcabcabcabc' β€” find it using the Z-array.

Frequently Asked Questions

Why use '$' as a separator in the concatenation trick?

The separator must be a character not present in either the pattern or text. This ensures the Z-value at any text position cannot exceed len(pattern) β€” preventing false matches that span the separator boundary.

What happens if we don't use a separator?

Without a separator, Z[i] might exceed len(pattern) by matching into the pattern portion, giving incorrect match positions.

πŸ”₯
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousKMP Algorithm β€” Knuth-Morris-Pratt String MatchingNext β†’Boyer-Moore String Search Algorithm
Forged with πŸ”₯ at TheCodeForge.io β€” Where Developers Are Forged