Search Autocomplete — Cache Poisoning & Trie Rollback
10 IPs poisoned our autocomplete cache with offensive suggestions in one hour.
- A search autocomplete system predicts completions as users type, serving top-K suggestions within 50ms
- Core data structure: trie for prefix lookups, often combined with prefix hashing for memory/speed trade-off
- Ranking blends global popularity, personal history, and recency — not just frequency
- Caching layers: CDN for static top queries, Redis for hot prefixes, in-memory L1 cache on app servers
- Sharding must preserve prefix locality — consistent hashing on prefix hash minimizes cross-node lookups
- Biggest mistake: assuming depends_on ordering — containers start but DB isn't ready, causing connection pool exhaustion
Imagine you're writing a letter and your friend looks over your shoulder, guessing the next word before you finish. They're not psychic — they've just read thousands of letters and know what words commonly follow others. Search autocomplete is exactly that friend, except instead of one friend it's a system serving millions of people at once, finishing your search query before your fingers can type it.
Every time you type a single character into Google, YouTube, or Amazon's search bar and watch suggestions instantly appear, you're experiencing one of the most latency-sensitive distributed systems in production software. That instant feel isn't magic — it's a carefully engineered pipeline involving in-memory data structures, aggressive multi-layer caching, real-time ranking signals, and sub-50ms response time SLAs enforced globally. Getting it wrong means users feel lag, and lag means they leave. Getting it right means you're invisibly influencing what billions of people search for every day.
The core problem autocomplete solves is bridging the gap between intent and query. Users often don't know exactly what they want to type. A partial string like 'how to fix my' could complete in a hundred directions. The system must predict the most relevant and popular completions in real time, rank them by a blend of global popularity, personal history, and contextual signals — all before the user lifts their finger from the keyboard. It also needs to handle billions of daily queries, top-K ranking, typo tolerance, personalization, and graceful degradation when parts of the system fail.
By the end of this article you'll be able to walk into any senior system design interview and confidently design a production-grade autocomplete system from scratch. You'll understand why a trie beats a relational database for prefix lookups, how to scale it with sharding and caching layers, how ranking signals layer on top of raw prefix matching, and what real failure modes look like in production — including the ones that bite engineers at 3am.
What is a Search Autocomplete System?
A search autocomplete system (also called typeahead) provides real-time search suggestions as the user types. At its core, it's a prefix-matching problem: given a partial input, return the most likely completions. But production reality adds layers: handle billions of unique prefixes daily, rank by multiple signals, support multiple languages, tolerate typos, and do it all under 50ms at p99.
You'll find autocomplete in Google Search, YouTube, Amazon, Twitter, and any large-scale product where reducing friction improves user engagement. The design principles here apply equally to code completion in IDEs and command-line autocomplete.
- Each node represents a character — not a string.
- Shared prefixes share nodes — memory efficient for common prefixes.
- Top-K suggestions per node avoid traversing entire subtree on every keystroke.
- Frequency at leaf nodes aggregates across user sessions.
Data Structures: Trie vs Prefix Hashing
Two dominant approaches for prefix matching: the classic trie and prefix hashing. A trie offers deterministic lookups and easy prefix traversal, but memory can balloon. Prefix hashing uses a hash map keyed by prefix strings (e.g., first 3 characters) and stores lists of completions per key. It's simpler and more cache-friendly but loses prefix-sharing benefits.
Real systems often hybridize: use a compressed trie (radix tree) for hot prefixes and fall back to prefix hash tables for rare prefixes. Google's approach, for example, uses a multi-tiered index with a memory-mapped trie for the top million queries and disk-backed hash tables for long-tail completions. Twitter's autocomplete uses a similar hybrid — in-memory trie for trending topics, disk-based for historical queries.
Ranking: What Order to Show Suggestions?
Users don't just want any completion — they want the most relevant one. Relevance is a blend of signals: global popularity (most searched queries with that prefix), personal history (your past searches), recency (trending queries), and context (location, device). Ranking in real time at 50ms is the hard part.
Most systems precompute a ranking score per query-prefix pair offline using MapReduce or Spark jobs. The score is a weighted sum: w1 log(frequency) + w2 (1 / days_since_last_search) + w3 * personal_boost. Real-time personalization adds an extra fetch from user profile cache at query time, merging personal top queries into the global top-K list before returning.
Caching for Sub-50ms Latency
Autocomplete is the poster child for multi-layer caching. Every microsecond counts. You'll typically see three layers: (1) Browser cache — service worker caches recent autocomplete responses, (2) CDN — edge caches for top 1% of prefixes (e.g., 'a', 'b'), (3) Application cache — Redis or Memcached for hot prefixes not in CDN, (4) In-memory L1 cache in application server — LRU cache with TTL, (5) The trie itself as the source of truth.
Cache invalidation is tricky: when a new query becomes popular (e.g., after a news event), must invalidate all prefixes that contain it. Use a bloom filter to quickly test if a prefix might need invalidation — avoids scanning entire cache.
Sharding and Scaling Across Servers
A single server can't hold the entire autocomplete index for billions of queries. You need to shard. The challenge: prefix lookups cross shards unless you shard intelligently. Common strategy: consistent hashing on the first 2-3 characters of the query. All queries with prefix 'ab' go to shard 7, 'ac' to shard 3, etc. This keeps prefix locality — most lookups hit one shard.
But prefix skew is real: 'a' prefixes are 40% of traffic. Solution: replicate hot shards (e.g., 10 copies of 'a*' shard), and use a load balancer that routes by prefix hash to one of the replicas. For cold prefixes, a single shard per hash range suffices.
To handle high write throughput (ingesting new queries), use a write-ahead log (WAL) before updating the trie, and batch updates to the shard every few seconds. This also handles replica consistency.
Production Incident: Autocomplete Cache Poisoning
A real incident from a large e-commerce platform: autocomplete started showing offensive suggestions for normal queries. Root cause was a cache poisoning attack — attacker sent malicious queries that got cached, then served to all users. The fix required purging all caches and adding input sanitization before caching. This incident led to a new practice: never cache a query's suggestions without validating the output against toxicity filters.
- Always sanitize output before caching: strip profanity, PII, and SQL injection patterns.
- Use a separate freshness score for user-generated queries vs aggregated popular queries.
- Rate-limit query ingestion per user to prevent poisoning at input.
Autocomplete Cache Poisoning at Scale
- Never trust user input — sanitize before storing in any data structure.
- Cache poisoning can happen even without direct write access if the cache is fed by user activity.
- Always have a cache rollback mechanism — versioning is cheap, data loss is expensive.
Key takeaways
Common mistakes to avoid
4 patternsUsing a relational database for prefix lookups
Not handling personalization at query time
Ignoring typo tolerance
No rate limiting on query ingestion
Interview Questions on This Topic
How would you design a search autocomplete system for a global search engine like Google? Focus on data structures, scaling, and ranking.
Frequently Asked Questions
That's Real World. Mark it forged?
4 min read · try the examples if you haven't