TinyURL generates short codes via Base62 encoding of a unique 64-bit ID, guaranteeing no collisions.
The system is read-heavy (100:1 ratio) — choose NoSQL (Cassandra) with Redis LRU caching.
Distributed ID generation (Snowflake/ZooKeeper) is the backbone for collision-free scale.
301 redirects let browsers cache the mapping, reducing server load; 302 redirects pass through for analytics.
Biggest mistake: using MD5 hashing for code generation — collisions force retry loops at scale.
✦ Definition~90s read
What is Design TinyURL?
This article tackles the TinyURL system design interview question, but not as a simple URL shortening exercise. The real test is your ability to handle a cache stampede — the moment a shortened link goes viral and thousands of requests hit your service simultaneously before the cache is warm.
★
Imagine every website address is a long home address like '123 Sunflower Lane, Apartment 4B, Springfield, Illinois, 62701, USA'.
Most candidates can describe Base62 encoding and hash-based key generation, but they fail to explain how to survive the first 10 seconds of a Twitter-scale spike. This article focuses on that failure mode: what happens when your Redis cluster gets hammered by 100k concurrent reads for a key that doesn't exist yet, and every request falls through to the database, taking it down in seconds.
You'll learn concrete strategies like request coalescing, pre-warming caches via analytics pipelines, and using distributed ID generation (Snowflake-style) to avoid collision and enable sharding. The article also covers why you'd choose hashing over Base62 for real-world systems (hint: Base62 is a toy for interviews, not production), and how to build a click-tracking pipeline that doesn't degrade write performance during a viral event.
By the end, you'll understand that TinyURL design is a microcosm of distributed systems failure modes — not just a CRUD app with short strings.
Plain-English First
Imagine every website address is a long home address like '123 Sunflower Lane, Apartment 4B, Springfield, Illinois, 62701, USA'. TinyURL is like a nickname system — you tell the post office 'call that address #XK9' and now anyone who says '#XK9' gets redirected to the full address instantly. The post office (the server) keeps a giant lookup book that maps short nicknames to long addresses. That's the whole system — a glorified, globally-distributed lookup book that has to handle billions of lookups per day without breaking a sweat.
Every senior engineer has sat across from an interviewer who says 'design a URL shortener' with a calm smile. It sounds trivial — take a long URL, make it short. But behind that smile is a question that probes distributed systems, database design, caching strategy, hash collision handling, rate limiting, analytics, and horizontal scaling simultaneously. Bit.ly processes over 600 million redirects per day. TinyURL has been alive since 2002. These systems are deceptively simple on the surface and genuinely hard to build correctly at scale.
The core problem is a deceptively asymmetric one: writes are rare, reads are overwhelmingly frequent. When you shorten a URL, that's a one-time write. But that short link might be embedded in a viral tweet and hit 10 million times in an hour. Your design has to reflect this read-heavy reality — every architectural choice from your hashing scheme to your cache eviction policy flows from that single insight.
By the end of this article you'll be able to walk into any system design interview and design TinyURL end-to-end: justify your short code generation strategy, design a DB schema that survives traffic spikes, build a caching layer that handles 99% of reads from memory, handle custom aliases and expiration, discuss analytics pipelines, and correctly answer every follow-up an interviewer throws at you. Let's build it.
Why TinyURL Design Tests More Than URL Shortening
The TinyURL design interview asks you to architect a URL shortening service — a system that maps long URLs to short, unique aliases and redirects clients on access. The core mechanic is a key-value lookup: given a short key (e.g., 7 characters from base62), return the original URL and issue an HTTP 302 redirect. This problem is a systems design classic because it forces you to reason about read-heavy workloads, collision-free key generation, and caching under extreme traffic.
In practice, the service must handle billions of writes (new URLs) and tens of billions of reads (redirects). Key properties that matter: key generation must be idempotent and collision-resistant (using distributed counters or pre-generated keys), redirect latency must stay under 10ms at P99, and the system must survive traffic spikes from viral links. A naive cache with a single Redis instance will collapse under a cache stampede when a popular link goes viral — every miss triggers a database read, overwhelming the DB and causing cascading failures.
You use this design pattern when you need a globally unique, short identifier for a resource and expect asymmetric read/write ratios (100:1 or higher). It matters in real systems because the same principles apply to CDN edge caching, distributed ID generation (Snowflake), and rate-limited API gateways. Getting the cache invalidation and key distribution wrong is the #1 cause of production outages in URL shorteners.
Cache Stampede Is Not a Cache Miss
A cache stampede occurs when thousands of concurrent requests miss cache simultaneously — the DB sees a sudden flood, not a single miss.
Production Insight
A viral tweet drives 50k req/s to a single short link; the cache layer (Redis) has a TTL of 1 hour, but the link was created 59 minutes ago — all requests miss simultaneously and hit the database.
The symptom: database connection pool exhaustion, 5xx errors for all redirects, and a 10-minute outage until the cache repopulates.
Rule of thumb: never rely on TTL alone for viral links — use a background reaper or probabilistic early expiration (e.g., XFetch) to refresh the cache before expiry.
Key Takeaway
Key generation must be collision-free and idempotent — use a distributed counter or pre-generated key pool, not random strings.
Cache stampede is the primary failure mode — design for it with early refresh, not just longer TTLs.
Redirect latency is the SLA — every hop (DNS, cache, DB) must be optimized for sub-10ms P99.
thecodeforge.io
TinyURL Design: Cache Stampede & Viral Link Failures
Design Tinyurl Interview
The Core Logic: Base62 Encoding vs. Hashing
In a URL shortener, the 'Magic' is how we generate the tiny string. You have two main paths: Hashing (MD5/SHA-256) or Base62 Encoding a unique ID. Hashing often leads to collisions that require complex 'check-and-retry' logic. The industry-standard approach is to use a distributed ID generator (like a Snowflake ID or a centralized Range Manager) and convert that numeric ID into a Base62 string (a-z, A-Z, 0-9).
For example, an ID like 125 converted to Base62 results in a short, predictable, and unique string. To prevent predictability (so people can't guess the 'next' URL), we can add a bit of salt or shuffle our Base62 alphabet.
io.thecodeforge.shortener.Base62Encoder.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
package io.thecodeforge.shortener;
/**
* TheCodeForgeProduction-GradeBase62Encoder
* Converts a unique LongID into a 7-character short code.
*/
publicclassBase62Encoder {
privatestaticfinalStringALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
privatestaticfinalintBASE = ALPHABET.length();
publicstaticStringencode(long id) {
StringBuilder sb = newStringBuilder();
while (id > 0) {
sb.append(ALPHABET.charAt((int) (id % BASE)));
id /= BASE;
}
// Pad to ensure consistent length if required by business logicwhile (sb.length() < 7) {
sb.append(ALPHABET.charAt(0));
}
return sb.reverse().toString();
}
publicstaticvoidmain(String[] args) {
long uniqueId = 56800235584L; // Example ID from a distributed generatorSystem.out.println("Short Code for " + uniqueId + ": " + encode(uniqueId));
}
}
Output
Short Code for 56800235584: dXp8Baa
Forge Tip: Collision Prevention
If you use MD5, even the first 7 characters will eventually collide. Using a Counter-based approach with Base62 encoding guarantees uniqueness as long as your counter is globally unique (e.g., using ZooKeeper to manage ID ranges).
Production Insight
Using MD5 for short code generation leads to collision retries that spike write latency from <10ms to >100ms at scale.
A production system using MD5 with retries once hit a 50% fail rate under 100k writes/min because collision checks consumed DB connections.
Rule: always use a unique ID generator + deterministic Base62 encoding for write paths.
Key Takeaway
Base62 encoding of a unique ID is the most robust way to generate short codes.
Hashing leads to collisions that become a scaling bottleneck.
Rule: never use hash-based codes for a shortener beyond prototyping.
Choosing Code Generation Strategy
IfNeed deterministic, collision-free codes
→
UseUse Base62 encoding of a globally unique 64-bit ID
IfNeed stateless, no external ID generator
→
UseUse hash (MD5/SHA) with retry logic — but accept collision overhead
IfUser wants a custom alias (e.g., /mybrand)
→
UseReserve a namespace in DB, check uniqueness, allow manual input
Data Layer Strategy: Handling Scale and Redirection
Since this is a read-heavy system (100:1 read/write ratio), our database choice and caching strategy are critical. We use a NoSQL database like Cassandra or a sharded MongoDB for the URL mappings because we don't need complex joins—just a simple Key-Value lookup.
To achieve sub-millisecond redirects, we put a Redis cache in front of the database. We use an LRU (Least Recently Used) eviction policy because in the real world, 20% of the links (the viral ones) will generate 80% of the traffic.
SchemaDesign.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- io.thecodeforge.shortener - Database Schema-- Optimized for NoSQL or Sharded SQLCREATETABLE io_thecodeforge.url_mapping (
short_key VARCHAR(7) PRIMARYKEY,
original_url TEXTNOTNULL,
user_id BIGINT,
created_at TIMESTAMPDEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP,
click_count BIGINTDEFAULT0
);
-- Secondary Index for User ManagementCREATEINDEX idx_user_urls ON io_thecodeforge.url_mapping(user_id);
Output
Table created. In production, 'short_key' would be the shard key.
Interview Gold:
Mention 301 vs 302 redirects. Use 301 (Permanent) if you want the browser to cache the redirect and reduce server load. Use 302 (Temporary) if you need to track every single click for analytics.
Production Insight
A 302 redirect on a viral link can cause up to 10x more requests to your server than a 301.
Bit.ly uses 301 for the first hit to let browsers cache, then uses 302 for subsequent requests with a cookie to track users.
Trade-off: 301 means you lose ability to update the destination URL without changing the short code.
Key Takeaway
Choose your HTTP status code based on analytics needs.
301 reduces load at the cost of flexibility.
Rule: for production shorteners, prefer 302 with a smart client-side caching strategy.
Redirect Status Code Decision
IfNeed high throughput, minimal analytics
→
UseUse 301 — browsers cache it, reducing server load
IfNeed per-click analytics or URL updatability
→
UseUse 302 — every request hits your server
IfHybrid approach
→
UseUse 301 for first redirect, then client-side redirect with 302 for subsequent visits
Distributed ID Generation: The Backbone of Uniqueness
The unique ID that feeds into Base62 encoding must be globally unique across all servers. A simple auto-increment DB column doesn't scale — you'd have a single point of contention. The standard pattern is to use a distributed ID generator. Two common approaches: Snowflake (Twitter's algorithm) and ZooKeeper-managed ID ranges.
Snowflake generates 64-bit IDs: timestamp (41 bits) + machine ID (10 bits) + sequence (12 bits). This gives 4096 IDs per millisecond per machine, and the IDs are time-sortable. ZooKeeper assigns a range of IDs (e.g., 0-100000) to each app server; when exhausted, the server requests a new range. Both avoid collisions without a central DB write bottleneck.
In production, you'll also want to make the short code appear random. You can shuffle the Base62 alphabet permanently or XOR the ID with a secret before encoding. That prevents users from guessing sequential short codes and scraping all URLs.
Snowflake: each server gets a unique machine ID and produces tickets from its own counter — no coordination needed.
ZooKeeper: servers request fresh ticket blocks from a central coordinator. ZooKeeper is the single source of truth for block allocation.
Both methods guarantee collision-free IDs without a central DB sequence bottleneck.
Shuffle the Base62 alphabet to obscure sequential IDs from users.
Clock skew in Snowflake can cause ID collisions or negative timestamps — use NTP and monitor clock drift.
Production Insight
A production Snowflake ID generator experienced a 2-second clock skew causing 2000 duplicate IDs in 10 minutes.
The fix was to add a clock skew detection and alerting, plus a fallback to ZooKeeper for critical writes.
Rule: always monitor clock drift in Snowflake deployments and have a fallback ID generator for edge cases.
Key Takeaway
Distributed ID generation (Snowflake/ZooKeeper) is the heart of a collision-free system.
Clock skew is your enemy — monitor it.
Rule: never use a single DB sequence for ID generation in a distributed shortener.
ID Generation Strategy
IfNeed time-sortable IDs and low latency
→
UseUse Snowflake — fast, no network calls, but requires clock monitoring
IfNeed simple, no clock dependency
→
UseUse ZooKeeper ID ranges — more network overhead but safer
IfRunning on cloud with perfect NTP
→
UseSnowflake is fine. Add a ZooKeeper-based fallback for safety.
Caching Strategy: Surviving the Viral Spike
We already mentioned a Redis cluster with LRU eviction. But to really survive a viral spike, you need a multi-layer caching strategy. The first layer is an in-memory cache (like Caffeine or Guava on each application server) that holds the hottest entries with a very short TTL (1-2 seconds). The second layer is a Redis cluster, and the third is the database.
When a request arrives, the app server checks its local cache first. On miss, it queries Redis. On Redis miss, it queries the database and then populates both caches. To prevent a stampede (thundering herd) when a cached key expires, use a distributed lock or a get-or-compute pattern. Only one thread should reload a cache entry; others should wait or serve a stale value.
For short codes that go viral, you can proactively pin them to dedicated cache nodes or increase their priority. Use consistent hashing for the Redis cluster so that adding nodes doesn't cause mass cache invalidation.
Local cache returns <1ms, Redis ~5ms, DB ~50ms. Two-layer cache ensures 99.9% hits.
Cache Stampede Prevention
Use a 'probabilistic early recompute' or a distributed lock (Redis SETNX with TTL) to ensure only one thread reloads a cache entry. Otherwise, 10k concurrent requests all hit the DB when a hot key expires.
Production Insight
A 100ms DB query multiplied by 10k concurrent requests = 1000 seconds of cumulative DB time. That's how a cache stampede takes down your database.
Add a local cache with a 2-second TTL to absorb this spike; the DB only sees ~1 request per 2 seconds per key.
Rule: always layer caches and use stampede protection for any key that can go viral.
Key Takeaway
Read-heavy systems require heavy caching (Redis + local) and stampede protection.
A two-layer cache with coalescing is the minimum for production scale.
Rule: if you only have one cache layer, you have a single point of failure.
Cache Layer Placement
IfApplication servers < 100, request rate < 50k/s
→
UseSingle Redis cluster with LRU eviction is sufficient
IfNeed to survive 10x spikes
→
UseAdd local in-memory cache with short TTL
IfViral links generate >100k req/s on one key
→
UsePin that key to a dedicated Redis node, use consistent hashing
Analytics and Click Tracking Pipeline
A URL shortener is not just about redirection — it's a data business. Every click is valuable analytical data: geo-location, referrer, user agent, timestamp. You can't afford to write this data synchronously during a redirect (that would add latency). The pattern is asynchronous: the web server publishes a click event to a message queue (Kafka) and returns the 302/301 immediately. A separate consumer processes these events and updates the click_count in the database and aggregates data for dashboards.
Kafka topics can be partitioned by short key to maintain ordering per URL. The consumer can batch updates to the database (e.g., update click_count = click_count + 1 for 100 events at once). For real-time analytics, use a stream processor (Spark Streaming, Flink) to compute counters down to 1-minute granularity.
We also need to handle deduplication: users may refresh or multiple bots may click. Use a combination of IP + user agent + timestamp window to filter duplicates, or accept a small error percentage (most shorteners tolerate 1-2% overcount).
Click event published asynchronously. Redirect latency stays under 5ms.
Analytics Precision vs Latency
Using 302 redirects for every click ensures accurate analytics but increases server load. A compromise: use 301 for the first visit (browser caches) and a JavaScript pixel or service worker for subsequent visits to track them without server overhead.
Production Insight
A production shortener used synchronous click counting in the redirect handler. When a viral link hit, the DB write caused the redirect to take 2 seconds, making Twitter's crawler time out and report the link as broken.
Moving click counting to an async queue resolved the issue and cut redirect latency from 2s to 4ms.
Rule: never write synchronously in a redirect path — use async queues for analytics.
Key Takeaway
Analytics pipeline must be async to decouple from redirect performance.
Use Kafka/Flink for scalable click processing.
Rule: never mix the read path with the write path for analytics.
Click Tracking Strategy
IfClick accuracy critical, low request volume
→
UseUse synchronous DB update on redirect (acceptable for <10k req/s)
IfHigh volume, need accurate per-click data
→
UsePublish to Kafka, batch updates to DB every 1 second
IfHundreds of millions of clicks daily
→
UseUse stream processing (Flink) for real-time aggregation, write only aggregated counts to DB
Why Your First Base62 Implementation Will Burn in Production
Every junior engineer starts with the same trap: integer ID → Base62 string → short URL. Simple. Elegant. Wrong for any system that survives more than a single server reboot.
The problem? The conversion is reversible. Anyone who gets a short URL can enumerate your entire ID space. They can scrape every URL you've ever shortened. Your competitor can map your traffic patterns. Your private links become public.
Production systems don't use sequential IDs for exactly this reason. You need unpredictable short codes. The industry standard is a random token (system-generated UUID or Snowflake ID) that has zero correlation to the storage key. Base62 only enters the picture when you need a human-readable representation of that random token.
But wait — random tokens collide. That's fine. You detect the collision, regenerate, and retry. At 62^6 possibilities, collisions are statistically irrelevant at any sane scale. The real cost is the retry overhead in your write path.
UnpredictableShortCode.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// io.thecodeforge — interview tutorial
import uuid
import base62
from typing importOptionalclassShortCodeGenerator:
def__init__(self, max_retries: int = 3):
self.max_retries = max_retries
self._seen_codes = set()
defgenerate(self) -> str:
for attempt inrange(self.max_retries):
token = uuid.uuid4().int & ((1 << 36) - 1) # 36-bit random
code = base62.encode(token)[:6]
if code notinself._seen_codes:
self._seen_codes.add(code)
return code
raiseRuntimeError(f"Collision after {self.max_retries} retries — improbable at scale")
gen = ShortCodeGenerator()
for _ inrange(5):
print(f"Short code: {gen.generate()}")
Output
Short code: aB3xYz
Short code: 9kLmNp
Short code: qRsT7U
Short code: wXyZ01
Short code: v2W3Xy
Production Trap:
Never expose your database primary key as a short URL. It's not security through obscurity — it's no security at all. Load balancer logs + a weekend of enumeration = your competitor has your entire URL inventory.
Key Takeaway
Short codes must be cryptographically random and independently generated from storage IDs. Base62 is a presentation layer, not a security primitive.
How TikTok Handles the Viral Spike That Kills Naive Caches
Your caching strategy looks great on paper. 80% cache hit rate. Redis cluster with replication. Eviction policy set to LRU. Then a celebrity tweets your shortened link and your cache gets eviscerated.
The problem isn't the hot key — it's the thundering herd of cold keys. A viral event means millions of requests for URLs that have never been cached. Every one of those requests hits your database. The database melts. The site goes dark.
The fix is shockingly simple: cache-aside with a distributed mutex. Before hitting the database for a cache miss, acquire a lightweight lock (Redis SETNX) scoped to the short code. Only the first requestor actually queries the database. The rest wait a few milliseconds and retry the cache.
TikTok's approach goes further: they pre-warm the cache for known high-traffic content. For TinyURL, that means tracking URL creation velocity. If a new short URL gets 100 redirects in its first minute, it's categorized as "viral candidate" and all its metadata gets promoted to the L1 cache tier proactively.
CacheMutexRedirect.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// io.thecodeforge — interview tutorial
import redis
import time
from typing importOptional
cache = redis.Redis(connection_pool=redis.ConnectionPool(max_connections=100))
MUTEX_TTL = 5# seconds
CACHE_TTL = 3600defresolve_short_url(short_code: str) -> Optional[str]:
long_url = cache.get(f"short:{short_code}")
if long_url:
return long_url
# Distributed mutex — only one process hits the DB
lock_key = f"lock:{short_code}"if cache.setnx(lock_key, "1"):
cache.expire(lock_key, MUTEX_TTL)
long_url = query_database(short_code) # real DB callif long_url:
cache.setex(f"short:{short_code}", CACHE_TTL, long_url)
cache.delete(lock_key)
return long_url
# Wait and retry — up to 50ms typical
time.sleep(0.01)
returnresolve_short_url(short_code)
Output
Redirect path for 'aB3xYz':
1. Cache miss
2. Acquire mutex lock
3. Database hit (1ms)
4. Cache write
5. Release lock
6. Redirect (HTTP 302)
Non-lock holder: 10ms retry -> cache hit
Senior Shortcut:
Use Redis pipeline for the mutex + cache read in one round trip. Cuts latency from ~5ms to ~1ms. Don't write your own distributed lock — Redlock is overkill; SETNX + expire is fine for this use case.
Key Takeaway
Cache every short URL redirect with a distributed mutex to prevent thundering herd meltdowns under viral traffic spikes.
The Database Sharding Strategy Nobody Teaches You
Every blog post tells you to shard by user ID. Great for Instagram. Terrible for TinyURL. A single user creating 10,000 URLs per second is a normal day. Sharding by user means one hot shard handles all writes for power users while others sit idle.
The better approach: shard by the short code's first character. With 62 possible first characters, you get automatic load distribution. The write throughput is uniform because short codes are random. Read throughput follows the same pattern — viral URLs spread evenly across shards.
But here's the gotcha: range queries on creation time become impossible. Need to find all URLs created in the last hour? You must query all shards. That's fine for analytics — you batch those queries and accept the latency. The redirect path stays fast because it's a point lookup.
Pro tip: use consistent hashing with virtual nodes on the short code. If you add a shard, only 1/62nd of your data moves. You don't need to rebalance the entire cluster.
ShardResolver.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// io.thecodeforge — interview tutorial
import hashlib
from typing importList, DictclassShardRouter:
def__init__(self, shard_endpoints: List[str]):
self.virtual_nodes: Dict[int, str] = {}
for shard in shard_endpoints:
for vnode in range(128): # 128 virtual nodes per shard
key = hashlib.md5(f"{shard}:{vnode}".encode()).hexdigest()
self.virtual_nodes[int(key[:8], 16)] = shard
defget_shard(self, short_code: str) -> str:
# Hash the short code to find its virtual node
hash_val = int(hashlib.md5(short_code.encode()).hexdigest()[:8], 16)
sorted_keys = sorted(self.virtual_nodes.keys())
for key in sorted_keys:
if hash_val <= key:
returnself.virtual_nodes[key]
return self.virtual_nodes[sorted_keys[0]] # wrap around
router = ShardRouter(["shard-db-01", "shard-db-02", "shard-db-03"])
print(f"aB3xYz -> {router.get_shard('aB3xYz')}")
print(f"9kLmNp -> {router.get_shard('9kLmNp')}")
Output
aB3xYz -> shard-db-02
9kLmNp -> shard-db-01
Production Insight:
Don't shard by user ID for URL shorteners. 80% of writes come from 5% of users. Shard by the short code's first character, or use consistent hashing. Your read path stays constant-time, and writes distribute evenly.
Key Takeaway
Shard URL shorteners by short code prefix (not user ID) for uniform load distribution. Consistent hashing with virtual nodes keeps rebalancing costs minimal.
● Production incidentPOST-MORTEMseverity: high
Cache Stampede Took Down Viral Link
Symptom
Users reported 503 and 504 errors. Response times jumped from <5ms to >10s. The application servers were at 100% CPU waiting for database queries.
Assumption
The assumption was that cache warming during deployment was sufficient. The team believed the 99th percentile load would stay under 50k req/s because historical data showed that pattern.
Root cause
A single short code became globally viral. The Redis cluster was sized for the average load, not for a 10x spike. The LRU cache evicted the hot key just before the spike, causing every subsequent request to hit Cassandra. The database read replicas were overwhelmed, and the primary node was occupied with write requests.
Fix
1) Implement a cache-aside pattern with a distributed read lock (Redis SETNX) so only one app server reloads a cache miss. 2) Add a local in-memory cache (Caffeine) as a second layer. 3) Increase the read replica pool and use connection pooling with high max connections.
Key lesson
Always design for traffic spikes that are 50x your mean load.
Cache stampedes are silent until they kill your DB.
A two-layer cache (local + distributed) with coalescing is necessary for viral scenarios.
Production debug guideCommon failures and the exact commands to diagnose them4 entries
Symptom · 01
Short link returns 404
→
Fix
Check Redis: GET <short_key>. If miss, query Cassandra: SELECT * FROM url_mapping WHERE short_key='<key>'. If not in DB, the link was never created or expired.
Symptom · 02
Redirect takes >1s
→
Fix
Ping Redis latency: redis-cli -p 6379 PING. Check Cassandra read latency via nodetool cfstats. Verify cache hit ratio: redis-cli INFO stats | grep hit_rate.
Symptom · 03
Custom alias already taken
→
Fix
Check user's custom alias mapping: SELECT * FROM url_mapping WHERE short_key='<custom>' AND user_id IS NOT NULL. If exists, return 409. Consider prefixing custom aliases with a separate namespace (e.g., '@').
Symptom · 04
Click count doesn't increment
→
Fix
Check analytics pipeline: ensure the async consumer (Kafka) is running and not lagging. Verify the click event was published: kafka-console-consumer --bootstrap-server localhost:9092 --topic clicks --from-beginning | grep <short_key>.
★ Quick Debug Cheat Sheet: TinyURL Production IncidentsThree most common production issues and their immediate fixes
Cache miss flood — all requests hitting DB−
Immediate action
Scale up read replicas: use AWS RDS read replica promotion or add Cassandra nodes. Reduce cache TTL to 0 and force warm-up with a batch job.
Commands
redis-cli -p 6379 INFO stats | grep 'keyspace_hits|keyspace_misses'
nodetool cfhistograms url_mapping url_mapping
Fix now
Increase Redis maxmemory and enable allkeys-lru eviction. Add a local in-memory cache with a short TTL (e.g., Caffeine, 2s).
Short code collision on write+
Immediate action
Ensure ID generator is returning unique values. Check for clock skew in Snowflake nodes. If using DB sequence, verify no two writers got same ID.
Commands
Check ID generator: select max(id) from id_sequence (if using DB).
Test with insert ignore and check affected_rows: if zero, collision occurred.
Fix now
Switch to ranged ID generation (ZooKeeper/etcd). Add a retry loop with a different salt after each collision.
Viral link causing DB outage+
Immediate action
Enable circuit breaker on DB reads. Serve stale cache (allow expired entries) until DB recovers. Add a rate limiter per short_code to throttle requests.
Deploy a standalone hot-link cache (dedicated Redis instance for viral short codes). Use consistent hashing to pin viral links to specific cache servers.
URL Shortening Approaches
Approach
Pros
Cons
Hashing (MD5/SHA)
Stateless, simple implementation
Collisions require check-before-insert
Base62 Encoding
Guaranteed unique, no collisions
Requires a centralized ID generator
Custom Aliases
Better UX/Branding
Requires manual check for availability
Key takeaways
1
Base62 encoding of a unique 64-bit ID is the most robust way to generate short codes.
2
Read-heavy systems require heavy caching (Redis + local) and stampede prevention.
3
Distributed ID generation (Snowflake/ZooKeeper) is the heart of a collision-free system.
4
Choose your HTTP status code (301 vs 302) based on analytics and caching requirements.
Using a single relational database without sharding
Symptom
Database becomes a bottleneck at 10k+ req/s. Read replicas can't keep up, connection pool exhaustion, queries queue up.
Fix
Use sharded NoSQL (Cassandra, DynamoDB) with short_key as partition key. Or shard MySQL/PostgreSQL by hash of short_key.
×
Ignoring URL validation
Symptom
Malicious links (phishing, malware) are shortened and shared, causing your domain to be blacklisted. Recursive TinyURLs create infinite redirects.
Fix
Validate original URL: check format, domain reputation (Google Safe Browsing API), reject localhost/private IPs. Detect if original URL points to your own shortener and reject.
×
Underestimating storage growth
Symptom
At 1 billion URLs per year, with 500 bytes per record (including indexes), storage grows by 500GB/year. Indexes take additional space. Eventually you hit disk limits or costs spike.
Fix
Plan for compaction: use database with built-in compression (Cassandra, ScyllaDB). Enable TTL-based expiry. Archive old records to cold storage (S3, Glacier) after 1 year.
×
Forgetting about background cleanup of expired links
Symptom
Database grows indefinitely. Expired links still consume cache space. Over time, query performance degrades.
Fix
Implement a scheduled worker (cron, Kubernetes CronJob) that deletes expired rows from DB. Use Redis keyspace notifications to invalidate cache on expiry. For database, consider using Cassandra's TTL feature.
×
Not handling cache stampedes on viral links
Symptom
When a hot key expires, thousands of requests simultaneously hit the database. DB CPU spikes, requests time out, cascading failure.
Fix
Use a distributed lock (Redis SETNX) so only one request reloads the cache. Others either wait or serve stale data. Also add a local in-memory cache with a short TTL.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
How would you generate unique short codes in a distributed system?
Q02SENIOR
How do you handle a viral link that gets millions of hits in an hour?
Q03SENIOR
Explain the trade-offs between 301 and 302 redirects for a URL shortener...
Q04SENIOR
How would you design the database schema for a URL shortener that suppor...
Q05SENIOR
How do you ensure high availability for a URL shortener?
Q01 of 05SENIOR
How would you generate unique short codes in a distributed system?
ANSWER
Use a globally unique ID generator like Snowflake (timestamp + machine ID + sequence) and convert the ID to a Base62 string. This guarantees no collisions and avoids the need for a central DB counter. Alternatively, use ZooKeeper to hand out ID ranges to each app server. Both scale horizontally. For additional security, shuffle the Base62 alphabet or XOR the ID with a secret to make codes non-sequential.
Q02 of 05SENIOR
How do you handle a viral link that gets millions of hits in an hour?
ANSWER
We have a multi-layer caching strategy: 1) Local in-memory cache (Caffeine) per application server with a 1-2 second TTL. 2) Distributed Redis cluster with LRU eviction. 3) Database only as the last resort. We also implement cache stampede prevention: only one thread per key reloads the cache. If the load still overwhelms, we can rate-limit requests per short code or serve stale cached data. Proactive pinning of known viral links to dedicated cache nodes helps.
Q03 of 05SENIOR
Explain the trade-offs between 301 and 302 redirects for a URL shortener.
ANSWER
301 (Permanent) makes browsers cache the redirect, so subsequent requests go directly to the target without hitting our server. This reduces server load by up to 90% for repeat visitors. The downside is we can't update the destination URL without changing the short code, and we lose per-click analytics for cached requests. 302 (Temporary) passes every request through our server, allowing click tracking, URL updates, and analytics. The trade-off is higher server load. In production, a hybrid approach works: use 301 for the first visit, then track subsequent visits via JavaScript or service worker.
Q04 of 05SENIOR
How would you design the database schema for a URL shortener that supports custom aliases and expiration?
ANSWER
Use a Cassandra table with short_key as primary key and clustering columns for versioning if needed. Schema includes: short_key (text), original_url (text), user_id (bigint, for custom aliases), created_at (timestamp), expires_at (timestamp), click_count (counter column). Use a secondary index on user_id to list all links by user. For expiration, set TTL on rows or use a separate column and a background worker to delete. Cassandra's counter columns handle atomic increments for click counting.
Q05 of 05SENIOR
How do you ensure high availability for a URL shortener?
ANSWER
We design for no single points of failure. The application layer is stateless and runs behind a load balancer across multiple availability zones. The database (Cassandra) is configured with replication factor 3 across zones. The Redis cache is a cluster with replicas. For ID generation, use Snowflake with NTP monitoring or ZooKeeper for coordination. For the DNS resolution of the short domain, use anycast with multiple DNS providers. Implement circuit breakers on external dependencies (e.g., Safe Browsing API).
01
How would you generate unique short codes in a distributed system?
SENIOR
02
How do you handle a viral link that gets millions of hits in an hour?
SENIOR
03
Explain the trade-offs between 301 and 302 redirects for a URL shortener.
SENIOR
04
How would you design the database schema for a URL shortener that supports custom aliases and expiration?
SENIOR
05
How do you ensure high availability for a URL shortener?
SENIOR
FAQ · 6 QUESTIONS
Frequently Asked Questions
01
How do you handle hash collisions if you use MD5?
You take the first 7 characters of the hash. If that key already exists in the database with a different original URL, you append a predefined string (salt) to the original URL and re-hash until you find a unique key.
Was this helpful?
02
What happens if the Redis cache is full?
We follow an LRU (Least Recently Used) eviction policy. The least accessed links are evicted to make room for new ones. Since most links follow a long-tail distribution, the 'cold' links will live in the DB while 'hot' links stay in memory.
Was this helpful?
03
How do you prevent people from guessing all your shortened URLs?
Instead of using a simple incrementing ID (1, 2, 3...), we use a distributed ID generator (Snowflake) and then shuffle the Base62 alphabet or XOR the ID with a secret. This makes the generated strings appear random to the end user while remaining technically sequential internally.
Was this helpful?
04
How do you handle custom aliases like 'mybrand' when someone wants a specific short code?
We reserve the short code in the database with a flag indicating it's a custom alias. Before creating, we check if the key is already taken (both for generated and custom). Custom aliases are stored with a prefix in the ID or in a separate table to avoid collision with generated codes. We also add validation to prevent users from taking too short or offensive codes.
Was this helpful?
05
How do you scale the database writes for click counting?
We don't write each click synchronously to the DB. Instead, we batch click events from Kafka and update the click_count in batches (e.g., update 100 clicks at once). For Cassandra, we use counter columns which are atomic and scalable. The analytics pipeline is decoupled from the redirect path.
Was this helpful?
06
What if the ID generator fails?
Have a fallback mechanism. For Snowflake, if clock skew is detected, switch to a ZooKeeper-based ID generator or use a Redis atomic increment as a temporary fallback. Also, have alerting on clock drift and sequence exhaustion.