URL Shortener Design — Why Auto-Increment Kills at Scale
Auto-increment locks dropped throughput from 1000/sec to 0 mid-campaign.
20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.
- A URL shortener maps a long URL to a short code and redirects clients via HTTP 301/302
- Hashing strategies: base62 encoding of unique IDs vs hash-then-collision-check
- Redirects are cheap: aim for <10ms server-side latency at P99
- Caching must handle hot keys: a single viral link can generate millions of requests per minute
- Biggest mistake: using a single database counter to generate IDs — single point of failure and bottleneck
Imagine every long book title in a library had a short call number stamped on its spine — '792.4 SHA' instead of 'The Complete Works of Shakespeare, Volume III'. A URL shortener does exactly that for web addresses. You hand it a massive, ugly link and it gives you back a tiny code — like a coat-check ticket — that it keeps pinned to the original address. When someone shows up with the ticket, the system finds the coat (the real URL) and sends them straight to it.
Every time you see a link like 'bit.ly/3xQp9R' in a tweet, a QR code, or an SMS campaign, a surprisingly complex distributed system is working behind the scenes. URL shorteners process billions of redirects per day, and companies like Bitly, TinyURL, and Twitter's t.co have quietly become some of the most read-heavy services on the internet — often handling tens of thousands of requests per second at peak. Getting this design wrong at scale doesn't just mean slow pages; it means broken marketing campaigns, dead QR codes on printed packaging, and lost revenue that can't be recovered.
The core problem sounds trivial: map a long string to a short one and reverse the mapping on demand. But that simplicity is deceptive. You need to generate short codes that are globally unique, store hundreds of millions of mappings efficiently, serve redirects in under 10 milliseconds, handle hot keys (a single viral link getting millions of hits per minute), expire links, support custom aliases, and survive datacenter failures — all simultaneously.
By the end of this article you'll have a production-grade mental model for a URL shortener: you'll know exactly how to generate collision-free short codes, why you should never put a counter in a single database row, how to layer caching to absorb viral traffic spikes, and what the interview panel is really testing when they ask you this question.
What Is a URL Shortener?
A URL shortener is a service that takes a long URL and returns a shorter, unique alias that redirects clients to the original URL. The typical flow: a client submits a long URL via an API, the service generates a short code (e.g., 'abc123'), stores the mapping in a database with optional metadata (creation time, expiration, owner), and returns the full short URL (e.g., 'https://short.url/abc123'). When a client requests that short URL, the service looks up the code, retrieves the original URL, and issues an HTTP redirect (301 for permanent, 302 for temporary). Analytics (clicks, referrers, timestamps) are usually logged asynchronously.
Short Code Generation — Hashing vs Counter-Based IDs
There are two dominant strategies for generating short codes. The first is hash-based: take the long URL, compute a hash (e.g., MD5 or SHA-256), take the first N characters (usually 6–8), check for collisions, and if one exists add a salt or retry with a different prefix. The second is ID-based: use a globally unique integer (from a distributed ID generator) and encode it in base62 (0-9, a-z, A-Z) to produce a compact alphanumeric string. Base62 encoding of a 64-bit integer yields up to 11 characters — typical shorteners use 6–7 characters, which gives 62^6 ≈ 56 billion combinations.
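A minimal base62 codec shows how the ID-based strategy turns an integer into a compact code. The alphabet order (digits, lowercase, uppercase) is an assumption; any fixed ordering works as long as the encoder and decoder agree. The function names are hypothetical.

```python
# Minimal base62 codec. Alphabet order is an assumption; encoder and decoder
# only need to agree on it.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID, most significant digit first."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Decode a base62 string back to the integer ID it came from."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]  # KeyError for characters outside the alphabet
    return n


if __name__ == "__main__":
    print(encode_base62(125))             # 21
    print(len(encode_base62(2**64 - 1)))  # 11
    print(62**6)                          # 56800235584
```

Because 62^6 is the smallest 7-character value, a counter that starts there never emits codes shorter than 7 characters, which keeps code length uniform.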
ID-based systems are simpler for uniqueness (just generate a unique ID) but require a reliable ID generator. Hash-based systems must handle collisions and need longer codes to keep the same collision probability. Most production systems prefer ID-based codes with base62 encoding because the code space is deterministic and, as long as every ID is unique, no collision check is needed at all.
Database Schema & Write Path
The core database stores the mapping from short code to long URL. The schema is simple: a primary key on short_code, plus columns for original_url, created_at, expiration_at, and an optional owner_id. Writes are not the bottleneck (traffic is ~99% reads), but the write path still has to absorb creation bursts, and generating IDs from a single database counter turns every insert into contention on one row. Instead, decouple ID generation from the database: generate IDs in the application tier using a Snowflake-like algorithm or pre-allocated ID segments. The insert itself should stay synchronous: acknowledge the client only after the mapping is committed, so a freshly returned short code always resolves. Concurrent inserts can still be grouped into small batches to cut round trips.
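A minimal Snowflake-style generator, assuming Twitter's published bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) and an arbitrary custom epoch; the class and constant names are hypothetical. Each application instance gets a distinct worker_id, so instances never coordinate per ID.

```python
import threading
import time

# Bit layout of Twitter's Snowflake: 41-bit millisecond timestamp, 10-bit worker
# ID, 12-bit per-millisecond sequence. The custom epoch is an arbitrary choice.
EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


class SnowflakeGenerator:
    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError(f"worker_id must be in [0, {MAX_WORKER_ID}]")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                # Clock moved backwards: refuse rather than risk duplicate IDs.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

One trade-off to weigh: with this layout, IDs minted a few years after the epoch already need roughly 60 bits and encode to 10 or 11 base62 characters, noticeably longer than the 6 to 7 characters quoted earlier. Pre-allocated counter segments keep codes short at the cost of a central allocator.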
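For reads, an index on short_code is critical. Use a covering index (one that includes original_url) so a lookup is answered from the index alone, without a second fetch from the table. Partition the table by a hash of short_code to distribute writes; sequential IDs encoded in base62 share their leading characters, so partitioning on a raw prefix would funnel new rows into the same partition. Use a read replica for analytics queries, but route redirect lookups to the cache first and then to the primary, so replication lag never makes a just-created code return 404.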
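To make the schema and the synchronous insert concrete, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for the production database (an assumption; the table and function names are hypothetical). SQLite's WITHOUT ROWID tables are stored clustered by primary key, so a lookup by short_code finds original_url in the same B-tree, which is the effect a covering index provides in other engines.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema mirroring the columns described above. WITHOUT ROWID
# stores rows in a B-tree keyed by short_code, so a lookup reads original_url
# straight from the index entry, with no second fetch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    short_code    TEXT PRIMARY KEY,
    original_url  TEXT NOT NULL,
    created_at    INTEGER NOT NULL,
    expiration_at INTEGER,
    owner_id      INTEGER
) WITHOUT ROWID
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_mapping(conn: sqlite3.Connection, short_code: str, original_url: str,
                   expiration_at: Optional[int] = None, owner_id: Optional[int] = None) -> None:
    """Synchronous insert: return the short URL to the client only after this commits."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO urls (short_code, original_url, created_at, expiration_at, owner_id) "
            "VALUES (?, ?, ?, ?, ?)",
            (short_code, original_url, int(time.time()), expiration_at, owner_id),
        )


def lookup(conn: sqlite3.Connection, short_code: str) -> Optional[str]:
    row = conn.execute(
        "SELECT original_url FROM urls WHERE short_code = ?", (short_code,)
    ).fetchone()
    return row[0] if row else None
```

The primary key doubles as the uniqueness guarantee: a second insert for the same code raises sqlite3.IntegrityError instead of silently overwriting the first mapping.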
Caching Layer — Survival Guide for Viral Traffic
A single viral link can generate millions of requests per minute. Without caching, your database will melt. The caching architecture needs at least two tiers: L1 (in-memory cache per application instance) and L2 (distributed cache like Redis or Memcached). L1 stores the hottest keys (recently accessed short codes) and evicts using LRU. L2 stores a larger set of mappings with a longer TTL.
Cache-aside pattern: on a redirect request, check L1 → if miss, check L2 → if miss, fetch from the DB and populate both caches. Set a TTL of 24 hours for L2, but invalidate proactively when a link is deleted or expires. For read-heavy workloads, also consider write-through on creation: write the mapping to the DB, then immediately to the cache, in the same request. That way the first read is already fast, and the cache never holds a code the DB doesn't.
Hot key problem: when a single short code gets 100k requests per second, the Redis node that owns that key becomes a hotspot, because sharding spreads different keys across nodes but never splits a single key. Solutions: local L1 caching (each app server keeps its own copy of the hot key), or spreading reads for that key across Redis replicas.
- L1: in-memory per microservice instance. Fastest. Limited size. Evict aggressively.
- L2: Redis cluster. Shared across all instances. Adds a network hop, but a hit still costs only about a millisecond.
- Cache miss penalty: L1 miss → Redis hit ~1ms. Redis miss → DB hit ~10ms. Every miss hurts throughput.
- Proactive populate: write-through cache on URL creation prevents the first request from hitting the DB.
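The two-tier cache-aside flow above can be sketched as follows. This is a minimal illustration, not production code: an OrderedDict-based LRU plays L1, and a plain dict with per-key expiry stands in for Redis as L2 (an assumption made so the sketch runs without a server). All class names are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple


class LRUCache:
    """L1: small per-instance cache with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


class TTLCache:
    """L2 stand-in for Redis: values expire ttl_seconds after they are written."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.data[key]  # lazily drop expired entries
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self.data[key] = (value, time.monotonic() + self.ttl)

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


class TwoTierResolver:
    """Cache-aside lookup: L1, then L2, then the database."""

    def __init__(self, l1: LRUCache, l2: TTLCache, load_from_db: Callable[[str], Optional[str]]):
        self.l1, self.l2, self.load_from_db = l1, l2, load_from_db

    def resolve(self, code: str) -> Optional[str]:
        url = self.l1.get(code)
        if url is not None:
            return url
        url = self.l2.get(code)
        if url is None:
            url = self.load_from_db(code)
            if url is None:
                return None  # unknown code: the caller answers 404
            self.l2.put(code, url)
        self.l1.put(code, url)
        return url

    def invalidate(self, code: str) -> None:
        # Proactive invalidation on delete/expiry. Other instances' L1 copies
        # still need a short TTL or a broadcast to be cleared.
        self.l1.delete(code)
        self.l2.delete(code)
```

The comment in invalidate marks the real weak spot of per-instance L1 caches: they are fast precisely because they are not shared, so clearing them everywhere needs either short lifetimes or an explicit broadcast.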
Redirect Mechanics — HTTP Status and Performance
When a client requests a short URL, the server must respond with an HTTP redirect. Two status codes matter: 301 (Moved Permanently) and 302 (Found). A 301 lets the browser cache the redirect permanently, so subsequent requests go directly to the long URL without hitting the shortener. This is great for performance but breaks analytics if you want to count every click, because browsers holding a cached redirect never reach your service. A 302 is not cached by default (unless Cache-Control or Expires headers allow it), so every request hits the shortener, enabling click tracking.
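Most services use 302 by default for dynamic analytics and offer 301 as an option for permanent links. Either way, the response carries the target in the Location header. CORS headers matter only if browser scripts call the shortener with fetch or XMLHttpRequest; whether a page can be framed is governed by X-Frame-Options or the CSP frame-ancestors directive, not by CORS.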
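A sketch of the status-code choice using only the standard library (http.HTTPStatus and http.server). The build_redirect helper, the in-memory MAPPINGS table, and the explicit Cache-Control: no-store on 302s are illustrative assumptions, not a prescribed API.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def build_redirect(target_url: str, permanent: bool = False):
    """Return (status, headers) for a redirect.

    permanent=True  -> 301: browsers may cache it, so later clicks can bypass
                       the shortener (and its analytics).
    permanent=False -> 302: not cached by default, so every click reaches us.
    """
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    headers = {"Location": target_url}
    if not permanent:
        # Assumption: state explicitly that tracked redirects must not be stored.
        headers["Cache-Control"] = "no-store"
    return status, headers


class RedirectHandler(BaseHTTPRequestHandler):
    # Hypothetical in-memory table: code -> (target URL, permanent?)
    MAPPINGS = {"abc123": ("https://example.com/landing", False)}

    def do_GET(self):
        code = urlsplit(self.path).path.lstrip("/")
        entry = self.MAPPINGS.get(code)
        if entry is None:
            self.send_error(HTTPStatus.NOT_FOUND)
            return
        status, headers = build_redirect(*entry)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", "0")
        self.end_headers()


if __name__ == "__main__":
    # Try: curl -v http://127.0.0.1:8080/abc123
    HTTPServer(("127.0.0.1", 8080), RedirectHandler).serve_forever()
```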
Performance: the server-side part of a redirect (from receiving the request to sending the response) should complete in under 10ms at P99, and is typically under 1ms on a cache hit. The client additionally pays for DNS resolution, the TCP connection, and the TLS handshake, which usually dominate end-to-end latency. To shrink that network portion: keepalive connections, HTTP/2 multiplexing, and edge termination through a CDN.
Expiration, Custom Aliases, and Analytics
Real URL shorteners support link expiration (e.g., for temporary campaign links) and custom aliases (user picks a meaningful short code). Expiration is implemented by storing an expires_at column and checking it during redirect lookup. If the current time exceeds expires_at, return 410 Gone or redirect to a fallback page. Custom aliases require separate validation: they must be globally unique and must not conflict with auto-generated codes. A common approach is to reserve a prefix for auto-generated codes (e.g., starting with a digit) and require custom aliases to start with a letter. Alternatively, use two separate tables.
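A sketch of the expiration check and the alias rule described above. The digit/letter prefix convention comes from the text; the 3 to 30 character limit, the reserved words, and the function names are assumptions. The in-memory existence check is only a fast pre-check: the database's unique constraint is what actually guarantees two users can't claim the same alias concurrently.

```python
import re
import time
from http import HTTPStatus
from typing import Optional

# Custom aliases must start with a letter, so they can never collide with
# auto-generated codes, which (by the convention above) start with a digit.
ALIAS_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{2,29}")  # 3-30 chars: assumed limits
RESERVED = {"api", "admin", "login"}  # hypothetical routes owned by the service


def validate_custom_alias(alias: str, existing_codes: set) -> None:
    """Raise ValueError if the alias is malformed, reserved, or taken."""
    if not ALIAS_RE.fullmatch(alias):
        raise ValueError("alias must be 3-30 letters/digits and start with a letter")
    if alias.lower() in RESERVED:
        raise ValueError("alias is reserved")
    if alias in existing_codes:
        # Fast pre-check only; the DB unique constraint settles races.
        raise ValueError("alias already taken")


def resolve_with_expiry(row: Optional[dict], now: Optional[float] = None):
    """Turn a looked-up row into (status, target_url), honoring expires_at."""
    if row is None:
        return HTTPStatus.NOT_FOUND, None
    now = time.time() if now is None else now
    expires_at = row.get("expires_at")
    if expires_at is not None and now >= expires_at:
        return HTTPStatus.GONE, None  # alternatively redirect to a fallback page
    return HTTPStatus.FOUND, row["original_url"]
```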
Analytics: every redirect should asynchronously log the click event (time, referrer, user-agent, IP) to a high-throughput queue (Kafka, Kinesis). A separate consumer processes the stream to update click counts and generate reports. The click count on the links table should be denormalized for quick display but must be updated asynchronously to avoid write contention. Use eventual consistency: the consumer updates the count in the DB via upsert.
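A minimal sketch of the non-blocking click pipeline, assuming Python's queue.Queue as a stand-in for Kafka or Kinesis and SQLite's upsert (INSERT ... ON CONFLICT DO UPDATE, available since SQLite 3.24) as the consumer's write. The click_counts table and the function names are hypothetical; a real consumer would call drain_once in a loop on its own worker.

```python
import queue
import sqlite3
import time
from collections import Counter

# Bounded in-process queue standing in for Kafka/Kinesis.
events: "queue.Queue[dict]" = queue.Queue(maxsize=100_000)

CLICK_SCHEMA = """
CREATE TABLE IF NOT EXISTS click_counts (
    short_code TEXT PRIMARY KEY,
    clicks     INTEGER NOT NULL
)
"""


def record_click(code: str, referrer: str, user_agent: str, ip: str) -> None:
    """Called on the redirect path. Never blocks: drops the event if the queue is full."""
    event = {"code": code, "ts": time.time(), "referrer": referrer,
             "user_agent": user_agent, "ip": ip}
    try:
        events.put_nowait(event)
    except queue.Full:
        pass  # losing one click beats delaying a redirect


def drain_once(conn: sqlite3.Connection, max_events: int = 500) -> int:
    """Consumer step: pull up to max_events, aggregate per code, upsert the counts."""
    counts: Counter = Counter()
    for _ in range(max_events):
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        counts[event["code"]] += 1
    if counts:
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO click_counts (short_code, clicks) VALUES (?, ?) "
                "ON CONFLICT(short_code) DO UPDATE SET clicks = clicks + excluded.clicks",
                counts.items(),
            )
    return sum(counts.values())
```

Aggregating per batch before writing means a viral link produces one upsert per batch rather than one per click, which is the write-contention relief the paragraph above is after.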
Capacity Estimation — Don’t Let Your Database Take the Blame
Most write-ups quote “30M new URLs per month” but skip the real point: you need to size both the write path and the read path before you pick a short-code scheme. 30M/month is 1.8B records over 5 years. That’s not a flex; that’s a death sentence if you haven’t estimated reads.
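Every short link redirect is a read. If you have 1.8B stored URLs and each gets redirected an average of 10 times (conservative for a viral service), you’re looking at 18B reads over the same period. Averaged over five years that is only roughly 114 reads per second; the danger is the peak, because viral links concentrate millions of reads into minutes. Your database won’t survive those spikes without aggressive caching, and the short-code length has to be chosen with the full record count in mind.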
7 characters drawn from a 62-character alphabet give you about 3.5 trillion combinations (62^7 ≈ 3.52 × 10^12). That’s enough for 1.8B records with plenty of room to spare. But here’s the gotcha: your storage model must account for the full row, including the short code (7 bytes), the long URL (up to 2048 bytes), the creation timestamp, the expiration, and the user ID. That’s roughly 2.1 KB per row in the worst case. 1.8B × 2.1 KB ≈ 3.78 TB of raw storage, before replication and indexes. You need replication and sharding before you even think about going live.
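The arithmetic above, written out so the inputs can be changed. The per-row size of about 2.1 KB and the 10 reads per URL come from the text; the 3x replication factor is an assumption added for illustration.

```python
# Back-of-the-envelope capacity model; inputs mirror the estimates in the text.
SECONDS_PER_YEAR = 365 * 24 * 3600

new_urls_per_month = 30_000_000
years = 5
reads_per_url = 10          # average redirects per stored link
row_bytes = 2_100           # ~7 B code + up to 2048 B URL + timestamps + user ID
replication_factor = 3      # assumption: three copies of every row

records = new_urls_per_month * 12 * years
reads = records * reads_per_url
avg_read_qps = reads / (years * SECONDS_PER_YEAR)
code_space_7 = 62 ** 7
raw_storage_tb = records * row_bytes / 1e12
replicated_storage_tb = raw_storage_tb * replication_factor

print(f"records:            {records:,}")                     # 1,800,000,000
print(f"reads:              {reads:,}")                       # 18,000,000,000
print(f"average read QPS:   {avg_read_qps:,.0f}")             # 114
print(f"7-char code space:  {code_space_7:,}")                # 3,521,614,606,208
print(f"raw storage:        {raw_storage_tb:.2f} TB")         # 3.78 TB
print(f"with replication:   {replicated_storage_tb:.2f} TB")  # 11.34 TB
```

The average is deceptively small. Size the cache and database for peak redirect rates, not for the mean.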
Senior engineer rule: always overestimate reads, never underestimate writes, and double your storage projection.
Low-Level Design — Where the Database Actually Bleeds
High-level architecture is for whiteboard interviews. Low-level design is where you figure out why your service falls over at 10K QPS. Stop hand-waving about “application servers” and talk about the write path.
You have two major choke points: the short-code collision check and the redirect lookup. On the write path, you must handle concurrent requests for the same long URL. If two users submit the same URL in the same millisecond, a counter-based system returns two different short codes, which is fine. With hashing (e.g., MD5 truncated to 7 chars), the hash is deterministic, so the same URL always yields the same code; that is harmless deduplication as long as you confirm the stored URL matches. The real problem is two different URLs truncating to the same code. That means you need a retry loop that salts the input and tries again, and you must make the insert atomic with a database unique constraint so two racing writers cannot both claim one code.
For reads, the short code is your primary key. A B-Tree index on the short-code column gives O(log n) lookups, which is only a handful of page reads even at 1.8B rows; at 18B reads, the cost is the sheer volume of lookups reaching the database, not the per-lookup complexity. A hash index (O(1)) helps at the margin, but a distributed cache is what actually absorbs the load. On the write side, a useful trick is to pre-generate short codes in batches (say 10K at a time) and store them in Redis. When a user requests a short URL, pop one from Redis. This decouples short-code generation from the write path and reduces database load.
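A sketch of the pre-generated key pool, with an in-process deque standing in for the Redis list (an assumption so it runs standalone). next_block_start is a hypothetical callback that atomically reserves a block of integer IDs, for example by incrementing a counter once per 10K IDs; the low-water refill threshold is also an assumption.

```python
import threading
from collections import deque
from typing import Callable, Deque

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base62(n: int) -> str:
    s = ""
    while True:
        n, rem = divmod(n, 62)
        s = ALPHABET[rem] + s
        if n == 0:
            return s


class KeyPool:
    """In-process stand-in for a Redis list of pre-generated short codes."""

    def __init__(self, next_block_start: Callable[[int], int],
                 batch: int = 10_000, low_water: int = 1_000):
        self.next_block_start = next_block_start  # hypothetical: reserves `batch` IDs, returns the first
        self.batch = batch
        self.low_water = low_water
        self.codes: Deque[str] = deque()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        start = self.next_block_start(self.batch)
        self.codes.extend(to_base62(i) for i in range(start, start + self.batch))

    def pop(self) -> str:
        with self.lock:
            # Refilling inline keeps the sketch short; a real pool would top up
            # in the background well before it runs dry.
            if len(self.codes) <= self.low_water:
                self._refill()
            return self.codes.popleft()
```

The shared counter is touched once per 10K codes instead of once per request, which is where the reduction in database load comes from.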
Don’t forget the expiration sweep. A cron job that deletes expired records every hour is fine for 30M records. For 1.8B, use TTL indexes (MongoDB) or partition by expiration month and drop entire partitions. Lazy expiration on redirect read is a band-aid, not a solution.
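A sketch of the partition-per-expiration-month idea, assuming PostgreSQL declarative partitioning (ALTER TABLE ... DETACH PARTITION, then DROP TABLE). The urls_exp_YYYY_MM naming scheme, the separate never-expiring partition, and the helper names are hypothetical. A partition is only dropped once its whole month is in the past, so links that expire earlier in the current month still rely on the redirect-time check until then.

```python
from datetime import datetime, timezone

PREFIX = "urls_exp_"
NEVER = PREFIX + "never"  # hypothetical partition for links without an expiry


def partition_name(expires_at: datetime) -> str:
    """Name of the partition that holds rows expiring in expires_at's month."""
    return f"{PREFIX}{expires_at:%Y_%m}"


def droppable_partitions(existing: list, now: datetime) -> list:
    """Monthly partitions whose entire month ended before the current month."""
    current = partition_name(now)
    # Zero-padded YYYY_MM names sort chronologically as plain strings.
    return sorted(p for p in existing if p != NEVER and p < current)


def drop_statements(partition: str) -> list:
    # Detach, then drop: the data goes away with the table files, with no
    # per-row DELETE work and no dead rows left behind to vacuum.
    return [
        f"ALTER TABLE urls DETACH PARTITION {partition};",
        f"DROP TABLE {partition};",
    ]
```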
Functional Requirements — What This Thing Actually Does
Before you touch a line of code, you need to know what the system is supposed to do. Functional requirements are the raw features — no fluff. A URL shortener has exactly two APIs: shorten and redirect. The shorten API takes a long URL and returns a short code. The redirect API takes that code and returns a 302 or 301 to the original URL. That's it. Don't add custom aliases, analytics, or expiration until the core loop works. Every feature you bolt on increases latency, storage cost, and failure surface. Start with the minimum viable product: generate a unique key, store the mapping, serve the redirect. Anything else is a distraction until you've proven the basic flow under load. If you can't make two endpoints fast and reliable, your fancy analytics pipeline won't matter.
Non-Functional Requirements — The Things That Keep You Employed
Non-functional requirements are the constraints that separate a toy from a production system. For a URL shortener, the big three are latency, availability, and durability. Your redirect endpoint must respond in under 50 milliseconds end to end: users expect instant page loads, and search engines penalize slow redirects. Availability means 99.99% uptime at minimum, which leaves a budget of about 52.6 minutes of downtime per year. When a short link breaks, the internet notices. Durability means once you store a mapping, you never lose it. That rules out in-memory caches as primary storage. You need a replicated database with atomic writes and consistent reads. Think DynamoDB, Cassandra, or PostgreSQL with read replicas. The write path for shortening can be slower; nobody notices 100ms when creating a link. The read path for redirects must be blazing fast. Cache aggressively with Redis or Memcached, but prepare for cache misses with a database fallback that doesn't degrade. Design for failure: if your cache goes down, your DB should handle the load without timing out.
Scalability — From 10 Users to 10 Billion Redirects
Scalability isn't optional; it's the entire point of a URL shortener. Your traffic spikes the moment a celebrity tweets a bit.ly link. You need to handle 100,000 redirects per second without breaking a sweat. The bottleneck is the database read path. A single PostgreSQL instance handles maybe 10,000 reads per second. You need horizontal scaling. Shard your database by short code hash. Use consistent hashing so that adding a node moves only a small fraction of keys instead of reshuffling everything. Cache aggressively with a distributed Redis cluster. Each redirect should hit cache 99% of the time. The write path is easier: maybe 100 new URLs per second at peak. Use an async queue to batch writes. For counter-based ID generation, you can't use a single centralized counter across all machines. That's a SPOF. Instead, use range-based counters per application instance: instance 1 gets IDs 1 to 1M, instance 2 gets 1M+1 to 2M. Or use Snowflake-style IDs, or hand out ranges through a coordinator such as ZooKeeper. Memory matters too, but not where you might expect: each redirect needs almost no application memory, so the per-request cost is mostly open TCP connections and kernel network buffers. Profile with realistic traffic patterns before launch.
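A minimal consistent-hashing ring with virtual nodes, sketched with bisect over sorted MD5-derived positions. The 100 virtual nodes per shard and the shard names are assumptions; the property to notice is that adding a shard only moves keys onto the new shard, roughly 1/N of them, instead of reshuffling everything.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points: list = []   # sorted hash positions on the ring
        self._owner: dict = {}    # position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each node owns many small arcs, which evens out the key distribution.
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:
                continue  # astronomically rare position clash: keep the first owner
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, short_code: str) -> str:
        """Walk clockwise from the key's position to the first node point."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect(self._points, self._hash(short_code)) % len(self._points)
        return self._owner[self._points[idx]]
```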
🏗️ Putting It Together (Step-by-Step)
URL shortening is a write-once, read-often system.
Write path: client POSTs a long URL → API server validates → code generator creates a unique short key → database stores the mapping (key, long URL, created_at, expiration) → cache writes key→URL → response returns the short URL.
Read path: client GETs the short URL → DNS resolves → load balancer → API server checks the cache (Redis) first → cache hit: return a 302 redirect → cache miss: query the database → if found, populate the cache and return the redirect → if not found, return 404.
Critical ordering: always write to the database before the cache, so the cache never points at a mapping that failed to persist. For custom aliases, add a uniqueness check before generation. Analytics events fire asynchronously via a message queue; never block the redirect path. Expiration runs as a background job that scans for stale entries and purges cache keys and database rows in batches of 1000, so no single delete holds write locks for long.
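The two paths above, condensed into one in-memory sketch so the ordering is visible. Plain dicts stand in for the database and Redis, itertools.count stands in for the distributed ID generator, and queue.Queue stands in for the analytics message queue; all of these, and the class name, are assumptions for illustration.

```python
import queue
from http import HTTPStatus
from itertools import count
from urllib.parse import urlparse

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base62(n: int) -> str:
    s = ""
    while True:
        n, rem = divmod(n, 62)
        s = ALPHABET[rem] + s
        if n == 0:
            return s


class ShortenerService:
    def __init__(self):
        self.db = {}                      # stand-in for the durable store
        self.cache = {}                   # stand-in for Redis
        self.ids = count(1)               # stand-in for a distributed ID generator
        self.analytics = queue.Queue()    # stand-in for Kafka/Kinesis

    def create(self, long_url: str) -> str:
        parts = urlparse(long_url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError("only absolute http(s) URLs can be shortened")
        code = to_base62(next(self.ids))
        self.db[code] = long_url          # 1. durable write first...
        self.cache[code] = long_url       # 2. ...then populate the cache
        return code

    def redirect(self, code: str):
        url = self.cache.get(code)
        if url is None:                   # cache miss: fall back to the database
            url = self.db.get(code)
            if url is None:
                return HTTPStatus.NOT_FOUND, None
            self.cache[code] = url        # repopulate for the next request
        self.analytics.put_nowait(code)   # fire-and-forget; never blocks the redirect
        return HTTPStatus.FOUND, url
```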
✅ Summary
A URL shortener is deceptively simple: two API endpoints (create, redirect) and a cache. The design breaks into short code generation (hashing vs. counter-based IDs), storage schema (key-value with timestamps), caching layer (Redis with TTL), redirect mechanics (302 vs 301), and supporting features (expiration, custom aliases, analytics). The non-functional constraints dominate: 99.99% uptime for redirects, sub-50ms latency, and the ability to handle viral traffic spikes (10M+ redirects/minute). Capacity estimation is mandatory: an oversized short-code column (VARCHAR(255) where 7 characters suffice) can inflate index and cache footprints. Every design decision trades off write throughput (DB writes are slow) against read performance (cache misses are expensive). The winning pattern: counter-based IDs encoded as base62 short codes, a key-value access pattern on PostgreSQL or DynamoDB, a Redis cache in front of it, and an async analytics pipeline. Load-test your cache eviction policy with a simulated DDoS; you'll find your real bottleneck.
Redirection
Redirection is the core operation: converting a short code back to a long URL and sending the client there. Speed is paramount because every millisecond of redirect latency directly impacts user experience. The redirection flow begins when a client requests a short URL like https://short.ly/abc123. The server must quickly look up the mapping from code to target URL. To avoid database bottlenecks, we use a cache-aside pattern: first check Redis (or Memcached) with the short code as key. If found, return immediately. If not, query the database, populate the cache, and then redirect. The HTTP response must use a 301 (permanent) or 302 (temporary) redirect status code depending on whether the mapping may change. With a 301, browsers may cache the redirect indefinitely, reducing server load but making later updates unreliable, because clients holding a cached copy never ask again. Use 302 for custom aliases or analytics. The response includes a Location header with the target URL. For high-traffic systems, we implement async analytics capture: the redirect itself never waits for logging. Instead, we push an event to Kafka or a message queue. The server produces the redirect in under 10ms, while analytics are processed asynchronously. This separation ensures that even if the analytics pipeline fails, the redirect still works.
Redirection API and Speed Optimization
The Redirection API is a minimal, read-only endpoint: GET /{short_code}. Its only job is to return a 302 (or 301) with a Location header. To maximize speed, we apply several optimizations. First, we use a write-through cache: every new mapping is written to the database and then to the cache, so the first redirect is already fast. Second, we keep a bloom filter of all short codes (in memory or in Redis). Before hitting the cache or DB, we check the bloom filter; if it says the code doesn't exist, we reject immediately with 404, saving a cache miss and a database lookup. A bloom filter can return false positives but never false negatives, so a "maybe present" answer simply falls through to the normal lookup. Third, we use connection pooling for both Redis and the database to avoid TCP handshake overhead. Fourth, we deploy the redirect service behind a CDN (like CloudFront or Cloudflare) that caches 301 redirects at the edge. For 302 redirects, the CDN forwards requests to the origin but still terminates TLS close to the user, reducing latency. Finally, we use HTTP/2 and keepalive connections. The response itself is tiny (a status line plus a Location header), so network round-trip time is the dominant cost. Anycast DNS routes users to the nearest data center. With all optimizations, the 99th percentile redirect latency should be under 50ms globally.
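A minimal Bloom filter sketch for the pre-check above, sized with the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, with the k positions derived by double hashing from one SHA-256 digest. The class name and the 1% default false-positive rate are assumptions.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: 'no' is always right, 'maybe' can be a false positive."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        n, p = expected_items, false_positive_rate
        self.size = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

At 1% false positives the filter costs about 9.6 bits per code: roughly 1.2 MB per million codes, or around 2.2 GB for the 1.8B-record estimate. Two operational caveats follow from the data structure itself: a standard Bloom filter cannot delete entries, so deleted or expired codes stay "maybe present" until the filter is rebuilt, and every instance's filter must learn about new codes as they are created, or it will wrongly 404 links that exist.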
The Single-Table Counter That Took Down a Shortener
- Never rely on a single database auto-increment for ID generation at scale — it's a write bottleneck and a single point of failure.
- Use distributed ID generators or pre-allocated ID ranges to eliminate contention.
- Always design for write scalability even if you expect read-heavy workload — shortener creation traffic spikes during campaigns.
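A sketch of pre-allocated ID ranges: a central allocator does one atomic step per block, and each instance then serves IDs from memory. The lock-protected integer stands in for whatever holds the real counter (a single row updated transactionally, or a ZooKeeper or etcd key); the class names and block size are hypothetical.

```python
import threading


class RangeAllocator:
    """Central allocator: one atomic step per block instead of one per ID."""

    def __init__(self, start: int = 1):
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, size: int) -> range:
        with self._lock:
            block = range(self._next, self._next + size)
            self._next += size
            return block


class InstanceIdSource:
    """Per-instance ID source that only contacts the allocator when its block runs out."""

    def __init__(self, allocator: RangeAllocator, block_size: int = 1_000_000):
        self._allocator = allocator
        self._block_size = block_size
        self._current = iter(())
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            nxt = next(self._current, None)
            if nxt is None:
                self._current = iter(self._allocator.reserve(self._block_size))
                nxt = next(self._current)
            return nxt
```

The trade-offs: an instance that restarts abandons the rest of its block, leaving gaps, and IDs from different instances are not globally ordered by time. Neither matters for short codes.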
curl -v http://shortener.io/shortCode   # Check redirect headers and status
redis-cli GET shortener:shortCode       # Check if key exists
Common mistakes to avoid
- Using a single database auto-increment for short code IDs
- Not caching redirect lookups
- Using 302 for all links (no CDN edge caching)
Interview Questions on This Topic
How would you generate a unique short code for every URL in a distributed system?