Junior 13 min · March 05, 2026

URL Shortener Design — Why Auto-Increment Kills at Scale

Auto-increment locks dropped throughput from 1000/sec to 0 mid-campaign.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.

May 23, 2026
last updated
Quick Answer
  • A URL shortener maps a long URL to a short code and redirects clients via HTTP 301/302
  • Hashing strategies: base62 encoding of unique IDs vs hash-then-collision-check
  • Redirects are cheap: aim for <10ms total latency at P99
  • Caching must handle hot keys: a single viral link can generate millions of requests per minute
  • Biggest mistake: using a single database counter to generate IDs — single point of failure and bottleneck

Plain-English First

Imagine every long book title in a library had a short call number stamped on its spine — '792.4 SHA' instead of 'The Complete Works of Shakespeare, Volume III'. A URL shortener does exactly that for web addresses. You hand it a massive, ugly link and it gives you back a tiny code — like a coat-check ticket — that it keeps pinned to the original address. When someone shows up with the ticket, the system finds the coat (the real URL) and sends them straight to it.

Every time you see a link like 'bit.ly/3xQp9R' in a tweet, a QR code, or an SMS campaign, a surprisingly complex distributed system is working behind the scenes. URL shorteners process billions of redirects per day, and companies like Bitly, TinyURL, and Twitter's t.co have quietly become some of the most read-heavy services on the internet — often handling tens of thousands of requests per second at peak. Getting this design wrong at scale doesn't just mean slow pages; it means broken marketing campaigns, dead QR codes on printed packaging, and lost revenue that can't be recovered.

The core problem sounds trivial: map a long string to a short one and reverse the mapping on demand. But that simplicity is deceptive. You need to generate short codes that are globally unique, store hundreds of millions of mappings efficiently, serve redirects in under 10 milliseconds, handle hot keys (a single viral link getting millions of hits per minute), expire links, support custom aliases, and survive datacenter failures — all simultaneously.

By the end of this article you'll have a production-grade mental model for a URL shortener: you'll know exactly how to generate collision-free short codes, why you should never put a counter in a single database row, how to layer caching to absorb viral traffic spikes, and what the interview panel is really testing when they ask you this question.

What is Design URL Shortener?

A URL shortener is a service that takes a long URL and returns a shorter, unique alias that redirects clients to the original URL. The typical flow: a client submits a long URL via an API, the service generates a short code (e.g., 'abc123'), stores the mapping in a database with optional metadata (creation time, expiration, owner), and returns the full short URL (e.g., 'https://short.url/abc123'). When a client requests that short URL, the service looks up the code, retrieves the original URL, and issues an HTTP redirect (301 for permanent, 302 for temporary). Analytics (clicks, referrers, timestamps) are usually logged asynchronously.

Production Insight
Redirects are cheap, but each one hits the DB on a cache miss. A P99 latency of 5ms is achievable with Redis in front.
Database fallback kills throughput: every cache miss is an order of magnitude slower than a hit.
Rule: cache aggressively and survive a cache miss without cascading failures.
Key Takeaway
URL shortener = write-once, read-often system.
Cache the mapping, not the redirect.
The short code is the primary key — design for O(1) lookup.
[Figure: URL Shortener Design: Auto-Increment Pitfalls. Why counter-based short codes fail under scale and how to fix it. Panels: Auto-Increment ID (sequential primary key in DB); Base62 Encoding (converts ID to short code); Write Contention (lock contention on ID generation); Hash-Based Generation (MD5/SHA-1 truncated to 7 chars); Cache Layer (Redis/Memcached for hot URLs); Redirect with 301/302 (HTTP redirect to original URL). Warning: auto-increment IDs expose total URL count and enable enumeration; use hash-based or distributed ID generators (e.g., Snowflake).]

Short Code Generation — Hashing vs Counter-Based IDs

There are two dominant strategies for generating short codes. The first is hash-based: take the long URL, compute a hash (e.g., MD5 or SHA-256), take the first N characters (usually 6–8), check for collisions, and if one exists add a salt or retry with a different prefix. The second is ID-based: use a globally unique integer (from a distributed ID generator) and encode it in base62 (0-9, a-z, A-Z) to produce a compact alphanumeric string. Base62 encoding of a 64-bit integer yields up to 11 characters — typical shorteners use 6–7 characters, which gives 62^6 ≈ 56 billion combinations.

ID-based systems are simpler for uniqueness (just generate a unique ID) but require a reliable ID generator. Hash-based systems must handle collisions and need longer codes to reach the same collision probability. Most production systems prefer ID-based generation with base62 encoding because every unique ID maps to exactly one code, so auto-generated codes need no collision check.

io/thecodeforge/shortener/Base62Encoder.javaJAVA
package io.thecodeforge.shortener;

public class Base62Encoder {
    private static final String BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final int BASE = 62;

    public static String encode(long id) {
        // A negative ID would skip the loop and yield an empty code, so reject it.
        if (id < 0) throw new IllegalArgumentException("id must be non-negative: " + id);
        if (id == 0) return String.valueOf(BASE62.charAt(0));
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(BASE62.charAt((int) (id % BASE)));
            id /= BASE;
        }
        // Digits were produced least-significant first, so reverse them.
        return sb.reverse().toString();
    }

    public static long decode(String shortCode) {
        long id = 0;
        for (char c : shortCode.toCharArray()) {
            int digit = BASE62.indexOf(c);
            if (digit == -1) throw new IllegalArgumentException("Invalid character in code: " + c);
            id = id * BASE + digit;
        }
        return id;
    }
}
Avoid MD5 for Short Codes
Any hash truncated to 6–8 characters collides far more often than the full digest, whatever the hash function. MD5 additionally has practical collision attacks, so an attacker can craft colliding URLs. Use SHA-256 with a salt or, better, an ID-based system. Never use MD5 for security-sensitive applications.
Production Insight
Base62 encoding of a Snowflake ID gives a URL-safe, collision-free code, though a full 64-bit ID needs up to 11 characters rather than 6 or 7.
Hash-based systems need collision handling: if hash collides, append a salt and rehash until unique.
Rule: for production, prefer ID-based generation. It's simpler to reason about and debug.
Key Takeaway
Base62 encoding of a distributed ID is the standard.
Hash-based maps URL→code deterministically but has collision overhead.
Rule: don't mix auto-generated and custom aliases in the same code space without a prefix.
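To make the "distributed ID, then base62" idea concrete, here is a minimal sketch of a Snowflake-style generator. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence), the `EPOCH_MS` value, and the class and function names are assumptions chosen for illustration, not a description of any particular production service.

```python
import threading
import time

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


class SnowflakeStyleGenerator:
    """Illustrative 64-bit ID: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""

    EPOCH_MS = 1_700_000_000_000  # assumed custom epoch; pick your own

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock          # injectable so tests can freeze time
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs used in this millisecond: wait for the next one
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


def base62(num):
    """Encode a non-negative integer using the digits-first base62 alphabet."""
    if num == 0:
        return BASE62[0]
    out = []
    while num > 0:
        num, rem = divmod(num, 62)
        out.append(BASE62[rem])
    return "".join(reversed(out))
```

Each worker needs a distinct `worker_id`; how those are assigned (static config, a coordination service) is outside this sketch. Codes produced this way are unique without any database round trip, at the cost of being longer than a 6 or 7 character code.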
Choosing Between Hash and ID-Based
If: You need deterministic mapping from long URL to short code (same URL always gets same code)
Use: Hash-based with a fixed-length truncation. Accept collision risks and handle retries.
If: You need short codes (e.g., 6 chars) and don't care about deterministic mapping
Use: ID-based with base62. Easier to scale and guarantee uniqueness.
If: You need to support custom aliases (user picks the code)
Use: ID-based for auto-generated codes, but store custom aliases in a separate namespace or table.
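For the hash-based branch, the collision handling described earlier can be sketched as follows. This is an illustrative sketch only: `hash_code`, `shorten`, and the dict standing in for the links table are hypothetical names, and a real service would rely on the database's unique constraint rather than a check-then-write on a dict.

```python
import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def hash_code(long_url, salt=0, length=7):
    """Derive a fixed-length base62 code from SHA-256 of 'salt:url'."""
    digest = hashlib.sha256(f"{salt}:{long_url}".encode("utf-8")).digest()
    num = int.from_bytes(digest[:8], "big")
    chars = []
    for _ in range(length):
        num, rem = divmod(num, 62)
        chars.append(BASE62[rem])
    return "".join(chars)


def shorten(long_url, store, max_attempts=5, length=7):
    """store is a dict standing in for the table keyed by short_code."""
    for salt in range(max_attempts):
        code = hash_code(long_url, salt, length)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url   # free slot: claim it
            return code
        if existing == long_url:
            return code              # same URL submitted again: reuse its code
        # a different URL already owns this code: bump the salt and retry
    raise RuntimeError("too many collisions; consider longer codes")
```

Because the salts are tried in the same order every time, resubmitting a URL walks the same chain and finds its own entry, so the mapping stays deterministic even after a collision.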

Database Schema & Write Path

The core database stores the mapping from short code to long URL. The schema is simple: primary key on short_code, columns for original_url, created_at, expires_at, owner_id (optional). Creation traffic is not the bottleneck (traffic is ~99% reads), but the write path must still absorb creation bursts, and generating IDs from a single database counter is where it breaks. Instead, decouple ID generation from the database: generate IDs in the application tier using a Snowflake-like algorithm (or pre-allocated segments). The insert itself should stay synchronous, so a short URL is never returned before its mapping is durable; under bursts, inserts can be grouped into multi-row batches.

For reads, the index on short_code is critical. With short_code as the primary key, InnoDB stores the row inside the clustered index, so a lookup needs no second index hop; on engines with heap tables, a covering index that includes original_url gives the same effect. To distribute writes, partition by a hash of short_code rather than its leading characters: base62 codes derived from consecutive IDs share their prefix, so prefix partitioning would send all new writes to one partition. Use a read replica for analytics queries, but always route redirect lookups to the cache first and then to the primary.

schema/create_links_table.sqlSQL
-- TheCodeForge schema for URL shortener (MySQL / InnoDB)
CREATE TABLE links (
    -- Binary collation keeps 'abc' and 'ABC' distinct; base62 codes are case-sensitive
    short_code VARCHAR(10) CHARACTER SET ascii COLLATE ascii_bin NOT NULL PRIMARY KEY,
    original_url VARCHAR(2048) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NULL,
    owner_id BIGINT NOT NULL DEFAULT 0,
    click_count BIGINT NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- No separate lookup index is needed: in InnoDB the primary key is the clustered
-- index, so a lookup by short_code already reads the row directly.
-- (CREATE INDEX ... INCLUDE is PostgreSQL / SQL Server syntax and fails on MySQL.)

-- For analytics queries (non-urgent)
CREATE INDEX idx_owner_created ON links (owner_id, created_at);
Schema Design Tip
Keep the short_code as VARCHAR but validate it's alphanumeric. Use utf8mb4 to support emojis in original URLs (some users will paste them). Avoid storing the full URL twice — normalise if you need to deduplicate.
Production Insight
Index on short_code is the most critical index. A covering index avoids a separate data file lookup.
Write scalability is not about inserts per se, but about ID generation. Pre-allocate ID blocks to workers.
Rule: partition the links table by a hash of short_code (not its leading characters, which are nearly identical for consecutive IDs) to spread writes across shards at extreme scale.
Key Takeaway
Schema is simple — the complexity is in ID generation and caching.
Covering index on short_code turns lookup into an index-only scan.
Rule: always query the primary for redirects; use replicas for reporting only.
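The "pre-allocate ID blocks to workers" idea can be sketched as below. Everything here is illustrative: `CounterStore` is an in-memory stand-in for whatever shared counter you use (a single DB row, a Redis key), and the block size of 1000 is an arbitrary assumption.

```python
import threading


class CounterStore:
    """Stand-in for a shared counter, touched once per block instead of once per ID."""

    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()

    def reserve(self, block_size):
        with self._lock:
            start = self._next
            self._next += block_size
            return start, start + block_size  # half-open range [start, end)


class BlockIdAllocator:
    """Hands out IDs from a locally held block and refills when it runs out."""

    def __init__(self, store, block_size=1000):
        self.store = store
        self.block_size = block_size
        self._lock = threading.Lock()
        self._current = 0
        self._end = 0

    def next_id(self):
        with self._lock:
            if self._current >= self._end:
                # Only this call touches the shared counter
                self._current, self._end = self.store.reserve(self.block_size)
            value = self._current
            self._current += 1
            return value
```

With a block size of 1000, the shared counter sees one update per thousand links created, which is what removes the contention. The trade-off is that IDs left in a block are lost if a worker crashes, so the ID sequence has gaps.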

Caching Layer — Survival Guide for Viral Traffic

A single viral link can generate millions of requests per minute. Without caching, your database will melt. The caching architecture needs at least two tiers: L1 (in-memory cache per application instance) and L2 (distributed cache like Redis or Memcached). L1 stores the hottest keys (recently accessed short codes) and evicts using LRU. L2 stores a larger set of mappings with a longer TTL.

Cache-aside pattern: on a redirect request, check L1 → if miss, check L2 → if miss, fetch from DB and populate both caches. Set a TTL of 24 hours for L2, and invalidate proactively when a link is deleted or expires. For read-heavy workloads, also populate the cache at creation time: commit the mapping to the DB, then immediately write it to the cache. That way the first read is already fast.

Hot key problem: when a single short code gets 100k requests per second, Redis can become a hotspot. Solutions: local L1 caching (each app server caches the hot key), or use Redis with replicas and client-side sharding to distribute reads.

io/thecodeforge/shortener/RedirectCache.javaJAVA
package io.thecodeforge.shortener;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

import java.util.concurrent.TimeUnit;

public class RedirectCache {
    private final Cache<String, String> l1 = CacheBuilder.newBuilder()
            .maximumSize(100_000)
            .expireAfterWrite(1, TimeUnit.MINUTES)
            .recordStats()
            .build();

    // A single Jedis connection is not thread-safe, so borrow one per call from a pool.
    private final JedisPool pool;  // L2 connection pool
    private static final String PREFIX = "url:";
    private static final int L2_TTL_SECONDS = 86400;

    public RedirectCache(JedisPool pool) {
        this.pool = pool;
    }

    public String getOriginalUrl(String shortCode) {
        // L1 lookup
        String url = l1.getIfPresent(shortCode);
        if (url != null) return url;

        // L2 lookup; try-with-resources returns the connection to the pool
        try (Jedis jedis = pool.getResource()) {
            url = jedis.get(PREFIX + shortCode);
        }
        if (url != null) {
            l1.put(shortCode, url);
            return url;
        }

        // DB fallback happens outside this method
        return null;
    }

    public void put(String shortCode, String originalUrl) {
        l1.put(shortCode, originalUrl);
        try (Jedis jedis = pool.getResource()) {
            jedis.setex(PREFIX + shortCode, L2_TTL_SECONDS, originalUrl);
        }
    }
}
Tiered Caching: L1 vs L2
  • L1: in-memory per microservice instance. Fastest. Limited size. Evict aggressively.
  • L2: Redis cluster. Shared across all instances. Slower than L1 because of the network hop, but still around a millisecond.
  • Cache miss penalty: L1 miss → Redis hit ~1ms. Redis miss → DB hit ~10ms. Every miss hurts throughput.
  • Proactive populate: write-through cache on URL creation prevents the first request from hitting the DB.
Production Insight
Hot keys can overwhelm a single Redis node. Use L1 caching to absorb the top 10 hot keys locally.
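Cache stampede occurs when many requests miss the cache simultaneously. Use probabilistic early recomputation: each request refreshes the entry slightly before its TTL with a small random probability, so refreshes are staggered instead of arriving all at once.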
Rule: monitor cache hit rate per short code. If a code has >10% cache misses, promote it to L1 proactively.
Key Takeaway
Tiered caching is non-negotiable for viral traffic.
Hot keys need local L1 caching to keep Redis from melting.
Rule: always populate cache on write, not just on read.
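The stampede protection mentioned in the Production Insight can be sketched with probabilistic early expiration. This is a minimal sketch of that general technique, not code from the service described here; the function name, the `beta` tuning knob, and the injectable `rng` are assumptions for illustration.

```python
import math
import random


def should_refresh_early(now, expires_at, recompute_seconds, beta=1.0, rng=random.random):
    """Decide whether this request should refresh a cache entry before it expires.

    now and expires_at are timestamps in seconds; recompute_seconds is how long
    a refresh (the DB lookup) usually takes. The closer the entry is to expiry,
    the more likely a request is to volunteer for the refresh.
    """
    u = rng() or 1e-12  # random.random() can return 0.0; avoid log(0)
    # -log(u) is exponentially distributed with mean 1, so most requests get a
    # small head start and only a few get a large one
    head_start = recompute_seconds * beta * -math.log(u)
    return now + head_start >= expires_at
```

A request that gets `True` reloads the mapping and resets the TTL while everyone else keeps reading the still-valid entry. Raising `beta` makes refreshes happen earlier; lowering it makes them happen closer to the actual expiry.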

Redirect Mechanics — HTTP Status and Performance

When a client requests a short URL, the server must respond with an HTTP redirect. Two status codes matter: 301 (Moved Permanently) and 302 (Found). 301 tells the browser to cache the redirect permanently — subsequent requests go directly to the long URL without hitting the shortener. This is great for performance but breaks analytics if you want to count every click (because cached browsers don't hit your service). 302 tells the browser not to cache — every request hits the shortener, enabling click tracking.

Most services use 302 by default for dynamic analytics, and offer 301 as an option for permanent links. The redirect response carries the target in the Location header. CORS headers matter only when browser JavaScript on another origin requests the short URL with fetch or XHR; ordinary link clicks and iframes do not need them.

Performance: server-side handling of a redirect (from receiving the request to sending the response) should complete in under 10ms at P99, and is typically <1ms on a cache hit. Client-side DNS resolution, the TCP connection, and the TLS handshake add network round trips on top of that and sit outside the server's budget. Improvements that cut those costs: keepalive connections, HTTP/2 multiplexing, and edge caching (CDN).

http/redirect-example.httpHTTP
HTTP/1.1 302 Found
Location: https://www.example.com/long-article-url
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Access-Control-Allow-Origin: *

# Alternatively, a 301 redirect for permanent links:
HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/long-article-url
Cache-Control: public, max-age=31536000, immutable
302 Redirects Can Bust Caches
Using 302 for all links means every click goes to your origin server. At scale, that's expensive. Consider using 301 for 'permanent' links (by default after a few hours of existence) and 302 for new links. Or use 307/308 for clients that require preserving the HTTP method.
Production Insight
Use 302 for analytics-required links; 301 for permanent ones to offload traffic.
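Edge caching with a CDN (CloudFront, Cloudflare) can serve redirects from the edge, which removes the round trip to your origin; the latency gain depends on how close the user is to an edge location.
Rule: set a short TTL (like 1 hour) on CDN cache for 302, so you can still update links quickly. Clicks served from the edge never reach your origin, so count them from CDN logs if you need analytics.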
Key Takeaway
Redirect status determines browser caching behaviour.
301 saves bandwidth but loses click data.
Rule: use 302 by default and switch to 301 after the link is 'stable'.
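The rule above can be written down as a small decision function. This is a sketch under stated assumptions: the six-hour `stable_after_seconds` threshold is an arbitrary stand-in for "a few hours", and `choose_status` and `build_redirect` are hypothetical helper names.

```python
def choose_status(age_seconds, needs_click_tracking, stable_after_seconds=6 * 3600):
    """Pick 301 or 302 for a link of the given age."""
    if needs_click_tracking:
        return 302  # every click must reach the service to be counted
    return 301 if age_seconds >= stable_after_seconds else 302


def build_redirect(long_url, status):
    """Return (status, headers) with caching headers matching the status code."""
    if status == 301:
        # Browsers may cache this, so later clicks can bypass the shortener
        cache_control = "public, max-age=31536000"
    else:
        # Ask clients not to cache, so the link can still be changed or counted
        cache_control = "no-cache, no-store, must-revalidate"
    return status, {"Location": long_url, "Cache-Control": cache_control}
```

Keep in mind that a 301 already cached by a browser is hard to take back, which is the reason for waiting until a link is stable before switching.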

Expiration, Custom Aliases, and Analytics

Real URL shorteners support link expiration (e.g., for temporary campaign links) and custom aliases (user picks a meaningful short code). Expiration is implemented by storing an expires_at column and checking during redirect lookup. If the current time exceeds expires_at, return 410 Gone or redirect to a fallback page. Custom aliases require a separate validation: they must be unique globally and not conflict with auto-generated codes. A common approach is to reserve a prefix for auto-generated codes (e.g., starting with a digit) and allow custom aliases to start with a letter. Or use two separate tables.

Analytics: every redirect should asynchronously log the click event (time, referrer, user-agent, IP) to a high-throughput queue (Kafka, Kinesis). A separate consumer processes the stream to update click counts and generate reports. The click count on the links table should be denormalised for quick display but must be updated asynchronously to avoid write contention. Use eventual consistency: the consumer updates the count in the DB via upsert.

sql/expired-link-check.sqlSQL
-- TheCodeForge: check expiration during redirect lookup
SELECT original_url,
       CASE WHEN expires_at IS NOT NULL AND expires_at < CURRENT_TIMESTAMP THEN 1 ELSE 0 END AS expired
FROM links
WHERE short_code = ?
LIMIT 1;

-- Then in application code (pseudocode, not SQL):
--   if the row is expired: respond with 410 Gone and stop
--   otherwise: redirect the client to original_url
Custom Alias Validation
Validate that custom aliases are at least 4 characters, alphanumeric only, and not in a reserved list (like 'api', 'login'). Use a Bloom filter to quickly reject common unwanted aliases.
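A minimal sketch of that validation, assuming a 10-character upper bound (to match the VARCHAR(10) column) and an illustrative reserved list. `validate_alias` and the `taken` set are hypothetical; in a real service the final uniqueness check is the database's unique constraint.

```python
import re

RESERVED = {"api", "login", "admin", "static"}  # illustrative, extend as needed
ALIAS_PATTERN = re.compile(r"[A-Za-z0-9]{4,10}")


def validate_alias(alias, taken):
    """Return None if the alias is acceptable, otherwise a short reason string."""
    if not ALIAS_PATTERN.fullmatch(alias):
        return "must be 4-10 alphanumeric characters"
    if alias.lower() in RESERVED:
        return "reserved word"
    if alias in taken:
        return "already taken"
    return None
```

Comparing against the reserved list in lowercase blocks variants such as 'Login' even though the codes themselves are case-sensitive.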
Production Insight
Expiration checks add ~1ms to the redirect path when they hit the DB, and an index on expires_at can slow writes. Relying on the cache TTL alone is not enough: an entry cached just before expires_at stays reachable until its TTL runs out. Either cap each entry's cache TTL at the time remaining until expires_at, or invalidate the cache entry when the link expires.
Analytics pipelines must be idempotent: a retry should not double-count clicks. Use a unique event ID per click.
Rule: separate analytics writes from the redirect path entirely to avoid impacting latency.
Key Takeaway
Expiration requires either DB query on every redirect or proactive cache invalidation.
Custom aliases need namespace separation from auto-generated codes.
Rule: analytics should be eventually consistent and never block the redirect.
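The idempotency requirement for the analytics consumer can be illustrated with a small in-memory aggregator. This is a sketch only: the event shape (`event_id`, `short_code`) and the `ClickAggregator` class are assumptions, and a production consumer would keep the seen-ID set in a bounded store (for example keys with a TTL) rather than in process memory.

```python
from collections import Counter


class ClickAggregator:
    """Counts clicks per short code, ignoring events it has already seen."""

    def __init__(self):
        self.seen_event_ids = set()
        self.counts = Counter()

    def handle(self, event):
        """event is a dict with 'event_id' and 'short_code'. Returns True if counted."""
        if event["event_id"] in self.seen_event_ids:
            return False  # redelivery after a retry: do not double-count
        self.seen_event_ids.add(event["event_id"])
        self.counts[event["short_code"]] += 1
        return True
```

Because the queue may deliver the same event more than once, the unique event ID is what makes a retry safe.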

Capacity Estimation — Don’t Let Your Database Be the Blame

Competitors mention “30M new URLs per month” but they skip the real point: you need to size for the write path before you pick a short-code scheme. 30M/month is 1.8B records over 5 years. That’s not a flex — that’s a death sentence if you haven’t estimated reads.

Every short link redirect is a read. If you have 1.8B stored URLs and each gets redirected an average of 10 times (conservative for a viral service), you’re looking at 18B reads over the same period. Your database won’t survive that without aggressive caching and a careful choice of the short-code length.

7 characters from a 62-character alphabet give about 3.5 trillion combinations (62^7). That's enough for 1.8B records with plenty of room to spare. But here's the gotcha: your storage model must account for the full row: short code (7 bytes), long URL (up to 2048 bytes), creation timestamp, expiration, user ID. That's roughly 2.1 KB per row in the worst case, so 1.8B × 2.1 KB ≈ 3.7 TB (about 3.4 TiB) of raw storage, not petabytes. Real URLs are usually far shorter, but replication, indexes, and headroom multiply the figure, so plan replication and sharding before you go live.

Senior engineer rule: always overestimate reads, underestimate writes, and double your storage projection.

CapacityEstimator.pyPYTHON
# io.thecodeforge - system-design tutorial

# Estimate total records and storage for a URL shortener
MONTHLY_NEW_URLS = 30_000_000
YEARS = 5
MONTHS = YEARS * 12

total_records = MONTHLY_NEW_URLS * MONTHS  # 1.8 billion

# Worst case: every URL uses the full 2048 bytes
ROW_SIZE_BYTES = 7 + 2048 + 8 + 8 + 4  # code, url, created_ts, exp_ts, user_id
total_storage_bytes = total_records * ROW_SIZE_BYTES
total_storage_tib = total_storage_bytes / (1024**4)  # Convert to tebibytes

avg_reads_per_url = 10
total_reads = total_records * avg_reads_per_url

print(f"Total records: {total_records:.2e}")
print(f"Total storage: {total_storage_tib:.2f} TiB")
print(f"Estimated reads over 5 years: {total_reads:.2e}")
Output
Total records: 1.80e+09
Total storage: 3.40 TiB
Estimated reads over 5 years: 1.80e+10
Production Trap:
Never assume reads are 1:1 with writes. Viral traffic can spike reads 100x. Model for the peak, not the average.
Key Takeaway
Always estimate reads and storage before choosing a short-code generation method. Reads dictate caching strategy; storage dictates sharding needs.
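The monthly totals become easier to reason about as per-second rates. The sketch below converts them, assuming a 30-day month, a steady state in which each month's reads are ten times that month's new URLs, and the 100x viral peak factor quoted above. The function name and its defaults are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month


def traffic_rates(monthly_new_urls=30_000_000, reads_per_url=10, peak_factor=100):
    """Return (average writes/s, average reads/s, peak reads/s)."""
    avg_writes = monthly_new_urls / SECONDS_PER_MONTH
    avg_reads = avg_writes * reads_per_url
    peak_reads = avg_reads * peak_factor
    return avg_writes, avg_reads, peak_reads


writes, reads, peak = traffic_rates()
print(f"Average writes/s: {writes:.1f}")
print(f"Average reads/s:  {reads:.0f}")
print(f"Peak reads/s:     {peak:.0f}")
```

Under these assumptions the average load is small (about 11.6 writes and 116 reads per second), and it is the peak of roughly 11,600 reads per second that the cache tier has to be sized for.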

Low-Level Design — Where the Database Actually Bleeds

High-level architecture is for whiteboard interviews. Low-level design is where you figure out why your service falls over at 10K QPS. Stop hand-waving about “application servers” and talk about the write path.

You have two major choke points: the short-code collision check and the redirect lookup. For the write path, you must handle concurrent requests for the same long URL. If two users submit the same URL at the same millisecond, a counter-based system returns two different short codes, and that's fine. Hashing (e.g., MD5 truncated to 7 chars) is deterministic: the same URL always yields the same short code, and occasionally two different URLs do too. Enforce a unique constraint on short_code and let the database arbitrate: when an insert fails, read the existing row. If it holds the same URL, return the existing code; if it holds a different URL, that is a real collision, so re-hash with a salt and retry.

For reads, the short code is your primary key. A B-Tree index on the short-code column gives O(log n) lookups, which at 1.8B rows is only a handful of page reads. The problem is volume, not a single lookup: 18B reads served one by one from disk-backed storage will saturate the database, so put a distributed cache or an O(1) hash-keyed store in front of it.

On the write side, the trick is to pre-generate short codes in batches (say 10K at a time) and store them in Redis. When a user requests a short URL, pop one from Redis. This decouples short-code generation from the write path and reduces database load.

Don’t forget the expiration sweep. A cron job that deletes expired records every hour is fine for 30M records. For 1.8B, use TTL indexes (MongoDB) or partition by expiration month and drop entire partitions. Lazy expiration on redirect read is a band-aid, not a solution.

CollisionRetryLoop.pyPYTHON
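# io.thecodeforge - system-design tutorial

import secrets
import string

import psycopg2
import redis

ALPHABET = string.digits + string.ascii_letters  # 62 characters
POOL_KEY = 'available_short_codes'
MAX_RETRIES = 5

r = redis.Redis(host='cache-cluster', decode_responses=True)

def generate_random_code(length=7):
    # Random base62 code; uniqueness is enforced by the DB unique constraint
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

def refill_pool(count=10_000):
    # Pre-generate a batch of short codes into a Redis list
    for _ in range(count):
        r.lpush(POOL_KEY, generate_random_code())

def shorten_url(long_url, db_conn):
    for _ in range(MAX_RETRIES):
        # Pop a short code from Redis (no DB hit); brpop returns None on timeout
        popped = r.brpop(POOL_KEY, timeout=5)
        if popped is None:
            raise RuntimeError('short-code pool is empty')
        short_code = popped[1]
        try:
            with db_conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO urls (short_code, long_url, created_at) VALUES (%s, %s, NOW())",
                    (short_code, long_url)
                )
            db_conn.commit()
            return short_code
        except psycopg2.IntegrityError:
            # Unique constraint hit: roll back and retry with a new code
            db_conn.rollback()
    raise RuntimeError('could not allocate a unique short code')

def lookup(code, db_conn):
    # On redirect: primary-key lookup
    with db_conn.cursor() as cur:
        cur.execute("SELECT long_url FROM urls WHERE short_code = %s", (code,))
        row = cur.fetchone()
    return row[0] if row else None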
Output
Short code popped from Redis in <1ms, insert succeeds unless collision, then retry.
Senior Shortcut:
Pre-generate short codes in Redis. It turns a write-path bottleneck into a cache-pop operation and eliminates database contention on collision detection.
Key Takeaway
Decouple short-code generation from the write path using a pre-generated pool. For point lookups at scale, a hash-keyed cache in front of the B-Tree carries most of the read load.

Functional Requirements — What This Thing Actually Does

Before you touch a line of code, you need to know what the system is supposed to do. Functional requirements are the raw features — no fluff. A URL shortener has exactly two APIs: shorten and redirect. The shorten API takes a long URL and returns a short code. The redirect API takes that code and returns a 302 or 301 to the original URL. That's it. Don't add custom aliases, analytics, or expiration until the core loop works. Every feature you bolt on increases latency, storage cost, and failure surface. Start with the minimum viable product: generate a unique key, store the mapping, serve the redirect. Anything else is a distraction until you've proven the basic flow under load. If you can't make two endpoints fast and reliable, your fancy analytics pipeline won't matter.

shortener_functions.pyPYTHON
# io.thecodeforge - system-design tutorial

from typing import Optional

class URLShortener:
    def __init__(self):
        self.store: dict[str, str] = {}
        self.counter: int = 0

    def shorten(self, long_url: str) -> str:
        # Base-62 encode counter-derived ID
        self.counter += 1
        code = self._base62(self.counter)
        self.store[code] = long_url
        return code

    def redirect(self, code: str) -> Optional[str]:
        # Returns None when the code is unknown
        return self.store.get(code)

    def _base62(self, num: int) -> str:
        # Digits first, matching Base62Encoder, so counter value 1 encodes to '1'
        chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
        if num == 0:
            return chars[0]
        result = []
        while num > 0:
            result.append(chars[num % 62])
            num //= 62
        return ''.join(reversed(result))
Output
>>> shortener = URLShortener()
>>> code = shortener.shorten('https://longurl.com/page')
>>> code
'1'
>>> shortener.redirect(code)
'https://longurl.com/page'
Senior Shortcut:
Counter-based IDs are fast and deterministic. Hashes need collision checks. For a first pass, counter beats hashing every time.
Key Takeaway
Two APIs. No more. Shorten and redirect. Everything else is optional until you've nailed the core.

Non-Functional Requirements — The Things That Keep You Employed

Non-functional requirements are the constraints that separate a toy from a production system. For a URL shortener, the big three are latency, availability, and durability. Your redirect endpoint must respond in under 50 milliseconds — users expect instant page loads, and search engines penalize slow redirects. Availability means 99.99% uptime at minimum. When a short link breaks, the internet notices. Durability means once you store a mapping, you never lose it. That rules out in-memory caches as primary storage. You need a replicated database with atomic writes and consistent reads. Think DynamoDB, Cassandra, or PostgreSQL with read replicas. The write path for shortening can be slower — nobody notices 100ms when creating a link. The read path for redirects must be blazing fast. Cache aggressively with Redis or Memcached, but prepare for cache misses with database fallback that doesn't degrade. Design for failure: if your cache goes down, your DB should handle the load without timing out.

nonfunctional_check.pyPYTHON
# io.thecodeforge - system-design tutorial

import time
from typing import Optional

class RedirectService:
    LATENCY_BUDGET_SECONDS = 0.050

    def __init__(self, cache, db):
        # cache and db are thin wrappers exposing get/set and query
        self.cache = cache
        self.db = db

    def get_redirect(self, code: str) -> Optional[str]:
        start = time.perf_counter()  # monotonic clock, safe for measuring durations
        # Cache read with fallback
        url = self.cache.get(code)
        if not url:
            url = self.db.query(code)
            if url:
                # Re-populate cache on miss
                self.cache.set(code, url, ttl=3600)
        elapsed = time.perf_counter() - start
        if elapsed > self.LATENCY_BUDGET_SECONDS:
            # One slow request is not a percentile; feed this into a histogram in production
            print(f"WARNING: 50ms latency budget exceeded for code {code}")
        return url

# Production trap: cache miss storm under viral load
# Solution: pre-warm cache for top 1% short codes
Output
>>> svc = RedirectService(redis_client, postgres_client)
>>> svc.get_redirect('abc123')
'https://target-url.com'
# No output if under 50ms
Production Trap:
Cache stampedes happen when a viral short code expires and 10k requests hit the database at once. Use probabilistic early expiration (or jittered TTLs) to stagger misses.
Key Takeaway
50ms read latency. 99.99% availability. Durable writes. If you can't guarantee all three, your system isn't production-ready.
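It helps to translate the availability target into a downtime budget. The helper below is a one-line worked calculation; the function name is illustrative.

```python
def downtime_budget_minutes(availability, days=365):
    """Minutes of allowed downtime over the period for a given availability."""
    return (1 - availability) * days * 24 * 60


print(f"99.99% over a year: {downtime_budget_minutes(0.9999):.2f} minutes")
print(f"99.9%  over a year: {downtime_budget_minutes(0.999):.1f} minutes")
```

At 99.99% the whole year's budget is about 52.56 minutes, so a single bad deploy or failover can consume most of it.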

Scalability — From 10 Users to 10 Billion Redirects

Scalability isn't optional; it's the entire point of a URL shortener. Your system goes viral when a celebrity tweets a bit.ly link, and you need to handle 100,000 redirects per second without breaking a sweat. The bottleneck is the database read path. A single PostgreSQL instance handles maybe 10,000 reads per second, so you need horizontal scaling. Shard your database by short code hash. Use consistent hashing to avoid reshuffling on node addition. Cache aggressively with a distributed Redis cluster; each redirect should hit cache 99% of the time.

The write path is easier: maybe 100 new URLs per second at peak. Use an async queue to batch writes. For counter-based ID generation, you can't use a single centralized counter across all machines. That's a SPOF. Instead, use range-based counters per application instance: instance 1 gets IDs 1 to 1,000,000, instance 2 gets 1,000,001 to 2,000,000. Or use a Snowflake-style generator, or hand out ranges through a coordination service such as ZooKeeper.

Memory is rarely the limit. A redirect holds almost no application state, so the per-request cost is mostly TCP connections and kernel network buffers. Profile with realistic traffic patterns before launch.

sharded_redirect.pyPYTHON
# io.thecodeforge - system-design tutorial

import hashlib
from typing import Optional

class ShardedRedirectService:
    def __init__(self, cache_nodes: list, db_nodes: list):
        self.caches = cache_nodes
        self.dbs = db_nodes

    def get_redirect(self, code: str) -> Optional[str]:
        # Modulo sharding on a hash of the short code. This is NOT consistent
        # hashing: changing the node count remaps most keys.
        h = int(hashlib.md5(code.encode()).hexdigest(), 16)
        cache = self.caches[h % len(self.caches)]
        db = self.dbs[h % len(self.dbs)]
        # Try the cache shard first
        url = cache.get(code)
        if not url:
            url = db.query(code)
            if url:
                cache.set(code, url)
        return url

# To add nodes without reshuffling, replace the modulo with a hash ring and virtual nodes
# Production: 256 virtual nodes per physical node
Output
>>> svc = ShardedRedirectService(cache_nodes=[cache1, cache2], db_nodes=[db1, db2])
>>> svc.get_redirect('viral')
'https://trending-page.com'
Senior Shortcut:
Don't build your own consistent hash ring until you absolutely must. Use Redis Cluster or DynamoDB's built-in partitioning. They're battle-tested.
Key Takeaway
Shard by short code hash. Cache 99% of reads. Never centralize IDs. Scale horizontally from day one.
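For readers who want to see what consistent hashing with virtual nodes looks like, here is a minimal ring sketch. `HashRing` and the node names are hypothetical, MD5 is used only for placement (not security), and, as the shortcut above says, a managed option is usually the better choice in production.

```python
import bisect
import hashlib


def _hash(key):
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring: each physical node owns many points on the ring."""

    def __init__(self, nodes=(), vnodes=256):
        self.vnodes = vnodes
        self._points = []   # sorted hash positions
        self._owner = {}    # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a node is added, only the keys that now fall just before one of its points change owner, and they all move to the new node. With modulo sharding, by contrast, most keys move whenever the node count changes.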

🏗️ Putting It Together (Step-by-Step)

URL shortening is a write-once, read-often system.

Write path: client POSTs a long URL → API server validates → code generator creates a unique short key → database stores the mapping (key, long URL, created_at, expiration) → cache writes key→URL → response returns the short URL.

Read path: client GETs the short URL → DNS resolves → load balancer → API server checks the cache (Redis) first → cache hit: return a 302 redirect → cache miss: query the database → if found, populate the cache and return the redirect → if not found, return 404.

Critical ordering: always write to the database before the cache, so the cache never holds a mapping the database doesn't have. For custom aliases, add a uniqueness check before generation. Analytics events fire asynchronously via a message queue; never block the redirect path. Expiration runs as a background job that scans for expired entries and purges cache keys and database rows in batches of 1000 to keep lock hold times short.

redirect_handler.py · PYTHON
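# io.thecodeforge - system-design tutorial

from fastapi import FastAPI, HTTPException
from fastapi.responses import RedirectResponse
from redis.asyncio import Redis  # async client, so lookups don't block the event loop
from db import get_url  # application-specific async DB lookup (not shown)

app = FastAPI()
cache = Redis(host='cache-cluster', port=6379, decode_responses=True)

@app.get('/{short_code}')
async def redirect(short_code: str):
    # Fast path: cache hit
    long_url = await cache.get(short_code)
    if long_url:
        return RedirectResponse(url=long_url, status_code=302)

    # Slow path: fall back to the database
    record = await get_url(short_code)
    if not record:
        raise HTTPException(status_code=404, detail='Short URL not found')

    # Populate the cache with a 1-hour TTL, then redirect
    await cache.setex(short_code, 3600, record['long_url'])
    return RedirectResponse(url=record['long_url'], status_code=302)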
Output
→ 302 redirect on cache hit
→ DB query + cache populate on miss
→ 404 if not found
Production Trap:
Never write cache first. If the cache write succeeds but the DB write fails, the short URL works only until the cache entry expires or is evicted; after that, users get a 404 on a link they were told exists. Always commit to DB, then populate cache.
Key Takeaway
Write to DB first, cache second. Read from cache first, DB second. Never invert this order.
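The write-side ordering can be sketched like this. It is a toy model under stated assumptions: `LinkStore` is a hypothetical class, and plain dictionaries stand in for the database and for Redis so the example runs on its own.

```python
from typing import Optional


class LinkStore:
    """Toy model of the create/resolve ordering; dicts stand in for the DB and Redis."""

    def __init__(self):
        self.db = {}     # stand-in for the durable database
        self.cache = {}  # stand-in for the Redis cache

    def _cache_set(self, short_code: str, long_url: str) -> None:
        self.cache[short_code] = long_url

    def create(self, short_code: str, long_url: str) -> bool:
        if short_code in self.db:
            return False  # code already taken
        # 1. Commit to the database first: this is the source of truth
        self.db[short_code] = long_url
        # 2. Then warm the cache; a failure here is tolerable because
        #    the read path repopulates the cache on the next miss
        try:
            self._cache_set(short_code, long_url)
        except Exception:
            pass
        return True

    def resolve(self, short_code: str) -> Optional[str]:
        # Read path: cache first, database second
        url = self.cache.get(short_code)
        if url is None:
            url = self.db.get(short_code)
            if url is not None:
                self.cache[short_code] = url
        return url
```

Because the database is written first, a cache failure degrades to one slower first redirect rather than a lost link.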

✅ Summary

A URL shortener is deceptively simple: two API endpoints (create, redirect) and a cache. The design breaks into: short code generation (hashing vs. counter-based IDs), storage schema (key-value with timestamps), caching layer (Redis with TTL), redirect mechanics (302 vs 301), and supporting features (expiration, custom aliases, analytics). The non-functional constraints dominate: 99.99% uptime for redirects, sub-50ms latency, ability to handle viral traffic spikes (10M+ redirects/minute). Capacity estimation is mandatory: a naive VARCHAR(255) short code column burns memory in both cache and index. Every design decision trades off between write throughput (DB writes are slow) and read performance (cache misses are expensive). The winning pattern: counter-based IDs from a distributed generator or leased ranges, encoded as base62 short codes; a durable store keyed by short code (PostgreSQL or DynamoDB); a Redis cache-aside layer in front of it; and an async analytics pipeline. Test your cache eviction policy with a simulated DDoS, and you'll find your real bottleneck.

code_generator.py · PYTHON
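# io.thecodeforge - system-design tutorial

import hashlib

BASE62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

def encode_base62(num: int) -> str:
    # Repeated division by 62; the most significant digit ends up first
    if num == 0:
        return '0'
    encoded = ''
    while num > 0:
        encoded = BASE62[num % 62] + encoded
        num //= 62
    return encoded

def hash_short(url: str) -> str:
    # Take the first 8 bytes of the MD5 digest as an integer, base62-encode it,
    # and keep 7 characters. Truncation means collisions are possible, so the
    # caller still needs a uniqueness check.
    digest = hashlib.md5(url.encode()).digest()
    return encode_base62(int.from_bytes(digest[:8], 'big')).rjust(7, '0')[:7]

# Counter-based generation needs no collision check on insert; hashing does
# Use hashing when you need determinism (same URL → same code)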
Output
encode_base62(123456789) → '8M0kX'
hash_short('https://long.url') → a 7-character code, identical on every call for the same URL
Key Insight:
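Counter-based IDs (a distributed counter or leased ID range + base62) keep codes short: 7 base62 characters cover 62^7 ≈ 3.5 trillion IDs, and a code is only as long as its ID requires. A truncated hash needs extra characters to keep collisions rare as the table grows. Shorter keys mean less index and cache memory per entry; measure the effect on latency rather than assuming it.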
Key Takeaway
Short code generation method directly impacts database storage, cache memory, and redirect latency. Choose counter-based by default.
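To make the code-length arithmetic concrete, here is a small sketch that reuses the same 62-character alphabet. `decode_base62` and `code_length_for` are helper names introduced for illustration; they are not part of the original listing.

```python
BASE62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'


def encode_base62(num: int) -> str:
    """Encode a non-negative integer using the BASE62 alphabet."""
    if num == 0:
        return '0'
    encoded = ''
    while num > 0:
        encoded = BASE62[num % 62] + encoded
        num //= 62
    return encoded


def decode_base62(code: str) -> int:
    """Inverse of encode_base62: fold the digits back into an integer."""
    num = 0
    for ch in code:
        num = num * 62 + BASE62.index(ch)
    return num


def code_length_for(max_ids: int) -> int:
    """Smallest code length whose keyspace (62 ** length) holds max_ids IDs."""
    length = 1
    while 62 ** length < max_ids:
        length += 1
    return length


print(encode_base62(123456789))        # 8M0kX
print(decode_base62('8M0kX'))          # 123456789
print(code_length_for(1_000_000_000))  # 6
```

One billion links fit in 6 characters, and 7 characters cover about 3.5 trillion, which is why counter-based codes stay short for a long time.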

Redirection

Redirection is the core operation: converting a short code back to a long URL and sending the client there. Speed is paramount because every millisecond of redirect latency directly impacts user experience. The redirection flow begins when a client requests a short URL like https://short.ly/abc123. The server must quickly look up the mapping from code to target URL. To avoid database bottlenecks, we use a cache-aside pattern: first check Redis (or Memcached) with the short code as key. If found, return immediately. If not, query the database, populate the cache, and then redirect. The HTTP response uses a 301 (permanent) or 302 (temporary) redirect status code, depending on whether the mapping may change. With a 301, browsers may cache the redirect indefinitely, which reduces server load but means returning visitors bypass your server, so mapping updates and click counts never reach you. Use 302 when the mapping may change or when you need every click for analytics. The response includes a Location header with the target URL. For high-traffic systems, we implement async analytics capture: the redirect itself never waits for logging. Instead, we push an event to Kafka or another message queue. The client gets the redirect in under 10ms, while analytics are processed asynchronously. This separation ensures that even if the analytics pipeline fails, the redirect still works.

redirect_service.py · PYTHON
# io.thecodeforge - system-design tutorial
from datetime import datetime, timezone

import redis
from flask import Flask, redirect, request

# Placeholders for application-specific objects (module name is hypothetical):
# a DB client and an analytics queue producer
from app_deps import db, analytics_queue

app = Flask(__name__)
cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

def fetch_url(short_code: str):
    # Cache-aside: check Redis first
    url = cache.get(short_code)
    if url:
        return url
    # Fall back to the DB; db.query is assumed to return the target URL or None
    url = db.query("SELECT target_url FROM mappings WHERE short_code=%s", short_code)
    if url:
        cache.setex(short_code, 3600, url)  # 1-hour TTL
    return url

@app.route('/<short_code>')
def handle_redirect(short_code):
    target = fetch_url(short_code)
    if not target:
        return "Not Found", 404
    # Analytics capture: enqueue is assumed to be non-blocking
    analytics_queue.enqueue(short_code, request.remote_addr, datetime.now(timezone.utc))
    return redirect(target, code=302)

if __name__ == '__main__':
    app.run()
Output
HTTP/1.1 302 Found
Location: https://example.com/very-long-url
Content-Length: 0
Production Trap:
Never write analytics synchronously inside the redirect handler. If the analytics database is slow or down, your redirects will time out. Always push to a queue and process offline.
Key Takeaway
Redirects must be sub-10ms operations; separate the fast path (cache hit + redirect) from the slow path (analytics capture).
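One way to keep analytics off the fast path is a bounded in-process buffer that drops events instead of blocking. The sketch below uses only the standard library; `ClickBuffer` is a hypothetical name, and a production system would more likely hand events to Kafka or another broker from a background worker.

```python
import queue
import time


class ClickBuffer:
    """Bounded, non-blocking buffer for click events (illustrative sketch)."""

    def __init__(self, maxsize: int = 10_000):
        self._events = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # approximate count of events shed under load

    def record(self, short_code: str, client_ip: str) -> bool:
        """Called from the redirect handler; never waits."""
        try:
            self._events.put_nowait((short_code, client_ip, time.time()))
            return True
        except queue.Full:
            # Shed analytics load rather than slow down the redirect
            self.dropped += 1
            return False

    def drain(self, max_batch: int = 500) -> list:
        """Called from a background worker to collect a batch for shipping."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._events.get_nowait())
            except queue.Empty:
                break
        return batch
```

The design choice here is that a full buffer costs some analytics accuracy but never costs redirect latency. Whether dropping events is acceptable depends on how the click data is used.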

Redirection API and Speed Optimization

The Redirection API is a minimal, read-only endpoint: GET /{short_code}. Its only job is to return a 302 (or 301) with a Location header. To maximize speed, we apply several optimizations. First, we warm the cache on write: every new mapping is committed to the database and then written to the cache, so the first redirect is already fast. Second, we precompute a bloom filter for all short codes (stored in memory or Redis). Before hitting cache or DB, we check the bloom filter; if it says the code doesn't exist, we reject immediately with 404 and skip the lookup entirely. Third, we use connection pooling for both Redis and the database to avoid TCP handshake overhead. Fourth, we deploy the redirect service behind a CDN (like CloudFront or Cloudflare) that caches 301 redirects at the edge. For 302 redirects, the CDN forwards requests to origin but still terminates TLS close to the user, reducing latency. Finally, we use HTTP/2 and keepalive connections. The entire API response is under 200 bytes, so network round-trip is the dominant cost. We route users to the nearest data center using anycast or geo-aware DNS. With all optimizations, the 99th percentile redirect latency should be under 50ms globally.

speed_optimizations.py · PYTHON
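# io.thecodeforge - system-design tutorial
import redis
from flask import Flask, redirect
from pybloom_live import BloomFilter

from app_deps import db  # application-specific DB client (hypothetical placeholder)

app = Flask(__name__)
# A pool avoids a TCP handshake per request; the client borrows connections from it
pool = redis.ConnectionPool(host='redis-cluster', max_connections=100, decode_responses=True)
cache = redis.Redis(connection_pool=pool)
# Sized for 1 billion codes at a 1% false-positive rate (over 1 GB of RAM)
bloom = BloomFilter(capacity=1_000_000_000, error_rate=0.01)

def init_bloom():
    # Preload from a DB snapshot or S3; new codes must also be added at creation time
    for code in db.stream_all_codes():
        bloom.add(code)

@app.route('/<short_code>')
def fast_redirect(short_code):
    if short_code not in bloom:
        return "", 404  # definitely absent: reject without touching cache or DB
    url = cache.get(short_code)
    if not url:
        # Cache miss or bloom false positive: fall back to the DB (db.get_url is assumed)
        url = db.get_url(short_code)
        if url:
            cache.setex(short_code, 3600, url)
    return redirect(url, code=302) if url else ("", 404)

if __name__ == '__main__':
    init_bloom()
    app.run(threaded=True, host='0.0.0.0', port=8080)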
Output
GET /abc123 HTTP/2
HTTP/2 302
location: https://target.com
x-cache: hit
content-length: 0
(13.2ms total)
Production Trap:
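A bloom filter can return false positives: a request for a non-existent code may pass the filter and then miss in both the cache and the DB. That's okay because it only costs a lookup. It never returns a false negative for a code that was added, so every new code must be added to the filter at creation time, or valid links will be rejected with 404. Never rely on the bloom filter alone for correctness; it's just a fast early exit.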
Key Takeaway
Speed optimization is a layered defense: bloom filter < cache < DB. Each layer must fail fast and never block the user.
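Before committing to an in-memory bloom filter, it helps to estimate its size. The sketch below applies the standard sizing formulas for an optimally configured filter; `bloom_size` is a hypothetical helper, and a given library's actual allocation may differ from this estimate.

```python
import math


def bloom_size(n_items: int, error_rate: float):
    """Return (bits, hash_count) for an optimally sized bloom filter.

    bits   = -n * ln(p) / (ln 2)^2
    hashes = (bits / n) * ln 2
    """
    bits = math.ceil(-n_items * math.log(error_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


bits, hashes = bloom_size(1_000_000_000, 0.01)
print(f"{bits / 8 / 1e9:.2f} GB, {hashes} hash functions")  # 1.20 GB, 7 hash functions
```

At one billion codes and a 1% false-positive rate, the filter needs roughly 1.2 GB per process, so it may be worth holding one shared copy or accepting a higher error rate.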
● Production incident · POST-MORTEM · severity: high

The Single-Table Counter That Took Down a Shortener

Symptom
Short code generation slowed from 1000/sec to 0 during a marketing campaign. The entire shortener stopped accepting new links.
Assumption
An auto-increment column in a single writer database is simple and works fine for moderate traffic.
Root cause
All writes went to one RDBMS primary. Under high write load, inserts contended on InnoDB's table-level AUTO-INC lock (how long it is held depends on innodb_autoinc_lock_mode and the type of insert). When the database crashed, all new link creation was blocked.
Fix
Switched to a distributed ID generator using Snowflake-like IDs (64-bit, with timestamp + worker ID + sequence) and segregated write traffic across multiple worker nodes. Each worker generates IDs without coordination.
Key lesson
  • Never rely on a single database auto-increment for ID generation at scale — it's a write bottleneck and a single point of failure.
  • Use distributed ID generators or pre-allocated ID ranges to eliminate contention.
  • Always design for write scalability even if you expect read-heavy workload — shortener creation traffic spikes during campaigns.
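A Snowflake-style generator of the kind described in the fix can be sketched as follows. The bit layout (10 worker bits, 12 sequence bits) and the custom epoch are assumptions chosen for illustration, not the values used in the incident, and `SnowflakeGenerator` is a hypothetical name.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # custom epoch in milliseconds (assumption)
WORKER_BITS = 10              # up to 1024 workers (assumption)
SEQUENCE_BITS = 12            # up to 4096 IDs per worker per millisecond (assumption)


class SnowflakeGenerator:
    """Snowflake-style IDs: timestamp | worker ID | per-millisecond sequence."""

    def __init__(self, worker_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Each worker needs a unique `worker_id`, which is the only coordination required and happens once at startup. Note that a full 64-bit ID can take up to 11 base62 characters, so these codes are longer than those from a small sequential counter.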
Production debug guide
Symptom → Action guide for common URL shortener issues · 4 entries
Symptom · 01
Short code returns 404 even though it exists in the database
Fix
Check the cache layer (Redis/Memcached) for stale entries. Invalidate the key and verify the DB row. Also check whether the link has expired (expiration timestamp in the DB).
Symptom · 02
Redirect takes >50ms consistently
Fix
Check reverse proxy (Nginx) caching rules. Ensure 301 redirects are set for permanent links so browsers cache them. Profile Redis lookup time — high latency could indicate a hot key causing resource contention.
Symptom · 03
Custom alias already taken but user didn't set it
Fix
Check whether a separate namespace for custom aliases is colliding with auto-generated codes. Use a distinct prefix or separate database for custom aliases.
Symptom · 04
Analytics data lost for specific short codes
Fix
Verify that the async analytics pipeline (Kafka + streaming job) is not dropping messages. Check for backpressure in Kafka consumer groups. Ensure idempotent insertion to prevent duplicates.
★ Quick Debug: URL Shortener Redirect Issues
Use these commands to diagnose the most common production redirect problems.
Short code not found in DB but exists in cache
Immediate action
Check cache consistency — likely a stale entry after deletion or expiration.
Commands
curl -v http://shortener.io/shortCode # Check redirect headers and status
redis-cli GET shortener:shortCode # Check if key exists
Fix now
Purge the cache key and ensure DB lookup triggers cache refresh.
High latency on redirect
Immediate action
Identify whether latency is from cache layer or DB fallback.
Commands
curl -w '@curl-format.txt' -o /dev/null -s http://shortener.io/abc123 # Measure time_namelookup, time_connect, time_starttransfer
redis-cli --latency -h <redis-host> # Check Redis latency
Fix now
Add local L1 cache using a small in-memory cache (e.g., Guava or Caffeine) to absorb hot key traffic.
Custom alias not working for a new link
Immediate action
Verify that the custom alias is not already in use and that the request reached the service.
Commands
docker compose logs creation-service | grep customAlias # Check logs for creation attempt
SELECT * FROM links WHERE short_code = 'customAlias' # DB query to confirm existence
Fix now
Return explicit 409 Conflict for duplicate custom aliases and force a different alias.
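The local L1 cache suggested for hot keys can be sketched in Python with the standard library. `LocalTTLCache` is a hypothetical class; it is a simplified single-threaded illustration, so a multi-threaded server would need a lock around `get` and `set`.

```python
import time
from collections import OrderedDict


class LocalTTLCache:
    """Small in-process cache with LRU eviction and a short TTL (illustrative sketch)."""

    def __init__(self, max_entries: int = 10_000, ttl_seconds: float = 5.0,
                 clock=time.monotonic):
        self._data = OrderedDict()  # key -> (value, expires_at)
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: force a refresh from Redis
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def set(self, key, value) -> None:
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry
```

A TTL of a few seconds is usually enough to absorb a viral key: each server then hits Redis for that key once per TTL window instead of once per request, at the cost of serving a changed or deleted mapping for up to that long.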
Hash-Based vs ID-Based Short Code Generation
| Property | Hash-Based | ID-Based |
| --- | --- | --- |
| Deterministic URL→Code | Yes | No (same URL gets different codes) |
| Collision free | Collisions possible, need retry | Always unique (ID guarantees) |
| Code length | Fixed (e.g., 7 chars) | Variable, depends on ID size (6-10 chars) |
| ID generation bottleneck | None (hash is deterministic) | Requires distributed ID generator (Snowflake) |
| Supports custom aliases | Easy (prefix hash with alias) | Need separate namespace |

Key takeaways

1. URL shorteners are read-heavy systems; cache aggressively in two tiers (L1 + L2).
2. ID-based short code generation with base62 encoding avoids collisions and scales horizontally.
3. Use 302 for analytics tracking, 301 for permanent links to reduce origin load.
4. Analytics must be asynchronous and idempotent: never block the redirect path.
5. Design for viral traffic: hot keys need local caching, CDN offload, and throttled creation APIs.

Common mistakes to avoid

3 patterns
×

Using a single database auto-increment for short code IDs

Symptom
Write throughput caps at ~10k/s on a single MySQL node. During a viral campaign, requests queue up and eventually time out, and the service becomes unavailable for new link creation.
Fix
Replace with distributed ID generator (Snowflake algorithm) or pre-allocate ID ranges to worker nodes.
×

Not caching redirect lookups

Symptom
Database load spikes 100x during viral link, causing slow queries and cascading timeouts. Latency jumps from 5ms to 500ms.
Fix
Implement at least a Redis cache layer with write-through on creation. Use local L1 cache for extreme hot keys.
×

Using 302 for all links (no CDN edge caching)

Symptom
Every request hits origin servers, increasing infrastructure cost and latency. Can't scale globally without expensive regional deployments.
Fix
Use 301 for permanent links after a grace period. Serve 301 redirects from a CDN edge cache with long TTL. Reserve 302 for temporary/analytics-only links.
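The 301 versus 302 choice comes down to which status code and caching headers the redirect carries. The sketch below is one possible policy, not a prescription: `redirect_response` is a hypothetical helper, and the one-day edge TTL is an arbitrary example value.

```python
from typing import Dict, Tuple


def redirect_response(target_url: str, permanent: bool,
                      edge_ttl_seconds: int = 86_400) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers) for a redirect.

    Permanent links get a cacheable 301 so browsers and CDN edges can answer
    repeat visits. Everything else gets a 302 marked no-store so every click
    reaches the origin and can be counted.
    """
    if permanent:
        return 301, {
            "Location": target_url,
            "Cache-Control": f"public, max-age={edge_ttl_seconds}",
        }
    return 302, {
        "Location": target_url,
        "Cache-Control": "private, no-store",
    }
```

Setting an explicit `max-age` on the 301 keeps the caching window bounded, which leaves room to change the mapping later. The trade-off is that clicks served from a cache never reach your analytics.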
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
How would you generate a unique short code for every URL in a distribute...
Q02 · SENIOR
Explain how you would handle a viral link generating 1 million requests ...
Q03 · JUNIOR
What's the difference between 301 and 302 redirects, and why would you c...
Q01 of 03 · SENIOR

How would you generate a unique short code for every URL in a distributed system?

ANSWER
Use a distributed ID generator like Snowflake (64-bit: timestamp + worker ID + sequence). Encode the ID in base62 to produce a short alphanumeric code. This guarantees uniqueness without per-request coordination, as long as each worker has a unique worker ID. Alternatively, hash the long URL (e.g., SHA-256 truncated) and retry on collision. That gives same URL → same code in the common case, but every write needs a uniqueness check, and a collision retry breaks the determinism.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01 · What is a URL shortener in simple terms?
02 · Why not just use a hash of the URL as the short code?
03 · How do I handle custom aliases? Do they conflict with auto-generated codes?
04 · What's the best caching strategy for a URL shortener?