Design TinyURL: System Design Interview Deep Dive (2026)
Imagine every website address is a long home address like '123 Sunflower Lane, Apartment 4B, Springfield, Illinois, 62701, USA'. TinyURL is like a nickname system — you tell the post office 'call that address #XK9' and now anyone who says '#XK9' gets redirected to the full address instantly. The post office (the server) keeps a giant lookup book that maps short nicknames to long addresses. That's the whole system — a glorified, globally-distributed lookup book that has to handle billions of lookups per day without breaking a sweat.
Every senior engineer has sat across from an interviewer who says 'design a URL shortener' with a calm smile. It sounds trivial — take a long URL, make it short. But behind that smile is a question that probes distributed systems, database design, caching strategy, hash collision handling, rate limiting, analytics, and horizontal scaling simultaneously. Bit.ly processes over 600 million redirects per day. TinyURL has been alive since 2002. These systems are deceptively simple on the surface and genuinely hard to build correctly at scale.
The core problem is asymmetric: writes are rare, reads are overwhelmingly frequent. When you shorten a URL, that's a one-time write. But that short link might be embedded in a viral tweet and hit 10 million times in an hour. Your design has to reflect this read-heavy reality: every architectural choice, from your hashing scheme to your cache eviction policy, flows from that single insight.
By the end of this article you'll be able to walk into any system design interview and design TinyURL end-to-end: justify your short code generation strategy, design a DB schema that survives traffic spikes, build a caching layer that handles 99% of reads from memory, handle custom aliases and expiration, discuss analytics pipelines, and correctly answer every follow-up an interviewer throws at you. Let's build it.
The Core Logic: Base62 Encoding vs. Hashing
In a URL shortener, the 'magic' is how we generate the tiny string. You have two main paths: hashing the long URL (MD5/SHA-256) and truncating the digest, or Base62-encoding a unique numeric ID. A truncated hash can collide, which forces 'check-and-retry' logic onto the write path. The widely used alternative is a distributed ID generator (like a Snowflake ID or a centralized Range Manager) whose numeric output is converted into a Base62 string (a-z, A-Z, 0-9). Seven Base62 characters give 62^7, roughly 3.5 trillion, distinct codes.
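For example, with the alphabet a-z, A-Z, 0-9, the ID 125 equals 2 × 62 + 1, so its Base62 digits are 2 and 1 and the code is 'cb' (or 'aaaaacb' once padded to 7 characters). The result is short and unique, but also predictable. To make the 'next' URL harder to guess, we can shuffle our Base62 alphabet or mix a secret salt into the ID before encoding. Treat this as obfuscation rather than security: links that must stay private need their own access control.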
```java
package io.thecodeforge.shortener;

/**
 * TheCodeForge Production-Grade Base62 Encoder
 * Converts a unique, non-negative long ID into a short code of at least 7 characters.
 */
public class Base62Encoder {
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int BASE = ALPHABET.length();

    public static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        // Collect Base62 digits, least significant first
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % BASE)));
            id /= BASE;
        }
        // Pad with the zero digit ('a') up to 7 characters if required by business logic.
        // IDs of 62^7 or more produce codes longer than 7 characters.
        while (sb.length() < 7) {
            sb.append(ALPHABET.charAt(0));
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        long uniqueId = 56800235584L; // Example ID from a distributed generator (equal to 62^6)
        // Prints: Short Code for 56800235584: baaaaaa
        System.out.println("Short Code for " + uniqueId + ": " + encode(uniqueId));
    }
}
```
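The encoder needs a source of unique IDs. As a rough illustration of the Range Manager idea mentioned earlier, the sketch below lets each application server lease a block of IDs from a central counter and hand them out locally. The names `RangeIdAllocator` and `CentralCounter` are hypothetical, and the counter is an in-process `AtomicLong` standing in for whatever coordination service (ZooKeeper, a database sequence) a real deployment would use.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a range-based ID allocator (hypothetical names, in-memory only). */
final class RangeIdAllocator {

    /** Stand-in for the central service that hands out blocks of IDs (assumption). */
    static final class CentralCounter {
        private final AtomicLong nextBlockStart = new AtomicLong(1);
        private final long blockSize;

        CentralCounter(long blockSize) {
            this.blockSize = blockSize;
        }

        /** Atomically reserves [start, start + blockSize) and returns start. */
        long leaseBlock() {
            return nextBlockStart.getAndAdd(blockSize);
        }

        long blockSize() {
            return blockSize;
        }
    }

    private final CentralCounter counter;
    private long next;   // next ID to hand out from the current block
    private long limit;  // exclusive end of the current block

    RangeIdAllocator(CentralCounter counter) {
        this.counter = counter; // next == limit == 0 forces a lease on first use
    }

    /** Returns a unique ID, leasing a new block only when the current one is used up. */
    synchronized long nextId() {
        if (next >= limit) {
            next = counter.leaseBlock();
            limit = next + counter.blockSize();
        }
        return next++;
    }

    public static void main(String[] args) {
        CentralCounter counter = new CentralCounter(1000);
        RangeIdAllocator serverA = new RangeIdAllocator(counter);
        RangeIdAllocator serverB = new RangeIdAllocator(counter);
        System.out.println(serverA.nextId()); // 1    (server A leased 1..1000)
        System.out.println(serverB.nextId()); // 1001 (server B leased 1001..2000)
        System.out.println(serverA.nextId()); // 2
    }
}
```

The trade-off in this sketch is that a server which crashes mid-block loses the rest of its range. That is usually acceptable, because the key space is far larger than the number of URLs stored.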
Data Layer Strategy: Handling Scale and Redirection
Since this is a read-heavy system (100:1 read/write ratio), our database choice and caching strategy are critical. We use a NoSQL database like Cassandra or a sharded MongoDB for the URL mappings because we don't need complex joins—just a simple Key-Value lookup.
To keep redirect latency low, we put a Redis cache in front of the database so that popular keys are served from memory instead of disk. We use an LRU (Least Recently Used) eviction policy because, as a rule of thumb, about 20% of the links (the viral ones) generate about 80% of the traffic.
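The read path described here is often called cache-aside: check the cache first, fall back to the database on a miss, then populate the cache. The following is a minimal sketch under loud assumptions: two in-memory maps stand in for Redis and the database, and `CacheAsideResolver` is a hypothetical name, not a real client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Cache-aside lookup sketch; both stores are in-memory stand-ins (assumption). */
final class CacheAsideResolver {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    private final Map<String, String> database;                // stand-in for the URL table
    private int databaseReads = 0;

    CacheAsideResolver(Map<String, String> database) {
        this.database = database;
    }

    /** Returns the long URL for a short key, touching the database only on a cache miss. */
    Optional<String> resolve(String shortKey) {
        String cached = cache.get(shortKey);
        if (cached != null) {
            return Optional.of(cached); // cache hit
        }
        databaseReads++;
        String fromDb = database.get(shortKey);
        if (fromDb != null) {
            cache.put(shortKey, fromDb); // populate the cache for later reads
        }
        return Optional.ofNullable(fromDb);
    }

    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("aaaaacb", "https://example.com/some/very/long/path");
        CacheAsideResolver resolver = new CacheAsideResolver(db);
        System.out.println(resolver.resolve("aaaaacb")); // miss: read from the database, then cached
        System.out.println(resolver.resolve("aaaaacb")); // hit: served from the cache
        System.out.println("database reads: " + resolver.databaseReads()); // database reads: 1
    }
}
```

This sketch does not cache misses, so repeated lookups of unknown keys always reach the database. A real system might cache negative results briefly or put a Bloom filter in front of the lookup.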
```sql
-- io.thecodeforge.shortener - Database Schema
-- Optimized for NoSQL or Sharded SQL
CREATE TABLE io_thecodeforge.url_mapping (
    short_key    VARCHAR(7) PRIMARY KEY,
    original_url TEXT NOT NULL,
    user_id      BIGINT,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at   TIMESTAMP,
    click_count  BIGINT DEFAULT 0
);

-- Secondary Index for User Management
CREATE INDEX idx_user_urls ON io_thecodeforge.url_mapping(user_id);
```
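The `expires_at` column implies a check at redirect time, and the choice between 301 and 302 depends on whether every click should reach the server. The sketch below shows one way to make that decision. The function name `redirectStatus`, its parameters, and the use of 410 for expired links are assumptions for illustration, not something the schema dictates.

```java
import java.time.Instant;

/** Sketch: choose an HTTP status for a redirect request (hypothetical helper). */
final class RedirectDecision {

    /**
     * @param found           whether the short key exists in the store
     * @param expiresAt       value of expires_at, or null if the link never expires
     * @param now             current time
     * @param countEveryClick true if analytics need every click to reach the server
     */
    static int redirectStatus(boolean found, Instant expiresAt, Instant now,
                              boolean countEveryClick) {
        if (!found) {
            return 404; // unknown short key
        }
        if (expiresAt != null && !now.isBefore(expiresAt)) {
            return 410; // assumption: expired links answer "Gone"
        }
        // 302 is not cached by default, so each click is seen by the server.
        // 301 may be cached by browsers, which cuts load but hides repeat clicks.
        return countEveryClick ? 302 : 301;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2026-01-01T00:00:00Z");
        Instant yesterday = Instant.parse("2025-12-31T00:00:00Z");
        System.out.println(redirectStatus(true, null, now, true));       // 302
        System.out.println(redirectStatus(true, null, now, false));      // 301
        System.out.println(redirectStatus(true, yesterday, now, true));  // 410
        System.out.println(redirectStatus(false, null, now, true));      // 404
    }
}
```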
| Approach | Pros | Cons |
|---|---|---|
| Hashing (MD5/SHA) | Stateless, simple implementation | Collisions require check-before-insert |
| Base62 Encoding | Unique as long as the IDs are unique, no collisions | Requires a coordinated ID generator (central counter or Snowflake-style) |
| Custom Aliases | Better UX/Branding | Requires an availability check before insert |
🎯 Key Takeaways
- Base62 encoding of a unique 64-bit ID is the most robust way to generate short codes.
- Read-heavy systems require heavy caching (Redis) and NoSQL storage (Cassandra/DynamoDB) for horizontal scaling.
- Distributed ID generation (ZooKeeper/Snowflake) is the heart of a collision-free system.
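- Choose your HTTP status code (301 vs 302) based on your analytics and caching requirements: browsers may cache a 301, which reduces load but hides repeat clicks, while a 302 sends every click back to your servers.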
Frequently Asked Questions
How do you handle hash collisions if you use MD5?
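You take the first 7 characters of the encoded hash. If that key already exists in the database with a different original URL, you append a predefined string (salt) to the input and re-hash, repeating until you find a free key. If the key exists and already maps to the same URL, you can simply return it. The existence check and the insert should be one atomic operation (for example a conditional insert); otherwise two concurrent writers can claim the same key.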
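The retry loop can be sketched as follows. This is an illustration under stated assumptions: the key is the first 7 hex characters of the MD5 digest, the salt string `#retry` and the class name `HashShortener` are made up, and an in-memory map stands in for the database's conditional insert.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

/** Sketch of hash-based short keys with salt-and-retry on collision. */
final class HashShortener {
    private static final String SALT = "#retry"; // hypothetical salt value
    private final Map<String, String> store;     // stand-in for the URL table

    HashShortener(Map<String, String> store) {
        this.store = store;
    }

    /** First 7 hex characters of the MD5 digest of the input. */
    static String candidateKey(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            String hex = String.format("%032x", new BigInteger(1, digest));
            return hex.substring(0, 7);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns a key for the URL, salting and re-hashing while the key is taken. */
    String shorten(String longUrl) {
        String input = longUrl;
        while (true) {
            String key = candidateKey(input);
            // putIfAbsent plays the role of an atomic conditional insert
            String existing = store.putIfAbsent(key, longUrl);
            if (existing == null || existing.equals(longUrl)) {
                return key; // key was free, or already mapped to this same URL
            }
            input += SALT; // taken by a different URL: salt and try again
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new java.util.HashMap<>();
        HashShortener shortener = new HashShortener(store);
        String key = shortener.shorten("https://example.com/some/very/long/path");
        System.out.println(key + " -> " + store.get(key));
    }
}
```

Seven hex characters give only 16^7 (about 268 million) keys, so collisions become frequent well before that many URLs are stored. Encoding the digest in Base62 before truncating widens the key space considerably.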
What happens if the Redis cache is full?
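We follow an LRU (Least Recently Used) policy: the links that have gone longest without being accessed are evicted to make room for new ones. In Redis this has to be configured explicitly (a memory limit plus an LRU `maxmemory-policy` such as `allkeys-lru`), because the default policy rejects writes instead of evicting. Since link traffic follows a long-tail distribution, the 'cold' links will live in the DB while 'hot' links stay in memory.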
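To see LRU eviction in isolation, here is a small in-process sketch built on `LinkedHashMap` in access order. It only illustrates the policy; the class name `LruUrlCache` is hypothetical, and Redis itself uses an approximated LRU rather than this exact structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Tiny LRU cache sketch: evicts the least recently used entry beyond capacity. */
final class LruUrlCache extends LinkedHashMap<String, String> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruUrlCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity; // drop the least recently used entry when over capacity
    }

    public static void main(String[] args) {
        LruUrlCache cache = new LruUrlCache(2);
        cache.put("hot", "https://example.com/viral");
        cache.put("cold", "https://example.com/forgotten");
        cache.get("hot");                              // "hot" is now the most recently used
        cache.put("new", "https://example.com/fresh"); // evicts "cold"
        System.out.println(cache.keySet());            // [hot, new]
    }
}
```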
How do you prevent people from guessing all your shortened URLs?
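Instead of exposing a simple incrementing ID (1, 2, 3...), we can use a block-based ID generator and shuffle the Base62 alphabet. The generated strings then look arbitrary to a casual observer while remaining sequential internally. This is obfuscation, not security: consecutive IDs still differ mostly in their last character, so a determined attacker can recover the pattern. If enumeration must be genuinely hard, use a longer random code or a keyed permutation of the ID.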
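A shuffled alphabet can be sketched like this. The class name `ShuffledBase62` and the seed-based shuffle are assumptions for illustration; a production system would more likely store the shuffled alphabet as secret configuration than regenerate it from `java.util.Random`, which is not cryptographically secure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch: Base62 encode/decode over an alphabet shuffled with a secret seed. */
final class ShuffledBase62 {
    private static final String STANDARD =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int BASE = STANDARD.length();
    private final String alphabet;

    ShuffledBase62(long secretSeed) {
        List<Character> chars = new ArrayList<>();
        for (char c : STANDARD.toCharArray()) {
            chars.add(c);
        }
        Collections.shuffle(chars, new Random(secretSeed)); // same seed, same alphabet
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            sb.append(c);
        }
        this.alphabet = sb.toString();
    }

    String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(alphabet.charAt((int) (id % BASE)));
            id /= BASE;
        } while (id > 0);
        return sb.reverse().toString();
    }

    long decode(String code) {
        long id = 0;
        for (char c : code.toCharArray()) {
            int digit = alphabet.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + c);
            }
            id = id * BASE + digit;
        }
        return id;
    }

    public static void main(String[] args) {
        ShuffledBase62 codec = new ShuffledBase62(42L);
        String code = codec.encode(1000);
        System.out.println(code + " decodes to " + codec.decode(code));
        // Neighbouring IDs share a prefix, which is why this is only obfuscation
        System.out.println(codec.encode(1000) + " vs " + codec.encode(1001));
    }
}
```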