Database Sharding — Why Sequential IDs Cause Hot Shards
- Sharding enables horizontal scale — add more servers as data grows.
- The shard key must be included in all queries — otherwise all shards must be scanned.
- Hash sharding distributes evenly; range sharding allows range queries but risks hot shards.
- Sharding splits data across multiple independent database instances (shards) to scale beyond a single server.
- A shard key determines which shard stores each row; queries must include it to avoid broadcasting.
- Hash sharding distributes writes evenly; range sharding supports efficient range queries but risks hot shards.
- Cross-shard queries are slow and distributed transactions are hard — design the shard key to keep related data together.
- Most apps should not shard — optimize indexes, add replicas, and scale vertically first.
Sharding Quick Debug Cheat Sheet
Query not returning results for a known user
SELECT shard FROM shard_map WHERE user_id = ?Check the router logs: tail -100 /var/log/shard_router.log | grep 'user_id=123'Cross-shard query timeout (e.g., aggregate across shards)
SHOW PROCESSLIST on each shard → kill the offending queries.Run EXPLAIN on the original query to see if it can be optimized with shard key filtering.Resharding in progress, data inconsistent
SELECT COUNT(*) FROM migration_audit WHERE status = 'PENDING'Restart migration from last checkpoint: migrate_resume --last-checkpoint <timestamp>Production Incident
Production Debug GuideCommon symptoms and the actions you take to diagnose them
Most applications never need to shard. A properly indexed PostgreSQL or MySQL instance can handle tens of millions of rows and thousands of queries per second. But when you hit the limits of vertical scaling — larger instances give diminishing returns — sharding becomes the path forward.
The tradeoff is real: sharding gives you horizontal scale but introduces significant operational complexity. Cross-shard queries are slow, transactions spanning shards are hard, and resharding (changing the number of shards) requires careful migration. Get the shard key wrong, and you'll trade one bottleneck for another.
How Sharding Works
The shard key is the column used to route data to a specific shard. Every query must include the shard key to avoid broadcasting to all shards. The application or a proxy layer uses a routing function (e.g., hash modulo) to determine the target shard. Each shard is an independent database instance with its own compute and storage.
In production, the shard key is usually the primary dimension of access — something you filter on in every query. User ID, tenant ID, or region ID are common choices. The key must be present in all queries; otherwise the system either fails or performs a full scan across all shards.
# Package: io.thecodeforge.sharding.router import hashlib class ShardRouter: def __init__(self, shard_dsn_list: list[str]): self.shards = shard_dsn_list self.n = len(shards) def _hash(self, key: str) -> int: # Use consistent hashing in production, not plain modulo return int(hashlib.md5(key.encode()).hexdigest(), 16) % self.n def get_shard_for_key(self, key: str) -> str: return self.shards[self._hash(key)] def execute_query(self, key: str, sql: str, params: dict): if "WHERE" not in sql: raise ValueError("Query must include shard key filter") shard_id = self.get_shard_for_key(key) # Execute on the single shard return self._query_shard(shard_id, sql, params) def execute_cross_shard(self, sql: str, params: dict): # Broadcast to all shards and merge results = [] for shard in self.shards: results.extend(self._query_shard(shard, sql, params)) return results
- A dictionary is effectively sharded by the first letter: 'apple' belongs to 'A', 'banana' to 'B'.
- Queries that don't specify a letter (e.g., 'find all fruits with 5 letters') require scanning every page — that's a cross-shard query.
- Hot shards happen when one letter (like 'S') contains vastly more words than others.
- Hash sharding is like assigning words to pages randomly — even distribution but you lose the ability to easily browse consecutive words.
Hash vs Range Sharding
Hash sharding applies a hash function to the shard key and takes modulo the number of shards. It distributes rows evenly across shards, preventing hot shards. The trade-off: range queries (e.g., 'users joined in January') require broadcasting to all shards because the hash destroys order.
Range sharding assigns contiguous ranges of the shard key to each shard. Range queries can be routed to a single shard if the range fits within one boundary. The downside: hot shards appear when new data falls disproportionately on one range — for example, sequential user IDs always go to the last shard.
Consistent hashing (used by Cassandra, DynamoDB) maps data to a ring of hash values. When a shard is added or removed, only the data from the adjacent shard moves — no full rehash. This minimizes migration during resharding.
# Package: io.thecodeforge.sharding.strategies import hashlib class HashShard: def __init__(self, n_shards: int): self.n = n_shards def route(self, user_id: int) -> int: return user_id % self.n class RangeShard: def __init__(self, boundaries: list[int]): # boundaries like [0, 1_000_000, 2_000_000] self.boundaries = boundaries def route(self, user_id: int) -> int: for i in range(1, len(self.boundaries)): if user_id < self.boundaries[i]: return i - 1 return len(self.boundaries) - 1 class ConsistentHashRing: def __init__(self, nodes: list[str], replicas: int = 150): self.ring = {} self.sorted_keys = [] for node in nodes: for i in range(replicas): key = hashlib.md5(f"{node}:{i}".encode()).hexdigest() hash_val = int(key, 16) self.ring[hash_val] = node self.sorted_keys = sorted(self.ring.keys()) def get_node(self, key: str) -> str: if not self.ring: return None hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16) idx = bisect_left(self.sorted_keys, hash_val) if idx == len(self.sorted_keys): idx = 0 return self.ring[self.sorted_keys[idx]]
Shard Key Selection and Trade-offs
Choosing the wrong shard key is the most common sharding mistake. A good shard key has three properties: high cardinality (many distinct values), even distribution, and inclusion in most queries. Common choices are user_id, tenant_id, order_id, or a hash of email.
If you need both even distribution and efficient range queries over time, consider a composite shard key (e.g., hash(user_id) + month). The primary key includes the hash, and a secondary index supports the time range scan. This adds storage and complexity but may be worthwhile.
Another approach: use a lookup table that maps each shard key to a shard ID. This decouples routing from the key value, allowing you to change shards without modifying the key. However, the lookup becomes a potential bottleneck and single point of failure.
-- Schema for shard map lookup table CREATE TABLE io_thecodeforge_sharding.shard_map ( shard_key_hash BIGINT PRIMARY KEY, shard_id INT NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, FOREIGN KEY (shard_id) REFERENCES io_thecodeforge_sharding.shard_config(shard_id) ); -- When a new entity is created, insert into the map INSERT INTO io_thecodeforge_sharding.shard_map (shard_key_hash, shard_id) VALUES (MD5(CONCAT('user_', :user_id)) MOD 1000000, :target_shard); -- Query to find which shard holds a given user SELECT shard_id FROM io_thecodeforge_sharding.shard_map WHERE shard_key_hash = MD5(CONCAT('user_', :user_id)) MOD 1000000;
Cross-Shard Queries and Transactions
When a query doesn't include the shard key, the system must broadcast it to all shards and merge results — this is slow and resource-intensive. For range queries across shards, consider a secondary index or a search engine (Elasticsearch) that pre-indexes the data.
Distributed transactions (e.g., transfer money between users on different shards) require coordination — typically two-phase commit (2PC). 2PC is slow, blocks on failure, and reduces availability. Most production systems avoid cross-shard transactions by designing the data model so that related data lives on the same shard. For example, put all orders for a customer on the same shard as the customer record. If absolute consistency is needed, use saga patterns with compensating actions.
# Package: io.thecodeforge.sharding.cross_shard # Example of a saga that transfers funds across shards import asyncio class SagaCoordinator: async def transfer(self, from_account, to_account, amount): # Step 1: Prepare both shards from_shard = self.get_shard(from_account.user_id) to_shard = self.get_shard(to_account.user_id) from_ok = await self.prepare_debit(from_shard, from_account, amount) if not from_ok: return abort to_ok = await self.prepare_credit(to_shard, to_account, amount) if not to_ok: await self.rollback_debit(from_shard, from_account, amount) return abort # Step 2: Commit try: await self.commit_debit(from_shard, from_account, amount) await self.commit_credit(to_shard, to_account, amount) except Exception: # Compensating transaction await self.compensate(from_shard, to_shard) raise return success
- A user's profile, orders, and payments should all land on the same shard via user_id shard key.
- If you need to join user and product data, either keep products on every shard (replicated) or accept cross-shard joins via materialized views.
- Cross-shard transactions are always slower and less reliable. Prefer eventual consistency with idempotent operations.
Resharding and Consistent Hashing
Resharding — changing the number of shards — is the most painful operation in a sharded system. With simple modulo hash, changing N to N+1 remaps nearly all rows (about 87.5% for 4->8). The system must read all data, rehash it, and write to new locations. This requires downtime or a complex double-write migration.
Consistent hashing dramatically reduces the amount of data that moves: only the data in the immediate neighbor of a new node is affected. This is why Cassandra and DynamoDB use it. However, consistent hashing adds complexity in handling node failures and virtual nodes (vnodes).
- Pre-splitting: Start with many virtual shards (e.g., 1024) that map to physical shards. To add a physical shard, reassign some virtual shards.
- Mirroring: Write all new data to both old and new shards simultaneously, then backfill old data.
- Read-repair: After migration, inconsistent reads are fixed lazily during read operations (Cassandra's approach).
# Package: io.thecodeforge.sharding.consistent_hash import hashlib from bisect import bisect_left class ConsistentHash: def __init__(self, nodes: list[str], vnodes: int = 150): self.ring = {} self.sorted_keys = [] for node in nodes: for i in range(vnodes): key = hashlib.md5(f"{node}:{i}".encode()).hexdigest() h = int(key, 16) self.ring[h] = node self.sorted_keys.append(h) self.sorted_keys.sort() def get_node(self, key: str) -> str: if not self.ring: return None h = int(hashlib.md5(key.encode()).hexdigest(), 16) idx = bisect_left(self.sorted_keys, h) if idx == len(self.sorted_keys): idx = 0 return self.ring[self.sorted_keys[idx]] def add_node(self, new_node: str, vnodes: int = 150): # Adding only affects adjacent ring keys for i in range(vnodes): key = hashlib.md5(f"{new_node}:{i}".encode()).hexdigest() h = int(key, 16) self.ring[h] = new_node self.sorted_keys.append(h) self.sorted_keys.sort()
| Property | Hash Sharding | Range Sharding |
|---|---|---|
| Distribution | Even — no hot shards | Can be skewed — hot shard risk near boundaries |
| Range Queries | Must scan all shards | Fast if range fits within one shard |
| Resharding | Full rehash required | Adds new boundaries, still moves data |
| Write Performance | Uniform across shards | Bursts concentrate on active range |
| Complexity | Low | Medium — need boundary management |
| Use Case | High write volume, any access pattern | Read-heavy with predictable range queries |
🎯 Key Takeaways
- Sharding enables horizontal scale — add more servers as data grows.
- The shard key must be included in all queries — otherwise all shards must be scanned.
- Hash sharding distributes evenly; range sharding allows range queries but risks hot shards.
- Cross-shard joins and transactions are hard — design your shard key to minimise them.
- Most applications should NOT shard — optimise indexes, add read replicas, and scale vertically first.
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- QWhat is a shard key and how do you choose a good one?Mid-levelReveal
- QWhat is the hot shard problem and how do you prevent it?Mid-levelReveal
- QHow does consistent hashing help with resharding?SeniorReveal
- QHow would you design a system that supports cross-shard joins without sacrificing performance?SeniorReveal
Frequently Asked Questions
How do you handle transactions that span multiple shards?
Two-phase commit (2PC) is the formal approach — a coordinator asks all shards to prepare, then commits if all agree. But 2PC is slow and reduces availability (blocked if any shard fails). Most large-scale systems avoid cross-shard transactions by designing the data model so transactions stay within one shard. Eventual consistency or saga patterns handle cases where cross-shard coordination is unavoidable.
What is resharding and why is it painful?
Resharding is changing the number of shards (e.g., 4 → 8). With simple hash sharding (user_id % n), changing n means nearly all data maps to a different shard — you must migrate ~87.5% of data. Consistent hashing minimises this: adding a shard only moves adjacent data. Most cloud databases (DynamoDB, Cassandra) handle resharding automatically.
Can I shard on a non-unique column like status or country?
Technically yes, but it's dangerous. Low cardinality columns lead to very few shards being used (e.g., only 2 countries = 2 shards). This limits write throughput and creates severe hot shards. Always choose a high-cardinality shard key.
What's the difference between sharding and partitioning?
Partitioning splits a table within a single database instance (e.g., PostgreSQL partition by range). Sharding splits data across multiple database instances on different servers. Partitioning is a tuning technique; sharding is a scaling strategy. You can combine both: partition within each shard.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.