Skip to content
Home Database Database Sharding — Why Sequential IDs Cause Hot Shards

Database Sharding — Why Sequential IDs Cause Hot Shards

Where developers are forged. · Structured learning · Free forever.
📍 Part of: Database Design → Topic 6 of 16
One shard hit 95% CPU while others idled at 15% during a flash sale.
🔥 Advanced — solid Database foundation required
In this tutorial, you'll learn
One shard hit 95% CPU while others idled at 15% during a flash sale.
  • Sharding enables horizontal scale — add more servers as data grows.
  • The shard key must be included in all queries — otherwise all shards must be scanned.
  • Hash sharding distributes evenly; range sharding allows range queries but risks hot shards.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • Sharding splits data across multiple independent database instances (shards) to scale beyond a single server.
  • A shard key determines which shard stores each row; queries must include it to avoid broadcasting.
  • Hash sharding distributes writes evenly; range sharding supports efficient range queries but risks hot shards.
  • Cross-shard queries are slow and distributed transactions are hard — design the shard key to keep related data together.
  • Most apps should not shard — optimize indexes, add replicas, and scale vertically first.
🚨 START HERE

Sharding Quick Debug Cheat Sheet

Immediate commands and fixes for the most common sharding failures.
🟡

Query not returning results for a known user

Immediate ActionCompute which shard should hold that user: shard_index = hash(user_id) % num_shards.
Commands
SELECT shard FROM shard_map WHERE user_id = ?
Check the router logs: tail -100 /var/log/shard_router.log | grep 'user_id=123'
Fix NowIf the user's data is on the wrong shard due to a routing logic bug, run a one-time migration script to move that record.
🟡

Cross-shard query timeout (e.g., aggregate across shards)

Immediate ActionCancel the query. Check if the query can be rewritten to use shard key. If not, consider materialized aggregates on each shard.
Commands
SHOW PROCESSLIST on each shard → kill the offending queries.
Run EXPLAIN on the original query to see if it can be optimized with shard key filtering.
Fix NowIf it's a one-off, temporarily increase query timeout. Long-term, redesign the query to avoid full scan.
🟡

Resharding in progress, data inconsistent

Immediate ActionStop application writes. Check which rows have been migrated and which haven't.
Commands
SELECT COUNT(*) FROM migration_audit WHERE status = 'PENDING'
Restart migration from last checkpoint: migrate_resume --last-checkpoint <timestamp>
Fix NowIf migration is stuck due to conflicts, use a force flag to re-sync the batch.
Production Incident

The Hot Shard That Took Down Checkout at Midnight

A range-based shard key on customer_id caused a single shard to handle 80% of writes during a flash sale — the database CPU hit 100%, queries queued, and the checkout timed out for thousands of users.
SymptomOrder creation latency spiked from 20ms to 12s. The checkout API returned 504 Gateway Timeout errors. Monitoring showed one shard at 95% CPU while others idled at 15%.
AssumptionThe team assumed user IDs were uniformly distributed and that range sharding by customer_id would spread load evenly across shards.
Root causeThe signup flow generated sequential IDs — new customers all landed in the last shard. During a flash sale, most orders came from new users, all hitting the same shard.
FixSwitched to hash sharding on customer_id (target: 8 shards, hash modulo 8). Migrated existing data using a double-write strategy: write to both old and new shards for a week, then cut over.
Key Lesson
Always simulate write patterns before choosing a shard key — sequential IDs kill range sharding.Use hash sharding or consistent hashing for write-heavy workloads.Monitor per-shard CPU and throughput; set alerts for imbalance.
Production Debug Guide

Common symptoms and the actions you take to diagnose them

A query that used to run under 10ms now takes 500msCheck if the query includes the shard key. If not, the router broadcasts to all shards. Add the key to the WHERE clause.
One shard's disk fills up much faster than othersCompute the shard key distribution: SELECT shard, COUNT(*) FROM metadata. If heavily skewed, your shard key is bad or you need resharding.
Distributed transactions fail with 'unrecoverable' errorsCheck the transaction coordinator logs. 2PC prepare phase may be timing out. Consider redesigning the data model to keep transaction rows on the same shard.
After adding a shard, queries return inconsistent resultsVerify that the routing logic has been updated. For modulo-based sharding, you cannot just add a shard without a full data migration. Use consistent hashing to avoid this.

Most applications never need to shard. A properly indexed PostgreSQL or MySQL instance can handle tens of millions of rows and thousands of queries per second. But when you hit the limits of vertical scaling — larger instances give diminishing returns — sharding becomes the path forward.

The tradeoff is real: sharding gives you horizontal scale but introduces significant operational complexity. Cross-shard queries are slow, transactions spanning shards are hard, and resharding (changing the number of shards) requires careful migration. Get the shard key wrong, and you'll trade one bottleneck for another.

How Sharding Works

The shard key is the column used to route data to a specific shard. Every query must include the shard key to avoid broadcasting to all shards. The application or a proxy layer uses a routing function (e.g., hash modulo) to determine the target shard. Each shard is an independent database instance with its own compute and storage.

In production, the shard key is usually the primary dimension of access — something you filter on in every query. User ID, tenant ID, or region ID are common choices. The key must be present in all queries; otherwise the system either fails or performs a full scan across all shards.

io/thecodeforge/sharding/router.py · PYTHON
123456789101112131415161718192021222324252627282930
# Package: io.thecodeforge.sharding.router

import hashlib

class ShardRouter:
    def __init__(self, shard_dsn_list: list[str]):
        self.shards = shard_dsn_list
        self.n = len(shards)

    def _hash(self, key: str) -> int:
        # Use consistent hashing in production, not plain modulo
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % self.n

    def get_shard_for_key(self, key: str) -> str:
        return self.shards[self._hash(key)]

    def execute_query(self, key: str, sql: str, params: dict):
        if "WHERE" not in sql:
            raise ValueError("Query must include shard key filter")
        shard_id = self.get_shard_for_key(key)
        # Execute on the single shard
        return self._query_shard(shard_id, sql, params)

    def execute_cross_shard(self, sql: str, params: dict):
        # Broadcast to all shards and merge
        results = []
        for shard in self.shards:
            results.extend(self._query_shard(shard, sql, params))
        return results
Mental Model
Think of Sharding Like a Dictionary
A shard key is like a word's first letter — you know exactly which page (shard) to go to.
  • A dictionary is effectively sharded by the first letter: 'apple' belongs to 'A', 'banana' to 'B'.
  • Queries that don't specify a letter (e.g., 'find all fruits with 5 letters') require scanning every page — that's a cross-shard query.
  • Hot shards happen when one letter (like 'S') contains vastly more words than others.
  • Hash sharding is like assigning words to pages randomly — even distribution but you lose the ability to easily browse consecutive words.
📊 Production Insight
If you add a second shard without changing the routing function, half your reads will miss and you'll return stale or duplicated data.
The fix is to use a versioned routing table so that reads are always consistent during migration.
Rule: Never change the number of shards without a versioned routing strategy.
🎯 Key Takeaway
The shard key is the most important decision in sharding.
Every query must include it.
Pick it based on your primary access pattern, not on data distribution.
Choosing a Shard Key
IfAll queries include a user_id or tenant_id filter
UseUse that ID as the shard key. Hash it for even distribution.
IfYou need fast range queries on a date column
UseUse range sharding on date, but expect hot shards at boundaries (e.g., month end).
IfNo single query filter covers 95% of traffic
UseConsider a composite shard key or a secondary index layer (e.g., Elasticsearch for cross-shard searches).
IfYou expect to add shards in the future
UseUse consistent hashing — avoid modulo-based routing unless you plan to migrate all data on each change.

Hash vs Range Sharding

Hash sharding applies a hash function to the shard key and takes modulo the number of shards. It distributes rows evenly across shards, preventing hot shards. The trade-off: range queries (e.g., 'users joined in January') require broadcasting to all shards because the hash destroys order.

Range sharding assigns contiguous ranges of the shard key to each shard. Range queries can be routed to a single shard if the range fits within one boundary. The downside: hot shards appear when new data falls disproportionately on one range — for example, sequential user IDs always go to the last shard.

Consistent hashing (used by Cassandra, DynamoDB) maps data to a ring of hash values. When a shard is added or removed, only the data from the adjacent shard moves — no full rehash. This minimizes migration during resharding.

io/thecodeforge/sharding/strategies.py · PYTHON
123456789101112131415161718192021222324252627282930313233343536373839404142
# Package: io.thecodeforge.sharding.strategies

import hashlib

class HashShard:
    def __init__(self, n_shards: int):
        self.n = n_shards

    def route(self, user_id: int) -> int:
        return user_id % self.n

class RangeShard:
    def __init__(self, boundaries: list[int]):
        # boundaries like [0, 1_000_000, 2_000_000]
        self.boundaries = boundaries

    def route(self, user_id: int) -> int:
        for i in range(1, len(self.boundaries)):
            if user_id < self.boundaries[i]:
                return i - 1
        return len(self.boundaries) - 1

class ConsistentHashRing:
    def __init__(self, nodes: list[str], replicas: int = 150):
        self.ring = {}
        self.sorted_keys = []
        for node in nodes:
            for i in range(replicas):
                key = hashlib.md5(f"{node}:{i}".encode()).hexdigest()
                hash_val = int(key, 16)
                self.ring[hash_val] = node
            self.sorted_keys = sorted(self.ring.keys())

    def get_node(self, key: str) -> str:
        if not self.ring:
            return None
        hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect_left(self.sorted_keys, hash_val)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]
⚠ The hot shard trap with range sharding
If your shard key is a monotonically increasing value (e.g., auto-increment ID, timestamp), range sharding will slam all new writes onto the last shard. This creates a write hotspot that throttles throughput and makes vertical scaling pointless. Hash sharding avoids this, but at the cost of range query performance.
📊 Production Insight
A SaaS company sharded by customer signup date (YYYY-MM). January's shard got 10x the writes of February's because a marketing campaign in January brought 80% of new customers.
They had to migrate to hash sharding, which took three weeks.
Rule: If your data shows any temporal skew, do not use range sharding on that field.

Shard Key Selection and Trade-offs

Choosing the wrong shard key is the most common sharding mistake. A good shard key has three properties: high cardinality (many distinct values), even distribution, and inclusion in most queries. Common choices are user_id, tenant_id, order_id, or a hash of email.

If you need both even distribution and efficient range queries over time, consider a composite shard key (e.g., hash(user_id) + month). The primary key includes the hash, and a secondary index supports the time range scan. This adds storage and complexity but may be worthwhile.

Another approach: use a lookup table that maps each shard key to a shard ID. This decouples routing from the key value, allowing you to change shards without modifying the key. However, the lookup becomes a potential bottleneck and single point of failure.

io/thecodeforge/sharding/shard_map.sql · SQL
123456789101112131415
-- Schema for shard map lookup table
CREATE TABLE io_thecodeforge_sharding.shard_map (
    shard_key_hash BIGINT PRIMARY KEY,
    shard_id INT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (shard_id) REFERENCES io_thecodeforge_sharding.shard_config(shard_id)
);

-- When a new entity is created, insert into the map
INSERT INTO io_thecodeforge_sharding.shard_map (shard_key_hash, shard_id)
VALUES (MD5(CONCAT('user_', :user_id)) MOD 1000000, :target_shard);

-- Query to find which shard holds a given user
SELECT shard_id FROM io_thecodeforge_sharding.shard_map
WHERE shard_key_hash = MD5(CONCAT('user_', :user_id)) MOD 1000000;
📊 Production Insight
A lookup table adds ~1ms per query and must be replicated or sharded itself. If it goes down, you cannot route any queries.
We use it only for tenants that explicitly need data locality flexibility.
Rule: Default to hash sharding. Only add a lookup table when you have dynamic shard allocation requirements.

Cross-Shard Queries and Transactions

When a query doesn't include the shard key, the system must broadcast it to all shards and merge results — this is slow and resource-intensive. For range queries across shards, consider a secondary index or a search engine (Elasticsearch) that pre-indexes the data.

Distributed transactions (e.g., transfer money between users on different shards) require coordination — typically two-phase commit (2PC). 2PC is slow, blocks on failure, and reduces availability. Most production systems avoid cross-shard transactions by designing the data model so that related data lives on the same shard. For example, put all orders for a customer on the same shard as the customer record. If absolute consistency is needed, use saga patterns with compensating actions.

io/thecodeforge/sharding/cross_shard.py · PYTHON
1234567891011121314151617181920212223242526272829
# Package: io.thecodeforge.sharding.cross_shard

# Example of a saga that transfers funds across shards
import asyncio

class SagaCoordinator:
    async def transfer(self, from_account, to_account, amount):
        # Step 1: Prepare both shards
        from_shard = self.get_shard(from_account.user_id)
        to_shard = self.get_shard(to_account.user_id)
        
        from_ok = await self.prepare_debit(from_shard, from_account, amount)
        if not from_ok:
            return abort
        to_ok = await self.prepare_credit(to_shard, to_account, amount)
        if not to_ok:
            await self.rollback_debit(from_shard, from_account, amount)
            return abort
        
        # Step 2: Commit
        try:
            await self.commit_debit(from_shard, from_account, amount)
            await self.commit_credit(to_shard, to_account, amount)
        except Exception:
            # Compensating transaction
            await self.compensate(from_shard, to_shard)
            raise
        return success
Mental Model
Same-shard locality is the single most important design constraint
If you cannot keep all related rows on one shard, you are building a cross-shard distributed system — which is 10x harder.
  • A user's profile, orders, and payments should all land on the same shard via user_id shard key.
  • If you need to join user and product data, either keep products on every shard (replicated) or accept cross-shard joins via materialized views.
  • Cross-shard transactions are always slower and less reliable. Prefer eventual consistency with idempotent operations.
📊 Production Insight
A finance app tried 2PC for cross-shard transfers. When one shard was under maintenance, all transactions failed. Then they moved to a saga pattern with retries and manual reconciliation.
Rule: Design your shard key to make cross-shard operations the exception, not the norm.

Resharding and Consistent Hashing

Resharding — changing the number of shards — is the most painful operation in a sharded system. With simple modulo hash, changing N to N+1 remaps nearly all rows (about 87.5% for 4->8). The system must read all data, rehash it, and write to new locations. This requires downtime or a complex double-write migration.

Consistent hashing dramatically reduces the amount of data that moves: only the data in the immediate neighbor of a new node is affected. This is why Cassandra and DynamoDB use it. However, consistent hashing adds complexity in handling node failures and virtual nodes (vnodes).

Production resharding strategies include
  • Pre-splitting: Start with many virtual shards (e.g., 1024) that map to physical shards. To add a physical shard, reassign some virtual shards.
  • Mirroring: Write all new data to both old and new shards simultaneously, then backfill old data.
  • Read-repair: After migration, inconsistent reads are fixed lazily during read operations (Cassandra's approach).
io/thecodeforge/sharding/consistent_hash.py · PYTHON
1234567891011121314151617181920212223242526272829303132333435
# Package: io.thecodeforge.sharding.consistent_hash

import hashlib
from bisect import bisect_left

class ConsistentHash:
    def __init__(self, nodes: list[str], vnodes: int = 150):
        self.ring = {}
        self.sorted_keys = []
        for node in nodes:
            for i in range(vnodes):
                key = hashlib.md5(f"{node}:{i}".encode()).hexdigest()
                h = int(key, 16)
                self.ring[h] = node
                self.sorted_keys.append(h)
        self.sorted_keys.sort()

    def get_node(self, key: str) -> str:
        if not self.ring:
            return None
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect_left(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

    def add_node(self, new_node: str, vnodes: int = 150):
        # Adding only affects adjacent ring keys
        for i in range(vnodes):
            key = hashlib.md5(f"{new_node}:{i}".encode()).hexdigest()
            h = int(key, 16)
            self.ring[h] = new_node
            self.sorted_keys.append(h)
        self.sorted_keys.sort()
📊 Production Insight
A team using modulo sharding needed to go from 4 to 8 shards. They scheduled 18 hours of read-only downtime. Halfway through, the migration script hit primary key conflicts and they had to restart from a backup. They lost 2 hours of writes.
Rule: Always test migration under realistic load. Use consistent hashing if you anticipate adding shards.
🗂 Hash vs Range Sharding Comparison
Which strategy to use depends on your query patterns and growth expectations
PropertyHash ShardingRange Sharding
DistributionEven — no hot shardsCan be skewed — hot shard risk near boundaries
Range QueriesMust scan all shardsFast if range fits within one shard
ReshardingFull rehash requiredAdds new boundaries, still moves data
Write PerformanceUniform across shardsBursts concentrate on active range
ComplexityLowMedium — need boundary management
Use CaseHigh write volume, any access patternRead-heavy with predictable range queries

🎯 Key Takeaways

  • Sharding enables horizontal scale — add more servers as data grows.
  • The shard key must be included in all queries — otherwise all shards must be scanned.
  • Hash sharding distributes evenly; range sharding allows range queries but risks hot shards.
  • Cross-shard joins and transactions are hard — design your shard key to minimise them.
  • Most applications should NOT shard — optimise indexes, add read replicas, and scale vertically first.

⚠ Common Mistakes to Avoid

    Choosing a low-cardinality shard key
    Symptom

    Some shards hold 90% of the data and handle 90% of queries, while others are nearly empty.

    Fix

    Choose a shard key with at least 100 times more distinct values than the number of shards. Use a hash of a high-cardinality column like user_id or email.

    Forgetting to include the shard key in every query
    Symptom

    Queries that filter on other columns broadcast to all shards and become 10–100x slower. The database CPU spikes and response times degrade.

    Fix

    Add a NOTICE or WARNING log whenever a query does not include the shard key. Enforce it with a query parser that rejects unfiltered queries or routes them to a dedicated read-only shard pool.

    Using modulo sharding with a future expectation to add shards
    Symptom

    When a shard is added, the team discovers that 87% of data needs to be moved, and there is no efficient migration plan.

    Fix

    Use consistent hashing from the start. If already on modulo, plan for eventual migration by pre-splitting into many virtual shards (e.g., 1024) that map to physical shards.

    Not monitoring per-shard load and skew
    Symptom

    One shard runs out of disk or CPU while others are at 20% utilization, but no alerts exist for skew.

    Fix

    Set up monitoring for each shard: CPU, disk, query latency, connection count. Alert if any shard exceeds 150% of the average.

Interview Questions on This Topic

  • QWhat is a shard key and how do you choose a good one?Mid-levelReveal
    A shard key is the column or set of columns used to determine which shard stores a row. A good shard key has high cardinality, even distribution, and is present in most queries. Prefer natural keys like user_id or tenant_id over auto-increment IDs. If you need range queries, consider a composite key or use a hash with a secondary index.
  • QWhat is the hot shard problem and how do you prevent it?Mid-levelReveal
    A hot shard receives a disproportionate share of reads or writes, causing performance degradation. Prevention: use hash sharding (especially consistent hashing), monitor per-shard load, and design the shard key to avoid clustering writes on one range. If using range sharding, periodically rebalance boundaries.
  • QHow does consistent hashing help with resharding?SeniorReveal
    Consistent hashing maps data to a ring of hash values. When a new shard (node) is added, it takes responsibility for only a subset of keys from its immediate neighbor, leaving other shards untouched. This minimizes the amount of data that must be migrated from O(N) to O(1/N) of total data per node addition.
  • QHow would you design a system that supports cross-shard joins without sacrificing performance?SeniorReveal
    Avoid cross-shard joins in the critical path. Approaches: 1) Keep data denormalized on each shard (e.g., replicate lookup tables). 2) Use a global secondary index (like DynamoDB GSI) or a separate search engine (Elasticsearch). 3) Perform the join in the application layer by fetching data from all shards and merging, but accept it will be slow. For OLTP workloads, design the shard key so related rows co-locate.

Frequently Asked Questions

How do you handle transactions that span multiple shards?

Two-phase commit (2PC) is the formal approach — a coordinator asks all shards to prepare, then commits if all agree. But 2PC is slow and reduces availability (blocked if any shard fails). Most large-scale systems avoid cross-shard transactions by designing the data model so transactions stay within one shard. Eventual consistency or saga patterns handle cases where cross-shard coordination is unavoidable.

What is resharding and why is it painful?

Resharding is changing the number of shards (e.g., 4 → 8). With simple hash sharding (user_id % n), changing n means nearly all data maps to a different shard — you must migrate ~87.5% of data. Consistent hashing minimises this: adding a shard only moves adjacent data. Most cloud databases (DynamoDB, Cassandra) handle resharding automatically.

Can I shard on a non-unique column like status or country?

Technically yes, but it's dangerous. Low cardinality columns lead to very few shards being used (e.g., only 2 countries = 2 shards). This limits write throughput and creates severe hot shards. Always choose a high-cardinality shard key.

What's the difference between sharding and partitioning?

Partitioning splits a table within a single database instance (e.g., PostgreSQL partition by range). Sharding splits data across multiple database instances on different servers. Partitioning is a tuning technique; sharding is a scaling strategy. You can combine both: partition within each shard.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousDatabase RelationshipsNext →Database Replication
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged