
Partitioning in Databases

📍 Part of: Database Design → Topic 9 of 16
Database partitioning explained — horizontal vs vertical partitioning, range, hash, and list partitioning, partition pruning, and how it differs from sharding.
🔥 Advanced — solid Database foundation required
In this tutorial, you'll learn
  • Partitioning is transparent — applications query the parent table and the database routes to partitions.
  • Partition pruning: queries with WHERE on the partition key skip irrelevant partitions entirely.
  • Dropping a partition is instantaneous — the best way to archive old data at scale.
Quick Answer

Partitioning splits a large table into smaller, more manageable pieces (partitions) that the database manages transparently. The partitions live on the same database instance. Range partitioning splits by value ranges (e.g., by year). Hash partitioning distributes rows evenly. List partitioning assigns rows to partitions based on discrete values.

Range Partitioning — The Production Gold Standard

Range partitioning is the go-to strategy for time-series data. It maps rows to partitions based on a continuous range of values. This is incredibly powerful for data lifecycle management; instead of running expensive DELETE queries that bloat the transaction log and fragment indexes, you simply drop the oldest partition.

By using the PARTITION BY RANGE clause, the database engine gains the ability to perform 'Partition Pruning': skipping every partition whose bounds cannot contain the requested date range.

Example · SQL
-- io.thecodeforge.db.schema
-- PostgreSQL range partitioning by year
CREATE TABLE orders (
    id          BIGINT,
    customer_id INT NOT NULL,
    order_date  DATE NOT NULL,
    total       DECIMAL(12,2)
) PARTITION BY RANGE (order_date);

-- Creating the child tables (partitions)
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- An index created on the parent is automatically created on every partition
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Partition pruning in action
EXPLAIN
SELECT * FROM orders 
WHERE order_date >= '2025-06-01' AND order_date <= '2025-06-30';
▶ Output
Seq Scan on orders_2025 (cost=0.00..15.50 rows=5 width=48)
Filter: ((order_date >= '2025-06-01'::date) AND (order_date <= '2025-06-30'::date))
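The archival path mentioned above is what makes range partitioning so attractive: retiring a year of data is a metadata operation, not a row-by-row DELETE. A sketch, building on the orders table defined earlier (PostgreSQL syntax):

```sql
-- Detach first so the old data survives as a standalone table
ALTER TABLE orders DETACH PARTITION orders_2023;

-- Optionally dump/copy orders_2023 to cold storage, then:
DROP TABLE orders_2023;
```

Detaching takes only a brief lock on the parent; the expensive work of deleting millions of rows never happens.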

Hash Partitioning — Solving Hotspot Issues

When your data doesn't have a natural 'range' (like a timestamp) or when all your writes hit the 'current' range creating a bottleneck, Hash Partitioning is the solution. It uses a hash function on the partition key to distribute rows evenly across a fixed number of partitions.

This ensures that even if 10,000 users are signing up at the same second, their data is spread across multiple physical files, reducing I/O contention.

Example · SQL
-- io.thecodeforge.db.schema
-- Hash partitioning for even load distribution
CREATE TABLE user_sessions (
    session_id UUID,
    user_id    BIGINT NOT NULL,
    data       JSONB,
    last_seen  TIMESTAMP,
    -- A primary key on a partitioned table must include the partition key
    PRIMARY KEY (session_id, user_id)
) PARTITION BY HASH (user_id);

-- Create 4 partitions using the modulus/remainder logic
CREATE TABLE user_sessions_p0 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_sessions_p1 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_sessions_p2 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_sessions_p3 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
▶ Output
-- Rows are distributed: hash(user_id) % 4
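To verify that the hash distribution is actually even, you can group rows by the partition they landed in. This query uses PostgreSQL's system column tableoid, which identifies the physical child table each row lives in:

```sql
-- Count rows per hash partition of the user_sessions table above
SELECT tableoid::regclass AS partition, count(*) AS rows
FROM user_sessions
GROUP BY 1
ORDER BY 1;
```

With a reasonably uniform user_id distribution, the four partitions should report roughly equal counts.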

Containerized Database Management

To test partitioning strategies locally without polluting your system, use Docker to spin up a pre-configured instance. This ensures your staging and production environments use the exact same partitioning logic.

Example · DOCKER
# io.thecodeforge.infrastructure
FROM postgres:16-alpine

# Optimized for large-scale partitioning tests
ENV POSTGRES_DB=forge_warehouse
ENV POSTGRES_USER=admin
ENV POSTGRES_PASSWORD=secure_password

# Copy initialization scripts that create partitions
COPY init-partitioning.sql /docker-entrypoint-initdb.d/

EXPOSE 5432
▶ Output
Step 4/5 : COPY init-partitioning.sql /docker-entrypoint-initdb.d/
| Type | Split By | Best For | Weakness |
|---|---|---|---|
| Range | Value ranges (date, ID) | Time-series, historical archival | Hot partition if writes cluster at range boundary |
| Hash | Hash of column value | Even distribution, high-concurrency writes | Range-based queries force a full scan of all partitions |
| List | Discrete values (country, status) | Regional data, status-based isolation | Unknown/new values fail unless a DEFAULT partition exists |
| Composite | Combination of above (e.g. Range then Hash) | Massive multi-tenant SaaS platforms | Significant administrative and query planning overhead |
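List partitioning, the third strategy in the comparison above, has no example elsewhere in this article, so here is a minimal sketch. The table name and region codes are illustrative; note the DEFAULT partition that catches any value not listed:

```sql
-- List partitioning by discrete region codes
CREATE TABLE customers (
    id     BIGINT,
    region TEXT NOT NULL,
    name   TEXT
) PARTITION BY LIST (region);

CREATE TABLE customers_emea PARTITION OF customers
    FOR VALUES IN ('EU', 'UK', 'MEA');
CREATE TABLE customers_amer PARTITION OF customers
    FOR VALUES IN ('US', 'CA', 'LATAM');

-- Catch-all: without this, inserting an unlisted region fails
CREATE TABLE customers_other PARTITION OF customers DEFAULT;
```

Without the DEFAULT partition, an INSERT with an unexpected region code is rejected outright, which is exactly the "unknown/new values" weakness listed in the table.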

🎯 Key Takeaways

  • Partitioning is transparent — applications query the parent table and the database routes to partitions.
  • Partition pruning: queries with WHERE on the partition key skip irrelevant partitions entirely.
  • Dropping a partition is instantaneous — the best way to archive old data at scale.
  • Partitioning differs from sharding: partitions live on one database instance; sharding splits across multiple.
  • Always filter on the partition key in the WHERE clause; pruning is driven by partition bounds, not indexes, so without that filter the planner must scan every partition.

Interview Questions on This Topic

  • Q: Explain the concept of 'Partition Pruning'. How does the query optimizer use it to reduce I/O?
  • Q: You have a table that stores logs for 5 years. Deleting old logs is taking hours and locking the table. How would you redesign this using partitioning?
  • Q: LeetCode Standard: Given a massive dataset of transactions, how would you partition it to optimize for queries that only look at the 'current month' vs. 'user history'?
  • Q: When would you choose Hash partitioning over Range partitioning, and what are the trade-offs for range-based queries?
  • Q: Can a primary key in a partitioned table be any column, or does it have to include the partition key? Why?

Frequently Asked Questions

What is the difference between partitioning and sharding?

Partitioning is a local operation: you split a table into pieces, but they all still live on the same physical server and share the same CPU/RAM. Sharding is a horizontal scaling strategy where data is distributed across multiple independent servers. Sharding increases total capacity, while partitioning primarily improves manageability and specific query performance.

Does partitioning improve query performance automatically?

No. Partitioning only improves performance if the query optimizer can use 'Partition Pruning'. If your query doesn't include the partition key in the WHERE clause, the database has to check every single partition, which can actually be slower than querying a single large table due to the overhead of opening and scanning many child tables.

Can I partition an existing table with millions of rows?

Not directly. In most systems (like PostgreSQL or MySQL), you must create a new partitioned table, migrate the data using a script or 'INSERT INTO ... SELECT', and then swap the table names. This usually requires a maintenance window or a carefully managed background migration.
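The migrate-and-swap approach described above can be sketched in PostgreSQL as follows. The logs table and its created_at column are assumed for illustration:

```sql
-- 1. Create a partitioned copy of the schema alongside the old table
CREATE TABLE logs_new (LIKE logs INCLUDING DEFAULTS)
    PARTITION BY RANGE (created_at);

CREATE TABLE logs_2025 PARTITION OF logs_new
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
-- Safety net so rows outside the defined ranges don't abort the copy
CREATE TABLE logs_default PARTITION OF logs_new DEFAULT;

-- 2. Copy the data (one big INSERT if downtime allows; batches otherwise)
INSERT INTO logs_new SELECT * FROM logs;

-- 3. Swap the names inside a single transaction
BEGIN;
ALTER TABLE logs RENAME TO logs_old;
ALTER TABLE logs_new RENAME TO logs;
COMMIT;
```

In production you would typically copy in batches and backfill rows written during the migration before the final rename.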

What is a 'Default Partition' and why do I need it?

A default partition acts as a 'catch-all'. If you are using List or Range partitioning and a row comes in that doesn't fit any defined range, the database will throw an error unless a default partition exists. It’s a safety net for unexpected data values.

Naren · Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
