Advanced 8 min · March 06, 2026

Partitioning in Databases

Database Partitioning — Missing DEFAULT Partition Breaks

Q: What is the difference between partitioning and sharding?

Partitioning is a local operation: you split a table into pieces, but they all still live on the same physical server and share the same CPU/RAM. Sharding is a horizontal scaling strategy where data is distributed across multiple independent servers. Sharding increases total capacity, while partitioning primarily improves manageability and specific query performance.

Q: Does partitioning improve query performance automatically?

No. Partitioning only improves performance if the query optimizer can use 'Partition Pruning'. If your query doesn't include the partition key in the WHERE clause, the database has to check every single partition, which is actually slower than querying a single large table due to the overhead of managing multiple file descriptors.

Q: Can I partition an existing table with millions of rows?

Not directly. In most systems (like PostgreSQL or MySQL), you must create a new partitioned table, migrate the data using a script or 'INSERT INTO ... SELECT', and then swap the table names. This usually requires a maintenance window or a carefully managed background migration.

Q: What is a 'Default Partition' and why do I need it?

A default partition acts as a 'catch-all'. If you are using List or Range partitioning and a row comes in that doesn't fit any defined range, the database will throw an error unless a default partition exists. It’s a safety net for unexpected data values.

Q: How do I choose the partition key?

The partition key should be the column most commonly used in WHERE filters for critical queries. For time-series, that's usually a date or timestamp. For evenly distributed writes, a high-cardinality column like user_id. Avoid low-cardinality columns like gender; they produce too few partitions and lead to data skew.

Q: Can I alter a partition after creation?

Yes, but operations are limited. You can add new partitions, detach, drop, or attach partitions. Changing the partition scheme of an existing table requires rebuilding the table. Some databases support splitting a partition (e.g., RANGE split) but it's database-specific. Online DDL tools can help reduce downtime.

Batch inserts fail with ORA-14400 when LIST partition lacks DEFAULT for unexpected values.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

✓ Production

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

Partitioning splits a large table into smaller, manageable pieces (partitions) managed transparently by the database.
Range partitioning: splits by value ranges (e.g., by year) — best for time-series data lifecycle.
Hash partitioning: distributes rows evenly via a hash function — solves write hotspot issues.
List partitioning: assigns rows to partitions based on discrete values (e.g., country code).
Partition pruning: queries with WHERE on the partition key skip irrelevant partitions entirely, cutting I/O dramatically.
Production truth: dropping a partition is instant; deleting rows is slow and locks the table.

✦ Definition~90s read

What is Partitioning in Databases?

Database partitioning is a physical data organization strategy where a single logical table is split into multiple physical segments (partitions) based on a defined key, such as a date range, hash value, or category. The core reason partitioning exists is to improve query performance, manageability, and scalability for large tables—think billions of rows in production OLTP or analytical workloads.

★

Think of partitioning like organizing a filing cabinet by year.

Without partitioning, a full table scan on a 10TB table can take hours; with range partitioning on a date column, queries against a single month might scan only a few gigabytes. Partitioning also enables partition-level operations like truncating old data in milliseconds instead of running expensive DELETE statements, which is why it's a go-to for time-series data (e.g., logs, events, sensor readings) at companies like Uber, Netflix, and Stripe.

However, partitioning introduces critical design constraints. The most common production pitfall is omitting a DEFAULT partition in range or list partitioning schemes. When a query or insert targets a value that doesn't match any existing partition—say, a date beyond your defined ranges—the database either rejects the operation or, worse, silently fails to prune correctly, leading to full scans or data loss.

This is why every production partitioning strategy must include a catch-all DEFAULT partition (or equivalent, like MAXVALUE in PostgreSQL or MySQL) to handle edge cases gracefully. Without it, your partitioning scheme becomes brittle and breaks under real-world data drift.

Partitioning is not a silver bullet. It adds complexity to schema migrations, backup/restore, and join queries across partitions. For tables under 100GB or with predictable access patterns, simpler indexing or sharding (e.g., application-level routing) may be more maintainable.

Alternatives like table inheritance (PostgreSQL), clustered columnstore indexes (SQL Server), or distributed databases (CockroachDB, Citus) solve similar problems with different trade-offs. Use partitioning when you need to isolate data lifecycles, enforce data retention policies, or accelerate queries that filter on a high-cardinality column like timestamp or region—but always test with your actual query patterns and data distribution.

Plain-English First

Think of partitioning like organizing a filing cabinet by year. If you have folders for 2023 and 2024 but someone hands you a document from 2025, you have nowhere to put it — the system breaks. A DEFAULT partition is like a 'miscellaneous' folder that catches anything unexpected, so nothing gets lost or rejected.

Missing a DEFAULT partition in a LIST or RANGE partitioning scheme is a production outage waiting to happen. When an insert arrives with a partition key value that doesn't match any defined partition, the database rejects it — and your batch job fails, your data pipeline stalls, or your users see errors. This article explains why the DEFAULT partition is a non-negotiable safety net and how to implement it correctly.

Why Database Partitioning Without a DEFAULT Partition Breaks

Database partitioning splits a large table into smaller, manageable segments called partitions, each storing a subset of rows based on a partition key (e.g., date, region, tenant). The core mechanic is that queries can prune irrelevant partitions, scanning only the data that matters — turning a full-table scan into a targeted subset. This is not sharding; partitions live in the same database instance, sharing the same schema but isolated in storage.

In practice, range partitioning (e.g., by month) is the most common. When a query includes the partition key in its WHERE clause, the planner uses partition pruning to skip partitions that don't match. Without that key, you fall back to a full scan of all partitions — O(n) instead of O(1) per relevant partition. The critical property: every row must map to exactly one partition. If a row's partition key doesn't match any existing partition, the insert fails unless a DEFAULT partition exists.

Use partitioning when tables exceed 10 million rows or 100 GB, and queries consistently filter on a column with high cardinality (e.g., created_at). It's not for small tables — the overhead of partition metadata and management outweighs benefits. In production, missing a DEFAULT partition is a silent time bomb: a new month arrives, inserts fail, and your app goes down. Always define a DEFAULT partition as a catch-all, even if you plan to split it later.

⚠ DEFAULT Partition Is Not Optional

Without a DEFAULT partition, any insert with a partition key outside defined ranges fails immediately — no warning, no fallback. Always include it.

📊 Production Insight

A SaaS platform partitioned by month but forgot to add a DEFAULT partition. On the first day of a new month, all inserts for that month failed silently, causing a 45-minute outage until a new partition was added. Rule of thumb: always create a DEFAULT partition, and set up a cron job to pre-create partitions 3 months ahead.

🎯 Key Takeaway

Partition pruning only works if the query includes the partition key — otherwise it's a full scan.

Always define a DEFAULT partition to catch out-of-range data and prevent insert failures.

Partitioning is a physical design choice, not a logical one — it must align with your query patterns.

thecodeforge.io

Partitioning Databases

Range Partitioning — The Production Gold Standard

Range partitioning is the go-to strategy for time-series data. It maps rows to partitions based on a continuous range of values. This is incredibly powerful for data lifecycle management; instead of running expensive DELETE queries that bloat the transaction log and fragment indexes, you simply drop the oldest partition.

By using the PARTITION BY RANGE clause, the database engine gains the intelligence to perform 'Partition Pruning'—ignoring every file on disk that doesn't contain the requested date range.

ExampleSQL

-- io.thecodeforge.db.schema
-- PostgreSQL range partitioning by year
CREATE TABLE orders (
    id          BIGINT,
    customer_id INT NOT NULL,
    order_date  DATE NOT NULL,
    total       DECIMAL(12,2)
) PARTITION BY RANGE (order_date);

-- Creating the child tables (partitions)
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Add indexes on children automatically via the parent
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Partition pruning in action
EXPLAIN ANALYZE 
SELECT * FROM orders 
WHERE order_date >= '2025-06-01' AND order_date <= '2025-06-30';

Output

Seq Scan on orders_2025 (cost=0.00..15.50 rows=5 width=48)

Filter: ((order_date >= '2025-06-01'::date) AND (order_date <= '2025-06-30'::date))

📊 Production Insight

Time-series data often sees a 'hot partition' — the most recent range receives all writes.

Range partitioning makes this visible: the latest partition can become a bottleneck under high write load.

Rule: for high-volume ingest, consider sub-partitioning the hot range by hash to spread writes.

Always schedule partition maintenance (detach/drop) during low traffic to avoid lock contention.

🎯 Key Takeaway

Range partitioning is ideal for time-series.

Dropping a partition is instant; deleting rows is not.

Plan for hot partitions: sub-partition or use smaller range intervals.

Hash Partitioning — Solving Hotspot Issues

When your data doesn't have a natural 'range' (like a timestamp) or when all your writes hit the 'current' range creating a bottleneck, Hash Partitioning is the solution. It uses a hash function on the partition key to distribute rows evenly across a fixed number of partitions.

This ensures that even if 10,000 users are signing up at the same second, their data is spread across multiple physical files, reducing I/O contention.

ExampleSQL

-- io.thecodeforge.db.schema
-- Hash partitioning for even load distribution
CREATE TABLE user_sessions (
    session_id UUID PRIMARY KEY,
    user_id    BIGINT NOT NULL,
    data       JSONB,
    last_seen  TIMESTAMP
) PARTITION BY HASH (user_id);

-- Create 4 partitions using the modulus/remainder logic
CREATE TABLE user_sessions_p0 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_sessions_p1 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_sessions_p2 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_sessions_p3 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Output

-- Rows are distributed: hash(user_id) % 4

📊 Production Insight

Hash partitioning guarantees even distribution only if the hash function provides uniform output.

Changing the modulus later (e.g., from 4 to 8) requires dropping and recreating all partitions.

Queries with range conditions on the partition key (e.g., user_id between 1000 and 2000) scan ALL partitions.

Rule: use hash only when the primary access pattern is point lookups by the partition key.

🎯 Key Takeaway

Hash partitioning eliminates write hotspots.

Trade-off: range queries can't use pruning — full scan of all partitions.

Choose modulus based on expected data growth; changing it is an expensive migration.

thecodeforge.io

Partitioning Databases

List Partitioning — Categorical Data Isolation

List partitioning groups rows by discrete values such as country, status, or category. Each partition holds rows where the partition key matches a predefined list of values. It's perfect for multi-tenant systems where each tenant has a separate partition, or for data that naturally splits by region.

A critical consideration is the 'default' partition — a catch-all for values that don't match any defined list. Without it, inserts with unexpected values fail.

ExampleSQL

-- io.thecodeforge.db.schema
-- List partitioning by region
CREATE TABLE customer_data (
    customer_id   BIGINT,
    region        TEXT NOT NULL,
    signup_date   DATE,
    lifetime_value DECIMAL(12,2)
) PARTITION BY LIST (region);

-- Partitions for major regions
CREATE TABLE customer_na PARTITION OF customer_data
    FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE customer_eu PARTITION OF customer_data
    FOR VALUES IN ('UK', 'DE', 'FR', 'IT', 'ES');
CREATE TABLE customer_apac PARTITION OF customer_data
    FOR VALUES IN ('JP', 'CN', 'IN', 'AU');

-- Required: default partition for safety
CREATE TABLE customer_default PARTITION OF customer_data DEFAULT;

Output

Rows with region 'US' go to customer_na; region 'BR' goes to customer_default

📊 Production Insight

Without a DEFAULT partition, adding a new region requires an ALTER TABLE to add a new partition — which can lock the table.

The DEFAULT partition can grow large if many unmapped values arrive; monitor its size.

For global apps with many regions, consider using hash or composite partitioning instead.

Rule: always include a DEFAULT partition and set up an alert when it receives rows.

🎯 Key Takeaway

List partitioning enforces data categorization.

Always add a DEFAULT partition to avoid insertion failures.

Monitor the default partition: its size indicates missing partition definitions.

Composite Partitioning — Combining Strategies

Composite (or sub) partitioning combines two partitioning methods, typically range + hash or range + list. The table is first partitioned by a range, and then each range partition is further divided into sub-partitions using hash or list. This is useful for massive tables where you need both pruning on a time dimension and distribution across storage for parallelism.

Example: partition by month, then sub-partition by hash on customer_id. Queries on a single month only scan one range partition, and writes are spread across sub-partitions within that month.

ExampleSQL

-- io.thecodeforge.db.schema
-- Composite range-hash partitioning
CREATE TABLE audit_logs (
    log_id      BIGSERIAL,
    event_time  TIMESTAMPTZ NOT NULL,
    user_id     BIGINT NOT NULL,
    action      TEXT
) PARTITION BY RANGE (event_time);

-- Monthly partitions, each sub-partitioned by hash on user_id
CREATE TABLE audit_logs_2026_01 PARTITION OF audit_logs
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01')
    PARTITION BY HASH (user_id);

-- Sub-partitions within the month
CREATE TABLE audit_logs_2026_01_p0 PARTITION OF audit_logs_2026_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE audit_logs_2026_01_p1 PARTITION OF audit_logs_2026_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... p2, p3

Output

Query: WHERE event_time BETWEEN '2026-01-10' AND '2026-01-15' AND user_id = 12345

Prunes to audit_logs_2026_01_p1 (month + hash)

📊 Production Insight

Composite partitioning increases query optimizer complexity — execution plans become harder to read.

Planned maintenance (e.g., dropping old months) must consider sub-partitions; dropping the top-level partition drops all sub-partitions.

Be careful with partition key order: place the most selective key first for maximum pruning.

Rule: only use composite partitioning when a single strategy cannot handle your data volume and access patterns.

🎯 Key Takeaway

Composite partitioning layers two strategies for flexibility.

Query optimizer overhead increases — test execution plans thoroughly.

Drop top-level partition to remove all sub-partition data instantly.

Partition Pruning — The Engine That Makes Partitioning Fast

Partition pruning is the query optimizer's ability to skip irrelevant partitions based on the WHERE clause. Without pruning, partitioning can actually degrade performance because the database must check metadata for every partition. Pruning occurs only when the partition key is used in a sargable predicate (e.g., equality, range, IN list).

Common pitfalls: wrapping the partition key in a function (e.g., DATE(order_date) = '2025-01-01') prevents pruning. Ensure the column is used directly.

ExampleSQL

-- io.thecodeforge.db.schema
-- Example where pruning works (direct column comparison)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE order_date >= '2025-06-01' AND order_date < '2025-07-01';

-- Example where pruning fails (function on column)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE extract(year from order_date) = 2025;

-- The second query will scan all partitions because the expression is not matched to partition bounds.

Output

-- First query: 'Subplans Removed: n' shows pruning occurred.

-- Second query: 'Subplans Removed: 0' — full scan.

📊 Production Insight

I've seen production queries run 100x slower because a migration script used CAST(order_date AS DATE) instead of order_date directly.

Function-based indexes don't help; the partition key must be the raw column.

Rule: always test with EXPLAIN after creating or modifying queries on partitioned tables.

If pruning isn't happening, check for function wrappers, implicit type casting, or missing partition key in WHERE.

🎯 Key Takeaway

Partition pruning is the performance magic behind partitioning.

Never wrap the partition key in a function in WHERE clauses.

Always verify with EXPLAIN — you can't assume pruning works.

Containerized Database Management

To test partitioning strategies locally without polluting your system, use Docker to spin up a pre-configured instance. This ensures your staging and production environments use the exact same partitioning logic.

ExampleDOCKER

# io.thecodeforge.infrastructure
FROM postgres:16-alpine

# Optimized for large-scale partitioning tests
ENV POSTGRES_DB=forge_warehouse
ENV POSTGRES_USER=admin
ENV POSTGRES_PASSWORD=secure_password

# Copy initialization scripts that create partitions
COPY init-partitioning.sql /docker-entrypoint-initdb.d/

EXPOSE 5432

Output

Step 4/5 : COPY init-partitioning.sql /docker-entrypoint-initdb.d/

📊 Production Insight

Docker is ideal for testing partitioning strategies, but remember: production hardware differences affect partition scan performance.

Use the same PostgreSQL major version locally as in production to avoid optimizer differences.

Volume mount a data directory to persist test partitions across container restarts.

Rule: automate partition creation in init scripts — never manually create partitions in production.

🎯 Key Takeaway

Docker containers provide reproducible partitioning test environments.

Match PostgreSQL version to production for accurate pruning behavior.

Automate partition creation as part of deployment scripts.

Partitioning vs Sharding — Know When to Split the Server

Junior devs throw around 'partitioning' and 'sharding' like they're synonyms. They're not. Partitioning splits a table on one server. Sharding splits data across servers. The difference matters when your production database starts choking.

Partitioning keeps everything on the same box. Simple. Queries hit one connection pool. Transactions stay local. Backup is one snapshot. But when your write throughput saturates the disk controller or your dataset outgrows the machine, partitioning buys you nothing.

Sharding scatters data across servers. Each shard is an independent database. Write capacity scales linearly with shard count. But now you inherit distributed transaction hell, cross-shard joins are busted, and you need a routing layer just to find where your data lives.

Here's the decision tree: if you can keep your dataset under 5TB and your write throughput under 10K writes/sec on decent hardware, stick with partitioning. If you're pushing past those numbers or you need geographic distribution for latency, sharding is your only option. Don't prematurely shard. You'll pay the complexity tax for zero benefit.

ShardSetupExample.sqlSQL

// io.thecodeforge — database tutorial

// Partitioning on same server (PostgreSQL 16)
CREATE TABLE orders_2023 (
    order_id    BIGSERIAL,
    order_date  DATE NOT NULL,
    customer_id INT NOT NULL,
    amount      DECIMAL(10,2)
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2023_q1 PARTITION OF orders_2023
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');

CREATE TABLE orders_2023_q2 PARTITION OF orders_2023
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');

// Sharding — each shard is its own database on different servers
// Shard 0: customer_id % 3 = 0
CREATE DATABASE shop_shard_0;
\c shop_shard_0
CREATE TABLE orders (
    order_id    BIGSERIAL PRIMARY KEY,
    order_date  DATE,
    customer_id INT,
    amount      DECIMAL(10,2)
);

// Shard 1: customer_id % 3 = 1
CREATE DATABASE shop_shard_1;
\c shop_shard_1
CREATE TABLE orders (
    order_id    BIGSERIAL PRIMARY KEY,
    order_date  DATE,
    customer_id INT,
    amount      DECIMAL(10,2)
);

Output

SELECT * FROM orders_2023;

order_id | order_date | customer_id | amount

----------+------------+-------------+--------

42 | 2023-05-12 | 1001 | 299.99

87 | 2023-08-21 | 2034 | 49.99

(2 rows)

⚠ Junior Trap:

Sharding your database because you have 2M rows is like renting a cargo plane to move a backpack. Start with partitioning. Scale horizontally only when you've exhausted vertical scaling.

🎯 Key Takeaway

Partitioning splits tables on one server. Sharding splits data across servers. Use partitioning until you hit 5TB or 10K writes/sec, then consider sharding.

The Growing Pains — Why Your Single Table Will Betray You

Every dead database has a common autopsy. A table like orders started small. 100K rows. Fast. Then 10M. Slow but workable. Then 500M rows. Your queries now take coffee breaks. Indexes are bloated. B-tree depth hits 5 levels. Your DBA starts sweating.

I've seen this pattern kill companies. A social media app with a single posts table. Startups celebrate hitting 1M users. Then queries start timing out. Backups take 14 hours. Any ALTER TABLE locks the whole table for 30 minutes. Your on-call rotation becomes a horror show.

This isn't about data size alone. It's about index maintenance, vacuum overhead, and write amplification. A 50GB index rebuilds for every bulk insert. Partitioning solves this before it becomes a disaster.

How? Partition pruning restricts index maintenance to the touched partition. Vacuum runs faster on smaller tables. Bulk deletes of old partitions become metadata operations — DROP PARTITION takes milliseconds, not hours. Your backup strategy shifts from "backup the whole thing" to "incrementally backup partitions."

Don't wait until your queries timeout to rearchitect. Partition for maintenance at 10M rows per table. Your future on-call self will thank you.

DeadTablePostmortem.sqlSQL

// io.thecodeforge — database tutorial

// Before: Single massive table (slow death)
CREATE TABLE posts (
    post_id    BIGSERIAL PRIMARY KEY,
    user_id    INT NOT NULL,
    content    TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_posts_user ON posts(user_id);
CREATE INDEX idx_posts_created ON posts(created_at);

// Query that kills performance
EXPLAIN ANALYZE
SELECT * FROM posts
WHERE created_at >= now() - INTERVAL '30 days'
AND user_id = 512;

// Seq Scan on posts — horrible
// After partitioning by date:
CREATE TABLE posts_2024_01 (
    CHECK (created_at >= '2024-01-01' AND created_at < '2024-02-01')
) INHERITS (posts);

CREATE TABLE posts_2024_02 (
    CHECK (created_at >= '2024-02-01' AND created_at < '2024-03-01')
) INHERITS (posts);

// Old partitions are just removed, no row-by-row delete
DROP TABLE posts_2023_01; -- Instant reclaim

Output

EXPLAIN ANALYZE output for single table:

Seq Scan on posts (cost=0.00..45000.00 rows=500000 width=120)

Filter: ((created_at >= now() - '30 days'::interval) AND (user_id = 512))

Execution Time: 1245.230 ms

EXPLAIN ANALYZE output for partitioned:

Append Subquery (cost=0.00..120.00 rows=15000 width=120)

-> Index Scan using posts_2024_02_user_idx on posts_2024_02

Execution Time: 12.847 ms

💡Senior Shortcut:

Set up pg_partman to auto-create future partitions and auto-drop old ones. It's cron-based partition management that saves you from forgetting to create next month's partition during a holiday weekend.

🎯 Key Takeaway

A single table over 50M rows will kill your performance. Partition proactively when you hit 10M rows to avoid dead queries, bloated indexes, and marathon backups.

Real-World Examples — Where Partitioning Saves Your Weekend

Theory is cheap. Let's talk about the day your monitoring dashboard grinds to a halt because the events table has 500 million rows and counting. Without partitioning, every query is a full table scan. Your pager goes off. Your Saturday is ruined.

Range partitioning by timestamp is the fix. Split that events table by month. Queries against the last 30 days hit one partition — not 500 million rows. The WHERE created_at BETWEEN '2025-01-01' AND '2025-01-31' prunes everything else. Suddenly that dashboard loads in 200ms instead of timing out.

Hash partitioning solves the hot-spot problem. Imagine a user_sessions table where 10% of users generate 90% of writes. A hash on user_id spreads those hot rows across partitions. No single disk queue. No I/O bottleneck. Your write throughput triples without touching hardware.

List partitioning is for categorical isolation. Think regional data. orders with a region column: NA, EU, APAC. Each region gets its own partition. Compliance audits query only EU. Maintenance rebuilds only NA. You stop paying for operations you don't need.

RealWorldPartitioning.sqlSQL

// io.thecodeforge — database tutorial

CREATE TABLE events (
  event_id BIGSERIAL,
  created_at TIMESTAMP NOT NULL,
  payload JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_01 PARTITION OF events
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE events_2025_02 PARTITION OF events
  FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Query against last 30 days prunes everything else
SELECT count(*) FROM events
  WHERE created_at >= '2025-01-15';

Output

count

-------

84321

(1 row)

💡Senior Shortcut:

Always create a DEFAULT partition during migration. Production will throw a row with a future timestamp. Without it, the INSERT fails and your pager goes off.

🎯 Key Takeaway

Partitioning without a real workload pattern is premature optimization. Profile first, partition second.

Disadvantages — The Debt You Don't See Coming

Partitioning isn't free. You're trading query speed for maintenance complexity. Every new partition is a new object in the catalog. PostgreSQL's pg_dump gets slower. Backups balloon. You can't just DROP TABLE a partition — you have to detach it first. Miss that step and your cleanup cronjob fails silently.

Cross-partition queries are a performance trap. If your WHERE clause doesn't match the partition key, the planner scans every partition. That's worse than a full table scan because now you're opening dozens of file descriptors. The optimizer doesn't warn you. You find out when the query takes 30 seconds.

Schema changes become a nightmare. ALTER TABLE on a partitioned table locks every partition. A DROP COLUMN on a 50-partition table blocks writes for minutes. Your team deploys a migration at 2 PM. Everything stalls. The DBA hates you.

Hash partitioning looks clean until you need to rebalance. Add a node? The hash modulus changes. Now every row moves. That's a full data rewrite. In production. While customers are ordering. You don't rebalance hash partitions. You recreate them.

Partitioning solves real problems. But it introduces real debt. You must maintain the partition scheme, monitor partition bloat, and automate partition creation. Ignore that and you're back to square one — with extra steps.

PartitionDebt.sqlSQL

// io.thecodeforge — database tutorial

-- Cross-partition query: NO pruning, scans ALL partitions
SELECT * FROM events
  WHERE payload->>'user_id' = '42';
-- No partition key in WHERE = disaster

-- Schema change locks everything
ALTER TABLE events DROP COLUMN payload;
-- Blocks all writes across 12 partitions for 14 seconds

-- Output from EXPLAIN (ANALYZE, BUFFERS)
 Gather
   Workers Planned: 2
   ->  Parallel Append
         ->  Seq Scan on events_2025_01
         ->  Seq Scan on events_2025_02
         ->  Seq Scan on events_2025_03
         (12 partitions scanned — zero pruned)

Output

Planning Time: 3.456 ms

Execution Time: 28391.421 ms

⚠ Production Trap:

Never use hash partitioning on a key that will grow in cardinality. Adding a new partition requires modulus change. That's a full table rewrite. Use range by time instead — you can prep-create partitions.

🎯 Key Takeaway

Partitioning trades query speed for operational complexity. Budget time for partition maintenance or your team will pay the interest.

Partitioning in NoSQL Databases — Why It's Not Optional

NoSQL databases like Cassandra, MongoDB, and DynamoDB rely on partitioning to scale horizontally. Without it, you cannot distribute data across nodes. In Cassandra, partition keys determine data placement across the ring. A poor partition key choice creates hot nodes — one server handles all writes while others sit idle. MongoDB uses shard keys to split collections across shards. When the shard key lacks cardinality, data piles onto a single shard. NoSQL partitioning differs from SQL: you must design for partition-awareness from day one, not as an afterthought. The database does not automatically rebalance without cost. Understanding how your NoSQL system handles partition splits, token ranges, and node addition prevents production fires. Always test partition key choices with real workload patterns before deploying.

NoSQL-Partition-Key.sqlSQL

// io.thecodeforge — database tutorial

// Cassandra: bad partition key — all writes to one node
CREATE TABLE orders (
    order_id UUID,
    user_id INT,
    product TEXT,
    PRIMARY KEY ((user_id))  // single partition per user
);

// Better: high-cardinality composite partition key
CREATE TABLE orders_v2 (
    order_id UUID,
    user_id INT,
    order_date DATE,
    product TEXT,
    PRIMARY KEY ((user_id, order_date), order_id)
);

// MongoDB: shard key with low cardinality fails
// Use hashed shard key for uniform distribution
sh.shardCollection("shop.orders", { "_id": "hashed" });

Output

// Partition key (user_id) causes single-node hotspot

// Composite key distributes writes across nodes

⚠ Production Trap:

MongoDB's default _id shard key with monotonically increasing values creates write bottlenecks. Always test with hashed shard keys or high-cardinality fields.

🎯 Key Takeaway

Design partition keys for high cardinality and uniform distribution — one bad key burns your NoSQL cluster.

Machine Learning Pipelines — Why Partitioning Is Your Silent Bottleneck

Machine learning pipelines ingest, transform, and train on data. When your training dataset lives in a single database table, full-table scans kill performance. Partitioning by date or region lets your pipeline prune irrelevant partitions before loading data. A daily partitioning scheme means your feature engineering job scans only today's partition, not the entire history. For time-series ML models, range partitioning on timestamp is mandatory — it cuts ETL time by 90%. Without it, your training job stalls waiting on disk I/O. Partitioning also enables parallel loading: Spark or Dask workers read separate partitions simultaneously. The hidden cost is partition management — stale partitions must be archived or dropped to keep query speed high. Integrate partition-aware queries into your pipeline logic. Do not assume the database handles this automatically.

ML-Partition-Pruning.sqlSQL

// io.thecodeforge — database tutorial

-- Create range-partitioned table for ML training data
CREATE TABLE training_events (
    event_ts TIMESTAMP,
    user_id INT,
    feature_vector JSONB
) PARTITION BY RANGE (event_ts);

CREATE TABLE training_2025_01 PARTITION OF training_events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- ML pipeline query — scans only one partition
EXPLAIN ANALYZE
SELECT * FROM training_events
WHERE event_ts >= '2025-01-15' AND event_ts < '2025-01-16';

Output

// Partition pruning — scans 1 partition instead of 12

// Query time drops from 45s to 1.2s for daily feature extraction

⚠ Production Trap:

Never run SELECT * on an unpartitioned historical table in a pipeline. A single full scan can starve the database of memory and crash training jobs.

🎯 Key Takeaway

Partition your ML source tables by time — it cuts data loading by 90% and lets you parallelize feature extraction.

Horizontal vs Vertical Partitioning: Use Cases

Horizontal partitioning splits a table by rows, distributing data across multiple partitions based on a key (e.g., date, region). Vertical partitioning splits a table by columns, moving infrequently accessed or large columns (e.g., BLOBs) into separate tables. Use horizontal partitioning when you need to manage large volumes of rows, improve query performance by scanning fewer rows, or enable partition pruning. Use vertical partitioning when you have wide tables with columns that are rarely queried together, or when you want to reduce I/O for frequently accessed columns. For example, in an e-commerce database, you might horizontally partition orders by year and vertically separate product images into a different table. SQL example: Horizontal partitioning with range: CREATE TABLE orders (id INT, order_date DATE, amount DECIMAL) PARTITION BY RANGE (YEAR(order_date)) (PARTITION p2020 VALUES LESS THAN (2021), PARTITION p2021 VALUES LESS THAN (2022)); Vertical partitioning: CREATE TABLE product_main (id INT, name VARCHAR, price DECIMAL) and CREATE TABLE product_images (id INT, product_id INT, image BLOB);

horizontal_vertical_partitioning.sqlSQL

-- Horizontal partitioning by year
CREATE TABLE orders (
    id INT,
    order_date DATE,
    amount DECIMAL
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION p2021 VALUES LESS THAN (2022)
);

-- Vertical partitioning: split product table
CREATE TABLE product_main (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL
);

CREATE TABLE product_images (
    id INT PRIMARY KEY,
    product_id INT,
    image BLOB,
    FOREIGN KEY (product_id) REFERENCES product_main(id)
);

🔥When to Choose Horizontal vs Vertical

📊 Production Insight

In production, vertical partitioning can complicate joins and transactions. Use it sparingly and consider denormalization or columnar storage alternatives.

🎯 Key Takeaway

Horizontal partitioning splits rows; vertical partitioning splits columns. Choose based on whether you need to scale row count or optimize column access.

Distributed Partitioning: Consistent Hashing vs Range Partitioning

In distributed databases, partitioning across nodes can use consistent hashing or range partitioning. Consistent hashing distributes data uniformly across nodes using a hash function, minimizing data movement when nodes are added or removed. Range partitioning assigns contiguous key ranges to nodes, which can lead to hotspots if data distribution is skewed. Consistent hashing is ideal for key-value stores (e.g., Cassandra, DynamoDB) where uniform distribution and scalability are critical. Range partitioning works well for ordered data (e.g., time-series) where range scans are common. For example, in a distributed logging system, range partitioning by timestamp allows efficient queries for a date range, but may overload the latest partition. Consistent hashing spreads writes evenly but makes range queries inefficient. SQL example: Range partitioning in a distributed context (e.g., using MySQL NDB Cluster): CREATE TABLE logs (id INT, ts TIMESTAMP, message TEXT) ENGINE=NDB PARTITION BY RANGE (UNIX_TIMESTAMP(ts)) (PARTITION p0 VALUES LESS THAN (1000000000), PARTITION p1 VALUES LESS THAN (2000000000)); Consistent hashing is typically implemented at the application or database driver level, not in SQL DDL.

distributed_partitioning.sqlSQL

-- Range partitioning for distributed time-series (MySQL NDB example)
CREATE TABLE logs (
    id INT,
    ts TIMESTAMP,
    message TEXT
)
ENGINE=NDB
PARTITION BY RANGE (UNIX_TIMESTAMP(ts)) (
    PARTITION p0 VALUES LESS THAN (1000000000),
    PARTITION p1 VALUES LESS THAN (2000000000)
);

-- Consistent hashing is not expressed in SQL; it's handled by the database driver or application logic.

💡Choosing Between Consistent Hashing and Range

📊 Production Insight

In production, many distributed databases (e.g., Cassandra) use consistent hashing with virtual nodes to balance load. Range partitioning is common in sharded SQL databases like Vitess.

🎯 Key Takeaway

Consistent hashing provides uniform distribution and elasticity; range partitioning offers efficient range scans but risks hotspots.

Partitioning vs Sharding: When Is Each Appropriate

Partitioning divides a table within a single database instance, while sharding distributes data across multiple independent database servers (shards). Partitioning is appropriate when data fits on one server but you want to improve query performance, manageability, or enable partition pruning. Sharding is necessary when data exceeds a single server's capacity or when you need horizontal scalability across many machines. Use partitioning for tables up to a few terabytes on a single node; use sharding for multi-terabyte datasets or when write throughput exceeds a single node's capability. For example, a SaaS application might partition customer data by tenant ID within a single database, but shard by region across multiple databases. SQL example: Partitioning: CREATE TABLE users (id INT, tenant_id INT, name VARCHAR) PARTITION BY LIST (tenant_id) (PARTITION t1 VALUES IN (1,2), PARTITION t2 VALUES IN (3,4)); Sharding is typically implemented at the application layer or via a proxy (e.g., Vitess, Citus), not in standard SQL DDL. A sharded setup might have multiple identical schemas on different servers, with a routing layer directing queries.

partitioning_vs_sharding.sqlSQL

-- Partitioning: single database, multiple partitions
CREATE TABLE users (
    id INT,
    tenant_id INT,
    name VARCHAR(100)
)
PARTITION BY LIST (tenant_id) (
    PARTITION t1 VALUES IN (1,2),
    PARTITION t2 VALUES IN (3,4)
);

-- Sharding: multiple databases (not SQL DDL, but conceptual)
-- Shard 1: server1.users (tenant_id 1,2)
-- Shard 2: server2.users (tenant_id 3,4)

⚠ Don't Confuse Partitioning with Sharding

📊 Production Insight

In production, start with partitioning and only move to sharding when necessary. Tools like Citus can transparently shard PostgreSQL tables.

🎯 Key Takeaway

Use partitioning for single-node scalability and manageability; use sharding when data or throughput exceeds a single node's limits.

● Production incidentPOST-MORTEMseverity: high

The Missing Default Partition That Broke Batch Insert

Symptom

A batch insert for customer data from a newly acquired company in a country not previously in the list partition started failing with ORA-14400 (inserted partition key does not map to any partition). The pipeline retried indefinitely, causing data lag.

Assumption

The team assumed list partition would reject unknown values with a clear error. They thought the ETL would be stopped and fixed before data loss.

Root cause

The list partition was defined for specific country codes (US, UK, DE, FR). The new data came from 'JP' (Japan). No DEFAULT partition existed. The database rejected the rows, and the ETL's error handling wasn't configured to catch this specific error and route to a fallback.

Fix

Added a DEFAULT partition to catch any unmapped values. Then set up an alert on rows entering the DEFAULT partition so the team could add proper partitions proactively.

Key lesson

Always create a DEFAULT partition for LIST and RANGE partitioning to catch unexpected values.
Do not rely on application validation alone — the database should have a safety net.
Monitor DEFAULT partition size as a signal for missing partition definitions.

Production debug guideSymptom → Action for common partitioning failures4 entries

Symptom · 01

Query on partitioned table takes as long as full table scan

→

Fix

Check if WHERE clause includes the partition key. Run EXPLAIN PLAN to see if partition pruning occurred (e.g., 'PARTITION RANGE SINGLE' vs 'PARTITION RANGE ALL'). Ensure index on partition key.

Symptom · 02

Insert fails with partition key mapping error

→

Fix

Verify the inserted value matches defined partition boundaries. Check for missing DEFAULT partition. Use SELECT * FROM ALL_TAB_PARTITIONS to see existing partitions.

Symptom · 03

Query performance degrades over time despite partitioning

→

Fix

Check for partition skew — if one partition grows much larger than others (e.g., 'hot' partition in range partitioning). Use SELECT partition_name, num_rows FROM ALL_TAB_PARTITIONS to compare sizes. Consider splitting the hot partition or rebalancing hash partitions.

Symptom · 04

Maintenance (VACUUM, ANALYZE) takes too long on partitioned table

→

Fix

Perform maintenance on each partition individually instead of the parent table. Use concurrent VACUUM on child tables. Set autovacuum tuning per partition size.

★ Partitioning Quick Debug Cheat SheetImmediate commands to diagnose partitioning issues in PostgreSQL

Query not using partition pruning−

Immediate action

Run EXPLAIN (ANALYZE, BUFFERS) on the query.

Commands

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE order_date >= '2025-01-01';

Check if 'Subplans Removed' shows 0 — indicates no pruning.

Fix now

Add a WHERE clause on the partition key (order_date) with a sargable condition. Ensure index exists on partition key.

Insert into partitioned table fails+

Partition size imbalance+

Type	Split By	Best For	Weakness
Range	Value ranges (date, ID)	Time-series, historical archival	Hot partition if writes cluster at range boundary
Hash	Hash of column value	Even distribution, high-concurrency writes	Range-based queries force a full scan of all partitions
List	Discrete values (country, status)	Regional data, status-based isolation	Unknown/New values fail unless a 'DEFAULT' partition exists
Composite	Combination of above (e.g. Range then Hash)	Massive multi-tenant SaaS platforms	Significant administrative and query planning overhead

⚙ Quick Reference

9 commands from this guide

File	Command / Code	Purpose
ShardSetupExample.sql	CREATE TABLE orders_2023 (	Partitioning vs Sharding
DeadTablePostmortem.sql	CREATE TABLE posts (	The Growing Pains
RealWorldPartitioning.sql	CREATE TABLE events (	Real-World Examples
PartitionDebt.sql	SELECT * FROM events	Disadvantages
NoSQL-Partition-Key.sql	CREATE TABLE orders (	Partitioning in NoSQL Databases
ML-Partition-Pruning.sql	CREATE TABLE training_events (	Machine Learning Pipelines
horizontal_vertical_partitioning.sql	CREATE TABLE orders (	Horizontal vs Vertical Partitioning
distributed_partitioning.sql	CREATE TABLE logs (	Distributed Partitioning
partitioning_vs_sharding.sql	CREATE TABLE users (	Partitioning vs Sharding

Key takeaways

Partitioning is transparent

applications query the parent table and the database routes to partitions.

Partition pruning

queries with WHERE on the partition key skip irrelevant partitions entirely.

Dropping a partition is instantaneous

the best way to archive old data at scale.

Partitioning differs from sharding

partitions live on one database instance; sharding splits across multiple.

Always create an index on the partition key

otherwise partition pruning cannot work.

Always add a DEFAULT partition for list and range partitioning to catch unexpected values.

Hash partitioning solves write hotspots but kills range-query pruning.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR

Explain the concept of 'Partition Pruning'. How does the query optimizer...

Q02SENIOR

You have a table that stores logs for 5 years. Deleting old logs is taki...

Q03SENIOR

LeetCode Standard: Given a massive dataset of transactions, how would yo...

Q04SENIOR

When would you choose Hash partitioning over Range partitioning, and wha...

Q05SENIOR

Can a primary key in a partitioned table be any column, or does it have ...

Q01 of 05JUNIOR

Explain the concept of 'Partition Pruning'. How does the query optimizer use it to reduce I/O?

ANSWER

Partition pruning is the query optimizer's ability to eliminate entire partitions from the scan when the WHERE clause includes the partition key. The optimizer compares the query predicates against the metadata of each partition (bounds for range, list values, or hash ranges). Partitions that cannot contain matching rows are not scanned at all. This reduces I/O and CPU because fewer pages are read from disk. For example, a query with WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31' on a table range-partitioned by month will only scan the partition for January 2025.

FAQ · 6 QUESTIONS

Frequently Asked Questions

What is the difference between partitioning and sharding?

Does partitioning improve query performance automatically?

Can I partition an existing table with millions of rows?

What is a 'Default Partition' and why do I need it?

How do I choose the partition key?

Can I alter a partition after creation?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

✓ Verified

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

🔥

That's Database Design. Mark it forged?

8 min read · try the examples if you haven't