Partitioning splits a large table into smaller, manageable pieces (partitions) managed transparently by the database.
Range partitioning: splits by value ranges (e.g., by year) — best for time-series data lifecycle.
Hash partitioning: distributes rows evenly via a hash function — solves write hotspot issues.
List partitioning: assigns rows to partitions based on discrete values (e.g., country code).
Partition pruning: queries with WHERE on the partition key skip irrelevant partitions entirely, cutting I/O dramatically.
Production truth: dropping a partition is instant; deleting rows is slow and locks the table.
✦ Definition~90s read
What is Partitioning in Databases?
Database partitioning is a physical data organization strategy where a single logical table is split into multiple physical segments (partitions) based on a defined key, such as a date range, hash value, or category. The core reason partitioning exists is to improve query performance, manageability, and scalability for large tables—think billions of rows in production OLTP or analytical workloads.
Without partitioning, a full table scan on a 10TB table can take hours; with range partitioning on a date column, queries against a single month might scan only a few gigabytes. Partitioning also enables partition-level operations like truncating old data in milliseconds instead of running expensive DELETE statements, which is why it's a go-to for time-series data (e.g., logs, events, sensor readings) at companies like Uber, Netflix, and Stripe.
However, partitioning introduces critical design constraints. The most common production pitfall is omitting a DEFAULT partition in range or list partitioning schemes. When a query or insert targets a value that doesn't match any existing partition—say, a date beyond your defined ranges—the database either rejects the operation or, worse, silently fails to prune correctly, leading to full scans or data loss.
This is why every production partitioning strategy must include a catch-all DEFAULT partition (or equivalent, like MAXVALUE in PostgreSQL or MySQL) to handle edge cases gracefully. Without it, your partitioning scheme becomes brittle and breaks under real-world data drift.
Partitioning is not a silver bullet. It adds complexity to schema migrations, backup/restore, and join queries across partitions. For tables under 100GB or with predictable access patterns, simpler indexing or sharding (e.g., application-level routing) may be more maintainable.
Alternatives like table inheritance (PostgreSQL), clustered columnstore indexes (SQL Server), or distributed databases (CockroachDB, Citus) solve similar problems with different trade-offs. Use partitioning when you need to isolate data lifecycles, enforce data retention policies, or accelerate queries that filter on a high-cardinality column like timestamp or region—but always test with your actual query patterns and data distribution.
Why Database Partitioning Without a DEFAULT Partition Breaks
Database partitioning splits a large table into smaller, manageable segments called partitions, each storing a subset of rows based on a partition key (e.g., date, region, tenant). The core mechanic is that queries can prune irrelevant partitions, scanning only the data that matters — turning a full-table scan into a targeted subset. This is not sharding; partitions live in the same database instance, sharing the same schema but isolated in storage.
In practice, range partitioning (e.g., by month) is the most common. When a query includes the partition key in its WHERE clause, the planner uses partition pruning to skip partitions that don't match. Without that key, you fall back to a full scan of all partitions — O(n) instead of O(1) per relevant partition. The critical property: every row must map to exactly one partition. If a row's partition key doesn't match any existing partition, the insert fails unless a DEFAULT partition exists.
Use partitioning when tables exceed 10 million rows or 100 GB, and queries consistently filter on a column with high cardinality (e.g., created_at). It's not for small tables — the overhead of partition metadata and management outweighs benefits. In production, missing a DEFAULT partition is a silent time bomb: a new month arrives, inserts fail, and your app goes down. Always define a DEFAULT partition as a catch-all, even if you plan to split it later.
DEFAULT Partition Is Not Optional
Without a DEFAULT partition, any insert with a partition key outside defined ranges fails immediately — no warning, no fallback. Always include it.
Production Insight
A SaaS platform partitioned by month but forgot to add a DEFAULT partition. On the first day of a new month, all inserts for that month failed silently, causing a 45-minute outage until a new partition was added. Rule of thumb: always create a DEFAULT partition, and set up a cron job to pre-create partitions 3 months ahead.
Key Takeaway
Partition pruning only works if the query includes the partition key — otherwise it's a full scan.
Always define a DEFAULT partition to catch out-of-range data and prevent insert failures.
Partitioning is a physical design choice, not a logical one — it must align with your query patterns.
thecodeforge.io
Database Partitioning Strategies and Pitfalls
Partitioning Databases
Range Partitioning — The Production Gold Standard
Range partitioning is the go-to strategy for time-series data. It maps rows to partitions based on a continuous range of values. This is incredibly powerful for data lifecycle management; instead of running expensive DELETE queries that bloat the transaction log and fragment indexes, you simply drop the oldest partition.
By using the PARTITION BY RANGE clause, the database engine gains the intelligence to perform 'Partition Pruning'—ignoring every file on disk that doesn't contain the requested date range.
ExampleSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- io.thecodeforge.db.schema-- PostgreSQL range partitioning by yearCREATETABLEorders (
id BIGINT,
customer_id INTNOTNULL,
order_date DATENOTNULL,
total DECIMAL(12,2)
) PARTITIONBYRANGE (order_date);
-- Creating the child tables (partitions)CREATETABLE orders_2023 PARTITIONOF orders
FORVALUESFROM ('2023-01-01') TO ('2024-01-01');
CREATETABLE orders_2024 PARTITIONOF orders
FORVALUESFROM ('2024-01-01') TO ('2025-01-01');
CREATETABLE orders_2025 PARTITIONOF orders
FORVALUESFROM ('2025-01-01') TO ('2026-01-01');
-- Add indexes on children automatically via the parentCREATEINDEX idx_orders_customer ONorders (customer_id);
-- Partition pruning in actionEXPLAINANALYZESELECT * FROM orders
WHERE order_date >= '2025-06-01'AND order_date <= '2025-06-30';
Output
Seq Scan on orders_2025 (cost=0.00..15.50 rows=5 width=48)
Filter: ((order_date >= '2025-06-01'::date) AND (order_date <= '2025-06-30'::date))
Production Insight
Time-series data often sees a 'hot partition' — the most recent range receives all writes.
Range partitioning makes this visible: the latest partition can become a bottleneck under high write load.
Rule: for high-volume ingest, consider sub-partitioning the hot range by hash to spread writes.
Always schedule partition maintenance (detach/drop) during low traffic to avoid lock contention.
Key Takeaway
Range partitioning is ideal for time-series.
Dropping a partition is instant; deleting rows is not.
Plan for hot partitions: sub-partition or use smaller range intervals.
Hash Partitioning — Solving Hotspot Issues
When your data doesn't have a natural 'range' (like a timestamp) or when all your writes hit the 'current' range creating a bottleneck, Hash Partitioning is the solution. It uses a hash function on the partition key to distribute rows evenly across a fixed number of partitions.
This ensures that even if 10,000 users are signing up at the same second, their data is spread across multiple physical files, reducing I/O contention.
Hash partitioning guarantees even distribution only if the hash function provides uniform output.
Changing the modulus later (e.g., from 4 to 8) requires dropping and recreating all partitions.
Queries with range conditions on the partition key (e.g., user_id between 1000 and 2000) scan ALL partitions.
Rule: use hash only when the primary access pattern is point lookups by the partition key.
Key Takeaway
Hash partitioning eliminates write hotspots.
Trade-off: range queries can't use pruning — full scan of all partitions.
Choose modulus based on expected data growth; changing it is an expensive migration.
List Partitioning — Categorical Data Isolation
List partitioning groups rows by discrete values such as country, status, or category. Each partition holds rows where the partition key matches a predefined list of values. It's perfect for multi-tenant systems where each tenant has a separate partition, or for data that naturally splits by region.
A critical consideration is the 'default' partition — a catch-all for values that don't match any defined list. Without it, inserts with unexpected values fail.
ExampleSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- io.thecodeforge.db.schema-- List partitioning by regionCREATETABLEcustomer_data (
customer_id BIGINT,
region TEXTNOTNULL,
signup_date DATE,
lifetime_value DECIMAL(12,2)
) PARTITIONBYLIST (region);
-- Partitions for major regionsCREATETABLE customer_na PARTITIONOF customer_data
FORVALUESIN ('US', 'CA', 'MX');
CREATETABLE customer_eu PARTITIONOF customer_data
FORVALUESIN ('UK', 'DE', 'FR', 'IT', 'ES');
CREATETABLE customer_apac PARTITIONOF customer_data
FORVALUESIN ('JP', 'CN', 'IN', 'AU');
-- Required: default partition for safetyCREATETABLE customer_default PARTITIONOF customer_data DEFAULT;
Output
Rows with region 'US' go to customer_na; region 'BR' goes to customer_default
Production Insight
Without a DEFAULT partition, adding a new region requires an ALTER TABLE to add a new partition — which can lock the table.
The DEFAULT partition can grow large if many unmapped values arrive; monitor its size.
For global apps with many regions, consider using hash or composite partitioning instead.
Rule: always include a DEFAULT partition and set up an alert when it receives rows.
Key Takeaway
List partitioning enforces data categorization.
Always add a DEFAULT partition to avoid insertion failures.
Monitor the default partition: its size indicates missing partition definitions.
Composite Partitioning — Combining Strategies
Composite (or sub) partitioning combines two partitioning methods, typically range + hash or range + list. The table is first partitioned by a range, and then each range partition is further divided into sub-partitions using hash or list. This is useful for massive tables where you need both pruning on a time dimension and distribution across storage for parallelism.
Example: partition by month, then sub-partition by hash on customer_id. Queries on a single month only scan one range partition, and writes are spread across sub-partitions within that month.
ExampleSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- io.thecodeforge.db.schema-- Composite range-hash partitioningCREATETABLEaudit_logs (
log_id BIGSERIAL,
event_time TIMESTAMPTZNOTNULL,
user_id BIGINTNOTNULL,
action TEXT
) PARTITIONBYRANGE (event_time);
-- Monthly partitions, each sub-partitioned by hash on user_idCREATETABLE audit_logs_2026_01 PARTITIONOF audit_logs
FORVALUESFROM ('2026-01-01') TO ('2026-02-01')
PARTITIONBYHASH (user_id);
-- Sub-partitions within the monthCREATETABLE audit_logs_2026_01_p0 PARTITIONOF audit_logs_2026_01
FORVALUESWITH (MODULUS4, REMAINDER0);
CREATETABLE audit_logs_2026_01_p1 PARTITIONOF audit_logs_2026_01
FORVALUESWITH (MODULUS4, REMAINDER1);
-- ... p2, p3
Output
Query: WHERE event_time BETWEEN '2026-01-10' AND '2026-01-15' AND user_id = 12345
Prunes to audit_logs_2026_01_p1 (month + hash)
Production Insight
Composite partitioning increases query optimizer complexity — execution plans become harder to read.
Planned maintenance (e.g., dropping old months) must consider sub-partitions; dropping the top-level partition drops all sub-partitions.
Be careful with partition key order: place the most selective key first for maximum pruning.
Rule: only use composite partitioning when a single strategy cannot handle your data volume and access patterns.
Key Takeaway
Composite partitioning layers two strategies for flexibility.
Query optimizer overhead increases — test execution plans thoroughly.
Drop top-level partition to remove all sub-partition data instantly.
Partition Pruning — The Engine That Makes Partitioning Fast
Partition pruning is the query optimizer's ability to skip irrelevant partitions based on the WHERE clause. Without pruning, partitioning can actually degrade performance because the database must check metadata for every partition. Pruning occurs only when the partition key is used in a sargable predicate (e.g., equality, range, IN list).
Common pitfalls: wrapping the partition key in a function (e.g., DATE(order_date) = '2025-01-01') prevents pruning. Ensure the column is used directly.
ExampleSQL
1
2
3
4
5
6
7
8
9
10
-- io.thecodeforge.db.schema-- Example where pruning works (direct column comparison)EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE order_date >= '2025-06-01'AND order_date < '2025-07-01';
-- Example where pruning fails (function on column)EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHEREextract(year from order_date) = 2025;
-- The second query will scan all partitions because the expression is not matched to partition bounds.
Output
-- First query: 'Subplans Removed: n' shows pruning occurred.
-- Second query: 'Subplans Removed: 0' — full scan.
Production Insight
I've seen production queries run 100x slower because a migration script used CAST(order_date AS DATE) instead of order_date directly.
Function-based indexes don't help; the partition key must be the raw column.
Rule: always test with EXPLAIN after creating or modifying queries on partitioned tables.
If pruning isn't happening, check for function wrappers, implicit type casting, or missing partition key in WHERE.
Key Takeaway
Partition pruning is the performance magic behind partitioning.
Never wrap the partition key in a function in WHERE clauses.
Always verify with EXPLAIN — you can't assume pruning works.
Containerized Database Management
To test partitioning strategies locally without polluting your system, use Docker to spin up a pre-configured instance. This ensures your staging and production environments use the exact same partitioning logic.
Docker is ideal for testing partitioning strategies, but remember: production hardware differences affect partition scan performance.
Use the same PostgreSQL major version locally as in production to avoid optimizer differences.
Volume mount a data directory to persist test partitions across container restarts.
Rule: automate partition creation in init scripts — never manually create partitions in production.
Key Takeaway
Docker containers provide reproducible partitioning test environments.
Match PostgreSQL version to production for accurate pruning behavior.
Automate partition creation as part of deployment scripts.
Partitioning vs Sharding — Know When to Split the Server
Junior devs throw around 'partitioning' and 'sharding' like they're synonyms. They're not. Partitioning splits a table on one server. Sharding splits data across servers. The difference matters when your production database starts choking.
Partitioning keeps everything on the same box. Simple. Queries hit one connection pool. Transactions stay local. Backup is one snapshot. But when your write throughput saturates the disk controller or your dataset outgrows the machine, partitioning buys you nothing.
Sharding scatters data across servers. Each shard is an independent database. Write capacity scales linearly with shard count. But now you inherit distributed transaction hell, cross-shard joins are busted, and you need a routing layer just to find where your data lives.
Here's the decision tree: if you can keep your dataset under 5TB and your write throughput under 10K writes/sec on decent hardware, stick with partitioning. If you're pushing past those numbers or you need geographic distribution for latency, sharding is your only option. Don't prematurely shard. You'll pay the complexity tax for zero benefit.
ShardSetupExample.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// io.thecodeforge — database tutorial
// Partitioning on same server (PostgreSQL16)
CREATETABLEorders_2023 (
order_id BIGSERIAL,
order_date DATENOTNULL,
customer_id INTNOTNULL,
amount DECIMAL(10,2)
) PARTITIONBYRANGE (order_date);
CREATETABLE orders_2023_q1 PARTITIONOF orders_2023
FORVALUESFROM ('2023-01-01') TO ('2023-04-01');
CREATETABLE orders_2023_q2 PARTITIONOF orders_2023
FORVALUESFROM ('2023-04-01') TO ('2023-07-01');
// Sharding — each shard is its own database on different servers
// Shard0: customer_id % 3 = 0CREATEDATABASE shop_shard_0;
\c shop_shard_0
CREATETABLEorders (
order_id BIGSERIALPRIMARYKEY,
order_date DATE,
customer_id INT,
amount DECIMAL(10,2)
);
// Shard1: customer_id % 3 = 1CREATEDATABASE shop_shard_1;
\c shop_shard_1
CREATETABLEorders (
order_id BIGSERIALPRIMARYKEY,
order_date DATE,
customer_id INT,
amount DECIMAL(10,2)
);
Output
SELECT * FROM orders_2023;
order_id | order_date | customer_id | amount
----------+------------+-------------+--------
42 | 2023-05-12 | 1001 | 299.99
87 | 2023-08-21 | 2034 | 49.99
(2 rows)
Junior Trap:
Sharding your database because you have 2M rows is like renting a cargo plane to move a backpack. Start with partitioning. Scale horizontally only when you've exhausted vertical scaling.
Key Takeaway
Partitioning splits tables on one server. Sharding splits data across servers. Use partitioning until you hit 5TB or 10K writes/sec, then consider sharding.
The Growing Pains — Why Your Single Table Will Betray You
Every dead database has a common autopsy. A table like orders started small. 100K rows. Fast. Then 10M. Slow but workable. Then 500M rows. Your queries now take coffee breaks. Indexes are bloated. B-tree depth hits 5 levels. Your DBA starts sweating.
I've seen this pattern kill companies. A social media app with a single posts table. Startups celebrate hitting 1M users. Then queries start timing out. Backups take 14 hours. Any ALTER TABLE locks the whole table for 30 minutes. Your on-call rotation becomes a horror show.
This isn't about data size alone. It's about index maintenance, vacuum overhead, and write amplification. A 50GB index rebuilds for every bulk insert. Partitioning solves this before it becomes a disaster.
How? Partition pruning restricts index maintenance to the touched partition. Vacuum runs faster on smaller tables. Bulk deletes of old partitions become metadata operations — DROP PARTITION takes milliseconds, not hours. Your backup strategy shifts from "backup the whole thing" to "incrementally backup partitions."
Don't wait until your queries timeout to rearchitect. Partition for maintenance at 10M rows per table. Your future on-call self will thank you.
DeadTablePostmortem.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// io.thecodeforge — database tutorial
// Before: Single massive table (slow death)
CREATETABLEposts (
post_id BIGSERIALPRIMARYKEY,
user_id INTNOTNULL,
content TEXT,
created_at TIMESTAMPNOTNULLDEFAULTNOW()
);
CREATEINDEX idx_posts_user ONposts(user_id);
CREATEINDEX idx_posts_created ONposts(created_at);
// Query that kills performance
EXPLAINANALYZESELECT * FROM posts
WHERE created_at >= now() - INTERVAL'30 days'AND user_id = 512;
// SeqScan on posts — horrible
// After partitioning by date:
CREATETABLEposts_2024_01 (
CHECK (created_at >= '2024-01-01'AND created_at < '2024-02-01')
) INHERITS (posts);
CREATETABLEposts_2024_02 (
CHECK (created_at >= '2024-02-01'AND created_at < '2024-03-01')
) INHERITS (posts);
// Old partitions are just removed, no row-by-row delete
DROPTABLE posts_2023_01; -- Instant reclaim
Output
EXPLAIN ANALYZE output for single table:
Seq Scan on posts (cost=0.00..45000.00 rows=500000 width=120)
-> Index Scan using posts_2024_02_user_idx on posts_2024_02
Execution Time: 12.847 ms
Senior Shortcut:
Set up pg_partman to auto-create future partitions and auto-drop old ones. It's cron-based partition management that saves you from forgetting to create next month's partition during a holiday weekend.
Key Takeaway
A single table over 50M rows will kill your performance. Partition proactively when you hit 10M rows to avoid dead queries, bloated indexes, and marathon backups.
Real-World Examples — Where Partitioning Saves Your Weekend
Theory is cheap. Let's talk about the day your monitoring dashboard grinds to a halt because the events table has 500 million rows and counting. Without partitioning, every query is a full table scan. Your pager goes off. Your Saturday is ruined.
Range partitioning by timestamp is the fix. Split that events table by month. Queries against the last 30 days hit one partition — not 500 million rows. The WHERE created_at BETWEEN '2025-01-01' AND '2025-01-31' prunes everything else. Suddenly that dashboard loads in 200ms instead of timing out.
Hash partitioning solves the hot-spot problem. Imagine a user_sessions table where 10% of users generate 90% of writes. A hash on user_id spreads those hot rows across partitions. No single disk queue. No I/O bottleneck. Your write throughput triples without touching hardware.
List partitioning is for categorical isolation. Think regional data. orders with a region column: NA, EU, APAC. Each region gets its own partition. Compliance audits query only EU. Maintenance rebuilds only NA. You stop paying for operations you don't need.
RealWorldPartitioning.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// io.thecodeforge — database tutorial
CREATETABLEevents (
event_id BIGSERIAL,
created_at TIMESTAMPNOTNULL,
payload JSONB
) PARTITIONBYRANGE (created_at);
CREATETABLE events_2025_01 PARTITIONOF events
FORVALUESFROM ('2025-01-01') TO ('2025-02-01');
CREATETABLE events_2025_02 PARTITIONOF events
FORVALUESFROM ('2025-02-01') TO ('2025-03-01');
-- Query against last 30 days prunes everything elseSELECTcount(*) FROM events
WHERE created_at >= '2025-01-15';
Output
count
-------
84321
(1 row)
Senior Shortcut:
Always create a DEFAULT partition during migration. Production will throw a row with a future timestamp. Without it, the INSERT fails and your pager goes off.
Key Takeaway
Partitioning without a real workload pattern is premature optimization. Profile first, partition second.
Disadvantages — The Debt You Don't See Coming
Partitioning isn't free. You're trading query speed for maintenance complexity. Every new partition is a new object in the catalog. PostgreSQL's pg_dump gets slower. Backups balloon. You can't just DROP TABLE a partition — you have to detach it first. Miss that step and your cleanup cronjob fails silently.
Cross-partition queries are a performance trap. If your WHERE clause doesn't match the partition key, the planner scans every partition. That's worse than a full table scan because now you're opening dozens of file descriptors. The optimizer doesn't warn you. You find out when the query takes 30 seconds.
Schema changes become a nightmare. ALTER TABLE on a partitioned table locks every partition. A DROP COLUMN on a 50-partition table blocks writes for minutes. Your team deploys a migration at 2 PM. Everything stalls. The DBA hates you.
Hash partitioning looks clean until you need to rebalance. Add a node? The hash modulus changes. Now every row moves. That's a full data rewrite. In production. While customers are ordering. You don't rebalance hash partitions. You recreate them.
Partitioning solves real problems. But it introduces real debt. You must maintain the partition scheme, monitor partition bloat, and automate partition creation. Ignore that and you're back to square one — with extra steps.
PartitionDebt.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// io.thecodeforge — database tutorial
-- Cross-partition query: NO pruning, scans ALL partitionsSELECT * FROM events
WHERE payload->>'user_id' = '42';
-- No partition key in WHERE = disaster-- Schema change locks everythingALTERTABLE events DROPCOLUMN payload;
-- Blocks all writes across 12 partitions for 14 seconds-- Output from EXPLAIN (ANALYZE, BUFFERS)GatherWorkersPlanned: 2
-> ParallelAppend
-> SeqScan on events_2025_01
-> SeqScan on events_2025_02
-> SeqScan on events_2025_03
(12 partitions scanned — zero pruned)
Output
Planning Time: 3.456 ms
Execution Time: 28391.421 ms
Production Trap:
Never use hash partitioning on a key that will grow in cardinality. Adding a new partition requires modulus change. That's a full table rewrite. Use range by time instead — you can prep-create partitions.
Key Takeaway
Partitioning trades query speed for operational complexity. Budget time for partition maintenance or your team will pay the interest.
Partitioning in NoSQL Databases — Why It's Not Optional
NoSQL databases like Cassandra, MongoDB, and DynamoDB rely on partitioning to scale horizontally. Without it, you cannot distribute data across nodes. In Cassandra, partition keys determine data placement across the ring. A poor partition key choice creates hot nodes — one server handles all writes while others sit idle. MongoDB uses shard keys to split collections across shards. When the shard key lacks cardinality, data piles onto a single shard. NoSQL partitioning differs from SQL: you must design for partition-awareness from day one, not as an afterthought. The database does not automatically rebalance without cost. Understanding how your NoSQL system handles partition splits, token ranges, and node addition prevents production fires. Always test partition key choices with real workload patterns before deploying.
NoSQL-Partition-Key.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// io.thecodeforge — database tutorial
// Cassandra: bad partition key — all writes to one node
CREATETABLEorders (
order_id UUID,
user_id INT,
product TEXT,
PRIMARYKEY ((user_id)) // single partition per user
);
// Better: high-cardinality composite partition key
CREATETABLEorders_v2 (
order_id UUID,
user_id INT,
order_date DATE,
product TEXT,
PRIMARYKEY ((user_id, order_date), order_id)
);
// MongoDB: shard key with low cardinality fails
// Use hashed shard key for uniform distribution
sh.shardCollection("shop.orders", { "_id": "hashed" });
MongoDB's default _id shard key with monotonically increasing values creates write bottlenecks. Always test with hashed shard keys or high-cardinality fields.
Key Takeaway
Design partition keys for high cardinality and uniform distribution — one bad key burns your NoSQL cluster.
Machine Learning Pipelines — Why Partitioning Is Your Silent Bottleneck
Machine learning pipelines ingest, transform, and train on data. When your training dataset lives in a single database table, full-table scans kill performance. Partitioning by date or region lets your pipeline prune irrelevant partitions before loading data. A daily partitioning scheme means your feature engineering job scans only today's partition, not the entire history. For time-series ML models, range partitioning on timestamp is mandatory — it cuts ETL time by 90%. Without it, your training job stalls waiting on disk I/O. Partitioning also enables parallel loading: Spark or Dask workers read separate partitions simultaneously. The hidden cost is partition management — stale partitions must be archived or dropped to keep query speed high. Integrate partition-aware queries into your pipeline logic. Do not assume the database handles this automatically.
ML-Partition-Pruning.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// io.thecodeforge — database tutorial
-- Create range-partitioned table for ML training dataCREATETABLEtraining_events (
event_ts TIMESTAMP,
user_id INT,
feature_vector JSONB
) PARTITIONBYRANGE (event_ts);
CREATETABLE training_2025_01 PARTITIONOF training_events
FORVALUESFROM ('2025-01-01') TO ('2025-02-01');
-- ML pipeline query — scans only one partitionEXPLAINANALYZESELECT * FROM training_events
WHERE event_ts >= '2025-01-15'AND event_ts < '2025-01-16';
Output
// Partition pruning — scans 1 partition instead of 12
// Query time drops from 45s to 1.2s for daily feature extraction
Production Trap:
Never run SELECT * on an unpartitioned historical table in a pipeline. A single full scan can starve the database of memory and crash training jobs.
Key Takeaway
Partition your ML source tables by time — it cuts data loading by 90% and lets you parallelize feature extraction.
● Production incidentPOST-MORTEMseverity: high
The Missing Default Partition That Broke Batch Insert
Symptom
A batch insert for customer data from a newly acquired company in a country not previously in the list partition started failing with ORA-14400 (inserted partition key does not map to any partition). The pipeline retried indefinitely, causing data lag.
Assumption
The team assumed list partition would reject unknown values with a clear error. They thought the ETL would be stopped and fixed before data loss.
Root cause
The list partition was defined for specific country codes (US, UK, DE, FR). The new data came from 'JP' (Japan). No DEFAULT partition existed. The database rejected the rows, and the ETL's error handling wasn't configured to catch this specific error and route to a fallback.
Fix
Added a DEFAULT partition to catch any unmapped values. Then set up an alert on rows entering the DEFAULT partition so the team could add proper partitions proactively.
Key lesson
Always create a DEFAULT partition for LIST and RANGE partitioning to catch unexpected values.
Do not rely on application validation alone — the database should have a safety net.
Monitor DEFAULT partition size as a signal for missing partition definitions.
Production debug guideSymptom → Action for common partitioning failures4 entries
Symptom · 01
Query on partitioned table takes as long as full table scan
→
Fix
Check if WHERE clause includes the partition key. Run EXPLAIN PLAN to see if partition pruning occurred (e.g., 'PARTITION RANGE SINGLE' vs 'PARTITION RANGE ALL'). Ensure index on partition key.
Symptom · 02
Insert fails with partition key mapping error
→
Fix
Verify the inserted value matches defined partition boundaries. Check for missing DEFAULT partition. Use SELECT * FROM ALL_TAB_PARTITIONS to see existing partitions.
Symptom · 03
Query performance degrades over time despite partitioning
→
Fix
Check for partition skew — if one partition grows much larger than others (e.g., 'hot' partition in range partitioning). Use SELECT partition_name, num_rows FROM ALL_TAB_PARTITIONS to compare sizes. Consider splitting the hot partition or rebalancing hash partitions.
Symptom · 04
Maintenance (VACUUM, ANALYZE) takes too long on partitioned table
→
Fix
Perform maintenance on each partition individually instead of the parent table. Use concurrent VACUUM on child tables. Set autovacuum tuning per partition size.
★ Partitioning Quick Debug Cheat SheetImmediate commands to diagnose partitioning issues in PostgreSQL
Query not using partition pruning−
Immediate action
Run EXPLAIN (ANALYZE, BUFFERS) on the query.
Commands
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE order_date >= '2025-01-01';
Check if 'Subplans Removed' shows 0 — indicates no pruning.
Fix now
Add a WHERE clause on the partition key (order_date) with a sargable condition. Ensure index exists on partition key.
Insert into partitioned table fails+
Immediate action
Check the insertion value against existing partitions.
Commands
SELECT tableoid::regclass, count(*) FROM parent_table GROUP BY tableoid;
SELECT * FROM information_schema.table_partitions WHERE table_catalog = 'mydb' AND table_name = 'parent_table';
Fix now
Add DEFAULT partition: CREATE TABLE default_part PARTITION OF parent_table DEFAULT;
Partition size imbalance+
Immediate action
Identify size distribution across partitions.
Commands
SELECT relname, pg_total_relation_size(oid) FROM pg_class WHERE relkind = 'r' AND relname LIKE 'orders_%' ORDER BY pg_total_relation_size DESC;
For hash partitions, check hash distribution: SELECT (hashvalue % modulus) FROM ...
Fix now
Rebalance hash partitions by rebuilding with different modulus, or split an overgrown range partition.
Type
Split By
Best For
Weakness
Range
Value ranges (date, ID)
Time-series, historical archival
Hot partition if writes cluster at range boundary
Hash
Hash of column value
Even distribution, high-concurrency writes
Range-based queries force a full scan of all partitions
List
Discrete values (country, status)
Regional data, status-based isolation
Unknown/New values fail unless a 'DEFAULT' partition exists
Composite
Combination of above (e.g. Range then Hash)
Massive multi-tenant SaaS platforms
Significant administrative and query planning overhead
Key takeaways
1
Partitioning is transparent
applications query the parent table and the database routes to partitions.
2
Partition pruning
queries with WHERE on the partition key skip irrelevant partitions entirely.
3
Dropping a partition is instantaneous
the best way to archive old data at scale.
4
Partitioning differs from sharding
partitions live on one database instance; sharding splits across multiple.
5
Always create an index on the partition key
otherwise partition pruning cannot work.
6
Always add a DEFAULT partition for list and range partitioning to catch unexpected values.
7
Hash partitioning solves write hotspots but kills range-query pruning.
Common mistakes to avoid
4 patterns
×
Missing DEFAULT partition in LIST or RANGE partitioning
Symptom
INSERT fails with ORA-14400 (or equivalent) when a value doesn't match any partition boundary. ETL pipelines halt.
Fix
Add a DEFAULT partition for LIST, and for RANGE ensure ranges cover all expected values or add a DEFAULT partition (PostgreSQL 12+ supports DEFAULT for RANGE). Use CREATE TABLE ... PARTITION OF parent DEFAULT;.
×
Using a non-sargable expression on partition key in WHERE clause
Symptom
Queries scan all partitions despite seemingly correct filtering, causing severe performance degradation.
Fix
Always use the raw column with direct comparisons. Avoid wrapping in functions like DATE_TRUNC, TO_CHAR, or CAST. Rewrite queries to use range conditions like col >= '2025-01-01' AND col < '2025-02-01'.
×
Not creating index on partition key
Symptom
Partition pruning may still occur but full scans of the relevant partition are slow without an index.
Fix
Create an index on the partition key (either on parent or on each child). PostgreSQL 12+ allows creating index on parent which propagates to children.
×
Assuming partitioning automatically improves all queries
Symptom
Queries without WHERE on partition key become slower due to overhead of managing multiple partition objects.
Fix
Ensure all critical queries include a filter on the partition key. If not possible, consider an alternative index strategy or a different partition scheme.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01JUNIOR
Explain the concept of 'Partition Pruning'. How does the query optimizer...
Q02SENIOR
You have a table that stores logs for 5 years. Deleting old logs is taki...
Q03SENIOR
LeetCode Standard: Given a massive dataset of transactions, how would yo...
Q04SENIOR
When would you choose Hash partitioning over Range partitioning, and wha...
Q05SENIOR
Can a primary key in a partitioned table be any column, or does it have ...
Q01 of 05JUNIOR
Explain the concept of 'Partition Pruning'. How does the query optimizer use it to reduce I/O?
ANSWER
Partition pruning is the query optimizer's ability to eliminate entire partitions from the scan when the WHERE clause includes the partition key. The optimizer compares the query predicates against the metadata of each partition (bounds for range, list values, or hash ranges). Partitions that cannot contain matching rows are not scanned at all. This reduces I/O and CPU because fewer pages are read from disk. For example, a query with WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31' on a table range-partitioned by month will only scan the partition for January 2025.
Q02 of 05SENIOR
You have a table that stores logs for 5 years. Deleting old logs is taking hours and locking the table. How would you redesign this using partitioning?
ANSWER
I would implement range partitioning on a date or timestamp column (e.g., event_time) with monthly or quarterly partitions. Then, instead of DELETE, I would detach or drop the oldest partition when data reaches 5 years old. For example, in PostgreSQL: ALTER TABLE logs DETACH PARTITION logs_2020_q1; which is a metadata-only operation that completes instantly. The detached partition can then be dropped or archived offline. This eliminates the long-running DELETE that accumulates vacuum debt and blocks concurrent queries.
Q03 of 05SENIOR
LeetCode Standard: Given a massive dataset of transactions, how would you partition it to optimize for queries that only look at the 'current month' vs. 'user history'?
ANSWER
For queries confined to current month, a range partition on transaction_date by month would prune to a single partition. For user history (queries needing all months for a specific user), range partitioning would scan all partitions unless the partition key is something like user_id. A composite partitioning strategy works: range on date for time-based pruning, and hash sub-partitioning on user_id for even distribution. This way, queries on current month scan only one range partition, and then within that partition, the hash on user_id can further narrow the scan if the query includes user_id. Alternatively, you could use a list of months but that's less flexible.
Q04 of 05SENIOR
When would you choose Hash partitioning over Range partitioning, and what are the trade-offs for range-based queries?
ANSWER
Choose hash partitioning when the workload is write-heavy with no natural range key. For example, a user_sessions table that receives 100K inserts per second with random user_id values. Hash spreads inserts evenly across partitions, avoiding the 'hot partition' problem that range would create (all writes to the latest range). The trade-off: any query that filters by a range on the partition key (e.g., last_seen BETWEEN ...) will scan all partitions because hash doesn't support pruning. You must accept that all queries must include the hash key with an equality predicate for optimal performance.
Q05 of 05SENIOR
Can a primary key in a partitioned table be any column, or does it have to include the partition key? Why?
ANSWER
In most databases (PostgreSQL, MySQL), the primary key must include the partition key. For example, in PostgreSQL, a unique constraint (including primary key) on a partitioned table must include the partition key columns. The reason is that the database needs to enforce uniqueness globally across all partitions. Without including the partition key, the database would have to check every partition to guarantee uniqueness, which is inefficient. By including the partition key, uniqueness can be enforced within a single partition (since other partitions cannot have the same partition key value). Some databases like MSSQL and Oracle have different rules but generally include the partition key.
01
Explain the concept of 'Partition Pruning'. How does the query optimizer use it to reduce I/O?
JUNIOR
02
You have a table that stores logs for 5 years. Deleting old logs is taking hours and locking the table. How would you redesign this using partitioning?
SENIOR
03
LeetCode Standard: Given a massive dataset of transactions, how would you partition it to optimize for queries that only look at the 'current month' vs. 'user history'?
SENIOR
04
When would you choose Hash partitioning over Range partitioning, and what are the trade-offs for range-based queries?
SENIOR
05
Can a primary key in a partitioned table be any column, or does it have to include the partition key? Why?
SENIOR
FAQ · 6 QUESTIONS
Frequently Asked Questions
01
What is the difference between partitioning and sharding?
Partitioning is a local operation: you split a table into pieces, but they all still live on the same physical server and share the same CPU/RAM. Sharding is a horizontal scaling strategy where data is distributed across multiple independent servers. Sharding increases total capacity, while partitioning primarily improves manageability and specific query performance.
Was this helpful?
02
Does partitioning improve query performance automatically?
No. Partitioning only improves performance if the query optimizer can use 'Partition Pruning'. If your query doesn't include the partition key in the WHERE clause, the database has to check every single partition, which is actually slower than querying a single large table due to the overhead of managing multiple file descriptors.
Was this helpful?
03
Can I partition an existing table with millions of rows?
Not directly. In most systems (like PostgreSQL or MySQL), you must create a new partitioned table, migrate the data using a script or 'INSERT INTO ... SELECT', and then swap the table names. This usually requires a maintenance window or a carefully managed background migration.
Was this helpful?
04
What is a 'Default Partition' and why do I need it?
A default partition acts as a 'catch-all'. If you are using List or Range partitioning and a row comes in that doesn't fit any defined range, the database will throw an error unless a default partition exists. It’s a safety net for unexpected data values.
Was this helpful?
05
How do I choose the partition key?
The partition key should be the column most commonly used in WHERE filters for critical queries. For time-series, that's usually a date or timestamp. For evenly distributed writes, a high-cardinality column like user_id. Avoid low-cardinality columns like gender; they produce too few partitions and lead to data skew.
Was this helpful?
06
Can I alter a partition after creation?
Yes, but operations are limited. You can add new partitions, detach, drop, or attach partitions. Changing the partition scheme of an existing table requires rebuilding the table. Some databases support splitting a partition (e.g., RANGE split) but it's database-specific. Online DDL tools can help reduce downtime.