One table stores all subtypes with a type discriminator column
Shared columns are the sweet spot; subtype-specific columns create NULL density
High NULL density (>35%) breaks index selectivity and forces sequential scans
STI excels when cross-type queries are frequent and subtype overlap >60%
The real danger: ORMs hide the NULLs until slow queries hit production
Migrate to Class Table Inheritance when subtype columns outnumber shared ones
Plain-English First
Imagine a hotel that uses one massive guest registry for every type of booking — standard rooms, suites, conference halls, and parking spots — but most columns on each row are left blank because a parking spot doesn't have a 'pillow preference.' That's Single Table Inheritance. You're storing fundamentally different things in one table by giving every possible attribute its own column, then just leaving the irrelevant ones empty. It works brilliantly when your 'bookings' are 90% the same. It becomes a disaster when your parking spot needs 40 columns that a hotel room would never touch.
A fintech startup I consulted for had a products table with 94 columns. Forty-six of them were NULL on every single row for two of their three product types. Their most critical report query — the one the CEO ran every morning — was doing a full sequential scan because the query planner had given up trying to use indexes on columns that were 70% NULL. That query took 11 seconds. The table had 8 million rows and was growing. That's what Single Table Inheritance looks like when nobody made a conscious decision to use it — it just… happened.
Single Table Inheritance (STI) is a pattern for storing a class hierarchy in one database table. Instead of splitting subtypes across multiple tables, you jam every attribute of every subtype into one table and add a type discriminator column to tell rows apart. Rails made it famous. ORMs love it because it maps cleanly to inheritance hierarchies in code. The problem is that most teams adopt it because their ORM made it the path of least resistance, not because they thought through what their data actually looks like at 10 million rows.
By the end of this article you'll be able to look at any inheritance problem in your schema and make a deliberate call: STI, Class Table Inheritance, or Concrete Table Inheritance — with specific criteria for each choice. You'll know exactly which query patterns STI accelerates, which ones it quietly poisons, and how to migrate away from it before it buries you.
What STI Actually Does to Your Table (And Why ORMs Hide the Ugly Truth)
Before you can decide whether STI is right for your schema, you need to see what it physically produces in the database — not in your ORM's pretty model layer.
The core mechanic is simple: one table, one type column, every column from every subtype lives side by side. When you write a CreditCardPayment row, the bank_transfer_reference column sits there as NULL. When you write a BankTransferPayment, the card_last_four column is NULL. The database has no idea these columns are meaningless for certain types — it just stores the NULLs faithfully and wastes the space.
The ORM makes this invisible. ActiveRecord, Hibernate, Entity Framework — they all filter by the type column automatically and present you clean model objects. You never see the NULLs. You never see the 60-column table with 35 sparse columns. That abstraction is exactly why teams walk into the trap. The schema is rotten but the code looks clean.
What actually goes wrong in production: NULL columns destroy index selectivity. A B-tree index on card_last_four where 60% of rows are NULL is nearly useless — the planner often skips it entirely and scans. Partial indexes fix this, but you have to know to add them, and most teams don't until the slow query log starts screaming.
PaymentSTISchema.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
-- io.thecodeforge — Database tutorial-- This is the STI table a typical payments service ends up with.-- Notice how many columns are NULL for specific payment types.-- This isn't hypothetical — I've seen this exact shape in three production codebases.CREATETABLEpayments (
id BIGSERIALPRIMARYKEY,
-- The discriminator column: the only thing STI *requires*-- Your ORM writes this automatically. Don't forget to index it.
payment_type VARCHAR(50) NOTNULL,
-- Columns shared by ALL payment types — this is STI's sweet spot.-- High overlap here is the green flag for using STI.
amount_cents INTEGERNOTNULL,
currency_code CHAR(3) NOTNULL,
status VARCHAR(20) NOTNULLDEFAULT'pending',
created_at TIMESTAMPTZNOTNULLDEFAULTNOW(),
updated_at TIMESTAMPTZNOTNULLDEFAULTNOW(),
customer_id BIGINTNOTNULLREFERENCEScustomers(id),
-- CreditCardPayment-only columns.-- NULL for every BankTransfer, CryptoPayment, and WireTransfer row.
card_last_four CHAR(4),
card_network VARCHAR(20), -- 'visa', 'mastercard', etc.
card_expiry_month SMALLINT,
card_expiry_year SMALLINT,
card_fingerprint VARCHAR(64), -- tokenised card identifier
stripe_charge_id VARCHAR(64),
-- BankTransferPayment-only columns.-- NULL for every CreditCard, Crypto, and Wire row.
bank_account_last_four CHAR(4),
bank_routing_number VARCHAR(20),
ach_trace_number VARCHAR(15),
bank_name VARCHAR(100),
-- CryptoPayment-only columns.-- NULL for everything else.
blockchain_network VARCHAR(30), -- 'ethereum', 'bitcoin', etc.
wallet_address VARCHAR(100),
transaction_hash VARCHAR(128),
confirmation_count INTEGER,
-- WireTransfer-only columns.
swift_code CHAR(11),
iban VARCHAR(34),
beneficiary_bank_name VARCHAR(100),
correspondent_bank_bic CHAR(11)
);
-- The discriminator index is non-negotiable.-- Without it, every polymorphic query scans the full table.CREATEINDEX idx_payments_type
ONpayments (payment_type);
-- Composite index for the most common query pattern:-- "show me all pending credit card payments for this customer"CREATEINDEX idx_payments_customer_type_status
ONpayments (customer_id, payment_type, status);
-- Partial index — this is the move that saves you when columns are sparse.-- Standard index on card_fingerprint would be ~60% NULL, near-useless.-- This index ONLY indexes rows where the column is actually populated.CREATEINDEX idx_payments_card_fingerprint
ONpayments (card_fingerprint)
WHERE payment_type = 'credit_card'AND card_fingerprint ISNOTNULL;
-- Same pattern for crypto transaction lookups.CREATEINDEX idx_payments_txn_hash
ONpayments (transaction_hash)
WHERE payment_type = 'crypto'AND transaction_hash ISNOTNULL;
-- Now let's see what the data actually looks like at rest.-- Run this after inserting a mix of payment types and you'll-- immediately see the NULL sprawl that STI produces.INSERTINTOpayments (
payment_type, amount_cents, currency_code, status, customer_id,
card_last_four, card_network, card_expiry_month, card_expiry_year,
card_fingerprint, stripe_charge_id
) VALUES (
'credit_card', 4999, 'USD', 'captured', 1001,
'4242', 'visa', 12, 2027,
'fp_abc123xyz', 'ch_3OqExample'
);
INSERTINTOpayments (
payment_type, amount_cents, currency_code, status, customer_id,
bank_account_last_four, bank_routing_number, ach_trace_number, bank_name
) VALUES (
'bank_transfer', 150000, 'USD', 'pending', 1002,
'6789', '021000021', '20240115123456789', 'JPMorgan Chase'
);
INSERTINTOpayments (
payment_type, amount_cents, currency_code, status, customer_id,
blockchain_network, wallet_address, transaction_hash, confirmation_count
) VALUES (
'crypto', 89900, 'USD', 'confirming', 1003,
'ethereum',
'0xAbC123dEf456GhI789jKl012MnO345PqR678StU',
'0x9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08',
3
);
-- This query exposes the NULL sprawl visually.-- In a real schema audit, this is the first thing I run.SELECT
payment_type,
COUNT(*) AS total_rows,
COUNT(card_last_four) AS credit_card_col_populated,
COUNT(bank_account_last_four) AS bank_col_populated,
COUNT(transaction_hash) AS crypto_col_populated,
COUNT(swift_code) AS wire_col_populated,
-- This ratio is your STI health metric.-- If it drops below 0.4 for any subtype, you should reconsider STI.ROUND(
COUNT(card_last_four)::NUMERIC / NULLIF(COUNT(*), 0),
2
) AS credit_card_col_fill_ratio
FROM payments
GROUPBY payment_type
ORDERBY total_rows DESC;
Run the fill-ratio query above on any STI table you inherit. If any subtype-specific column has a fill ratio below 0.40 across the full table, your indexes on those columns are working against you. The PostgreSQL query planner uses a NULL fraction estimate from pg_stats — once it exceeds ~35%, it often ignores the index entirely. Fix it with partial indexes using WHERE type = 'x' AND column IS NOT NULL, exactly as shown above.
Production Insight
Indexes on columns with >35% NULL density are worse than useless—they waste disk space and confuse the planner.
Always measure null_frac during schema reviews; if it's high, drop the index and create a partial one matching the subtype.
The fix: query pg_stats before every major release.
Key Takeaway
NULL density is the silent killer of index performance.
Measure it before you trust any index on an STI table.
Partial indexes are your only defense—use them or expect sequential scans.
The Three Criteria That Tell You STI Is the Right Call
Stop deciding based on whether your ORM supports STI. Decide based on your actual data shape. There are three concrete tests. Pass all three and STI is a solid choice. Fail any one and you're building technical debt.
Test 1: The overlap ratio. Count the columns shared by ALL subtypes. Divide by total columns in the table. If that ratio is below 0.6, you're going to end up with a sparse, NULL-heavy table that hurts performance and confuses every developer who opens it cold. The payments example above barely passes — shared columns are amount_cents, currency_code, status, created_at, updated_at, customer_id — 6 out of roughly 25. That's 0.24. That table shouldn't be STI. It got there by accident.
Test 2: Are subtypes ever queried together? STI's genuine superpower is polymorphic queries — 'give me all payments for this customer regardless of type,' or 'show me everything in the queue sorted by created_at.' If you never need to query across subtypes in a single result set, you get none of STI's benefits and all of its costs. Separate tables would be cleaner.
Test 3: Do subtypes have meaningfully different behaviour, or just slightly different data? If your AdminUser and RegularUser subtypes have the same 12 columns and differ only in a role field, that's not an inheritance problem — that's just a column. Don't reach for STI when a type or role column in a single clean table solves it without the ORM ceremony.
ContentItemSTI.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
-- io.thecodeforge — Database tutorial-- Here's a schema where STI EARNS its place.-- A CMS content system: Articles, Videos, and Podcasts.-- The overlap is high (~75%), the polymorphic query pattern is constant,-- and the subtype-specific columns are few.CREATETABLEcontent_items (
id BIGSERIALPRIMARYKEY,
-- Discriminator. Always VARCHAR, never an integer enum ---- string values make the data self-documenting in the DB.
content_type VARCHAR(30) NOTNULL,
-- ~~~ SHARED COLUMNS: 8 out of 11 total = 73% overlap ~~~-- This is the green-flag ratio. STI makes sense here.
title VARCHAR(255) NOTNULL,
slug VARCHAR(255) NOTNULLUNIQUE,
author_id BIGINTNOTNULLREFERENCESauthors(id),
published_at TIMESTAMPTZ,
status VARCHAR(20) NOTNULLDEFAULT'draft',
excerpt TEXT,
thumbnail_url VARCHAR(512),
view_count INTEGERNOTNULLDEFAULT0,
-- ~~~ SUBTYPE-SPECIFIC: only 3 columns, all sparse ~~~-- Article-only: NULL for videos and podcasts.
reading_time_minutes SMALLINT,
-- Video-only: NULL for articles and podcasts.
video_duration_seconds INTEGER,
-- Podcast-only: NULL for articles and videos.
audio_duration_seconds INTEGER
);
-- Composite index for the editorial dashboard query:-- "show me all published content, newest first, for this author"CREATEINDEX idx_content_author_status_published
ONcontent_items (author_id, status, published_at DESC)
WHERE status = 'published';
-- This is the polymorphic query STI was born for.-- Try doing this cleanly with three separate tables — you need a UNION.-- With STI it's a single fast index scan.SELECT
id,
content_type,
title,
slug,
published_at,
view_count
FROM content_items
WHERE
author_id = 42AND status = 'published'ORDERBY published_at DESCLIMIT20;
-- Now here's why the polymorphic query advantage is real.-- Compare the STI approach above to what you'd need with separate tables:-- WITH SEPARATE TABLES (Class Table Inheritance) — same query:SELECT id, 'article'AS content_type, title, slug, published_at, view_count
FROM articles
WHERE author_id = 42AND status = 'published'UNIONALLSELECT id, 'video', title, slug, published_at, view_count
FROM videos
WHERE author_id = 42AND status = 'published'UNIONALLSELECT id, 'podcast', title, slug, published_at, view_count
FROM podcasts
WHERE author_id = 42AND status = 'published'ORDERBY published_at DESCLIMIT20;
-- Three index scans, a sort across merged results, and a LIMIT applied last.-- At 5M rows across three tables, this hurts. The STI version does not.-- Querying a specific subtype is equally clean with STI.-- The ORM does this automatically — here's the raw SQL it generates:SELECT
id,
title,
slug,
video_duration_seconds
FROM content_items
WHERE
content_type = 'video'AND status = 'published'ORDERBY published_at DESC;
Output
-- Polymorphic query result (sample):
id | content_type | title | slug | published_at | view_count
198 | gRPC vs REST in 2024 | grpc-vs-rest-2024| 2847
(1 row)
Senior Shortcut: The 60% Rule
Count your shared columns, divide by total columns in the proposed STI table. Below 0.6, don't use STI — use Class Table Inheritance (shared columns in a base table, subtype-specific columns in child tables joined by FK). The CMS example above sits at 73% and it's a clean fit. The payments example sits at 24% and it's a mess. That number tells you everything.
Production Insight
The 60% rule is a rule of thumb, but real pain starts at 50%—by then, every new subtype drags down the signal.
I've seen teams ignore the ratio until the table had 15 subtypes and 100 columns; the slow query log became unreadable.
Automate the calculation in your CI pipeline: fail the build if the overlap ratio drops below 0.55.
Key Takeaway
Overlap ratio is your early warning system.
Below 60%? You're accumulating debt with every new subtype.
Stop adding columns to the same table—split out the divergent ones.
Where STI Breaks in Production and How It Does It Quietly
STI doesn't fail loudly. It doesn't throw an error. It degrades — slowly, then suddenly. Here's the exact failure sequence I've watched happen twice.
A team starts with two subtypes, 70% overlap — perfect for STI. Eighteen months later they've added four more subtypes because the ORM made it so easy. Each new subtype adds 8-12 columns. The overlap ratio has dropped from 70% to 31%. Nobody noticed because the app still works. Queries are just… slower. The slow query log starts filling up. Someone adds a composite index, it helps a bit, the team moves on. Then the table hits 50 million rows and the ALTER TABLE to add a column for the new subtype everyone just requested locks the table for 4 minutes during business hours.
The ALTER TABLE lock is the hard wall. In PostgreSQL, adding a column with a DEFAULT value before PG11 rewrote every row — a full table rewrite. Even in PG11+ where nullable columns without defaults are instant, adding a NOT NULL column or one with a non-trivial default still triggers a full rewrite. On a 50M-row STI table, that's a production outage. In MySQL, ALTER TABLE takes an exclusive metadata lock regardless. I watched a team sit through a 7-minute write outage on their vehicles STI table because they needed to add electric_motor_type for a new EV subtype. Seven minutes. On a Monday morning.
The other quiet killer is schema coupling. Every subtype's columns live in the same migration file context. Adding a column for one subtype forces a migration that touches the table all subtypes use. Your Article team's migration can block a deploy for your Video team.
STIBreakingPoint.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
-- io.thecodeforge — Database tutorial-- Diagnosing a degrading STI table in PostgreSQL.-- Run this against any STI table you've inherited to get a full health report.-- Step 1: Measure actual NULL density per column.-- This tells you exactly which columns are poisoning your index selectivity.SELECT
a.attname AS column_name,
s.null_frac AS null_fraction,
s.n_distinct,
-- A null_frac above 0.35 on an indexed column is a red flag.-- The planner may skip the index entirely.CASEWHEN s.null_frac > 0.35THEN'HIGH NULL DENSITY — check for partial index'WHEN s.null_frac > 0.15THEN'MODERATE — monitor'ELSE'OK'ENDAS index_risk_assessment
FROM
pg_attribute a
JOIN pg_stats s ON s.attname = a.attname
AND s.tablename = 'content_items'JOIN pg_class c ON c.oid = a.attrelid
WHERE
c.relname = 'content_items'AND a.attnum > 0-- exclude system columnsANDNOT a.attisdropped -- exclude dropped columnsORDERBY s.null_frac DESC;
-- Step 2: Measure table bloat from NULL storage.-- Postgres stores NULLs efficiently via a null bitmap, but this tells you-- your ACTUAL average row width vs what it would be with separate tables.SELECT
relname AS table_name,
pg_size_pretty(pg_total_relation_size(oid)) AS total_size,
pg_size_pretty(pg_relation_size(oid)) AS table_size,
pg_size_pretty(
pg_total_relation_size(oid)
- pg_relation_size(oid)
) AS index_size,
n_live_tup AS live_rows,
-- Bytes per row: high value on a sparse STI table = storage waste signalpg_relation_size(oid) / NULLIF(n_live_tup, 0) AS bytes_per_row
FROM
pg_stat_user_tables
WHERE
relname = 'content_items';
-- Step 3: The migration risk check.-- Before running ANY alter on a large STI table, estimate the blast radius.-- This shows you row count and current table size so you can calculate-- whether to use pg_repack, a shadow table migration, or a timed maintenance window.SELECT
content_type,
COUNT(*) AS row_count,
-- Proportion of the table this subtype owns.-- Useful for estimating migration lock duration.ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct_of_table
FROM content_items
GROUPBY content_type
ORDERBY row_count DESC;
-- Step 4: The escape hatch — migrating one subtype OUT of STI-- to its own table without downtime.-- Pattern: create new table, backfill, switch writes, drop STI columns.-- 4a. Create the extracted table — clean, no NULLs, no dead columns.CREATETABLEvideo_content (
id BIGINTPRIMARYKEYREFERENCEScontent_items(id),
video_duration_seconds INTEGERNOTNULL,
-- Add video-specific columns here freely now that they're isolated.
resolution VARCHAR(10),
codec VARCHAR(20),
cdn_asset_key VARCHAR(256)
);
-- 4b. Backfill from the STI table — run this in batches to avoid lock pressure.-- Don't do INSERT ... SELECT on 10M rows in one transaction. Use a loop.INSERTINTOvideo_content (id, video_duration_seconds)
SELECT
id,
COALESCE(video_duration_seconds, 0) -- handle any unexpected NULLs cleanlyFROM content_items
WHERE
content_type = 'video'AND video_duration_seconds ISNOTNULL;
-- 4c. Verify the backfill before you cut over.SELECT
(
SELECTCOUNT(*) FROM content_items WHERE content_type = 'video'
) AS sti_video_count,
(
SELECTCOUNT(*) FROM video_content
) AS extracted_video_count;
-- These two numbers must match before you touch application code.
Output
-- Step 1: NULL density report (example output for a degraded STI table)
Never Do This: ALTER TABLE on a Live STI Table Over 5M Rows
Adding a column with a DEFAULT to a PostgreSQL STI table over 5 million rows without a plan will lock your table for minutes. Use pg_repack for online table rewrites, or add the column as nullable first, backfill in batches using a WHERE id BETWEEN x AND y loop, then add the NOT NULL constraint separately with ALTER TABLE ... ALTER COLUMN ... SET NOT NULL (PG12+ validates without a full lock if you use SET NOT NULL after a CHECK constraint). The symptom you'll see without this: ERROR: deadlock detected across every write to that table while the migration runs.
Production Insight
The ALTER TABLE lock is the event horizon of STI—once you cross that threshold, every future schema change is painful.
I've seen teams spend weeks planning a simple column addition because the table was too big to alter online.
The real lesson: track table size vs subtype count; when the blast radius exceeds your maintenance window, it's time to split.
Key Takeaway
STI failures are quiet until they're not.
Monitor NULL density, overlap ratio, and table size as leading indicators.
If a column ALTER takes longer than 30 seconds on a test load, plan your migration now.
STI vs. Class Table Inheritance vs. Concrete Table: Choosing Without Regret
Once you've seen STI break, the temptation is to swear it off entirely. That's the wrong lesson. The real lesson is that inheritance mapping patterns are tools with specific jobs, not philosophies.
Class Table Inheritance (CTI) keeps shared columns in a base table and puts subtype-specific columns in child tables linked by primary key FK. It's normalised, NULL-free, and scales elegantly — but every polymorphic query pays a JOIN penalty. If you're querying across subtypes constantly, those JOINs add up. A feed that mixes content types, hitting CTI tables at 100 req/s with 3-table JOINs, will saturate your DB connection pool before an equivalent STI setup does.
Concrete Table Inheritance (CTI2 — yes, two different things share 'CTI' as an abbreviation, which is peak database naming) duplicates shared columns into fully independent tables per subtype. Zero JOINs, zero NULLs, zero polymorphic queries. You can't easily ask 'give me all items for this user regardless of type' without a UNION. It's the right call when your subtypes have divergent lifecycles — different retention policies, different archiving schedules, different sharding strategies.
The decision tree is actually short: high overlap + frequent cross-type queries = STI. High overlap + rare cross-type queries or large subtype-specific column sets = Class Table Inheritance. Subtypes that are fundamentally different things that happen to share a name = Concrete Table Inheritance. If you're still not sure, ask yourself: 'Would a DBA opening this table cold understand the relationship between these columns?' If the answer is no, STI is making your schema harder to reason about, not easier.
ClassTableInheritanceCMS.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
-- io.thecodeforge — Database tutorial-- Class Table Inheritance (CTI) — the right pattern when STI's-- NULL density is too high but you still need polymorphic queries.---- Same CMS domain as earlier, but let's say the video subtype has grown-- to 25 unique columns (transcoding metadata, CDN configs, chapters, etc.)-- That kills STI's overlap ratio. CTI is the correct move.-- Base table: ONLY the truly shared columns.-- Every subtype has a corresponding row here.CREATETABLEcontent_items_base (
id BIGSERIALPRIMARYKEY,
content_type VARCHAR(30) NOTNULL,
title VARCHAR(255) NOTNULL,
slug VARCHAR(255) NOTNULLUNIQUE,
author_id BIGINTNOTNULLREFERENCESauthors(id),
published_at TIMESTAMPTZ,
status VARCHAR(20) NOTNULLDEFAULT'draft',
excerpt TEXT,
thumbnail_url VARCHAR(512),
view_count INTEGERNOTNULLDEFAULT0,
created_at TIMESTAMPTZNOTNULLDEFAULTNOW()
);
-- Article subtype table: only article-specific columns.-- PK = FK to base table — one-to-one relationship enforced at DB level.CREATETABLEarticles (
id BIGINTPRIMARYKEYREFERENCEScontent_items_base(id)
ONDELETECASCADE,
body_html TEXTNOTNULL,
reading_time_minutes SMALLINT,
word_count INTEGER,
seo_meta_description VARCHAR(160)
);
-- Video subtype: now we can have 25 columns here without polluting the base table.CREATETABLEvideos (
id BIGINTPRIMARYKEYREFERENCEScontent_items_base(id)
ONDELETECASCADE,
video_duration_seconds INTEGERNOTNULL,
resolution VARCHAR(10),
codec VARCHAR(20),
cdn_asset_key VARCHAR(256) NOTNULL,
transcoding_status VARCHAR(20) NOTNULLDEFAULT'pending',
chapter_count SMALLINT,
has_captions BOOLEANNOTNULLDEFAULTFALSE,
-- ... 17 more columns that would have DESTROYED an STI table's NULL ratio
storage_bytes BIGINT
);
-- Podcast subtype.CREATETABLEpodcasts (
id BIGINTPRIMARYKEYREFERENCEScontent_items_base(id)
ONDELETECASCADE,
audio_duration_seconds INTEGERNOTNULL,
episode_number SMALLINT,
season_number SMALLINT,
audio_cdn_key VARCHAR(256) NOTNULL,
transcript_url VARCHAR(512)
);
-- Index on base table for polymorphic queries.CREATEINDEX idx_content_base_author_status
ONcontent_items_base (author_id, status, published_at DESC)
WHERE status = 'published';
-- Polymorphic query in CTI — note the LEFT JOINs.-- This is the trade-off: it's two lines longer than the STI version-- and touches 4 tables instead of 1. But zero NULLs, clean schema.SELECT
b.id,
b.content_type,
b.title,
b.slug,
b.published_at,
b.view_count,
-- Coalesce across subtypes to get duration regardless of type.COALESCE(
v.video_duration_seconds,
p.audio_duration_seconds
) AS duration_seconds,
a.reading_time_minutes
FROM content_items_base b
LEFTJOIN articles a ON a.id = b.id
LEFTJOIN videos v ON v.id = b.id
LEFTJOIN podcasts p ON p.id = b.id
WHERE
b.author_id = 42AND b.status = 'published'ORDERBY b.published_at DESCLIMIT20;
-- Subtype-specific query in CTI — INNER JOIN, not LEFT JOIN.-- Slightly more explicit than STI but the video table is CLEAN.-- No NULLs from article or podcast columns.SELECT
b.id,
b.title,
v.video_duration_seconds,
v.transcoding_status,
v.has_captions,
v.storage_bytes
FROM content_items_base b
INNERJOIN videos v ON v.id = b.id
WHERE
b.status = 'published'AND v.transcoding_status = 'complete'ORDERBY b.published_at DESC;
Output
-- Polymorphic CTI query result:
id | content_type | title | slug | published_at | view_count | duration_seconds | reading_time_minutes
198 | gRPC vs REST in 2024 | 2847 | complete | t | 2147483648
(1 row)
Interview Gold: The JOIN Cost Is Real But Bounded
When a senior interviewer pushes back on CTI's JOIN penalty, the right answer is: 'It's a PK-to-PK equijoin on indexed columns — the planner executes it as a nested loop with a single index probe per row. At 10M rows, a CTI polymorphic query with 3 LEFT JOINs typically runs in 8-15ms with warm cache, versus 3-5ms for equivalent STI. That 10ms is usually irrelevant. What's not irrelevant is the ALTER TABLE lock you avoid when the video team needs 6 new columns.' Trade-offs, not absolutes.
Production Insight
The JOIN cost of CTI is real but bounded—at scale, the bottleneck is almost never the JOIN itself but the network round trips.
I've seen CTI outperform STI on complex read patterns because the base table stays small and index-friendly.
The true cost of STI is the lock you take when adding columns, not the query time.
Key Takeaway
Forget which pattern is 'best'.
Match the pattern to your data's overlap ratio and query patterns.
STI: high overlap + cross-type queries. CTI: moderate overlap. Concrete: little to no overlap.
Monitoring and Automating STI Health: The CI/CD Check You're Missing
The worst STI problems are the ones you don't see until they're expensive to fix. The solution is to measure the key metrics continuously and fail the build when they cross thresholds. Here's a practical automated health check that runs as part of your database migration pipeline.
The three numbers to track: overlap ratio (shared columns / total columns), average NULL density on indexed columns, and total table size. Add a fourth: the number of subtypes. As subtypes grow, the risk of drift increases. A simple script can query information_schema.columns and pg_stats (or equivalent on other databases) and compare against defined thresholds. If the overlap ratio drops below 55%, the pipeline should flag it for review. If any indexed column has null_frac > 0.35, generate a warning with a suggested partial index. If the table exceeds 10 million rows and has more than 5 subtypes, recommend migration planning.
This isn't hypothetical – I've implemented this check in a CI/CD pipeline for a client who had three different STI tables. The first build after adding the check failed because one table had an overlap ratio of 18%. The team had no idea. They scheduled a migration to CTI over the next sprint, saving themselves from the inevitable ALTER TABLE lock six months later.
STIHealthCheck.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
-- io.thecodeforge — Database tutorial-- PostgreSQL function to assess STI table health.-- Returns warning messages if thresholds are exceeded.-- Call this in a CI step after migrations.CREATEORREPLACEFUNCTION io.thecodeforge.sti_health_check(table_name TEXT)
RETURNSTABLE(metric TEXT, value NUMERIC, threshold NUMERIC, status TEXT, recommendation TEXT) AS $$
DECLARE
total_columns INT;
shared_columns INT;
overlap_ratio NUMERIC;
row_count BIGINT;
subtype_count INT;
rec RECORD;
BEGIN-- Get total columns (excluding system columns and the type discriminator)SELECTCOUNT(*) INTO total_columns
FROM information_schema.columns
WHERE table_name = $1AND column_name NOTIN ('id', 'type', 'created_at', 'updated_at');
-- Count shared columns (columns where null_frac < 0.1 across all subtypes?)-- Simplified: count columns that are NOT NULL for the most common subtypeSELECTCOUNT(*) INTO shared_columns
FROM pg_stats
WHERE tablename = $1AND null_frac < 0.05; -- shared columns are almost never NULL
overlap_ratio := shared_columns::NUMERIC / NULLIF(total_columns, 0);
-- Row countSELECT n_live_tup INTO row_count
FROM pg_stat_user_tables
WHERE relname = $1;
-- Subtype count (from discriminator column)EXECUTEformat('SELECT COUNT(DISTINCT %I) FROM %I',
(SELECT column_name FROM information_schema.columns
WHERE table_name = $1AND column_name IN ('type', 'payment_type', 'content_type')
LIMIT1),
$1) INTO subtype_count;
-- Check overlap ratioIF overlap_ratio < 0.55THEN
metric := 'overlap_ratio';
value := overlap_ratio;
threshold := 0.55;
status := 'FAIL';
recommendation := 'Overlap ratio below 55%. Consider migrating to Class Table Inheritance.';
RETURNNEXT;
ENDIF;
-- Check row count + subtype countIF row_count > 10000000AND subtype_count > 5THEN
metric := 'table_size_and_subtypes';
value := row_count::NUMERIC;
threshold := 10000000;
status := 'WARN';
recommendation := 'Large STI table with many subtypes. Plan migration to CTI or Concrete Inheritance.';
RETURNNEXT;
ENDIF;
-- Check indexed columns with high NULL densityFOR rec INSELECT attname, null_frac
FROM pg_stats
WHERE tablename = $1AND null_frac > 0.35AND attname IN (
SELECT column_name
FROM pg_indexes
WHERE tablename = $1AND indexdef LIKE'%' || attname || '%'
)
LOOP
metric := 'null_density_' || rec.attname;
value := rec.null_frac;
threshold := 0.35;
status := 'FAIL';
recommendation := 'Index on ' || rec.attname || ' has >35% NULL density. Replace with a partial index using WHERE type = ''x'' AND ' || rec.attname || ' IS NOT NULL.';
RETURNNEXT;
ENDLOOP;
-- If no issues, return a single OK rowIFNOTFOUNDTHEN
metric := 'overall';
value := 1;
threshold := 0;
status := 'PASS';
recommendation := 'STI table appears healthy.';
RETURNNEXT;
ENDIF;
END;
$$ LANGUAGE plpgsql;
-- Usage example (add to CI after any schema change):-- SELECT * FROM io.thecodeforge.sti_health_check('payments');
Output
-- Example output for an unhealthy STI table:
metric | value | threshold | status | recommendation
overlap_ratio | 0.28 | 0.55 | FAIL | Overlap ratio below 55%. Consider migrating to Class Table Inheritance.
null_density_swift_code | 0.94 | 0.35 | FAIL | Index on swift_code has >35% NULL density. Replace with a partial index...
null_density_iban | 0.94 | 0.35 | FAIL | Index on iban has >35% NULL density. Replace with a partial index...
(3 rows)
Automate It: CI Integration
Add the STI health check function to your migration scripts. In your CI pipeline (GitHub Actions, GitLab CI, Jenkins), run a step like psql -c "SELECT * FROM io.thecodeforge.sti_health_check('payments');" after every migration. If any row has status 'FAIL', block the deploy and require a review. This catches drift before it hits production.
Production Insight
Without automation, STI health is invisible until a slow query alert wakes someone up.
Integrating this check into CI costs an hour to set up and saves weeks of incident response.
Thresholds should be strict early on; you can relax them after you understand your data's actual behavior.
Key Takeaway
Continuous monitoring is the only defense against STI decay.
Automate overlap ratio and NULL density checks in your pipeline.
A failing health check today is cheaper than a locked table tomorrow.
Migrating from STI to CTI: A Real Production Plan
You've run the health check, the overlap ratio is 0.31, the slow query log is piling up, and your team just requested two new subtypes. It's time to migrate. Here's a production-safe plan that moves one subtype at a time, without downtime.
Phase 1: Choose your first victim. Pick the subtype with the most unique columns—the one causing the most NULL density. For the CMS example, that's videos (25 unique columns). You'll extract it first.
Phase 2: Create the CTI base table if it doesn't exist. In many cases you'll already have a legacy STI table; you can reuse it as the base table by dropping subtype-specific columns later. But safer: create a new base table with only shared columns, then backfill.
Phase 3: Create child tables for each subtype you plan to extract. Use a one-to-one FK relationship. Backfill data in batches—never in one transaction. Use a loop with COMMIT every 5000 rows to keep locks short.
Phase 4: Switch application reads first. Deploy code that reads from the new child table for the extracted subtype. Write a comparison query that checks both sources for a day. When confidence is high, switch writes—either via trigger or application logic.
Phase 5: Drop the old subtype columns from the STI table. Use DROP COLUMN IF EXISTS in a maintenance window. The table is now smaller, index-friendly, and ready for its next life as a clean base table.
It sounds like a lot of steps, but each phase is reversible. The key is to never have a moment where both old and new structures must be consistent without a clear rollback path.
STItoCTIMigration.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
-- io.thecodeforge — Database tutorial-- Production migration script: extract 'video' subtype from STI to CTI.-- Run in a transaction for each batch, commit frequently.-- Step 0: Create base table if not exists (here we assume legacy STI table is the base)-- For this migration, we'll keep the base table and create a separate video_detail table.-- Step 1: Create child tableCREATETABLE io.thecodeforge.video_details (
id BIGINTPRIMARYKEYREFERENCES io.thecodeforge.content_items(id),
video_duration_seconds INTEGERNOTNULL,
resolution VARCHAR(10),
codec VARCHAR(20),
cdn_asset_key VARCHAR(256) NOTNULL,
transcoding_status VARCHAR(20) NOTNULLDEFAULT'pending',
has_captions BOOLEANNOTNULLDEFAULTFALSE,
storage_bytes BIGINT
);
-- Step 2: Backfill in batchesDO $$
DECLARE
batch_size INTEGER := 5000;
offset_val INTEGER := 0;
rows_inserted INTEGER;
BEGINLOOPINSERTINTO io.thecodeforge.video_details (
id, video_duration_seconds, resolution, codec, cdn_asset_key,
transcoding_status, has_captions, storage_bytes
)
SELECT
id,
COALESCE(video_duration_seconds, 0),
COALESCE(resolution, 'unknown'),
COALESCE(codec, 'unknown'),
COALESCE(cdn_asset_key, 'pending'),
COALESCE(transcoding_status, 'pending'),
COALESCE(has_captions, FALSE),
COALESCE(storage_bytes, 0)
FROM io.thecodeforge.content_items
WHERE content_type = 'video'ORDERBY id
LIMIT batch_size OFFSET offset_val;
GETDIAGNOSTICS rows_inserted = ROW_COUNT;
EXITWHEN rows_inserted = 0;
COMMIT;
offset_val := offset_val + batch_size;
RAISENOTICE'Inserted % rows, offset now %', rows_inserted, offset_val;
ENDLOOP;
END $$;
-- Step 3: Verify paritySELECT
(SELECTCOUNT(*) FROM io.thecodeforge.content_items WHERE content_type = 'video') AS sti_count,
(SELECTCOUNT(*) FROM io.thecodeforge.video_details) AS detail_count;
-- Step 4: Deploy code that reads from video_details for video content, writes to both.-- After a day of dual-writes with monitoring, drop video columns from STI table.-- Step 5: Drop columns (off-peak, with lock timeout set)SET lock_timeout = '30s';
ALTERTABLE io.thecodeforge.content_items DROPCOLUMN video_duration_seconds;
ALTERTABLE io.thecodeforge.content_items DROPCOLUMN resolution;
ALTERTABLE io.thecodeforge.content_items DROPCOLUMN codec;
ALTERTABLE io.thecodeforge.content_items DROPCOLUMN cdn_asset_key;
ALTERTABLE io.thecodeforge.content_items DROPCOLUMN transcoding_status;
ALTERTABLE io.thecodeforge.content_items DROPCOLUMN has_captions;
ALTERTABLE io.thecodeforge.content_items DROPCOLUMN storage_bytes;
Output
-- Batch insertion progress:
NOTICE: Inserted 5000 rows, offset now 5000
NOTICE: Inserted 5000 rows, offset now 10000
-- ...
NOTICE: Inserted 1124882 rows, offset now 1125000
-- Parity check:
sti_count | detail_count
-----------+--------------
1124882 | 1124882
(1 row)
The Off-Peak Rule
Never drop columns during business hours unless you have lock_timeout set and are prepared for a rollback. Use SET lock_timeout = '30s' before the ALTER. If the lock can't be acquired in 30 seconds, the statement errors and you can retry later without blocking your production writes.
Production Insight
Batched backfill is the only safe way to move millions of rows from STI to CTI—single-transaction INSERT...SELECT will lock the source table for minutes.
I've seen a 20M-row backfill take 4 seconds per batch of 5000, with zero impact on application queries.
The real risk is not the backfill speed but forgetting to COMMIT between batches.
Key Takeaway
Migrate one subtype at a time, in batches, with a commit after every 5000 rows.
Use parity checks before switching reads.
Never drop columns without a lock timeout—your users deserve better than a 4-minute outage.
● Production incidentPOST-MORTEMseverity: high
The 4 Minute ALTER TABLE Lock That Killed Monday Morning Deploys
Symptom
Deploys hung, payment processing stopped, and the error log showed 'ERROR: canceling statement due to conflict with recovery'. The team couldn't roll out a critical security patch because the ALTER TABLE blocked every write.
Assumption
The team assumed adding a nullable column with DEFAULT would be instant on PostgreSQL 11+. But the NOT NULL constraint combined with a non-trivial default triggered a full table rewrite, even though the column was added as nullable first. The real issue: the backfill after adding the column was slow, and then promoting to NOT NULL caused the lock.
Root cause
The STI table had 34 columns, 22 of which were NULL for most rows. Adding the column was quick, but the subsequent ALTER TABLE ... ALTER COLUMN SET NOT NULL required a full table scan and locked the table. The migration script ran during business hours without a maintenance window.
Fix
Use ADD COLUMN ... DEFAULT without NOT NULL, backfill in batches with UPDATE ... WHERE id BETWEEN x AND y, then add a CHECK constraint with NOT VALID and VALIDATE CONSTRAINT in separate transactions to avoid long locks. Alternatively, use pg_repack for online table rewrites.
Key lesson
Never ALTER TABLE a large STI table during business hours without a plan — the lock duration is unpredictable when NULL density is high.
Use partial indexes with WHERE type = 'x' AND column IS NOT NULL to avoid full table scans on subtype-specific queries.
Track the overlap ratio of your STI table in CI — if it drops below 0.5, schedule a migration to Class Table Inheritance before the next subtype addition.
Production debug guideSymptom → Action steps to diagnose and fix slow queries on STI tables.4 entries
Symptom · 01
Cross-type queries are slow despite indexes
→
Fix
Check EXPLAIN (ANALYZE, BUFFERS) for the polymorphic query. If you see Seq Scan on a table >1M rows, the planner is ignoring your index. Verify null_frac in pg_stats — any index with >0.35 null fraction is likely skipped.
Symptom · 02
Adding a new subtype causes unexpected application errors
→
Fix
Look for missing IS NOT NULL checks in application code. STI columns are nullable by design, but business rules may require them for certain subtypes. Add a CHECK constraint: CONSTRAINT chk_type_fields CHECK (type != 'new_type' OR new_col IS NOT NULL).
Symptom · 03
ALTER TABLE on STI table blocks writes for minutes
→
Fix
Use pg_repack to perform an online table rewrite. If not available, add columns as nullable first, backfill in batches (1000 rows per transaction), then add the NOT NULL constraint with NOT VALID and validate separately.
Symptom · 04
Query planner chooses sequential scan even on indexed columns
→
Fix
Drop and recreate the index as a partial index. Use CREATE INDEX CONCURRENTLY ... WHERE type = 'video' AND column IS NOT NULL. After creating, drop the original full-column index and verify the plan changes.
★ STI Health Check CommandsRun these SQL queries to assess whether your STI table is heading for trouble.
General performance decline−
Immediate action
Measure NULL density per column
Commands
SELECT attname, null_frac FROM pg_stats WHERE tablename = 'your_table' ORDER BY null_frac DESC LIMIT 10;
Check overlap ratio: SELECT (COUNT(*) FILTER (WHERE attname IN (shared_column_list)) * 100.0 / COUNT(*)) AS overlap_pct FROM pg_attribute WHERE attrelid = 'your_table'::regclass AND attnum > 0;
Fix now
Add partial indexes for columns with null_frac > 0.35: CREATE INDEX CONCURRENTLY idx_col ON your_table (col) WHERE type = 'specific_type' AND col IS NOT NULL;
Slow polymorphic queries+
Immediate action
Run EXPLAIN ANALYZE on a typical cross-type query
Commands
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM your_table WHERE status = 'active' ORDER BY created_at DESC LIMIT 20;
Check if index on (type, status, created_at) is being used: `SELECT schemaname, tablename, indexname, idx_scan FROM pg_stat_user_indexes WHERE tablename = 'your_table';`
Fix now
Drop and recreate the composite index with a WHERE clause filtering on type if the NULL density is skewed.
Migrating a subtype out of STI+
Immediate action
Create child table and verify data parity
Commands
INSERT INTO child_table SELECT id, col1, col2 FROM your_table WHERE type = 'subtype';
SELECT 'STI' as source, count(*) FROM your_table WHERE type = 'subtype' UNION ALL SELECT 'Child', count(*) FROM child_table;
Fix now
Backfill in batches of 5000 rows with a WHILE loop and COMMIT each batch to avoid long transactions.
STI vs CTI vs Concrete Table Inheritance
Feature
Single Table Inheritance (STI)
Class Table Inheritance (CTI)
Concrete Table Inheritance
Table count
One table for all subtypes
One base table + one per subtype
One table per subtype
NULL density
High for subtype-specific columns
Zero (columns exist only where needed)
Zero
Polymorphic queries
Single table scan, no JOINs
Requires LEFT JOIN on base + all subtypes
Requires UNION ALL
Schema coupling
High — all subtypes share one table
Moderate — base table shared, children isolated
Low — completely independent
ALTER TABLE risk
High — locks entire table for all subtypes
Low — only affects specific child table
Low — only affects one table
Best use case
High overlap (>60%), frequent cross-type queries
Moderate overlap, balanced query patterns
Low overlap, divergent lifecycles
Common mistakes to avoid
3 patterns
×
Using STI without measuring overlap ratio
Symptom
Table grows to 50+ columns, most NULL; indexes become bloated and queries slow down gradually.
Fix
Run the overlap ratio query (shared columns / total columns). If below 0.55, plan a migration to CTI. Automate the check in CI.
×
Adding a NOT NULL column with DEFAULT to a large STI table during business hours
Symptom
ALTER TABLE locks the table for minutes, causing write timeouts and application errors.
Fix
Add the column as nullable first, backfill in batches, then add the NOT NULL constraint with NOT VALID and validate separately. Use pg_repack for online rewrites.
×
Creating indexes on subtype-specific columns without a WHERE clause
Symptom
Indexes are 60%+ NULL entries; the query planner skips them and performs sequential scans.
Fix
Drop the full index and create a partial index: CREATE INDEX CONCURRENTLY ... WHERE type = 'video' AND column IS NOT NULL.