Homeβ€Ί Databaseβ€Ί Single Table Inheritance: When It Saves You and When It Destroys Your Schema

Single Table Inheritance: When It Saves You and When It Destroys Your Schema

Where developers are forged. Β· Structured learning Β· Free forever.
πŸ“ Part of: Database Design β†’ Topic 16 of 16
Single Table Inheritance done wrong fills your table with NULLs and kills query performance.
βš™οΈ Intermediate β€” basic Database knowledge assumed
In this tutorial, you'll learn:
  • The 60% rule: if fewer than 60% of your STI table's columns are shared by ALL subtypes, you don't have an inheritance problem β€” you have a normalisation problem. Use Class Table Inheritance.
  • STI's superpower is polymorphic queries β€” a single ORDER BY created_at across all subtypes without a UNION. If you're never querying across subtypes together, you're getting none of the benefit and all of the NULL sprawl.
  • The exact moment to migrate off STI: when your slowest ALTER TABLE would take longer than your deployment window. At that point the table has become a shared mutable state liability that couples every subtype's evolution. Start the extraction before you hit that wall, not after.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
⚑ Quick Answer
Imagine a hotel that uses one massive guest registry for every type of booking β€” standard rooms, suites, conference halls, and parking spots β€” but most columns on each row are left blank because a parking spot doesn't have a 'pillow preference.' That's Single Table Inheritance. You're storing fundamentally different things in one table by giving every possible attribute its own column, then just leaving the irrelevant ones empty. It works brilliantly when your 'bookings' are 90% the same. It becomes a disaster when your parking spot needs 40 columns that a hotel room would never touch.

A fintech startup I consulted for had a products table with 94 columns. Forty-six of them were NULL on every single row for two of their three product types. Their most critical report query β€” the one the CEO ran every morning β€” was doing a full sequential scan because the query planner had given up trying to use indexes on columns that were 70% NULL. That query took 11 seconds. The table had 8 million rows and was growing. That's what Single Table Inheritance looks like when nobody made a conscious decision to use it β€” it just… happened.

Single Table Inheritance (STI) is a pattern for storing a class hierarchy in one database table. Instead of splitting subtypes across multiple tables, you jam every attribute of every subtype into one table and add a type discriminator column to tell rows apart. Rails made it famous. ORMs love it because it maps cleanly to inheritance hierarchies in code. The problem is that most teams adopt it because their ORM made it the path of least resistance, not because they thought through what their data actually looks like at 10 million rows.

By the end of this article you'll be able to look at any inheritance problem in your schema and make a deliberate call: STI, Class Table Inheritance, or Concrete Table Inheritance β€” with specific criteria for each choice. You'll know exactly which query patterns STI accelerates, which ones it quietly poisons, and how to migrate away from it before it buries you.

What STI Actually Does to Your Table (And Why ORMs Hide the Ugly Truth)

Before you can decide whether STI is right for your schema, you need to see what it physically produces in the database β€” not in your ORM's pretty model layer.

The core mechanic is simple: one table, one type column, every column from every subtype lives side by side. When you write a CreditCardPayment row, the bank_transfer_reference column sits there as NULL. When you write a BankTransferPayment, the card_last_four column is NULL. The database has no idea these columns are meaningless for certain types β€” it just stores the NULLs faithfully and wastes the space.

The ORM makes this invisible. ActiveRecord, Hibernate, Entity Framework β€” they all filter by the type column automatically and present you clean model objects. You never see the NULLs. You never see the 60-column table with 35 sparse columns. That abstraction is exactly why teams walk into the trap. The schema is rotten but the code looks clean.

What actually goes wrong in production: NULL columns destroy index selectivity. A B-tree index on card_last_four where 60% of rows are NULL is nearly useless β€” the planner often skips it entirely and scans. Partial indexes fix this, but you have to know to add them, and most teams don't until the slow query log starts screaming.

PaymentSTISchema.sql Β· SQL
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128
-- io.thecodeforge β€” Database tutorial

-- This is the STI table a typical payments service ends up with.
-- Notice how many columns are NULL for specific payment types.
-- This isn't hypothetical β€” I've seen this exact shape in three production codebases.

CREATE TABLE payments (
    id                      BIGSERIAL PRIMARY KEY,

    -- The discriminator column: the only thing STI *requires*
    -- Your ORM writes this automatically. Don't forget to index it.
    payment_type            VARCHAR(50)     NOT NULL,

    -- Columns shared by ALL payment types β€” this is STI's sweet spot.
    -- High overlap here is the green flag for using STI.
    amount_cents            INTEGER         NOT NULL,
    currency_code           CHAR(3)         NOT NULL,
    status                  VARCHAR(20)     NOT NULL DEFAULT 'pending',
    created_at              TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
    updated_at              TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
    customer_id             BIGINT          NOT NULL REFERENCES customers(id),

    -- CreditCardPayment-only columns.
    -- NULL for every BankTransfer, CryptoPayment, and WireTransfer row.
    card_last_four          CHAR(4),
    card_network            VARCHAR(20),        -- 'visa', 'mastercard', etc.
    card_expiry_month       SMALLINT,
    card_expiry_year        SMALLINT,
    card_fingerprint        VARCHAR(64),        -- tokenised card identifier
    stripe_charge_id        VARCHAR(64),

    -- BankTransferPayment-only columns.
    -- NULL for every CreditCard, Crypto, and Wire row.
    bank_account_last_four  CHAR(4),
    bank_routing_number     VARCHAR(20),
    ach_trace_number        VARCHAR(15),
    bank_name               VARCHAR(100),

    -- CryptoPayment-only columns.
    -- NULL for everything else.
    blockchain_network       VARCHAR(30),        -- 'ethereum', 'bitcoin', etc.
    wallet_address          VARCHAR(100),
    transaction_hash        VARCHAR(128),
    confirmation_count      INTEGER,

    -- WireTransfer-only columns.
    swift_code              CHAR(11),
    iban                    VARCHAR(34),
    beneficiary_bank_name   VARCHAR(100),
    correspondent_bank_bic  CHAR(11)
);

-- The discriminator index is non-negotiable.
-- Without it, every polymorphic query scans the full table.
CREATE INDEX idx_payments_type
    ON payments (payment_type);

-- Composite index for the most common query pattern:
-- "show me all pending credit card payments for this customer"
CREATE INDEX idx_payments_customer_type_status
    ON payments (customer_id, payment_type, status);

-- Partial index β€” this is the move that saves you when columns are sparse.
-- Standard index on card_fingerprint would be ~60% NULL, near-useless.
-- This index ONLY indexes rows where the column is actually populated.
CREATE INDEX idx_payments_card_fingerprint
    ON payments (card_fingerprint)
    WHERE payment_type = 'credit_card' AND card_fingerprint IS NOT NULL;

-- Same pattern for crypto transaction lookups.
CREATE INDEX idx_payments_txn_hash
    ON payments (transaction_hash)
    WHERE payment_type = 'crypto' AND transaction_hash IS NOT NULL;


-- Now let's see what the data actually looks like at rest.
-- Run this after inserting a mix of payment types and you'll
-- immediately see the NULL sprawl that STI produces.

INSERT INTO payments (
    payment_type, amount_cents, currency_code, status, customer_id,
    card_last_four, card_network, card_expiry_month, card_expiry_year,
    card_fingerprint, stripe_charge_id
) VALUES (
    'credit_card', 4999, 'USD', 'captured', 1001,
    '4242', 'visa', 12, 2027,
    'fp_abc123xyz', 'ch_3OqExample'
);

INSERT INTO payments (
    payment_type, amount_cents, currency_code, status, customer_id,
    bank_account_last_four, bank_routing_number, ach_trace_number, bank_name
) VALUES (
    'bank_transfer', 150000, 'USD', 'pending', 1002,
    '6789', '021000021', '20240115123456789', 'JPMorgan Chase'
);

INSERT INTO payments (
    payment_type, amount_cents, currency_code, status, customer_id,
    blockchain_network, wallet_address, transaction_hash, confirmation_count
) VALUES (
    'crypto', 89900, 'USD', 'confirming', 1003,
    'ethereum',
    '0xAbC123dEf456GhI789jKl012MnO345PqR678StU',
    '0x9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08',
    3
);


-- This query exposes the NULL sprawl visually.
-- In a real schema audit, this is the first thing I run.
SELECT
    payment_type,
    COUNT(*)                                                AS total_rows,
    COUNT(card_last_four)                                   AS credit_card_col_populated,
    COUNT(bank_account_last_four)                           AS bank_col_populated,
    COUNT(transaction_hash)                                 AS crypto_col_populated,
    COUNT(swift_code)                                       AS wire_col_populated,

    -- This ratio is your STI health metric.
    -- If it drops below 0.4 for any subtype, you should reconsider STI.
    ROUND(
        COUNT(card_last_four)::NUMERIC / NULLIF(COUNT(*), 0),
        2
    ) AS credit_card_col_fill_ratio
FROM payments
GROUP BY payment_type
ORDER BY total_rows DESC;
β–Ά Output
payment_type | total_rows | credit_card_col_populated | bank_col_populated | crypto_col_populated | wire_col_populated | credit_card_col_fill_ratio
---------------+------------+---------------------------+--------------------+----------------------+--------------------+----------------------------
credit_card | 1 | 1 | 0 | 0 | 0 | 1.00
bank_transfer | 1 | 0 | 1 | 0 | 0 | 0.00
crypto | 1 | 0 | 0 | 1 | 0 | 0.00
⚠️
Production Trap: The NULL Ratio ThresholdRun the fill-ratio query above on any STI table you inherit. If any subtype-specific column has a fill ratio below 0.40 across the full table, your indexes on those columns are working against you. The PostgreSQL query planner uses a NULL fraction estimate from pg_stats β€” once it exceeds ~35%, it often ignores the index entirely. Fix it with partial indexes using WHERE type = 'x' AND column IS NOT NULL, exactly as shown above.

The Three Criteria That Tell You STI Is the Right Call

Stop deciding based on whether your ORM supports STI. Decide based on your actual data shape. There are three concrete tests. Pass all three and STI is a solid choice. Fail any one and you're building technical debt.

Test 1: The overlap ratio. Count the columns shared by ALL subtypes. Divide by total columns in the table. If that ratio is below 0.6, you're going to end up with a sparse, NULL-heavy table that hurts performance and confuses every developer who opens it cold. The payments example above barely passes β€” shared columns are amount_cents, currency_code, status, created_at, updated_at, customer_id β€” 6 out of roughly 25. That's 0.24. That table shouldn't be STI. It got there by accident.

Test 2: Are subtypes ever queried together? STI's genuine superpower is polymorphic queries β€” 'give me all payments for this customer regardless of type,' or 'show me everything in the queue sorted by created_at.' If you never need to query across subtypes in a single result set, you get none of STI's benefits and all of its costs. Separate tables would be cleaner.

Test 3: Do subtypes have meaningfully different behaviour, or just slightly different data? If your AdminUser and RegularUser subtypes have the same 12 columns and differ only in a role field, that's not an inheritance problem β€” that's just a column. Don't reach for STI when a type or role column in a single clean table solves it without the ORM ceremony.

ContentItemSTI.sql Β· SQL
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
-- io.thecodeforge β€” Database tutorial

-- Here's a schema where STI EARNS its place.
-- A CMS content system: Articles, Videos, and Podcasts.
-- The overlap is high (~75%), the polymorphic query pattern is constant,
-- and the subtype-specific columns are few.

CREATE TABLE content_items (
    id              BIGSERIAL PRIMARY KEY,

    -- Discriminator. Always VARCHAR, never an integer enum --
    -- string values make the data self-documenting in the DB.
    content_type    VARCHAR(30)     NOT NULL,

    -- ~~~ SHARED COLUMNS: 8 out of 11 total = 73% overlap ~~~
    -- This is the green-flag ratio. STI makes sense here.
    title           VARCHAR(255)    NOT NULL,
    slug            VARCHAR(255)    NOT NULL UNIQUE,
    author_id       BIGINT          NOT NULL REFERENCES authors(id),
    published_at    TIMESTAMPTZ,
    status          VARCHAR(20)     NOT NULL DEFAULT 'draft',
    excerpt         TEXT,
    thumbnail_url   VARCHAR(512),
    view_count      INTEGER         NOT NULL DEFAULT 0,

    -- ~~~ SUBTYPE-SPECIFIC: only 3 columns, all sparse ~~~

    -- Article-only: NULL for videos and podcasts.
    reading_time_minutes    SMALLINT,

    -- Video-only: NULL for articles and podcasts.
    video_duration_seconds  INTEGER,

    -- Podcast-only: NULL for articles and videos.
    audio_duration_seconds  INTEGER
);

-- Composite index for the editorial dashboard query:
-- "show me all published content, newest first, for this author"
CREATE INDEX idx_content_author_status_published
    ON content_items (author_id, status, published_at DESC)
    WHERE status = 'published';

-- This is the polymorphic query STI was born for.
-- Try doing this cleanly with three separate tables β€” you need a UNION.
-- With STI it's a single fast index scan.
SELECT
    id,
    content_type,
    title,
    slug,
    published_at,
    view_count
FROM content_items
WHERE
    author_id   = 42
    AND status  = 'published'
ORDER BY published_at DESC
LIMIT 20;


-- Now here's why the polymorphic query advantage is real.
-- Compare the STI approach above to what you'd need with separate tables:

-- WITH SEPARATE TABLES (Class Table Inheritance) β€” same query:
SELECT id, 'article' AS content_type, title, slug, published_at, view_count
FROM   articles
WHERE  author_id = 42 AND status = 'published'

UNION ALL

SELECT id, 'video', title, slug, published_at, view_count
FROM   videos
WHERE  author_id = 42 AND status = 'published'

UNION ALL

SELECT id, 'podcast', title, slug, published_at, view_count
FROM   podcasts
WHERE  author_id = 42 AND status = 'published'

ORDER BY published_at DESC
LIMIT 20;
-- Three index scans, a sort across merged results, and a LIMIT applied last.
-- At 5M rows across three tables, this hurts. The STI version does not.


-- Querying a specific subtype is equally clean with STI.
-- The ORM does this automatically β€” here's the raw SQL it generates:
SELECT
    id,
    title,
    slug,
    video_duration_seconds
FROM content_items
WHERE
    content_type = 'video'
    AND status   = 'published'
ORDER BY published_at DESC;
β–Ά Output
-- Polymorphic query result (sample):
id | content_type | title | slug | published_at | view_count
-----+--------------+---------------------------+------------------------+------------------------------+------------
201 | article | Redis Eviction Strategies | redis-eviction | 2024-11-03 09:15:00+00 | 14823
198 | video | gRPC vs REST in 2024 | grpc-vs-rest-2024 | 2024-10-28 14:00:00+00 | 8341
195 | podcast | Monolith to Microservices | monolith-microservices | 2024-10-20 07:30:00+00 | 3102
(3 rows)

-- Subtype-specific query result:
id | title | slug | video_duration_seconds
-----+----------------------+------------------+------------------------
198 | gRPC vs REST in 2024 | grpc-vs-rest-2024| 2847
(1 row)
⚠️
Senior Shortcut: The 60% RuleCount your shared columns, divide by total columns in the proposed STI table. Below 0.6, don't use STI β€” use Class Table Inheritance (shared columns in a base table, subtype-specific columns in child tables joined by FK). The CMS example above sits at 73% and it's a clean fit. The payments example sits at 24% and it's a mess. That number tells you everything.

Where STI Breaks in Production and How It Does It Quietly

STI doesn't fail loudly. It doesn't throw an error. It degrades β€” slowly, then suddenly. Here's the exact failure sequence I've watched happen twice.

A team starts with two subtypes, 70% overlap β€” perfect for STI. Eighteen months later they've added four more subtypes because the ORM made it so easy. Each new subtype adds 8-12 columns. The overlap ratio has dropped from 70% to 31%. Nobody noticed because the app still works. Queries are just… slower. The slow query log starts filling up. Someone adds a composite index, it helps a bit, the team moves on. Then the table hits 50 million rows and the ALTER TABLE to add a column for the new subtype everyone just requested locks the table for 4 minutes during business hours.

The ALTER TABLE lock is the hard wall. In PostgreSQL, adding a column with a DEFAULT value before PG11 rewrote every row β€” a full table rewrite. Even in PG11+ where nullable columns without defaults are instant, adding a NOT NULL column or one with a non-trivial default still triggers a full rewrite. On a 50M-row STI table, that's a production outage. In MySQL, ALTER TABLE takes an exclusive metadata lock regardless. I watched a team sit through a 7-minute write outage on their vehicles STI table because they needed to add electric_motor_type for a new EV subtype. Seven minutes. On a Monday morning.

The other quiet killer is schema coupling. Every subtype's columns live in the same migration file context. Adding a column for one subtype forces a migration that touches the table all subtypes use. Your Article team's migration can block a deploy for your Video team.

STIBreakingPoint.sql Β· SQL
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
-- io.thecodeforge β€” Database tutorial

-- Diagnosing a degrading STI table in PostgreSQL.
-- Run this against any STI table you've inherited to get a full health report.

-- Step 1: Measure actual NULL density per column.
-- This tells you exactly which columns are poisoning your index selectivity.
SELECT
    a.attname                                           AS column_name,
    s.null_frac                                         AS null_fraction,
    s.n_distinct,
    -- A null_frac above 0.35 on an indexed column is a red flag.
    -- The planner may skip the index entirely.
    CASE
        WHEN s.null_frac > 0.35 THEN 'HIGH NULL DENSITY β€” check for partial index'
        WHEN s.null_frac > 0.15 THEN 'MODERATE β€” monitor'
        ELSE 'OK'
    END                                                 AS index_risk_assessment
FROM
    pg_attribute           a
    JOIN pg_stats          s ON s.attname = a.attname
                             AND s.tablename = 'content_items'
    JOIN pg_class          c ON c.oid = a.attrelid
WHERE
    c.relname   = 'content_items'
    AND a.attnum > 0          -- exclude system columns
    AND NOT a.attisdropped    -- exclude dropped columns
ORDER BY s.null_frac DESC;


-- Step 2: Measure table bloat from NULL storage.
-- Postgres stores NULLs efficiently via a null bitmap, but this tells you
-- your ACTUAL average row width vs what it would be with separate tables.
SELECT
    relname                                 AS table_name,
    pg_size_pretty(pg_total_relation_size(oid)) AS total_size,
    pg_size_pretty(pg_relation_size(oid))   AS table_size,
    pg_size_pretty(
        pg_total_relation_size(oid)
        - pg_relation_size(oid)
    )                                       AS index_size,
    n_live_tup                              AS live_rows,
    -- Bytes per row: high value on a sparse STI table = storage waste signal
    pg_relation_size(oid) / NULLIF(n_live_tup, 0) AS bytes_per_row
FROM
    pg_stat_user_tables
WHERE
    relname = 'content_items';


-- Step 3: The migration risk check.
-- Before running ANY alter on a large STI table, estimate the blast radius.
-- This shows you row count and current table size so you can calculate
-- whether to use pg_repack, a shadow table migration, or a timed maintenance window.
SELECT
    content_type,
    COUNT(*)                                AS row_count,
    -- Proportion of the table this subtype owns.
    -- Useful for estimating migration lock duration.
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct_of_table
FROM content_items
GROUP BY content_type
ORDER BY row_count DESC;


-- Step 4: The escape hatch β€” migrating one subtype OUT of STI
-- to its own table without downtime.
-- Pattern: create new table, backfill, switch writes, drop STI columns.

-- 4a. Create the extracted table β€” clean, no NULLs, no dead columns.
CREATE TABLE video_content (
    id                      BIGINT PRIMARY KEY REFERENCES content_items(id),
    video_duration_seconds  INTEGER NOT NULL,
    -- Add video-specific columns here freely now that they're isolated.
    resolution              VARCHAR(10),
    codec                   VARCHAR(20),
    cdn_asset_key           VARCHAR(256)
);

-- 4b. Backfill from the STI table β€” run this in batches to avoid lock pressure.
-- Don't do INSERT ... SELECT on 10M rows in one transaction. Use a loop.
INSERT INTO video_content (id, video_duration_seconds)
SELECT
    id,
    COALESCE(video_duration_seconds, 0) -- handle any unexpected NULLs cleanly
FROM content_items
WHERE
    content_type            = 'video'
    AND video_duration_seconds IS NOT NULL;

-- 4c. Verify the backfill before you cut over.
SELECT
    (
        SELECT COUNT(*) FROM content_items  WHERE content_type = 'video'
    ) AS sti_video_count,
    (
        SELECT COUNT(*) FROM video_content
    ) AS extracted_video_count;
-- These two numbers must match before you touch application code.
β–Ά Output
-- Step 1: NULL density report (example output for a degraded STI table)
column_name | null_fraction | n_distinct | index_risk_assessment
------------------------+---------------+------------+--------------------------------------------
swift_code | 0.94 | 12 | HIGH NULL DENSITY β€” check for partial index
iban | 0.94 | -0.8 | HIGH NULL DENSITY β€” check for partial index
transaction_hash | 0.88 | -0.9 | HIGH NULL DENSITY β€” check for partial index
wallet_address | 0.88 | -0.9 | HIGH NULL DENSITY β€” check for partial index
card_fingerprint | 0.61 | -0.7 | HIGH NULL DENSITY β€” check for partial index
video_duration_seconds | 0.33 | 42 | MODERATE β€” monitor
title | 0.00 | -0.6 | OK
status | 0.00 | 5 | OK
(8 rows)

-- Step 2: Table size report
table_name | total_size | table_size | index_size | live_rows | bytes_per_row
---------------+------------+------------+------------+-----------+---------------
content_items | 2341 MB | 1876 MB | 465 MB | 4823901 | 408
(1 row)

-- Step 3: Subtype distribution
content_type | row_count | pct_of_table
--------------+-----------+--------------
article | 3210540 | 66.57
video | 1124882 | 23.32
podcast | 488479 | 10.12
(3 rows)

-- Step 4c: Backfill verification
sti_video_count | extracted_video_count
-----------------+-----------------------
1124882 | 1124882
(1 row)
⚠️
Never Do This: ALTER TABLE on a Live STI Table Over 5M RowsAdding a column with a DEFAULT to a PostgreSQL STI table over 5 million rows without a plan will lock your table for minutes. Use pg_repack for online table rewrites, or add the column as nullable first, backfill in batches using a WHERE id BETWEEN x AND y loop, then add the NOT NULL constraint separately with ALTER TABLE ... ALTER COLUMN ... SET NOT NULL (PG12+ validates without a full lock if you use SET NOT NULL after a CHECK constraint). The symptom you'll see without this: ERROR: deadlock detected across every write to that table while the migration runs.

STI vs. Class Table Inheritance vs. Concrete Table: Choosing Without Regret

Once you've seen STI break, the temptation is to swear it off entirely. That's the wrong lesson. The real lesson is that inheritance mapping patterns are tools with specific jobs, not philosophies.

Class Table Inheritance (CTI) keeps shared columns in a base table and puts subtype-specific columns in child tables linked by primary key FK. It's normalised, NULL-free, and scales elegantly β€” but every polymorphic query pays a JOIN penalty. If you're querying across subtypes constantly, those JOINs add up. A feed that mixes content types, hitting CTI tables at 100 req/s with 3-table JOINs, will saturate your DB connection pool before an equivalent STI setup does.

Concrete Table Inheritance (CTI2 β€” yes, two different things share 'CTI' as an abbreviation, which is peak database naming) duplicates shared columns into fully independent tables per subtype. Zero JOINs, zero NULLs, zero polymorphic queries. You can't easily ask 'give me all items for this user regardless of type' without a UNION. It's the right call when your subtypes have divergent lifecycles β€” different retention policies, different archiving schedules, different sharding strategies.

The decision tree is actually short: high overlap + frequent cross-type queries = STI. High overlap + rare cross-type queries or large subtype-specific column sets = Class Table Inheritance. Subtypes that are fundamentally different things that happen to share a name = Concrete Table Inheritance. If you're still not sure, ask yourself: 'Would a DBA opening this table cold understand the relationship between these columns?' If the answer is no, STI is making your schema harder to reason about, not easier.

ClassTableInheritanceCMS.sql Β· SQL
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111
-- io.thecodeforge β€” Database tutorial

-- Class Table Inheritance (CTI) β€” the right pattern when STI's
-- NULL density is too high but you still need polymorphic queries.
--
-- Same CMS domain as earlier, but let's say the video subtype has grown
-- to 25 unique columns (transcoding metadata, CDN configs, chapters, etc.)
-- That kills STI's overlap ratio. CTI is the correct move.

-- Base table: ONLY the truly shared columns.
-- Every subtype has a corresponding row here.
CREATE TABLE content_items_base (
    id              BIGSERIAL PRIMARY KEY,
    content_type    VARCHAR(30)     NOT NULL,
    title           VARCHAR(255)    NOT NULL,
    slug            VARCHAR(255)    NOT NULL UNIQUE,
    author_id       BIGINT          NOT NULL REFERENCES authors(id),
    published_at    TIMESTAMPTZ,
    status          VARCHAR(20)     NOT NULL DEFAULT 'draft',
    excerpt         TEXT,
    thumbnail_url   VARCHAR(512),
    view_count      INTEGER         NOT NULL DEFAULT 0,
    created_at      TIMESTAMPTZ     NOT NULL DEFAULT NOW()
);

-- Article subtype table: only article-specific columns.
-- PK = FK to base table β€” one-to-one relationship enforced at DB level.
CREATE TABLE articles (
    id                      BIGINT PRIMARY KEY REFERENCES content_items_base(id)
                            ON DELETE CASCADE,
    body_html               TEXT            NOT NULL,
    reading_time_minutes    SMALLINT,
    word_count              INTEGER,
    seo_meta_description    VARCHAR(160)
);

-- Video subtype: now we can have 25 columns here without polluting the base table.
CREATE TABLE videos (
    id                      BIGINT PRIMARY KEY REFERENCES content_items_base(id)
                            ON DELETE CASCADE,
    video_duration_seconds  INTEGER         NOT NULL,
    resolution              VARCHAR(10),
    codec                   VARCHAR(20),
    cdn_asset_key           VARCHAR(256)    NOT NULL,
    transcoding_status      VARCHAR(20)     NOT NULL DEFAULT 'pending',
    chapter_count           SMALLINT,
    has_captions            BOOLEAN         NOT NULL DEFAULT FALSE,
    -- ... 17 more columns that would have DESTROYED an STI table's NULL ratio
    storage_bytes           BIGINT
);

-- Podcast subtype.
CREATE TABLE podcasts (
    id                      BIGINT PRIMARY KEY REFERENCES content_items_base(id)
                            ON DELETE CASCADE,
    audio_duration_seconds  INTEGER         NOT NULL,
    episode_number          SMALLINT,
    season_number           SMALLINT,
    audio_cdn_key           VARCHAR(256)    NOT NULL,
    transcript_url          VARCHAR(512)
);

-- Index on base table for polymorphic queries.
CREATE INDEX idx_content_base_author_status
    ON content_items_base (author_id, status, published_at DESC)
    WHERE status = 'published';


-- Polymorphic query in CTI β€” note the LEFT JOINs.
-- This is the trade-off: it's two lines longer than the STI version
-- and touches 4 tables instead of 1. But zero NULLs, clean schema.
SELECT
    b.id,
    b.content_type,
    b.title,
    b.slug,
    b.published_at,
    b.view_count,
    -- Coalesce across subtypes to get duration regardless of type.
    COALESCE(
        v.video_duration_seconds,
        p.audio_duration_seconds
    )                                   AS duration_seconds,
    a.reading_time_minutes
FROM content_items_base b
    LEFT JOIN articles  a ON a.id = b.id
    LEFT JOIN videos    v ON v.id = b.id
    LEFT JOIN podcasts  p ON p.id = b.id
WHERE
    b.author_id   = 42
    AND b.status  = 'published'
ORDER BY b.published_at DESC
LIMIT 20;


-- Subtype-specific query in CTI β€” INNER JOIN, not LEFT JOIN.
-- Slightly more explicit than STI but the video table is CLEAN.
-- No NULLs from article or podcast columns.
SELECT
    b.id,
    b.title,
    v.video_duration_seconds,
    v.transcoding_status,
    v.has_captions,
    v.storage_bytes
FROM content_items_base b
    INNER JOIN videos v ON v.id = b.id
WHERE
    b.status                = 'published'
    AND v.transcoding_status = 'complete'
ORDER BY b.published_at DESC;
β–Ά Output
-- Polymorphic CTI query result:
id | content_type | title | slug | published_at | view_count | duration_seconds | reading_time_minutes
-----+--------------+---------------------------+------------------------+-------------------------+------------+------------------+---------------------
201 | article | Redis Eviction Strategies | redis-eviction | 2024-11-03 09:15:00+00 | 14823 | NULL | 9
198 | video | gRPC vs REST in 2024 | grpc-vs-rest-2024 | 2024-10-28 14:00:00+00 | 8341 | 2847 | NULL
195 | podcast | Monolith to Microservices | monolith-microservices | 2024-10-20 07:30:00+00 | 3102 | 3600 | NULL
(3 rows)

-- Subtype-specific CTI query result:
id | title | video_duration_seconds | transcoding_status | has_captions | storage_bytes
-----+----------------------+------------------------+--------------------+--------------+---------------
198 | gRPC vs REST in 2024 | 2847 | complete | t | 2147483648
(1 row)
πŸ”₯
Interview Gold: The JOIN Cost Is Real But BoundedWhen a senior interviewer pushes back on CTI's JOIN penalty, the right answer is: 'It's a PK-to-PK equijoin on indexed columns β€” the planner executes it as a nested loop with a single index probe per row. At 10M rows, a CTI polymorphic query with 3 LEFT JOINs typically runs in 8-15ms with warm cache, versus 3-5ms for equivalent STI. That 10ms is usually irrelevant. What's not irrelevant is the ALTER TABLE lock you avoid when the video team needs 6 new columns.' Trade-offs, not absolutes.
Feature / AspectSingle Table Inheritance (STI)Class Table Inheritance (CTI)Concrete Table Inheritance
Polymorphic query costSingle table scan β€” fastestN-way LEFT JOIN β€” moderateUNION ALL across N tables β€” slowest
NULL column densityHigh when subtypes divergeZero β€” subtype columns isolatedZero β€” fully independent tables
Schema migrationsDangerous above 5M rows β€” table lock riskSubtype table isolated β€” low blast radiusFully isolated β€” zero cross-subtype impact
ORM supportNative in Rails, Hibernate, EF β€” trivial setupSupported but requires mapping configOften manual β€” ORM abstractions break down
Ideal subtype overlap ratioAbove 60% shared columns30-60% shared columnsBelow 30% β€” subtypes are truly different things
Index efficiencyDegrades with NULL density β€” partial indexes requiredFull selectivity β€” standard indexes work cleanlyFull selectivity per table
Cross-subtype sorting/paginationTrivial β€” single ORDER BYPossible via JOIN on base tableRequires UNION + application-level merge
Add new subtypeAdd columns to one table β€” risky at scaleAdd a new child table β€” zero impact on othersAdd a new fully independent table β€” zero impact
Subtype-specific constraints (NOT NULL)Impossible β€” columns must be nullable for other typesEnforced at child table level β€” cleanEnforced per table β€” clean
Best real-world fitUser roles, CMS content (low column divergence)E-commerce products, payment methods (moderate divergence)Events with different retention policies, multi-tenant schemas

🎯 Key Takeaways

  • The 60% rule: if fewer than 60% of your STI table's columns are shared by ALL subtypes, you don't have an inheritance problem β€” you have a normalisation problem. Use Class Table Inheritance.
  • STI's superpower is polymorphic queries β€” a single ORDER BY created_at across all subtypes without a UNION. If you're never querying across subtypes together, you're getting none of the benefit and all of the NULL sprawl.
  • The exact moment to migrate off STI: when your slowest ALTER TABLE would take longer than your deployment window. At that point the table has become a shared mutable state liability that couples every subtype's evolution. Start the extraction before you hit that wall, not after.
  • ORMs that make STI trivial to set up are lying to you by omission. The pattern looks clean in code because the ORM hides the NULL columns. Open your STI table in a raw SQL client and look at what's actually stored. That's the honest view of your schema.

⚠ Common Mistakes to Avoid

  • βœ•Mistake 1: Adding a NOT NULL column to a live STI table in production β€” Symptom: writers queue up, deploys hang, and you see 'ERROR: canceling statement due to conflict with recovery' on replicas β€” Fix: always add the column as nullable first, backfill in batches with UPDATE ... WHERE id BETWEEN x AND y AND new_col IS NULL, then promote to NOT NULL using a deferred constraint check (PG12+: add CHECK constraint first with NOT VALID, then VALIDATE CONSTRAINT in a separate transaction to avoid full lock)
  • βœ•Mistake 2: Relying on the ORM discriminator filter without a database-level index on the type column β€” Symptom: EXPLAIN ANALYZE shows Seq Scan on content_items with rows=4800000 on a supposedly filtered query, query time spikes from 2ms to 4 seconds under load β€” Fix: CREATE INDEX CONCURRENTLY idx_tablename_type ON tablename (type); β€” the CONCURRENTLY flag is non-negotiable on a live table, it builds the index without locking writes
  • βœ•Mistake 3: Using STI when subtypes have mutually exclusive NOT NULL requirements β€” Symptom: you can't enforce card_last_four IS NOT NULL for credit card rows without a trigger or application-level check, and eventually a bug inserts a credit card payment with NULL card data that silently passes validation β€” Fix: if your business rules require NOT NULL at the DB level for subtype-specific columns, you've already outgrown STI β€” migrate to Class Table Inheritance where each child table enforces its own constraints cleanly
  • βœ•Mistake 4: Growing an STI table beyond 3 subtypes without re-evaluating overlap ratio β€” Symptom: a schema audit 18 months in reveals the overlap ratio has drifted from 68% to 29% as new subtypes were added, query plans degrade silently, and the team only notices during a Black Friday traffic spike β€” Fix: track column count and NULL density in CI using a schema linting script that fails the build if any STI table's overlap ratio drops below 0.55

Interview Questions on This Topic

  • QYou have an STI table with 12M rows and the product team wants to add a NOT NULL column with a default value for one of four subtypes. Walk me through exactly how you handle that migration without a production outage on PostgreSQL.
  • QWhen would you choose Class Table Inheritance over Single Table Inheritance for a product catalogue with 8 product types, knowing that the editorial dashboard needs to display all product types in a single feed sorted by created_at?
  • QYour STI table's polymorphic index is being ignored by the query planner on 40% of queries β€” EXPLAIN ANALYZE shows a sequential scan despite a composite index on (type, status, created_at). What's causing it and what's your fix?

Frequently Asked Questions

Does single table inheritance cause performance problems at scale?

Yes, but only when the NULL density gets high enough to break index selectivity β€” typically when your subtype-specific columns outnumber shared columns. The concrete threshold is a null_frac above 0.35 in pg_stats for any indexed column, at which point PostgreSQL's query planner starts skipping the index for sequential scans. Fix it with partial indexes (WHERE type = 'x' AND column IS NOT NULL) and monitor NULL density as you add subtypes. At 70%+ shared column overlap, STI performs identically to a single-subtype table.

What's the difference between single table inheritance and class table inheritance?

STI puts every subtype's columns into one table with NULLs for irrelevant rows; CTI puts shared columns in a base table and subtype-specific columns in child tables joined by FK. Use STI when subtypes share more than 60% of columns and you need frequent cross-type queries. Use CTI when subtypes diverge significantly in their columns or when you need NOT NULL constraints on subtype-specific fields β€” something STI can't enforce at the database level.

How do I add a new subtype to an existing STI table in production without downtime?

Add all new columns as nullable first using ALTER TABLE ... ADD COLUMN new_col VARCHAR(50) β€” on PostgreSQL 11+ this is a metadata-only operation and completes instantly. Then backfill existing rows in batches: UPDATE content_items SET new_col = 'default_val' WHERE type = 'new_subtype' AND id BETWEEN :start AND :end. Only after the backfill is complete and verified should you add any NOT NULL constraint, and even then use ADD CONSTRAINT ... CHECK (type != 'new_subtype' OR new_col IS NOT NULL) NOT VALID followed by a separate VALIDATE CONSTRAINT to avoid holding a lock during validation.

Can you enforce database-level NOT NULL constraints on subtype-specific columns in a single table inheritance schema?

Not directly β€” a column in an STI table must be nullable because other subtypes don't populate it, making a blanket NOT NULL constraint impossible. The closest you can get is a conditional CHECK constraint: ALTER TABLE payments ADD CONSTRAINT chk_credit_card_required CHECK (type != 'credit_card' OR card_last_four IS NOT NULL). This enforces the business rule at the DB level without affecting other subtypes. But maintaining a growing set of these constraints across 6+ subtypes is a clear signal you've outgrown STI and need Class Table Inheritance, where each child table enforces its own clean NOT NULL constraints.

πŸ”₯
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousPolyglot Persistence
Forged with πŸ”₯ at TheCodeForge.io β€” Where Developers Are Forged