Skip to content
Home Database Materialized Views SQL - Refresh Order Bug Doubles Revenue

Materialized Views SQL - Refresh Order Bug Doubles Revenue

Where developers are forged. · Structured learning · Free forever.
📍 Part of: SQL Advanced → Topic 13 of 16
Two chained materialized views refreshed in the wrong order.
🔥 Advanced — solid Database foundation required
In this tutorial, you'll learn
Two chained materialized views refreshed in the wrong order.
  • A materialized view is a real table on disk — index it like one. Without indexes, you've pre-computed the answer but you're still reading every row to find it.
  • REFRESH MATERIALIZED VIEW CONCURRENT is almost always the right choice in production, but it needs a UNIQUE INDEX to work — create that index at the same time you create the view, not as an afterthought.
  • Chained materialized views must be refreshed in strict dependency order — the database won't enforce this for you, and wrong order means stale-on-stale data with no errors or warnings.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • Materialized views store query results as physical tables on disk
  • Refresh strategies: plain lock, concurrent (no-read-block), incremental (Oracle)
  • Always build indexes after creation — unindexed mat views are half the solution
  • Staleness is the core trade-off: accept lag for speed; monitor last_refresh
  • Biggest mistake: forgetting to refresh chained mat views in dependency order
🚨 START HERE

Materialized View Quick Debug Cheat Sheet

Five common materialized view failures and the commands to fix them immediately.
🟡

View is empty

Immediate ActionCheck if created WITH DATA or not
Commands
SELECT last_refresh FROM pg_matviews WHERE matviewname = 'your_mv';
REFRESH MATERIALIZED VIEW your_mv;
Fix NowAdd WITH DATA to the CREATE statement if rebuilding.
🟡

Concurrent refresh fails

Immediate ActionCheck for unique index
Commands
SELECT indexname FROM pg_indexes WHERE tablename = 'your_mv';
CREATE UNIQUE INDEX idx_mv_pk ON your_mv (pk_col1, pk_col2);
Fix NowAdd required unique index then retry CONCURRENTLY.
🟡

View returns old data after schema change

Immediate ActionForce refresh to detect breakage
Commands
REFRESH MATERIALIZED VIEW your_mv;
If error: roll back migration or fix column references.
Fix NowAdd a deployment step to always REFRESH all dependent mat views after any schema change.
🟡

Chained views show inconsistent data

Immediate ActionIdentify dependency order by inspecting definitions
Commands
SELECT pg_get_viewdef('your_mv', true) AS definition;
Cross-reference with other mat views in information_schema.views.
Fix NowWrite a stored procedure that refreshes leaves first, then dependents, within a transaction.
🟡

Last refresh is hours stale despite scheduled job

Immediate ActionCheck if the scheduler job is still active and runnable
Commands
SELECT * FROM cron.job WHERE jobname LIKE '%refresh%';
SELECT * FROM cron.job_run_details WHERE jobname LIKE '%refresh%' ORDER BY start_time DESC;
Fix NowRestart the cron schedule job and set up alerting on last_refresh being older than 2x the expected interval.
Production Incident

The Dashboard That Showed Tomorrow's Revenue

A finance team viewed revenue projections that were one day ahead — new orders appeared before they even existed in the base table.
SymptomExecutive dashboard showed revenue numbers for the current date that exceeded all known sales channels. At 10 AM, the dashboard already displayed a daily total that was physically impossible given the actual order volume.
AssumptionThe materialized view refresh job ran once per hour at :15 past the hour. The data must be fresh within that window.
Root causeTwo materialized views were chained: mv_raw_orders -> mv_revenue_summary. The cron job refreshed mv_revenue_summary first (the dependent view) and mv_raw_orders second. Because mv_revenue_summary read stale data from mv_raw_orders, the refresh accidentally carried over data from the previous refresh cycle, duplicating rows from two days ago. A bug in the refresh script also failed to wrap the two refreshes in a single transaction, so when the second refresh failed due to a transient deadlock, the first one had already committed — leaving the dashboard with an incomplete, merged dataset.
FixMap out the dependency graph, rearrange the refresh order to start at leaves, and wrap the chain in a stored procedure with transactional boundaries. Also added a data_age check on the dashboard that surfaces 'Data as of X minutes ago' to users.
Key Lesson
Chained materialized views require strict topological refresh order — refresh leaves first, then dependents.Always run chained refreshes inside a single transaction or stored procedure to prevent partial updates.Expose data freshness to end users to prevent silent decision-making on stale data.
Production Debug Guide

Symptom → Action guide for the most common production failures

Materialized view query returns 0 rows but base table has dataCheck if the mat view was created WITH NO DATA. Run REFRESH MATERIALIZED VIEW to populate it. Also verify that the defining query is correct — temporary tables or dropped aliases can cause empty results.
Dashboard shows data that stopped updating days agoCheck pg_matviews.last_refresh. If null or very old, the scheduled refresh job probably crashed silently. Look in pg_cron or external scheduler logs for errors.
REFRESH MATERIALIZED VIEW CONCURRENTLY fails with 'must have unique index'Create a UNIQUE index on the materialized view. Use a composite index on all columns that form the natural key. Without it, concurrent refresh cannot diff the rows.
Queries on the materialized view are slow despite precomputationYou forgot indexes on the mat view. Run EXPLAIN ANALYZE — if it shows a Sequential Scan on a large mat view, create B-tree indexes on your most common filter columns.
After a schema migration, materialized view returns old data with no errorPostgreSQL does not invalidate mat views on schema changes. Run REFRESH immediately after migration. If the refresh fails, the column rename or drop broke the defining query — roll back the migration.

Every production database eventually hits the same wall: a handful of queries that are crushingly expensive, run constantly, and touch the same underlying data. Business intelligence dashboards, leaderboards, financial summaries, multi-table aggregations — these queries can take seconds or even minutes on large datasets, and they get fired dozens of times per minute. That latency doesn't just slow your app down; it locks rows, spikes CPU, and can cascade into a full outage under load.

Materialized views exist to break that cycle. Unlike a regular (virtual) view — which is just a saved SQL statement that re-executes every time you query it — a materialized view physically stores the result set on disk. Query it and the database reads pre-computed rows instead of re-joining six tables and aggregating millions of records. The trade-off is freshness: the stored result can grow stale between refreshes. Understanding how to manage that trade-off is what separates engineers who use materialized views effectively from those who introduce subtle, hard-to-debug data inconsistencies into production.

By the end of this article you'll understand how materialized view storage and indexing works under the hood, how to choose between complete, fast, and incremental refresh strategies, when staleness is acceptable versus dangerous, and the exact production mistakes that bite teams who don't read the fine print. We'll use PostgreSQL for most examples (the most nuanced implementation) and call out Oracle and SQL Server differences where they matter.

How Materialized Views Actually Store Data (And Why That Changes Everything)

A regular view is essentially a macro — the database swaps in its definition at parse time and executes it fresh every time. No storage. No state. A materialized view is a real table on disk that happens to know how it was built.

In PostgreSQL, when you run CREATE MATERIALIZED VIEW, the engine executes the defining query immediately and writes the result rows into a heap file — the same physical storage format used by ordinary tables. From that point on, querying the materialized view is indistinguishable from querying a table. You get sequential scans, bitmap index scans, all of it.

Because it's a real table, you can — and should — build indexes on it. This is the feature most developers miss. If your materialized view aggregates sales by region and date, you can create a B-tree index on (region, sale_date) so point lookups stay sub-millisecond even if the materialized view contains tens of millions of rows.

The defining query isn't re-evaluated on every SELECT. It's only re-evaluated when you explicitly REFRESH MATERIALIZED VIEW. This means the planner can't 'look through' the materialized view to push predicates down into the base tables — it just scans the stored rows. That's usually a win, but it's something to keep in mind when your filter selectivity is very high.

create_sales_materialized_view.sql · SQL
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
-- Assume we have these base tables in a production analytics schema:
-- orders(order_id, customer_id, product_id, region, order_date, amount)
-- products(product_id, category, unit_cost)

-- This query is expensive: it joins two large tables, aggregates, and
-- gets called by the dashboard on every page load (~200 req/min).
-- Without a materialized view it takes ~4 seconds. Unacceptable.

CREATE MATERIALIZED VIEW mv_regional_sales_summary AS
SELECT
    o.region,
    p.category                          AS product_category,
    DATE_TRUNC('month', o.order_date)   AS sale_month,
    COUNT(o.order_id)                   AS total_orders,
    SUM(o.amount)                       AS total_revenue,
    SUM(o.amount - p.unit_cost)         AS total_profit,
    AVG(o.amount)                       AS avg_order_value
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY
    o.region,
    p.category,
    DATE_TRUNC('month', o.order_date)
ORDER BY sale_month DESC, total_revenue DESC
WITH DATA;  -- Execute the query NOW and store results immediately.
            -- Use WITH NO DATA to create the structure without populating it.

-- Now build indexes on the most common filter/join patterns.
-- Without this step you've only solved half the problem.
CREATE INDEX idx_mv_sales_region_month
    ON mv_regional_sales_summary (region, sale_month);

CREATE INDEX idx_mv_sales_category
    ON mv_regional_sales_summary (product_category);

-- Verify what was stored — should return rows immediately at near-zero cost
EXPLAIN ANALYZE
SELECT region, total_revenue
FROM mv_regional_sales_summary
WHERE region = 'EMEA'
  AND sale_month >= '2024-01-01';

-- Check the physical size of the materialized view on disk
SELECT
    pg_size_pretty(pg_total_relation_size('mv_regional_sales_summary')) AS total_size,
    pg_size_pretty(pg_relation_size('mv_regional_sales_summary'))       AS table_size,
    pg_size_pretty(
        pg_total_relation_size('mv_regional_sales_summary')
        - pg_relation_size('mv_regional_sales_summary')
    )                                                                   AS index_size;
▶ Output
-- EXPLAIN ANALYZE output (trimmed for clarity):
-- Index Scan using idx_mv_sales_region_month on mv_regional_sales_summary
-- (cost=0.28..8.30 rows=6 width=52)
-- (actual time=0.041..0.063 rows=6 loops=1)
-- Planning Time: 0.18 ms
-- Execution Time: 0.12 ms <-- was 4,200 ms before. That's a 35,000x improvement.

-- Size query output:
-- total_size | table_size | index_size
-- ------------+------------+------------
-- 2184 kB | 1424 kB | 760 kB
💡Pro Tip: Always Index Your Materialized Views
A materialized view without indexes is like a pre-computed answer written on loose paper stuffed in a filing cabinet — you still have to rifle through the whole pile. Add indexes matching your most common WHERE and JOIN predicates immediately after creation. For a dashboard filtered by date range, a BRIN index on the date column is often 10x smaller than a B-tree and nearly as fast for sequential date scans on append-only data.
📊 Production Insight
If you create a materialized view and forget indexes, queries that filter on a column will do a full sequential scan of the mat view heap file. On a 50-million-row mat view, that's seconds per query — defeating the entire purpose.
Production teams often notice the issue when a dashboard that was fast for the first week suddenly slows down because data volume grew. The fix: rebuild indexes periodically or add them at creation.
Rule: create the UNIQUE index for CONCURRENTLY refresh at the same time as your performance indexes.
🎯 Key Takeaway
A materialized view is a real table on disk — index it like one.
Without indexes, you've pre-computed the answer but you're still reading every row to find it.
Always create performance indexes and a UNIQUE index immediately after creation.

Refresh Strategies: Complete, Concurrent and Incremental — Choosing the Right One

Refreshing a materialized view is where most of the complexity lives. You're essentially answering: how stale can this data be, and how much disruption can I tolerate during the refresh itself?

PostgreSQL gives you two flavours of REFRESH. A plain REFRESH MATERIALIZED VIEW takes an exclusive lock — no reads allowed during the swap. It drops the old data, re-executes the full defining query, and writes new rows. On a small view this is fine. On a view with 50 million rows, you can lock out your dashboard for 10 minutes.

REFRESH MATERIALIZED VIEW CONCURRENTLY is the production-safe alternative. It re-computes the full query behind the scenes, diffs it against the current content using a unique index, and swaps only the changed rows — all while allowing concurrent reads. The cost: it takes roughly twice as long and requires a unique index on the materialized view.

Oracle goes further with incremental (fast) refresh, where the engine tracks base-table changes in materialised view logs and applies only the delta — no full re-scan. This is dramatically faster for large datasets with small change volumes, but it comes with strict restrictions on what SQL you can use in the defining query. SQL Server's indexed views are the closest equivalent and they refresh synchronously on every DML statement — zero staleness, but DML overhead you need to account for.

refresh_strategies_demo.sql · SQL
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768
-- ============================================================
-- STRATEGY 1: Plain refresh — simple but locks out readers
-- Use this for: non-critical views, off-hours batch jobs,
--               views that are fast to recompute (< 1 second)
-- ============================================================
REFRESH MATERIALIZED VIEW mv_regional_sales_summary;
-- This acquires AccessExclusiveLock. Concurrent SELECTs will WAIT.
-- Safe for batch jobs at 2am. Not safe during business hours.


-- ============================================================
-- STRATEGY 2: Concurrent refresh — production-safe
-- Requirement: the materialized view MUST have a unique index.
-- ============================================================

-- First, add the required unique index. We create a surrogate
-- key by combining our natural key columns.
CREATE UNIQUE INDEX idx_mv_sales_unique_key
    ON mv_regional_sales_summary (region, product_category, sale_month);
-- ^^^ Without this, CONCURRENT refresh will throw:
-- ERROR: cannot refresh materialized view concurrently without a unique index

-- Now the safe, concurrent refresh:
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_regional_sales_summary;
-- PostgreSQL re-runs the full query in the background, compares
-- results, and performs targeted INSERT/UPDATE/DELETE operations.
-- Readers are NEVER blocked. Writers on the base tables are not blocked.
-- This takes ~2x longer but causes zero downtime.


-- ============================================================
-- STRATEGY 3: Automate refresh with pg_cron (or a scheduler)
-- Refresh every 15 minutes during business hours
-- ============================================================

-- Install pg_cron extension (requires superuser, done once)
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Schedule concurrent refresh every 15 minutes, 8am-8pm weekdays
SELECT cron.schedule(
    'refresh-sales-summary',          -- job name
    '*/15 8-20 * * 1-5',              -- cron expression
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY mv_regional_sales_summary$$
);

-- Check scheduled jobs
SELECT jobname, schedule, command, active
FROM cron.job
WHERE jobname = 'refresh-sales-summary';


-- ============================================================
-- MONITORING: How stale is our materialized view right now?
-- ============================================================
SELECT
    schemaname,
    matviewname,
    -- last_refresh is NULL if the view was never refreshed
    -- (i.e., created WITH NO DATA)
    last_refresh,
    NOW() - last_refresh AS data_age,
    CASE
        WHEN NOW() - last_refresh > INTERVAL '30 minutes'
        THEN 'STALE — consider refreshing'
        ELSE 'Fresh enough'
    END AS freshness_status
FROM pg_matviews
WHERE matviewname = 'mv_regional_sales_summary';
▶ Output
-- After scheduling the cron job:
-- jobname | schedule | command | active
-- --------------------------+-----------------+--------------------------------------------+--------
-- refresh-sales-summary | */15 8-20 * * 1-5 | REFRESH MATERIALIZED VIEW CONCURRENTLY ... | t

-- Freshness check output (run 22 minutes after last refresh):
-- schemaname | matviewname | last_refresh | data_age | freshness_status
-- ------------+---------------------------+-----------------------------+------------------+----------------------------
-- public | mv_regional_sales_summary | 2024-06-15 14:15:03.421+00 | 00:22:17.843291 | STALE — consider refreshing
⚠ Watch Out: CONCURRENT Refresh Doubles Your I/O
REFRESH MATERIALIZED VIEW CONCURRENTLY re-runs the full defining query AND does a diff against existing data. On a view with 10M rows this can briefly spike disk I/O and CPU significantly more than a plain refresh. Monitor pg_stat_activity and pg_locks during your first few concurrent refreshes in staging before assuming it's 'free'. If the refresh window is acceptable and you can tolerate a brief read lock, plain refresh is often faster overall.
📊 Production Insight
The biggest surprise for teams: CONCURRENTLY refresh is not free. It doubles CPU and I/O temporarily. On a shared production database, a CONCURRENTLY refresh at peak hour can degrade other queries.
We saw a case where a 50M-row mat view refresh caused a 30-second I/O spike that slowed all dashboard queries. The fix: schedule the refresh during a known low-usage window, or switch to a plain refresh if a 10-second lock was acceptable.
Rule: test refresh performance in staging with production-like data volume before committing to a strategy.
🎯 Key Takeaway
CONCURRENTLY refresh is the production-safe choice but costs more resources.
Assess refresh duration and lock tolerance before deciding.
Always create a UNIQUE index even if you start with plain refresh — you'll need it later.
Choosing a Refresh Strategy
IfCan you tolerate a read lock for the refresh duration?
UseYes -> Plain REFRESH (simpler, faster). No -> CONCURRENTLY.
IfDoes the mat view need a UNIQUE index?
UseFor CONCURRENTLY, yes. Otherwise, optional for performance.
IfIs the base table change volume very high?
UseConsider Oracle incremental refresh or summary table with triggers.
IfNeed real-time accuracy (sub-second)?
UseDon't use mat views. Use synchronous summary table or application cache.

Production Gotchas: Staleness, Dependency Chains and Schema Drift

Using materialized views in production means living with three unavoidable tensions: staleness, fragile dependency chains, and schema evolution.

Staleness is obvious but its consequences are subtle. A dashboard showing last-hour totals from a stale materialized view can mislead a sales team into over-selling inventory. Worse, if two systems read from different materialized views that refresh on different schedules, you can get internally inconsistent reports — your revenue total doesn't match your order count total. Always document the refresh cadence alongside every materialized view and surface data_age to end users when it exceeds a threshold.

Dependency chains are the silent killer. Materialized view B built on materialized view A doesn't automatically refresh when A refreshes. You must refresh them in topological order — A first, then B. PostgreSQL won't warn you. You'll get stale-on-stale data silently.

Schema drift happens when someone alters a base table column — renames it, changes its type, drops it — and the materialized view quietly breaks. Unlike regular views, PostgreSQL doesn't invalidate a materialized view when the base table changes (it will error at refresh time, not at query time). Your dashboards will keep returning stale data from the last successful refresh with no error, until the next refresh crashes.

production_dependency_and_staleness.sql · SQL
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889
-- ============================================================
-- SCENARIO: Chained materialized views (A -> B)
-- mv_raw_order_metrics depends on base tables
-- mv_executive_dashboard depends on mv_raw_order_metrics
-- ============================================================

CREATE MATERIALIZED VIEW mv_raw_order_metrics AS
SELECT
    customer_id,
    DATE_TRUNC('day', order_date)  AS order_day,
    COUNT(*)                        AS daily_order_count,
    SUM(amount)                     AS daily_spend
FROM orders
GROUP BY customer_id, DATE_TRUNC('day', order_date)
WITH DATA;

CREATE UNIQUE INDEX idx_raw_metrics_unique
    ON mv_raw_order_metrics (customer_id, order_day);

-- This view depends on the one above, NOT on the base table directly
CREATE MATERIALIZED VIEW mv_executive_dashboard AS
SELECT
    order_day,
    COUNT(DISTINCT customer_id)     AS active_customers,
    SUM(daily_order_count)          AS platform_orders,
    SUM(daily_spend)                AS platform_revenue,
    AVG(daily_spend)                AS avg_customer_spend
FROM mv_raw_order_metrics          -- <-- reading from another mat view!
GROUP BY order_day
ORDER BY order_day DESC
WITH DATA;

-- WRONG: refreshing in the wrong order gives stale-on-stale data
-- REFRESH MATERIALIZED VIEW CONCURRENTLY mv_executive_dashboard;
-- REFRESH MATERIALIZED VIEW CONCURRENTLY mv_raw_order_metrics;
-- ^^^ This refreshes the exec dashboard BEFORE the raw metrics are updated.
-- The exec dashboard will reflect yesterday's raw metrics for another cycle.

-- CORRECT: always refresh in dependency order (leaves first)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_raw_order_metrics;
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_executive_dashboard;
-- Now the exec dashboard reflects today's raw data.

-- ============================================================
-- DEFENSIVE PATTERN: Wrap chained refreshes in a transaction
-- with error handling using a stored procedure
-- ============================================================
CREATE OR REPLACE PROCEDURE refresh_dashboard_views()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Refresh in strict dependency order
    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_raw_order_metrics;
    RAISE NOTICE 'mv_raw_order_metrics refreshed at %', NOW();

    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_executive_dashboard;
    RAISE NOTICE 'mv_executive_dashboard refreshed at %', NOW();

EXCEPTION
    WHEN OTHERS THEN
        -- Log the error but don't crash silently.
        -- In production you'd INSERT into an error_log table here.
        RAISE WARNING 'Dashboard refresh failed: % — %', SQLSTATE, SQLERRM;
        -- The materialized views retain their last successful data.
        -- Stale data is better than no data for a dashboard.
END;
$$;

-- Schedule this procedure instead of raw REFRESH commands
SELECT cron.schedule(
    'refresh-dashboard-chain',
    '*/15 8-20 * * 1-5',
    'CALL refresh_dashboard_views()'
);

-- ============================================================
-- DETECTING SCHEMA DRIFT BEFORE IT SILENTLY CORRUPTS REPORTS
-- Run this check as part of your deployment pipeline
-- ============================================================
SELECT
    mv.matviewname,
    mv.definition,
    -- If the view references columns that no longer exist, this
    -- will surface as an error when you try to refresh.
    -- Pre-validate by attempting a dry-run EXPLAIN:
    'EXPLAIN (FORMAT TEXT) SELECT * FROM ' || mv.matviewname || ' LIMIT 0' AS validation_query
FROM pg_matviews mv
WHERE mv.schemaname = 'public'
ORDER BY mv.matviewname;
▶ Output
-- After calling CALL refresh_dashboard_views():
-- NOTICE: mv_raw_order_metrics refreshed at 2024-06-15 14:30:01.123456+00
-- NOTICE: mv_executive_dashboard refreshed at 2024-06-15 14:30:04.987654+00

-- If a base table column was dropped and refresh is attempted:
-- WARNING: Dashboard refresh failed: 42703 —
-- column "amount" of relation "orders" does not exist
-- (The materialized views retain their last valid data — no silent data loss)
⚠ Watch Out: Schema Changes Don't Invalidate Materialized Views at Query Time
If you DROP or RENAME a column that a materialized view depends on, PostgreSQL won't error when you SELECT from the view — it'll happily return the stale cached data from before the schema change. The error only surfaces on the next REFRESH. This means you can silently serve wrong data for an entire refresh cycle. Add a post-migration step to your deployment runbook: always REFRESH all dependent materialized views immediately after any schema change, even if it means a brief lock. If the refresh fails, you know immediately and can roll back the migration.
📊 Production Insight
At a previous company, a DBA renamed a column in the orders table from 'amount' to 'total_amount'. The materialized view mv_order_summary continued to return correct-looking data for three days because it had not been refreshed since the migration. The exec team approved a budget based on those numbers. When the weekly refresh finally ran, it crashed with a column-not-found error. The damage was done.
The fix: add a CI pipeline step that runs EXPLAIN on every mat view definition against the current schema before deployment. If the EXPLAIN fails, the migration is blocked.
Rule: schema changes on base tables = immediate refresh of all dependent mat views, or block the deployment.
🎯 Key Takeaway
Schema drift is silent — mat views keep returning old data until the next refresh fails.
Always pair schema migrations with a forced mat view refresh.
Add automated checks to detect column changes before they reach production.

When to Use Materialized Views vs. Regular Views, Caching Layers and Summary Tables

Materialized views aren't always the right tool. Making that call correctly is what distinguishes senior database work from cargo-culting.

Use a regular view when the underlying query is fast (under ~100ms), the data must be absolutely real-time, or you need the planner to push predicates into the base tables. Regular views compose well and have zero refresh overhead.

Use a materialized view when: the query is expensive and runs frequently, a few minutes (or hours) of staleness is acceptable to the business, and you need indexes on the result set. The sweet spot is read-heavy analytics against tables that are written to in batch (nightly ETL, hourly imports). It's a terrible fit for financial ledgers or inventory counts where users will make decisions based on the number.

Use an application-level cache (Redis, Memcached) when your query result is tiny, needs sub-millisecond reads, or needs to be shared across database clusters. A materialized view doesn't help if the bottleneck is network round-trips to the DB.

Use a dedicated summary table with manual INSERT/UPDATE logic when you need incremental updates that materialized view refresh can't support — like a running total that must be accurate to the last second. It's more code, but it's more control.

mv_vs_alternatives_decision.sql · SQL
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677
-- ============================================================
-- SCENARIO: Compare the same business question answered three ways.
-- Question: What is the total lifetime spend per customer?
-- Base table: orders (500 million rows, growing 1M rows/day)
-- ============================================================

-- OPTION 1: Regular View — always fresh, always slow on 500M rows
CREATE OR REPLACE VIEW vw_customer_lifetime_spend AS
SELECT
    customer_id,
    COUNT(order_id)     AS total_orders,
    SUM(amount)         AS lifetime_spend,
    MIN(order_date)     AS first_order_date,
    MAX(order_date)     AS most_recent_order
FROM orders
GROUP BY customer_id;
-- Querying this for a single customer does a full 500M-row aggregation.
-- EXPLAIN shows: HashAggregate -> Seq Scan on orders (~12 seconds)

-- OPTION 2: Materialized View — refreshed nightly, fast reads
CREATE MATERIALIZED VIEW mv_customer_lifetime_spend AS
SELECT
    customer_id,
    COUNT(order_id)     AS total_orders,
    SUM(amount)         AS lifetime_spend,
    MIN(order_date)     AS first_order_date,
    MAX(order_date)     AS most_recent_order
FROM orders
GROUP BY customer_id
WITH DATA;

CREATE UNIQUE INDEX idx_mv_clv_customer
    ON mv_customer_lifetime_spend (customer_id);
-- Now a point lookup for one customer: ~0.1ms. Nightly full refresh: ~45 seconds.
-- Staleness: up to 24 hours. Acceptable for a CRM 'about this customer' panel.
-- NOT acceptable for a real-time fraud check.

-- OPTION 3: Running summary table — always accurate, incremental updates
-- Best when you need real-time accuracy without the full-scan cost.
CREATE TABLE customer_spend_summary (
    customer_id     BIGINT PRIMARY KEY,
    total_orders    BIGINT      NOT NULL DEFAULT 0,
    lifetime_spend  NUMERIC(14,2) NOT NULL DEFAULT 0,
    first_order_date DATE,
    last_order_date  DATE,
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Trigger that maintains the summary table on every new order
CREATE OR REPLACE FUNCTION fn_update_customer_spend_summary()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO customer_spend_summary
        (customer_id, total_orders, lifetime_spend, first_order_date, last_order_date, updated_at)
    VALUES
        (NEW.customer_id, 1, NEW.amount, NEW.order_date, NEW.order_date, NOW())
    ON CONFLICT (customer_id) DO UPDATE SET
        total_orders     = customer_spend_summary.total_orders + 1,
        lifetime_spend   = customer_spend_summary.lifetime_spend + NEW.amount,
        -- LEAST/GREATEST keep the min/max without scanning historical data
        first_order_date = LEAST(customer_spend_summary.first_order_date, NEW.order_date),
        last_order_date  = GREATEST(customer_spend_summary.last_order_date, NEW.order_date),
        updated_at       = NOW();
    RETURN NEW;
END;
$$;

CREATE TRIGGER trg_orders_update_spend_summary
    AFTER INSERT ON orders
    FOR EACH ROW
    EXECUTE FUNCTION fn_update_customer_spend_summary();

-- Result: point lookups in ~0.1ms AND data is accurate to the last millisecond.
-- Cost: every INSERT into orders now also writes to customer_spend_summary.
-- On 1M orders/day: ~12ms extra latency per insert batch. Usually acceptable.
▶ Output
-- Benchmark comparison for: SELECT * FROM <source> WHERE customer_id = 99821
--
-- vw_customer_lifetime_spend (regular view): 11,847 ms (full table scan)
-- mv_customer_lifetime_spend (mat view, stale): 0.09 ms (index lookup, up to 24h stale)
-- customer_spend_summary (summary table): 0.07 ms (primary key lookup, real-time)
--
-- Mat view nightly REFRESH duration: 44,231 ms (73 seconds on 500M rows)
-- Summary table trigger overhead per INSERT: ~12 ms (microseconds amortized)
🔥Interview Gold: The Real-Time vs. Staleness Trade-off
Interviewers love asking 'why not just use a materialized view for everything?' The answer is that a materialized view is a read-optimisation that trades staleness for speed. If the business requirement is 'accurate to the last second', a materialized view is the wrong tool — use a summary table with triggers, or an incremental aggregation system like a streaming pipeline. Always start by quantifying how stale is acceptable before reaching for any solution.
📊 Production Insight
We once replaced a materialized view for an inventory dashboard with a trigger-maintained summary table because the sales team needed real-time stock levels. The mat view refreshed every 15 minutes, but the business lost money on oversold items during flash sales. The summary table added 10ms per insert but eliminated overselling entirely.
Trade-off acknowledged: 10ms extra write latency was acceptable for 100% accurate reads.
Rule: quantify staleness tolerance in hours before choosing your tool. If tolerance is < 1 minute, don't use a materialized view.
🎯 Key Takeaway
Materialized views are read-optimisation tools that accept staleness.
If the business requires real-time accuracy, use a summary table with triggers or streaming.
Always start with the question: 'How stale is acceptable?'
Decision Tree: Regular View vs Materialized View vs Summary Table
IfQuery returns data in <100ms on base tables?
UseUse a regular view — no storage overhead, always fresh.
IfQuery is slow but staleness of minutes/hours is acceptable?
UseMaterialized view — fast reads, scheduled refresh.
IfNeed real-time accuracy with fast reads?
UseSummary table with triggers — incremental, always fresh, write penalty.
IfResult set is tiny and shared across app servers?
UseApplication cache (Redis/Memcached) — sub-millisecond reads, no DB load.

Monitoring Materialized View Freshness and Performance in Production

You can't trust a materialized view unless you know how fresh it is and how healthy its refresh process is. PostgreSQL provides the pg_matviews system catalog that exposes last_refresh — but that's just the start.

  1. Freshness lag: query NOW() - last_refresh and alert when it exceeds 2x the expected refresh interval. A stale view with a broken job looks perfectly healthy to the application — queries return instantly, but the data is wrong.
  2. Refresh duration: track how long each REFRESH takes. A sudden increase often means the base table has grown significantly, the defining query has regressed, or the mat view indexes are bloated.
  3. Refresh failures: catch errors from refresh commands. A failed refresh means the mat view keeps its last successful data — users see old data with no error. This is the most dangerous class of failure.

For Oracle, use DBA_MVIEWS and DBA_MVIEW_REFRESH_TIMES. For SQL Server, the sys.objects and sys.dm_db_index_physical_stats provide similar insights.

Advanced technique: create a heartbeat table that gets updated every time a critical mat view is refreshed. The monitoring system checks the heartbeat instead of querying system tables — simpler and cross-database.

monitoring_mat_view_health.sql · SQL
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384
-- ============================================================
-- MONITORING: PostgreSQL mat view health check
-- Run every minute from a monitoring tool (cron, pg_cron, or external)
-- ============================================================

-- Signal 1: Freshness lag
SELECT
    matviewname,
    last_refresh,
    EXTRACT(EPOCH FROM (NOW() - last_refresh)) / 60 AS minutes_since_refresh,
    CASE
        WHEN last_refresh IS NULL THEN 'NEVER_REFRESHED'
        WHEN NOW() - last_refresh > INTERVAL '30 minutes' THEN 'STALE'
        ELSE 'FRESH'
    END AS status
FROM pg_matviews
WHERE schemaname = 'public';

-- Signal 2: Refresh duration (if you log start/end to a custom table)
CREATE TABLE mv_refresh_log (
    id SERIAL PRIMARY KEY,
    matview_name TEXT NOT NULL,
    started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    finished_at TIMESTAMPTZ,
    duration_ms INTEGER,
    success BOOLEAN,
    error_message TEXT
);

-- Wrap refresh with logging (example for one mat view)
INSERT INTO mv_refresh_log (matview_name, started_at)
VALUES ('mv_regional_sales_summary', NOW());

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_regional_sales_summary;

UPDATE mv_refresh_log
SET finished_at = NOW(),
    duration_ms = EXTRACT(EPOCH FROM (NOW() - started_at)) * 1000,
    success = TRUE
WHERE id = currval('mv_refresh_log_id_seq');

-- Query recent refresh performance
SELECT
    matview_name,
    AVG(duration_ms)::INT AS avg_duration_ms,
    MAX(duration_ms) AS max_duration_ms,
    COUNT(*) AS total_refreshes,
    SUM(CASE WHEN success = FALSE THEN 1 ELSE 0 END) AS failures
FROM mv_refresh_log
WHERE started_at > NOW() - INTERVAL '7 days'
GROUP BY matview_name
ORDER BY avg_duration_ms DESC;

-- Signal 3: Alert on failures
SELECT matview_name, started_at, error_message
FROM mv_refresh_log
WHERE success = FALSE AND started_at > NOW() - INTERVAL '1 hour';

-- ============================================================
-- Heartbeat table approach (cross-database portable)
-- ============================================================
CREATE TABLE mv_heartbeat (
    matview_name TEXT PRIMARY KEY,
    last_refresh_ok TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expected_refresh_interval INTERVAL NOT NULL
);

-- Update heartbeat after every successful refresh
UPDATE mv_heartbeat
SET last_refresh_ok = NOW()
WHERE matview_name = 'mv_regional_sales_summary';

-- Monitoring query
SELECT
    matview_name,
    last_refresh_ok,
    NOW() - last_refresh_ok AS age,
    expected_refresh_interval,
    CASE
        WHEN NOW() - last_refresh_ok > expected_refresh_interval * 2
        THEN 'ALERT: stale'
        ELSE 'OK'
    END AS status
FROM mv_heartbeat;
▶ Output
-- Hypothetical monitoring output:
-- matview_name | minutes_since_refresh | status
-- -----------------------------+-----------------------+--------
-- mv_regional_sales_summary | 22 | FRESH
-- mv_executive_dashboard | 480 | STALE <-- job likely died

-- Refresh log:
-- matview_name | avg_duration_ms | max_duration_ms | failures
-- --------------------------+-----------------+-----------------+---------
-- mv_regional_sales_summary | 42100 | 45800 | 0
-- mv_executive_dashboard | 78400 | 89120 | 3
💡Best Practice: Expose data_age to Users
Don't hide staleness — surface it. Add a small badge on your dashboard that says 'Data as of 14:22 — 8 minutes ago'. Users who see this will trust the data and know when to hit refresh or wait for the next cycle. It also gives you a natural alert signal: if the badge shows 'Data as of 3 hours ago' when it should be minutes, everyone knows something is broken.
📊 Production Insight
We set up a PagerDuty alert when last_refresh exceeded 60 minutes for a mat view that refreshed every 15. The first week we got ten alerts — all from legitimate issues: the cron extension was not installed, the server had run out of disk space during refresh, and a DBA dropped a column without telling anyone. Each alert prevented a potential misinformed business decision.
Cost: one hour to set up the monitoring. Benefit: saved at least one major financial reporting error per quarter.
Rule: monitor freshness as if the data were perishable food — because in decision-making, it is.
🎯 Key Takeaway
A mat view that returns results instantly can be silently serving stale data.
Monitor last_refresh, refresh duration, and failures.
Surface data_age to end users — transparency builds trust and catches problems faster.
🗂 Choosing Between Regular Views, Materialized Views, and Summary Tables
Key trade-offs for production decision-making
AspectRegular ViewMaterialized ViewSummary Table (Trigger-Maintained)
Data freshnessAlways real-timeStale until next REFRESHReal-time (per-row trigger)
Read performanceAs slow as the defining queryAs fast as a table + indexAs fast as a table + index
Write overheadNoneNone on DML; burst cost on REFRESHSmall overhead on every DML
Storage costZero (no data stored)Full result set on diskFull result set on disk
Supports indexesNo (it's a virtual layer)Yes — build any index you needYes — it is a real table
Refresh mechanismAutomatic (every query)Manual or scheduled REFRESHAutomatic via trigger or app logic
Handles aggregate updatesRecomputes from scratchRecomputes from scratchIncremental delta only
Schema change resilienceErrors immediately on queryErrors only at next REFRESHErrors immediately on INSERT
Concurrent reads during refreshAlways allowedBlocked (plain) / Allowed (CONCURRENT)Always allowed
Best forSimple aliasing, security viewsRead-heavy analytics, dashboardsReal-time counters, running totals

🎯 Key Takeaways

  • A materialized view is a real table on disk — index it like one. Without indexes, you've pre-computed the answer but you're still reading every row to find it.
  • REFRESH MATERIALIZED VIEW CONCURRENT is almost always the right choice in production, but it needs a UNIQUE INDEX to work — create that index at the same time you create the view, not as an afterthought.
  • Chained materialized views must be refreshed in strict dependency order — the database won't enforce this for you, and wrong order means stale-on-stale data with no errors or warnings.
  • Staleness is the core trade-off: materialized views are wrong for real-time financial or inventory data, perfect for analytical aggregations where a 15-minute lag is business-acceptable. Always get that SLA in writing before choosing your tool.
  • Monitor mat view freshness as rigorously as you monitor application health. Stale data that looks correct is more dangerous than a query that fails.

⚠ Common Mistakes to Avoid

    Using plain REFRESH MATERIALIZED VIEW during business hours
    Symptom

    Dashboard timeouts and connection pile-ups during refresh windows. The AccessExclusiveLock blocks all reads on the view for the entire duration of the refresh, which can be minutes on large views.

    Fix

    Always use REFRESH MATERIALIZED VIEW CONCURRENTLY in production. Create a UNIQUE INDEX on the view first (it's required). Accept the ~2x refresh duration — it's worth every second.

    Forgetting to refresh chained materialized views in dependency order
    Symptom

    No error, no warning, but exec dashboard shows wrong numbers. If view B is built on view A, refreshing B before A gives you a view built on yesterday's A.

    Fix

    Map out your materialized view dependency graph, store it in documentation, and codify the correct refresh order in a stored procedure or orchestration DAG (Airflow, dbt). Never issue REFRESH commands ad-hoc in production.

    Treating a materialized view as equivalent to a cache and never checking its age
    Symptom

    Reports show flat lines, impossible totals, or data that stops updating. The mat view with a broken refresh job looks perfectly healthy to the application — queries return instantly but data is wrong.

    Fix

    Add a freshness check to your application layer — query pg_matviews.last_refresh and surface a 'data as of X minutes ago' badge to users. Alert on PagerDuty if last_refresh is older than 2× the expected refresh interval.

Interview Questions on This Topic

  • QWhat's the difference between a regular view and a materialized view, and how would you decide which one to use for a dashboard that's queried 500 times per minute?Mid-levelReveal
    A regular view is a saved SQL definition that re-executes the query each time it's accessed — no storage, always fresh. A materialized view stores the query result as a physical table on disk — fast reads, but data can become stale until it's refreshed. For a dashboard hit 500 times per minute, you'd use a materialized view if the underlying query is at all expensive (e.g., multi-table joins, aggregations) and a few minutes of staleness is acceptable. Add indexes on the mat view for the filter columns the dashboard uses, and schedule a concurrent refresh every 1-5 minutes. If the data must be real-time, you'd use a trigger-maintained summary table instead, accepting the write overhead.
  • QExplain how REFRESH MATERIALIZED VIEW CONCURRENTLY works internally. Why does it require a unique index, and what are the trade-offs compared to a plain refresh?SeniorReveal
    Internally, CONCURRENTLY runs the full defining query to produce a new result set, then diffs it against the current stored rows using the unique index. It performs INSERT, UPDATE, and DELETE operations to apply only the changes — without taking an AccessExclusiveLock. The unique index is required because the diff algorithm needs to match rows between the old and new snapshots by their key. Trade-offs: CONCURRENTLY takes about twice as long because it does both the full query and the diff, and it temporarily doubles disk usage for the new snapshot. However, readers are never blocked. Use plain REFRESH if you can tolerate a short lock (e.g., during maintenance windows) — it's faster and simpler.
  • QYou have a materialized view built on top of another materialized view. A critical bug report comes in saying the executive dashboard shows revenue numbers that are one day behind, even though the refresh jobs have been running. What are the possible causes and how do you diagnose them?SeniorReveal
    Possible causes: (1) The refresh jobs run in incorrect order — the dependent view refreshes before the source view. (2) One of the refresh jobs is failing silently — check cron logs or pg_cron run details. (3) The source view's refresh job is running but taking longer than the interval, causing overlapping runs that get skipped. (4) Schema drift: a column was dropped or renamed on a base table, but the error only surfaces on the next refresh of the dependent view. Diagnosis steps: Check pg_matviews.last_refresh for both views to see when each was last successfully refreshed. Check the refresh job logs for errors. Run EXPLAIN on the defining queries of both views to verify base table references still resolve. If the dependent view was refreshed but the source wasn't, fix the order and add transactional wrapping. If both show recent refreshes but the dependent is still stale, check if the source view's refresh actually changed data — it might have been a no-op due to a transient error.

Frequently Asked Questions

Do materialized views update automatically when the underlying table changes?

No — in PostgreSQL and most databases, materialized views are NOT automatically updated when base table data changes. You must explicitly run REFRESH MATERIALIZED VIEW (or schedule it). The only exception is SQL Server's indexed views, which update synchronously on every DML statement, at the cost of increased write overhead.

Can you add indexes to a materialized view in PostgreSQL?

Yes, and you absolutely should. A PostgreSQL materialized view is stored as a regular heap table, so you can CREATE INDEX on any column just like a normal table. You also need at least one UNIQUE index if you want to use REFRESH MATERIALIZED VIEW CONCURRENTLY. Add indexes immediately after creation — they won't be created automatically.

What happens to a materialized view if I drop or rename a column in the base table?

In PostgreSQL, the materialized view keeps returning its last cached data without any error at query time — it just reads the stored rows, which still exist. The error only surfaces on the next REFRESH MATERIALIZED VIEW call. This means you can silently serve stale, incorrect data until the next scheduled refresh. Always refresh all dependent materialized views immediately after any schema migration to detect breakage right away.

How can I create a materialized view that refreshes automatically in PostgreSQL?

PostgreSQL does not have a built-in automatic refresh mechanism. You must use an external scheduler or the pg_cron extension. With pg_cron, you can schedule a function that runs REFRESH MATERIALIZED VIEW CONCURRENTLY at your desired interval. Other options include using Linux cron, application-level schedulers (like a background thread in your app), or orchestration tools like Airflow or dbt.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousPartitioning Tables in SQLNext →Recursive SQL Queries
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged