PostgreSQL Read Replicas: Internals, Setup & Production Gotchas
At a certain scale, your PostgreSQL primary database starts groaning under the weight of thousands of simultaneous SELECT queries. Analytics dashboards, reporting jobs, search endpoints — they all hammer the same box that's also trying to handle your writes. The result is lock contention, CPU saturation, and latency spikes that affect paying users. This isn't a theoretical problem: it's a scaling wall that nearly every successful Postgres-backed product eventually hits, and read replicas are the standard first answer.
Read replicas solve this by exploiting a fundamental truth about most web applications: roughly 80-90% of database operations are reads. By shipping every write from your primary to one or more standby servers in near-real-time using PostgreSQL's Write-Ahead Log (WAL), you can spread that read load horizontally without any application-level sharding complexity. The primary stays focused on writes; replicas absorb reads.
By the end of this article you'll understand exactly how WAL-based streaming replication works under the hood, how to set up a production-grade replica from scratch, how to route queries intelligently between primary and replica, and — critically — how to avoid the replication lag traps that have caused production incidents at companies you've heard of. This is the article that gets you from 'I know replicas exist' to 'I can architect and operate them confidently.'
How PostgreSQL Streaming Replication Actually Works Under the Hood
Every change in PostgreSQL — an INSERT, UPDATE, DELETE, even a schema change — is first written to the Write-Ahead Log (WAL) before it touches the actual data files. This isn't just for crash recovery; it's the backbone of replication.
When you enable streaming replication, the replica connects to the primary and starts a persistent TCP connection via the replication protocol. The primary runs a walsender process; the replica runs a walreceiver process. As new WAL records are generated on the primary, walsender streams them across the wire in near-real-time. The walreceiver writes them to the replica's own WAL, and then a separate startup process (in recovery mode) replays those WAL records against the replica's data files.
This replay is sequential and single-threaded by design — WAL records encode physical page changes, so they must be applied in exact order. This is why replicas have a hard ceiling on replay throughput: if your primary generates WAL faster than the replica can replay it, you accumulate replication lag.
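To make "lag in bytes" concrete, here is a small self-contained Python sketch of the arithmetic that PostgreSQL's server-side `pg_wal_lsn_diff()` performs. The helper names (`lsn_to_bytes`, `lag_bytes`) are invented for illustration: an LSN like `0/A1234F00` is a 64-bit byte position in the WAL, written as two hex halves separated by a slash.

```python
# An LSN such as '0/A1234F00' encodes a 64-bit WAL byte position:
# the part before the slash is the high 32 bits, the part after is
# the low 32 bits, both in hexadecimal.
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN into an absolute byte offset in the WAL."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Raw replication lag: WAL bytes sent by the primary but not yet replayed."""
    return lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)

# Using the sample LSNs that appear in the monitoring output below:
print(lag_bytes("0/A1234F00", "0/A1234D80"))  # → 384
```

In real code you would let the server do this with `pg_wal_lsn_diff()`; the point is only that "lag" at this level is a plain byte distance between two WAL positions, with no notion of time.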
Critically, the replica runs in a permanent 'hot standby' state — it can serve read queries while simultaneously replaying WAL. PostgreSQL manages the resulting conflicts carefully: if WAL replay needs to apply a change that invalidates something a running replica query depends on (most commonly a VACUUM record removing row versions the query can still see), the query is cancelled after a grace period with ERROR: canceling statement due to conflict with recovery. This is one of the most misunderstood behaviours in PostgreSQL replication.
```sql
-- ============================================================
-- Run this on your PRIMARY to inspect all connected replicas.
-- pg_stat_replication is the single most important view for
-- understanding the health of your replication setup.
-- ============================================================
SELECT
    client_addr,       -- IP address of the replica
    application_name,  -- Name set by the replica (useful when you have many)
    state,             -- 'streaming' is healthy; 'catchup' means it's behind
    sent_lsn,          -- Last WAL position sent from primary
    write_lsn,         -- Last WAL position written to disk on replica
    flush_lsn,         -- Last WAL position flushed (durable) on replica
    replay_lsn,        -- Last WAL position actually applied to replica data files
    -- Calculate lag in bytes — this is raw distance, not time
    pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS replication_lag_bytes,
    -- Lag as human-readable time estimates:
    -- write_lag: network delay; flush_lag: disk delay; replay_lag: apply delay
    write_lag,
    flush_lag,
    replay_lag,
    sync_state         -- 'async' or 'sync' — see the comparison table below
FROM pg_stat_replication
ORDER BY replay_lag DESC NULLS LAST;  -- Put the most-lagged replica first

-- ============================================================
-- Run this on your REPLICA to check its own recovery state.
-- pg_is_in_recovery() = true means this server IS a replica.
-- ============================================================
SELECT
    pg_is_in_recovery() AS is_replica,
    pg_last_wal_receive_lsn() AS last_wal_received,
    pg_last_wal_replay_lsn() AS last_wal_replayed,
    -- Lag in bytes between what was received and what was replayed
    pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())
        AS replay_queue_bytes,
    pg_last_xact_replay_timestamp() AS last_transaction_replayed_at,
    -- How many seconds behind is this replica right now?
    EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
```
```text
 client_addr | application_name |   state   | replication_lag_bytes |    write_lag    |    flush_lag    |   replay_lag    | sync_state
-------------+------------------+-----------+-----------------------+-----------------+-----------------+-----------------+------------
 10.0.1.22   | replica-us-east  | streaming | 0 bytes               | 00:00:00.001823 | 00:00:00.002104 | 00:00:00.018431 | async
 10.0.1.45   | replica-us-west  | streaming | 16384 bytes           | 00:00:00.002011 | 00:00:00.003982 | 00:00:00.312044 | async

-- On REPLICA (pg_last_* functions):
 is_replica | last_wal_received | last_wal_replayed | replay_queue_bytes | last_transaction_replayed_at | lag_seconds
------------+-------------------+-------------------+--------------------+------------------------------+-------------
 t          | 0/A1234F00        | 0/A1234D80        | 384                | 2024-11-14 09:23:41.882+00   | 0.31
```
Setting Up a Production-Grade Streaming Replica from Scratch
Setting up a replica isn't just running one command — there's a specific sequence of steps and each one has a production-relevant reason. We'll walk through it properly.
First, the primary needs the right configuration. wal_level must be replica or logical (not minimal), max_wal_senders must be greater than zero (set it to at least the number of replicas plus 2 for headroom), and wal_keep_size (PostgreSQL 13+) or wal_keep_segments ensures WAL files aren't deleted before the replica has consumed them — critical during network interruptions.
Next, you create a dedicated replication user on the primary. Never use a superuser for replication; the replication role only needs the REPLICATION attribute, which limits what it can do if compromised.
The initial data copy uses pg_basebackup — it takes a consistent snapshot of the primary's data directory while the database stays online. This is your replica's starting point. Then you add a standby.signal file and configure postgresql.conf on the replica to point at the primary. The replica boots, connects, requests WAL from where the base backup ended, and streams from there.
PG 12 changed recovery configuration significantly: recovery.conf was absorbed into postgresql.conf. If you're on PG 12+ (you should be), use primary_conninfo in postgresql.conf and create an empty standby.signal file — not recovery.conf.
```bash
#!/bin/bash
# ============================================================
# STEP 1: Configure the PRIMARY server (run these on primary)
# ============================================================

# Edit postgresql.conf on the primary — these are the minimal
# required settings for streaming replication.
cat >> /etc/postgresql/16/main/postgresql.conf << 'EOF'
# --- Replication settings ---
wal_level = replica          # Enables WAL content needed for replication
max_wal_senders = 5          # Max simultaneous replica connections (+2 buffer)
wal_keep_size = 1024         # Keep 1GB of WAL on disk for lagging replicas
                             # Prevents "requested WAL segment has already been removed"
hot_standby = on             # Allow replicas to serve read queries (default on in PG 10+)

# Optional but important for monitoring:
track_commit_timestamp = on  # Needed for pg_last_committed_xact() on replica
EOF

# Create a dedicated replication user — minimal privileges, dedicated password
psql -U postgres -c "
CREATE ROLE replication_user WITH
    REPLICATION
    LOGIN
    PASSWORD 'use-a-real-secret-manager-in-production';
"

# Allow the replica's IP to connect for replication in pg_hba.conf
# Replace 10.0.1.22/32 with your actual replica IP
echo "host replication replication_user 10.0.1.22/32 scram-sha-256" \
    >> /etc/postgresql/16/main/pg_hba.conf

# Restart to apply. Note: wal_level and max_wal_senders changes require
# a full restart — a reload is only enough for the pg_hba.conf change.
pg_ctlcluster 16 main restart

# ============================================================
# STEP 2: Take a base backup (run on the REPLICA server)
# ============================================================
# pg_basebackup copies the primary's data directory live.
# -R (--write-recovery-conf) automatically writes primary_conninfo into
# postgresql.auto.conf AND creates the standby.signal file — huge time saver.
# -P shows progress. -Xs (--wal-method=stream) streams WAL during the
# backup itself so no WAL is missed between backup start and end.
pg_basebackup \
    --host=10.0.1.10 \
    --port=5432 \
    --username=replication_user \
    --pgdata=/var/lib/postgresql/16/main \
    --wal-method=stream \
    --write-recovery-conf \
    --checkpoint=fast \
    --progress \
    --verbose

# Output during backup:
# pg_basebackup: initiating base backup, waiting for checkpoint to complete
# pg_basebackup: checkpoint completed
# pg_basebackup: write-ahead log start point: 0/3000028 on timeline 1
# pg_basebackup: starting background WAL receiver
# 30521/30521 kB (100%), 1/1 tablespace
# pg_basebackup: write-ahead log end point: 0/3000100
# pg_basebackup: syncing data to disk ...
# pg_basebackup: renaming backup_manifest.tmp to backup_manifest
# pg_basebackup: base backup completed

# ============================================================
# STEP 3: Verify the replica auto-configured correctly
# (pg_basebackup -R does this, but let's see what it wrote)
# ============================================================
grep -A5 'primary_conninfo' /var/lib/postgresql/16/main/postgresql.conf

# Expected output:
# primary_conninfo = 'host=10.0.1.10 port=5432 user=replication_user \
#     password=use-a-real-secret-manager-in-production \
#     application_name=replica-us-east'

ls -la /var/lib/postgresql/16/main/standby.signal

# Expected output:
# -rw------- 1 postgres postgres 0 Nov 14 09:15 standby.signal
# An empty file — its mere presence tells PostgreSQL "I am a replica"

# ============================================================
# STEP 4: Start the replica
# ============================================================
pg_ctlcluster 16 main start

# Tail the logs to confirm it connected and started streaming:
tail -f /var/log/postgresql/postgresql-16-main.log

# Healthy startup output:
# LOG:  entering standby mode
# LOG:  redo starts at 0/3000028
# LOG:  consistent recovery state reached at 0/3000100
# LOG:  database system is ready to accept read-only connections
# LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
```
```bash
# On PRIMARY — confirm the replica is connected and streaming:
psql -U postgres -c "SELECT client_addr, state, replay_lag FROM pg_stat_replication;"

#  client_addr |   state   | replay_lag
# -------------+-----------+--------------
#  10.0.1.22   | streaming | 00:00:00.021

# On REPLICA — confirm it's in recovery and serving reads:
psql -U postgres -c "SELECT pg_is_in_recovery(), pg_last_xact_replay_timestamp();"

#  pg_is_in_recovery | pg_last_xact_replay_timestamp
# -------------------+-------------------------------
#  t                 | 2024-11-14 09:23:55.123+00
```
Routing Reads to Replicas: Connection Pooling, Drivers and Lag-Aware Logic
Having a replica running is only half the job. The harder part is routing traffic correctly — and doing it in a way that doesn't silently corrupt your application's data consistency guarantees.
The naive approach is splitting reads and writes at the application level: your ORM sends SELECT to a read DSN and everything else to a write DSN. This works, but it breaks for any pattern where you write and immediately read your own write — the replica may not have replayed that WAL yet, so you read stale data. Classic example: user registers, you redirect to their profile page, the profile page reads from the replica, and it 404s because the replica hasn't replayed the INSERT yet.
There are three increasingly sophisticated solutions: (1) route critical post-write reads to the primary for a short window, (2) use synchronous commit for specific transactions, and (3) track the LSN of your write and wait for the replica to replay past it before reading from it.
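As a rough illustration of option (1), here is a hypothetical sketch of a per-session router that pins reads to the primary for a short window after each write. `StickyPrimaryRouter` and its API are invented names for this article, and the window is a heuristic, not a guarantee: if replication lag ever exceeds `pin_seconds`, you are back to stale reads, which is why the LSN-based option (3) is the robust choice.

```python
import time

class StickyPrimaryRouter:
    """Sketch of solution (1): after a session writes, route that
    session's reads to the primary for a fixed window, assuming
    replication lag rarely exceeds it."""

    def __init__(self, pin_seconds: float = 2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock                       # injectable for testing
        self._last_write: dict[str, float] = {}  # session_id -> write timestamp

    def record_write(self, session_id: str) -> None:
        """Call this right after any write made by this session."""
        self._last_write[session_id] = self.clock()

    def target_for_read(self, session_id: str) -> str:
        """Return which server this session's read should go to."""
        wrote_at = self._last_write.get(session_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return "primary"  # too soon after a write; replica may be stale
        return "replica"
```

The appeal is simplicity: no extra round-trips, no LSN bookkeeping. The cost is that every pinned read hits the primary even when the replica was already caught up.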
For connection management, PgBouncer sits in front of your databases and provides connection pooling — essential because each PostgreSQL connection consumes ~5-10MB of RAM and a backend process. For read/write splitting with automatic failover awareness, tools like HAProxy with custom health checks, or purpose-built solutions like RDS Proxy or pgpool-II, are common in production.
The cleanest modern approach for application-level routing is the driver's built-in target-server selection. libpq-based drivers such as psycopg3 accept target_session_attrs=read-write or prefer-standby directly in the connection string; asyncpg supports the same target_session_attrs parameter, and the JDBC driver offers the equivalent via targetServerType.
```python
import logging
from contextlib import contextmanager
from typing import Generator

import psycopg
from psycopg.rows import dict_row

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# ============================================================
# Connection strings for primary and replica.
# In production, pull these from environment variables or
# a secrets manager — never hardcode credentials.
# ============================================================
PRIMARY_DSN = (
    "host=10.0.1.10 "
    "port=5432 "
    "dbname=app_production "
    "user=app_user "
    "password=secret "
    "target_session_attrs=read-write"  # Guarantees we connect to writable primary
)

REPLICA_DSN = (
    "host=10.0.1.22,10.0.1.45 "  # List multiple replicas — driver picks one
    "port=5432 "
    "dbname=app_production "
    "user=app_user "
    "password=secret "
    "target_session_attrs=prefer-standby"  # Prefer replica, fall back to primary
)


@contextmanager
def get_primary_connection() -> Generator[psycopg.Connection, None, None]:
    """Use for all writes and reads where consistency is critical."""
    with psycopg.connect(PRIMARY_DSN, row_factory=dict_row) as conn:
        yield conn


@contextmanager
def get_replica_connection() -> Generator[psycopg.Connection, None, None]:
    """Use for reads that can tolerate a small amount of lag."""
    with psycopg.connect(REPLICA_DSN, row_factory=dict_row) as conn:
        yield conn


def get_replica_lag_seconds(replica_conn: psycopg.Connection) -> float:
    """
    Check how many seconds behind this replica is before trusting
    it for a time-sensitive read. Returns 0.0 if on primary.
    """
    with replica_conn.cursor() as cursor:
        cursor.execute("""
            SELECT CASE
                WHEN pg_is_in_recovery() THEN
                    EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
                ELSE 0  -- We're actually on the primary, no lag
            END AS lag_seconds;
        """)
        result = cursor.fetchone()
        return float(result['lag_seconds'] or 0)


def create_user_and_fetch_profile(username: str, email: str) -> dict:
    """
    Demonstrates the write-then-read consistency problem.

    WRONG pattern: write to primary, immediately read from replica.
    The replica may not have replayed the INSERT yet.

    RIGHT pattern: write to primary, confirm WAL position, then
    either read from primary or wait for replica to catch up.
    """
    created_user_id = None
    write_lsn = None

    # --- WRITE: Always goes to primary ---
    with get_primary_connection() as primary_conn:
        with primary_conn.cursor() as cursor:
            cursor.execute(
                """
                INSERT INTO users (username, email, created_at)
                VALUES (%s, %s, now())
                RETURNING id;
                """,
                (username, email)
            )
            created_user_id = cursor.fetchone()['id']
            primary_conn.commit()

            # Capture the LSN immediately after commit.
            # This is the WAL position the replica must reach
            # before it will have our new user.
            cursor.execute("SELECT pg_current_wal_lsn() AS current_lsn;")
            write_lsn = cursor.fetchone()['current_lsn']
            logger.info(f"User {created_user_id} created. WAL LSN at write: {write_lsn}")

    # --- READ: Use replica, but verify it's caught up past our write ---
    with get_replica_connection() as replica_conn:
        with replica_conn.cursor() as cursor:
            # Wait for the replica to replay past our write LSN.
            # pg_wal_replay_wait() is a procedure, so invoke it with CALL,
            # as the first statement at the default READ COMMITTED level.
            # Timeout of 5 seconds is a production-reasonable ceiling.
            cursor.execute(
                "CALL pg_wal_replay_wait(%s, 5000);",
                (write_lsn,)
                # NOTE: pg_wal_replay_wait is PostgreSQL 17+.
                # For older versions, use a polling loop with
                # pg_last_wal_replay_lsn() and pg_wal_lsn_diff().
            )

            lag_after_wait = get_replica_lag_seconds(replica_conn)
            logger.info(f"Replica lag after wait: {lag_after_wait:.3f}s")

            # Now safe to read — replica has replayed our write
            cursor.execute(
                """
                SELECT id, username, email, created_at
                FROM users
                WHERE id = %s;
                """,
                (created_user_id,)
            )
            user_profile = cursor.fetchone()

    return dict(user_profile)


# ============================================================
# ANALYTICS QUERIES — these are perfect replica candidates:
# large scans, aggregate reports, no freshness requirement.
# ============================================================
def get_daily_signups_report(days_back: int = 30) -> list[dict]:
    """Heavy aggregation — ideal for replica to absorb."""
    with get_replica_connection() as conn:
        lag = get_replica_lag_seconds(conn)
        if lag > 60:
            # If replica is more than 60s behind, warn but continue —
            # for a daily report, 60s stale data is acceptable.
            logger.warning(f"Replica is {lag:.1f}s behind. Report may be slightly stale.")

        with conn.cursor() as cursor:
            cursor.execute(
                """
                SELECT
                    date_trunc('day', created_at) AS signup_date,
                    COUNT(*) AS new_users
                FROM users
                WHERE created_at >= now() - make_interval(days => %s)
                GROUP BY 1
                ORDER BY 1 DESC;
                """,
                (days_back,)
            )
            return [dict(row) for row in cursor.fetchall()]


if __name__ == "__main__":
    # Demonstrate lag-safe write-then-read
    profile = create_user_and_fetch_profile("alice_dev", "alice@example.com")
    print(f"Profile fetched from replica: {profile}")

    # Demonstrate analytics query routed to replica
    report = get_daily_signups_report(days_back=7)
    print(f"Daily signups (last 7 days): {report}")
```
```text
INFO:replica_aware_database_client:Replica lag after wait: 0.019s
Profile fetched from replica: {'id': 4821, 'username': 'alice_dev', 'email': 'alice@example.com', 'created_at': datetime.datetime(2024, 11, 14, 9, 24, 1, 123456, tzinfo=datetime.timezone.utc)}
Daily signups (last 7 days): [
    {'signup_date': datetime.datetime(2024, 11, 14, ...), 'new_users': 312},
    {'signup_date': datetime.datetime(2024, 11, 13, ...), 'new_users': 298},
    {'signup_date': datetime.datetime(2024, 11, 12, ...), 'new_users': 341}
]
```
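The example above relies on pg_wal_replay_wait(), which only exists from PostgreSQL 17. On older versions you need the polling loop the code comment mentions. Below is a hedged, database-free sketch of that loop: `caught_up` stands in for a replica-side query such as `SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '<write_lsn>') >= 0`, and the `clock`/`sleep` parameters are injectable purely so the logic is testable. The function name is an invention for this article, not a library API.

```python
import time
from typing import Callable

def wait_for_replay(
    caught_up: Callable[[], bool],
    timeout_seconds: float = 5.0,
    poll_interval: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the replica reports it has replayed past our write LSN.

    caught_up() is expected to run, on the replica:
        SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), %(write_lsn)s) >= 0;
    Returns True once the replica catches up, False if the timeout expires.
    """
    deadline = clock() + timeout_seconds
    while True:
        if caught_up():
            return True
        if clock() >= deadline:
            return False  # caller decides: fall back to primary, or fail
        sleep(poll_interval)
```

On timeout, the sensible production fallback mirrors the driver-level behaviour: read from the primary rather than serving a stale result.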
Synchronous vs Asynchronous Replication — The Durability vs Performance Trade-off
By default, PostgreSQL replication is asynchronous: the primary commits a transaction and returns success to the client before confirming that any replica has received or replayed that WAL. This is fast (sub-millisecond commits), but it means that if the primary crashes at exactly the wrong moment and you fail over, the last few transactions that committed on the primary may never have reached the replica. From the new primary's perspective, that data is simply gone.
Synchronous replication changes this guarantee: the primary won't return success to the client until at least one designated synchronous replica has written (or flushed) the WAL. This gives you zero-RPO (Recovery Point Objective) — no committed transaction is ever lost. The cost is latency: every commit now waits for a network round-trip to the replica. On a LAN this might add 1-2ms. Cross-datacenter, it could add 20-80ms per commit — devastating for write-heavy OLTP workloads.
The synchronous_commit parameter is the key flexibility here: it's configurable per-session and even per-transaction. You don't have to choose globally; you can require synchronous durability for financial transactions while using async for event logging, in the same application.
The synchronous_standby_names setting controls which replicas participate in synchronous commits, and it supports powerful quorum syntax: ANY 2 (replica1, replica2, replica3) means the primary waits for any 2 of 3 replicas — a quorum approach that tolerates one replica being down without stalling all writes.
One sharp edge: if synchronous_standby_names names a replica that disconnects, the primary will stall ALL commits indefinitely. This has taken down production systems. The fix is monitoring replica connectivity with aggressive alerting and having a runbook to disable synchronous commit if a sync replica fails.
```sql
-- ============================================================
-- On PRIMARY: configure which replicas must acknowledge commits.
-- The application_name in primary_conninfo must match these names.
-- ============================================================

-- Option A: Wait for ONE specific replica (simple, brittle)
-- ALTER SYSTEM SET synchronous_standby_names = 'replica-us-east';

-- Option B: Quorum — wait for ANY 1 of 2 replicas (resilient)
-- If one replica goes down, commits still succeed via the other.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica-us-east, replica-eu-west)';

SELECT pg_reload_conf();  -- Apply without restart

-- Verify the setting took effect:
SHOW synchronous_standby_names;
-- Result: ANY 1 (replica-us-east, replica-eu-west)

-- ============================================================
-- Per-transaction synchronous_commit control.
-- Use this to selectively require durability without
-- making every transaction pay the latency cost.
-- ============================================================

-- Example: financial transfer requires synchronous durability
BEGIN;
SET LOCAL synchronous_commit = on;  -- This transaction: wait for replica ACK

UPDATE accounts SET balance = balance - 500.00 WHERE account_id = 1001;
UPDATE accounts SET balance = balance + 500.00 WHERE account_id = 2004;
INSERT INTO transfer_log (from_account, to_account, amount, transferred_at)
VALUES (1001, 2004, 500.00, now());

COMMIT;
-- The COMMIT here will block until at least 1 of the 2 replicas acknowledges
-- the WAL write. Safe to tell the user "your transfer is confirmed."

-- Example: event tracking can tolerate some loss — use async for speed
BEGIN;
SET LOCAL synchronous_commit = off;  -- This transaction: don't wait for replica

INSERT INTO page_view_events (user_id, page_url, viewed_at)
VALUES (8821, '/products/widget', now());

COMMIT;
-- Returns immediately. In a crash scenario, this event might be lost.
-- That's an acceptable trade-off for non-critical analytics data.

-- ============================================================
-- Monitor synchronous replication ACK status in real time
-- ============================================================
SELECT
    application_name,
    sync_state,  -- 'sync', 'potential', 'quorum', or 'async'
    sent_lsn,
    flush_lsn,
    replay_lsn,
    -- With priority (FIRST) syntax: 'sync' is the active synchronous
    -- replica and 'potential' would take over if it fails.
    -- With quorum (ANY) syntax, all candidate members report 'quorum'.
    CASE sync_state
        WHEN 'sync'      THEN 'ACTIVE sync replica — commits wait for this one'
        WHEN 'potential' THEN 'STANDBY sync replica — takes over if sync fails'
        WHEN 'quorum'    THEN 'Quorum candidate — counts toward the ANY n requirement'
        WHEN 'async'     THEN 'Pure async — commits never wait for this one'
    END AS sync_role_description
FROM pg_stat_replication
ORDER BY sync_state;
```
```text
 synchronous_standby_names
------------------------------------------
 ANY 1 (replica-us-east, replica-eu-west)

-- pg_stat_replication sync states (ANY quorum syntax, so both report 'quorum'):
 application_name | sync_state | sync_role_description
------------------+------------+--------------------------------------------------------
 replica-us-east  | quorum     | Quorum candidate — counts toward the ANY n requirement
 replica-eu-west  | quorum     | Quorum candidate — counts toward the ANY n requirement
```
| Aspect | Asynchronous Replication | Synchronous Replication |
|---|---|---|
| Commit latency | Unaffected — returns before replica ACKs | Adds 1-80ms per commit (network round-trip to replica) |
| Data loss on primary crash (RPO) | Possible — last few transactions lost | Zero — no committed transaction is ever lost |
| Write throughput | Full primary throughput | Limited by slowest sync replica's throughput |
| Replica failure impact | None — primary continues normally | All writes stall until replica reconnects or config changes |
| Best for | Analytics replicas, reporting, read scaling | Financial data, compliance workloads, zero-RPO requirements |
| synchronous_commit setting | off / local | remote_write / on / remote_apply |
| Quorum support | N/A | ANY N (replica1, replica2, ...) syntax |
| Typical production usage | 90%+ of replica deployments | Used alongside async for mixed workloads |
🎯 Key Takeaways
- PostgreSQL streaming replication is WAL-based: every change is first written to WAL, then streamed to replicas via a persistent walsender/walreceiver TCP connection. Replay is sequential and single-threaded, which is your ceiling on how fast lag can be recovered.
- Replication lag has three distinct components — write_lag (network), flush_lag (disk on replica), replay_lag (WAL application). Each points to a different bottleneck. Monitoring all three in pg_stat_replication is non-negotiable for production operations.
- The write-then-read consistency trap is the #1 real-world bug caused by replicas. The correct solution is LSN-based read-after-write validation — pg_wal_replay_wait() in PG 17+, or a polling loop over pg_last_wal_replay_lsn() on older versions — not 'sleep for 100ms and hope' or always routing to primary.
- Synchronous replication with a single named standby is an availability trap: one replica failure stalls all primary writes indefinitely. Always use quorum syntax (ANY N of M) in synchronous_standby_names for any synchronous setup with production SLAs.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Not setting wal_keep_size high enough — Symptom: replica fails with 'requested WAL segment has already been removed' after any network hiccup or maintenance window, forcing a full pg_basebackup re-sync. Fix: Set wal_keep_size to at least 1-2GB, AND configure a WAL archive (archive_mode = on with archive_command pointing to S3 or similar) so replicas can fetch historical WAL from the archive instead of failing outright. WAL archiving is your safety net.
- ✕ Mistake 2: Reading from a replica immediately after a write without accounting for replication lag — Symptom: your application writes a record then immediately reads it back from the replica, getting a 'not found' result or stale data. From the user's perspective, their own changes disappear. Fix: Either use pg_wal_replay_wait() (PG 17+) to wait for the replica to replay past your write's LSN before reading, or route post-write reads to the primary for a short session window. Never assume replica freshness after a write without explicit verification.
- ✕ Mistake 3: Forgetting that long-running queries on hot standby get cancelled — Symptom: replica users see ERROR: canceling statement due to conflict with recovery sporadically and unpredictably, especially during heavy writes on the primary. Fix: Turn on hot_standby_feedback so the replica tells the primary 'don't vacuum rows my active queries are reading'. Also increase max_standby_streaming_delay and max_standby_archive_delay. Be aware that hot_standby_feedback can cause table bloat on the primary by preventing VACUUM from cleaning dead rows needed by replica queries — it's a trade-off you have to consciously accept.
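Tuning for Mistake 3 only reduces how often conflicts happen; a resilient application also handles the cancellation when it does occur. Here is a minimal, driver-agnostic sketch of that fallback. The function name and the string-match on the error message are assumptions for illustration — a real client would catch the driver's specific exception class (psycopg3, for instance, maps PostgreSQL SQLSTATEs to exception types you can match precisely).

```python
from typing import Any, Callable, Sequence

def read_with_conflict_fallback(
    query: str,
    params: Sequence[Any],
    replica_execute: Callable[[str, Sequence[Any]], Any],
    primary_execute: Callable[[str, Sequence[Any]], Any],
) -> Any:
    """Run a read on the replica; if recovery cancels it, retry once on the primary."""
    try:
        return replica_execute(query, params)
    except Exception as exc:
        # Recovery conflicts carry this message text. Matching on it keeps the
        # sketch driver-agnostic; in production, catch the driver's dedicated
        # exception class instead of bare Exception.
        if "conflict with recovery" in str(exc):
            return primary_execute(query, params)
        raise
```

The retry goes to the primary (not the same replica) because the conflict will simply recur there until replay moves on; the primary always has the current row versions.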
Interview Questions on This Topic
- Q: Walk me through what happens at the byte level when a transaction commits on a PostgreSQL primary and how that change propagates to a streaming replica. Where can replication lag accumulate in this pipeline?
- Q: Your application writes a user record and immediately redirects the user to their profile page, which reads from a replica. Users are intermittently seeing 404s on their own profiles. What's happening and how would you fix it without routing all reads to the primary?
- Q: You've configured synchronous replication with a single synchronous replica. At 2am the replica host crashes. What happens to your primary? What's your immediate remediation, and how would you redesign the setup to prevent this class of outage?
Frequently Asked Questions
Can a PostgreSQL read replica fall behind and how do you fix it?
Yes — replication lag accumulates when the primary generates WAL faster than the replica can replay it. This is most common during bulk writes, large batch jobs, or when the replica's disk is under pressure. Check replay_lag in pg_stat_replication on the primary. To reduce lag: ensure the replica has equivalent or better disk I/O than the primary, avoid running heavy analytical queries on the replica during peak write periods, and on PostgreSQL 15+ make sure recovery_prefetch (enabled by default) is active so the startup process can prefetch the blocks WAL replay is about to touch. Increasing work_mem does not help here — WAL replay doesn't use it.
Does a PostgreSQL read replica need to be the same hardware spec as the primary?
For a pure read replica, you can sometimes under-provision CPU compared to the primary since it's not handling writes or WAL generation overhead. However, disk I/O on the replica must be at least as fast as the primary's, because WAL replay is disk-bound. If the replica also needs to serve heavy analytical queries, it may actually need MORE RAM than the primary so that large query working sets don't crowd the pages WAL replay needs out of shared buffers and the OS cache.
What's the difference between a read replica and a standby in PostgreSQL?
They're the same thing — the terminology is interchangeable in PostgreSQL's world. A 'hot standby' is a replica that accepts read queries while in recovery mode. It's called a 'standby' because it can be promoted to primary if the primary fails (via pg_promote() or a failover tool like Patroni). 'Read replica' is the application-centric term emphasising its role in read scaling. Every read replica is a hot standby, but a hot standby's primary purpose is high availability — read scaling is a bonus.