Skip to content
Home Database MySQL Replication — Semi-Sync Timeout Lost 50 Transactions

MySQL Replication — Semi-Sync Timeout Lost 50 Transactions

Where developers are forged. · Structured learning · Free forever.
📍 Part of: MySQL & PostgreSQL → Topic 10 of 13
Detect semi-sync fallback with SHOW REPLICA STATUS after timeout lost 50 orders.
🔥 Advanced — solid Database foundation required
In this tutorial, you'll learn
Detect semi-sync fallback with SHOW REPLICA STATUS after timeout lost 50 orders.
  • Always use ROW-based binary logging in production — STATEMENT mode causes silent data divergence with non-deterministic functions like UUID() and NOW(), with zero error output to alert you.
  • GTID mode with SOURCE_AUTO_POSITION=1 is the only replication setup worth building in 2024 — failover drops from a manual 20-step process to 3 commands, and errant transactions become detectable instead of silent.
  • Seconds_Behind_Source is an unreliable lag metric on its own — pair it with a heartbeat table and GTID set comparison to get a lag number you can actually alert on in production.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • MySQL replication duplicates writes from a primary to one or more replicas using the binary log.
  • GTID mode automatically tracks transaction IDs, making failover simple and safe.
  • Position-based replication requires manual binlog file and offset tracking.
  • ROW-based logging is mandatory for consistency — STATEMENT mode causes silent data drift with functions like UUID().
  • Seconds_Behind_Source is unreliable — production monitoring needs heartbeat tables and GTID gap checks.
  • Parallel apply workers cut lag under heavy write loads, but must use LOGICAL_CLOCK parallelism.
🚨 START HERE

MySQL Replication Quick Debug Cheat Sheet

First-response commands for production replication issues
🟡

Replica lag > 10 seconds

Immediate ActionRun SHOW REPLICA STATUS to check Seconds_Behind_Source (SBS). Then check the GTID gap with SELECT MASTER_POS_WAIT('', -1) on replica and compare with primary.
Commands
SHOW REPLICA STATUS\G
SHOW PROCESSLIST; (on replica to see SQL/IO threads)
Fix NowIf lag is due to a single large statement, consider setting replica_parallel_workers = 4 and replica_parallel_type = LOGICAL_CLOCK. Restart replica SQL thread: STOP REPLICA; START REPLICA;
🟡

IO thread not running

Immediate ActionCheck Last_IO_Error in SHOW REPLICA STATUS. Verify network and credentials.
Commands
SHOW REPLICA STATUS\G | grep -E '(IO|Error)'
ping <primary_ip> ; mysql -u replication_user -p -h <primary_ip> -e 'SELECT 1'
Fix NowReset replication connection: STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_PASSWORD='...'; START REPLICA;
🟡

SQL thread stopped due to duplicate key

Immediate ActionIdentify the errant transaction GTID, skip it, then restart the SQL thread.
Commands
SHOW REPLICA STATUS\G | grep Last_SQL_Error
SHOW BINLOG EVENTS IN 'mysql-bin.00000X' LIMIT 10; (if using position) or SHOW GTID_SUBTRACT('retrieved', 'executed');
Fix NowSET GTID_NEXT='<errant_gtid>'; BEGIN; COMMIT; SET GTID_NEXT='AUTOMATIC'; START REPLICA;
Production Incident

The 50 Lost Transactions: Semi-Sync Timeout Betrayed Us

When the primary crashed during peak traffic, we promoted the replica and immediately saw missing orders. Customers were charged but their orders never existed in the new primary.
SymptomAfter promoting the replica to primary, the application reported 'Order not found' for 50 recently created orders that existed on the crashed primary's binary log but never reached the replica.
AssumptionSemi-synchronous replication was enabled, so every transaction acknowledged by the primary must have been received by at least one replica. The replica should have all committed data.
Root causeSemi-sync was configured with rpl_semi_sync_source_timeout = 10000 (10 seconds). During a network hiccup, the primary timed out waiting for the replica's ACK, fell back to asynchronous replication, and committed 50 transactions that the replica never received. The primary log showed these transactions with after_sync flag false.
FixSwitch to after_sync mode (rpl_semi_sync_source_wait_point = AFTER_SYNC) so the primary only commits after the replica ACKs. Set timeout to a value higher than the maximum observed replication latency, and alert on any fallback to async. Add synchronous mode (rpl_semi_sync_source_enabled = AFTER_SYNC) and require at least one replica ACK before commit.
Key Lesson
Semi-sync with AFTER_SYNC (after commit) is misleading — it allows commits before replica ACK. Always use AFTER_SYNC (wait before commit).Monitor rpl_semi_sync_source_status and rpl_semi_sync_source_clients to detect fallback.Never assume semi-sync guarantees zero data loss without verifying the mode and timeout settings.
Production Debug Guide

Common replication problems and the exact commands to run

SHOW REPLICA STATUS shows Replica_IO_Running: NoCheck Last_IO_Error for connection error. Verify network connectivity, primary is reachable, and replication user credentials are correct. Run mysql -u replication_user -p -h <primary_ip> to test basic connectivity.
SHOW REPLICA STATUS shows Replica_SQL_Running: NoCheck Last_SQL_Error for apply error (e.g., duplicate key, row not found). Use SHOW BINLOG EVENTS or stop replica, skip the offending transaction with SET GTID_NEXT='<gtid>'; BEGIN; COMMIT; / or use sql_slave_skip_counter (only if not using GTID). Then START REPLICA.
Seconds_Behind_Source is growing continuouslyCheck if replica_parallel_workers > 0 and replica_parallel_type = LOGICAL_CLOCK. Run SHOW PROCESSLIST on replica to identify stuck SQL threads. Check disk I/O on replica using iostat. Verify the replication heartbeat table lag to get true lag.
Replica lag suddenly jumps to a very high valueIdentify the large transaction that caused the jump. Use SHOW BINLOG EVENTS on primary or check MySQL slow query log for long-running transactions. Consider splitting large bulk operations into batches to reduce replication lag spike.
GTID replication fails with 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires'The primary purged binlogs containing GTIDs the replica hasn't applied. Use SET GLOBAL binlog_expire_logs_seconds to a longer retention. Rebuild the replica from a fresh consistent backup taken after the missing GTIDs were still available, or manually inject missing GTIDs using SET GTID_NEXT (if you know the missing GTIDs).

Every production database running at scale eventually hits the same wall: a single server can only handle so many reads before query times creep up, and a single point of failure is a liability no on-call engineer wants at 2 AM. MySQL replication is the foundational answer to both problems — it lets you distribute reads across multiple servers and gives you a warm standby that can take over within seconds if the primary crashes. It's not optional infrastructure. At any company running MySQL seriously, replication is day one.

The core problem replication solves is deceptively simple: how do you keep two or more MySQL instances in sync without locking everything or doubling write traffic? The answer lives in the binary log — a sequential, append-only journal of every state-changing event on the primary. Replicas connect, stream that log, and apply events in order. But the devil is in the details — GTID vs file-position tracking, parallel apply workers, semi-synchronous acknowledgment, replication filters, and lag under heavy write load are all areas where teams get burned in production.

By the end of this article you'll be able to configure a production-grade primary/replica pair from scratch, understand the exact bytes flowing between them, switch between GTID and position-based replication with confidence, monitor and diagnose replication lag, and avoid the three most expensive mistakes engineers make when they first set this up. This is not a tutorial for spinning up a throwaway demo — it's the setup you'd be comfortable putting in front of real traffic.

How MySQL Replication Actually Works Under the Hood

Before touching a config file, you need a mental model of what's actually happening. MySQL replication is driven by three threads. On the primary, the binlog dump thread streams binary log events to any replica that connects. On the replica, the I/O thread maintains a persistent TCP connection to the primary, receives those events, and writes them into a local file called the relay log. Meanwhile, the SQL thread (or parallel apply workers in modern MySQL) reads the relay log and executes each event against the replica's local storage engine.

The binary log is not a simple text log — it's a structured binary format that records events at the statement level (STATEMENT mode), the row level (ROW mode), or a hybrid of both (MIXED mode). ROW-based replication is what you want in production. It records the actual before and after image of each changed row, which makes it deterministic — no ambiguity about what a non-deterministic function like UUID() or NOW() returned. STATEMENT mode can cause silent data divergence when queries contain functions whose output differs between servers.

GTID (Global Transaction Identifier) is a unique ID assigned to every committed transaction on the primary, formatted as source_uuid:transaction_sequence. GTIDs make failover and replica reconfiguration dramatically safer because a replica can tell any server 'here are the GTIDs I already have — send me everything else,' without needing to know a specific file name and byte offset. For any new setup in 2024, GTID mode is the right default.

check_replication_internals.sql · SQL
123456789101112131415161718192021222324252627282930313233343536
-- ============================================================
-- Run these on the PRIMARY to inspect binlog state
-- ============================================================

-- Show all binary log files and their sizes on disk
SHOW BINARY LOGS;
-- Output tells you how many logs exist and how much disk they consume

-- Inspect the live stream of binary log events (human-readable)
-- Replace 'mysql-bin.000003' with your current binlog file
SHOW BINLOG EVENTS IN 'mysql-bin.000003' LIMIT 20;

-- Confirm GTID mode is enabled and see executed GTID sets
SHOW VARIABLES LIKE 'gtid_mode';          -- Should return ON
SHOW VARIABLES LIKE 'enforce_gtid_consistency'; -- Should return ON
SHOW MASTER STATUS\G
-- gtid_executed column shows every GTID this server has committed

-- ============================================================
-- Run these on the REPLICA to inspect the three replication threads
-- ============================================================

-- The single most important replication health command
SHOW REPLICA STATUS\G
-- Key fields to check:
--   Replica_IO_Running: Yes        -- I/O thread is connected to primary
--   Replica_SQL_Running: Yes       -- SQL thread is applying relay log
--   Seconds_Behind_Source: 0       -- 0 means replica is caught up
--   Last_IO_Error: (empty)         -- Any value here means connection trouble
--   Last_SQL_Error: (empty)        -- Any value here means apply-side failure
--   Retrieved_Gtid_Set            -- GTIDs received from primary
--   Executed_Gtid_Set             -- GTIDs applied to this replica's data

-- Check how many parallel SQL apply workers are configured
SHOW VARIABLES LIKE 'replica_parallel_workers'; -- 0 = single-threaded
SHOW VARIABLES LIKE 'replica_parallel_type';    -- LOGICAL_CLOCK is the modern choice
▶ Output
-- SHOW BINARY LOGS output (example):
+------------------+-----------+-----------+
| Log_name | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000001 | 177 | No |
| mysql-bin.000002 | 52428800 | No |
| mysql-bin.000003 | 1024312 | No |
+------------------+-----------+-----------+

-- SHOW MASTER STATUS\G output (example with GTID):
*************************** 1. row ***************************
File: mysql-bin.000003
Position: 1024312
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: a1b2c3d4-0000-0000-0000-000000000001:1-58291

-- SHOW REPLICA STATUS\G (healthy replica):
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Seconds_Behind_Source: 0
Last_IO_Error:
Last_SQL_Error:
Retrieved_Gtid_Set: a1b2c3d4-0000-0000-0000-000000000001:1-58291
Executed_Gtid_Set: a1b2c3d4-0000-0000-0000-000000000001:1-58291
⚠ Watch Out: STATEMENT vs ROW Replication
If your binlog_format is set to STATEMENT and any query calls UUID(), RAND(), NOW(), or a user-defined function, the replica will silently execute that function again and get a different result. You'll have data drift with zero error messages. Set binlog_format=ROW in production — always. The extra disk usage is worth the determinism guarantee.

Step-by-Step: Configuring a GTID-Based Primary/Replica Pair

Let's build a working replication setup. We'll use two servers: primary at 10.0.1.10 and replica at 10.0.1.11, both running MySQL 8.0+. The configuration changes require a restart, so plan your maintenance window accordingly — or use MySQL's SET PERSIST for variables that support online changes.

The primary needs the binary log enabled with a unique server ID, and it needs a dedicated replication user with minimal privileges. The replication user should only have REPLICATION SLAVE privilege — nothing else. Least privilege matters here because this account's credentials live in plaintext in older versions, and even with mysql.slave_master_info table encryption in 8.0, you don't want a compromised replica credential giving an attacker DML access.

On the replica side, you set a distinct server ID, point it at the primary using CHANGE REPLICATION SOURCE TO (the modern syntax replacing CHANGE MASTER TO), and start the replication threads. With GTID mode, you don't need to know a binlog file or position — you just tell the replica 'start from the beginning of GTIDs I don't have.' That's the magic of GTID: the protocol figures out the gap automatically.

For an initial data load, use mysqldump with --single-transaction --master-data=2 --gtids, or for large datasets, Percona XtraBackup which takes a hot physical copy without locking tables. Never skip the initial consistent snapshot — if you do, the replica starts with wrong data and GTID tracking gives you a false sense of correctness.

mysql_primary_replica_config.cnf · INI
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152
# ============================================================
# PRIMARY SERVER: /etc/mysql/mysql.conf.d/mysqld.cnf
# Apply to server at 10.0.1.10, then restart MySQL
# ============================================================
[mysqld]
# Every server in a replication topology MUST have a unique ID
server-id                    = 1

# Enable the binary log — replication cannot work without this
log-bin                      = mysql-bin
binlog_format                = ROW          # Deterministic row-level events
binlog_row_image             = FULL         # Log complete before+after row images

# Retain binlogs for 7 days — long enough for replicas to catch up after downtime
binlog_expire_logs_seconds   = 604800

# GTID settings — both must be ON together
gtid_mode                    = ON
enforce_gtid_consistency     = ON

# Crash safety: flush binlog to disk on every commit (safer, slightly slower)
sync_binlog                  = 1
innodb_flush_log_at_trx_commit = 1

# ============================================================
# REPLICA SERVER: /etc/mysql/mysql.conf.d/mysqld.cnf
# Apply to server at 10.0.1.11, then restart MySQL
# ============================================================
[mysqld]
server-id                    = 2            # Must differ from primary's ID

# Relay log stores events received from primary before they're applied
relay-log                    = relay-bin
relay_log_recovery           = ON           # Auto-recover relay log on crash

# Write replica's own changes to its binlog — essential if this replica
# might ever become a primary (chained or cascaded replication)
log-bin                      = mysql-bin
log_replica_updates          = ON

# GTID — must match primary
gtid_mode                    = ON
enforce_gtid_consistency     = ON

# Parallel apply workers dramatically reduce lag under heavy write workloads
# LOGICAL_CLOCK preserves transaction ordering from the primary's commit order
replica_parallel_workers     = 4
replica_parallel_type        = LOGICAL_CLOCK

# Make replica read-only — prevents accidental writes that cause divergence
read_only                    = ON
super_read_only              = ON           # Also blocks SUPER privilege users
▶ Output
# After restarting MySQL on both servers, verify with:
# On PRIMARY:
mysql> SHOW VARIABLES LIKE 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 1 |
+---------------+-------+

mysql> SHOW VARIABLES LIKE 'gtid_mode';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_mode | ON |
+---------------+-------+

# On REPLICA:
mysql> SHOW VARIABLES LIKE 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 2 |
+---------------+-------+

mysql> SHOW VARIABLES LIKE 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | ON |
+---------------+-------+
💡Pro Tip: super_read_only Is Your Safety Net
Setting only read_only=ON still allows users with the SUPER privilege to write to the replica — and in MySQL 8.0 almost every application user has been granted it carelessly at some point. Set super_read_only=ON on every replica. It blocks writes from ALL users including root, and is the only reliable guard against accidental replica writes that cause silent data divergence. You can disable it temporarily in an emergency with SET GLOBAL super_read_only=OFF.

Bootstrapping Replication, Monitoring Lag, and Handling Failover

Configuration files are just the foundation. Actually starting replication requires three steps: create the replication user on the primary, take a consistent snapshot of primary data and load it into the replica, then issue CHANGE REPLICATION SOURCE TO on the replica and START REPLICA.

Lag monitoring is where teams get caught off guard. Seconds_Behind_Source in SHOW REPLICA STATUS is calculated as the difference between the current clock time and the timestamp embedded in the relay log event being applied. This sounds fine until you realize it resets to NULL when the SQL thread stalls, reports 0 when the I/O thread is lagging (the replica thinks it's caught up because it hasn't received the new events yet), and can spike from 0 to thousands of seconds instantly during a large transaction. Production monitoring should combine Seconds_Behind_Source with the GTID gap and heartbeat tables.

For failover, the clean approach with GTIDs is to run STOP REPLICA on the old replica, verify Executed_Gtid_Set matches the primary's gtid_executed, then promote with STOP REPLICA; RESET REPLICA ALL; SET GLOBAL read_only=OFF. Tools like Orchestrator or MySQL Router automate this entire flow and handle edge cases like errant transactions that exist on a replica but not the primary — a situation that can permanently break GTID-based replication if you try to re-add that server.

bootstrap_and_monitor_replication.sql · SQL
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104
-- ============================================================
-- STEP 1: Run on PRIMARY — create the replication user
-- ============================================================
CREATE USER 'replication_user'@'10.0.1.11'
    IDENTIFIED WITH caching_sha2_password  -- Modern auth plugin for MySQL 8.0
    BY 'StrongR3plPass!2024';

-- Only grant the minimum required privilege
GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'10.0.1.11';
FLUSH PRIVILEGES;

-- Verify the user exists with the right privileges
SHOW GRANTS FOR 'replication_user'@'10.0.1.11';

-- ============================================================
-- STEP 2: Take a consistent snapshot on PRIMARY
-- Run this in a shell, NOT in mysql client
-- --single-transaction avoids table locks for InnoDB
-- --source-data=2 adds CHANGE REPLICATION SOURCE comment to dump
-- --triggers --routines --events captures full schema
-- ============================================================
-- $ mysqldump \
--     --single-transaction \
--     --source-data=2 \
--     --set-gtid-purged=ON \
--     --triggers \
--     --routines \
--     --events \
--     --all-databases \
--     -u root -p > /backup/primary_snapshot_$(date +%Y%m%d_%H%M%S).sql

-- ============================================================
-- STEP 3: Load snapshot on REPLICA (shell command)
-- ============================================================
-- $ mysql -u root -p < /backup/primary_snapshot_2024XXXX_XXXXXX.sql

-- ============================================================
-- STEP 4: Run on REPLICA — connect to primary and start replication
-- GTID mode means we don't need SOURCE_LOG_FILE or SOURCE_LOG_POS
-- ============================================================
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST      = '10.0.1.10',
    SOURCE_PORT      = 3306,
    SOURCE_USER      = 'replication_user',
    SOURCE_PASSWORD  = 'StrongR3plPass!2024',
    SOURCE_AUTO_POSITION = 1,   -- This is the GTID magic: find my own gap automatically
    SOURCE_SSL       = 1,       -- Always use SSL for replication in production
    SOURCE_SSL_CA    = '/etc/mysql/ssl/ca.pem',
    SOURCE_SSL_CERT  = '/etc/mysql/ssl/replica-cert.pem',
    SOURCE_SSL_KEY   = '/etc/mysql/ssl/replica-key.pem',
    SOURCE_CONNECT_RETRY = 10,  -- Retry connection every 10 seconds on disconnect
    SOURCE_RETRY_COUNT  = 86400; -- Retry for up to 24 hours before giving up

-- Start both replication threads
START REPLICA;

-- Confirm everything is healthy
SHOW REPLICA STATUS\G

-- ============================================================
-- ONGOING MONITORING: Detect real lag using a heartbeat approach
-- Run on PRIMARY every 5 seconds (via cron or application job)
-- ============================================================
CREATE TABLE IF NOT EXISTS monitoring.replication_heartbeat (
    server_id        INT          NOT NULL,
    heartbeat_ts     DATETIME(6)  NOT NULL,
    PRIMARY KEY (server_id)
) ENGINE=InnoDB;

-- Insert/update the heartbeat timestamp
INSERT INTO monitoring.replication_heartbeat (server_id, heartbeat_ts)
    VALUES (@@server_id, NOW(6))
    ON DUPLICATE KEY UPDATE heartbeat_ts = NOW(6);

-- Run this on REPLICA to calculate true lag in seconds
SELECT
    TIMESTAMPDIFF(
        MICROSECOND,
        heartbeat_ts,
        NOW(6)
    ) / 1000000.0 AS replication_lag_seconds,
    heartbeat_ts   AS last_primary_write
FROM monitoring.replication_heartbeat
WHERE server_id = 1;  -- Primary's server_id

-- ============================================================
-- FAILOVER: Promote replica to primary (after primary failure)
-- ============================================================
-- Verify replica has all transactions before promoting
SHOW REPLICA STATUS\G  -- Confirm Seconds_Behind_Source: 0

-- Stop replica threads cleanly
STOP REPLICA;

-- Disconnect from old primary and clear replication config
RESET REPLICA ALL;

-- Lift read-only restrictions
SET GLOBAL super_read_only = OFF;
SET GLOBAL read_only       = OFF;

-- Verify this server is now writable and has all GTIDs
SHOW MASTER STATUS\G
SHOW VARIABLES LIKE 'read_only';
▶ Output
-- SHOW GRANTS output:
+------------------------------------------------------------------+
| Grants for replication_user@10.0.1.11 |
+------------------------------------------------------------------+
| GRANT REPLICATION SLAVE ON *.* TO `replication_user`@`10.0.1.11`|
+------------------------------------------------------------------+

-- SHOW REPLICA STATUS\G (after START REPLICA — healthy state):
Replica_IO_State: Waiting for source to send event
Source_Host: 10.0.1.10
Source_Port: 3306
Connect_Retry: 10
Source_Log_File: mysql-bin.000003
Read_Source_Log_Pos: 1024312
Relay_Log_File: relay-bin.000004
Relay_Log_Pos: 487
Relay_Source_Log_File: mysql-bin.000003
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Seconds_Behind_Source: 0
Last_IO_Error:
Last_SQL_Error:
Retrieved_Gtid_Set: a1b2c3d4-...:1-58291
Executed_Gtid_Set: a1b2c3d4-...:1-58291
Auto_Position: 1

-- Heartbeat lag query output (replica fully caught up):
+------------------------+----------------------------+
| replication_lag_seconds | last_primary_write |
+------------------------+----------------------------+
| 0.000312 | 2024-03-15 14:32:01.887441 |
+------------------------+----------------------------+
🔥Interview Gold: Why Seconds_Behind_Source Can Lie
Interviewers love this one. Seconds_Behind_Source measures the timestamp difference between now and the relay log event being applied — but if the I/O thread is lagging (primary is far ahead but events haven't been received yet), the metric shows 0 while you're actually minutes behind. Always pair it with GTID set comparison or a heartbeat table for trustworthy lag detection in production alerting.

Replication Lag in Production: Root Causes and Monitoring Beyond Seconds_Behind_Source

Replication lag is the silent killer of read scalability. Your application reads from replicas thinking it's getting fresh data, but if the replica is behind, every query returns stale results. The problem is that nobody notices until a customer complains about missing an order that was created 30 seconds ago.

The root causes of lag fall into three categories
  • Long-running transactions on primary: A single UPDATE that modifies a million rows generates a giant binlog event. The replica must apply that entire event in a single transaction, which can block the SQL thread for minutes.
  • Parallel replication misconfiguration: If you set replica_parallel_workers > 0 but leave replica_parallel_type = DATABASE, parallelism only works across different databases. A single-database app gets single-threaded apply. LOGICAL_CLOCK is the fix.
  • Hardware bottlenecks on replica: Replicas often run on cheaper hardware with slower disks. A write-heavy primary overwhelms the replica's I/O capacity, causing relay log buildup and lag.

Monitoring lag properly requires a three-metric approach: (1) Seconds_Behind_Source for quick glance, (2) GTID gap: compute the difference between primary's gtid_executed and replica's retrieved_gtid_set, (3) Heartbeat table: insert a timestamp on primary every 5 seconds and read it on replica to calculate true lag independent of clock skew.

check_lag.sh · BASH
123456789101112131415
#!/bin/bash
# Simple lag check combining GTID and heartbeat
PRIMARY_GTID=$(mysql -h 10.0.1.10 -e "SELECT @@gtid_executed" -s 2>/dev/null)
REPLICA_GTID=$(mysql -h 10.0.1.11 -e "SELECT @@gtid_executed" -s 2>/dev/null)
MISSING=$(mysql -h 10.0.1.11 -e "SELECT GTID_SUBTRACT('$PRIMARY_GTID', '$REPLICA_GTID')" -s 2>/dev/null)
if [ "$MISSING" != "" ]; then
  echo "WARNING: Replica missing $MISSING"
else
  echo "GTID sets are identical."
fi

HEARTBEAT_LAG=$(mysql -h 10.0.1.11 -e "\
  SELECT TIMESTAMPDIFF(MICROSECOND, heartbeat_ts, NOW(6))/1000000.0 \
  FROM monitoring.replication_heartbeat WHERE server_id=1" -s 2>/dev/null)
echo "Heartbeat lag: $HEARTBEAT_LAG seconds"
Mental Model
The Water Tank Mental Model for Replication Lag
Think of the primary as a tap filling a tank (relay log) on the replica, and the replica's SQL thread as a drain at the bottom.
  • Tap flow = write rate on primary (events per second).
  • Drain flow = apply rate on replica (constrained by hardware, parallelism, and transaction size).
  • Tank level = relay log backlog. If tap flows faster than drain, tank overflows → replica crashes or disk runs out.
  • Seconds_Behind_Source is like measuring the time since the last drop of water hit the drain — not how much water is in the tank.
  • The heart of the problem: you need a lag metric that measures how far the water level is from the top, not just the drop rate.
📊 Production Insight
The single most common lag cause we see: replica_parallel_type = DATABASE when the app uses one schema.
LOGICAL_CLOCK is the only mode that parallelizes within a single database, using the primary's binary log group commit timestamps.
Rule: if your app is single-database, set replica_parallel_type=LOGICAL_CLOCK or accept single-threaded apply.
🎯 Key Takeaway
Don't trust Seconds_Behind_Source alone.
Pair it with GTID gap and heartbeat table.
The three-metric approach catches all lag types.
Replication Lag Diagnosis Decision Tree
IfSeconds_Behind_Source > 0 AND Relay log file growing
UseReplica apply speed < primary write speed. Check I/O, parallel workers, and long transactions.
IfSeconds_Behind_Source = 0 BUT application sees stale data
UseI/O thread is lagging — primary has new events not yet received. Check network latency or primary load.
IfSeconds_Behind_Source NULL
UseSQL thread stalled (error or stop). Check Last_SQL_Error.
IfHeartbeat lag > 5 seconds despite Seconds_Behind_Source = 0
UseClock skew between servers. Sync NTP on both.

Failover Strategies and Common Pitfalls

Promoting a replica to primary is the moment of truth. You need to guarantee that the promoted replica has all the data the old primary committed, and that there's no split-brain scenario where two servers accept writes.

With GTID, the procedure is: STOP REPLICA; RESET REPLICA ALL; SET GLOBAL read_only=OFF. That's it — the replica already has all transactions the old primary committed (accounting for potential loss with async replication). The risk comes from errant transactions: a write that accidentally happened on the replica (e.g., a stray admin query, a cron job, or a buggy migration). That errant GTID will exist on the replica's executed set but not on the primary's. When you promote the replica, that transaction is now part of the new primary's history, and if you ever try to re-add the old primary, the old primary will see a GTID it doesn't have and refuse to connect.

Tools like Orchestrator handle this by comparing gtid_purged and purging errant transactions with a controlled GTID skip. But the best defense is prevention: super_read_only=ON on all replicas, strict access controls, and never running any DML on replicas.

Another common pitfall: assuming semi-synchronous replication guarantees zero data loss. As we saw in the production incident, AFTER_SYNC mode (value at 1) allows the primary to commit before receiving the replica's ACK if the timeout expires. For zero loss, use AFTER_SYNC with a very high timeout or synchronous replication using a plugin that blocks the commit until the ACK arrives.

failover_rollback.sh · BASH
12345678910111213141516171819
#!/bin/bash
# Script to safely promote replica and handle possible errant transactions
# Usage: ./promote_replica.sh <replica_ip>

set -e

REPLICA_IP=${1:?"Usage: $0 <replica_ip>"}

# Step 1: Stop replica and check errant GtIDS
MISSING=$(mysql -h $REPLICA_IP -e "SELECT GTID_SUBTRACT(@@gtid_executed, @@gtid_purged)" -s 2>/dev/null)
if [ -n "$MISSING" ]; then
  echo "WARNING: Errant transactions detected: $MISSING"
  echo "Manual intervention required to skip before promotion."
  exit 1
fi

# Step 2: Stop replication and promote
mysql -h $REPLICA_IP -e "STOP REPLICA; RESET REPLICA ALL; SET GLOBAL super_read_only=OFF; SET GLOBAL read_only=OFF;"
echo "Replica promoted to primary. Update application connection string to $REPLICA_IP"
⚠ The Errant Transaction Trap
Errant transactions are the #1 cause of GTID replication failures after failover. A single stray INSERT on a replica creates a GTID that the old primary won't have, making it impossible to re-add the old primary using auto-position. The fix requires manually injecting an empty skip transaction — step that many engineers forget until they're in a firefight. Prevention: super_read_only=ON in every replica config, not just at runtime.
📊 Production Insight
We once lost an entire day of replication recovery because a legacy cron job ran UPDATE on a replica.
The errant GTID blocked re-adding the old primary after a failover test.
Fix: always set super_read_only=ON and audit replica access.
🎯 Key Takeaway
Failover is not the hard part.
Re-adding the old primary and handling errant transactions is where teams break.
Prevent errant writes with super_read_only; script the skip procedure before you need it.
Failover Decision Tree: Which Method to Use
IfPlanned switchover with both servers accessible
UseControlled failover: set old primary read-only, verify replica caught up, promote replica. Zero data loss possible with after_sync semi-sync.
IfPrimary unreachable, async replication
UseEmergency failover: promote the most advanced replica. Expect potential data loss of transactions in flight. Check Relay_Master_Log_File and Exec_Master_Log_Pos for guess.
IfPrimary unreachable, after_sync semi-sync enabled
UseFair chance of zero loss if the ACKed transactions were committed. Promote the replica that acknowledged them. Still verify via GTID gap.
IfNeed to re-add old primary after failover
UseHandle errant transactions: compare GTID sets, skip mismatches via SET GTID_NEXT. Use SET GLOBAL gtid_purged to align if needed (advanced, dangerous).
🗂 GTID vs Position-Based Replication
Which one should you use in 2024?
Feature / AspectGTID-Based ReplicationPosition-Based (File+Offset) Replication
Failover complexitySimple — CHANGE REPLICATION SOURCE with AUTO_POSITION=1Complex — must find exact binlog file + byte position on new primary
Errant transaction riskHigh visibility — GTID gap is detectable and blocks re-addSilent — divergence can go undetected until queries return wrong data
Multi-source replicationStraightforward — GTIDs are globally unique across all sourcesError-prone — file/position namespaces can collide
Replication filter compatibilityRequires care — filtered transactions still consume a GTIDFilters work without GTID bookkeeping complications
Initial setup complexitySlightly more config (enforce_gtid_consistency restrictions)Simpler initial config
Tooling support (Orchestrator, ProxySQL)First-class support — topology-aware failover built around GTIDsSupported but requires more manual coordination
mysqldump compatibilityRequires --set-gtid-purged=ON flagWorks with default --master-data=2 flag
Crash recoveryAutomatic GTID recovery — no manual position huntingManual: must check binlog position from InnoDB recovery
Production recommendation (2024)✅ Default choice for all new setupsLegacy setups only — migrate to GTID when possible

🎯 Key Takeaways

  • Always use ROW-based binary logging in production — STATEMENT mode causes silent data divergence with non-deterministic functions like UUID() and NOW(), with zero error output to alert you.
  • GTID mode with SOURCE_AUTO_POSITION=1 is the only replication setup worth building in 2024 — failover drops from a manual 20-step process to 3 commands, and errant transactions become detectable instead of silent.
  • Seconds_Behind_Source is an unreliable lag metric on its own — pair it with a heartbeat table and GTID set comparison to get a lag number you can actually alert on in production.
  • super_read_only=ON must live in my.cnf, not just SET GLOBAL at runtime — a replica restart without it in the config file silently drops read protection, and the next errant write breaks your GTID topology at the worst possible moment.
  • Parallel apply workers with LOGICAL_CLOCK type are the difference between a replica that keeps up under load and one that falls behind into permanent lag.

⚠ Common Mistakes to Avoid

    Duplicate server-id across the topology
    Symptom

    Replication connects but events are silently skipped or looped; SHOW REPLICA STATUS shows IO thread connecting and disconnecting in a loop with error 'Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids'.

    Fix

    Every server in the entire topology (primaries, replicas, and intermediate replicas) must have a globally unique server-id. Audit all servers with SELECT @@server_id before adding a new node, and document IDs in your infrastructure config management system.

    Writing directly to a replica
    Symptom

    Data exists on the replica that never existed on the primary; GTID replication breaks with 'Error 1236: The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires' on the next failover attempt.

    Fix

    Set both read_only=ON AND super_read_only=ON in the replica's my.cnf, not just at runtime. Runtime-only settings are lost on restart. Errant transactions that already exist must be skipped by injecting an empty transaction with the errant GTID: SET GTID_NEXT='errant-gtid:N'; BEGIN; COMMIT; SET GTID_NEXT='AUTOMATIC';

    Setting replica_parallel_workers without LOGICAL_CLOCK
    Symptom

    Replica lag is high under write-heavy workloads even with parallel workers enabled; or worse, replica applies transactions out of order causing intermittent constraint violations and SQL thread errors.

    Fix

    If you enable replica_parallel_workers > 0, also set replica_parallel_type=LOGICAL_CLOCK and replica_preserve_commit_order=ON. LOGICAL_CLOCK uses the primary's binary log group commit timestamps to apply transactions in parallel only when it's safe to do so, preserving the original commit ordering. The default DATABASE parallelism mode only parallelizes across different schemas, which is nearly useless for single-database applications.

Interview Questions on This Topic

  • QA replica is showing Seconds_Behind_Source: 0 in SHOW REPLICA STATUS, but your application is reading stale data from it. How do you diagnose the true replication lag, and what could explain this discrepancy?SeniorReveal
    Seconds_Behind_Source measures the difference between the timestamp of the relay log event currently being applied and the current time on the replica. If the I/O thread is lagging (events not yet received), no events are being applied, so Seconds_Behind_Source may report 0 (since the last applied event is very recent relative to the replica's clock). To diagnose true lag: compare GTID sets between primary and replica using SELECT GTID_SUBTRACT(primary_gtid_executed, replica_gtid_executed). Also set up a heartbeat table on the primary that updates every second; read that value on the replica to see actual data freshness. Common causes: network latency, primary overloaded with writes, or large transactions that haven't been transmitted yet.
  • QExplain the difference between a retrieved GTID set and an executed GTID set on a replica, and describe a scenario where they would differ significantly. What does that difference tell you operationally?Mid-levelReveal
    Retrieved_Gtid_Set contains GTIDs that the I/O thread has received from the primary but not yet applied. Executed_Gtid_Set contains GTIDs that have been applied by the SQL thread. A significant difference (retrieved >> executed) indicates that the relay log backlog is growing — the replica is receiving events faster than it can apply them. This is a sign of apply lag. Operational action: check replica I/O, parallel worker config, or whether a large transaction is blocking the SQL thread.
  • QYour team needs to promote a replica to primary after an unexpected primary failure. You notice the replica's Executed_Gtid_Set is missing 50 transactions compared to the crashed primary's last known gtid_executed. The crashed primary is unrecoverable. How do you handle the missing transactions, and what are the tradeoffs of your decision?SeniorReveal
    In this scenario, the 50 missing transactions are permanently lost because the primary is unrecoverable. You have two options: 1) Promote the replica anyway and accept data loss: This is the fastest recovery but the application must handle the inconsistency (e.g., reconcile with logs, user reports). You'll lose the 50 transactions. This is the pragmatic choice for most systems where RPO is not zero. 2) Attempt to recover the transactions from the binary logs of the crashed primary if the disk is accessible: If the primary's disk is intact, you can extract the binlog files and apply the missing GTIDs to the replica manually. This adds downtime but can recover the data. Tradeoff: RPO vs RTO. Option 1 gives RTO in minutes with some data loss; option 2 can achieve zero data loss but may take hours if the logs need to be transferred and applied. Also, after promoting, any attempt to re-add the crashed primary later will hit errant transaction issues because the old primary has GTIDs the new primary never had. Those must be handled via GTID skip.
  • QWhat is the difference between asynchronous, semi-synchronous, and synchronous replication in MySQL? Describe a production scenario where each is appropriate.SeniorReveal
    Asynchronous (default): Primary commits before sending binlog to replica. Fastest writes, but risk of data loss on primary failure. Appropriate for read replicas where some data loss is acceptable (e.g., reporting, caching). Semi-synchronous: Primary waits until at least one replica acknowledges receipt of the binlog event before committing. Reduces data loss window to almost zero if after_sync mode is used. Appropriate for high-value data like transaction logs. Synchronous (via Galera or Group Replication): Primary commits only when all replicas have committed. Zero data loss, but write latency increases to the slowest replica's commit time. Appropriate for financial or compliance workloads where consistency is paramount.

Frequently Asked Questions

How do I add a replica to an existing MySQL primary without downtime?

Use Percona XtraBackup to take a hot physical backup of the primary — it streams a consistent snapshot without locking tables or interrupting writes. Restore the backup on the new replica server, note the GTID position embedded in the xtrabackup_binlog_info file, configure the replica's my.cnf with a unique server-id and GTID settings, then run CHANGE REPLICATION SOURCE TO with SOURCE_AUTO_POSITION=1 and START REPLICA. The replica will automatically request only the GTIDs it's missing.

What's the difference between MySQL semi-synchronous and asynchronous replication?

In standard asynchronous replication (the default), the primary commits a transaction and returns success to the client before confirming that any replica received the binlog events — so in a crash, you can lose committed data. Semi-synchronous replication (rpl_semi_sync_source_enabled=ON) makes the primary wait for at least one replica to acknowledge receipt of the binlog event before returning success to the client, guaranteeing zero data loss on primary failure. The tradeoff is added write latency equal to your network round-trip time to the nearest replica, typically 1-5ms in the same datacenter.

Can I replicate only specific tables or databases instead of everything?

Yes, using replication filters — set replicate-do-db, replicate-do-table, replicate-ignore-db, or replicate-ignore-table in the replica's my.cnf. However, these filters interact dangerously with GTID mode: even filtered-out transactions consume a GTID on the primary but aren't applied on the replica, creating a permanent GTID gap that causes errors if you ever try to use that replica as a new primary. The safer pattern for partial replication is to use logical replication tools like Debezium or to restructure your schema so independent data lives on separate MySQL instances entirely.

How do I recover from a binary log corruption on the primary?

If the binary log is corrupted and replication breaks, first determine the last good position: use mysqlbinlog with --start-position to read logs until error. If possible, skip the corrupt event using mysqlbinlog --skip-gtid-purge or stop replica on the replica, then start from a known good GTID position using CHANGE REPLICATION SOURCE TO with a file position. In severe cases, you may need to rebuild the replica from a new backup. The best prevention is to enable binary log checksums: SET GLOBAL binlog_checksum = CRC32; and monitor log integrity with CHECKSUM TABLE.

What is the impact of using a delayed replica (MASTER_DELAY)?

A delayed replica intentionally lags behind the primary by a configurable number of seconds (e.g., 1 hour). This is useful for recovery from accidental destructive operations (like DROP TABLE). The primary commits the change, and the replica waits the specified delay before applying it. You have that window to stop replication, set GTID_NEXT to skip the bad transaction, and restart without applying it. The tradeoff is that the replica is always behind, so it cannot be promoted in a failover unless you accept that delay.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousPostgreSQL TriggersNext →Database Monitoring Tools
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged