Senior 7 min · March 05, 2026

SQL INSERT, UPDATE, DELETE — 14,000 Lost to Missing WHERE

An UPDATE missing WHERE clause wiped 14,000 salaries instantly.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • INSERT adds new rows — always name columns explicitly to survive schema changes
  • UPDATE modifies existing rows — WHERE clause is mandatory in practice, not optional syntax
  • DELETE removes entire rows — without WHERE it empties the whole table instantly with no warning
  • Bulk INSERT with multi-row VALUES reduces network round-trips by 10-50x vs single-row inserts in a loop
  • Production rule: run SELECT with your WHERE clause first, verify the row count, then convert to UPDATE or DELETE
  • Biggest mistake: forgetting WHERE on UPDATE/DELETE — no undo without an open transaction
  • Soft Delete (UPDATE SET is_deleted = true) is the default pattern in regulated systems — Hard Delete is a deliberate architectural choice, not a casual convenience
Plain-English First

Think of a database table like a whiteboard in a classroom. INSERT is writing something new on the board. UPDATE is erasing one word and replacing it with a corrected version. DELETE is wiping an entire row clean with a damp cloth — gone, not just hidden.

That's literally the whole model: three actions, one purpose, keeping your data accurate.

The part that trips people up: the whiteboard has no undo button unless you took a photo first. The photo is a transaction — a checkpoint you set before making changes, so you can restore the board if something goes wrong. Senior engineers take the photo every time before touching real data. Juniors learn why after the first incident.

INSERT, UPDATE, and DELETE are the three DML commands that every live application depends on. Every user signup maps to an INSERT. Every profile edit maps to an UPDATE. Every account deletion maps to a DELETE. If you've ever used a web application, these three statements have been running on your behalf dozens of times a day.

The commands themselves are not complicated. A beginner can write a correct INSERT in five minutes. What separates junior from senior engineers is not knowledge of the syntax — it's the operational discipline around execution. A missing WHERE clause on an UPDATE or DELETE is not a recoverable learning exercise in production. It is a P1 incident. Entire salary columns get zeroed. Customer records vanish. Finance teams notice six hours later during batch processing, not immediately, which means the damage window is wide.

I've reviewed post-mortems on incidents exactly like this. The SQL was technically valid. The database executed it correctly. There was no bug. There was just a missing five-character clause, autocommit was on, and there was no pre-execution verification step in the team's process.

This guide covers syntax, execution patterns, and the operational habits that prevent catastrophic data loss — not because the material is exotic, but because the habits have to be built before you're running DML against a production table with ten million rows and no rollback window.

Setting the Scene — The Table We'll Work With Throughout This Guide

Before touching INSERT, UPDATE, or DELETE, we need a stable foundation. Think of a database table like a structured spreadsheet where the column names are fixed by the schema and every row represents one unique real-world entity. The column definitions — their data types, constraints, and nullability — are the contract that every DML statement must honor.

We're going to use an employees table for a fictional company throughout this guide. It's simple enough to understand at a glance but realistic enough to demonstrate the production patterns that actually matter: a primary key that prevents duplicate records, a NOT NULL constraint on the name that enforces data quality, and a DECIMAL type for salary that stores exact financial values without the floating-point rounding errors you'd get from FLOAT.

That last one is worth pausing on. The difference between DECIMAL(10,2) and FLOAT for a salary column is not a pedantic data-type argument. In a payroll system, FLOAT arithmetic introduces rounding errors at the cent level. Multiply a FLOAT salary by a tax rate across ten thousand employees and the accumulated error becomes meaningful. DECIMAL stores exact values. Always use DECIMAL or NUMERIC for financial columns.

Also notice that department is nullable — no NOT NULL constraint. This is intentional. In a real organization, an employee might exist in the system before being assigned to a department (newly onboarded, contractor pending assignment). Forcing NOT NULL on department would cause INSERTs to fail for those legitimate cases. Schema design decisions like this directly determine what DML is possible later.

io/thecodeforge/setup/create_employees_table.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- io.thecodeforge: Production employees schema
-- Design decisions documented inline — schema choices constrain DML forever.

CREATE TABLE employees (
    employee_id   INT            PRIMARY KEY,    -- Unique identifier: rejects duplicates at the DB level
    full_name     VARCHAR(100)   NOT NULL,       -- Mandatory: INSERT fails if omitted or NULL
    department    VARCHAR(50),                   -- Nullable: employees can exist before assignment
    salary        DECIMAL(10, 2)                 -- Exact precision: never use FLOAT for financial values
);

-- Verify the schema before writing any DML.
-- In PostgreSQL, \d employees shows columns, types, and constraints.
-- In MySQL, DESCRIBE employees gives the same information.
-- Run this in every target environment before deploying DML scripts.

-- \d employees

-- Confirm the table is empty before the guide begins.
SELECT * FROM employees;
Output
-- \d employees (PostgreSQL output)
Table "public.employees"
Column | Type | Nullable
--------------+------------------------+---------
employee_id | integer | not null
full_name | character varying(100) | not null
department | character varying(50) |
salary | numeric(10,2) |
Indexes:
"employees_pkey" PRIMARY KEY, btree (employee_id)
-- SELECT * FROM employees;
employee_id | full_name | department | salary
-------------+-----------+------------+--------
(0 rows)
The Schema Is the Contract Your DML Must Honor
  • PRIMARY KEY columns reject duplicate values at the storage engine level — INSERT fails immediately if the key already exists, no matter what the application expects
  • NOT NULL columns require a value on every INSERT — omitting them or explicitly passing NULL raises an error before a single row is written
  • DECIMAL(10,2) stores exact financial values — FLOAT introduces sub-cent rounding errors that compound across millions of calculations
  • Nullable columns (no NOT NULL) accept NULL as a valid value — WHERE column = NULL never matches; you must use WHERE column IS NULL
  • Schema changes between environments (staging vs production) cause DML that works in one environment to fail in another — always run \d or DESCRIBE in the target environment before deploying DML scripts
Production Insight
Schema drift between environments is the silent killer of deployment scripts. A column that exists in staging but not in production, or a constraint that was added to production but not reflected in staging, causes DML to fail at the worst possible time — during a deployment window, not during testing.
The correct discipline: run DESCRIBE table_name or \d table_name in every target environment as the first step of any DML deployment. Treat schema differences between environments as a blocking issue, not a minor discrepancy.
Rule: version your schema changes with migration tools (Flyway, Liquibase, Alembic) so that environment schemas are deterministic and auditable. If you cannot tell what schema version any given environment is running, your DML deployments are operating on assumptions rather than facts.
Key Takeaway
The table schema is the contract that all DML must honor. Every constraint — PRIMARY KEY, NOT NULL, UNIQUE, data type — is enforced at execution time, not at statement-writing time. Understanding the schema before writing DML is not preparation. It is the first step of the work.
Skipping schema verification before deploying DML scripts is the equivalent of deploying code without reading the API contract. The failure will happen — the only question is whether it happens in staging or in production.
Punchline: two minutes with \d employees or DESCRIBE employees prevents the class of bugs where valid-looking SQL fails in production because someone added a NOT NULL column last week.

SQL INSERT — Writing New Rows Into Your Table

INSERT is how data enters your system. Every user account, every order, every log event starts life as an INSERT statement somewhere in the call stack. The syntax has one job: map values to columns and write a new row to the table.

There are two patterns you'll use in practice, and they serve different purposes. Single-row INSERT handles real-time events — one user completing a signup form, one sensor publishing a reading, one webhook arriving. Bulk (multi-row) INSERT handles batch operations — seeding a test environment, importing a dataset, processing a queue of queued events in a background job. The performance difference between them is not academic. Single-row INSERTs inside a loop make one network round-trip per row. Bulk INSERT with multi-row VALUES makes one round-trip for the entire batch. For a 10,000-row import, that's the difference between 10,000 individual network calls and one.

The non-negotiable rule: always name your columns explicitly. The position-based form — INSERT INTO employees VALUES (1, 'Sarah', 'Engineering', 72000.00) — works until the moment someone adds a middle_name column between full_name and department. After that schema change, every position-based INSERT either fails with a type mismatch or, worse, silently stores values in the wrong columns. Explicit column naming makes your INSERT statements resilient to schema evolution and self-documenting for anyone reading the code later.

One more pattern worth knowing for production: INSERT ... ON CONFLICT, which handles the case where a row might or might not already exist. Without it, a duplicate primary key causes the entire INSERT to fail. With ON CONFLICT, you can choose to skip the row, update specific columns, or raise a custom error. This is what's commonly called an 'upsert' — a single statement that handles both the insert case and the update case without requiring a SELECT first.

io/thecodeforge/dml/insert_employees.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- io.thecodeforge: Production INSERT patterns

-- PATTERN 1: Single-row INSERT with explicit column names.
-- This is the only safe INSERT form — position-based inserts break on schema changes.
INSERT INTO employees (employee_id, full_name, department, salary)
VALUES (1, 'Sarah Mitchell', 'Engineering', 72000.00);

-- PATTERN 2: Bulk INSERT — one statement, one network round-trip, six rows.
-- Preferred over looping single-row inserts for any batch > 10 rows.
-- Target 500-1000 rows per statement for optimal throughput.
INSERT INTO employees (employee_id, full_name, department, salary)
VALUES
    (2, 'James Okafor',   'Marketing',   55000.00),
    (3, 'Priya Nair',     'Engineering', 68000.00),
    (4, 'Carlos Mendes',  'HR',          48000.00),
    (5, 'Lisa Thornton',  'Engineering', 76000.00),
    (6, 'Derek Huang',    'Marketing',   59000.00);

-- PATTERN 3: Upsert — handles the case where the row may already exist.
-- ON CONFLICT (employee_id): if a row with this primary key already exists...
-- DO UPDATE SET ...: update only the specified columns (not the whole row).
-- EXCLUDED. prefix refers to the VALUES that were attempted in the INSERT.
-- Use this for idempotent operations — safe to run multiple times.
INSERT INTO employees (employee_id, full_name, department, salary)
VALUES (3, 'Priya Nair', 'Engineering', 72000.00)  -- Priya already exists (id=3)
ON CONFLICT (employee_id)
DO UPDATE SET
    salary     = EXCLUDED.salary,
    department = EXCLUDED.department;
-- Result: Priya's salary is updated to 72000.00, no duplicate row created.
-- MySQL equivalent: INSERT ... ON DUPLICATE KEY UPDATE salary = VALUES(salary)

-- Verify all rows after insertion.
SELECT * FROM employees ORDER BY employee_id ASC;
Output
-- After all three patterns execute:
employee_id | full_name | department | salary
-------------+----------------+-------------+-----------
1 | Sarah Mitchell | Engineering | 72000.00
2 | James Okafor | Marketing | 55000.00
3 | Priya Nair | Engineering | 72000.00 <- updated via upsert
4 | Carlos Mendes | HR | 48000.00
5 | Lisa Thornton | Engineering | 76000.00
6 | Derek Huang | Marketing | 59000.00
(6 rows)
Always List Column Names — Position-Based INSERTs Are Schema Bombs
INSERT INTO employees VALUES (1, 'Sarah', 'Engineering', 72000.00) works today. When a DBA adds an 'employment_type' column between full_name and department next month, every position-based INSERT in your codebase either fails with a type mismatch or silently writes 'Engineering' into the employment_type column and 72000.00 into department. Both outcomes are bad. Neither produces an immediate error. Always be explicit: INSERT INTO employees (employee_id, full_name, department, salary) VALUES (...). The extra typing is insurance against silent data corruption.
Production Insight
Single-row INSERTs inside a loop are the most common cause of slow bulk data loads I've seen in production codebases. A loop that inserts 50,000 rows one at a time makes 50,000 network round-trips, holds 50,000 individual row locks sequentially, and typically takes 10-20 minutes. The same 50,000 rows inserted as batches of 1,000 take under 30 seconds.
For very large loads — millions of rows — neither approach is optimal. PostgreSQL's COPY command and MySQL's LOAD DATA INFILE bypass the row-by-row insert machinery entirely and write directly to the storage engine. For a 10-million-row import, COPY completes in minutes where even batched INSERT takes hours.
Rule: single-row INSERT for real-time events, multi-row VALUES batches of 500-1000 for batch jobs, COPY or LOAD DATA INFILE for bulk migrations. Know which pattern you're using and why.
Key Takeaway
INSERT is not just syntax — it is a performance decision disguised as a data operation. The pattern you choose (single-row vs bulk vs COPY) determines whether your data load takes seconds or hours.
Explicit column names in INSERT statements are not a style preference. They are the mechanism that makes INSERT statements resilient to schema evolution. Without them, a column addition becomes a production incident.
Punchline: if your application inserts rows in a loop with one INSERT per iteration, you have a performance problem waiting to become an incident when the data volume grows.
INSERT Pattern Selection
IfReal-time single event — user signup, order placed, log event written
UseSingle-row INSERT with explicit column names — one statement, executed immediately, errors surface immediately
IfBatch import or background job processing — 10 to ~10,000 rows
UseMulti-row VALUES INSERT in batches of 500-1000 rows — dramatically reduces network round-trips and lock overhead
IfBulk migration or data load — millions of rows, one-time or scheduled
UsePostgreSQL COPY FROM or MySQL LOAD DATA INFILE — bypasses row-level insert overhead entirely
IfRow might already exist — idempotent operation, retry-safe pipeline
UseINSERT ... ON CONFLICT (PostgreSQL) or INSERT ... ON DUPLICATE KEY UPDATE (MySQL) — handles both new and existing rows in a single statement without a preceding SELECT
IfInserting results of a SELECT query from another table
UseINSERT INTO target_table (col1, col2) SELECT col1, col2 FROM source_table WHERE ... — no VALUES clause needed

SQL UPDATE — Modifying Existing State Without Breaking It

UPDATE is the command that keeps data current. Password changes, price adjustments, address corrections, status transitions — all of these are UPDATEs under the hood. The mechanism is straightforward: identify which rows to change (WHERE), specify what to change them to (SET), and the database engine applies the modification atomically to every matched row.

The WHERE clause is where the danger lives. Without it, 'every matched row' means 'every row in the table.' The database engine doesn't distinguish between 'I want to update one row' and 'I want to update all rows' — it executes whatever you tell it to. A missing WHERE clause on a 10,000-row employees table is not caught by the SQL parser, not caught by the query planner, and not caught by the storage engine. It executes correctly and commits instantly if autocommit is on. The only thing that catches it is you, before you run it.

The SELECT-first workflow is the habit that prevents incidents. Before writing an UPDATE, write the SELECT equivalent first: SELECT FROM employees WHERE employee_id = 1. Verify that the result set contains exactly the rows you intend to modify. Count them. Then, and only then, convert the statement to an UPDATE by replacing SELECT with UPDATE employees SET salary = ... . This takes an extra 10 seconds. Over a career, it will save you hours.

For batch UPDATEs affecting large numbers of rows, the performance concern shifts from correctness to concurrency. An UPDATE that modifies 5 million rows in a single statement holds row-level locks on all affected rows for the duration of the operation — potentially minutes. Other transactions trying to write to any of those rows queue behind it. On a busy system, this causes cascading timeout failures across your API layer. The mitigation is batching: update 10,000 rows at a time with a COMMIT between batches, keeping the lock window short enough that concurrent writes are not blocked for an unacceptable duration.

io/thecodeforge/dml/update_employees.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
-- io.thecodeforge: Production UPDATE patterns

-- PATTERN 1: Single-row targeted update via Primary Key.
-- Primary Key in WHERE ensures exactly one row is matched — the safest UPDATE pattern.
UPDATE employees
SET salary = 80000.00
WHERE employee_id = 1;  -- WHERE clause written first — non-negotiable habit
-- Verify: SELECT full_name, salary FROM employees WHERE employee_id = 1;

-- PATTERN 2: Multi-column update in one statement.
-- Multiple SET assignments separated by commas — one atomic operation.
-- Carlos is moving from HR to Engineering with a corresponding salary adjustment.
UPDATE employees
SET
    department = 'Engineering',
    salary     = 62000.00
WHERE employee_id = 4;

-- PATTERN 3: Conditional batch update — applying logic to a segment.
-- Increase Marketing salaries by 10%. salary * 1.10 uses the current value.
-- SELECT-first verification step (run this, confirm row count, then UPDATE):
-- SELECT full_name, salary, salary * 1.10 AS new_salary
-- FROM employees WHERE department = 'Marketing';
UPDATE employees
SET salary = salary * 1.10
WHERE department = 'Marketing';

-- PATTERN 4: Safe batch UPDATE for large tables — update in chunks.
-- Never run a multi-million-row UPDATE in a single transaction.
-- This example updates 1000 rows per batch, releasing locks between commits.
-- In a real script this would be a loop with a LIMIT and OFFSET or cursor.
UPDATE employees
SET salary = salary * 1.05
WHERE department = 'Engineering'
  AND employee_id BETWEEN 1 AND 1000;  -- batch boundaries
-- COMMIT; -- release locks, then run next batch

-- Audit the changes after all patterns.
SELECT
    employee_id,
    full_name,
    department,
    salary
FROM employees
ORDER BY employee_id;
Output
-- After all updates:
employee_id | full_name | department | salary
-------------+----------------+-------------+-----------
1 | Sarah Mitchell | Engineering | 84000.00 <- 80000 * 1.05 (Engineering batch)
2 | James Okafor | Marketing | 64900.00 <- 59000 * 1.10 (Marketing 10% raise)
3 | Priya Nair | Engineering | 75600.00 <- 72000 * 1.05 (Engineering batch)
4 | Carlos Mendes | Engineering | 65100.00 <- 62000 * 1.05 (Engineering batch)
5 | Lisa Thornton | Engineering | 79800.00 <- 76000 * 1.05 (Engineering batch)
6 | Derek Huang | Marketing | 64900.00 <- 59000 * 1.10 (Marketing 10% raise)
(6 rows)
UPDATE Without WHERE Is the Most Common Production Incident in SQL
UPDATE employees SET salary = 0 with no WHERE clause executes without error and zeros every salary in the table. The database did exactly what you told it to. There is no warning, no confirmation prompt, and no undo — unless you are inside an open transaction you can roll back. The discipline: write WHERE before you write SET. Physically type the WHERE clause first, even if you fill in the condition afterward. Then write SELECT * FROM employees WHERE [your condition] and run that first. Verify the rows. Count them. Then convert to UPDATE. This habit is what separates engineers who have had incidents from engineers who haven't had them yet.
Production Insight
An UPDATE without an index on the WHERE clause column does a full table scan to find matching rows. On a 50-million-row table, that scan takes minutes and holds locks the entire time. Every other write to that table queues behind it. API latency spikes. Connection pools fill. The query that looked instantaneous in staging on 10,000 rows becomes a 15-minute lock event in production on 50 million.
Always run EXPLAIN UPDATE ... before executing a batch UPDATE on a large table. Look for 'Seq Scan' in the execution plan — that means no index is being used. Add an index on the WHERE clause column before running the UPDATE. For PostgreSQL, use CREATE INDEX CONCURRENTLY to build the index without blocking concurrent reads and writes.
Rule: EXPLAIN before UPDATE on any table with more than 100,000 rows. Never estimate performance based on staging row counts.
Key Takeaway
WHERE is not optional on UPDATE — it is the difference between a targeted fix and a production incident. No WHERE means every row, and the database engine will execute that without hesitation.
The SELECT-first pattern takes 10 seconds. Recovery from a WHERE-less UPDATE on a large table takes hours to days, depending on backup recency. The math is simple.
Punchline: if you find yourself in a hurry and thinking about skipping the SELECT-first verification, that is exactly the moment to slow down. Urgency is when WHERE clauses go missing.

SQL DELETE — Removing Records Safely and Responsibly

DELETE removes entire rows from a table. Not a column. Not a value. The entire row, with all its data, removed from the storage engine. This distinction matters: if you want to clear a value in a specific column while keeping the row, you use UPDATE SET column = NULL. DELETE is for when the record itself should no longer exist.

DELETE is irreversible without a transaction or a backup. Unlike UPDATE, where the old values might be recoverable from audit logs or CDC streams, a committed DELETE removes the row from the table immediately. Vacuum processes in PostgreSQL and InnoDB compaction in MySQL will eventually reclaim the physical disk space, but from the application's perspective, the data is gone the moment the DELETE commits.

This permanence is why most production systems for user-facing data default to Soft Delete rather than Hard Delete. A Soft Delete doesn't use the DELETE command at all — it uses UPDATE to set an is_deleted flag to true or a deleted_at timestamp to the current time. The row remains in the table, queries that should exclude deleted records add WHERE is_deleted = false or WHERE deleted_at IS NULL, and the data can be restored by flipping the flag back. The trade-off is table bloat over time, which requires periodic archival jobs to move soft-deleted rows to cold storage.

Hard Delete has legitimate use cases: GDPR right-to-erasure requests (where soft delete doesn't satisfy the legal requirement to actually remove data), clearing temporary staging tables that are re-populated on each run, and removing log or event data past its retention window. In all three cases, the DELETE should be inside a transaction, preceded by a SELECT to verify the target rows, and — for large tables — executed in batches to avoid holding locks for an unacceptable duration.

The batching requirement for large DELETEs is more critical than for large UPDATEs. Every deleted row generates a transaction log entry proportional to the row size. A single DELETE of 50 million rows generates enough WAL or redo log volume to fill disk on many systems. Batching with COMMIT between chunks keeps the per-transaction log volume manageable and keeps the lock window short enough for other writes to proceed.

io/thecodeforge/dml/delete_employees.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
-- io.thecodeforge: Production DELETE patterns

-- PATTERN 1: Targeted Hard Delete via Primary Key
-- Primary Key in WHERE guarantees exactly one row is affected — the safest DELETE pattern.

-- Step 1: Always verify the target row before deleting.
SELECT * FROM employees WHERE employee_id = 2;
-- Verify this is James Okafor. Confirm there is exactly one row.

-- Step 2: Delete only after verifying the target.
DELETE FROM employees
WHERE employee_id = 2;  -- Primary Key: guaranteed one row, no ambiguity

-- PATTERN 2: Soft Delete — the default for user-facing data
-- Requires an is_deleted column (ALTER TABLE employees ADD COLUMN is_deleted BOOLEAN DEFAULT false)
-- Never physically removes the row — just hides it from application queries.
-- This example is illustrative; it works once the column exists.
-- UPDATE employees
-- SET is_deleted = true, deleted_at = NOW()
-- WHERE employee_id = 2;
-- Restore: UPDATE employees SET is_deleted = false, deleted_at = NULL WHERE employee_id = 2;

-- PATTERN 3: Batch delete by category
-- Add temporary staging rows to demonstrate scoped batch delete.
INSERT INTO employees (employee_id, full_name, department, salary)
VALUES
    (7, 'Angela Foster', 'HR', 47000.00),
    (8, 'Ben Stafford',  'HR', 44000.00);

-- Step 1: Count affected rows — verify before deleting.
SELECT COUNT(*) FROM employees WHERE department = 'HR';
-- Expected: 2 rows (Angela and Ben — Carlos moved to Engineering in the UPDATE section)

-- Step 2: Wrap in a transaction for safety. Verify row count inside BEGIN.
BEGIN;
    DELETE FROM employees
    WHERE department = 'HR';
    -- The database engine returns the rows-affected count here.
    -- If it matches your expectation from the COUNT above, proceed.
COMMIT;
-- If the count is wrong: ROLLBACK; instead of COMMIT;

-- PATTERN 4: Batched DELETE for large tables (pseudocode structure)
-- Never delete millions of rows in one transaction.
-- Loop until no rows remain in the target set:
-- DELETE FROM employees
-- WHERE department = 'archived'
--   AND ctid IN (
--       SELECT ctid FROM employees
--       WHERE department = 'archived'
--       LIMIT 10000
--   );
-- COMMIT; -- release locks between batches

-- Final state after all deletes.
SELECT * FROM employees ORDER BY employee_id;
Output
-- SELECT * FROM employees WHERE employee_id = 2;
employee_id | full_name | department | salary
-------------+--------------+------------+---------
2 | James Okafor | Marketing | 64900.00
(1 row)
-- Confirmed: this is the correct target. Proceeding with DELETE.
-- SELECT COUNT(*) FROM employees WHERE department = 'HR';
count
-------
2
(1 row)
-- Two rows will be deleted. Confirmed. COMMIT.
-- Final state:
employee_id | full_name | department | salary
-------------+----------------+-------------+-----------
1 | Sarah Mitchell | Engineering | 84000.00
3 | Priya Nair | Engineering | 75600.00
4 | Carlos Mendes | Engineering | 65100.00
5 | Lisa Thornton | Engineering | 79800.00
6 | Derek Huang | Marketing | 64900.00
(5 rows)
DELETE Without WHERE Empties the Entire Table — Instantly, With No Confirmation
DELETE FROM employees with no WHERE clause removes every row. The table structure survives — columns, indexes, constraints all remain — but every record is gone. The database engine does not ask 'are you sure?' It executes. The three rules: First, always write WHERE before FROM when typing a DELETE. Second, always run SELECT * FROM employees WHERE [your condition] first and verify the result set is exactly what you intend to remove. Third, always wrap production DELETEs in BEGIN/COMMIT and verify the rows-affected count before committing. These three habits, applied consistently, mean that a missing WHERE clause gets caught before it commits — not after.
Production Insight
Batching is not optional for large DELETEs — it is a correctness requirement for systems that depend on transaction log space. A single DELETE across 50 million rows generates WAL entries proportional to all 50 million rows' worth of data. On a database with 10GB of WAL space configured, this can fill the transaction log mid-operation, causing the database to suspend all writes until the log is archived. Recovering from a filled WAL in production is a multi-hour event.
The batching pattern: DELETE from a target set 10,000 rows at a time, COMMIT between batches, and include a brief pg_sleep(0.1) or application-level sleep between iterations on high-throughput systems to allow replication lag to stay manageable. The total wall-clock time for a batched delete is longer, but the peak impact on concurrent transactions and log space is dramatically lower.
Rule: for any DELETE affecting more than 100,000 rows, implement batching before executing. Estimate affected rows with SELECT COUNT(*) first, then calculate whether the single-statement approach is safe given your WAL/redo log configuration.
Key Takeaway
DELETE is permanent — there is no recycle bin in most databases. A committed DELETE without a backup or transaction is a one-way door.
Soft Delete is the correct default for user-facing data in any system where audit trails, regulatory compliance, or accidental deletion recovery matter. Hard Delete is a deliberate architectural choice for specific use cases, not the lazy default.
Punchline: if you are ever uncertain whether to use Soft Delete or Hard Delete, choose Soft Delete. The cost of over-retaining data is manageable. The cost of accidentally losing data that a regulator asks for later is not.
Delete Strategy Selection
IfUser-facing data — accounts, profiles, orders — needs audit trail or potential recovery
UseSoft Delete: UPDATE employees SET is_deleted = true, deleted_at = NOW() WHERE id = X. Recoverable. Auditable. Does not break foreign key references.
IfGDPR right-to-erasure or legal requirement to permanently remove data
UseHard Delete inside a transaction: BEGIN; DELETE FROM employees WHERE id = X; COMMIT. Verify with SELECT first. Document the deletion event in a separate audit log.
IfClearing temporary staging or work tables before re-population
UseTRUNCATE TABLE staging_table — faster than DELETE, resets auto-increment, minimal transaction log impact. Only use when the table structure is preserved and all rows are being cleared.
IfLarge-scale data purge — millions of rows past a retention window
UseBatched DELETE: 10,000 rows per transaction with COMMIT between batches. Keep lock windows short and WAL volume per transaction manageable.
IfDecommissioning an entire table permanently
UseDROP TABLE — removes the table structure, all data, all indexes, all permissions. Irreversible. Never use DROP when you mean DELETE.
● Production incidentPOST-MORTEMseverity: high

Missing WHERE Clause Zeros All Employee Salaries — 14,000 Rows, 6-Hour Data Loss Window

Symptom
Payroll batch processing reported zero net pay for all employees in the nightly run. The finance team flagged the discrepancy six hours after the UPDATE executed, during end-of-day reconciliation. By that point, payroll notifications had already been sent to employees at two of the three regional offices.
Assumption
The developer believed the UPDATE was targeting a single employee — employee_id 847. They had written the SELECT query correctly, verified the row, then copy-pasted the structure into an UPDATE statement and accidentally dropped the WHERE clause during the edit. The query ran in under two seconds on an indexed table. No error. No warning. No rows-affected check. The developer moved on to the next task.
Root cause
The executed statement was: UPDATE employees SET salary = 75000.00 — no WHERE clause. The application database session had autocommit enabled, which is the default in most ORM configurations and database clients. This meant the UPDATE was committed the moment it completed, with no open transaction to roll back. There was no pre-execution SELECT verification step in the team's deployment checklist, no row-count confirmation prompt in the database client, and no peer review requirement for DML affecting more than a threshold number of rows.
Fix
The immediate fix was restoring salaries from the last nightly backup, which represented a six-hour data loss window. Any salary changes made during business hours that day — approved raises, corrections, new hire onboarding — had to be manually reconstructed from HR system audit logs and re-applied. Total engineering recovery time: 11 hours across three engineers. The process changes implemented: mandatory BEGIN/COMMIT transaction wrapping for all production DML, a pre-execution SELECT step added to the deployment checklist with a required row-count sign-off, a database client configuration change to disable autocommit for the application service account, and a policy requiring peer review for any DML statement affecting more than 100 rows.
Key lesson
  • Always write the WHERE clause before writing the SET clause — physically type WHERE first, even if you fill it in after
  • Disable autocommit in production database sessions — the default behavior in most clients is the unsafe one
  • Run SELECT with the identical WHERE clause to verify target rows and confirm the count before executing UPDATE or DELETE
  • Require peer review for any DML touching more than 100 rows — a second pair of eyes costs minutes, a recovery costs hours
  • Add a rows-affected check after DML execution — if the count is unexpected, ROLLBACK immediately while the transaction is still open
Production debug guideSymptom-driven diagnosis for SQL data modification failures — start with the symptom, not the assumption6 entries
Symptom · 01
INSERT fails with duplicate key violation — error 23505 in PostgreSQL, error 1062 in MySQL
Fix
The INSERT is attempting to write a value that already exists in a PRIMARY KEY or UNIQUE-constrained column. First, verify whether the row genuinely exists: SELECT * FROM table WHERE id = <value>. If it does and you want to update it, use INSERT ... ON CONFLICT DO UPDATE (PostgreSQL) or INSERT ... ON DUPLICATE KEY UPDATE (MySQL). If it doesn't and the error still fires, check whether the sequence or auto-increment counter is out of sync with the actual max ID in the table — this happens after a manual bulk load bypasses the sequence.
Symptom · 02
UPDATE executes successfully but affects 0 rows — no error raised
Fix
The WHERE clause matched zero rows. This is not an error condition from the database's perspective — it simply found nothing to update. Verify with SELECT COUNT(*) FROM table WHERE <your condition>. If that returns 0, your filter is wrong. Common causes: string case mismatch (WHERE department = 'hr' vs 'HR'), trailing whitespace in stored values, or filtering on a column that stores NULL (WHERE manager_id = NULL never matches — use WHERE manager_id IS NULL instead).
Symptom · 03
DELETE executes but rows reappear after refreshing the query
Fix
The transaction was never committed. Your session is showing you its own uncommitted view while other sessions still see the old data. Run COMMIT; if you intended to persist the change. Run ROLLBACK; if you want to discard it. To verify cross-session visibility before committing, open a second database session and query the same table — if the rows are still there in the second session, your first session's transaction is open and uncommitted.
Symptom · 04
INSERT succeeds but data appears in the wrong columns — values shifted by one position
Fix
Column order mismatch from a position-based INSERT without explicit column names. When a column was added to the table, every subsequent value shifted one position right in the schema, but your INSERT statement still passes values in the original order. The fix is always to name columns explicitly: INSERT INTO table (col1, col2, col3) VALUES (v1, v2, v3). Check the current schema with \d table_name (PostgreSQL) or DESCRIBE table (MySQL) and compare it against your INSERT statement.
Symptom · 05
UPDATE or DELETE is extremely slow on a large table — query runs for minutes
Fix
The WHERE clause is doing a full table scan because there is no index on the filtered column. Run EXPLAIN UPDATE ... (PostgreSQL/MySQL) or look at the execution plan to confirm. A Seq Scan on a multi-million-row table for a DELETE will take minutes and hold locks the entire time, blocking all other writes to the table. Add an index on the WHERE clause column: CREATE INDEX CONCURRENTLY idx_employees_dept ON employees(department). CONCURRENTLY builds the index without blocking reads or writes.
Symptom · 06
INSERT succeeds but the auto-increment ID jumps unexpectedly — IDs are non-sequential with large gaps
Fix
This is expected behavior, not a bug. Sequences and auto-increment counters increment on each attempted INSERT, including ones that fail due to constraint violations. If your application retries failed INSERTs, each retry consumes a sequence value. Similarly, if a transaction generates an ID and then rolls back, the ID is not reused. Auto-increment IDs should be treated as unique identifiers, not as sequential counters — gaps are normal and harmless.
★ DML Quick Debug Cheat SheetWhen INSERT, UPDATE, or DELETE misbehaves in production, run these commands in order. Match the symptom first — do not start at the bottom and work up.
Duplicate key error on INSERT — operation rejected, row not written
Immediate action
Verify whether the conflicting row already exists and decide whether the correct action is skip, update, or error
Commands
psql -c "SELECT * FROM employees WHERE employee_id = 7;" yourdb
psql -c "\d employees" yourdb
Fix now
If the row exists and you want to update it: use INSERT INTO employees (...) VALUES (...) ON CONFLICT (employee_id) DO UPDATE SET salary = EXCLUDED.salary. If you want to silently skip duplicates: use ON CONFLICT DO NOTHING. If the sequence is out of sync (inserts keep conflicting despite no visible rows): run SELECT setval('employees_employee_id_seq', (SELECT MAX(employee_id) FROM employees)); to reset it.
UPDATE or DELETE affected wrong number of rows — too many or too few+
Immediate action
Count target rows inside a transaction before committing — verify the number matches your expectation before the change becomes permanent
Commands
psql -c "BEGIN; SELECT COUNT(*) FROM employees WHERE department = 'HR';" yourdb
psql -c "ROLLBACK;" yourdb
Fix now
Always wrap production DML in BEGIN/ROLLBACK first: BEGIN; UPDATE employees SET salary = 62000.00 WHERE department = 'HR'; — then check the rows affected count. If it matches your expectation, run COMMIT;. If not, run ROLLBACK; and fix the WHERE clause. Never COMMIT until you have verified the affected row count.
Transaction blocking other queries — API timeouts increasing, write queue backing up+
Immediate action
Find the blocking session, confirm it is idle in transaction (started a transaction and never committed or rolled back), then terminate it
Commands
psql -c "SELECT pid, usename, application_name, state, query_start, query FROM pg_stat_activity WHERE state = 'idle in transaction' ORDER BY query_start ASC;" yourdb
psql -c "SELECT pg_terminate_backend(<pid>);" yourdb
Fix now
After terminating the blocking session, set idle_in_transaction_session_timeout in postgresql.conf to auto-cancel sessions that hold a transaction open beyond a threshold: idle_in_transaction_session_timeout = '5min'. This prevents runaway transactions from accumulating during incidents. Also audit the application code path that initiated the transaction — find where the COMMIT or ROLLBACK is missing.
INSERT vs UPDATE vs DELETE — Three Commands, Three Purposes
AspectINSERTUPDATEDELETE
What it doesAdds new rows to the table — does not modify existing dataChanges specific column values in existing rows — row count stays the sameRemoves entire rows from the table — column values in those rows are gone
Requires WHERE?No — INSERT always targets new rows, WHERE has no roleYes — omitting WHERE updates every row in the table simultaneouslyYes — omitting WHERE deletes every row in the table simultaneously
Affects row count?Increases by the number of rows successfully insertedNo change — the same rows exist before and afterDecreases by the number of rows matched by the WHERE clause
Can target multiple rows?Yes — multi-row VALUES syntax inserts many rows in one statementYes — WHERE clause can match thousands of rows; all are updated atomicallyYes — WHERE clause can match thousands of rows; all are removed atomically
Reversible without a transaction?Only by running a matching DELETE — which requires you to know what was insertedOnly by running another UPDATE with the old values — which requires you to know what they wereNot easily — the data is gone. Requires point-in-time recovery or a backup restore.
Safe execution patternName columns explicitly. For batch: multi-row VALUES. For idempotent: ON CONFLICT clause.SELECT-first to verify targets. Wrap in BEGIN/COMMIT. Check rows-affected count before committing.SELECT-first to verify targets. Wrap in BEGIN/COMMIT. Batch large deletes. Prefer Soft Delete for user data.
Typical real-world triggerNew user signup, new order placed, log event recorded, configuration entry createdUser updates profile, price adjusted, order status changes, permission modifiedAccount closed, record archived, GDPR erasure request, staging table cleared

Key takeaways

1
INSERT adds new rows
always name your columns explicitly in every INSERT statement. Position-based inserts are schema bombs that detonate silently when someone adds a column to the table later.
2
UPDATE changes existing rows
the WHERE clause is the only thing preventing every row in the table from being modified simultaneously. Write WHERE before SET, every time, without exception. No WHERE equals every row.
3
DELETE removes entire rows permanently
the row and all its column values are gone. Without WHERE, the entire table is emptied. Run SELECT with the same WHERE clause first, verify the row count, then DELETE inside a BEGIN/COMMIT block.
4
The SELECT-before-DML habit is one of the highest-value practices in database work. It costs 10 seconds and prevents recoveries that cost 10 hours. Build it as a reflex, not a procedure you follow when you remember.
5
Bulk INSERT with multi-row VALUES syntax is 10-50x faster than single-row INSERT in a loop
the difference is network round-trips. For very large loads, COPY (PostgreSQL) or LOAD DATA INFILE (MySQL) outperforms even batched INSERT by another order of magnitude.
6
Soft Delete (UPDATE SET is_deleted = true) is the correct default for user-facing data in regulated or audit-sensitive systems. Hard Delete is a deliberate choice for specific scenarios
GDPR erasure, staging tables, data past retention windows — not the casual path of least resistance.

Common mistakes to avoid

5 patterns
×

Running DELETE or UPDATE without a WHERE clause

Symptom
Every row in the table gets affected instantly. Salaries zeroed, all records deleted, all statuses overwritten. No error is raised because the syntax is completely valid — the database did exactly what was asked.
Fix
Write the WHERE clause before writing SET or FROM — physically type it first, even if you fill in the condition afterward. Follow the SELECT-first workflow without exception: run SELECT * FROM table WHERE [condition], verify the result set contains exactly the intended rows, confirm the count, then convert the statement to UPDATE or DELETE. Wrap all production DML in BEGIN/COMMIT so there is a ROLLBACK window. If your database client has a 'safe update mode' setting (MySQL's --safe-updates flag), enable it — it rejects UPDATE and DELETE statements without a WHERE clause automatically.
×

Omitting column names in INSERT statements

Symptom
INSERT succeeds today but silently stores data in the wrong columns after a schema change. A new column added between existing ones shifts all subsequent value positions. The error may not surface immediately — wrong values in wrong columns can persist for days before someone notices inconsistencies in application behavior.
Fix
Always name every column explicitly in every INSERT statement: INSERT INTO employees (employee_id, full_name, department, salary) VALUES (...). Never rely on column position. This is the most common SQL practice violation I see in code reviews, and it has caused real production data corruption when schema migrations add columns mid-application.
×

Confusing DELETE, TRUNCATE, and DROP TABLE

Symptom
Developer intends to clear rows for a fresh data seed. They run DROP TABLE, which removes the entire table structure — columns, indexes, constraints, permissions, sequences — not just the data. Recovery requires restoring from a backup or recreating the schema from migration scripts.
Fix
DELETE FROM table WHERE [condition] removes rows matching the condition while preserving the table structure. TRUNCATE TABLE removes all rows quickly (DDL operation, minimal logging, resets auto-increment) while preserving the table structure. DROP TABLE removes the table itself — use only when decommissioning the table permanently. Know which of the three you need before running any of them.
×

Single-row INSERT in a loop for bulk data loading

Symptom
An import job that should take two minutes runs for 45 minutes. Each iteration of the loop opens a connection, sends a query, waits for acknowledgment, and closes. For 100,000 rows, this is 100,000 round-trips. The database connection pool exhausts. Other services start experiencing timeout errors as connections are held waiting for the import to complete.
Fix
Batch INSERTs using multi-row VALUES syntax — insert 500-1000 rows per statement. For very large loads (millions of rows), use PostgreSQL's COPY command or MySQL's LOAD DATA INFILE, which bypass the row-by-row insert machinery and write directly to the storage engine at orders-of-magnitude higher throughput. Profile the import job's duration against the row count — if it scales linearly, you're doing single-row inserts.
×

Fitting the scaler on full data — wrong transaction isolation for concurrent reads during DML

Symptom
An UPDATE or DELETE that works correctly in isolation produces unexpected results when other transactions are writing concurrently. Rows that should have been targeted are skipped. Rows that should have been skipped are modified. The behavior is non-deterministic and difficult to reproduce.
Fix
Understand your database's default isolation level. In PostgreSQL, READ COMMITTED (the default) means an UPDATE's WHERE clause re-evaluates each row at the time that row is processed, not at the time the statement started. If another transaction modifies a row between when the UPDATE started and when it reaches that row, the row's current state determines whether it matches the WHERE clause. For operations that require a consistent snapshot of the table at statement start, use REPEATABLE READ or SERIALIZABLE isolation: BEGIN ISOLATION LEVEL REPEATABLE READ; UPDATE ...; COMMIT;
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What is the difference between DELETE and TRUNCATE? Cover DML vs DDL, tr...
Q02SENIOR
Explain Soft Delete vs Hard Delete. Why would a production banking syste...
Q03SENIOR
LeetCode challenge: Write a SQL query to delete all duplicate emails fro...
Q04JUNIOR
You accidentally run an UPDATE without a WHERE clause and your session i...
Q05SENIOR
How does the database engine handle an INSERT that violates a UNIQUE con...
Q06SENIOR
Explain how a missing index on the WHERE clause column affects the perfo...
Q01 of 06JUNIOR

What is the difference between DELETE and TRUNCATE? Cover DML vs DDL, transaction logging, trigger behavior, and performance.

ANSWER
DELETE is a DML command that removes rows one at a time (conceptually), logs each row deletion individually in the transaction log, respects DELETE triggers (triggers fire once per row or once per statement depending on the database), and can be rolled back if inside an explicit transaction. Because each row deletion is logged, the log volume is proportional to the number of rows deleted — a 50M-row DELETE generates enormous WAL/redo log output. TRUNCATE is a DDL command (in most databases) that deallocates data pages rather than logging individual row deletions. It is dramatically faster — removing 50 million rows via TRUNCATE takes seconds versus hours for DELETE. TRUNCATE resets auto-increment/sequence counters to their starting value, does not fire row-level triggers, and in most databases (MySQL, SQL Server) cannot be rolled back because it is DDL and DDL is implicitly committed. PostgreSQL is a notable exception — TRUNCATE inside a BEGIN/COMMIT block can be rolled back. Practical guidance: use DELETE when you need WHERE filtering, trigger execution, or precise control over which rows are removed. Use TRUNCATE when you need to empty an entire table as fast as possible — common for staging tables, test data reset, and ETL target tables.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
Can I undo a DELETE or UPDATE after it's been executed?
02
What is the difference between DELETE FROM employees and DROP TABLE employees?
03
Can an INSERT statement add rows to multiple tables at the same time?
04
When should I use Soft Delete vs Hard Delete?
05
Why does my UPDATE affect 0 rows even though the data visually appears to match?
🔥

That's SQL Basics. Mark it forged?

7 min read · try the examples if you haven't

Previous
SQL CREATE TABLE and Data Types
3 / 16 · SQL Basics
Next
SQL SELECT Statement