
SQL INSERT, UPDATE, DELETE Explained — Add, Change and Remove Data

📍 Part of: SQL Basics → Topic 3 of 16
Master SQL INSERT, UPDATE, and DELETE with beginner-friendly examples, production-grade code, common mistakes, and LeetCode-standard interview tips.
🧑‍💻 Beginner-friendly — no prior Database experience needed
In this tutorial, you'll learn
  • INSERT adds new rows — always name your columns explicitly in every INSERT statement. Position-based inserts are schema bombs that detonate silently when someone adds a column to the table later.
  • UPDATE changes existing rows — the WHERE clause is the only thing preventing every row in the table from being modified simultaneously. Write WHERE before SET, every time, without exception. No WHERE equals every row.
  • DELETE removes entire rows permanently — the row and all its column values are gone. Without WHERE, the entire table is emptied. Run SELECT with the same WHERE clause first, verify the row count, then DELETE inside a BEGIN/COMMIT block.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • INSERT adds new rows — always name columns explicitly to survive schema changes
  • UPDATE modifies existing rows — WHERE clause is mandatory in practice, not optional syntax
  • DELETE removes entire rows — without WHERE it empties the whole table instantly with no warning
  • Bulk INSERT with multi-row VALUES reduces network round-trips by 10-50x vs single-row inserts in a loop
  • Production rule: run SELECT with your WHERE clause first, verify the row count, then convert to UPDATE or DELETE
  • Biggest mistake: forgetting WHERE on UPDATE/DELETE — no undo without an open transaction
  • Soft Delete (UPDATE SET is_deleted = true) is the default pattern in regulated systems — Hard Delete is a deliberate architectural choice, not a casual convenience
🚨 START HERE
DML Quick Debug Cheat Sheet
When INSERT, UPDATE, or DELETE misbehaves in production, run these commands in order. Match the symptom first — do not start at the bottom and work up.
🟡 Duplicate key error on INSERT — operation rejected, row not written
Immediate Action: Verify whether the conflicting row already exists and decide whether the correct action is skip, update, or error
Commands
psql -c "SELECT * FROM employees WHERE employee_id = 7;" yourdb
psql -c "\d employees" yourdb
Fix Now: If the row exists and you want to update it: use INSERT INTO employees (...) VALUES (...) ON CONFLICT (employee_id) DO UPDATE SET salary = EXCLUDED.salary. If you want to silently skip duplicates: use ON CONFLICT DO NOTHING. If the sequence is out of sync (inserts keep conflicting despite no visible rows): run SELECT setval('employees_employee_id_seq', (SELECT MAX(employee_id) FROM employees)); to reset it.
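The upsert path is easy to try locally. The sketch below uses Python's sqlite3 module as a stand-in engine — SQLite has supported the same ON CONFLICT ... DO UPDATE syntax since version 3.24, and the table and employee name here are illustrative, not from the guide's dataset:

```python
import sqlite3

# Minimal stand-in for the guide's employees table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        full_name   TEXT NOT NULL,
        salary      REAL
    )
""")
conn.execute("INSERT INTO employees (employee_id, full_name, salary) VALUES (7, 'Ana Ruiz', 50000.0)")

# Re-running the INSERT as an upsert: the primary-key conflict routes into
# the DO UPDATE branch instead of raising an IntegrityError.
conn.execute("""
    INSERT INTO employees (employee_id, full_name, salary)
    VALUES (7, 'Ana Ruiz', 55000.0)
    ON CONFLICT (employee_id) DO UPDATE SET salary = excluded.salary
""")

salary = conn.execute("SELECT salary FROM employees WHERE employee_id = 7").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(salary, row_count)  # 55000.0 1 — updated in place, no duplicate row
```

Running the same statement twice is safe, which is exactly the idempotency property retry-driven pipelines need.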
🟡 UPDATE or DELETE affected wrong number of rows — too many or too few
Immediate Action: Count target rows inside a transaction before committing — verify the number matches your expectation before the change becomes permanent
Commands
psql -c "SELECT COUNT(*) FROM employees WHERE department = 'HR';" yourdb
psql -c "BEGIN; UPDATE employees SET salary = 62000.00 WHERE department = 'HR'; ROLLBACK;" yourdb
Fix Now: Always wrap production DML in BEGIN/ROLLBACK first: BEGIN; UPDATE employees SET salary = 62000.00 WHERE department = 'HR'; — then check the rows affected count. If it matches your expectation, run COMMIT;. If not, run ROLLBACK; and fix the WHERE clause. Never COMMIT until you have verified the affected row count.
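The same verify-then-commit gate can live in application code. A sketch with Python's sqlite3 (a stand-in for the real driver — cursor.rowcount plays the role of psql's rows-affected readout; the data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees (employee_id, department, salary) VALUES (?, ?, ?)",
    [(1, 'HR', 48000.0), (2, 'HR', 50000.0), (3, 'Engineering', 70000.0)],
)
conn.commit()

expected_rows = 2  # how many rows we believe the WHERE clause matches

# sqlite3 opens an implicit transaction before DML, so nothing is
# permanent until commit() — the rowcount check gates the commit.
cur = conn.execute("UPDATE employees SET salary = 62000.0 WHERE department = 'HR'")
if cur.rowcount == expected_rows:
    conn.commit()
else:
    conn.rollback()  # unexpected blast radius: discard and fix the WHERE clause
```

If the rowcount surprises you, the rollback costs nothing; a wrong commit costs a restore.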
🟡 Transaction blocking other queries — API timeouts increasing, write queue backing up
Immediate Action: Find the blocking session, confirm it is idle in transaction (started a transaction and never committed or rolled back), then terminate it
Commands
psql -c "SELECT pid, usename, application_name, state, query_start, query FROM pg_stat_activity WHERE state = 'idle in transaction' ORDER BY query_start ASC;" yourdb
psql -c "SELECT pg_terminate_backend(<pid>);" yourdb
Fix Now: After terminating the blocking session, set idle_in_transaction_session_timeout in postgresql.conf to auto-cancel sessions that hold a transaction open beyond a threshold: idle_in_transaction_session_timeout = '5min'. This prevents runaway transactions from accumulating during incidents. Also audit the application code path that initiated the transaction — find where the COMMIT or ROLLBACK is missing.
Production Incident — Missing WHERE Clause Zeros All Employee Salaries: 14,000 Rows, 6-Hour Data Loss Window
A junior developer ran an UPDATE without a WHERE clause during a routine salary correction, zeroing salaries for 14,000 employees across three regions. The payroll system processed the zeroed records before anyone noticed.
Symptom: Payroll batch processing reported zero net pay for all employees in the nightly run. The finance team flagged the discrepancy six hours after the UPDATE executed, during end-of-day reconciliation. By that point, payroll notifications had already been sent to employees at two of the three regional offices.
Assumption: The developer believed the UPDATE was targeting a single employee — employee_id 847. They had written the SELECT query correctly, verified the row, then copy-pasted the structure into an UPDATE statement and accidentally dropped the WHERE clause during the edit. The query ran in under two seconds on an indexed table. No error. No warning. No rows-affected check. The developer moved on to the next task.
Root cause: The executed statement was: UPDATE employees SET salary = 75000.00 — no WHERE clause. The application database session had autocommit enabled, which is the default in most ORM configurations and database clients. This meant the UPDATE was committed the moment it completed, with no open transaction to roll back. There was no pre-execution SELECT verification step in the team's deployment checklist, no row-count confirmation prompt in the database client, and no peer review requirement for DML affecting more than a threshold number of rows.
Fix: The immediate fix was restoring salaries from the last nightly backup, which represented a six-hour data loss window. Any salary changes made during business hours that day — approved raises, corrections, new hire onboarding — had to be manually reconstructed from HR system audit logs and re-applied. Total engineering recovery time: 11 hours across three engineers. The process changes implemented: mandatory BEGIN/COMMIT transaction wrapping for all production DML, a pre-execution SELECT step added to the deployment checklist with a required row-count sign-off, a database client configuration change to disable autocommit for the application service account, and a policy requiring peer review for any DML statement affecting more than 100 rows.
Key Lesson
  • Always write the WHERE clause before writing the SET clause — physically type WHERE first, even if you fill it in after
  • Disable autocommit in production database sessions — the default behavior in most clients is the unsafe one
  • Run SELECT with the identical WHERE clause to verify target rows and confirm the count before executing UPDATE or DELETE
  • Require peer review for any DML touching more than 100 rows — a second pair of eyes costs minutes, a recovery costs hours
  • Add a rows-affected check after DML execution — if the count is unexpected, ROLLBACK immediately while the transaction is still open
Production Debug Guide — Symptom-driven diagnosis for SQL data modification failures. Start with the symptom, not the assumption.
INSERT fails with duplicate key violation — error 23505 in PostgreSQL, error 1062 in MySQL
The INSERT is attempting to write a value that already exists in a PRIMARY KEY or UNIQUE-constrained column. First, verify whether the row genuinely exists: SELECT * FROM table WHERE id = <value>. If it does and you want to update it, use INSERT ... ON CONFLICT DO UPDATE (PostgreSQL) or INSERT ... ON DUPLICATE KEY UPDATE (MySQL). If it doesn't and the error still fires, check whether the sequence or auto-increment counter is out of sync with the actual max ID in the table — this happens after a manual bulk load bypasses the sequence.
UPDATE executes successfully but affects 0 rows — no error raised
The WHERE clause matched zero rows. This is not an error condition from the database's perspective — it simply found nothing to update. Verify with SELECT COUNT(*) FROM table WHERE <your condition>. If that returns 0, your filter is wrong. Common causes: string case mismatch (WHERE department = 'hr' vs 'HR'), trailing whitespace in stored values, or filtering on a column that stores NULL (WHERE manager_id = NULL never matches — use WHERE manager_id IS NULL instead).
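The NULL-comparison trap is worth seeing in two lines. A sketch using Python's sqlite3 as a stand-in engine (the same three-valued-logic rule applies in PostgreSQL and MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, manager_id INTEGER)")
conn.execute("INSERT INTO employees (employee_id, manager_id) VALUES (1, NULL)")

# '= NULL' evaluates to UNKNOWN for every row, so the filter matches nothing.
eq_matches = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE manager_id = NULL").fetchone()[0]

# 'IS NULL' is the correct predicate for NULL comparison.
is_null_matches = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE manager_id IS NULL").fetchone()[0]

print(eq_matches, is_null_matches)  # 0 1
```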
DELETE executes but rows reappear after refreshing the query
The transaction was never committed. Your session is showing you its own uncommitted view while other sessions still see the old data. Run COMMIT; if you intended to persist the change. Run ROLLBACK; if you want to discard it. To verify cross-session visibility before committing, open a second database session and query the same table — if the rows are still there in the second session, your first session's transaction is open and uncommitted.
INSERT succeeds but data appears in the wrong columns — values shifted by one position
Column order mismatch from a position-based INSERT without explicit column names. When a column was added to the table, every subsequent value shifted one position right in the schema, but your INSERT statement still passes values in the original order. The fix is always to name columns explicitly: INSERT INTO table (col1, col2, col3) VALUES (v1, v2, v3). Check the current schema with \d table_name (PostgreSQL) or DESCRIBE table (MySQL) and compare it against your INSERT statement.
UPDATE or DELETE is extremely slow on a large table — query runs for minutes
The WHERE clause is doing a full table scan because there is no index on the filtered column. Run EXPLAIN UPDATE ... (PostgreSQL/MySQL) or look at the execution plan to confirm. A Seq Scan on a multi-million-row table for a DELETE will take minutes and hold locks the entire time, blocking all other writes to the table. Add an index on the WHERE clause column: CREATE INDEX CONCURRENTLY idx_employees_dept ON employees(department). CONCURRENTLY builds the index without blocking reads or writes.
INSERT succeeds but the auto-increment ID jumps unexpectedly — IDs are non-sequential with large gaps
This is expected behavior, not a bug. Sequences and auto-increment counters increment on each attempted INSERT, including ones that fail due to constraint violations. If your application retries failed INSERTs, each retry consumes a sequence value. Similarly, if a transaction generates an ID and then rolls back, the ID is not reused. Auto-increment IDs should be treated as unique identifiers, not as sequential counters — gaps are normal and harmless.

INSERT, UPDATE, and DELETE are the three DML commands that every live application depends on. Every user signup maps to an INSERT. Every profile edit maps to an UPDATE. Every account deletion maps to a DELETE. If you've ever used a web application, these three statements have been running on your behalf dozens of times a day.

The commands themselves are not complicated. A beginner can write a correct INSERT in five minutes. What separates junior from senior engineers is not knowledge of the syntax — it's the operational discipline around execution. A missing WHERE clause on an UPDATE or DELETE is not a recoverable learning exercise in production. It is a P1 incident. Entire salary columns get zeroed. Customer records vanish. Finance teams notice six hours later during batch processing, not immediately, which means the damage window is wide.

I've reviewed post-mortems on incidents exactly like this. The SQL was technically valid. The database executed it correctly. There was no bug. There was just a missing five-character clause, autocommit was on, and there was no pre-execution verification step in the team's process.

This guide covers syntax, execution patterns, and the operational habits that prevent catastrophic data loss — not because the material is exotic, but because the habits have to be built before you're running DML against a production table with ten million rows and no rollback window.

Setting the Scene — The Table We'll Work With Throughout This Guide

Before touching INSERT, UPDATE, or DELETE, we need a stable foundation. Think of a database table like a structured spreadsheet where the column names are fixed by the schema and every row represents one unique real-world entity. The column definitions — their data types, constraints, and nullability — are the contract that every DML statement must honor.

We're going to use an employees table for a fictional company throughout this guide. It's simple enough to understand at a glance but realistic enough to demonstrate the production patterns that actually matter: a primary key that prevents duplicate records, a NOT NULL constraint on the name that enforces data quality, and a DECIMAL type for salary that stores exact financial values without the floating-point rounding errors you'd get from FLOAT.

That last one is worth pausing on. The difference between DECIMAL(10,2) and FLOAT for a salary column is not a pedantic data-type argument. In a payroll system, FLOAT arithmetic introduces rounding errors at the cent level. Multiply a FLOAT salary by a tax rate across ten thousand employees and the accumulated error becomes meaningful. DECIMAL stores exact values. Always use DECIMAL or NUMERIC for financial columns.
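The accumulation effect is easy to reproduce. Python's binary floats behave like SQL FLOAT, and its decimal module behaves like DECIMAL, so this sketch mirrors the payroll scenario:

```python
from decimal import Decimal

# Accumulate a $0.10 charge 100,000 times — the kind of arithmetic a
# payroll or billing pipeline runs constantly.
total_float = sum(0.10 for _ in range(100_000))            # FLOAT-style binary arithmetic
total_exact = sum(Decimal("0.10") for _ in range(100_000))  # DECIMAL-style exact arithmetic

print(total_float)  # drifts off 10000.0 — 0.10 has no exact binary representation
print(total_exact)  # 10000.00 exactly
```

The float total is off by a tiny amount per addition; across millions of rows and repeated tax or interest calculations, that drift becomes visible at reconciliation time.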

Also notice that department is nullable — no NOT NULL constraint. This is intentional. In a real organization, an employee might exist in the system before being assigned to a department (newly onboarded, contractor pending assignment). Forcing NOT NULL on department would cause INSERTs to fail for those legitimate cases. Schema design decisions like this directly determine what DML is possible later.

io/thecodeforge/setup/create_employees_table.sql · SQL
-- io.thecodeforge: Production employees schema
-- Design decisions documented inline — schema choices constrain DML forever.

CREATE TABLE employees (
    employee_id   INT            PRIMARY KEY,    -- Unique identifier: rejects duplicates at the DB level
    full_name     VARCHAR(100)   NOT NULL,       -- Mandatory: INSERT fails if omitted or NULL
    department    VARCHAR(50),                   -- Nullable: employees can exist before assignment
    salary        DECIMAL(10, 2)                 -- Exact precision: never use FLOAT for financial values
);

-- Verify the schema before writing any DML.
-- In PostgreSQL, \d employees shows columns, types, and constraints.
-- In MySQL, DESCRIBE employees gives the same information.
-- Run this in every target environment before deploying DML scripts.

-- \d employees

-- Confirm the table is empty before the guide begins.
SELECT * FROM employees;
▶ Output
-- \d employees (PostgreSQL output)
Table "public.employees"
Column | Type | Nullable
--------------+------------------------+---------
employee_id | integer | not null
full_name | character varying(100) | not null
department | character varying(50) |
salary | numeric(10,2) |
Indexes:
"employees_pkey" PRIMARY KEY, btree (employee_id)

-- SELECT * FROM employees;
employee_id | full_name | department | salary
-------------+-----------+------------+--------
(0 rows)
Mental Model
The Schema Is the Contract Your DML Must Honor
You cannot write safe INSERT, UPDATE, or DELETE without knowing which columns exist, which are mandatory, which have uniqueness constraints, and what data types they expect. The schema defines what is possible — your DML works within those boundaries.
  • PRIMARY KEY columns reject duplicate values at the storage engine level — INSERT fails immediately if the key already exists, no matter what the application expects
  • NOT NULL columns require a value on every INSERT — omitting them or explicitly passing NULL raises an error before a single row is written
  • DECIMAL(10,2) stores exact financial values — FLOAT introduces sub-cent rounding errors that compound across millions of calculations
  • Nullable columns (no NOT NULL) accept NULL as a valid value — WHERE column = NULL never matches; you must use WHERE column IS NULL
  • Schema changes between environments (staging vs production) cause DML that works in one environment to fail in another — always run \d or DESCRIBE in the target environment before deploying DML scripts
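These constraints are enforced at execution time, and you can watch them fire. A sketch using Python's sqlite3 as a stand-in engine (error message text varies by database, but the failure classes are the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        full_name   TEXT NOT NULL,
        department  TEXT
    )
""")
conn.execute("INSERT INTO employees (employee_id, full_name) VALUES (1, 'Sarah Mitchell')")

# PRIMARY KEY: a second row with employee_id = 1 is rejected at execution time.
pk_error = None
try:
    conn.execute("INSERT INTO employees (employee_id, full_name) VALUES (1, 'Duplicate Row')")
except sqlite3.IntegrityError as e:
    pk_error = str(e)
print(pk_error)  # UNIQUE constraint failed: employees.employee_id

# NOT NULL: omitting full_name fails before any row is written.
null_error = None
try:
    conn.execute("INSERT INTO employees (employee_id) VALUES (2)")
except sqlite3.IntegrityError as e:
    null_error = str(e)
print(null_error)  # NOT NULL constraint failed: employees.full_name
```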
📊 Production Insight
Schema drift between environments is the silent killer of deployment scripts. A column that exists in staging but not in production, or a constraint that was added to production but not reflected in staging, causes DML to fail at the worst possible time — during a deployment window, not during testing.
The correct discipline: run DESCRIBE table_name or \d table_name in every target environment as the first step of any DML deployment. Treat schema differences between environments as a blocking issue, not a minor discrepancy.
Rule: version your schema changes with migration tools (Flyway, Liquibase, Alembic) so that environment schemas are deterministic and auditable. If you cannot tell what schema version any given environment is running, your DML deployments are operating on assumptions rather than facts.
🎯 Key Takeaway
The table schema is the contract that all DML must honor. Every constraint — PRIMARY KEY, NOT NULL, UNIQUE, data type — is enforced at execution time, not at statement-writing time. Understanding the schema before writing DML is not preparation. It is the first step of the work.
Skipping schema verification before deploying DML scripts is the equivalent of deploying code without reading the API contract. The failure will happen — the only question is whether it happens in staging or in production.
Punchline: two minutes with \d employees or DESCRIBE employees prevents the class of bugs where valid-looking SQL fails in production because someone added a NOT NULL column last week.

SQL INSERT — Writing New Rows Into Your Table

INSERT is how data enters your system. Every user account, every order, every log event starts life as an INSERT statement somewhere in the call stack. The syntax has one job: map values to columns and write a new row to the table.

There are two patterns you'll use in practice, and they serve different purposes. Single-row INSERT handles real-time events — one user completing a signup form, one sensor publishing a reading, one webhook arriving. Bulk (multi-row) INSERT handles batch operations — seeding a test environment, importing a dataset, draining a queue of events in a background job. The performance difference between them is not academic. Single-row INSERTs inside a loop make one network round-trip per row. Bulk INSERT with multi-row VALUES makes one round-trip for the entire batch. For a 10,000-row import, that's the difference between 10,000 individual network calls and one.

The non-negotiable rule: always name your columns explicitly. The position-based form — INSERT INTO employees VALUES (1, 'Sarah', 'Engineering', 72000.00) — works until the moment someone adds a middle_name column between full_name and department. After that schema change, every position-based INSERT either fails with a type mismatch or, worse, silently stores values in the wrong columns. Explicit column naming makes your INSERT statements resilient to schema evolution and self-documenting for anyone reading the code later.

One more pattern worth knowing for production: INSERT ... ON CONFLICT, which handles the case where a row might or might not already exist. Without it, a duplicate primary key causes the entire INSERT to fail. With ON CONFLICT, you can choose to skip the row, update specific columns, or raise a custom error. This is what's commonly called an 'upsert' — a single statement that handles both the insert case and the update case without requiring a SELECT first.

io/thecodeforge/dml/insert_employees.sql · SQL
-- io.thecodeforge: Production INSERT patterns

-- PATTERN 1: Single-row INSERT with explicit column names.
-- This is the only safe INSERT form — position-based inserts break on schema changes.
INSERT INTO employees (employee_id, full_name, department, salary)
VALUES (1, 'Sarah Mitchell', 'Engineering', 72000.00);

-- PATTERN 2: Bulk INSERT — one statement, one network round-trip, six rows.
-- Preferred over looping single-row inserts for any batch > 10 rows.
-- Target 500-1000 rows per statement for optimal throughput.
INSERT INTO employees (employee_id, full_name, department, salary)
VALUES
    (2, 'James Okafor',   'Marketing',   55000.00),
    (3, 'Priya Nair',     'Engineering', 68000.00),
    (4, 'Carlos Mendes',  'HR',          48000.00),
    (5, 'Lisa Thornton',  'Engineering', 76000.00),
    (6, 'Derek Huang',    'Marketing',   59000.00);

-- PATTERN 3: Upsert — handles the case where the row may already exist.
-- ON CONFLICT (employee_id): if a row with this primary key already exists...
-- DO UPDATE SET ...: update only the specified columns (not the whole row).
-- EXCLUDED. prefix refers to the VALUES that were attempted in the INSERT.
-- Use this for idempotent operations — safe to run multiple times.
INSERT INTO employees (employee_id, full_name, department, salary)
VALUES (3, 'Priya Nair', 'Engineering', 72000.00)  -- Priya already exists (id=3)
ON CONFLICT (employee_id)
DO UPDATE SET
    salary     = EXCLUDED.salary,
    department = EXCLUDED.department;
-- Result: Priya's salary is updated to 72000.00, no duplicate row created.
-- MySQL equivalent: INSERT ... ON DUPLICATE KEY UPDATE salary = VALUES(salary)

-- Verify all rows after insertion.
SELECT * FROM employees ORDER BY employee_id ASC;
▶ Output
-- After all three patterns execute:
employee_id | full_name | department | salary
-------------+----------------+-------------+-----------
1 | Sarah Mitchell | Engineering | 72000.00
2 | James Okafor | Marketing | 55000.00
3 | Priya Nair | Engineering | 72000.00 <- updated via upsert
4 | Carlos Mendes | HR | 48000.00
5 | Lisa Thornton | Engineering | 76000.00
6 | Derek Huang | Marketing | 59000.00
(6 rows)
💡Always List Column Names — Position-Based INSERTs Are Schema Bombs
INSERT INTO employees VALUES (1, 'Sarah', 'Engineering', 72000.00) works today. When a DBA adds an 'employment_type' column between full_name and department next month, every position-based INSERT in your codebase either fails with a type mismatch or silently writes 'Engineering' into the employment_type column and 72000.00 into department. Both outcomes are bad. Neither produces an immediate error. Always be explicit: INSERT INTO employees (employee_id, full_name, department, salary) VALUES (...). The extra typing is insurance against silent data corruption.
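A minimal reproduction of the schema bomb, sketched with Python's sqlite3 (SQLite can only append columns, so this shows the hard-failure branch; engines that allow mid-schema column insertion can also hit the silent-corruption branch described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, full_name TEXT, salary REAL)")

# Both forms work against the original three-column schema.
conn.execute("INSERT INTO employees VALUES (1, 'Sarah Mitchell', 72000.0)")
conn.execute(
    "INSERT INTO employees (employee_id, full_name, salary) VALUES (2, 'James Okafor', 55000.0)")

# The schema evolves: a DBA adds a column.
conn.execute("ALTER TABLE employees ADD COLUMN employment_type TEXT")

# The positional form now breaks — the value count no longer matches the schema.
positional_error = None
try:
    conn.execute("INSERT INTO employees VALUES (3, 'Priya Nair', 68000.0)")
except sqlite3.OperationalError as e:
    positional_error = str(e)
print(positional_error)  # table employees has 4 columns but 3 values were supplied

# The explicit form survives the schema change untouched.
conn.execute(
    "INSERT INTO employees (employee_id, full_name, salary) VALUES (3, 'Priya Nair', 68000.0)")
```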
📊 Production Insight
Single-row INSERTs inside a loop are the most common cause of slow bulk data loads I've seen in production codebases. A loop that inserts 50,000 rows one at a time makes 50,000 network round-trips, holds 50,000 individual row locks sequentially, and typically takes 10-20 minutes. The same 50,000 rows inserted as batches of 1,000 take under 30 seconds.
For very large loads — millions of rows — neither approach is optimal. PostgreSQL's COPY command and MySQL's LOAD DATA INFILE bypass the row-by-row insert machinery entirely and write directly to the storage engine. For a 10-million-row import, COPY completes in minutes where even batched INSERT takes hours.
Rule: single-row INSERT for real-time events, multi-row VALUES batches of 500-1000 for batch jobs, COPY or LOAD DATA INFILE for bulk migrations. Know which pattern you're using and why.
🎯 Key Takeaway
INSERT is not just syntax — it is a performance decision disguised as a data operation. The pattern you choose (single-row vs bulk vs COPY) determines whether your data load takes seconds or hours.
Explicit column names in INSERT statements are not a style preference. They are the mechanism that makes INSERT statements resilient to schema evolution. Without them, a column addition becomes a production incident.
Punchline: if your application inserts rows in a loop with one INSERT per iteration, you have a performance problem waiting to become an incident when the data volume grows.
INSERT Pattern Selection
IfReal-time single event — user signup, order placed, log event written
UseSingle-row INSERT with explicit column names — one statement, executed immediately, errors surface immediately
IfBatch import or background job processing — 10 to ~10,000 rows
UseMulti-row VALUES INSERT in batches of 500-1000 rows — dramatically reduces network round-trips and lock overhead
IfBulk migration or data load — millions of rows, one-time or scheduled
UsePostgreSQL COPY FROM or MySQL LOAD DATA INFILE — bypasses row-level insert overhead entirely
IfRow might already exist — idempotent operation, retry-safe pipeline
UseINSERT ... ON CONFLICT (PostgreSQL) or INSERT ... ON DUPLICATE KEY UPDATE (MySQL) — handles both new and existing rows in a single statement without a preceding SELECT
IfInserting results of a SELECT query from another table
UseINSERT INTO target_table (col1, col2) SELECT col1, col2 FROM source_table WHERE ... — no VALUES clause needed
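From application code, the batch pattern usually means driving one prepared statement across a parameter set rather than issuing one statement per row. A sketch with Python's sqlite3 executemany (note: in-process SQLite has no network, so the timing gap here understates the real-world round-trip savings):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
rows = [(i, f"employee_{i}") for i in range(1, 10_001)]

# One statement per row — over a real network this is one round-trip each.
start = time.perf_counter()
for row in rows[:5_000]:
    conn.execute("INSERT INTO employees (employee_id, full_name) VALUES (?, ?)", row)
loop_secs = time.perf_counter() - start

# Batched: one prepared statement executed across the whole parameter set.
start = time.perf_counter()
conn.executemany("INSERT INTO employees (employee_id, full_name) VALUES (?, ?)", rows[5_000:])
batch_secs = time.perf_counter() - start
conn.commit()

print(f"loop: {loop_secs:.4f}s  batch: {batch_secs:.4f}s")
```

With a real driver over TCP (psycopg, mysqlclient), the same shape applies: executemany or a multi-row VALUES statement collapses thousands of round-trips into a handful.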

SQL UPDATE — Modifying Existing State Without Breaking It

UPDATE is the command that keeps data current. Password changes, price adjustments, address corrections, status transitions — all of these are UPDATEs under the hood. The mechanism is straightforward: identify which rows to change (WHERE), specify what to change them to (SET), and the database engine applies the modification atomically to every matched row.

The WHERE clause is where the danger lives. Without it, 'every matched row' means 'every row in the table.' The database engine doesn't distinguish between 'I want to update one row' and 'I want to update all rows' — it executes whatever you tell it to. A missing WHERE clause on a 10,000-row employees table is not caught by the SQL parser, not caught by the query planner, and not caught by the storage engine. It executes correctly and commits instantly if autocommit is on. The only thing that catches it is you, before you run it.

The SELECT-first workflow is the habit that prevents incidents. Before writing an UPDATE, write the SELECT equivalent first: SELECT * FROM employees WHERE employee_id = 1. Verify that the result set contains exactly the rows you intend to modify. Count them. Then, and only then, convert the statement to an UPDATE by replacing the SELECT * with UPDATE employees SET salary = ..., keeping the WHERE clause identical. This takes an extra 10 seconds. Over a career, it will save you hours.

For batch UPDATEs affecting large numbers of rows, the performance concern shifts from correctness to concurrency. An UPDATE that modifies 5 million rows in a single statement holds row-level locks on all affected rows for the duration of the operation — potentially minutes. Other transactions trying to write to any of those rows queue behind it. On a busy system, this causes cascading timeout failures across your API layer. The mitigation is batching: update 10,000 rows at a time with a COMMIT between batches, keeping the lock window short enough that concurrent writes are not blocked for an unacceptable duration.

io/thecodeforge/dml/update_employees.sql · SQL
-- io.thecodeforge: Production UPDATE patterns

-- PATTERN 1: Single-row targeted update via Primary Key.
-- Primary Key in WHERE ensures exactly one row is matched — the safest UPDATE pattern.
UPDATE employees
SET salary = 80000.00
WHERE employee_id = 1;  -- WHERE clause written first — non-negotiable habit
-- Verify: SELECT full_name, salary FROM employees WHERE employee_id = 1;

-- PATTERN 2: Multi-column update in one statement.
-- Multiple SET assignments separated by commas — one atomic operation.
-- Carlos is moving from HR to Engineering with a corresponding salary adjustment.
UPDATE employees
SET
    department = 'Engineering',
    salary     = 62000.00
WHERE employee_id = 4;

-- PATTERN 3: Conditional batch update — applying logic to a segment.
-- Increase Marketing salaries by 10%. salary * 1.10 uses the current value.
-- SELECT-first verification step (run this, confirm row count, then UPDATE):
-- SELECT full_name, salary, salary * 1.10 AS new_salary
-- FROM employees WHERE department = 'Marketing';
UPDATE employees
SET salary = salary * 1.10
WHERE department = 'Marketing';

-- PATTERN 4: Safe batch UPDATE for large tables — update in chunks.
-- Never run a multi-million-row UPDATE in a single transaction.
-- This example updates 1000 rows per batch, releasing locks between commits.
-- In a real script this would be a loop over primary-key ranges or a server-side
-- cursor (MySQL also supports UPDATE ... LIMIT; PostgreSQL does not).
UPDATE employees
SET salary = salary * 1.05
WHERE department = 'Engineering'
  AND employee_id BETWEEN 1 AND 1000;  -- batch boundaries
-- COMMIT; -- release locks, then run next batch

-- Audit the changes after all patterns.
SELECT
    employee_id,
    full_name,
    department,
    salary
FROM employees
ORDER BY employee_id;
▶ Output
-- After all updates:
employee_id | full_name | department | salary
-------------+----------------+-------------+-----------
1 | Sarah Mitchell | Engineering | 84000.00 <- 80000 * 1.05 (Engineering batch)
2 | James Okafor | Marketing | 60500.00 <- 55000 * 1.10 (Marketing 10% raise)
3 | Priya Nair | Engineering | 75600.00 <- 72000 * 1.05 (Engineering batch)
4 | Carlos Mendes | Engineering | 65100.00 <- 62000 * 1.05 (Engineering batch)
5 | Lisa Thornton | Engineering | 79800.00 <- 76000 * 1.05 (Engineering batch)
6 | Derek Huang | Marketing | 64900.00 <- 59000 * 1.10 (Marketing 10% raise)
(6 rows)
⚠ UPDATE Without WHERE Is the Most Common Production Incident in SQL
UPDATE employees SET salary = 0 with no WHERE clause executes without error and zeros every salary in the table. The database did exactly what you told it to. There is no warning, no confirmation prompt, and no undo — unless you are inside an open transaction you can roll back. The discipline: write WHERE before you write SET. Physically type the WHERE clause first, even if you fill in the condition afterward. Then write SELECT * FROM employees WHERE [your condition] and run that first. Verify the rows. Count them. Then convert to UPDATE. This habit is what separates engineers who have had incidents from engineers who haven't had them yet.
📊 Production Insight
An UPDATE without an index on the WHERE clause column does a full table scan to find matching rows. On a 50-million-row table, that scan takes minutes and holds locks the entire time. Every other write to that table queues behind it. API latency spikes. Connection pools fill. The query that looked instantaneous in staging on 10,000 rows becomes a 15-minute lock event in production on 50 million.
Always run EXPLAIN UPDATE ... before executing a batch UPDATE on a large table. Look for 'Seq Scan' in the execution plan — that means no index is being used. Add an index on the WHERE clause column before running the UPDATE. For PostgreSQL, use CREATE INDEX CONCURRENTLY to build the index without blocking concurrent reads and writes.
Rule: EXPLAIN before UPDATE on any table with more than 100,000 rows. Never estimate performance based on staging row counts.
🎯 Key Takeaway
WHERE is not optional on UPDATE — it is the difference between a targeted fix and a production incident. No WHERE means every row, and the database engine will execute that without hesitation.
The SELECT-first pattern takes 10 seconds. Recovery from a WHERE-less UPDATE on a large table takes hours to days, depending on backup recency. The math is simple.
Punchline: if you find yourself in a hurry and thinking about skipping the SELECT-first verification, that is exactly the moment to slow down. Urgency is when WHERE clauses go missing.
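The SELECT-first workflow translates directly into application code. Below is a minimal sketch using Python's built-in sqlite3 module — the table, names, and raise figure are illustrative, and the same commit-or-rollback shape applies to any driver:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
             "full_name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees (employee_id, full_name, department, salary) "
    "VALUES (?, ?, ?, ?)",
    [(1, "Sarah Mitchell", "Engineering", 80000.0),
     (2, "Derek Huang", "Marketing", 59000.0)],
)
conn.commit()

# Step 1: SELECT with the exact WHERE clause you intend to use, and count rows.
expected = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE department = ?", ("Marketing",)
).fetchone()[0]

# Step 2: run the UPDATE, then compare rows affected against the SELECT count.
cur = conn.execute(
    "UPDATE employees SET salary = salary * 1.10 WHERE department = ?",
    ("Marketing",),
)
if cur.rowcount == expected:
    conn.commit()      # counts match: make the change permanent
else:
    conn.rollback()    # mismatch: discard the change and investigate

new_salary = round(
    conn.execute("SELECT salary FROM employees WHERE employee_id = 2").fetchone()[0], 2
)
print(expected, cur.rowcount, new_salary)  # 1 1 64900.0
```

The rowcount comparison is the programmatic version of "verify the rows-affected count before committing."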

SQL DELETE — Removing Records Safely and Responsibly

DELETE removes entire rows from a table. Not a column. Not a value. The entire row, with all its data, removed from the storage engine. This distinction matters: if you want to clear a value in a specific column while keeping the row, you use UPDATE SET column = NULL. DELETE is for when the record itself should no longer exist.

DELETE is irreversible without a transaction or a backup. Unlike UPDATE, where the old values might be recoverable from audit logs or CDC streams, a committed DELETE removes the row from the table immediately. Vacuum processes in PostgreSQL and InnoDB compaction in MySQL will eventually reclaim the physical disk space, but from the application's perspective, the data is gone the moment the DELETE commits.

This permanence is why most production systems for user-facing data default to Soft Delete rather than Hard Delete. A Soft Delete doesn't use the DELETE command at all — it uses UPDATE to set an is_deleted flag to true or a deleted_at timestamp to the current time. The row remains in the table, queries that should exclude deleted records add WHERE is_deleted = false or WHERE deleted_at IS NULL, and the data can be restored by flipping the flag back. The trade-off is table bloat over time, which requires periodic archival jobs to move soft-deleted rows to cold storage.
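A soft delete, then, is just two UPDATEs and a filtered query. A quick sketch with Python's sqlite3 module — the is_deleted/deleted_at schema is the hypothetical one described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
             "full_name TEXT, is_deleted INTEGER DEFAULT 0, deleted_at TEXT)")
conn.execute("INSERT INTO employees (employee_id, full_name) VALUES (2, 'James Okafor')")

# The 'delete' is an UPDATE, not a DELETE — the row stays in the table.
conn.execute("UPDATE employees SET is_deleted = 1, deleted_at = datetime('now') "
             "WHERE employee_id = 2")

# Live queries must exclude soft-deleted rows explicitly.
live = conn.execute("SELECT COUNT(*) FROM employees WHERE is_deleted = 0").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# Restore by flipping the flag back — no backup needed.
conn.execute("UPDATE employees SET is_deleted = 0, deleted_at = NULL "
             "WHERE employee_id = 2")
restored = conn.execute("SELECT COUNT(*) FROM employees WHERE is_deleted = 0").fetchone()[0]
print(live, total, restored)  # 0 1 1
```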

Hard Delete has legitimate use cases: GDPR right-to-erasure requests (where soft delete doesn't satisfy the legal requirement to actually remove data), clearing temporary staging tables that are re-populated on each run, and removing log or event data past its retention window. In all three cases, the DELETE should be inside a transaction, preceded by a SELECT to verify the target rows, and — for large tables — executed in batches to avoid holding locks for an unacceptable duration.

The batching requirement for large DELETEs is more critical than for large UPDATEs. Every deleted row generates a transaction log entry proportional to the row size. A single DELETE of 50 million rows generates enough WAL or redo log volume to fill disk on many systems. Batching with COMMIT between chunks keeps the per-transaction log volume manageable and keeps the lock window short enough for other writes to proceed.
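The chunked loop can be sketched in application code. This uses Python's sqlite3 for illustration — SQLite's rowid plays the role of PostgreSQL's ctid, and the batch size is arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO events (id, status) VALUES (?, ?)",
    [(i, "archived" if i % 2 == 0 else "active") for i in range(1, 10001)],
)
conn.commit()

BATCH = 1000
batches = 0
while True:
    # Delete one chunk of the target set per transaction.
    cur = conn.execute(
        "DELETE FROM events WHERE rowid IN ("
        "  SELECT rowid FROM events WHERE status = 'archived' LIMIT ?)",
        (BATCH,),
    )
    conn.commit()  # commit between batches: locks released, log volume bounded
    batches += 1
    if cur.rowcount < BATCH:
        break  # last (partial or empty) batch — the target set is drained

remaining = conn.execute(
    "SELECT COUNT(*) FROM events WHERE status = 'archived'"
).fetchone()[0]
print(batches, remaining)  # 6 0
```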

io/thecodeforge/dml/delete_employees.sql · SQL
-- io.thecodeforge: Production DELETE patterns

-- PATTERN 1: Targeted Hard Delete via Primary Key
-- Primary Key in WHERE guarantees exactly one row is affected — the safest DELETE pattern.

-- Step 1: Always verify the target row before deleting.
SELECT * FROM employees WHERE employee_id = 2;
-- Verify this is James Okafor. Confirm there is exactly one row.

-- Step 2: Delete only after verifying the target.
DELETE FROM employees
WHERE employee_id = 2;  -- Primary Key: guaranteed one row, no ambiguity

-- PATTERN 2: Soft Delete — the default for user-facing data
-- Requires is_deleted and deleted_at columns, e.g.:
--   ALTER TABLE employees ADD COLUMN is_deleted BOOLEAN DEFAULT false;
--   ALTER TABLE employees ADD COLUMN deleted_at TIMESTAMP;
-- Never physically removes the row — just hides it from application queries.
-- This example is illustrative; it works once the column exists.
-- UPDATE employees
-- SET is_deleted = true, deleted_at = NOW()
-- WHERE employee_id = 2;
-- Restore: UPDATE employees SET is_deleted = false, deleted_at = NULL WHERE employee_id = 2;

-- PATTERN 3: Batch delete by category
-- Add temporary staging rows to demonstrate scoped batch delete.
INSERT INTO employees (employee_id, full_name, department, salary)
VALUES
    (7, 'Angela Foster', 'HR', 47000.00),
    (8, 'Ben Stafford',  'HR', 44000.00);

-- Step 1: Count affected rows — verify before deleting.
SELECT COUNT(*) FROM employees WHERE department = 'HR';
-- Expected: 2 rows (Angela and Ben — Carlos moved to Engineering in the UPDATE section)

-- Step 2: Wrap in a transaction for safety. Verify row count inside BEGIN.
BEGIN;
    DELETE FROM employees
    WHERE department = 'HR';
    -- The database engine returns the rows-affected count here.
    -- If it matches your expectation from the COUNT above, proceed.
COMMIT;
-- If the count is wrong: ROLLBACK; instead of COMMIT;

-- PATTERN 4: Batched DELETE for large tables (pseudocode structure)
-- Never delete millions of rows in one transaction.
-- Loop until no rows remain in the target set:
-- DELETE FROM employees
-- WHERE department = 'archived'
--   AND ctid IN (
--       SELECT ctid FROM employees
--       WHERE department = 'archived'
--       LIMIT 10000
--   );
-- COMMIT; -- release locks between batches

-- Final state after all deletes.
SELECT * FROM employees ORDER BY employee_id;
▶ Output
-- SELECT * FROM employees WHERE employee_id = 2;
employee_id | full_name | department | salary
-------------+--------------+------------+---------
2 | James Okafor | Marketing | 64900.00
(1 row)
-- Confirmed: this is the correct target. Proceeding with DELETE.

-- SELECT COUNT(*) FROM employees WHERE department = 'HR';
count
-------
2
(1 row)
-- Two rows will be deleted. Confirmed. COMMIT.

-- Final state:
employee_id | full_name | department | salary
-------------+----------------+-------------+-----------
1 | Sarah Mitchell | Engineering | 84000.00
3 | Priya Nair | Engineering | 75600.00
4 | Carlos Mendes | Engineering | 65100.00
5 | Lisa Thornton | Engineering | 79800.00
6 | Derek Huang | Marketing | 64900.00
(5 rows)
⚠ DELETE Without WHERE Empties the Entire Table — Instantly, With No Confirmation
DELETE FROM employees with no WHERE clause removes every row. The table structure survives — columns, indexes, constraints all remain — but every record is gone. The database engine does not ask 'are you sure?' It executes. The three rules: First, type the WHERE clause before you finish writing the rest of the DELETE statement, so the condition is never an afterthought. Second, always run SELECT * FROM employees WHERE [your condition] first and verify the result set is exactly what you intend to remove. Third, always wrap production DELETEs in BEGIN/COMMIT and verify the rows-affected count before committing. These three habits, applied consistently, mean that a missing WHERE clause gets caught before it commits — not after.
📊 Production Insight
Batching is not optional for large DELETEs — it is a correctness requirement for systems that depend on transaction log space. A single DELETE across 50 million rows generates WAL entries proportional to all 50 million rows' worth of data. On a database with 10GB of WAL space configured, this can fill the transaction log mid-operation, causing the database to suspend all writes until the log is archived. Recovering from a filled WAL in production is a multi-hour event.
The batching pattern: DELETE from a target set 10,000 rows at a time, COMMIT between batches, and include a brief pg_sleep(0.1) or application-level sleep between iterations on high-throughput systems to allow replication lag to stay manageable. The total wall-clock time for a batched delete is longer, but the peak impact on concurrent transactions and log space is dramatically lower.
Rule: for any DELETE affecting more than 100,000 rows, implement batching before executing. Estimate affected rows with SELECT COUNT(*) first, then calculate whether the single-statement approach is safe given your WAL/redo log configuration.
🎯 Key Takeaway
DELETE is permanent — there is no recycle bin in most databases. A committed DELETE without a backup or transaction is a one-way door.
Soft Delete is the correct default for user-facing data in any system where audit trails, regulatory compliance, or accidental deletion recovery matter. Hard Delete is a deliberate architectural choice for specific use cases, not the lazy default.
Punchline: if you are ever uncertain whether to use Soft Delete or Hard Delete, choose Soft Delete. The cost of over-retaining data is manageable. The cost of accidentally losing data that a regulator asks for later is not.
Delete Strategy Selection
If: User-facing data — accounts, profiles, orders — needs audit trail or potential recovery
Use: Soft Delete: UPDATE employees SET is_deleted = true, deleted_at = NOW() WHERE id = X. Recoverable. Auditable. Does not break foreign key references.
If: GDPR right-to-erasure or legal requirement to permanently remove data
Use: Hard Delete inside a transaction: BEGIN; DELETE FROM employees WHERE id = X; COMMIT;. Verify with SELECT first. Document the deletion event in a separate audit log.
If: Clearing temporary staging or work tables before re-population
Use: TRUNCATE TABLE staging_table — faster than DELETE, resets auto-increment, minimal transaction log impact. Only use when the table structure is preserved and all rows are being cleared.
If: Large-scale data purge — millions of rows past a retention window
Use: Batched DELETE: 10,000 rows per transaction with COMMIT between batches. Keep lock windows short and WAL volume per transaction manageable.
If: Decommissioning an entire table permanently
Use: DROP TABLE — removes the table structure, all data, all indexes, all permissions. Irreversible. Never use DROP when you mean DELETE.
🗂 INSERT vs UPDATE vs DELETE — Three Commands, Three Purposes
What each command does, when it is dangerous, and what the correct discipline looks like for each
What it does
  INSERT: adds new rows to the table — does not modify existing data
  UPDATE: changes specific column values in existing rows — row count stays the same
  DELETE: removes entire rows from the table — column values in those rows are gone
Requires WHERE?
  INSERT: no — INSERT always targets new rows, WHERE has no role
  UPDATE: yes — omitting WHERE updates every row in the table simultaneously
  DELETE: yes — omitting WHERE deletes every row in the table simultaneously
Affects row count?
  INSERT: increases by the number of rows successfully inserted
  UPDATE: no change — the same rows exist before and after
  DELETE: decreases by the number of rows matched by the WHERE clause
Can target multiple rows?
  INSERT: yes — multi-row VALUES syntax inserts many rows in one statement
  UPDATE: yes — WHERE clause can match thousands of rows; all are updated atomically
  DELETE: yes — WHERE clause can match thousands of rows; all are removed atomically
Reversible without a transaction?
  INSERT: only by running a matching DELETE — which requires you to know what was inserted
  UPDATE: only by running another UPDATE with the old values — which requires you to know what they were
  DELETE: not easily — the data is gone; requires point-in-time recovery or a backup restore
Safe execution pattern
  INSERT: name columns explicitly; for batch loads, multi-row VALUES; for idempotency, an ON CONFLICT clause
  UPDATE: SELECT-first to verify targets; wrap in BEGIN/COMMIT; check rows-affected count before committing
  DELETE: SELECT-first to verify targets; wrap in BEGIN/COMMIT; batch large deletes; prefer Soft Delete for user data
Typical real-world trigger
  INSERT: new user signup, new order placed, log event recorded, configuration entry created
  UPDATE: user updates profile, price adjusted, order status changes, permission modified
  DELETE: account closed, record archived, GDPR erasure request, staging table cleared

🎯 Key Takeaways

  • INSERT adds new rows — always name your columns explicitly in every INSERT statement. Position-based inserts are schema bombs that detonate silently when someone adds a column to the table later.
  • UPDATE changes existing rows — the WHERE clause is the only thing preventing every row in the table from being modified simultaneously. Write WHERE before SET, every time, without exception. No WHERE equals every row.
  • DELETE removes entire rows permanently — the row and all its column values are gone. Without WHERE, the entire table is emptied. Run SELECT with the same WHERE clause first, verify the row count, then DELETE inside a BEGIN/COMMIT block.
  • The SELECT-before-DML habit is one of the highest-value practices in database work. It costs 10 seconds and prevents recoveries that cost 10 hours. Build it as a reflex, not a procedure you follow when you remember.
  • Bulk INSERT with multi-row VALUES syntax is 10-50x faster than single-row INSERT in a loop — the difference is network round-trips. For very large loads, COPY (PostgreSQL) or LOAD DATA INFILE (MySQL) outperforms even batched INSERT by another order of magnitude.
  • Soft Delete (UPDATE SET is_deleted = true) is the correct default for user-facing data in regulated or audit-sensitive systems. Hard Delete is a deliberate choice for specific scenarios — GDPR erasure, staging tables, data past retention windows — not the casual path of least resistance.

⚠ Common Mistakes to Avoid

    Running DELETE or UPDATE without a WHERE clause
    Symptom

    Every row in the table gets affected instantly. Salaries zeroed, all records deleted, all statuses overwritten. No error is raised because the syntax is completely valid — the database did exactly what was asked.

    Fix

    Write the WHERE clause before writing SET or FROM — physically type it first, even if you fill in the condition afterward. Follow the SELECT-first workflow without exception: run SELECT * FROM table WHERE [condition], verify the result set contains exactly the intended rows, confirm the count, then convert the statement to UPDATE or DELETE. Wrap all production DML in BEGIN/COMMIT so there is a ROLLBACK window. If your database client has a 'safe update mode' setting (MySQL's --safe-updates flag), enable it — it rejects UPDATE and DELETE statements without a WHERE clause automatically.

    Omitting column names in INSERT statements
    Symptom

    INSERT succeeds today but silently stores data in the wrong columns after a schema change. A new column added between existing ones shifts all subsequent value positions. The error may not surface immediately — wrong values in wrong columns can persist for days before someone notices inconsistencies in application behavior.

    Fix

    Always name every column explicitly in every INSERT statement: INSERT INTO employees (employee_id, full_name, department, salary) VALUES (...). Never rely on column position. This is the most common SQL practice violation I see in code reviews, and it has caused real production data corruption when schema migrations add columns mid-application.

    Confusing DELETE, TRUNCATE, and DROP TABLE
    Symptom

    Developer intends to clear rows for a fresh data seed. They run DROP TABLE, which removes the entire table structure — columns, indexes, constraints, permissions, sequences — not just the data. Recovery requires restoring from a backup or recreating the schema from migration scripts.

    Fix

    DELETE FROM table WHERE [condition] removes rows matching the condition while preserving the table structure. TRUNCATE TABLE removes all rows quickly (DDL operation, minimal logging, resets auto-increment) while preserving the table structure. DROP TABLE removes the table itself — use only when decommissioning the table permanently. Know which of the three you need before running any of them.

    Single-row INSERT in a loop for bulk data loading
    Symptom

    An import job that should take two minutes runs for 45 minutes. Each iteration of the loop opens a connection, sends a query, waits for acknowledgment, and closes. For 100,000 rows, this is 100,000 round-trips. The database connection pool exhausts. Other services start experiencing timeout errors as connections are held waiting for the import to complete.

    Fix

    Batch INSERTs using multi-row VALUES syntax — insert 500-1000 rows per statement. For very large loads (millions of rows), use PostgreSQL's COPY command or MySQL's LOAD DATA INFILE, which bypass the row-by-row insert machinery and write directly to the storage engine at orders-of-magnitude higher throughput. Profile the import job's duration against the row count — if it scales linearly, you're doing single-row inserts.
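The fix can be seen in miniature with Python's sqlite3 module: one execute() per row versus one executemany() batch. SQLite is in-process, so the win here is per-statement overhead rather than network round-trips, but the shape of the change is the same one you'd make against PostgreSQL or MySQL:

```python
import sqlite3

rows = [(i, f"user{i}") for i in range(1, 5001)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Anti-pattern: one statement per row — one round-trip each on a networked DB.
for r in rows[:10]:
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", r)

# Better: one batched call — the driver sends many rows per statement.
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows[10:])
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(total)  # 5000
```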

Wrong transaction isolation for concurrent reads during DML
    Symptom

    An UPDATE or DELETE that works correctly in isolation produces unexpected results when other transactions are writing concurrently. Rows that should have been targeted are skipped. Rows that should have been skipped are modified. The behavior is non-deterministic and difficult to reproduce.

    Fix

    Understand your database's default isolation level. In PostgreSQL, READ COMMITTED (the default) means an UPDATE's WHERE clause re-evaluates each row at the time that row is processed, not at the time the statement started. If another transaction modifies a row between when the UPDATE started and when it reaches that row, the row's current state determines whether it matches the WHERE clause. For operations that require a consistent snapshot of the table at statement start, use REPEATABLE READ or SERIALIZABLE isolation: BEGIN ISOLATION LEVEL REPEATABLE READ; UPDATE ...; COMMIT;

Interview Questions on This Topic

  • QWhat is the difference between DELETE and TRUNCATE? Cover DML vs DDL, transaction logging, trigger behavior, and performance.JuniorReveal
DELETE is a DML command that removes rows one at a time (conceptually), logs each row deletion individually in the transaction log, respects DELETE triggers (which fire once per row or once per statement depending on the database), and can be rolled back if inside an explicit transaction. Because each row deletion is logged, log volume is proportional to the number of rows deleted — a 50M-row DELETE generates enormous WAL/redo log output.

TRUNCATE is a DDL command (in most databases) that deallocates data pages rather than logging individual row deletions. It is dramatically faster — removing 50 million rows via TRUNCATE takes seconds versus hours for DELETE. TRUNCATE resets auto-increment/sequence counters to their starting value and does not fire row-level triggers. In MySQL it cannot be rolled back because TRUNCATE causes an implicit commit; PostgreSQL and SQL Server, by contrast, allow a TRUNCATE inside an explicit transaction to be rolled back.

Practical guidance: use DELETE when you need WHERE filtering, trigger execution, or precise control over which rows are removed. Use TRUNCATE when you need to empty an entire table as fast as possible — common for staging tables, test data reset, and ETL target tables.
  • QExplain Soft Delete vs Hard Delete. Why would a production banking system prefer one over the other, and what are the operational trade-offs?Mid-levelReveal
Hard Delete uses DELETE FROM to permanently remove a row from the table. Once committed, the data is gone from the application's perspective — vacuum/compaction eventually reclaims the disk space. Hard Delete is simple, keeps tables lean, and is appropriate for temporary data (staging rows, log events past retention windows, GDPR erasure requests).

Soft Delete doesn't use DELETE at all. It uses UPDATE to set a flag: is_deleted = true or deleted_at = NOW(). The row remains in the table, application queries add WHERE is_deleted = false to exclude deleted records, and the data can be restored by flipping the flag back.

Banking systems prefer Soft Delete for customer and transaction data because:
  • regulatory compliance (SOX, PCI-DSS) requires audit trails proving what data existed and when;
  • accidental deletions can be reversed in seconds by a support engineer without a backup restore;
  • referential integrity is preserved — foreign keys pointing to a soft-deleted row don't break;
  • historical reporting can include deleted records by removing the is_deleted filter.

The trade-offs: table bloat over time (deleted rows never physically leave), index bloat (indexes include soft-deleted rows that don't participate in live queries), and the requirement that every query consistently include the is_deleted filter — missing it in one query shows deleted records as if they're active.

Mitigation: periodic archival jobs that move soft-deleted rows older than the retention window to a cold storage table, and database views or row-level security that enforce the is_deleted filter automatically.
  • QLeetCode challenge: Write a SQL query to delete all duplicate emails from a table named 'Users', keeping only the row with the smallest ID for each email.Mid-levelReveal
The approach: identify the minimum ID for each email group (those rows should be kept), then delete everything else.

For PostgreSQL:

```sql
DELETE FROM Users
WHERE id NOT IN (
    SELECT MIN(id)
    FROM Users
    GROUP BY email
);
```

For MySQL, which cannot reference the DELETE target table directly in a subquery, wrap it in a derived table with an alias:

```sql
DELETE FROM Users
WHERE id NOT IN (
    SELECT min_id
    FROM (
        SELECT MIN(id) AS min_id
        FROM Users
        GROUP BY email
    ) AS rows_to_keep
);
```

Alternative using a self-join (MySQL's multi-table DELETE syntax; SQL Server supports a similar form, PostgreSQL does not):

```sql
DELETE u1
FROM Users u1
INNER JOIN Users u2
    ON u1.email = u2.email
   AND u1.id > u2.id;
```

The self-join approach deletes any row where another row exists with the same email and a smaller ID — effectively keeping only the minimum-ID row per email group. Before running either in production, identify the duplicates and verify the scope:

```sql
SELECT email, MIN(id), COUNT(*)
FROM Users
GROUP BY email
HAVING COUNT(*) > 1;
```
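As a sanity check, the keep-smallest-id DELETE runs as-is on SQLite via Python's sqlite3 module (data made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Users (id, email) VALUES (?, ?)",
    [(1, "a@x.com"), (2, "b@x.com"), (3, "a@x.com"), (4, "a@x.com")],
)

# Verify the scope first: which emails are duplicated, and which id survives?
dupes = conn.execute(
    "SELECT email, MIN(id), COUNT(*) FROM Users GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

# SQLite, like PostgreSQL, allows the target table inside the subquery.
conn.execute(
    "DELETE FROM Users WHERE id NOT IN (SELECT MIN(id) FROM Users GROUP BY email)"
)
conn.commit()

kept = conn.execute("SELECT id, email FROM Users ORDER BY id").fetchall()
print(dupes, kept)  # [('a@x.com', 1, 3)] [(1, 'a@x.com'), (2, 'b@x.com')]
```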
  • QYou accidentally run an UPDATE without a WHERE clause and your session is still open. What do you do immediately, and what are your options if autocommit was enabled?JuniorReveal
If you are inside an explicit transaction (you ran BEGIN or START TRANSACTION before the UPDATE), execute ROLLBACK; immediately. This reverts all changes made since BEGIN, restoring the table to its pre-UPDATE state. The row locks held by the transaction are released and the write is completely discarded.

If autocommit is enabled (the default in most database clients and ORM configurations), the UPDATE was committed the moment it completed, and ROLLBACK has no effect. Your options at that point:

  1. Point-in-time recovery (PITR): if your database has WAL archiving (PostgreSQL) or binary logging (MySQL) enabled, you can restore the database to a timestamp just before the UPDATE executed. This requires taking the database offline or restoring to a separate instance for data extraction.
  2. Backup restore: restore the most recent backup to a separate instance, extract the pre-UPDATE values from that instance, and apply them to production via a corrective UPDATE with explicit WHERE conditions.
  3. CDC or audit log reconstruction: if you have Change Data Capture (Debezium, AWS DMS) or application-level audit tables that record before-images of row changes, you can extract the original values and construct corrective UPDATEs.

Prevention: disable autocommit in production database sessions, wrap all DML in explicit BEGIN/COMMIT blocks, require rows-affected count verification before COMMIT, and implement peer review for DML affecting more than a defined row threshold.
  • QHow does the database engine handle an INSERT that violates a UNIQUE constraint? How do you write an upsert that handles both the insert and update case in a single statement?Mid-levelReveal
When an INSERT violates a UNIQUE constraint or PRIMARY KEY, the database engine checks the relevant index during the write operation. If a matching value already exists, the INSERT is rejected and the row is not written. The error is raised at statement execution time, not at commit time (unless the constraint is declared DEFERRABLE INITIALLY DEFERRED, in which case checking is deferred to COMMIT).

Error codes by database:
  • PostgreSQL: 23505 (unique_violation)
  • MySQL: 1062 (Duplicate entry '%s' for key '%s')
  • SQL Server: 2627
  • SQLite: SQLITE_CONSTRAINT_UNIQUE

Upsert syntax by database. PostgreSQL:

```sql
INSERT INTO employees (employee_id, full_name, salary)
VALUES (3, 'Priya Nair', 75000.00)
ON CONFLICT (employee_id) DO UPDATE
SET salary = EXCLUDED.salary,
    full_name = EXCLUDED.full_name;
-- The EXCLUDED pseudo-row references the values that were attempted in the INSERT.
-- ON CONFLICT DO NOTHING silently skips if the row already exists.
```

MySQL:

```sql
INSERT INTO employees (employee_id, full_name, salary)
VALUES (3, 'Priya Nair', 75000.00)
ON DUPLICATE KEY UPDATE
    salary = VALUES(salary),
    full_name = VALUES(full_name);
```

SQL Server:

```sql
MERGE employees AS target
USING (VALUES (3, 'Priya Nair', 75000.00))
    AS source (employee_id, full_name, salary)
ON target.employee_id = source.employee_id
WHEN MATCHED THEN
    UPDATE SET salary = source.salary
WHEN NOT MATCHED THEN
    INSERT (employee_id, full_name, salary)
    VALUES (source.employee_id, source.full_name, source.salary);
```

Upsert is particularly valuable for idempotent data pipelines where retrying a failed operation should not produce duplicate rows.
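SQLite (3.24+) accepts the same ON CONFLICT ... DO UPDATE syntax as PostgreSQL, so the idempotent-retry behavior can be demonstrated with Python's bundled sqlite3 module — schema and values are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
             "full_name TEXT, salary REAL)")

upsert = (
    "INSERT INTO employees (employee_id, full_name, salary) VALUES (?, ?, ?) "
    "ON CONFLICT (employee_id) DO UPDATE SET "
    "salary = excluded.salary, full_name = excluded.full_name"
)

conn.execute(upsert, (3, "Priya Nair", 72000.0))  # first run: a plain insert
conn.execute(upsert, (3, "Priya Nair", 75000.0))  # retry: updates, no duplicate error
conn.commit()

result = conn.execute("SELECT COUNT(*), MAX(salary) FROM employees").fetchone()
print(result)  # (1, 75000.0) — idempotent: one row, latest salary
```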
  • QExplain how a missing index on the WHERE clause column affects the performance of UPDATE and DELETE on a large table. How would you diagnose and fix it without blocking production traffic?SeniorReveal
When the WHERE clause column has no index, the database engine performs a sequential scan (full table scan) to find matching rows. For a 10-million-row table, this means reading every row into memory, evaluating the WHERE condition, and then applying the modification to matched rows. The scan alone can take minutes on spinning disk or seconds on NVMe — but more critically, the rows are locked for the entire duration.

For UPDATE: every row being modified acquires a row-level lock. Concurrent transactions trying to update any of those rows queue behind the running UPDATE. On a high-write-throughput system, the queue fills quickly and API timeouts cascade. For DELETE: same lock behavior, plus the additional WAL generation for each deleted row, which can fill transaction log space on large deletes.

Diagnosis: run EXPLAIN (ANALYZE, BUFFERS) UPDATE employees SET salary = salary * 1.05 WHERE department = 'Marketing' — inside a BEGIN/ROLLBACK block, because EXPLAIN ANALYZE actually executes the statement. Look for 'Seq Scan on employees' in the execution plan — this confirms no index is being used. Compare the estimated vs actual row counts and the buffers-hit count to understand how much data the scan is reading.

Fix without blocking production: add the index concurrently before running the DML. In PostgreSQL: CREATE INDEX CONCURRENTLY idx_employees_dept ON employees(department). CONCURRENTLY builds the index in the background without holding an exclusive lock — existing reads and writes continue normally during the build. The build takes longer than a standard CREATE INDEX, but it does not block production traffic. After the index is built, EXPLAIN confirms the execution plan switches from Seq Scan to Index Scan. Then run the UPDATE or DELETE — now it targets only the matching rows directly, holds locks only on those rows, and completes in a fraction of the previous time.

Frequently Asked Questions

Can I undo a DELETE or UPDATE after it's been executed?

It depends on whether the change has been committed and whether your database has recovery mechanisms in place.

If you ran BEGIN (or START TRANSACTION) before the DML and haven't committed yet, execute ROLLBACK; immediately — this reverts all changes made since BEGIN and releases the locks. This is the clean undo path.

If autocommit was enabled (the default in most clients), the change was committed the moment it completed. ROLLBACK has no effect. Your options are: point-in-time recovery using PostgreSQL's WAL archiving or MySQL's binary log (requires the feature to have been configured before the incident), a backup restore from the most recent snapshot (with data loss equal to the time since the last backup), or reconstruction from an audit log or CDC stream if your system has one.

The practice that ensures you always have the clean undo path: disable autocommit in production database sessions, always wrap DML in explicit BEGIN/COMMIT blocks, and never COMMIT until you have verified the rows-affected count matches your expectation.

What is the difference between DELETE FROM employees and DROP TABLE employees?

DELETE FROM employees is a data operation. It removes rows matching the WHERE clause (or all rows if no WHERE is provided), but the table structure — column definitions, indexes, constraints, permissions, sequences — remains intact. The table exists and is ready for new INSERTs.

DROP TABLE employees is a schema operation. It removes the entire table entity: every row, every column definition, every index, every constraint, every permission granted on the table, and any sequences associated with it. After DROP TABLE, the table does not exist. Recovery requires recreating the schema from migration scripts and restoring data from a backup.

TRUNCATE TABLE occupies the middle ground: removes all rows efficiently (like DROP + recreate but preserving the structure), resets auto-increment counters, and does not require a WHERE clause because it always empties the entire table.

Rule: DELETE to remove specific rows, TRUNCATE to empty a table while keeping its structure, DROP to decommission the table permanently.

Can an INSERT statement add rows to multiple tables at the same time?

Standard SQL does not support inserting into multiple tables in a single INSERT statement — one INSERT targets one table. For writing to multiple tables as part of one logical operation, use multiple INSERT statements inside a single transaction: BEGIN; INSERT INTO table_a ...; INSERT INTO table_b ...; COMMIT;. The COMMIT ensures both writes succeed or both are rolled back — the operation is atomic from the application's perspective.

Oracle's INSERT ALL provides a non-standard extension for multi-table insert in a single statement, useful for ETL-style fan-out operations. For most PostgreSQL, MySQL, and SQL Server applications, the transaction-wrapped multiple INSERT pattern is the standard approach.
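A sketch of the transaction-wrapped multi-table write in Python's sqlite3 — the orders/order_events tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE order_events (order_id INTEGER, event TEXT)")

try:
    # Both INSERTs belong to one transaction: visible together or not at all.
    conn.execute("INSERT INTO orders (id, total) VALUES (1, 49.99)")
    conn.execute("INSERT INTO order_events (order_id, event) VALUES (1, 'created')")
    conn.commit()
except sqlite3.Error:
    conn.rollback()  # any failure undoes both writes

orders_n = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
events_n = conn.execute("SELECT COUNT(*) FROM order_events").fetchone()[0]
print(orders_n, events_n)  # 1 1
```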

When should I use Soft Delete vs Hard Delete?

Use Soft Delete (UPDATE SET is_deleted = true, deleted_at = NOW()) when: you need an audit trail proving the record existed and when it was removed; regulatory compliance requires data retention for a defined period; accidental deletions need to be reversible without a backup restore; or foreign key relationships reference the record and hard deletion would require cascading changes.

Use Hard Delete (DELETE FROM table WHERE ...) when: GDPR right-to-erasure requires the data to be physically removed; you're clearing temporary staging or work tables that are re-populated on each run; data past a retention window needs to be purged to manage storage costs; or the table is not user-facing and audit requirements don't apply.

The operational discipline for Hard Delete: always inside a transaction, always preceded by a SELECT to verify targets, always batched for large row counts to manage WAL/redo log volume and lock duration.

Why does my UPDATE affect 0 rows even though the data visually appears to match?

Three common causes:
  1. String case mismatch: WHERE department = 'hr' doesn't match rows stored as 'HR'. SQL string comparisons are case-sensitive in PostgreSQL by default (and case-insensitive in MySQL with the default collation). Verify with SELECT department, LENGTH(department) FROM employees WHERE employee_id = X to see the exact stored value including any trailing spaces.
  2. NULL comparison: WHERE manager_id = NULL never matches any row, including rows where manager_id is NULL. NULL is not equal to NULL in SQL. Use WHERE manager_id IS NULL instead.
  3. Trailing whitespace: values stored with trailing spaces (common with CHAR columns and some data imports) don't match string literals without the spaces. Verify with SELECT department, LENGTH(TRIM(department)) = LENGTH(department) AS has_trailing_space FROM employees.

Diagnosis: run SELECT COUNT(*) FROM employees WHERE [exact condition from your UPDATE] — if it returns 0, the WHERE clause is the problem, not the UPDATE syntax.
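Cause 2 is easy to reproduce. A quick sqlite3 demonstration of = NULL matching nothing while IS NULL matches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
             "manager_id INTEGER)")
conn.execute("INSERT INTO employees (employee_id, manager_id) VALUES (1, NULL)")

# '= NULL' never matches: NULL compared with anything yields NULL, not true.
eq = conn.execute(
    "UPDATE employees SET manager_id = 99 WHERE manager_id = NULL"
).rowcount

# 'IS NULL' is the correct predicate for matching NULL values.
is_null = conn.execute(
    "UPDATE employees SET manager_id = 99 WHERE manager_id IS NULL"
).rowcount
print(eq, is_null)  # 0 1
```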

🔥
Naren — Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← Previous: SQL CREATE TABLE and Data Types · Next →: SQL SELECT Statement
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged