Junior 11 min · March 06, 2026

SQL String Functions: Trailing Spaces Locked 30% of Users

Trailing spaces in SQL emails locked 30% of users.

N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Drawn from code that ran under real load.

Follow
Production
production tested
May 24, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • String functions operate on VARCHAR/TEXT columns — UPPER, LOWER, LENGTH, TRIM, SUBSTRING, CONCAT, REPLACE
  • TRIM removes leading and trailing whitespace — invisible spaces cause silent JOIN and comparison failures
  • LIKE with a leading wildcard ('%term') disables index usage — forces a full table scan
  • LENGTH counts bytes in MySQL for multi-byte characters; use CHAR_LENGTH for character count
  • String concatenation: CONCAT in MySQL, || in PostgreSQL and standard SQL
  • Biggest mistake: comparing strings without normalizing case — WHERE city = 'london' misses 'London' silently
✦ Definition~90s read
What is SQL String Functions?

SQL string functions are built-in operations that manipulate character data directly inside database queries, letting you clean, transform, and search text without pulling it into application code. They exist because raw database text is rarely clean—trailing spaces, inconsistent casing, embedded delimiters, and partial matches are the norm in production systems.

Imagine you work at a library and someone hands you a messy pile of book titles — some in ALL CAPS, some with random spaces at the start, some misspelled.

Without them, you'd be writing procedural loops or shipping gigabytes of raw strings to your app just to trim whitespace or extract a substring. These functions are the difference between a query that returns 'john@example.com ' (with a trailing space that breaks your login) and one that returns 'john@example.com'.

In practice, you'll reach for a core set of seven functions in nearly every project: UPPER, LOWER, LENGTH for normalization and sizing; TRIM, LTRIM, RTRIM for stripping invisible characters that silently corrupt joins and comparisons; and SUBSTRING, CONCAT, REPLACE for slicing, joining, and swapping text. The article's title references a real-world consequence: a 30% user lockout caused by trailing spaces in email fields—a problem TRIM solves in one line.

For search operations, LIKE with wildcards works for simple patterns but kills performance on large tables; PATINDEX and CHARINDEX give you position-based searching without full table scans, critical when you're querying millions of rows.

Alternatives exist—full-text search engines like Elasticsearch or PostgreSQL's tsvector for complex linguistic queries, and application-layer regex for patterns SQL can't handle. But for the 90% of text manipulation tasks—normalizing user input, building display strings, cleaning imported data—SQL string functions are the right tool because they execute inside the database engine, avoiding network round-trips and keeping your data pipeline simple.

Don't use them for heavy parsing or multi-language collation; that's what dedicated search indexes are for.

Plain-English First

Imagine you work at a library and someone hands you a messy pile of book titles — some in ALL CAPS, some with random spaces at the start, some misspelled. SQL string functions are the tools you use to clean up, reshape, and search through that text, all without touching the original paper. They let your database do the tedious text-editing work so you never have to open a spreadsheet and do it by hand.

Every real-world database is full of text — customer names, email addresses, product descriptions, phone numbers. The problem is that text is messy. Users type their names in all caps, leave trailing spaces, or enter 'New York' when your system expects 'new york'. If you can't clean and manipulate that text inside the database itself, you end up writing hundreds of lines of application code to do a job SQL can handle in a single line.

SQL string functions solve exactly that problem. They're built-in tools the database gives you to inspect, clean, transform, and combine text columns on the fly — right inside your queries. Instead of pulling all the data into Python or JavaScript and then fixing it, you fix it where it lives.

By the end of this article you'll know the seven most important SQL string functions, exactly when to reach for each one, how to chain them together for real cleaning tasks, and the two traps that catch almost every beginner. You'll be able to look at a messy data problem and immediately know which function to use.

Why SQL String Functions Are Not Just Trivial Text Tools

SQL string functions are built-in operations that manipulate character data — concatenating, extracting, trimming, replacing, and transforming strings directly in queries. Unlike application-level string handling, these functions execute inside the database engine, operating on column values row by row or across sets. The core mechanic is that each function returns a new string value derived from the input, leaving the original data unchanged unless explicitly updated.

In practice, SQL string functions are deterministic (same input always yields same output) and can be used in SELECT, WHERE, JOIN, and ORDER BY clauses. However, their behavior varies across databases — LENGTH in Oracle returns bytes, while in PostgreSQL it returns characters. Trailing spaces are handled inconsistently: CHAR columns pad values with spaces, and comparisons in some databases ignore trailing spaces (SQL Server) while others treat them as significant (PostgreSQL). This subtlety causes silent data mismatches, failed joins, and incorrect aggregations.

Use SQL string functions when you need to clean, normalize, or transform data within the database rather than pulling raw data into an application layer. They are critical for ETL pipelines, data migration, and reporting — especially when dealing with user input, legacy systems, or cross-database compatibility. Ignoring their quirks, especially around whitespace and collation, leads to production bugs that are hard to trace because the data looks correct in a GUI but behaves differently in queries.

Trailing Spaces Are Not Your Friend
In SQL Server, 'abc' = 'abc ' evaluates to true due to ANSI padding rules, but in PostgreSQL it's false — a silent join killer across databases.
Production Insight
A CRM migration from SQL Server to PostgreSQL caused 30% of user records to fail matching because source CHAR(10) fields had trailing spaces that PostgreSQL treated as distinct.
Symptom: JOINs on email or username returned fewer rows than expected, with no error message — only a mysterious drop in record counts.
Rule: Always TRIM() or RTRIM() string columns before comparison or join, especially when data originates from fixed-width or legacy systems.
Key Takeaway
Trailing space behavior is database-specific and can silently break joins and lookups.
String functions are deterministic but not portable — test across target databases.
Always normalize strings (TRIM, UPPER, COLLATE) at the boundary where data enters your system.
SQL String Functions: Trailing Spaces Locked 30% of Users THECODEFORGE.IO SQL String Functions: Trailing Spaces Locked 30% of Users Core string functions and common pitfalls in SQL text processing UPPER, LOWER, LENGTH Case conversion and string size TRIM, LTRIM, RTRIM Remove leading/trailing spaces SUBSTRING, CONCAT, REPLACE Slicing, joining, and substitution LIKE, PATINDEX, CHARINDEX Pattern matching and position finding REPLACE and TRANSLATE Character-level replacement vs. mapping CONCAT_WS() Clean concatenation with separator ⚠ Trailing spaces cause unexpected mismatches in comparisons Always TRIM or use LIKE with wildcards to avoid locked users THECODEFORGE.IO
thecodeforge.io
SQL String Functions: Trailing Spaces Locked 30% of Users
Sql String Functions

The Seven String Functions You'll Use 90% of the Time

SQL has dozens of string functions, but most professional work comes down to seven. Think of them like tools in a toolbox — a hammer, a screwdriver, a tape measure. You don't need every tool on day one; you need to know the core ones cold.

— UPPER / LOWER: change the case of text, like Caps Lock on a keyboard. — LENGTH: count how many characters are in a string, like counting letters in a word. — TRIM / LTRIM / RTRIM: remove unwanted spaces from the edges of text, like cutting the crusts off bread. — SUBSTRING: pull out a chunk of text from the middle of a string, like cutting a slice from a loaf. — CONCAT: glue two or more pieces of text together, like taping two pieces of paper side by side. — REPLACE: swap one piece of text for another, like using find-and-replace in Word. — INSTR / CHARINDEX / POSITION: find where a smaller piece of text lives inside a bigger one.

Different databases (MySQL, PostgreSQL, SQL Server) spell a few of these slightly differently — we'll flag that as we go. The concepts are identical everywhere.

string_functions_overview.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- ============================================================
-- Setup: create a small customers table with intentionally
-- messy data so we can demonstrate each function clearly.
-- ============================================================

CREATE TABLE customers (
    customer_id   INT,
    full_name     VARCHAR(100),
    email         VARCHAR(150),
    city          VARCHAR(50)
);

INSERT INTO customers VALUES
    (1, '  alice johnson  ', 'ALICE@EXAMPLE.COM', 'new york'),
    (2, 'BOB SMITH',         'bob@example.com',   'Los Angeles'),
    (3, 'carol white  ',     'Carol@Example.com', 'CHICAGO');

-- -------------------------------------------------------
-- Quick preview: what does the raw data look like?
-- -------------------------------------------------------
SELECT customer_id, full_name, email, city
FROM   customers;
Output
customer_id | full_name | email | city
------------|--------------------|---------------------|------------
1 | alice johnson | ALICE@EXAMPLE.COM | new york
2 | BOB SMITH | bob@example.com | Los Angeles
3 | carol white | Carol@Example.com | CHICAGO
Database Dialect Note:
All examples in this article use standard SQL that runs in MySQL 8+, PostgreSQL 14+, and SQL Server 2019+. Where syntax differs between databases we'll call it out explicitly with a short note.
Production Insight
String normalization (LOWER, TRIM) should happen at write time, not read time — applying LOWER() in WHERE prevents index usage.
Store emails in lowercase at INSERT time; store phone numbers in a canonical format — don't rely on query-time normalization.
For case-insensitive columns, PostgreSQL's CITEXT type handles case-insensitive comparisons automatically without function overhead.
Key Takeaway
Normalize strings at write time — don't apply LOWER or TRIM in WHERE clauses if you can avoid it.
Case-insensitive storage (lowercase at insert) allows exact-match queries to use indexes normally.
For text columns requiring case-insensitive search, PostgreSQL's CITEXT type is cleaner than LOWER() everywhere.

UPPER, LOWER and LENGTH — Your Case and Size Tools

Case inconsistency is one of the most common data quality problems in real databases. One record says 'London', another says 'LONDON', a third says 'london'. To your database those are three completely different values — so a GROUP BY city query would produce three separate buckets instead of one. That's a real bug.

UPPER(text) converts every letter in a string to uppercase. LOWER(text) converts every letter to lowercase. Neither modifies the original stored data — they return a new version of the text just for that query result. Think of it like wearing tinted glasses: the world looks different, but the world hasn't changed.

LENGTH(text) tells you how many characters are in a string, including spaces. This is useful for data validation — if a phone number column has a length other than 10 digits, something went wrong during data entry. Note: in SQL Server, the equivalent function is called LEN() not LENGTH() — a classic gotcha we'll revisit.

The real power comes when you combine these with a WHERE clause. Instead of hoping users typed city names consistently, you normalise both sides of the comparison so they always match regardless of how the data was entered.

upper_lower_length.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- ============================================================
-- UPPER: standardise city names to uppercase for safe grouping
-- ============================================================
SELECT
    UPPER(city)          AS city_uppercase,   -- 'new york' → 'NEW YORK'
    COUNT(*)             AS customer_count
FROM   customers
GROUP BY UPPER(city);                         -- groups correctly despite mixed case

-- ============================================================
-- LOWER: normalise email addresses (emails are case-insensitive
-- but databases treat them as case-sensitive strings)
-- ============================================================
SELECT
    customer_id,
    LOWER(email)  AS normalised_email  -- 'ALICE@EXAMPLE.COM' → 'alice@example.com'
FROM   customers;

-- ============================================================
-- LENGTH: find any city names that seem suspiciously short
-- (useful for catching typos or blank entries)
-- ============================================================
SELECT
    customer_id,
    city,
    LENGTH(city)  AS city_length   -- counts every character including spaces
FROM   customers
WHERE  LENGTH(city) < 5;           -- flag anything under 5 characters as suspect

-- ============================================================
-- PRACTICAL COMBO: case-insensitive search
-- Find all customers in New York regardless of how it was typed
-- ============================================================
SELECT *
FROM   customers
WHERE  LOWER(city) = 'new york';   -- normalise BOTH sides to lowercase
Output
-- UPPER + GROUP BY result:
city_uppercase | customer_count
---------------|---------------
NEW YORK | 1
LOS ANGELES | 1
CHICAGO | 1
-- LOWER email result:
customer_id | normalised_email
------------|------------------
1 | alice@example.com
2 | bob@example.com
3 | carol@example.com
-- LENGTH < 5 result:
(0 rows — all city names in this dataset are 7+ characters)
-- Case-insensitive search result:
customer_id | full_name | email | city
------------|------------------|-------------------|----------
1 | alice johnson | ALICE@EXAMPLE.COM | new york
Watch Out — SQL Server Uses LEN() Not LENGTH():
In SQL Server, LENGTH() does not exist. Use LEN() instead. Also note that LEN() does NOT count trailing spaces, while MySQL's LENGTH() does. If you're counting spaces intentionally, use DATALENGTH() in SQL Server.
Production Insight
UPPER/LOWER applied to indexed columns in WHERE disables the index.
Create a functional index to restore index usage: CREATE INDEX idx_lower_email ON users (LOWER(email)).
For case-insensitive equality, LOWER(email) = LOWER($input) with a matching functional index is O(log n).
Key Takeaway
LOWER(col) = LOWER($input) forces a full scan unless you have a functional index on LOWER(col).
For case-insensitive lookups, create a functional index that matches the WHERE expression exactly.
LENGTH counts bytes in MySQL for multi-byte UTF8 — use CHAR_LENGTH for true character count.

TRIM, LTRIM and RTRIM — Killing the Invisible Spaces

Spaces you can't see are some of the sneakiest bugs in data work. When a user fills in a web form and accidentally hits the spacebar before their name, that space gets stored in the database. Now ' Alice' and 'Alice' are two different strings. Searches fail. Joins don't match. Reports double-count.

TRIM(text) removes spaces from both the left and right sides of a string. LTRIM only removes spaces on the left (the leading side). RTRIM only removes spaces on the right (the trailing side). Think of it like cutting the bread crusts — you can cut both ends, just the left, or just the right.

This comes up constantly in data imports. When you load data from a CSV file or an external API, strings almost always arrive with extra whitespace. Running TRIM before you process the data is like shaking the crumbs off a tablecloth before you sit down — a small step that prevents a lot of mess.

Modern SQL (SQL:2003 standard) lets you TRIM other characters too, not just spaces — for example trimming a dollar sign off a price string. We'll show that in the code below. PostgreSQL and SQL Server both support this; MySQL added it in version 8.

trim_functions.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- ============================================================
-- Basic TRIM: remove spaces from both sides of full_name
-- ============================================================
SELECT
    customer_id,
    full_name,                      -- original, messy version
    TRIM(full_name) AS clean_name,  -- spaces removed from both sides
    LENGTH(full_name)         AS original_length,
    LENGTH(TRIM(full_name))   AS trimmed_length  -- shows how many spaces were hiding
FROM   customers;

-- ============================================================
-- LTRIM vs RTRIM: when you only want to cut one side
-- Useful if you're processing fixed-width data exports where
-- trailing spaces are padding but leading spaces mean something.
-- ============================================================
SELECT
    '  hello world  '           AS original,
    LTRIM('  hello world  ')    AS left_trimmed,   -- 'hello world  '
    RTRIM('  hello world  ')    AS right_trimmed,  -- '  hello world'
    TRIM('  hello world  ')     AS fully_trimmed;  -- 'hello world'

-- ============================================================
-- TRIM a specific character (not just spaces)
-- Imagine prices were imported with a leading dollar sign
-- ============================================================
SELECT
    TRIM(LEADING '$' FROM '$49.99')   AS price_no_prefix,   -- '49.99'
    TRIM(TRAILING '!' FROM 'Sale!')   AS message_no_bang,   -- 'Sale'
    TRIM(BOTH   '*' FROM '***HOT***') AS product_clean;      -- 'HOT'

-- ============================================================
-- PRACTICAL USE: clean names before inserting into a report
-- Combine TRIM + UPPER to fully normalise a string
-- ============================================================
SELECT
    customer_id,
    UPPER(TRIM(full_name)) AS display_name  -- '  alice johnson  ' → 'ALICE JOHNSON'
FROM   customers;
Output
-- TRIM with length comparison:
customer_id | full_name | clean_name | original_length | trimmed_length
------------|------------------|---------------|-----------------|---------------
1 | alice johnson | alice johnson | 16 | 13
2 | BOB SMITH | BOB SMITH | 9 | 9
3 | carol white | carol white | 12 | 11
-- LTRIM / RTRIM demo:
original | left_trimmed | right_trimmed | fully_trimmed
-----------------|-----------------|-----------------|---------------
hello world | hello world | hello world | hello world
-- TRIM specific characters:
price_no_prefix | message_no_bang | product_clean
----------------|-----------------|---------------
49.99 | Sale | HOT
-- UPPER + TRIM combo:
customer_id | display_name
------------|---------------
1 | ALICE JOHNSON
2 | BOB SMITH
3 | CAROL WHITE
Pro Tip — Trim on INSERT, Not Just on SELECT:
Instead of trimming on every SELECT query (which runs the function millions of times on a large table), use a CHECK constraint or a BEFORE INSERT trigger to strip whitespace when data first enters the database. Clean at the gate, not at every door.
Production Insight
TRIM is not applied automatically on insert — users regularly paste values with surrounding whitespace.
Add a database-level CHECK constraint: CHECK (email = TRIM(email)) to prevent dirty data entering.
For batch imports from CSV or external systems, always run UPDATE table SET col = TRIM(col) as a post-import step.
Key Takeaway
TRIM at write time beats TRIM at read time — prevent dirty data, don't just work around it.
CHECK (col = TRIM(col)) is an elegant constraint that rejects values with leading/trailing spaces.
For imports: UPDATE table SET col = TRIM(col) WHERE col != TRIM(col) as a data quality step.

SUBSTRING, CONCAT and REPLACE — Slicing, Joining and Swapping Text

These three functions let you reshape text rather than just clean it. SUBSTRING pulls a piece out of the middle of a string. CONCAT sticks strings together. REPLACE swaps one piece of text for another. Together they cover almost every text-transformation task you'll face.

SUBSTRING(text, start, length) takes three arguments: the text to work on, the position to start cutting, and how many characters to take. Positions start at 1, not 0 — that trips up programmers coming from JavaScript or Python. So SUBSTRING('London', 1, 3) gives you 'Lon', not 'ond'.

CONCAT(text1, text2, ...) glues strings together. This is useful for building full names from first and last name columns, constructing URLs from parts, or building human-readable labels from codes. The || operator does the same job in PostgreSQL and standard SQL — but MySQL requires the CONCAT function.

REPLACE(text, old_value, new_value) finds every occurrence of old_value inside text and swaps it for new_value. This is powerful for data cleaning — swapping phone number formats, removing unwanted characters, or fixing a batch of mis-entered values all in one UPDATE statement.

substring_concat_replace.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
-- ============================================================
-- SUBSTRING: extract just the domain from an email address.
-- We know the '@' is always there — find where it is, then
-- grab everything after it.
--
-- INSTR(string, search) returns the position of the first
-- match. We add 1 to skip past the '@' itself.
-- ============================================================
SELECT
    email,
    SUBSTRING(
        email,
        INSTR(email, '@') + 1,          -- start one char AFTER the '@'
        LENGTH(email)                    -- grab the rest (generous upper bound)
    ) AS email_domain                   -- 'alice@example.com' → 'example.com'
FROM   customers;

-- ============================================================
-- CONCAT: build a full display label from separate columns.
-- Notice we add a literal string ' — ' between city and name.
-- ============================================================
SELECT
    CONCAT(
        TRIM(UPPER(city)),      -- 'NEW YORK'
        ' | ',                  -- literal separator
        TRIM(full_name)         -- 'alice johnson'
    ) AS customer_label
FROM   customers;

-- PostgreSQL / standard SQL alternative (same result):
-- SELECT TRIM(UPPER(city)) || ' | ' || TRIM(full_name) AS customer_label
-- FROM   customers;

-- ============================================================
-- REPLACE: the marketing team re-branded 'example.com' to
-- 'mycompany.com' — fix all the email addresses in one shot.
-- ============================================================
SELECT
    email,
    REPLACE(
        LOWER(email),           -- normalise case first so the match is reliable
        'example.com',          -- the old text to find
        'mycompany.com'         -- the new text to put in its place
    ) AS updated_email
FROM   customers;

-- ============================================================
-- REAL UPDATE using REPLACE (not just a SELECT preview):
-- Uncomment only when you're ready — always preview first!
-- ============================================================
-- UPDATE customers
-- SET    email = REPLACE(LOWER(email), 'example.com', 'mycompany.com')
-- WHERE  email LIKE '%example.com%';  -- safety net: only touch matching rows

-- ============================================================
-- SUBSTRING to extract area code from a stored phone number
-- Assumes format: '212-555-0100' — area code is always first 3 chars
-- ============================================================
SELECT
    '212-555-0100'            AS raw_phone,
    SUBSTRING('212-555-0100', 1, 3) AS area_code;  -- '212'
Output
-- Email domain extraction:
email | email_domain
--------------------|-------------
ALICE@EXAMPLE.COM | EXAMPLE.COM
bob@example.com | example.com
Carol@Example.com | Example.com
-- CONCAT customer label:
customer_label
--------------------------
NEW YORK | alice johnson
LOS ANGELES | BOB SMITH
CHICAGO | carol white
-- REPLACE email domain:
email | updated_email
--------------------|----------------------
ALICE@EXAMPLE.COM | alice@mycompany.com
bob@example.com | bob@mycompany.com
Carol@Example.com | carol@mycompany.com
-- Area code extraction:
raw_phone | area_code
-------------|----------
212-555-0100 | 212
Watch Out — SUBSTRING Positions Start at 1, Not 0:
Coming from Python or JavaScript? SUBSTRING in SQL counts from position 1, not 0. SUBSTRING('Hello', 1, 1) returns 'H'. If you write SUBSTRING('Hello', 0, 1) you'll get an empty string or unexpected result depending on the database. This bites almost every developer the first time.
Production Insight
SUBSTRING indexes are 0-based in some databases (SQL Server) and 1-based in others (PostgreSQL, MySQL) — verify for your database.
For extracting structured components (area code from phone number), REGEXP_REPLACE/REGEXP_SUBSTR is more robust than SUBSTRING with hardcoded positions.
For large text manipulation, do it in application code — pushing heavy string processing into SQL runs on the database server which is usually a shared resource.
Key Takeaway
SUBSTRING position is 1-based in PostgreSQL/MySQL, 0-based in SQL Server — test your database.
For structured text parsing (extracting domain from email), REGEXP_SUBSTR is more robust than SUBSTRING(col, POSITION('@' IN col) + 1).
Heavy string transformation belongs in application code, not SQL — databases are optimized for data retrieval, not string processing.

LIKE, PATINDEX, and CHARINDEX — Finding Needles Without Blowing Up Performance

You've got a column full of customer emails. Your junior asks: "How do I find all users from .edu domains?" First instinct: WHERE email LIKE '%.edu'. Works on 100 rows. On 10 million, that leading wildcard forces a full table scan every single time. Your query goes from 2ms to 30 seconds. Production cries.

LIKE with a leading % is a sequential scan bomb. Always push the wildcard to the end when you can: LIKE 'john@%'. If you need substring search across millions of rows, use PATINDEX or CHARINDEX inside a persisted computed column and index it. CHARINDEX('@company.com', email) > 0 is marginally better than LIKE for some optimizers, but still not SARGable. The real move: normalize, add a domain column, index it. String functions are runtime crutches — don't let them become your schema.

EmailDomainSearch.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// io.thecodeforge — database tutorial

-- Bad: leading wildcard = index ignore
SELECT user_id, email
  FROM users
 WHERE email LIKE '%.edu';

-- Better: trailing wildcard can use index
SELECT user_id, email
  FROM users
 WHERE email LIKE 'john@%';

-- Best: extracted domain column + index
ALTER TABLE users
  ADD email_domain AS SUBSTRING(
    email, CHARINDEX('@', email) + 1, LEN(email)
  ) PERSISTED;

CREATE INDEX idx_users_email_domain
  ON users(email_domain);

-- Now this uses the index
SELECT user_id, email
  FROM users
 WHERE email_domain = 'stanford.edu';
Output
-- Index scan on idx_users_email_domain
-- user_id | email
-- 1 | alice@stanford.edu
-- 2 | bob@stanford.edu
-- (no table scan)
Production Trap:
LIKE with leading '%' is the #1 cause of accidental full scans in text search queries. Always check the execution plan — if you see 'Table Scan' and a wildcard at position 0, refactor.
Key Takeaway
Push wildcards to the end of LIKE patterns, or better yet, extract and index the substring you're searching for.

REPLACE and TRANSLATE — When a Sledgehammer Beats a Scalpel

You need to strip punctuation from a user's bio before inserting it into a search index. Junior reaches for nested REPLACE calls. Eight lines of spaghetti. One for each character: period, comma, exclamation. It's fragile and slow — every REPLACE scans the whole string. On a batch of 50,000 rows, that's 400,000 full scans. Your ETL pipe grinds to a halt.

TRANSLATE is the hammer you forgot you had. It maps characters in one step: TRANSLATE(bio, '.,!?;:', '') — single pass, zero nesting. If you're cleaning character sets (e.g. replacing smart quotes with straight ones), TRANSLATE handles a full mapping in one call. REPLACE is for swapping substrings, not scrubbing individual characters. Know the difference: REPLACE = substring surgery. TRANSLATE = character-level carpet bombing. Use the right tool or own the performance bill.

StripPunctuation.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// io.thecodeforge — database tutorial

-- Junior: nested REPLACE — scans bio 8 times
SELECT
  REPLACE(
    REPLACE(
      REPLACE(
        REPLACE(bio, '.', ''),
        ',', ''
      ),
      '!', ''
    ),
    '?', ''
  ) AS clean_bio
FROM users;

-- Senior: TRANSLATE — single pass
SELECT
  TRANSLATE(bio, '.,!?;:', '') AS clean_bio
FROM users;

-- Real-world: replace smart quotes
SELECT
  TRANSLATE(
    'Hello “World” — foo',
    '“”‘’—',
    '""\'\'-'
  ) AS normalized_text;
Output
-- With bio = 'Hello, World! How are you?'
-- translate returns: 'Hello World How are you'
-- vs nested replace returns the same, but with 8x the I/O
Senior Shortcut:
TRANSLATE is not available in MySQL or PostgreSQL — use REGEXP_REPLACE instead. In SQL Server and Oracle, TRANSLATE is a performance cheat code.
Key Takeaway
Use TRANSLATE for single-character replacements (punctuation, quotes) — one scan beats eight nested REPLACE calls every time.

CONCAT_WS() — Stop Writing Ugly Concatenation Chains

You’ve been writing CONCAT with a dozen parameters and commas between every pair? That’s amateur hour. CONCAT_WS() is the production-grade version: first argument is your separator, then list as many columns as you want. NULLs get skipped silently — your report won’t crash because someone’s middle name is missing.

The separator is applied between every non-null value. No more checking IS NULL before every join. No more messy COALESCE hacks. If you’re building CSV exports or full-name fields, this is the function you should reach for every time. It’s shorter, faster, and handles edge cases without you writing defensive code.

Example.sqlSQL
1
2
3
4
5
6
7
// io.thecodeforge — database tutorial

SELECT 
    CONCAT_WS(' | ', first_name, middle_initial, last_name) AS full_name
FROM employees
WHERE department = 'Engineering'
LIMIT 3;
Output
full_name
----------------------
Alice | A | Chen
Bob | NULL | Kowalski --> middle_initial is NULL, skipped
Carlos | NULL | Mendez
Production Trap:
CONCAT() treats NULL as empty string if you wrap in COALESCE. CONCAT_WS() just skips nulls. That changes row counts in aggregations — watch out when building column lists for pivot tables.
Key Takeaway
CONCAT_WS() is the only concatenation function you need in production — it handles separators and nulls in one call.

LPAD() and RPAD() — Padding Is Not Just for Pretty Print

Padding looks like a UI problem. It’s not. When you need fixed-width exports for legacy systems, or you’re building invoice numbers with leading zeros, LPAD() and RPAD() are your only sane options. LEFT() + REPLICATE() is a fragile kludge — these functions exist exactly so you don’t write that garbage.

LPAD() adds characters to the left until the string hits your target length. RPAD() does the same on the right. Pass the column, the target width, and the pad character. If the original string is longer than the target, it truncates from the padded side. That means you need to check your data first — truncation is silent and invisible.

Example.sqlSQL
1
2
3
4
5
6
7
8
9
// io.thecodeforge — database tutorial

SELECT 
    order_id,
    LPAD(CAST(order_id AS CHAR), 8, '0') AS padded_id,
    RPAD(product_name, 20, '.') AS product_display
FROM orders
WHERE order_date >= '2024-01-01'
LIMIT 2;
Output
order_id | padded_id | product_display
-------- | --------- | --------------------
123 | 00000123 | Wireless Mouse........
4567 | 00004567 | Mechanical Keyboard...
Senior Shortcut:
Pad with spaces for fixed-width reports; pad with zeros for financial codes. Always test truncation behavior — if your target length is smaller than the longest value, you’ll lose data without an error.
Key Takeaway
LPAD/RPAD turn variable-length strings into fixed-width data without messy string arithmetic.

LOCATE() — The Find Function That Doesn’t Overcomplicate Things

LIKE is for pattern matching. PATINDEX is for regex fans. When you just need to know if a substring exists inside a string — and where it starts — use LOCATE(). It’s the SQL equivalent of strpos() in PHP or indexOf() in JavaScript. Three parameters: the substring to find, the source string, and an optional starting position.

Returns zero if not found. That’s it. No wildcards, no performance nightmares from leading wildcard LIKE patterns. LOCATE() is fully index-unfriendly when you search on a computed column, but if you’re searching a known column with a substring, it’s faster and cleaner than a LIKE with '%something%'. Use it for parsing CSV fields, checking for tags, or extracting substrings by position.

Example.sqlSQL
1
2
3
4
5
6
7
8
9
10
// io.thecodeforge — database tutorial

SELECT 
    email,
    LOCATE('@', email) AS at_position,
    LOCATE('.com', email) AS dotcom_position
FROM users
WHERE LOCATE('@', email) > 0
  AND LOCATE('.com', email) = 0
LIMIT 3;
Output
email | at_position | dotcom_position
-------------------- | ----------- | ---------------
user@example.net | 5 | 0
admin@company.org | 6 | 0
test@dev.io | 5 | 0
Production Trap:
LOCATE() with a leading wildcard (e.g., LOCATE('@', email)) cannot use indexes. For search-heavy workloads, store extracted domains in a separate indexed column or use full-text search.
Key Takeaway
LOCATE() is the simplest way to find a substring’s position — zero means not found, any positive integer is the character position.

REVERSE() — When Backward Text Holds Hidden Patterns

Most developers view REVERSE() as a party trick, but it solves real problems. Need to find palindromic product codes? REVERSE() + equality check filters them in one pass. Working with hierarchically delimited strings like file paths or SKUs? Reverse the string, extract the first segment, and reverse back — a two-step solution that avoids complex split logic. REVERSE() also cracks pattern detection in domain names, where subdomains must match certain suffixes. The function flips every character: REVERSE('abc') becomes 'cba'. It works on any string type and respects Unicode correctly in most modern databases. Performance is linear in string length, making it safe for millions of rows. Avoid REVERSE() inside WHERE clauses on indexed columns unless you use a computed column with a persisted index — because reversing destroys index order. Use it instead in SELECT or CTE stages after filtering.

ReverseDomain.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
// io.thecodeforge — database tutorial

WITH emails AS (
  SELECT 'user@sub.example.com' AS email
)
SELECT
  email,
  REVERSE(email) AS reversed,
  REVERSE(
    SUBSTRING_INDEX(REVERSE(email), '.', 1)
  ) AS top_level_domain
FROM emails;
Output
email | reversed | top_level_domain
---------------------------+----------------------------------+-----------------
user@sub.example.com | moc.elpmaxe.bus@resu | com
Production Trap:
REVERSE() bypasses indexes. Never use it in JOIN or WHERE on large tables unless you create a computed, persisted index on the reversed column.
Key Takeaway
REVERSE() turns string-end processing into string-start processing — always apply after filtering, never in join conditions.

REPEAT() — Generating Test Data Without Loops

REPEAT() replicates a string N times in a single expression. It's your fastest tool for generating padded output, fake data, or separator lines. The signature is simple: REPEAT(string, count). If count is zero, it returns an empty string. Negative count returns NULL. Most databases cap the result at a system maximum length (e.g., 65,535 bytes in MySQL) — hitting that silently truncates, not errors. Use REPEAT() in reporting to create dynamic progress bars: REPEAT('█', ROUND(percent 50)). For data anonymization, replace each character of a name with '': REPEAT('*', LENGTH(name)). It also builds repeating separators in CONCAT_WS chains without hardcoding. The performance is O(N) in the output length, so generating 100MB strings will hurt. Keep counts under 10,000 in production queries. For row-wise repetition, pair REPEAT() with a numbers table to multiply rows, not just characters.

GenerateStars.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// io.thecodeforge — database tutorial

SELECT
  product_name,
  CONCAT(
    'Score: ',
    REPEAT('★', rating),
    REPEAT('☆', 5 - rating)
  ) AS star_rating
FROM (
  SELECT 'Widget A' AS product_name, 4 AS rating
  UNION ALL
  SELECT 'Widget B', 2
) scores;
Output
product_name | star_rating
-------------+--------------------
Widget A | Score: ★★★★☆
Widget B | Score: ★★☆☆☆
Production Trap:
REPEAT() with count over 100,000 can lock memory. Always validate count is bounded — use LEAST(count, 5000) for safety.
Key Takeaway
REPEAT() replaces cursor-based loops for generating repeated strings — but cap the count to avoid memory bloat.

FIND_IN_SET() — Slicing Comma-Separated Lists Without Pain

FIND_IN_SET() searches a comma-separated string for a needle and returns its 1-based position. It's purpose-built for columns that store multiple tags or categories in a single string — a pattern you inherit from legacy systems or ETL feeds. The syntax: FIND_IN_SET('needle', 'item1,item2,item3'). Returns 0 if not found. The needle must be a single string, not a comma. It does not support wildcards. Performance is O(N) in list length; for very long lists (100+ items), normalize to a junction table instead. The biggest hidden trap: FIND_IN_SET ignores spaces after commas — 'tag1, tag2' treats ' tag2' as the item, so 'tag2' will not match. Always clean your data with REPLACE(list, ' ', '') before using it. Combine with SUBSTRING_INDEX to extract the matching item itself. Avoid using FIND_IN_SET in JOIN conditions — it forces full table scans and prevents index usage. Use it only for read-only reports or one-time data cleanups.

FindCategory.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
// io.thecodeforge — database tutorial

WITH products AS (
  SELECT 'Laptop' AS product, 'electronics,computers,portable' AS tags
  UNION ALL
  SELECT 'Chair', 'furniture,office'
)
SELECT
  product,
  FIND_IN_SET('computers', tags) AS computer_position
FROM products
WHERE FIND_IN_SET('computers', tags) > 0;
Output
product | computer_position
--------+------------------
Laptop | 2
Production Trap:
FIND_IN_SET ignores whitespace. 'blue' will NOT match 'blue, red' because of the leading space before 'red'. Always trim your delimited values.
Key Takeaway
FIND_IN_SET() is a quick-but-dirty parser for comma lists — acceptable in reports, but refactor to normalized tables for OLTP queries.
● Production incidentPOST-MORTEMseverity: high

User Lookup Failed for 30% of Users Due to Trailing Spaces in Email Column

Symptom
30% of users were locked out after a system migration. Password reset emails reached them (sent to the stored email) but login lookups returned 'user not found'.
Assumption
The login query used WHERE email = $input. The team assumed emails in the database were clean.
Root cause
A legacy CSV import inserted emails with trailing spaces. The login form trimmed user input; the database values had spaces. 'alice@example.com ' != 'alice@example.com' — exact string comparison failed silently.
Fix
UPDATE users SET email = TRIM(email) to clean existing data. Added TRIM() in the login query as an interim fix. Added a CHECK constraint: CHECK (email = TRIM(email)) to prevent future imports from introducing spaces.
Key lesson
  • Always TRIM text columns at write time, not read time — prevent the dirty data from entering the database
  • Add CHECK constraints for format rules: CHECK (email = TRIM(email)) prevents leading/trailing spaces
  • When doing exact string comparisons in authentication, TRIM both sides defensively
Production debug guideDiagnosing silent comparison failures and encoding issues3 entries
Symptom · 01
String comparison returns no rows despite visually matching values
Fix
Check for invisible characters: SELECT LENGTH(col), TRIM(col), col FROM table WHERE id = x. If LENGTH != CHAR_LENGTH(TRIM(col)), there are leading/trailing spaces or multi-byte characters. Also check case: col = 'london' won't match 'London' without LOWER(col).
Symptom · 02
LIKE '%pattern%' is slow on a large table
Fix
LIKE with a leading wildcard forces a full table scan. For contains-search, create a full-text index: CREATE INDEX idx_fts ON products USING GIN(to_tsvector('english', description)) in PostgreSQL. Use MySQL FULLTEXT index with MATCH...AGAINST for large text search.
Symptom · 03
CONCAT returns NULL when some inputs are NULL
Fix
CONCAT in PostgreSQL and standard SQL returns NULL if any argument is NULL. Use COALESCE: COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''). MySQL's CONCAT_WS(separator, ...) ignores NULLs automatically.
FunctionWhat It DoesMySQL SyntaxPostgreSQL SyntaxSQL Server Syntax
UppercaseConverts all letters to uppercaseUPPER(col)UPPER(col)UPPER(col)
LowercaseConverts all letters to lowercaseLOWER(col)LOWER(col)LOWER(col)
String lengthReturns character count of stringLENGTH(col)LENGTH(col)LEN(col)
Trim both sidesRemoves leading and trailing spacesTRIM(col)TRIM(col)TRIM(col)
Trim left onlyRemoves leading spaces onlyLTRIM(col)LTRIM(col)LTRIM(col)
Trim right onlyRemoves trailing spaces onlyRTRIM(col)RTRIM(col)RTRIM(col)
Extract substringPulls a slice of text by positionSUBSTRING(col, start, len)SUBSTRING(col, start, len)SUBSTRING(col, start, len)
Concatenate stringsJoins two or more strings togetherCONCAT(a, b)CONCAT(a, b) or a || bCONCAT(a, b) or a + b
Find positionReturns index of substring in stringINSTR(col, search)POSITION(search IN col)CHARINDEX(search, col)
Replace textSwaps all occurrences of a substringREPLACE(col, old, new)REPLACE(col, old, new)REPLACE(col, old, new)

Key takeaways

1
String functions never change the stored data
they return a transformed copy for that query only. To permanently change data you need an UPDATE statement.
2
Always normalise BOTH sides of a string comparison
WHERE LOWER(city) = 'london' works, but WHERE city = 'london' silently fails when the stored value is 'London' or 'LONDON'.
3
SQL Server uses LEN() and CHARINDEX() where MySQL/PostgreSQL use LENGTH() and INSTR()/POSITION()
the concept is identical, the spelling is different. Check your dialect before debugging.
4
Chain functions from the inside out
UPPER(TRIM(full_name)) trims first, then uppercases. Read nested functions like onion layers — innermost runs first.

Common mistakes to avoid

3 patterns
×

Comparing strings without normalizing case

Symptom
WHERE city = 'london' returns zero rows when the data contains 'London' or 'LONDON' — exact string comparison is case-sensitive in PostgreSQL and SQL Server
Fix
Use LOWER() on both sides: WHERE LOWER(city) = 'london'. For repeated case-insensitive searches, create a functional index: CREATE INDEX idx_lower_city ON addresses (LOWER(city)). Store the data normalized (always lowercase) at write time to avoid the overhead entirely.
×

Using LIKE '%pattern%' for search on large tables

Symptom
Search query is slow — EXPLAIN shows Seq Scan on the full table regardless of existing indexes on the searched column. Performance degrades linearly with table size.
Fix
Full-text search: PostgreSQL: CREATE INDEX idx_fts ON table USING GIN(to_tsvector('english', col)); query with WHERE to_tsvector('english', col) @@ plainto_tsquery('pattern'). MySQL: CREATE FULLTEXT INDEX; query with MATCH(col) AGAINST ('pattern'). LIKE 'prefix%' (no leading wildcard) can still use a regular B-tree index.
×

Concatenating strings without handling NULLs

Symptom
CONCAT(first_name, ' ', last_name) returns NULL for any user where first_name or last_name is NULL — the full name column shows NULL instead of the partial name
Fix
COALESCE every nullable component: COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''). Or use CONCAT_WS(' ', first_name, last_name) in MySQL which skips NULL arguments automatically.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
What is the difference between CHAR_LENGTH and LENGTH in SQL?
Q02SENIOR
Why does LIKE '%pattern%' not use an index, and how would you optimize a...
Q03JUNIOR
How do you safely concatenate a full name from first_name and last_name ...
Q01 of 03SENIOR

What is the difference between CHAR_LENGTH and LENGTH in SQL?

ANSWER
LENGTH in MySQL returns the number of bytes, not characters. For ASCII strings these are equal. For multi-byte UTF-8 characters (Chinese, Arabic, emoji), LENGTH may return a higher number because each character uses 2-4 bytes. CHAR_LENGTH always returns the number of characters regardless of encoding. In PostgreSQL, LENGTH returns characters by default. Example: a string containing a single emoji (4 bytes in UTF-8) — MySQL LENGTH returns 4, CHAR_LENGTH returns 1. For user-facing character counting (form validation, display truncation), always use CHAR_LENGTH.
FAQ · 3 QUESTIONS

Frequently Asked Questions

01
What is the difference between CHAR_LENGTH and LENGTH in SQL?
02
Can I use SQL string functions inside a WHERE clause?
03
Why does CONCAT return NULL when one of my columns is NULL?
N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Drawn from code that ran under real load.

Follow
Verified
production tested
May 24, 2026
last updated
1,554
articles · all by Naren
🔥

That's SQL Basics. Mark it forged?

11 min read · try the examples if you haven't

Previous
SQL Date and Time Functions
15 / 16 · SQL Basics
Next
SQL UPDATE Statement: Syntax, Examples and Best Practices