Database Beginner

SQL String Functions Explained — UPPER, TRIM, SUBSTRING and More

Q: What is the difference between CHAR_LENGTH and LENGTH in SQL?

In MySQL, LENGTH() counts bytes not characters — so a single emoji or accented character (which uses multiple bytes in UTF-8) can count as 2, 3 or 4. CHAR_LENGTH() counts actual characters regardless of byte size. For modern text data with international characters always prefer CHAR_LENGTH() in MySQL. PostgreSQL's LENGTH() counts characters directly so this distinction doesn't apply there.

Q: Can I use SQL string functions inside a WHERE clause?

Yes, absolutely. WHERE LOWER(email) = 'alice@example.com' is perfectly valid. The function runs on each row as the database filters. The trade-off is performance: wrapping a column in a function usually prevents the database from using an index on that column. For a large table, consider storing a pre-normalised version of the column or using a function-based index if your database supports it.

Q: Why does CONCAT return NULL when one of my columns is NULL?

In standard SQL and most databases, any string operation involving NULL returns NULL — because NULL means 'unknown', and 'Alice' joined to 'unknown' is still 'unknown'. Fix this with COALESCE: CONCAT(first_name, ' ', COALESCE(last_name, '')) replaces a NULL last_name with an empty string so the full name still returns correctly instead of vanishing.

📅 March 2026 ⏱ 8 min read 🎯 Beginner

In Plain English 🔥

Imagine you work at a library and someone hands you a messy pile of book titles — some in ALL CAPS, some with random spaces at the start, some misspelled. SQL string functions are the tools you use to clean up, reshape, and search through that text, all without touching the original paper. They let your database do the tedious text-editing work so you never have to open a spreadsheet and do it by hand.

⚡ Quick Answer

Every real-world database is full of text — customer names, email addresses, product descriptions, phone numbers. The problem is that text is messy. Users type their names in all caps, leave trailing spaces, or enter 'New York' when your system expects 'new york'. If you can't clean and manipulate that text inside the database itself, you end up writing hundreds of lines of application code to do a job SQL can handle in a single line.

SQL string functions solve exactly that problem. They're built-in tools the database gives you to inspect, clean, transform, and combine text columns on the fly — right inside your queries. Instead of pulling all the data into Python or JavaScript and then fixing it, you fix it where it lives.

By the end of this article you'll know the seven most important SQL string functions, exactly when to reach for each one, how to chain them together for real cleaning tasks, and the two traps that catch almost every beginner. You'll be able to look at a messy data problem and immediately know which function to use.

The Seven String Functions You'll Use 90% of the Time

SQL has dozens of string functions, but most professional work comes down to seven. Think of them like tools in a toolbox — a hammer, a screwdriver, a tape measure. You don't need every tool on day one; you need to know the core ones cold.

Here's a quick map before we go deep:

— UPPER / LOWER: change the case of text, like Caps Lock on a keyboard. — LENGTH: count how many characters are in a string, like counting letters in a word. — TRIM / LTRIM / RTRIM: remove unwanted spaces from the edges of text, like cutting the crusts off bread. — SUBSTRING: pull out a chunk of text from the middle of a string, like cutting a slice from a loaf. — CONCAT: glue two or more pieces of text together, like taping two pieces of paper side by side. — REPLACE: swap one piece of text for another, like using find-and-replace in Word. — INSTR / CHARINDEX / POSITION: find where a smaller piece of text lives inside a bigger one.

Different databases (MySQL, PostgreSQL, SQL Server) spell a few of these slightly differently — we'll flag that as we go. The concepts are identical everywhere.

string_functions_overview.sql · SQL

12345678910111213141516171819202122

-- ============================================================
-- Setup: create a small customers table with intentionally
-- messy data so we can demonstrate each function clearly.
-- ============================================================

CREATE TABLE customers (
    customer_id   INT,
    full_name     VARCHAR(100),
    email         VARCHAR(150),
    city          VARCHAR(50)
);

INSERT INTO customers VALUES
    (1, '  alice johnson  ', 'ALICE@EXAMPLE.COM', 'new york'),
    (2, 'BOB SMITH',         'bob@example.com',   'Los Angeles'),
    (3, 'carol white  ',     'Carol@Example.com', 'CHICAGO');

-- -------------------------------------------------------
-- Quick preview: what does the raw data look like?
-- -------------------------------------------------------
SELECT customer_id, full_name, email, city
FROM   customers;

▶ Output

customer_id | full_name | email | city
------------|--------------------|---------------------|------------
1 | alice johnson | ALICE@EXAMPLE.COM | new york
2 | BOB SMITH | bob@example.com | Los Angeles
3 | carol white | Carol@Example.com | CHICAGO

🔥

Database Dialect Note:All examples in this article use standard SQL that runs in MySQL 8+, PostgreSQL 14+, and SQL Server 2019+. Where syntax differs between databases we'll call it out explicitly with a short note.

UPPER, LOWER and LENGTH — Your Case and Size Tools

Case inconsistency is one of the most common data quality problems in real databases. One record says 'London', another says 'LONDON', a third says 'london'. To your database those are three completely different values — so a GROUP BY city query would produce three separate buckets instead of one. That's a real bug.

UPPER(text) converts every letter in a string to uppercase. LOWER(text) converts every letter to lowercase. Neither modifies the original stored data — they return a new version of the text just for that query result. Think of it like wearing tinted glasses: the world looks different, but the world hasn't changed.

LENGTH(text) tells you how many characters are in a string, including spaces. This is useful for data validation — if a phone number column has a length other than 10 digits, something went wrong during data entry. Note: in SQL Server, the equivalent function is called LEN() not LENGTH() — a classic gotcha we'll revisit.

The real power comes when you combine these with a WHERE clause. Instead of hoping users typed city names consistently, you normalise both sides of the comparison so they always match regardless of how the data was entered.

upper_lower_length.sql · SQL

123456789101112131415161718192021222324252627282930313233343536

-- ============================================================
-- UPPER: standardise city names to uppercase for safe grouping
-- ============================================================
SELECT
    UPPER(city)          AS city_uppercase,   -- 'new york' → 'NEW YORK'
    COUNT(*)             AS customer_count
FROM   customers
GROUP BY UPPER(city);                         -- groups correctly despite mixed case

-- ============================================================
-- LOWER: normalise email addresses (emails are case-insensitive
-- but databases treat them as case-sensitive strings)
-- ============================================================
SELECT
    customer_id,
    LOWER(email)  AS normalised_email  -- 'ALICE@EXAMPLE.COM' → 'alice@example.com'
FROM   customers;

-- ============================================================
-- LENGTH: find any city names that seem suspiciously short
-- (useful for catching typos or blank entries)
-- ============================================================
SELECT
    customer_id,
    city,
    LENGTH(city)  AS city_length   -- counts every character including spaces
FROM   customers
WHERE  LENGTH(city) < 5;           -- flag anything under 5 characters as suspect

-- ============================================================
-- PRACTICAL COMBO: case-insensitive search
-- Find all customers in New York regardless of how it was typed
-- ============================================================
SELECT *
FROM   customers
WHERE  LOWER(city) = 'new york';   -- normalise BOTH sides to lowercase

▶ Output

-- UPPER + GROUP BY result:
city_uppercase | customer_count
---------------|---------------
NEW YORK | 1
LOS ANGELES | 1
CHICAGO | 1

-- LOWER email result:
customer_id | normalised_email
------------|------------------
1 | alice@example.com
2 | bob@example.com
3 | carol@example.com

-- LENGTH < 5 result:
(0 rows — all city names in this dataset are 7+ characters)

-- Case-insensitive search result:
customer_id | full_name | email | city
------------|------------------|-------------------|----------
1 | alice johnson | ALICE@EXAMPLE.COM | new york

⚠️

Watch Out — SQL Server Uses LEN() Not LENGTH():In SQL Server, LENGTH() does not exist. Use LEN() instead. Also note that LEN() does NOT count trailing spaces, while MySQL's LENGTH() does. If you're counting spaces intentionally, use DATALENGTH() in SQL Server.

TRIM, LTRIM and RTRIM — Killing the Invisible Spaces

Spaces you can't see are some of the sneakiest bugs in data work. When a user fills in a web form and accidentally hits the spacebar before their name, that space gets stored in the database. Now ' Alice' and 'Alice' are two different strings. Searches fail. Joins don't match. Reports double-count.

TRIM(text) removes spaces from both the left and right sides of a string. LTRIM only removes spaces on the left (the leading side). RTRIM only removes spaces on the right (the trailing side). Think of it like cutting the bread crusts — you can cut both ends, just the left, or just the right.

This comes up constantly in data imports. When you load data from a CSV file or an external API, strings almost always arrive with extra whitespace. Running TRIM before you process the data is like shaking the crumbs off a tablecloth before you sit down — a small step that prevents a lot of mess.

Modern SQL (SQL:2003 standard) lets you TRIM other characters too, not just spaces — for example trimming a dollar sign off a price string. We'll show that in the code below. PostgreSQL and SQL Server both support this; MySQL added it in version 8.

trim_functions.sql · SQL

123456789101112131415161718192021222324252627282930313233343536373839

-- ============================================================
-- Basic TRIM: remove spaces from both sides of full_name
-- ============================================================
SELECT
    customer_id,
    full_name,                      -- original, messy version
    TRIM(full_name) AS clean_name,  -- spaces removed from both sides
    LENGTH(full_name)         AS original_length,
    LENGTH(TRIM(full_name))   AS trimmed_length  -- shows how many spaces were hiding
FROM   customers;

-- ============================================================
-- LTRIM vs RTRIM: when you only want to cut one side
-- Useful if you're processing fixed-width data exports where
-- trailing spaces are padding but leading spaces mean something.
-- ============================================================
SELECT
    '  hello world  '           AS original,
    LTRIM('  hello world  ')    AS left_trimmed,   -- 'hello world  '
    RTRIM('  hello world  ')    AS right_trimmed,  -- '  hello world'
    TRIM('  hello world  ')     AS fully_trimmed;  -- 'hello world'

-- ============================================================
-- TRIM a specific character (not just spaces)
-- Imagine prices were imported with a leading dollar sign
-- ============================================================
SELECT
    TRIM(LEADING '$' FROM '$49.99')   AS price_no_prefix,   -- '49.99'
    TRIM(TRAILING '!' FROM 'Sale!')   AS message_no_bang,   -- 'Sale'
    TRIM(BOTH   '*' FROM '***HOT***') AS product_clean;      -- 'HOT'

-- ============================================================
-- PRACTICAL USE: clean names before inserting into a report
-- Combine TRIM + UPPER to fully normalise a string
-- ============================================================
SELECT
    customer_id,
    UPPER(TRIM(full_name)) AS display_name  -- '  alice johnson  ' → 'ALICE JOHNSON'
FROM   customers;

▶ Output

-- TRIM with length comparison:
customer_id | full_name | clean_name | original_length | trimmed_length
------------|------------------|---------------|-----------------|---------------
1 | alice johnson | alice johnson | 16 | 13
2 | BOB SMITH | BOB SMITH | 9 | 9
3 | carol white | carol white | 12 | 11

-- LTRIM / RTRIM demo:
original | left_trimmed | right_trimmed | fully_trimmed
-----------------|-----------------|-----------------|---------------
hello world | hello world | hello world | hello world

-- TRIM specific characters:
price_no_prefix | message_no_bang | product_clean
----------------|-----------------|---------------
49.99 | Sale | HOT

-- UPPER + TRIM combo:
customer_id | display_name
------------|---------------
1 | ALICE JOHNSON
2 | BOB SMITH
3 | CAROL WHITE

⚠️

Pro Tip — Trim on INSERT, Not Just on SELECT:Instead of trimming on every SELECT query (which runs the function millions of times on a large table), use a CHECK constraint or a BEFORE INSERT trigger to strip whitespace when data first enters the database. Clean at the gate, not at every door.

SUBSTRING, CONCAT and REPLACE — Slicing, Joining and Swapping Text

These three functions let you reshape text rather than just clean it. SUBSTRING pulls a piece out of the middle of a string. CONCAT sticks strings together. REPLACE swaps one piece of text for another. Together they cover almost every text-transformation task you'll face.

SUBSTRING(text, start, length) takes three arguments: the text to work on, the position to start cutting, and how many characters to take. Positions start at 1, not 0 — that trips up programmers coming from JavaScript or Python. So SUBSTRING('London', 1, 3) gives you 'Lon', not 'ond'.

CONCAT(text1, text2, ...) glues strings together. This is useful for building full names from first and last name columns, constructing URLs from parts, or building human-readable labels from codes. The || operator does the same job in PostgreSQL and standard SQL — but MySQL requires the CONCAT function.

REPLACE(text, old_value, new_value) finds every occurrence of old_value inside text and swaps it for new_value. This is powerful for data cleaning — swapping phone number formats, removing unwanted characters, or fixing a batch of mis-entered values all in one UPDATE statement.

substring_concat_replace.sql · SQL

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061

-- ============================================================
-- SUBSTRING: extract just the domain from an email address.
-- We know the '@' is always there — find where it is, then
-- grab everything after it.
--
-- INSTR(string, search) returns the position of the first
-- match. We add 1 to skip past the '@' itself.
-- ============================================================
SELECT
    email,
    SUBSTRING(
        email,
        INSTR(email, '@') + 1,          -- start one char AFTER the '@'
        LENGTH(email)                    -- grab the rest (generous upper bound)
    ) AS email_domain                   -- 'alice@example.com' → 'example.com'
FROM   customers;

-- ============================================================
-- CONCAT: build a full display label from separate columns.
-- Notice we add a literal string ' — ' between city and name.
-- ============================================================
SELECT
    CONCAT(
        TRIM(UPPER(city)),      -- 'NEW YORK'
        ' | ',                  -- literal separator
        TRIM(full_name)         -- 'alice johnson'
    ) AS customer_label
FROM   customers;

-- PostgreSQL / standard SQL alternative (same result):
-- SELECT TRIM(UPPER(city)) || ' | ' || TRIM(full_name) AS customer_label
-- FROM   customers;

-- ============================================================
-- REPLACE: the marketing team re-branded 'example.com' to
-- 'mycompany.com' — fix all the email addresses in one shot.
-- ============================================================
SELECT
    email,
    REPLACE(
        LOWER(email),           -- normalise case first so the match is reliable
        'example.com',          -- the old text to find
        'mycompany.com'         -- the new text to put in its place
    ) AS updated_email
FROM   customers;

-- ============================================================
-- REAL UPDATE using REPLACE (not just a SELECT preview):
-- Uncomment only when you're ready — always preview first!
-- ============================================================
-- UPDATE customers
-- SET    email = REPLACE(LOWER(email), 'example.com', 'mycompany.com')
-- WHERE  email LIKE '%example.com%';  -- safety net: only touch matching rows

-- ============================================================
-- SUBSTRING to extract area code from a stored phone number
-- Assumes format: '212-555-0100' — area code is always first 3 chars
-- ============================================================
SELECT
    '212-555-0100'            AS raw_phone,
    SUBSTRING('212-555-0100', 1, 3) AS area_code;  -- '212'

▶ Output

-- Email domain extraction:
email | email_domain
--------------------|-------------
ALICE@EXAMPLE.COM | EXAMPLE.COM
bob@example.com | example.com
Carol@Example.com | Example.com

-- CONCAT customer label:
customer_label
--------------------------
NEW YORK | alice johnson
LOS ANGELES | BOB SMITH
CHICAGO | carol white

-- REPLACE email domain:
email | updated_email
--------------------|----------------------
ALICE@EXAMPLE.COM | alice@mycompany.com
bob@example.com | bob@mycompany.com
Carol@Example.com | carol@mycompany.com

-- Area code extraction:
raw_phone | area_code
-------------|----------
212-555-0100 | 212

⚠️

Watch Out — SUBSTRING Positions Start at 1, Not 0:Coming from Python or JavaScript? SUBSTRING in SQL counts from position 1, not 0. SUBSTRING('Hello', 1, 1) returns 'H'. If you write SUBSTRING('Hello', 0, 1) you'll get an empty string or unexpected result depending on the database. This bites almost every developer the first time.

Function	What It Does	MySQL Syntax	PostgreSQL Syntax	SQL Server Syntax
Uppercase	Converts all letters to uppercase	UPPER(col)	UPPER(col)	UPPER(col)
Lowercase	Converts all letters to lowercase	LOWER(col)	LOWER(col)	LOWER(col)
String length	Returns character count of string	LENGTH(col)	LENGTH(col)	LEN(col)
Trim both sides	Removes leading and trailing spaces	TRIM(col)	TRIM(col)	TRIM(col)
Trim left only	Removes leading spaces only	LTRIM(col)	LTRIM(col)	LTRIM(col)
Trim right only	Removes trailing spaces only	RTRIM(col)	RTRIM(col)	RTRIM(col)
Extract substring	Pulls a slice of text by position	SUBSTRING(col, start, len)	SUBSTRING(col, start, len)	SUBSTRING(col, start, len)
Concatenate strings	Joins two or more strings together	CONCAT(a, b)	CONCAT(a, b) or a \|\| b	CONCAT(a, b) or a + b
Find position	Returns index of substring in string	INSTR(col, search)	POSITION(search IN col)	CHARINDEX(search, col)
Replace text	Swaps all occurrences of a substring	REPLACE(col, old, new)	REPLACE(col, old, new)	REPLACE(col, old, new)

🎯 Key Takeaways

String functions never change the stored data — they return a transformed copy for that query only. To permanently change data you need an UPDATE statement.
Always normalise BOTH sides of a string comparison — WHERE LOWER(city) = 'london' works, but WHERE city = 'london' silently fails when the stored value is 'London' or 'LONDON'.
SQL Server uses LEN() and CHARINDEX() where MySQL/PostgreSQL use LENGTH() and INSTR()/POSITION() — the concept is identical, the spelling is different. Check your dialect before debugging.
Chain functions from the inside out: UPPER(TRIM(full_name)) trims first, then uppercases. Read nested functions like onion layers — innermost runs first.

⚠ Common Mistakes to Avoid

✕Mistake 1: Comparing strings without normalising case — symptom: WHERE city = 'london' returns no results even though 'London' is in the table — fix: always normalise both sides: WHERE LOWER(city) = LOWER('london') or WHERE LOWER(city) = 'london'.
✕Mistake 2: Using LENGTH() in SQL Server — symptom: 'Invalid object name' or 'function not found' error — fix: SQL Server uses LEN() not LENGTH(). Write a wrapper view or just remember the dialect difference. Also note LEN() silently ignores trailing spaces, so if you need to count those use DATALENGTH() instead.
✕Mistake 3: Running REPLACE in an UPDATE without a WHERE clause first — symptom: you accidentally update every row in the table including ones you didn't intend to touch — fix: always run the equivalent SELECT with the same REPLACE logic first to preview affected rows, then add WHERE col LIKE '%old_value%' to your UPDATE as a safety guard before executing.

Interview Questions on This Topic

QWhat's the difference between TRIM, LTRIM and RTRIM, and when would you use each one specifically?
QIf a customer name column has inconsistent casing — some rows in all caps, some in all lowercase — how would you write a query to group them correctly without modifying the underlying data?
QA candidate once said 'SUBSTRING positions start at zero just like arrays in programming' — what's wrong with that statement, and what would happen to the output?

Frequently Asked Questions

What is the difference between CHAR_LENGTH and LENGTH in SQL?

In MySQL, LENGTH() counts bytes not characters — so a single emoji or accented character (which uses multiple bytes in UTF-8) can count as 2, 3 or 4. CHAR_LENGTH() counts actual characters regardless of byte size. For modern text data with international characters always prefer CHAR_LENGTH() in MySQL. PostgreSQL's LENGTH() counts characters directly so this distinction doesn't apply there.

Can I use SQL string functions inside a WHERE clause?

Yes, absolutely. WHERE LOWER(email) = 'alice@example.com' is perfectly valid. The function runs on each row as the database filters. The trade-off is performance: wrapping a column in a function usually prevents the database from using an index on that column. For a large table, consider storing a pre-normalised version of the column or using a function-based index if your database supports it.

Why does CONCAT return NULL when one of my columns is NULL?

In standard SQL and most databases, any string operation involving NULL returns NULL — because NULL means 'unknown', and 'Alice' joined to 'unknown' is still 'unknown'. Fix this with COALESCE: CONCAT(first_name, ' ', COALESCE(last_name, '')) replaces a NULL last_name with an empty string so the full name still returns correctly instead of vanishing.

🔥

TheCodeForge Editorial Team Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.

About Our Team Editorial Standards

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged