Advanced 11 min · March 05, 2026

Full-Text Search in SQL: Stopwords Hide 'ink' Products

Q: What is Full-Text Search in SQL in simple terms?

Full-Text Search is a database feature that lets you search large text columns for words or phrases quickly. Instead of scanning every row (like LIKE '%term%'), it builds an inverted index that maps each word to the rows containing it. This makes searches in millions of rows complete in milliseconds.

Q: Can I use full-text search with a partial match like 'prefix*'?

Yes, in Boolean mode you can use the asterisk wildcard at the end of a word (e.g., 'run*' matches 'running', 'runner', 'runs'). Leading wildcards are not supported in most databases. For prefix search, use the * operator in MySQL's BOOLEAN MODE or the :* operator in PostgreSQL's tsquery.
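Why can a trailing wildcard use the index while a leading one can't? Term dictionaries are kept sorted, so every term that starts with 'run' sits in one contiguous range that a single binary search can find. A minimal Python sketch, assuming a plain sorted list stands in for the engine's term dictionary (MySQL and PostgreSQL use their own on-disk structures):

```python
from bisect import bisect_left


def prefix_terms(sorted_terms: list[str], prefix: str) -> list[str]:
    """Return every term starting with `prefix` using one binary search.

    Works because a sorted dictionary stores all terms that share a prefix
    in one contiguous run.
    """
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # walked past the run of matching terms
        matches.append(term)
    return matches


terms = sorted(["ran", "run", "runner", "running", "runs", "rust", "pruning"])
print(prefix_terms(terms, "run"))  # ['run', 'runner', 'running', 'runs']
```

A leading wildcard such as '*run' would also have to find 'pruning', and there is no contiguous range for that: the only option is to check every term, which is exactly the scan FTS was meant to avoid.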

Q: Does full-text search support fuzzy matching (typo tolerance)?

Not natively in MySQL or PostgreSQL. You need trigram indexes (pg_trgm in PostgreSQL) or a dedicated search engine like Elasticsearch for fuzzy matching. Some workarounds exist using Levenshtein distance, but they don't scale to large tables. For typo tolerance, consider adding Elasticsearch as a specialised search layer.
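Trigram matching is what makes pg_trgm typo-tolerant. The sketch below approximates the approach its documentation describes (lowercase each word, pad it with two leading spaces and one trailing space, cut it into three-character groups, compare the sets). It is an illustration, not pg_trgm's code, and it skips details such as treating punctuation as a word separator.

```python
def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm-style trigrams for whitespace-separated words."""
    grams: set[str] = set()
    for word in text.lower().split():
        padded = "  " + word + " "  # two leading spaces, one trailing
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(round(similarity("cartridge", "cartrige"), 2))  # 0.58
print(similarity("cartridge", "printer"))             # 0.0
```

The misspelling shares 7 of 12 distinct trigrams with the correct word, comfortably above pg_trgm's default 0.3 threshold for the `%` operator, while 'printer' shares none.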

Q: How does full-text search handle multi-language content?

MySQL has no per-language stemming: its built-in parser splits text on whitespace and punctuation whatever the language, character set and collation only affect tokenisation and case sensitivity, and Chinese, Japanese or Korean text needs the ngram (or MeCab) parser. PostgreSQL lets you create custom text search configurations per language and choose one per query. For mixed-language tables, store each language in its own column with its own configuration; a single shared configuration handles all of them poorly, and an external search engine with per-language analysers is often the better fit.

Q: Is full-text search better than Elasticsearch?

It depends. For simple text search on a single table with moderate traffic (as a rough guide, under 100GB or about 10M rows), SQL FTS is simpler and requires no extra infrastructure. For fuzzy matching, faceted search, rich multi-language analysis, and distributed scaling, Elasticsearch is the better tool. FTS is a feature; Elasticsearch is a search platform. A common split is SQL FTS for keyword search inside the application database and Elasticsearch for customer-facing catalogue search.

Q: How do I monitor full-text index health in production?

Set up a scheduled job (hourly or daily) that:
- Compares the table's row count with the number of distinct document IDs in the FTS index metadata. Alert if they differ by more than 5%.
- Tracks index size. Alert if it grows more than 20% in a week without matching data growth.
- Runs a query for a known term that should always return results. Alert if it returns zero rows.
- Logs slow queries: flag any MATCH/AGAINST query taking more than 2 seconds.
- For MySQL, compares `INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE` with `INNODB_FT_INDEX_TABLE` (after setting `innodb_ft_aux_table`) to see how much of the index still sits in the in-memory cache, and watches `INNODB_FT_DELETED` for deleted rows waiting to be purged.
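The checklist above reduces to a few threshold comparisons once the numbers are collected. A Python sketch of the alerting side only, assuming a hypothetical `FtsMetrics` record filled in by whatever queries your job runs (the SQL that gathers the numbers is out of scope here):

```python
from dataclasses import dataclass


@dataclass
class FtsMetrics:
    """Numbers a scheduled job might collect; field names are hypothetical."""
    table_rows: int
    indexed_docs: int           # distinct document IDs found in the FTS index
    table_rows_week_ago: int
    index_bytes: int
    index_bytes_week_ago: int
    canary_hits: int            # rows returned for a term that must always match
    slowest_match_seconds: float


def fts_alerts(m: FtsMetrics) -> list[str]:
    """Apply the checklist thresholds and return alert messages."""
    alerts = []
    if m.table_rows and abs(m.table_rows - m.indexed_docs) / m.table_rows > 0.05:
        alerts.append("index and table document counts differ by more than 5%")
    if m.index_bytes_week_ago:
        growth = m.index_bytes / m.index_bytes_week_ago - 1
        if growth > 0.20 and m.table_rows <= m.table_rows_week_ago:
            alerts.append("index grew more than 20% in a week without data growth")
    if m.canary_hits == 0:
        alerts.append("canary query returned no rows")
    if m.slowest_match_seconds > 2.0:
        alerts.append("a MATCH/AGAINST query took longer than 2 seconds")
    return alerts


healthy = FtsMetrics(1000, 990, 1000, 120, 110, 5, 0.3)
print(fts_alerts(healthy))  # []
```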

Q: Can I use full-text search on a database view?

Yes. A regular (non-materialised) view passes the search through to its base table. In MySQL, MATCH ... AGAINST works through the view as long as MySQL can merge the view into the query (ALGORITHM=MERGE rather than TEMPTABLE) and the base-table columns carry a FULLTEXT index; you cannot create a FULLTEXT index on the view itself, and MySQL has no materialised views. In PostgreSQL, full-text operators work on any text expression, so they run against views too; for speed, index the base table, or create a GIN index on a materialised view.

Q: How does FTS handle NULL values?

NULL values are not indexed and never match. MySQL can't index an expression such as `ADD FULLTEXT(COALESCE(description, ''))`, so if you need to transform the text before indexing, put the FULLTEXT index on a STORED generated column instead. In PostgreSQL, wrap each column in coalesce when concatenating: `to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))`. Without it, a NULL in either column makes the whole concatenation NULL and the row becomes unsearchable.

Q: Does FTS support scoring for my own custom fields?

Yes, you can combine the MATCH/AGAINST score with other numeric columns (like popularity or recency) in the ORDER BY clause. In PostgreSQL, label columns with setweight (A to D) and let ts_rank apply the weights. In MySQL, add weight multipliers in the ORDER BY: `ORDER BY MATCH(title) AGAINST('query') * 2 + MATCH(body) AGAINST('query') * 1 DESC`. With InnoDB, each MATCH column list must correspond exactly to a FULLTEXT index, so this needs separate indexes on `title` and on `body`.
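The blended ORDER BY above is plain arithmetic on a few per-row numbers. A Python sketch of one possible blend, where `title_score` and `body_score` stand in for two MATCH() values, and the log popularity boost and 30-day half-life decay are illustrative choices (hypothetical, not something either database computes for you):

```python
import math


def blended_score(title_score: float, body_score: float, popularity: int,
                  age_days: float, title_weight: float = 2.0,
                  body_weight: float = 1.0, half_life_days: float = 30.0) -> float:
    """Weighted text relevance, boosted by popularity and decayed by age."""
    text = title_weight * title_score + body_weight * body_score
    popularity_boost = 1 + math.log1p(popularity)  # 1.0 when popularity is 0
    recency = 0.5 ** (age_days / half_life_days)   # halves every half_life_days
    return text * popularity_boost * recency


print(blended_score(1.0, 1.0, popularity=0, age_days=0))   # 3.0
print(blended_score(1.0, 0.0, popularity=0, age_days=30))  # 1.0
```

The log keeps a viral product from drowning out text relevance entirely; the same expression can be written directly in SQL once the weights are settled.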

Stopwords hid 'ink' from FTS results; LIKE found all products.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Drawn from code that ran under real load.

✓ Production tested

Last updated: July 18, 2026

2,466 articles · all by Naren

Before you start ⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems


⚡Quick Answer

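Full-text search (FTS) uses an inverted index mapping tokens to row IDs. LIKE '%term%' forces a full table scan, so on a table with millions of rows it can take seconds where an FTS lookup takes milliseconds (the exact gap depends on row size and hardware).
MySQL FTS stores the index in hidden InnoDB auxiliary tables; a full rebuild via OPTIMIZE TABLE copies the table and blocks writes. PostgreSQL uses GIN (or GiST) indexes, which can be built and rebuilt with CREATE INDEX CONCURRENTLY / REINDEX CONCURRENTLY (PostgreSQL 12+) without blocking writes.
Ranking: MySQL InnoDB scores with a TF-IDF variant (no document-length normalisation). PostgreSQL's ts_rank and ts_rank_cd score term frequency and proximity, with optional length normalisation via a flag; neither database uses BM25.
Stopwords (the, a, is) are filtered by default, and many teams extend the list. 'ink' is not in the default MySQL or PostgreSQL lists, but if a customised list contains it and 'ink' is a product name, your customers see zero results. Always review the stopword list for your domain.
Stemming (PostgreSQL; MySQL does none) reduces 'running' → 'run'. Aggressive stemmers conflate words with different meanings (marketing → market). Risk: false positives.
Production trap: InnoDB indexes new rows only at commit, and deleted rows stay in the FTS index (filtered out at query time) until OPTIMIZE TABLE purges them, so the index bloats after bulk deletes. PostgreSQL GIN updates transactionally, but a large fastupdate pending list slows searches until VACUUM flushes it.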
Biggest mistake: using FTS for exact identifiers (ID-123). Tokeniser splits on hyphens, ignores short tokens. Use B-Tree for IDs.
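The ID-123 trap is easy to reproduce with a rough model of a delimiter-based tokenizer. This is a sketch, not MySQL's parser: it assumes tokens are runs of letters, digits and underscores, and `min_len=3` mirrors InnoDB's default `innodb_ft_min_token_size`.

```python
import re


def tokenize(text: str, min_len: int = 3,
             stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Rough model of a delimiter-based FTS tokenizer (not MySQL's code).

    Splits on anything that is not a letter, digit or underscore, so a
    hyphen ends a token; then lowercases and drops short tokens and stopwords.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t not in stopwords]


print(tokenize("Order ID-123"))       # ['order', '123']
print(tokenize("AI ink cartridges"))  # ['ink', 'cartridges']
```

A search for 'ID-123' therefore becomes a search for '123', matching every row that contains 123 anywhere, while 'AI' never reaches the index at all.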

✦ Definition~90s read

What is Full-Text Search in SQL?

Full-text search in SQL is a specialized indexing and querying mechanism designed for efficient searching of natural language text within database columns. Unlike LIKE '%term%' patterns, which force full table scans and can't rank results, full-text search uses an inverted index—a data structure that maps every distinct word (token) to the list of rows containing it.


This enables millisecond-scale lookups on millions of rows, relevance scoring, and linguistic features like stemming (finding 'run' from 'running') and stopword filtering (ignoring 'the', 'and'). It exists because naive pattern matching breaks at scale and can't answer 'which results are most relevant?', a requirement for search bars, product catalogs, and document retrieval in e-commerce or content management systems.

In the SQL ecosystem, full-text search sits between basic LIKE queries and external search engines like Elasticsearch or Meilisearch. It's the right choice when you need search capabilities but want to avoid operational complexity—no separate service to deploy, no data synchronization, and transactional consistency with your existing database.

However, it's not a replacement for dedicated search engines when you need faceted navigation, typo tolerance, or real-time indexing at web scale (think Google or Amazon). MySQL's implementation (InnoDB full-text) and PostgreSQL's tsvector/tsquery differ significantly: PostgreSQL offers more granular control over dictionaries, ranking functions, and custom text configurations, while MySQL's is simpler but less flexible.

Both support stopword lists, but their default lists and stemming algorithms vary—a critical detail when 'ink' disappears from search results because it's mistakenly treated as a stopword in a language configuration.

Plain-English First

Imagine you walk into a giant library with a million books and ask the librarian for every book that mentions 'dragons'. A LIKE search is like that librarian reading every single page of every book from cover to cover. Full-text search is like the librarian pulling out a pre-built card catalogue that maps every word directly to which books contain it — the answer comes back in seconds, not hours. That card catalogue is the full-text index, and building it smartly is exactly what this article is about.


Most production applications reach a point where LIKE '%search_term%' stops cutting it. You've got a product catalogue with 500,000 rows, a blog with a decade of articles, or a support ticket system with millions of entries — and users expect Google-quality search that returns relevant results in under 100 milliseconds. LIKE with a leading wildcard can't use a B-Tree index, so the database performs a full sequential scan every single time. At scale, that's a query that grinds your entire application to a halt.

Full-text search (FTS) solves this with an entirely different data structure, an inverted index, that maps individual tokens (words) back to the rows that contain them. But it does far more than fast lookups. It understands language: with stemming (PostgreSQL; MySQL's built-in parser does none) it knows that 'running' and 'runs' share the root 'run', although irregular forms like 'ran' need a dictionary or thesaurus. It knows that 'the', 'a', and 'is' are so common they're noise. It can rank results by relevance so that the most useful rows float to the top. These capabilities live inside your existing SQL database, no Elasticsearch cluster required.

By the end of this article you'll understand exactly how an inverted index is built and queried, how to set up and tune full-text search in both MySQL and PostgreSQL, how relevance ranking actually works under the hood, when FTS is the right tool versus when you should reach for a dedicated search engine, and every production gotcha that bites teams who ship FTS without reading the fine print.

Why Full-Text Search in SQL Is Not Just LIKE with a Fancy Index

Full-text search in SQL is a specialized indexing and querying mechanism that tokenizes text into words (terms), stores them in an inverted index, and supports linguistic operations like stemming, ranking, and stopword filtering. LIKE '%pattern%' can't use a B-tree index because of the leading wildcard, so it scans every row; a full-text index instead maps each term to its document locations, enabling sub-second searches over millions of rows.

At its core, full-text search builds an inverted index: for each unique word, it stores a list of document IDs and positions. Queries like CONTAINS or FREETEXT (T-SQL) or MATCH ... AGAINST (MySQL) leverage this index to find documents containing specific words or phrases, rank results by relevance using TF-IDF or BM25, and apply language-specific rules such as stemming (e.g., 'running' matches 'run') and stopword removal (common words like 'the' are ignored). This is fundamentally different from pattern matching — it's a set-based retrieval, not a scan.

Use full-text search when your application needs to search natural language text — product descriptions, articles, emails — at scale. It's the right tool when you need relevance ranking, linguistic variants, or fast search over large text columns. But it's not a replacement for a dedicated search engine like Elasticsearch when you need faceted search, fuzzy matching, or real-time indexing at massive scale. In SQL databases, full-text search fills the gap between simple LIKE queries and external search infrastructure.

⚠ Stopwords Can Silently Exclude Results

Stopword lists, especially customised ones, can include short, common words like 'ink'. When they do, a search for 'ink' returns zero results, because the word never made it into the index.

📊 Production Insight

A product catalog search for 'ink cartridges' returned zero results because the stopword list in use included 'ink' as a noise word (it is not in the MySQL or PostgreSQL defaults, so someone had added it).

The symptom: users see 'no results' for queries containing short, common product terms, while other queries work fine.

Rule of thumb: always review and customize stopword lists per domain — never trust defaults for business-critical search.

🎯 Key Takeaway

Full-text search uses an inverted index, not a B-tree — it's designed for word-level retrieval, not pattern matching.

Stopwords and stemming are language-specific and can silently exclude valid results if not configured for your domain.

Full-text search is not a general-purpose search engine — use it for ranked text search within a single database, not for distributed, real-time search at scale.

thecodeforge.io

Full Text Search Sql

How the Inverted Index Works Internally

An inverted index is the data structure behind all full-text search engines — SQL's built-in FTS included. It maps every distinct word (token) to the list of rows where that word appears, along with its position and frequency inside each document.

When you insert or update a row, the database tokenizes the text, discards stopwords, applies stemming (PostgreSQL only; MySQL doesn't stem), and adds the tokens to this mapping. The result is a structure you can query in logarithmic time: look up the token, get the row list, apply ranking, return the top N.

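MySQL stores each FULLTEXT index in hidden auxiliary InnoDB tables: six partitioned index tables plus common tables that track deleted document IDs and index configuration. Newly inserted tokens sit in an in-memory index cache until it is flushed. You can't query the auxiliary tables directly, but after setting innodb_ft_aux_table you can inspect them through INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE and INNODB_FT_INDEX_CACHE.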

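PostgreSQL indexes a tsvector with GIN or GiST. GIN stores each lexeme (a normalised, usually stemmed token) with its posting list and is faster for lookups (read-heavy workloads). GiST stores lossy signatures, so matches must be rechecked against the table, but it is cheaper to update (write-heavy workloads). You can query the tsvector column directly for debugging.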

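How a query is executed: when you run MATCH(col) AGAINST('search') in MySQL, or col @@ to_tsquery(...) in PostgreSQL, the database (1) tokenizes the query text (and stems it, in PostgreSQL), (2) looks up each token in the inverted index, (3) retrieves the posting lists (row IDs per token), (4) combines them: a union in natural language mode, where any word may match, or an intersection for required terms, (5) computes a relevance score for each candidate row (MySQL's TF-IDF variant, or PostgreSQL's ts_rank when you ask for it), and (6) sorts by score and returns the top N.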
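The six steps above map onto a few lines of Python. This is a toy sketch, assuming punctuation-based tokenisation, a three-word stopword list, no stemming, and raw term frequency as the score; real engines add IDF weighting, compression and on-disk storage.

```python
import re
from collections import defaultdict

STOPWORDS = frozenset({"the", "a", "over"})  # tiny illustrative list


def words(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """token -> {doc_id: [positions]}; stopwords still consume a position."""
    index: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(words(text), start=1):
            if token not in STOPWORDS:
                index[token].setdefault(doc_id, []).append(pos)
    return dict(index)


def search(index: dict[str, dict[int, list[int]]], query: str) -> list[int]:
    """Union the posting lists (any query word may match), score each doc
    by total term frequency, and return doc IDs best-first."""
    scores: dict[int, int] = defaultdict(int)
    for token in words(query):
        if token in STOPWORDS:
            continue
        for doc_id, positions in index.get(token, {}).items():
            scores[doc_id] += len(positions)
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    1: "The quick brown fox jumps over the lazy dog.",
    2: "A lazy afternoon",
    3: "Fox news about a fox",
}
index = build_index(docs)
print(index["fox"])               # {1: [4], 3: [1, 5]}
print(search(index, "lazy fox"))  # [1, 3, 2]
```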

fts_internals_demo.sqlSQL

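-- MySQL: InnoDB FTS keeps the index in hidden auxiliary tables.
-- Point innodb_ft_aux_table at a table to inspect them (read-only, needs PROCESS)
SET GLOBAL innodb_ft_aux_table = 'io_thecodeforge/articles';
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;  -- tokens flushed to disk
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;  -- tokens still in memory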

-- PostgreSQL: tsvector column stores lexemes
CREATE TABLE io_thecodeforge.documents (
    id SERIAL PRIMARY KEY,
    body TEXT,
    search_vector TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
);

-- Insert and the tsvector is auto-generated
INSERT INTO io_thecodeforge.documents (body) VALUES ('The quick brown fox jumps over the lazy dog.');

-- Query the tsvector directly to see normalized terms
SELECT id, body, search_vector FROM io_thecodeforge.documents;

-- Output shows:
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
-- Stopwords ('the', 'over') are omitted but still count toward positions.
-- Stemming: 'jumps' → 'jump', 'lazy' → 'lazi'

Output

 id |                     body                     |                     search_vector
----+----------------------------------------------+-------------------------------------------------------
  1 | The quick brown fox jumps over the lazy dog. | 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

Mental Model

Mental Model: Dictionary with Page Numbers

Think of an inverted index as the index at the back of a textbook — every key term points to every page where it appears.

Token = word in the book's index
Posting list = list of page (row) numbers where that word occurs
Position = where on the page the word appears, used for phrase search
Frequency = how many times the word appears on that page, used for relevance ranking
The index is pre-built — queries don't scan rows, they consult the index.
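The Position entry in the mental model is what makes phrase search possible without rereading rows. A Python sketch, assuming hand-written posting lists in the shape token → {row: [positions]}; `gap=1` corresponds to adjacency (what PostgreSQL's `<->` expresses) and larger gaps to `<N>`:

```python
Postings = dict[str, dict[int, list[int]]]  # token -> {doc_id: [positions]}


def phrase_docs(postings: Postings, phrase: list[str], gap: int = 1) -> set[int]:
    """Docs where each word sits exactly `gap` positions after the previous
    one (gap=1 means adjacent)."""
    if not phrase:
        return set()
    # A candidate must contain every word: intersect the posting lists first.
    candidates = set(postings.get(phrase[0], {}))
    for word in phrase[1:]:
        candidates &= set(postings.get(word, {}))
    hits = set()
    for doc in candidates:
        for start in postings[phrase[0]][doc]:
            if all(start + gap * i in postings[w][doc]
                   for i, w in enumerate(phrase[1:], start=1)):
                hits.add(doc)
                break
    return hits


postings: Postings = {
    "next":     {1: [1], 2: [4]},
    "day":      {1: [2], 2: [1]},
    "delivery": {1: [3], 2: [2]},
}
print(phrase_docs(postings, ["next", "day", "delivery"]))  # {1}
print(phrase_docs(postings, ["day", "delivery"]))          # {1, 2}
```

Row 2 contains all three words, but in the wrong order, so only the positional check rules it out.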

📊 Production Insight

InnoDB FTS stores the index in auxiliary tables managed internally; you can't query them directly, only through the INFORMATION_SCHEMA.INNODB_FT_* views.

PostgreSQL's GIN index is transactional, so concurrent reads see a consistent snapshot during updates.

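Failure: a server crash during a bulk insert loses the unflushed part of the FTS index cache, which InnoDB rebuilds from committed rows on restart, so a large innodb_ft_cache_size can lengthen recovery. If results still look wrong afterwards, run CHECK TABLE and rebuild the FULLTEXT index.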

Rule: For write-heavy workloads, PostgreSQL GiST has lower overhead than GIN but slower reads.

🎯 Key Takeaway

An inverted index is a reverse mapping: from words to rows, not rows to words.

It's the reason FTS queries return in milliseconds instead of seconds.

The choice of index implementation (InnoDB FTS vs GIN vs GiST) directly impacts write vs read performance.

Choose your FTS implementation based on workload

If: Read-heavy, large text columns, infrequent writes

→

Use: PostgreSQL GIN index. Best query speed, slower inserts; acceptable for read-mostly tables

If: Write-heavy, many updates, small documents

→

Use: MySQL InnoDB FTS. Better insert performance, acceptable query speed, transactional consistency

If: Need phrase search or fuzzy matching

→

Use: Both support phrase search (MySQL with "..." in BOOLEAN MODE, PostgreSQL with the more flexible <-> tsquery operator). Neither does fuzzy matching natively; add pg_trgm or a dedicated search engine

If: Zero-downtime index rebuilds required

→

Use: PostgreSQL with CONCURRENTLY rebuilds. MySQL's OPTIMIZE TABLE blocks writes.

MATCH/AGAINST Syntax and Relevance Ranking

In MySQL the syntax is MATCH(columns) AGAINST('query' IN NATURAL LANGUAGE MODE). Each row gets a floating-point relevance score; higher means more relevant. When MATCH() appears in the WHERE clause, MySQL returns rows sorted by that score, but add an explicit ORDER BY if you depend on the order.

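Relevance ranking isn't magic. MySQL InnoDB computes a TF-IDF variant. PostgreSQL's ts_rank and ts_rank_cd weigh how often (and, for ts_rank_cd, how close together) the query terms appear in each document, without corpus-wide statistics. Two textbook models explain the trade-offs: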

TF-IDF (Term Frequency-Inverse Document Frequency) rewards terms that appear frequently in a document but rarely across the whole corpus. It doesn't normalise for document length, so long documents (with more words) tend to score higher simply because they contain more text.
BM25 (Best Matching 25) adds document length normalisation: a match in a short document counts for more than the same match in a long one. It also saturates term frequency, so 100 occurrences of a word don't give 100x the score of 1 occurrence.
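The two models above can be compared directly. A Python sketch of the textbook formulas (natural-log IDF for TF-IDF; Okapi BM25 with the common k1=1.2, b=0.75 defaults and the non-negative "+1" IDF variant). These are not the exact scoring code of MySQL or PostgreSQL, and the corpus numbers are made up for illustration.

```python
import math


def tfidf(tf: int, df: int, n_docs: int) -> float:
    """Raw term frequency times IDF; no document-length normalisation."""
    return tf * math.log(n_docs / df)


def bm25(tf: int, df: int, n_docs: int, doc_len: int, avg_len: float,
         k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 for one term, using the non-negative '+1' IDF variant."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    length_factor = 1 - b + b * doc_len / avg_len  # > 1 for longer-than-average docs
    return idf * tf * (k1 + 1) / (tf + k1 * length_factor)


# A 5-word title with one hit vs a 500-word description with three hits,
# in a 1,000-document corpus where 50 documents contain the term.
title_bm25 = bm25(tf=1, df=50, n_docs=1000, doc_len=5, avg_len=100)
desc_bm25 = bm25(tf=3, df=50, n_docs=1000, doc_len=500, avg_len=100)
print(tfidf(1, 50, 1000) < tfidf(3, 50, 1000))  # True: TF-IDF prefers the long text
print(title_bm25 > desc_bm25)                   # True: BM25 prefers the title
```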

IN BOOLEAN MODE gives you operators: +mandatory, -exclude, *wildcard, "phrase". It doesn't sort rows by relevance automatically, although InnoDB still computes a score you can select and ORDER BY. Use it when you need exact control over which terms must match.

WITH QUERY EXPANSION (MySQL) adds terms from top-ranked results to the original query. It can help find related documents but can also drift off-topic.

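Practical example: for a product catalogue with text of very different lengths (e.g., 50-character titles vs 5000-character descriptions), length normalisation gives fairer ranking. In PostgreSQL you opt into it through ts_rank's normalization argument (for example 1, which divides the rank by 1 + the logarithm of the document length). MySQL's InnoDB score has no such knob, so long descriptions that repeat a term tend to outrank a title containing the exact match.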

match_against_demo.sqlSQL

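-- MySQL: natural language mode (ranked); needs FULLTEXT(title, body)
SELECT id, title,
       MATCH(title, body) AGAINST('Docker Compose' IN NATURAL LANGUAGE MODE) AS relevance
FROM io_thecodeforge.articles
WHERE MATCH(title, body) AGAINST('Docker Compose' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC;

-- PostgreSQL: ts_rank with normalization flag 1 (divide by 1 + log(doc length))
SELECT id, title,
       ts_rank(to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')),
               plainto_tsquery('english', 'Docker Compose'), 1) AS relevance
FROM io_thecodeforge.articles
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ plainto_tsquery('english', 'Docker Compose')
ORDER BY relevance DESC;

-- Boolean mode with operators (MySQL only; in PostgreSQL the equivalent
-- tsquery is 'docker & healthcheck & !deprecated')
SELECT id, title
FROM io_thecodeforge.articles
WHERE MATCH(title, body) AGAINST('+Docker +healthcheck -deprecated' IN BOOLEAN MODE);

-- Phrase search in PostgreSQL using the <-> (followed by) operator
SELECT id, title
FROM io_thecodeforge.articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'docker <-> compose');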

Output

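In natural language mode, rows that mention the query words more often score higher, and rare words count for more than common ones. MATCH(title, body) treats both columns as one document, so a hit in the title gets no extra weight. Boolean mode returns only rows that contain every + term and no - term.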

⚠ Ranking trap: short documents lose in MySQL

TF-IDF over-weights long documents. If your catalogue mixes product names (short) with descriptions (long), a description that repeats a term often outranks the exact product name. In PostgreSQL, use setweight() and ts_rank's normalization flags; in MySQL, boost the title manually with MATCH(title) AGAINST(...) * weight, which needs its own FULLTEXT index on title.

📊 Production Insight

MySQL's relevance score depends on corpus-wide statistics (IDF is derived from total rows vs matching rows), so inserting or deleting rows changes every score and can reorder results for the same query over time.
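MySQL's manual documents the InnoDB single-word relevance as TF × IDF × IDF, with IDF = log10(total rows / rows containing the word); multi-word queries sum the per-word values. A Python sketch of that documented formula, assuming it matches your server version, shows why a row's score moves when nothing about the row changes:

```python
import math


def innodb_rank(tf: int, total_records: int, matching_records: int) -> float:
    """Single-word InnoDB relevance as documented: TF * IDF * IDF,
    where IDF = log10(total_records / matching_records)."""
    idf = math.log10(total_records / matching_records)
    return tf * idf * idf


print(round(innodb_rank(6, 8, 3), 4))    # 1.0887: 6 hits, 3 of 8 rows match
print(round(innodb_rank(6, 100, 3), 4))  # about 13.91 after 92 unrelated rows arrive
```

A word present in every row gets IDF 0 and contributes nothing to the score at all.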

PostgreSQL's ts_rank looks only at the document and the query, so a row's score doesn't change when other rows are added; setweight() labels (A to D) plus ts_rank's weights array control which column matters more.

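Failure: relying on implicit ordering. MySQL only sorts by relevance automatically when MATCH() is in the WHERE clause; if it appears only in the SELECT list, rows come back in no particular order, and implicit ordering is easy to lose as the query grows. Your users see random results. Always write ORDER BY explicitly.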

🎯 Key Takeaway

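MATCH/AGAINST in NATURAL LANGUAGE MODE scores every matching row; add ORDER BY to get ranked results reliably.

Boolean mode gives you operators for precise filtering; it doesn't sort by relevance unless you ORDER BY the score.

Ranking algorithms differ: MySQL uses a TF-IDF variant (favours long docs, depends on corpus statistics); PostgreSQL's ts_rank uses per-document term frequency with optional length normalisation.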


Stopwords, Stemming and Language Support

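Stopwords are common words (the, a, in, of) that don't carry search meaning. FTS engines discard them by default, but the default lists are English-centric and may not fit your domain. 'ink' is not in either database's defaults: InnoDB's 36-word default list is visible in INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD, and PostgreSQL's english configuration drops words like 'the', 'a', 'and', 'of', 'to', 'in' and 'for'. The danger is a customised list: if 'ink' is a valid product name and someone adds it to your stopword list, you lose every 'ink' result. The same goes for technical terms like 'set', 'key', 'host' and 'port', which aren't in the defaults but are easy to add by accident. Token-length limits can hide 'ink' too: InnoDB indexes tokens of 3+ characters by default, but MyISAM's ft_min_word_len defaults to 4, so a three-letter word is never indexed there.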
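The 'ink' failure, and why LIKE still "worked", fits in a few lines. A Python sketch with assumptions stated plainly: the "default" list is a tiny illustrative subset rather than either database's real list, the custom list is a hypothetical cleanup, and the search requires every query term (like + operators in boolean mode) rather than MySQL's any-term natural-language behaviour.

```python
import re

DEFAULT_STOPWORDS = frozenset({"the", "a", "an", "of", "to", "in", "for", "and"})
# Hypothetical "cleanup" that added domain words to the list:
CUSTOM_STOPWORDS = DEFAULT_STOPWORDS | {"ink", "set"}


def terms(text: str, stopwords: frozenset[str]) -> set[str]:
    return {t for t in re.findall(r"\w+", text.lower()) if t not in stopwords}


def fts_search(products: list[str], query: str, stopwords: frozenset[str]) -> list[str]:
    """Require every non-stopword query term; a query made only of
    stopwords has nothing left to look up and returns nothing."""
    wanted = terms(query, stopwords)
    if not wanted:
        return []
    return [p for p in products if wanted <= terms(p, stopwords)]


def like_search(products: list[str], needle: str) -> list[str]:
    """Case-insensitive LIKE '%needle%': substring match, no index."""
    return [p for p in products if needle.lower() in p.lower()]


products = ["Black ink cartridge", "Ink refill kit", "Pink notebook"]
print(fts_search(products, "ink", DEFAULT_STOPWORDS))  # ['Black ink cartridge', 'Ink refill kit']
print(fts_search(products, "ink", CUSTOM_STOPWORDS))   # []
print(like_search(products, "ink"))                    # all three, including 'Pink notebook'
```

LIKE finds the ink products because it never consults a stopword list, and for the same reason it also returns 'Pink notebook'.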

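Stemming reduces words to a root form: 'running' becomes 'run', 'markets' becomes 'market'. It is suffix-based, so irregular forms ('ran', or 'better' vs 'good') are not unified; that takes a lemmatiser or thesaurus. MySQL's built-in parser does no stemming at all. In PostgreSQL, stemming comes from the text search configuration: english passes words through the english_stem (Snowball) dictionary, while simple only lowercases. You can create custom dictionaries for domain-specific terms.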

Key trade-off: aggressive stemming conflates unrelated words (e.g., 'marketing' and 'market' become 'market') causing false positives. Too little stemming and you miss variants.
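The conflation trade-off above shows up even in a crude stemmer. The function below is a toy suffix stripper, far simpler than the Snowball stemmer behind PostgreSQL's english_stem, written only to show why 'marketing' lands on 'market' and why 'ran' is missed:

```python
def naive_stem(word: str) -> str:
    """Toy suffix stripper: far cruder than Snowball, for illustration only."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            stem = w[: -len(suffix)]
            # Undo consonant doubling after -ing/-ed: "runn" -> "run".
            if (suffix in ("ing", "ed") and stem[-1] == stem[-2]
                    and stem[-1] not in "aeioulsz"):
                stem = stem[:-1]
            return stem
    return w


for w in ["running", "runs", "ran", "marketing", "markets"]:
    print(w, "->", naive_stem(w))
# running -> run, runs -> run, ran -> ran, marketing -> market, markets -> market
```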

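Language-specific considerations: in MySQL, character sets and collations affect tokenisation and case sensitivity. For Chinese, Japanese and Korean, which don't separate words with spaces, MySQL's ngram parser splits text into n-character tokens (ngram_token_size, default 2), and a MeCab parser plugin handles Japanese. PostgreSQL's built-in configurations cover many languages through Snowball stemmers but ship no CJK word segmenter; that needs an extension such as pg_bigm or zhparser, or trigram matching with pg_trgm.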
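MySQL's manual describes the ngram parser as cutting text into overlapping n-character tokens, with spaces removed, so 'ab cd' yields 'ab' and 'cd' while 'a bc' yields only 'bc'. A Python sketch of that documented behaviour (not the parser's code):

```python
def ngrams(text: str, n: int = 2) -> list[str]:
    """Overlapping n-character tokens per whitespace-separated chunk;
    chunks shorter than n produce nothing and spaces are never included."""
    tokens: list[str] = []
    for chunk in text.split():
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


print(ngrams("全文検索"))  # ['全文', '文検', '検索']
print(ngrams("a bc"))      # ['bc']
```

A two-character query such as 検索 then matches directly, with no dictionary or word segmentation involved.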

Custom dictionaries in PostgreSQL: You can chain dictionaries (e.g., a custom thesaurus that maps 'JS' to 'JavaScript', then a snowball stemmer). This is far more flexible than MySQL's global setting.
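PostgreSQL's documentation describes the chaining rule: each token is offered to the configured dictionaries in order; a dictionary returns lexemes (recognised), an empty result (a recognised stopword), or NULL (not recognised, so the next dictionary gets a turn), and a token no dictionary recognises is discarded. A Python sketch of that rule, with a hypothetical one-entry synonym map standing in for a thesaurus:

```python
from typing import Callable, Optional

# A dictionary returns lexemes, [] for a recognised stopword, or None
# when it doesn't recognise the token (so the next dictionary gets a turn).
Dictionary = Callable[[str], Optional[list[str]]]


def synonym_dict(mapping: dict[str, str]) -> Dictionary:
    def lookup(token: str) -> Optional[list[str]]:
        return [mapping[token]] if token in mapping else None
    return lookup


def simple_dict(stopwords: set[str]) -> Dictionary:
    """Like a 'simple' dictionary with a stopword list: recognises everything."""
    def lookup(token: str) -> Optional[list[str]]:
        return [] if token in stopwords else [token]
    return lookup


def lexize(tokens: list[str], chain: list[Dictionary]) -> list[str]:
    lexemes: list[str] = []
    for token in tokens:
        for dictionary in chain:
            result = dictionary(token)
            if result is not None:  # first dictionary that recognises it wins
                lexemes.extend(result)
                break
        # a token no dictionary recognised is silently dropped
    return lexemes


chain = [synonym_dict({"js": "javascript"}), simple_dict({"the"})]
print(lexize("the js framework".split(), chain))  # ['javascript', 'framework']
```

Order matters: putting the synonym map first lets it rewrite 'js' before the catch-all dictionary claims the token unchanged.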

custom_stopword.sqlSQL

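-- MySQL: use a custom stopword table (InnoDB, one VARCHAR column named `value`)
CREATE TABLE mydb.my_stopwords (value VARCHAR(30)) ENGINE = InnoDB;
INSERT INTO mydb.my_stopwords VALUES ('the'), ('a'), ('of'), ('our'), ('your');

-- The table must exist before you point the server at it
SET GLOBAL innodb_ft_server_stopword_table = 'mydb/my_stopwords';

-- Stopword changes apply only when the FULLTEXT index is rebuilt
-- (ft_idx is a placeholder for your index name)
ALTER TABLE mydb.articles DROP INDEX ft_idx;
ALTER TABLE mydb.articles ADD FULLTEXT INDEX ft_idx (title, body);

-- PostgreSQL: create custom text search configuration
CREATE TEXT SEARCH CONFIGURATION io_thecodeforge.product_search (COPY = english);

-- Map word tokens to 'simple' (lowercase only: no stemming, no stopwords).
-- Plain English words are 'asciiword' tokens, so mapping only 'word' is not enough.
ALTER TEXT SEARCH CONFIGURATION io_thecodeforge.product_search
    ALTER MAPPING FOR asciiword, word, asciihword, hword, hword_asciipart, hword_part
    WITH simple;

-- Add a thesaurus dictionary; reads $SHAREDIR/tsearch_data/product_synonyms.ths
CREATE TEXT SEARCH DICTIONARY io_thecodeforge.my_thesaurus (
    TEMPLATE = thesaurus,
    DictFile = 'product_synonyms',
    Dictionary = pg_catalog.simple
);

-- Try the thesaurus first, then fall back to simple
ALTER TEXT SEARCH CONFIGURATION io_thecodeforge.product_search
    ALTER MAPPING FOR asciiword, word WITH io_thecodeforge.my_thesaurus, simple;

-- Then use it in queries
SELECT to_tsvector('io_thecodeforge.product_search', 'JS framework');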

🔥Stopword surprise: technical terms are often stopwords by accident

Words like 'set', 'key', 'host', 'port', 'null' are common in technical documentation but may be stopwords in some languages or custom lists. Always export and review your stopword list against actual product names and user queries.

📊 Production Insight

Default stopword lists are compiled for general English; they can hurt domain-specific search.

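PostgreSQL lets you choose a text search configuration per query. MySQL's stopword list is set globally (innodb_ft_server_stopword_table) or per session (innodb_ft_user_stopword_table), token-size limits are read-only and need a restart, and every change applies only after the FULLTEXT index is rebuilt.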

Failure: a legal search system that dropped 'contract' because it matched a stopword in the custom list — double-check every word you blacklist.

Rule: For domain-specific catalogues, start with no stopwords (empty list) and only add those proven to cause false positives.

🎯 Key Takeaway

Stopwords and stemming are language-specific — never trust the default for your domain.

Test your FTS with a sample of real queries to catch false positives.

PostgreSQL's per-query configuration gives more control; MySQL's global setting is simpler but less flexible.

Performance Tuning and Index Maintenance

Full-text indexes grow with your data and can become bloated, fragmented, or stale.

MySQL InnoDB FTS tuning

innodb_ft_min_token_size (default 3): ignore words shorter than this. Lower it to index acronyms ('AI', 'UX'), at the cost of a 20-30% larger index. It is read-only at runtime: set it in my.cnf, restart, then rebuild the FULLTEXT index.
innodb_ft_max_token_size (default 84): ignore words longer than this. Increase for long product names.
innodb_ft_cache_size (default 8MB): size of the in-memory cache for tokenised data. Larger values improve insert performance for bulk operations.
Index maintenance: OPTIMIZE TABLE rebuilds the FTS index but LOCKS THE TABLE for writes. Schedule during low traffic.

PostgreSQL GIN/GiST tuning

maintenance_work_mem: memory for CREATE INDEX, REINDEX and VACUUM. Higher values speed up large GIN index builds.
work_mem: per-operation memory for query-time sorts and hashes, such as ORDER BY ts_rank(...).
gin_pending_list_limit (default 4MB): for GIN indexes with fastupdate enabled, the maximum size of the pending list of not-yet-merged entries; VACUUM, or exceeding the limit, merges it into the main index.
Zero-downtime maintenance: CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY rebuild indexes without blocking writes.
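The gin_pending_list_limit trade-off is easier to see in a toy model. The class below is not PostgreSQL's code: it assumes inserts append to an unsorted pending list that every search must scan, and that a flush merges the list into the main posting lists, as VACUUM or exceeding the limit does. The limit here counts entries rather than bytes.

```python
class ToyGin:
    """Toy model of a GIN index with fastupdate (not PostgreSQL's code)."""

    def __init__(self, pending_limit: int) -> None:
        self.main: dict[str, set[int]] = {}
        self.pending: list[tuple[str, int]] = []
        self.pending_limit = pending_limit

    def insert(self, doc_id: int, lexemes: list[str]) -> None:
        self.pending.extend((lex, doc_id) for lex in lexemes)  # cheap append
        if len(self.pending) > self.pending_limit:
            self.flush()

    def flush(self) -> None:
        for lex, doc_id in self.pending:
            self.main.setdefault(lex, set()).add(doc_id)
        self.pending.clear()

    def search(self, lexeme: str) -> set[int]:
        hits = set(self.main.get(lexeme, set()))
        hits.update(d for lex, d in self.pending if lex == lexeme)  # linear scan
        return hits


gin = ToyGin(pending_limit=4)
gin.insert(1, ["ink", "cartridge"])
gin.insert(2, ["ink", "refill"])
print(gin.search("ink"), len(gin.pending))  # {1, 2} 4
gin.insert(3, ["paper"])                    # 5 entries > limit: flush
print(gin.search("ink"), len(gin.pending))  # {1, 2} 0
```

Results are correct either way; a bigger limit makes inserts cheaper and every search slower until the next flush.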

Monitoring index health

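MySQL: set innodb_ft_aux_table, then compare COUNT(DISTINCT DOC_ID) in INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE (plus INNODB_FT_INDEX_CACHE) with the base table's row count, and watch INNODB_FT_DELETED: a growing count of deleted document IDs means OPTIMIZE TABLE is overdue.
PostgreSQL: pgstatginindex() from the pgstattuple extension reports a GIN index's pending-list size, and pg_size_pretty(pg_relation_size('index_name')) tracks index size growth.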

Practical benchmarks: On a 10 million row table with 1KB average text per row, MySQL InnoDB FTS index size is about 20-30% of data size (2-3GB). PostgreSQL GIN index is larger, about 40-50% (4-5GB), but provides faster read queries.

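The staleness trap: after a bulk delete or update, InnoDB doesn't remove the old entries from the FTS index. It records the deleted document IDs and filters them out at query time, so results stay correct but the index keeps growing until OPTIMIZE TABLE purges them (with innodb_optimize_fulltext_only=ON, OPTIMIZE TABLE works on the FULLTEXT index only). In PostgreSQL, VACUUM removes dead index entries; reach for REINDEX only when the index is badly bloated. Either way, plan maintenance after any bulk operation affecting more than 20% of rows.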
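Deferred deletes explain why results stay correct while the index bloats. A toy Python model, loosely following the MySQL manual's description of the FTS deleted-document table (not InnoDB's code): deletes go into a side set that queries subtract, and `optimize()` plays the role of OPTIMIZE TABLE.

```python
class ToyInnodbFts:
    """Toy model of deferred deletes in an FTS index (not InnoDB's code)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = {}
        self.deleted: set[int] = set()

    def add(self, doc_id: int, words: list[str]) -> None:
        for w in words:
            self.postings.setdefault(w, set()).add(doc_id)

    def delete(self, doc_id: int) -> None:
        self.deleted.add(doc_id)  # index entries stay where they are

    def search(self, word: str) -> set[int]:
        return self.postings.get(word, set()) - self.deleted

    def optimize(self) -> None:
        for ids in self.postings.values():
            ids -= self.deleted  # purge deleted IDs in place
        self.deleted.clear()

    def entry_count(self) -> int:
        return sum(len(ids) for ids in self.postings.values())


fts = ToyInnodbFts()
for doc_id in (1, 2, 3):
    fts.add(doc_id, ["ink", "cartridge"])
fts.delete(2)
fts.delete(3)
print(fts.search("ink"), fts.entry_count())  # {1} 6: correct results, bloated index
fts.optimize()
print(fts.search("ink"), fts.entry_count())  # {1} 2
```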

fts_tuning.sqlSQL

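-- MySQL: inspect the FTS index (needs PROCESS; innodb_ft_aux_table is global)
SHOW INDEX FROM io_thecodeforge.articles;
SET GLOBAL innodb_ft_aux_table = 'io_thecodeforge/articles';
SELECT COUNT(DISTINCT DOC_ID) AS indexed_docs
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;
SELECT COUNT(*) AS deleted_docs FROM INFORMATION_SCHEMA.INNODB_FT_DELETED;

-- A full OPTIMIZE TABLE rebuilds the table and blocks writes; schedule it.
-- innodb_optimize_fulltext_only limits the work to FULLTEXT indexes.
SET GLOBAL innodb_optimize_fulltext_only = ON;
OPTIMIZE TABLE io_thecodeforge.articles;
SET GLOBAL innodb_optimize_fulltext_only = OFF;

-- Min token length is read-only at runtime: set it in my.cnf, restart,
-- then rebuild the FULLTEXT index
--   [mysqld]
--   innodb_ft_min_token_size = 2

-- PostgreSQL: give index builds more memory first (session setting)
SET maintenance_work_mem = '1GB';

-- Create the index CONCURRENTLY (no write blocking; not inside a transaction)
CREATE INDEX CONCURRENTLY idx_fts_articles ON io_thecodeforge.articles USING GIN (to_tsvector('english', body));

-- Monitor index size (the index lives in the table's schema)
SELECT pg_size_pretty(pg_relation_size('io_thecodeforge.idx_fts_articles'));

-- Rebuild online without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY io_thecodeforge.idx_fts_articles;

-- Query-time sorts (e.g. ORDER BY ts_rank) use work_mem
SET work_mem = '64MB';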

💡Use concurrent rebuilds to avoid downtime

PostgreSQL's CONCURRENTLY option lets you rebuild indexes without blocking writes. In MySQL, schedule OPTIMIZE TABLE during maintenance windows and set innodb_ft_cache_size high enough to avoid excessive disk flushes.

📊 Production Insight

In MySQL, OPTIMIZE TABLE on a table with a fulltext index rebuilds that index completely — on a 100GB table this can take hours.

PostgreSQL's CONCURRENTLY rebuild is slower overall but doesn't block production traffic — trade throughput for availability.

Failure: a team that ran OPTIMIZE TABLE on a 200GB table during peak hours, causing a 45-minute write outage. The marketing team couldn't add new products.

Rule: For tables >10GB, always test rebuild times on a staging replica before production. Schedule rebuilds during the lowest traffic window (e.g., Sunday 3 AM).

🎯 Key Takeaway

Index rebuilds are expensive — always do them during low traffic or use CONCURRENTLY.

Monitor index size weekly; a sudden growth spike may indicate tokenisation issues.

Test your rebuild strategy with a production-data clone before the first real run.

MySQL vs PostgreSQL: Implementation Differences That Matter

While both MySQL and PostgreSQL offer full-text search, their internals differ significantly. Choosing the wrong one can lead to performance pain or missing features.

Index storage: MySQL stores the FTS index in hidden auxiliary tables managed by InnoDB (exposed for debugging through INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE and INNODB_FT_INDEX_CACHE). PostgreSQL stores it as a GIN or GiST index on a tsvector column or expression.

Tokenisation and stemming: MySQL's built-in parser is barely configurable and does no stemming. You can change the token-length limits (innodb_ft_min_token_size, innodb_ft_max_token_size), supply a custom stopword table, or switch to the ngram or MeCab parser (or write a parser plugin). PostgreSQL uses text search configurations that separate the parser from the dictionaries (stemmer, stopwords, synonyms, thesaurus), so you can chain dictionaries and create custom ones.

Index maintenance: MySQL requires OPTIMIZE TABLE to rebuild the FTS index (table is locked). PostgreSQL supports CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY for zero-downtime rebuilds.

Query flexibility: PostgreSQL supports ranking weights per column using setweight and can store tsvector as a generated column. MySQL's ranking is less flexible; you can use multiple MATCH clauses with custom multipliers but it's less efficient.

Performance characteristics

MySQL InnoDB FTS: Optimised for transactional workloads with moderate write rates. Reads are fast but not as fast as PostgreSQL GIN.
PostgreSQL GIN: faster reads (10-20%) but slower writes (30-50% overhead, since each insert must add an entry for every distinct lexeme; fastupdate defers part of that work). GiST is slower for reads but faster for writes.

Debugging: MySQL's FTS debugging requires querying INFORMATION_SCHEMA tables. PostgreSQL allows direct querying of the tsvector column, making debugging trivial.

Production recommendation: use PostgreSQL if you need custom stemming, zero-downtime rebuilds, or length-normalised ranking (ts_rank normalization flags). Use MySQL if you're already in a MySQL environment and have simple text search requirements with moderate write rates.

mysql_vs_postgres_fts.sqlSQL

-- MySQL: Creating an FTS index
ALTER TABLE io_thecodeforge.articles ADD FULLTEXT(title, body);

-- MySQL: checking FTS contents (auxiliary tables via INFORMATION_SCHEMA)
SET GLOBAL innodb_ft_aux_table = 'io_thecodeforge/articles';
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;

-- MySQL: Rebuild FTS index (blocks writes)
OPTIMIZE TABLE io_thecodeforge.articles;

-- PostgreSQL: Creating GIN index on generated tsvector column
ALTER TABLE io_thecodeforge.articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
CREATE INDEX idx_articles_fts ON io_thecodeforge.articles USING GIN(search_vector);

-- PostgreSQL: Query the tsvector directly for debugging
SELECT id, search_vector FROM io_thecodeforge.articles;

-- PostgreSQL: Rebuild without blocking
REINDEX INDEX CONCURRENTLY idx_articles_fts;

⚠ Parser differences matter

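MySQL's built-in parser splits a URL on punctuation into separate words ('https', 'example', 'com', ...; 'com' is even an InnoDB default stopword). PostgreSQL's default parser recognises URLs and host names as their own token types and can keep a host such as example.com as a single lexeme. Always test with your actual data to catch tokenisation surprises before production.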

📊 Production Insight

In MySQL, the FTS index is not directly queryable — debugging requires INFORMATION_SCHEMA tables.

PostgreSQL allows direct querying of the tsvector column, making debugging trivial.

Failure: a team that relied on MySQL's FTS for a real-time search feature discovered the index could not be rebuilt online — they had to redesign for PostgreSQL mid-project.

🎯 Key Takeaway

MySQL FTS is simpler but less flexible; PostgreSQL FTS offers more control and zero-downtime maintenance.

Choose based on operational needs, not just syntax familiarity.

Always test index rebuild procedures before going to production.

Choose your FTS implementation based on operational needs

If: Need zero-downtime index rebuilds

→

Use: PostgreSQL with CONCURRENTLY rebuilds. MySQL's OPTIMIZE TABLE locks the table.

If: Simple text search, infrequent writes, minimal config

→

Use: MySQL InnoDB FTS, which is simpler to set up and has lower operational overhead.

If: Custom stemming per language or domain

→

Use: PostgreSQL with custom text search configurations (thesaurus, synonym dictionaries).

If: Read-heavy workload with large text columns (product catalogue, blog search)

→

Use: PostgreSQL GIN index, which offers 10-20% faster reads than MySQL FTS for large datasets.

Full-Text Search Queries: How They Actually Work at the Protocol Level

You've seen MATCH/AGAINST. Fine. But what happens when you type a query like '+smartphone -iphone warranty' in BOOLEAN MODE? The query parser breaks that string into tokens at the same word boundaries the indexer used. It then traverses the inverted index, looking for documents that contain 'smartphone' and excluding those that also contain 'iphone'. 'warranty' carries no operator, so it is optional: it does not filter rows, it only raises the relevance of rows that contain it. The Boolean operators are evaluated as set operations on document ID lists.
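Here is a toy model in plain SQL that makes the set-operation idea concrete. The postings table is hypothetical: it holds one row per token and document, with no positions or scores. It is not how MySQL actually stores its index. EXCEPT needs PostgreSQL, SQLite or MySQL 8.0.31+.

```sql
-- Toy inverted index: one row per (token, document); hypothetical, for illustration
CREATE TABLE postings (
    token  VARCHAR(50) NOT NULL,
    doc_id INT         NOT NULL,
    PRIMARY KEY (token, doc_id)
);

INSERT INTO postings (token, doc_id) VALUES
    ('smartphone', 1), ('warranty', 1),
    ('smartphone', 2), ('iphone', 2),
    ('smartphone', 3),
    ('warranty', 4);

-- '+smartphone -iphone warranty':
-- 1. required term's document list EXCEPT the excluded term's list
-- 2. the optional term only adds to the score of rows that survived step 1
SELECT r.doc_id,
       COUNT(w.doc_id) AS optional_hits
FROM (
    SELECT doc_id FROM postings WHERE token = 'smartphone'
    EXCEPT
    SELECT doc_id FROM postings WHERE token = 'iphone'
) AS r
LEFT JOIN postings AS w
       ON w.doc_id = r.doc_id AND w.token = 'warranty'
GROUP BY r.doc_id
ORDER BY optional_hits DESC, r.doc_id;
-- doc_id 1 (optional_hits 1), then doc_id 3 (optional_hits 0)
```

Document 4 contains 'warranty' but never appears. Once a + term is present, the optional term cannot add rows on its own.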

Here's where juniors get burned: query expansion and proximity searches. MySQL's WITH QUERY EXPANSION performs a two-phase search. The first phase returns results; the second re-runs the query with words taken from the most relevant documents found by the first. Sounds clever until you realise it can return garbage if your initial query matched irrelevant documents. Never use it on production user-facing search without manual relevance testing.
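One way to do that manual testing in MySQL is to diff the two modes and read only the rows that expansion added. This sketch assumes the io_thecodeforge.articles table with FULLTEXT(title, body) from the earlier examples. The search phrase is just an example.

```sql
-- MySQL: rows returned only because of blind query expansion.
-- MATCH() must list exactly the columns of an existing FULLTEXT index.
SELECT id, title
FROM io_thecodeforge.articles
WHERE MATCH(title, body) AGAINST('laptop battery life' WITH QUERY EXPANSION)
  AND id NOT IN (
      SELECT id
      FROM io_thecodeforge.articles
      WHERE MATCH(title, body) AGAINST('laptop battery life' IN NATURAL LANGUAGE MODE)
  );
```

If this list is mostly noise, leave expansion off for that query type.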

SQL Server's CONTAINSTABLE and FREETEXTTABLE return ranked results with internal scoring you can inspect. The difference? FREETEXT does linguistic stemming and thesaurus expansion automatically. CONTAINS gives you exact control over prefixes, proximity, and inflectional forms. Pick the right weapon.
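To see the scoring, join the table-valued forms back to the base table on their KEY column. The sketch assumes a full-text index on Production.Product(Name), the same assumption the proximity example makes. The search terms are illustrative.

```sql
-- SQL Server: CONTAINSTABLE gives exact control (here a prefix term OR a word)
SELECT p.ProductID, p.Name, ft.[RANK]
FROM Production.Product AS p
JOIN CONTAINSTABLE(Production.Product, Name, '"smart*" OR phone') AS ft
  ON p.ProductID = ft.[KEY]
ORDER BY ft.[RANK] DESC;

-- SQL Server: FREETEXTTABLE applies stemming and the thesaurus for you;
-- the 4th argument (top_n_by_rank) keeps only the 10 best-ranked matches
SELECT p.ProductID, p.Name, ft.[RANK]
FROM Production.Product AS p
JOIN FREETEXTTABLE(Production.Product, Name, 'running shoes', 10) AS ft
  ON p.ProductID = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```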

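ProximitySearchKills.sql

-- io.thecodeforge: database tutorial

-- MySQL: exact phrase match (BOOLEAN MODE)
SELECT product_id, product_name
FROM products
WHERE MATCH(product_name, description)
AGAINST('"next day delivery"' IN BOOLEAN MODE);

-- MySQL (InnoDB only): both words must start within 5 words of each other
SELECT product_id, product_name
FROM products
WHERE MATCH(product_name, description)
AGAINST('"smartphone iphone" @5' IN BOOLEAN MODE);

-- SQL Server: proximity within 5 words (custom NEAR syntax, SQL Server 2012+)
SELECT p.ProductID, p.Name
FROM Production.Product p
WHERE CONTAINS(p.Name, 'NEAR((smartphone, iphone), 5)', LANGUAGE 'English');

-- PostgreSQL: using tsquery operators
SELECT id, title
FROM documents
WHERE to_tsvector('english', content) @@
      to_tsquery('english', 'smartphone <-> iphone');
-- <-> means 'immediately followed by' (same as <1>);
-- <N> means followed at exactly N positions, so <2> allows one word in between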

Output (first query)

product_id | product_name
-----------+------------------------------------
1001       | Smartphone XYZ next day delivery
2045       | Guaranteed next day delivery phones
(2 rows affected)

⚠ Production Trap: Query Expansion Bleed

WITH QUERY EXPANSION can pull in documents that merely share common terms with your top results, rather than semantically related ones. Your 'laptop battery life' query might return 'life insurance' documents. Test on a sample before enabling.

🎯 Key Takeaway

BOOLEAN MODE for AND/OR/NOT precision; NATURAL LANGUAGE MODE for ranked relevance; WITH QUERY EXPANSION only after manual testing — it's a shotgun, not a scalpel.

The Full-Text Index Architecture Nobody Draws on the Whiteboard

The inverted index is the heart, but the rest of the body matters too. Conceptually, a full-text index has three structural layers: the token list, the postings list, and the position list.

- The token list maps each unique word to an entry, often a compact integer ID.
- The postings list maps each token to the IDs of the documents where it appears.
- The position list tracks word offsets within each document, which is what makes phrase and proximity searches possible.

Each engine lays these layers out in its own way.
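The three layers can be modelled with two small tables in plain SQL. This is a hypothetical teaching model, not the on-disk format of any engine:

- tokens is the token list.
- positions holds the postings together with their word offsets, which is all a phrase query needs.

Only the tokens the queries use are loaded.

```sql
-- Token list: each unique word gets an integer ID (hypothetical model)
CREATE TABLE tokens (
    token_id INT PRIMARY KEY,
    token    VARCHAR(50) NOT NULL UNIQUE
);

-- Postings with positions: which document holds the token, and at which word offset
CREATE TABLE positions (
    token_id INT NOT NULL REFERENCES tokens (token_id),
    doc_id   INT NOT NULL,
    pos      INT NOT NULL,
    PRIMARY KEY (token_id, doc_id, pos)
);

INSERT INTO tokens (token_id, token) VALUES (1, 'next'), (2, 'day'), (3, 'delivery');

-- doc 10: "next day delivery"
-- doc 20: "delivery next week, day rate" (week/rate not loaded)
INSERT INTO positions (token_id, doc_id, pos) VALUES
    (1, 10, 1), (2, 10, 2), (3, 10, 3),
    (3, 20, 1), (1, 20, 2), (2, 20, 4);

-- Postings list lookup: documents containing 'day' -> 10 and 20
SELECT DISTINCT p.doc_id
FROM positions AS p
JOIN tokens AS t ON t.token_id = p.token_id
WHERE t.token = 'day';

-- Phrase 'next day': 'day' must sit exactly one position after 'next' -> only 10
SELECT DISTINCT a.doc_id
FROM positions AS a
JOIN tokens AS ta ON ta.token_id = a.token_id AND ta.token = 'next'
JOIN positions AS b ON b.doc_id = a.doc_id AND b.pos = a.pos + 1
JOIN tokens AS tb ON tb.token_id = b.token_id AND tb.token = 'day';
```

Without the pos column, both documents would look identical to the phrase query. That is why engines pay the extra storage for positions.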

SQL Server stores these as internal tables associated with a full-text catalog. The catalog is a virtual container that doesn't map to a single file: indexes in one catalog can live in different filegroups, and each index has its own population history. When you rebuild, SQL Server drops the internal structures and re-parses every row through the filter daemon host and word breakers. That's why a rebuild on a 10-million-row table can take hours, not seconds.

PostgreSQL stores the tsvector directly in a table column (or computes it inside an expression index); there is no separate catalog. Updates are row-level: when you change a row, PostgreSQL re-parses only that row through a trigger or generated column. The trade-off is that this work happens synchronously inside every INSERT and UPDATE, instead of being deferred to a background crawl like SQL Server's change-tracking population. You pay for every UPDATE with CPU to re-lex the text. Know your write pattern before choosing.

IndexStructureDiagnostic.sql

-- io.thecodeforge: database tutorial

-- SQL Server: inspect full-text catalogs and their indexes
-- (a full-text index has no name of its own: there is at most one per table)
SELECT c.name AS catalog_name,
       OBJECT_NAME(i.object_id) AS table_name,
       i.is_enabled,
       i.crawl_type_desc,
       OBJECTPROPERTYEX(i.object_id, 'TableFulltextItemCount') AS item_count,
       FULLTEXTCATALOGPROPERTY(c.name, 'UniqueKeyCount') AS unique_key_count
FROM sys.fulltext_catalogs AS c
JOIN sys.fulltext_indexes AS i
  ON c.fulltext_catalog_id = i.fulltext_catalog_id;

-- PostgreSQL: check tsvector size (number of distinct lexemes) per row
SELECT d.id,
       length(to_tsvector('english', d.content)) AS lexeme_count,
       d.tableoid::regclass AS table_name
FROM documents AS d
LIMIT 5;

Output (SQL Server query)

catalog_name | table_name    | is_enabled | crawl_type_desc | item_count
-------------+---------------+------------+-----------------+-----------
ProductFtCat | ProductDescFT | 1          | FULL            | 2842154
SalesFtCat   | SalesNotesFT  | 1          | INCREMENTAL     | 892104
(2 rows affected)

🔥Senior Shortcut: Monitor Crawl Type

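SQL Server's sys.fulltext_indexes.crawl_type_desc tells you whether the last crawl was a FULL rescan (expensive) or an incremental or update crawl (cheaper). Incremental population relies on a timestamp column; update population relies on change tracking. If you see FULL on every scheduled job, check change_tracking_state_desc and whether the job issues START FULL POPULATION. Fix it before the DBAs hunt you.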
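A minimal sketch of that check and the usual fix. Production.Product stands in for your own table, as in the earlier examples.

```sql
-- SQL Server: how each full-text index is kept current, and what the last crawl did
SELECT OBJECT_NAME(object_id) AS table_name,
       change_tracking_state_desc,
       crawl_type_desc,
       crawl_start_date,
       crawl_end_date
FROM sys.fulltext_indexes;

-- Let change tracking push row changes into the index automatically,
-- instead of scheduling full crawls
ALTER FULLTEXT INDEX ON Production.Product SET CHANGE_TRACKING AUTO;
```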

🎯 Key Takeaway

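Postings count ≈ unique tokens × average document frequency, which equals documents × average distinct tokens per document. Assume 4-byte token IDs, 4-byte document IDs and 2-byte positions, with no compression. Then 1M documents averaging 200 distinct tokens each (an average document frequency of 2,000 across 100K unique tokens) produce 200M postings, and 200M × 4 bytes ≈ 800MB for the document IDs alone, before positions. Real engines compress posting lists, so treat this as an upper bound. Plan your disk.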
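Rather than guessing the inputs to that estimate, PostgreSQL can measure them. This sketch assumes the io_thecodeforge.articles table with the generated search_vector column and the idx_articles_fts index created earlier. length() on a tsvector counts its distinct lexemes, so summing it over all rows gives the number of postings.

```sql
-- PostgreSQL: measure the inputs to the postings estimate
SELECT count(*)                       AS documents,
       sum(length(search_vector))     AS postings,          -- distinct lexemes per row, summed
       sum(length(search_vector)) * 4 AS doc_id_bytes_raw   -- 4-byte doc IDs, uncompressed
FROM io_thecodeforge.articles;

-- Unique lexemes across the whole corpus
SELECT count(*) AS unique_tokens
FROM ts_stat('SELECT search_vector FROM io_thecodeforge.articles');

-- What the GIN index actually occupies on disk
SELECT pg_size_pretty(pg_relation_size('io_thecodeforge.idx_articles_fts')) AS gin_size;
```

Expect the real GIN size to come in well below the raw figure, because GIN compresses its posting lists.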

PATINDEX and CHARINDEX — Pattern Matching Beyond LIKE in SQL Server

LIKE is fine for simple wildcards, but it can't tell you where a match starts. PATINDEX and CHARINDEX fill that gap without full-text indexing.

CHARINDEX is your scalpel: it returns the starting position of one string inside another, or 0 if it is missing. No wildcards, no guesswork. PATINDEX steps it up with LIKE-style patterns: % and _ wildcards plus square-bracket character classes. With those you can find the first numeric digit, the first non-alphanumeric character, or a custom class.

Why this matters: you can extract specific substrings from messy data without resorting to slow CLR functions or client-side parsing. Use them when full-text search is overkill but LIKE is too weak.

Both functions follow the collation of their inputs, which is case-insensitive under the common default SQL_Latin1_General_CP1_CI_AS. Add a COLLATE clause when you need case-sensitive matching. They're also non-SARGable, so indexes won't help. Keep pattern matching lean.
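As an example of that substring extraction, here is a SQL Server sketch that pulls an order number out of free text. It reuses the tickets table from the example below. The 'Order #48213' text format and the VARCHAR/NVARCHAR body column are assumptions made for illustration.

```sql
-- Hypothetical data: tickets.body holds text like 'Re: Order #48213 delayed'
-- Extract the digits that follow the first '#'
SELECT body,
       SUBSTRING(
           body,
           CHARINDEX('#', body) + 1,
           -- length = position of the first non-digit after '#', minus 1;
           -- the appended 'x' guarantees a non-digit when the number ends the string
           PATINDEX('%[^0-9]%', SUBSTRING(body, CHARINDEX('#', body) + 1, 50) + 'x') - 1
       ) AS order_number
FROM tickets
WHERE CHARINDEX('#', body) > 0;

-- Force a case-sensitive search regardless of the column's collation
SELECT CHARINDEX('URGENT', body COLLATE Latin1_General_CS_AS) AS pos
FROM tickets;
```

For 'Re: Order #48213 delayed', '#' sits at position 11. The digits therefore start at 12, and the first non-digit after them is the 6th character of '48213 delayed', giving '48213'.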

PatternMatching.sql

-- io.thecodeforge: database tutorial

-- Find position of 'urgent' (case sensitivity follows the column's collation)
SELECT CHARINDEX('urgent', body, 1) AS pos
FROM tickets
WHERE body LIKE '%urgent%';

-- PATINDEX: find first numeric digit position
SELECT PATINDEX('%[0-9]%', description) AS first_digit_pos
FROM products
WHERE PATINDEX('%[0-9]%', description) > 0;

-- PATINDEX with exclusion pattern: position of the first non-letter character
SELECT PATINDEX('%[^a-zA-Z]%', title) AS first_non_alpha
FROM articles;

Output (illustrative values; each query returns its own result set)

pos: 12
first_digit_pos: 4
first_non_alpha: 1, 3

⚠ Production Trap:

PATINDEX only supports a subset of regex-like patterns — don't expect full regex. Use square brackets for character groups, not quantifiers like + or *. For complex patterns, consider CLR integration or full-text search.
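Fixed-length patterns are still possible, because each bracket class matches exactly one character. This sketch looks for a 'ddd-dddd' fragment in the same hypothetical tickets.body column.

```sql
-- SQL Server: no {n} or + quantifiers, so repeat the class once per character.
-- '-' outside brackets is a literal hyphen.
SELECT body,
       PATINDEX('%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', body) AS match_pos
FROM tickets
WHERE PATINDEX('%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', body) > 0;
```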

🎯 Key Takeaway

CHARINDEX finds substring positions and PATINDEX finds pattern positions. Reach for them when you need to know where a match starts, not for speed: like LIKE '%...%', neither can use an index.

PostgreSQL Full-Text Search: tsvector, tsquery, GIN Index

PostgreSQL offers a robust full-text search engine built directly into the database. The core components are tsvector (a sorted list of lexemes with their positions) and tsquery (a query expression). You first convert text into a tsvector with to_tsvector, which tokenises, stems, and removes stopwords according to a text search configuration (e.g., 'english'). Queries are expressed as a tsquery, built with to_tsquery, plainto_tsquery or (PostgreSQL 11+) websearch_to_tsquery, and matched with the @@ operator.

The real power comes from indexing: a Generalized Inverted Index (GIN) on a tsvector column dramatically accelerates search. This approach supports ranking, phrase search, and custom dictionaries. Compared with MySQL's FULLTEXT indexes, PostgreSQL's implementation is more flexible, but you must manage the tsvector column explicitly. A GIN index is slower to build than a B-tree but gives excellent query performance for text search. In production, use a generated column to maintain the tsvector automatically. Use websearch_to_tsquery to parse user-typed queries, since it never raises a syntax error on odd input.
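A few one-liners show what these functions produce. The sample strings are made up. The expected vector follows from the Snowball English stemmer: 'printers' becomes 'printer' and 'running' becomes 'run'. 'the', 'were', 'out' and 'of' are stopwords, but they still consume positions.

```sql
-- PostgreSQL: what to_tsvector stores (stems + positions, stopwords dropped)
SELECT to_tsvector('english', 'The printers were running out of ink');
-- 'ink':7 'printer':2 'run':4

-- websearch_to_tsquery (PostgreSQL 11+): quotes = phrase, leading '-' = NOT
SELECT websearch_to_tsquery('english', '"ink cartridge" -refill');

-- Prefix match: 'ink:*' matches any lexeme starting with 'ink', such as 'inkjet'
SELECT to_tsvector('english', 'inkjet printer') @@ to_tsquery('english', 'ink:*') AS prefix_hit;
```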

postgresql_fulltext.sql

-- Create table with tsvector column
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    search_vector TSVECTOR
);

-- Populate tsvector using a trigger or generated column
CREATE OR REPLACE FUNCTION products_search_update() RETURNS trigger AS $$
BEGIN
    NEW.search_vector := to_tsvector('english', COALESCE(NEW.name, '') || ' ' || COALESCE(NEW.description, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_products_search
    BEFORE INSERT OR UPDATE ON products
    FOR EACH ROW EXECUTE FUNCTION products_search_update();

-- Create GIN index
CREATE INDEX idx_products_search ON products USING GIN(search_vector);

-- Query: find products containing 'ink'
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'ink');

-- Ranking with ts_rank
SELECT *, ts_rank(search_vector, to_tsquery('english', 'ink')) AS rank
FROM products
WHERE search_vector @@ to_tsquery('english', 'ink')
ORDER BY rank DESC;

🔥Stopwords and Stemming in PostgreSQL

📊 Production Insight

Use generated columns (PostgreSQL 12+) to maintain the tsvector without triggers. On write-heavy tables, gin_pending_list_limit (a setting and an index storage parameter) caps the GIN fast-update pending list. For the initial index build, maintenance_work_mem is the setting that governs memory. Monitor pg_stat_user_indexes for index usage.
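Here is a minimal sketch of the generated-column approach. It assumes a hypothetical products_gen table with the same columns as products above. fastupdate and gin_pending_list_limit are real GIN storage parameters; the 4096 kB limit is only an example value.

```sql
-- PostgreSQL 12+: tsvector maintained by a generated column instead of a trigger
CREATE TABLE products_gen (
    id            SERIAL PRIMARY KEY,
    name          TEXT,
    description   TEXT,
    search_vector TSVECTOR GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(name, '') || ' ' || coalesce(description, ''))
    ) STORED
);

-- GIN with fast update; the pending list is capped at 4096 kB
CREATE INDEX idx_products_gen_search
    ON products_gen USING GIN (search_vector)
    WITH (fastupdate = on, gin_pending_list_limit = 4096);

-- Merge the pending list into the main index (e.g. right after a bulk load)
SELECT gin_clean_pending_list('idx_products_gen_search'::regclass);

-- Is the index actually used?
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_products_gen_search';
```

A bigger pending list makes individual writes cheaper. The cost is longer searches until the list is merged.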

🎯 Key Takeaway

PostgreSQL's full-text search uses tsvector/tsquery with GIN indexes for efficient, language-aware text search that handles stopwords and stemming automatically.

Full-Text Search in MySQL: InnoDB FULLTEXT Indexes

MySQL's InnoDB storage engine supports FULLTEXT indexes, enabling efficient text search without external tools. Unlike PostgreSQL, MySQL creates FULLTEXT indexes directly on CHAR, VARCHAR, or TEXT columns. The index is an inverted index, but the implementation is less customisable than PostgreSQL's. You create a FULLTEXT index on the desired columns and query it with MATCH() ... AGAINST(). MySQL supports two search modes: IN NATURAL LANGUAGE MODE (the default) and IN BOOLEAN MODE. Natural language mode ranks results by relevance, while boolean mode allows operators like + (must have), - (must not), and a trailing * (prefix wildcard).

Stopwords come from a built-in list. For InnoDB, you replace it with your own table via innodb_ft_server_stopword_table (or innodb_ft_user_stopword_table); ft_stopword_file applies only to MyISAM. FULLTEXT indexes work only with InnoDB and MyISAM tables, and the minimum indexed word length differs by engine: innodb_ft_min_token_size defaults to 3, while MyISAM's ft_min_word_len defaults to 4.

For the 'ink' problem, a search for 'ink' returns nothing in two cases. The first is when 'ink' is a stopword, which it is not in the default list. The second is when the word is shorter than the minimum length: InnoDB's default of 3 keeps 'ink', but MyISAM's default of 4 silently drops it.

MySQL 5.7.6 and later also ship an ngram parser for Chinese, Japanese and Korean text. Performance-wise, FULLTEXT indexes are slower to build than B-tree indexes but provide fast query response. In production, watch INFORMATION_SCHEMA.INNODB_FT_CONFIG and INNODB_FT_DELETED (after setting innodb_ft_aux_table) for index health. Size innodb_ft_cache_size and innodb_ft_total_cache_size for your write volume.
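Before blaming the query, it helps to check the server settings that decide whether a short word like 'ink' can be indexed at all. The comments note the documented defaults. Your server may differ, which is the point of running the checks.

```sql
-- MySQL: settings that decide whether 'ink' can be indexed at all
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';   -- InnoDB, default 3
SHOW VARIABLES LIKE 'ft_min_word_len';            -- MyISAM only, default 4
SHOW VARIABLES LIKE 'innodb_ft_enable_stopword';  -- ON = stopword filtering active
SHOW VARIABLES LIKE 'innodb_ft_%stopword_table';  -- custom stopword tables, if any

-- Is 'ink' in InnoDB's built-in default stopword list? (expect an empty result)
SELECT value
FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD
WHERE value = 'ink';
```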

mysql_fulltext.sql

-- Create table with FULLTEXT index
CREATE TABLE products (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    FULLTEXT INDEX ft_products (name, description)
) ENGINE=InnoDB;

-- Insert sample data
INSERT INTO products (name, description) VALUES
('Ink Pen', 'A high-quality ink pen for writing'),
('Printer Ink', 'Black ink cartridge for printers'),
('Paper', 'A4 size paper for printing');

-- Natural language search
SELECT *, MATCH(name, description) AGAINST('ink' IN NATURAL LANGUAGE MODE) AS relevance
FROM products
WHERE MATCH(name, description) AGAINST('ink' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC;

-- Boolean mode search (require 'ink', exclude 'printer')
SELECT * FROM products
WHERE MATCH(name, description) AGAINST('+ink -printer' IN BOOLEAN MODE);

-- Check stopwords (default list)
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

⚠ Stopwords Can Hide 'ink' Products

📊 Production Insight

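Always test with short words. For InnoDB, set innodb_ft_min_token_size (default 3) in the server configuration and restart, since it cannot be changed at runtime; lower it to 2 if terms like 'AI' or 'UX' matter. For MyISAM, the equivalent is ft_min_word_len (default 4). Rebuild FULLTEXT indexes after changing stopword or word length settings using ALTER TABLE ... DROP INDEX ... ADD INDEX.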
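Those steps applied to the products table and ft_products index from the example above look roughly like this. The value 2 is an example for two-letter terms.

```sql
-- 1. In my.cnf under [mysqld] (read-only variable: needs a server restart)
--      innodb_ft_min_token_size = 2
-- 2. After the restart, rebuild the index so existing rows are re-tokenised
ALTER TABLE products DROP INDEX ft_products;
ALTER TABLE products ADD FULLTEXT INDEX ft_products (name, description);

-- 3. Verify that a short term now matches
SELECT id, name
FROM products
WHERE MATCH(name, description) AGAINST('+ink' IN BOOLEAN MODE);
```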

🎯 Key Takeaway

MySQL's FULLTEXT indexes are easy to set up but have configuration pitfalls, such as custom stopword tables and minimum word length, that can silently exclude short terms like 'ink'.

Elasticsearch vs Built-in Full-Text Search: Decision Guide

When should you use a dedicated search engine like Elasticsearch over built-in database full-text search? The answer depends on scale, complexity, and operational overhead. Built-in solutions (MySQL FULLTEXT, PostgreSQL tsvector) are ideal for simple search needs within a single database, with minimal setup and no extra infrastructure. They handle stopwords, stemming, and basic ranking well, but struggle with advanced features like fuzzy search, autocomplete, faceted navigation, and distributed search across large datasets. Elasticsearch excels at these: it provides near-real-time indexing, complex query DSL, aggregations, and horizontal scalability. However, it introduces operational complexity (cluster management, memory tuning) and data synchronization challenges. For the 'ink' problem, both built-in and Elasticsearch can solve it, but Elasticsearch offers more control over analysis (custom tokenizers, filters) to ensure 'ink' is indexed correctly. A practical decision: if your search volume is under 1 million documents and queries are simple, stick with built-in. If you need advanced features, high throughput, or multi-tenant search, invest in Elasticsearch. Hybrid architectures are also common: use database full-text for simple lookups and Elasticsearch for complex search. Remember that Elasticsearch is not a primary data store; always keep your source of truth in the database.

elasticsearch_vs_sql.sql

-- Example: PostgreSQL tsquery for simple search (built-in)
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'ink');

-- Equivalent Elasticsearch query (via HTTP API)
-- GET /products/_search
-- {
--   "query": {
--     "match": {
--       "description": "ink"
--     }
--   }
-- }

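-- For finer control: Elasticsearch with an explicitly configured analyzer.
-- The standard analyzer already lowercases and keeps short tokens such as 'ink';
-- "stopwords" decides which words are dropped ("_none_" disables removal).
-- PUT /products
-- {
--   "settings": {
--     "analysis": {
--       "analyzer": {
--         "my_analyzer": {
--           "type": "standard",
--           "stopwords": "_english_"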
--         }
--       }
--     }
--   },
--   "mappings": {
--     "properties": {
--       "description": {
--         "type": "text",
--         "analyzer": "my_analyzer"
--       }
--     }
--   }
-- }

💡Start Simple, Scale Later

📊 Production Insight

If you anticipate needing Elasticsearch later, design your database schema with a separate search table or column to simplify data export. Monitor query performance and user feedback to decide when to migrate.

🎯 Key Takeaway

Choose built-in full-text search for simplicity and low overhead; choose Elasticsearch for advanced features, scalability, and real-time search at scale.

● Production incidentPOST-MORTEMseverity: high

The Invisible Products: When Stopwords Hid an Entire Category

Symptom

Search for 'ink' returns no results. Search for 'ink cartridge' returns only 'cartridge' matches, missing all ink products. LIKE '%ink%' works fine. MySQL logs show the query executes, but FTS reports no matches.

Assumption

The team assumed the FTS index was stale or the query syntax was wrong. They ran OPTIMIZE TABLE, rebuilt the index, even switched from NATURAL LANGUAGE to BOOLEAN mode. Nothing worked. They never considered that 'ink' might be a stopword.

Root cause

MySQL's default stopword list covers common English words such as 'the', 'a', 'of', 'it' and 'is', and it does not include 'ink'. The team's server, however, used a custom stopword table left behind by a previous DBA who had added domain-specific noise words. 'Ink' was on it because the DBA thought it was too common in print logs. PostgreSQL's default English stopword list doesn't include 'ink' either, but both databases allow custom stopword lists. The team had inherited an undocumented custom stopword configuration that filtered critical product terms. The real issue: no one had reviewed the stopword list for domain relevance. Technical terms like 'set', 'key', 'host' and 'port' also risk being filtered, depending on the list. The team learned that default stopwords are compiled for general English, not for e-commerce catalogues.

Fix

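1. Checked the stopword configuration: SHOW VARIABLES LIKE 'innodb_ft%stopword_table'; (MySQL). For PostgreSQL, ran SELECT ts_lexize('english_stem', 'ink');, which returns an empty array {} when the dictionary treats the word as a stopword.
2. Found 'ink' in the custom stopword table and removed it: DELETE FROM ... WHERE value = 'ink'; (MySQL; InnoDB stopword tables have a single VARCHAR column named value). In PostgreSQL, the equivalent is removing the word from the dictionary's stopword file, or creating a dictionary with a different StopWords file and switching to it with ALTER TEXT SEARCH CONFIGURATION ... ALTER MAPPING FOR ....
3. Rebuilt the full-text index. In MySQL, dropped and re-created the FULLTEXT index. In PostgreSQL, recomputed the stored tsvector values and then ran REINDEX INDEX CONCURRENTLY idx_products_fts;, because REINDEX alone reuses vectors that were built without 'ink'.
4. Added a scheduled job to export the stopword list weekly to version control and compare it against domain-critical terms.
5. Created a monitoring query that periodically searches for terms that should exist (like 'ink') and alerts if the count is 0.
6. Documented the stopword configuration in the team's runbook.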

Key lesson

Stopwords are not universal. Default lists suit general English, not e-commerce, medical, or technical domains.
Always export and review your full-text configuration (stopwords, stemmer, min token length) before deploying to production.
Test FTS with real user queries from day one. If a term should return results, verify it does.
For domain-critical terms (product names, SKU prefixes, technical acronyms), ensure they are NOT in the stopword list.
Set up monitoring: a weekly job that queries a set of known terms and alerts if any return zero results.

Production debug guide: symptom → action for common FTS failures in production

Symptom · 01

FTS query returns no results but LIKE finds matches

→

Fix

Check if the text column is included in the full-text index using SHOW INDEX or \d+ table. Verify stopwords are not filtering your search terms by querying the stopword table directly. Check min token length — 'AI' with length 2 filtered if min=3.

Symptom · 02

Relevance ranking ignores your most important keywords

→

Fix

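Check which ranking function you rely on. MySQL's InnoDB uses a TF-IDF-based formula, while PostgreSQL's ts_rank and ts_rank_cd are frequency-based with no inverse-document-frequency component. In PostgreSQL, label fields with setweight (for example title as 'A', body as 'D') so the ranking functions weight them differently. In MySQL the formula is fixed. Boost terms with the > and < operators in BOOLEAN MODE, or add column-specific MATCH scores with manual multipliers in ORDER BY. WITH QUERY EXPANSION changes which rows match, not how they are weighted.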

Symptom · 03

FTS query is slow (over 1 second) on a large table

→

Fix

Check index size and fragmentation. Run ANALYZE TABLE (MySQL) or VACUUM ANALYZE (PostgreSQL). Increase innodb_ft_cache_size (MySQL) or work_mem (PostgreSQL). Consider partitioning the table if it exceeds 100GB, but only in PostgreSQL: MySQL partitioned tables do not support FULLTEXT indexes.

Symptom · 04

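Search misses plurals or verb forms: 'running' doesn't match 'run'

→

Fix

Examine the stemming setup. MySQL's built-in parser does not stem at all, so use a trailing wildcard ('run*' in BOOLEAN MODE) or store normalised forms. Parser plugins such as ngram (CJK) and MeCab (Japanese) change tokenisation, not stemming. For PostgreSQL, choose the correct text search configuration (english vs simple) and verify the dictionary (Snowball-based english_stem vs simple). Irregular forms such as 'ran' are not reduced to 'run' by any stemmer; they need a thesaurus or synonym dictionary.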

Symptom · 05

New products don't appear in search results after bulk insert

→

Fix

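Check that the rows are committed and indexed. InnoDB adds rows to the FULLTEXT index at transaction commit, so an uncommitted bulk load is invisible. Running OPTIMIZE TABLE after bulk operations helps performance, not visibility. For PostgreSQL, check that search_vector is populated for the new rows. Rows bulk-loaded before the trigger existed, or with triggers disabled, keep a NULL tsvector. Recompute them with an UPDATE (or switch to a generated column); REINDEX CONCURRENTLY only rebuilds the index from the vectors already stored.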
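For the ranking symptom above, here is a sketch of both weighting approaches. It assumes the io_thecodeforge.articles table (title, body, and the generated search_vector column) from earlier. The MySQL half additionally assumes a FULLTEXT(title) index next to FULLTEXT(title, body), because MATCH() must name exactly the columns of one FULLTEXT index. The search phrase and the multiplier of 3 are examples.

```sql
-- PostgreSQL: title matches ('A') outweigh body matches ('D') in ts_rank
SELECT id, title,
       ts_rank(
           setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(body,  '')), 'D'),
           query
       ) AS rank
FROM io_thecodeforge.articles,
     websearch_to_tsquery('english', 'ink cartridge') AS query
WHERE search_vector @@ query          -- filter through the indexed column
ORDER BY rank DESC
LIMIT 10;

-- MySQL: boost title hits with a manual multiplier
SELECT id, title,
       MATCH(title) AGAINST('ink cartridge') * 3
         + MATCH(title, body) AGAINST('ink cartridge') AS score
FROM io_thecodeforge.articles
WHERE MATCH(title, body) AGAINST('ink cartridge')
ORDER BY score DESC
LIMIT 10;
```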

★ Full-Text Search Debug Cheat Sheet: commands to run when FTS breaks in production. Run these first.

FTS index not picking up new data

Immediate action

Check last rebuild timestamp and run a manual rebuild

Commands

SELECT MAX(last_modified) FROM products; -- compare to index metadata

OPTIMIZE TABLE products; -- MySQL

REINDEX INDEX CONCURRENTLY idx_products_fts; -- PostgreSQL 12+

Fix now

Force a full rebuild: OPTIMIZE TABLE products; (MySQL, locks the table) or REINDEX INDEX CONCURRENTLY idx_products_fts; (PostgreSQL, does not block reads or writes)

Wrong ranking order, irrelevant results at top

Full-text query times out (>5 seconds)

Short words like 'AI', 'UX' not returning results

Phrase search using quotes doesn't work

LIKE vs Full-Text Search: Head-to-Head

Aspect	LIKE '%term%'	Full-Text Search
Data structure	B-Tree (unusable with leading wildcard)	Inverted index
Speed (1M rows, 1KB text)	2-10 seconds (full sequential scan)	10-100 ms (index lookup)
Ranking	None (order of insertion or random)	Relevance score (TF-IDF-based in MySQL, ts_rank in PostgreSQL)
Stemming	None ('running' ≠ 'run')	Language-specific stemmer in PostgreSQL and SQL Server; none in MySQL's built-in parser
Stopword handling	None ('the' is searched like any word)	Automatic filtering of noise words
Phrase search	LIKE '%exact phrase%' (scan, no ordering)	Boolean mode operators or <-> adjacency
Index maintenance	None (B-Tree updates automatically)	Updated on write; periodic OPTIMIZE TABLE (MySQL) or REINDEX (PostgreSQL) keeps it compact
Prefix search	LIKE 'prefix%' (uses index; a leading wildcard does not)	Trailing wildcard: 'run*' in MySQL BOOLEAN MODE, 'run:*' in PostgreSQL
Infrastructure	Built into all SQL databases, no setup	Requires index creation and configuration

⚙ Quick Reference

11 commands from this guide

File	Command / Code	Purpose
fts_internals_demo.sql	SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;	How the Inverted Index Works Internally
match_against_demo.sql	SELECT id, title,	MATCH/AGAINST Syntax and Relevance Ranking
custom_stopword.sql	SET GLOBAL innodb_ft_server_stopword_table = 'mydb/my_stopwords';	Stopwords, Stemming and Language Support
fts_tuning.sql	SHOW INDEX FROM io_thecodeforge.articles;	Performance Tuning and Index Maintenance
mysql_vs_postgres_fts.sql	ALTER TABLE io_thecodeforge.articles ADD FULLTEXT(title, body);	MySQL vs PostgreSQL
ProximitySearchKills.sql	SELECT product_id, product_name	Full-Text Search Queries
IndexStructureDiagnostic.sql	SELECT c.name AS catalog_name,	The Full-Text Index Architecture Nobody Draws on the Whiteboard
PatternMatching.sql	SELECT CHARINDEX('urgent', body, 1) AS pos	PATINDEX and CHARINDEX
postgresql_fulltext.sql	CREATE TABLE products (	PostgreSQL Full-Text Search
mysql_fulltext.sql	CREATE TABLE products (	Full-Text Search in MySQL
elasticsearch_vs_sql.sql	SELECT * FROM products	Elasticsearch vs Built-in Full-Text Search

Key takeaways

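FTS relies on an inverted index, not a B-Tree over the raw column. It maps tokens to row IDs, enabling O(log n) lookups instead of O(n) scans.

Ranking is algorithmic, and it differs by engine. MySQL InnoDB uses a TF-IDF-based relevance formula. PostgreSQL's ts_rank and ts_rank_cd score term frequency (ts_rank_cd also rewards proximity) with no inverse-document-frequency component, and length normalisation is opt-in through their normalization argument. Know which one you are tuning.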

Stopwords and stemming must be tailored to your domain. Default English stopwords hide technical terms. Review and customise.

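Rebuild indexes after changing stopword, token-size or parser settings, and optimize after large bulk operations; otherwise searches keep reflecting the old configuration. PostgreSQL's REINDEX ... CONCURRENTLY avoids downtime; MySQL's OPTIMIZE TABLE locks.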

Test FTS with real user queries before production deployment. A query that works on 1k rows can fail on 1M rows due to tokenisation or ranking.

For typos and faceted search, FTS falls short. Know when to hand off to Elasticsearch: it's not a failure, it's the right architectural decision.

MySQL FTS is simpler but less flexible; PostgreSQL FTS offers more control and zero-downtime maintenance. Choose based on operational needs.

Phrase search requires BOOLEAN MODE in MySQL or <-> operator in PostgreSQL. NATURAL LANGUAGE MODE ignores quotes.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain how an inverted index is built and queried in a SQL database's f...

Q02 · SENIOR

What is the difference between TF-IDF and BM25 ranking, and when would y...

Q03 · SENIOR

How do you debug a full-text search that returns no results for a query ...

Q04 · SENIOR

When should you use a dedicated search engine like Elasticsearch over SQ...

Q05 · SENIOR

How would you design a search system for a multilingual product catalogu...

Q01 of 05 · SENIOR

Explain how an inverted index is built and queried in a SQL database's full-text search engine.

ANSWER

An inverted index is built by tokenising each document's text, normalising tokens (lowercasing, stemming), discarding stopwords, and then constructing a mapping from each unique token to the list of document IDs where it appears, along with positional information and frequency. In MySQL InnoDB, this is stored in hidden auxiliary tables. In PostgreSQL, it's a GiST or GIN index on a tsvector column. Query execution: (1) tokenise and stem the query text, (2) look up each token in the inverted index (O(log n) per token), (3) retrieve posting lists (list of row IDs), (4) intersect them for multi-word queries, (5) compute a relevance score (TF-IDF or BM25) for each candidate row using term frequency and inverse document frequency, (6) sort by score and return top N. Complexity: O(k log n + m) where k = number of unique tokens in query, n = number of unique tokens in corpus, m = number of candidate rows. This is far faster than a full table scan (O(total documents × document length)).

FAQ · 9 QUESTIONS

Frequently Asked Questions

What is Full-Text Search in SQL in simple terms?

Can I use full-text search with a partial match like 'prefix*'?

Does full-text search support fuzzy matching (typo tolerance)?

How does full-text search handle multi-language content?

Is full-text search better than Elasticsearch?

How do I monitor full-text index health in production?

Can I use full-text search on a database view?

How does FTS handle NULL values?

Does FTS support scoring for my own custom fields?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Drawn from code that ran under real load.

✓ Verified · production tested

Last updated: July 18, 2026

2,466 articles · all by Naren
