Elasticsearch Internals Explained — Shards, Mappings, Queries and Production Gotchas
Every engineering team eventually hits a wall with relational databases and full-text search. You add a LIKE '%keyword%' clause, the query planner gives up doing index scans, and suddenly a product search that should return in 20ms is taking 4 seconds. At that point your DBA says 'we need Elasticsearch' — and they're right. But Elasticsearch isn't just a fast search box. It's a distributed, schema-flexible, near-real-time analytics engine built on Apache Lucene, and understanding its internals is what separates engineers who use it correctly from those who quietly destroy cluster performance over six months.
The problem Elasticsearch solves is multidimensional. Traditional databases store rows and optimise for exact matches and range scans. Elasticsearch stores documents and optimises for relevance-ranked full-text retrieval, aggregations over millions of documents, and horizontal scale-out without a DBA rearchitecting everything. It trades strong ACID consistency for incredible read throughput, and it makes that trade explicitly — which means you need to know exactly where the line is.
By the end of this article you'll understand how Elasticsearch physically stores and retrieves data (inverted indexes, segments, shards), how to define mappings that won't blow up in production, how to write Query DSL that the cluster actually executes efficiently, and — crucially — which beginner mistakes silently tank performance until the moment they catastrophically don't.
How Elasticsearch Actually Stores Data — Inverted Indexes, Segments and Shards
When you index a document in Elasticsearch, a chain of transformations happens before a single byte lands on disk. First, the document passes through an analyzer: a pipeline of character filters, tokenizers, and token filters that converts raw text into a stream of tokens. The default standard analyzer lowercases everything and splits on whitespace and punctuation. The resulting tokens are written into a Lucene inverted index — a data structure that maps each unique token to a sorted list of document IDs that contain it, along with position and frequency metadata.
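As a mental model (not the actual Lucene code), the analyze-then-invert pipeline can be sketched in a few lines of Python. The regex-based `analyze` below only approximates the standard analyzer, and the function names are invented for illustration:

```python
import re
from collections import defaultdict

def analyze(text):
    """Rough imitation of the standard analyzer: lowercase,
    then split on anything that isn't a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_inverted_index(docs):
    """Map each token to {doc_id: [positions]} — the core shape
    of an inverted index, minus frequencies and compression."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(analyze(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index

docs = {
    1: "High-Performance Running Shoes (Trail Edition)",
    2: "Road running shoes for marathon training",
}
index = build_inverted_index(docs)
print(sorted(index["running"]))  # -> [1, 2]
print(index["trail"])            # -> {1: [4]}
```

The lookup direction is the whole point: given a token, the structure hands back the documents (and positions) containing it, which is why search is fast even though indexing did extra work up front.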
But Lucene doesn't write directly to one giant index. It writes to small, immutable files called segments. Every time you index a batch of documents, a new segment is created in memory, then flushed to disk. Searches fan out across all segments and merge the results. Periodically, Lucene merges smaller segments into larger ones in the background — reducing the per-search overhead of scanning many files. This is why you'll see merge threads in your cluster stats, and why merge-heavy write workloads need careful tuning.
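A toy sketch of that fan-out-and-merge behaviour, with segments modelled as plain dicts mapping tokens to doc-ID postings (purely illustrative; this is not Lucene's data layout):

```python
# Each segment is an immutable mini-index; a search queries every
# segment independently and merges the per-segment postings.
segments = [
    {"trail": [1, 3], "shoes": [1, 2, 3]},   # older, already-merged segment
    {"trail": [7], "shoes": [7, 9]},         # freshly flushed segment
]

def search(token):
    """Fan out across all segments, then merge the results."""
    hits = []
    for segment in segments:
        hits.extend(segment.get(token, []))
    return sorted(hits)

print(search("trail"))  # -> [1, 3, 7]
```

The per-search cost grows with the number of segments, which is exactly why background merging (and a high segment count in `_segments` output) matters.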
Elasticsearch wraps Lucene in a shard — a single Lucene instance. An index is split into N primary shards at creation time. This number is immutable without reindexing, which is the single most important architectural decision you'll make. Each primary shard has replica shards that serve as hot standbys for reads and failover. Understanding this hierarchy — token → segment → shard → index — is the mental model you need to debug every performance problem you'll ever have with Elasticsearch.
# ─────────────────────────────────────────────────────────────────
# Step 1: Create an index with explicit shard/replica settings.
# 3 primary shards means data is split across 3 Lucene instances.
# 1 replica means each primary is copied once — 6 shards total.
# WARNING: primary shard count cannot be changed after creation.
# ─────────────────────────────────────────────────────────────────
curl -X PUT "localhost:9200/products" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "1s"
    }
  }'

# ─────────────────────────────────────────────────────────────────
# Step 2: Use the Analyze API to see EXACTLY what tokens
# Elasticsearch will store for a given text value.
# This is invaluable for debugging why a search doesn't match.
# ─────────────────────────────────────────────────────────────────
curl -X POST "localhost:9200/products/_analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "analyzer": "standard",
    "text": "High-Performance Running Shoes (Trail Edition)"
  }'

# ─────────────────────────────────────────────────────────────────
# Step 3: Check the low-level segment stats for the index.
# Look at "num_docs", "size_in_bytes", and "count" (segment count).
# A high segment count means merges haven't caught up — slower searches.
# ─────────────────────────────────────────────────────────────────
curl "localhost:9200/products/_segments?pretty"
{"acknowledged":true,"shards_acknowledged":true,"index":"products"}
# Step 2 response — the standard analyzer produced these tokens:
{
"tokens": [
{"token":"high", "start_offset":0, "end_offset":4, "type":"<ALPHANUM>", "position":0},
{"token":"performance", "start_offset":5, "end_offset":16, "type":"<ALPHANUM>", "position":1},
{"token":"running", "start_offset":17, "end_offset":24, "type":"<ALPHANUM>", "position":2},
{"token":"shoes", "start_offset":25, "end_offset":30, "type":"<ALPHANUM>", "position":3},
{"token":"trail", "start_offset":32, "end_offset":37, "type":"<ALPHANUM>", "position":4},
{"token":"edition", "start_offset":38, "end_offset":45, "type":"<ALPHANUM>", "position":5}
]
}
# Note: 'High-Performance' was split into two tokens and lowercased.
# The hyphen was treated as a token separator, NOT stored as a token.
# Step 3 response (abbreviated) — num_docs is 0 because nothing has been indexed yet:
{
"_shards": {"total":3,"successful":3,"failed":0},
"indices": {
"products": {
"shards": {
"0": [{"num_docs":0,"size_in_bytes":3726,"committed":true,"search":true,"version":"9.6.0"}]
}
}
}
}
Mappings Deep Dive — Controlling How Elasticsearch Interprets Your Documents
A mapping is Elasticsearch's schema definition. Unlike a relational database, ES will happily infer a mapping from your first document — a feature called dynamic mapping. On day one this feels magical. By month six it's the reason your index has 47 unexpected fields, a price field mapped as text instead of float, and a production incident.
Every field type carries performance implications. A text field is analyzed and stored in the inverted index — great for full-text search, useless for exact matching or aggregations. A keyword field is stored as-is (no analysis) and indexed as a single token — perfect for filtering, sorting, and aggregations. Many fields need both: a product name you want to search fuzzily AND filter exactly. For that, Elasticsearch supports multi-fields — store one value in two sub-fields with different types.
doc_values is another critical concept. By default, numeric, date, keyword, and boolean fields also write their values into doc_values — a column-oriented on-disk structure stored alongside the inverted index. This is what powers fast aggregations and sorting. Disabling doc_values saves disk space but kills sorting and aggregation on that field — a trade-off worth making only on fields you know will never be sorted or aggregated.
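The column-oriented idea is easy to see in miniature. This hypothetical sketch contrasts the question the inverted index answers with the one doc_values answer (the field name and prices are invented):

```python
# The inverted index answers: "which docs contain token X?"
# doc_values answer the inverse: "what is the value of field F in doc N?"
# — which is the access pattern sorting and aggregations need.

# Columnar layout: one array per field, indexed by internal doc id.
price_usd = [129.99, 59.00, 199.50, 89.99]   # docs 0..3

# An aggregation (e.g. avg) becomes a single sequential scan of one column,
# with no need to load or parse whole documents:
avg_price = sum(price_usd) / len(price_usd)
print(round(avg_price, 2))  # -> 119.62
```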
_source is the original JSON document stored verbatim. It's what _search returns. You can disable it to save disk, but then you lose update operations and the ability to reindex — almost never worth it in production.
# ─────────────────────────────────────────────────────────────────
# Create an index with a carefully designed explicit mapping.
# Never rely on dynamic mapping in production — be intentional.
# ─────────────────────────────────────────────────────────────────
curl -X PUT "localhost:9200/product_catalog" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "analysis": {
        "analyzer": {
          "product_name_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "asciifolding", "product_shingle"]
          }
        },
        "filter": {
          "product_shingle": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 3
          }
        }
      }
    },
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "product_id": { "type": "keyword" },
        "product_name": {
          "type": "text",
          "analyzer": "product_name_analyzer",
          "fields": {
            "raw":     { "type": "keyword", "ignore_above": 256 },
            "suggest": { "type": "search_as_you_type" }
          }
        },
        "price_usd":     { "type": "scaled_float", "scaling_factor": 100 },
        "category_tags": { "type": "keyword" },
        "description": {
          "type": "text",
          "analyzer": "english",
          "norms": false
        },
        "in_stock":   { "type": "boolean" },
        "created_at": { "type": "date", "format": "strict_date_optional_time||epoch_millis" },
        "warehouse_location": { "type": "geo_point" }
      }
    }
  }'

# ─────────────────────────────────────────────────────────────────
# Index a sample product document to verify the mapping accepts it.
# ─────────────────────────────────────────────────────────────────
curl -X POST "localhost:9200/product_catalog/_doc/SKU-9821" \
  -H "Content-Type: application/json" \
  -d '{
    "product_id": "SKU-9821",
    "product_name": "High-Performance Trail Running Shoes",
    "price_usd": 129.99,
    "category_tags": ["footwear", "trail", "running"],
    "description": "Lightweight shoes engineered for off-road terrain with superior grip.",
    "in_stock": true,
    "created_at": "2024-03-15T09:30:00Z",
    "warehouse_location": { "lat": 37.7749, "lon": -122.4194 }
  }'

# ─────────────────────────────────────────────────────────────────
# Try indexing a document with an UNMAPPED field — dynamic:strict
# will reject it, protecting schema integrity.
# ─────────────────────────────────────────────────────────────────
curl -X POST "localhost:9200/product_catalog/_doc" \
  -H "Content-Type: application/json" \
  -d '{"product_id":"X","product_name":"Test","price_usd":1,"category_tags":[],"description":"x","in_stock":true,"created_at":"2024-01-01","warehouse_location":{"lat":0,"lon":0},"unexpected_field":"surprise"}'
{"acknowledged":true,"shards_acknowledged":true,"index":"product_catalog"}
# Document indexed successfully:
{
"_index": "product_catalog",
"_id": "SKU-9821",
"_version": 1,
"result": "created",
"_shards": {"total":2,"successful":1,"failed":0},
"_seq_no": 0,
"_primary_term": 1
}
# Strict mapping REJECTS the document with the unknown field:
{
"error": {
"type": "strict_dynamic_mapping_exception",
"reason": "mapping set to strict, dynamic introduction of [unexpected_field] within [_doc] is not allowed"
},
"status": 400
}
Query DSL Internals — Query Context vs Filter Context and Why It Changes Everything
Elasticsearch's Query DSL has two fundamentally different execution modes, and mixing them up is the most common performance mistake in production clusters.
Query context computes a _score — a relevance float calculated using BM25 (a probabilistic ranking function that accounts for term frequency, inverse document frequency, and field length). Every document that matches is scored. Scoring is expensive and the results cannot be cached.
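For intuition, here is a simplified, self-contained sketch of the BM25 formula using Lucene's default parameters (k1=1.2, b=0.75). Real Lucene scoring includes more machinery (boosts, norms encoding), so treat this as illustrative only:

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 for one term in one document.
    tf: term frequency in the doc; doc_freq: number of docs containing
    the term; b controls how strongly field length penalises the score."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)

# A rare term (low doc_freq) contributes far more than a common one:
rare   = bm25_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=10_000, doc_freq=5)
common = bm25_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=10_000, doc_freq=5_000)
print(rare > common)  # -> True
```

Note that every scored document pays this per-term arithmetic, which is exactly the cost you avoid by moving non-ranking clauses into filter context.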
Filter context asks a binary yes/no question: does this document match? No scoring. The result is a bitset — a compact binary array where bit N is 1 if document N matches. Elasticsearch aggressively caches these bitsets in the filter cache, keyed by query. The second time the same filter runs on a shard, it reads from cache in microseconds.
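A bitset intersection is cheap enough to sketch directly. The field names below mirror the running product example, but the bit patterns are invented:

```python
# A filter's per-segment result is a bitset: bit N is set iff doc N matches.
# Combining filters is bitwise AND — no scoring, and trivially cacheable.
in_stock = 0b10110101   # docs 0, 2, 4, 5, 7 match (bit 0 = doc 0)
price_ok = 0b11100110   # docs 1, 2, 5, 6, 7 match
category = 0b00100111   # docs 0, 1, 2, 5 match

combined = in_stock & price_ok & category

matching_docs = [n for n in range(8) if combined >> n & 1]
print(matching_docs)  # -> [2, 5]
```

Once a bitset is cached for a shard's segments, re-running the same filter is a cache lookup plus an AND, which is why hot filters cost microseconds rather than milliseconds.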
The practical rule: use query context only when relevance ranking matters (full-text search). Use filter context for everything else — dates, status fields, boolean flags, numeric ranges, keyword exact matches. A bool query lets you combine both in the same request.
function_score and script_score let you customise scoring — for example, boosting products with higher inventory or closer warehouse location. These are powerful but expensive: they disable the filter cache for the scoring component and run per-document.
# ─────────────────────────────────────────────────────────────────
# A production-grade bool query combining:
#   - must     (query context)  → relevance scored, NOT cached
#   - filter   (filter context) → binary, bitset-cached
#   - should   (query context)  → boosts score if matched, optional
#   - must_not (filter context) → binary exclusion, cached
# ─────────────────────────────────────────────────────────────────
curl -X GET "localhost:9200/product_catalog/_search?explain=false" \
  -H "Content-Type: application/json" \
  -d '{
    "size": 10,
    "_source": ["product_name", "price_usd", "category_tags", "in_stock"],
    "query": {
      "bool": {
        "must": [
          {
            "multi_match": {
              "query": "trail running shoes",
              "fields": ["product_name^3", "description"],
              "type": "best_fields",
              "fuzziness": "AUTO"
            }
          }
        ],
        "filter": [
          { "term":  { "in_stock": true } },
          { "terms": { "category_tags": ["trail", "running"] } },
          { "range": { "price_usd": { "gte": 50.00, "lte": 200.00 } } },
          {
            "geo_distance": {
              "distance": "100km",
              "warehouse_location": { "lat": 37.7749, "lon": -122.4194 }
            }
          }
        ],
        "should": [
          { "term": { "category_tags": { "value": "trail", "boost": 1.5 } } }
        ],
        "must_not": [
          { "term": { "category_tags": "discontinued" } }
        ],
        "minimum_should_match": 0
      }
    },
    "sort": [
      { "_score":    { "order": "desc" } },
      { "price_usd": { "order": "asc" } }
    ],
    "aggs": {
      "price_histogram": { "histogram": { "field": "price_usd", "interval": 25 } },
      "tags_breakdown":  { "terms": { "field": "category_tags", "size": 10 } }
    }
  }'

# ─────────────────────────────────────────────────────────────────
# Profile the query to see where time is actually being spent.
# Run this in dev/staging — profile output is verbose.
# ─────────────────────────────────────────────────────────────────
curl -X GET "localhost:9200/product_catalog/_search" \
  -H "Content-Type: application/json" \
  -d '{"profile": true, "query": {"match": {"product_name": "trail shoes"}}}'
{
"took": 4,
"timed_out": false,
"_shards": {"total":3,"successful":3,"skipped":0,"failed":0},
"hits": {
"total": {"value":1,"relation":"eq"},
"max_score": 3.7185938,
"hits": [
{
"_index": "product_catalog",
"_id": "SKU-9821",
"_score": 3.7185938,
"_source": {
"product_name": "High-Performance Trail Running Shoes",
"price_usd": 129.99,
"category_tags": ["footwear", "trail", "running"],
"in_stock": true
}
}
]
},
"aggregations": {
"price_histogram": {
"buckets": [
{"key": 125.0, "doc_count": 1}
]
},
"tags_breakdown": {
"buckets": [
{"key": "footwear", "doc_count": 1},
{"key": "running", "doc_count": 1},
{"key": "trail", "doc_count": 1}
]
}
}
}
# 'took: 4ms' — filter context fields (in_stock, tags, price, geo)
# were evaluated from bitset cache. Only the multi_match scored.
# The profile endpoint would show TermQuery and BooleanQuery timing
# broken down per shard — use it to find expensive query branches.
Production Cluster Health — Routing, Near-Real-Time Search and Reindexing Strategies
Three operational concepts separate engineers who run Elasticsearch well from those who don't: document routing, the refresh cycle, and reindexing.
Routing determines which shard receives a document. By default: shard = hash(document_id) % number_of_primary_shards. This distributes evenly but means every search must query all shards and merge results (scatter-gather). Custom routing lets you send related documents to the same shard — for example, routing all documents for a tenant by tenant_id. Searches with a routing key skip non-matching shards entirely, dramatically reducing cross-node network traffic in multi-tenant architectures.
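The default routing rule can be sketched as follows. The hash function here is illustrative (Elasticsearch actually applies Murmur3 to the routing value), but the modulo logic is the same:

```python
import hashlib

def route(routing_key, num_primary_shards):
    """Default routing: shard = hash(_routing) % number_of_primary_shards,
    where _routing defaults to the document _id.
    (md5 here is a stand-in; Elasticsearch uses Murmur3.)"""
    h = int(hashlib.md5(routing_key.encode()).hexdigest(), 16)
    return h % num_primary_shards

# The same key always lands on the same shard. Custom routing exploits
# this: route every document for a tenant by tenant_id, and a search
# carrying that routing key queries exactly one shard instead of all.
shard_a = route("tenant-42", 3)
shard_b = route("tenant-42", 3)
print(shard_a == shard_b)  # -> True
```

This also shows why the primary shard count is frozen at creation: changing `num_primary_shards` changes the modulo, so every existing document would suddenly "belong" to a different shard.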
Near-real-time (NRT) search: indexed documents aren't immediately searchable. First, the in-memory buffer is refreshed — flushed to a new Lucene segment and opened for search. The default refresh_interval is 1 second. During bulk indexing, set it to -1 (disable) for a massive throughput boost, then re-enable after the load. This is how teams import 100M documents in minutes instead of hours.
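The buffer-then-refresh behaviour can be modelled in a few lines; `NRTIndex` is a made-up toy class, not an Elasticsearch API:

```python
class NRTIndex:
    """Toy model of near-real-time search: writes land in an in-memory
    buffer and only become searchable once refresh() flushes the buffer
    into a new searchable segment."""
    def __init__(self):
        self.buffer = []     # indexed, but NOT yet searchable
        self.segments = []   # searchable segments

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        # What refresh_interval triggers (every 1s by default).
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []

    def search(self, term):
        return [d for seg in self.segments for d in seg if term in d]

idx = NRTIndex()
idx.index("trail shoes")
print(idx.search("trail"))  # -> []  (not refreshed yet)
idx.refresh()
print(idx.search("trail"))  # -> ['trail shoes']
```

Disabling refresh during bulk loads works because it stops creating a new tiny segment every second, letting the writer batch far more work per flush.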
Reindexing is unavoidable. You'll change a mapping, need to add a new analyzer, or split shards. The Reindex API copies documents from one index to another. The production pattern is: create new index → reindex in background → use an alias pointing to the old index → atomically swap the alias to the new index → delete the old index. Users never see downtime.
# ─────────────────────────────────────────────────────────────────
# PRODUCTION ZERO-DOWNTIME REINDEX PATTERN
# ─────────────────────────────────────────────────────────────────

# Step 1: Create an alias pointing to the current index.
# Application code always reads/writes through the alias,
# never the raw index name.
curl -X POST "localhost:9200/_aliases" \
  -H "Content-Type: application/json" \
  -d '{
    "actions": [
      { "add": { "index": "product_catalog", "alias": "products_live" } }
    ]
  }'

# Step 2: Create the new index with the updated mapping
# (e.g., a new field or analyzer).
curl -X PUT "localhost:9200/product_catalog_v2" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 0,
      "refresh_interval": "-1"
    },
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "product_id": { "type": "keyword" },
        "product_name": {
          "type": "text",
          "analyzer": "english",
          "fields": { "raw": { "type": "keyword" } }
        },
        "price_usd":     { "type": "scaled_float", "scaling_factor": 100 },
        "category_tags": { "type": "keyword" },
        "description":   { "type": "text", "analyzer": "english" },
        "in_stock":      { "type": "boolean" },
        "created_at":    { "type": "date" },
        "warehouse_location": { "type": "geo_point" },
        "brand":         { "type": "keyword" }
      }
    }
  }'

# Step 3: Run the Reindex API — copies docs from the old index to the
# new one in the background. Use wait_for_completion=false for large
# datasets and track progress with the Tasks API.
curl -X POST "localhost:9200/_reindex?wait_for_completion=false" \
  -H "Content-Type: application/json" \
  -d '{
    "conflicts": "proceed",
    "source": { "index": "product_catalog", "size": 1000 },
    "dest":   { "index": "product_catalog_v2", "op_type": "create" }
  }'

# Step 4: Poll the task to see reindex progress.
# Replace TASK_ID with the ID returned from the reindex call above.
curl "localhost:9200/_tasks/TASK_ID?pretty"

# Step 5: After reindex completes — restore settings, then
# atomically swap the alias.
curl -X PUT "localhost:9200/product_catalog_v2/_settings" \
  -H "Content-Type: application/json" \
  -d '{"index": {"number_of_replicas": 1, "refresh_interval": "1s"}}'

curl -X POST "localhost:9200/_aliases" \
  -H "Content-Type: application/json" \
  -d '{
    "actions": [
      { "remove": { "index": "product_catalog",    "alias": "products_live" } },
      { "add":    { "index": "product_catalog_v2", "alias": "products_live" } }
    ]
  }'

# Step 6: Verify the alias now points to the new index.
curl "localhost:9200/_cat/aliases/products_live?v"
{"acknowledged":true}
# Step 2 — new index created with 0 replicas for faster reindex ingestion:
{"acknowledged":true,"shards_acknowledged":true,"index":"product_catalog_v2"}
# Step 3 — reindex started asynchronously:
{"task":"oTUltX4IQMOUUVeiohTt8A:12345"}
# Step 4 — task progress (mid-reindex):
{
"completed": false,
"task": {
"action": "indices:data/write/reindex",
"description": "reindex from [product_catalog] to [product_catalog_v2]",
"status": {
"total": 50000,
"updated": 0,
"created": 32400,
"deleted": 0,
"batches": 33,
"version_conflicts": 0
}
}
}
# Step 6 — alias now points to v2 with zero application downtime:
alias         index              filter routing.index routing.search is_write_index
products_live product_catalog_v2 -      -             -              -
| Aspect | Query Context (must/should) | Filter Context (filter/must_not) |
|---|---|---|
| Produces relevance score | Yes — BM25 float _score | No — binary match only |
| Result caching | Never cached | Bitset cached per shard |
| Performance | Slower — per-doc scoring | Faster — cache hit after first run |
| Use when | Full-text search, ranking matters | Filtering by status, date, keyword, range |
| Affects _score | Directly sets or contributes to score | No effect on score |
| Example clause | match, multi_match, query_string | term, terms, range, geo_distance |
| Cache invalidation | N/A | When segment merges or index is refreshed |
| Network overhead | Same scatter-gather across shards | Skipped shards if routing is set |
🎯 Key Takeaways
- The inverted index maps tokens to document IDs — Elasticsearch does expensive analysis work at WRITE time so READ time is fast. Understanding this explains every performance decision the engine makes.
- Primary shard count is set once at index creation and cannot change. Choosing the wrong number is the most expensive architectural mistake in Elasticsearch — aim for 10–50GB per shard and plan for 12–18 months of growth.
- Filter context uses cached bitsets and is orders of magnitude faster than query context for non-ranking constraints. The rule is simple: if the field doesn't affect ranking, it belongs in bool.filter, not bool.must.
- The zero-downtime reindex pattern (create v2 → reindex → swap alias) is the standard production playbook for any mapping change. Always set refresh_interval: -1 and number_of_replicas: 0 during the reindex, then restore them before cutting over.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Letting dynamic mapping run wild in production — ES auto-maps new fields, so a bad first document can leave price mapped as text, silently breaking range filters and aggregations on price; and once a type is inferred, later documents that conflict with it fail with a mapper_parsing_exception. Fix: set "dynamic": "strict" in your mapping and explicitly define every field before going live. Add new fields via the PUT mapping API with a deploy, not on the fly.
- ✕ Mistake 2: Running aggregation queries with their constraints in query context — putting a terms constraint inside must instead of filter means every execution rescores documents unnecessarily, disables bitset caching, and can cause GC pressure on large datasets. Fix: always put non-scoring constraints in bool.filter. A terms filter that runs 1000 times per second should hit the bitset cache, not the scorer.
- ✕ Mistake 3: Setting shard count too low at index creation and then needing horizontal scale — with 1 primary shard, all indexing hits one node and search can't parallelize. You can't add shards post-creation. Fix: use the Split API (which multiplies the shard count by a fixed factor) as an emergency measure, or better, size shards by expected data volume at design time — 10–50GB per shard is the Elastic-recommended range. Use Index Templates to enforce this automatically for time-based indices.
Interview Questions on This Topic
- Q: Explain the difference between query context and filter context in Elasticsearch. When would using a `must` clause instead of a `filter` clause cause a production performance problem?
- Q: A colleague says "our Elasticsearch writes are slow — can we just add more replicas to fix it?" What's actually happening under the hood, and what would you actually do to improve bulk indexing throughput?
- Q: If you need to change the analyzer on a high-traffic Elasticsearch index without any downtime, walk me through exactly how you'd do it — including what you'd configure before the reindex, and how you'd cut over traffic to the new index.
Frequently Asked Questions
How many shards should an Elasticsearch index have?
Elastic's official guidance is 10–50GB per shard. So if your index will hold 150GB of data, 3–5 primary shards is a sensible starting point. Oversharding (e.g., 50 shards for 1GB of data) wastes heap memory because each shard carries Lucene overhead, and it slows down searches because the coordinator has to scatter-gather across more shards. Undersharding prevents horizontal scaling. Size for where you'll be in 12–18 months, not today.
What is the difference between an Elasticsearch index and a Lucene index?
An Elasticsearch index is a logical namespace that groups documents and is split across N primary shards. Each primary shard IS a Lucene index — a self-contained set of Lucene segments stored on one node. When you search an ES index, the coordinator node sends the query to all relevant shards (each running a Lucene search independently), collects results, merges and ranks them, then returns the final response. ES is essentially a distributed coordination layer on top of many Lucene instances.
Why does Elasticsearch say a document is searchable after 1 second even though I just indexed it?
This is Elasticsearch's near-real-time (NRT) behaviour. When you index a document, it goes into an in-memory write buffer first. Every refresh_interval (default: 1 second), the buffer is flushed into a new Lucene segment and opened for search. Until that refresh happens, the document exists in memory but isn't yet in a searchable segment. If you need a document immediately searchable in tests or specific workflows, use the ?refresh=true parameter on the index request — but don't use this in high-throughput production code as it forces a refresh on every write.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.