Senior 7 min · March 06, 2026

Neo4j Index Fragmentation — UUID Bulk Imports 10x Slowdown

Index fragmentation from random UUID bulk inserts slowed Neo4j lookups from about 20ms to 2000ms.

Naren · Founder
Quick Answer
  • Neo4j stores nodes and relationships as fixed-size records with direct pointers — no JOINs needed.
  • Cypher is declarative; the planner picks a strategy based on cardinality estimates.
  • B-tree indexes accelerate equality, range, and prefix lookups on node properties; full-text indexes handle tokenised string searches.
  • Missing or wrong indexes are a leading cause of slow production queries.
  • Memory allocation (page cache vs heap) directly impacts traversal speed.
  • Always use PROFILE to see actual row counts; EXPLAIN shows only the planner's estimates.
Plain-English First

Imagine every person in your school has a string connecting them to every friend, teacher, and club they belong to. A regular spreadsheet would need a massive lookup table just to find who knows who. Neo4j is the database that stores those strings directly — the connections ARE the data, not an afterthought. When you ask 'who are my friend's friends?', Neo4j just follows the strings instead of scanning millions of rows. That's the magic — no table scans, just pointer walks.

Most performance problems in production databases aren't caused by bad queries; they're caused by the wrong data model. When your application's core questions are about relationships (fraud rings, recommendation engines, access control graphs, supply chain dependencies), a relational database forces you to JOIN your way through the problem. Each extra hop adds another JOIN, and the cost of those JOIN chains climbs steeply with depth and table size, not because your DBA made a mistake, but because the relational model was never designed for highly connected data.

Neo4j solves this with a property graph model where relationships are first-class, physically stored citizens. Unlike a relational database that must compute relationships at query time via JOINs, Neo4j pre-materializes every relationship as a pointer in storage. Traversing a million-hop graph takes the same time per hop whether your database has 100 nodes or 100 billion — a property called index-free adjacency. This is the core architectural decision that makes Neo4j structurally different from every relational or document database you've used.

By the end of this article you'll understand how Neo4j stores data on disk, how Cypher queries are planned and executed, which index types to choose for different access patterns, where the real performance cliffs are in production, and the gotchas that routinely bite engineers who come from a relational background. You'll walk away able to design a graph schema, write production-quality Cypher, and explain Neo4j's internal architecture to an interviewer or a skeptical CTO.

Here's the thing: if you're migrating from PostgreSQL, you'll find Cypher's syntax refreshingly different and the index-free adjacency a game-changer for deep traversals.

Neo4j Graph Database Basics

Neo4j's property graph model stores entities as nodes and connections as relationships. Each node can have any number of key-value properties. Relationships are directed, named, and can also have properties. This model maps directly to how your brain thinks about connected data — people, transactions, places, events — and the paths between them.

When you run MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a,b, Neo4j doesn't perform a JOIN. It follows a pointer from node a to the relationship record, then to node b. That's it: a small, fixed number of record reads per hop.

Crucially, this means the cost of traversing a path is proportional to the number of relationships actually visited, not the total graph size. That's why a 10-hop query on a billion-node graph can stay fast as long as the fan-out at each hop is bounded; a high branching factor still multiplies the work. The trade-off? Writing data is more expensive because every relationship update must update multiple physical pointers. But for read-heavy graph workloads, it's a win.
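To make the "cost follows the relationships you visit, not the graph size" claim concrete, here is a minimal in-memory sketch. It is a toy model, not Neo4j's storage code: the `TraversalCost` class, its `Node` type, and both methods are hypothetical names invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: each node holds direct references to its neighbours,
// loosely mimicking index-free adjacency. Not Neo4j's actual implementation.
final class TraversalCost {

    static final class Node {
        final List<Node> out = new ArrayList<>();
    }

    /** Counts relationships followed while expanding up to maxHops from start. */
    static long relationshipsVisited(Node start, int maxHops) {
        long visited = 0;
        Set<Node> seen = new HashSet<>();
        Deque<Node> frontier = new ArrayDeque<>();
        frontier.add(start);
        seen.add(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Deque<Node> next = new ArrayDeque<>();
            for (Node n : frontier) {
                for (Node m : n.out) {
                    visited++;                    // one pointer followed
                    if (seen.add(m)) next.add(m); // expand each node once
                }
            }
            frontier = next;
        }
        return visited;
    }

    /** Builds start -> a -> b plus `unrelated` nodes that the walk never touches. */
    static long visitedWithUnrelatedNodes(int unrelated) {
        Node start = new Node(), a = new Node(), b = new Node();
        start.out.add(a);
        a.out.add(b);
        List<Node> noise = new ArrayList<>();
        for (int i = 0; i < unrelated; i++) noise.add(new Node());
        return relationshipsVisited(start, 2);
    }

    public static void main(String[] args) {
        System.out.println(visitedWithUnrelatedNodes(10));      // 2
        System.out.println(visitedWithUnrelatedNodes(100_000)); // 2
    }
}
```

The two-hop walk follows two relationships whether the graph holds ten extra nodes or a hundred thousand. What changes the count is fan-out: add more outgoing relationships to `start` or `a` and the number climbs accordingly.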

ForgeExample.java · DATABASE
// TheCodeForge: Neo4j Graph Database Basics example
// Always use meaningful names, not x or n
public class ForgeExample {
    public static void main(String[] args) {
        String topic = "Neo4j Graph Database Basics";
        System.out.println("Learning: " + topic + " 🔥");
    }
}
Output
Learning: Neo4j Graph Database Basics 🔥
Pointer, Not Lookup
  • Start node: 15 bytes, points to first relationship and first property.
  • Relationship: 34 bytes, includes type ID, next/prev for both directions.
  • Property chain: dynamic, each property record ~41 bytes plus key/value size.
  • Reading one relationship = one disk page (if cached, one memory access).
  • In a relational DB, one join = index lookup + B-tree traversal (multiple pages).
Production Insight
New users often treat Neo4j like SQL: they create indexes on every property and expect magic.
A missing index forces a NodeByLabelScan, which degrades linearly with data size.
Always profile your top 5 queries in staging before going live.
The #1 rookie mistake: assuming an index will be used just because it exists. The planner only picks an index when the label and predicate match it.
Key Takeaway
Understand the storage model before writing queries.
Indexes are not optional — they're the difference between 10ms and 10s.
Know your traversal patterns before schema design.
When to Use Neo4j vs Relational
If: Your queries involve deep traversals (> 3 hops) or variable-length paths
Use: Neo4j. The traversal cost per hop is constant, unlike relational JOIN chains that explode with depth.
If: Your data model is heavily normalized with many many-to-many relationships
Use: Neo4j is a strong fit. The graph model maps directly to the problem without bridge tables.
If: Your primary access pattern is single-record lookups by primary key
Use: Stick with relational. An index on PK in Postgres is as fast as Neo4j for that exact query, and simpler to operate.
If: You need ACID compliance and transactional writes across many entities
Use: Both work, but Neo4j achieves ACID with record-level locking. Ensure your write patterns don't create hot spots on single nodes.

Neo4j Storage Internals: How Nodes and Relationships Live on Disk

Neo4j's physical storage model is the foundation of its speed. Each node is stored as a fixed-size record (15 bytes, which includes the pointers to its first relationship and first property). Relationships are also fixed-size records (34 bytes) with start node ID, end node ID, relationship type, and pointers to the previous/next relationship for both nodes. This is the 'index-free adjacency': from any node you can walk all its relationships by following stored pointers (record IDs that translate directly into file offsets), not index or hash lookups. The property chain links to a separate property store where key-value pairs are stored as dynamic records.

This matters in production: a traversal of 1,000 relationships reads exactly 1,000 relationship records, regardless of total graph size. That's why graph queries stay fast as data grows: the cost per hop is constant. The downside? Storage is rigid. Every node occupies the same fixed-size slot no matter how many properties it has; the properties themselves live in a separate chain of property records, and large values spill into overflow records. Plan your property layout to avoid long chains that add extra reads.
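Fixed-size records are what make lookup by ID a matter of arithmetic rather than search. The sketch below illustrates the idea using the 15-byte and 34-byte record sizes quoted above; it ignores store-file headers and any format differences between Neo4j versions, and `RecordOffsets` is a hypothetical class name.

```java
// Sketch only: shows why fixed-size records allow O(1) addressing by ID.
// Record sizes are the ones quoted in the text; real store files may add headers.
final class RecordOffsets {
    static final int NODE_RECORD_BYTES = 15;
    static final int RELATIONSHIP_RECORD_BYTES = 34;

    /** Byte offset of a node record, assuming records are packed back to back. */
    static long nodeOffset(long nodeId) {
        return nodeId * NODE_RECORD_BYTES;
    }

    /** Byte offset of a relationship record under the same assumption. */
    static long relationshipOffset(long relationshipId) {
        return relationshipId * RELATIONSHIP_RECORD_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(nodeOffset(1_000_000));         // 15000000
        System.out.println(relationshipOffset(1_000_000)); // 34000000
    }
}
```

Because the offset is a single multiplication, locating record one million costs the same as locating record ten, which is the property the per-hop cost argument relies on.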

A common trap: storing an array of 10,000 IDs on a single node forces the property chain to span many overflow records. Each overflow read costs a disk I/O (or page cache miss). That one 'convenient' property can turn a 10ms traversal into a 500ms crawl.
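A rough way to reason about this trap is to estimate how many chained records a value needs. The sketch below is plain ceiling division; the payload size per record is passed in as a parameter because it depends on the store format and configuration, and the two values used in `main` (41 and 120 bytes) are illustrative assumptions, not measured figures. `OverflowChain` is a hypothetical class name.

```java
// Sketch only: estimates chain length for a large property value.
final class OverflowChain {

    /** Records needed to hold valueBytes when each record carries payloadBytes. */
    static long recordsNeeded(long valueBytes, int payloadBytes) {
        if (payloadBytes <= 0) throw new IllegalArgumentException("payloadBytes must be positive");
        return (valueBytes + payloadBytes - 1) / payloadBytes; // ceiling division
    }

    public static void main(String[] args) {
        long tenKb = 10 * 1024;
        System.out.println(recordsNeeded(tenKb, 41));  // 250
        System.out.println(recordsNeeded(tenKb, 120)); // 86
    }
}
```

Whatever the real payload size is on your store, the chain length grows linearly with the value size, and each link is a potential page cache miss.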

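io/thecodeforge/neo4j/NodeCreation.java · JAVA
package io.thecodeforge.neo4j;

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class NodeCreation {
    public static void main(String[] args) {
        // Driver and Session are AutoCloseable; one try-with-resources closes both.
        // Credentials are hard-coded for the example only.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // Parameters keep the query text constant so the plan can be reused.
            session.run("CREATE (u:User {name: $name, email: $email})",
                            Values.parameters("name", "Alice", "email", "alice@corp.com"))
                    .consume(); // wait for the write to finish
        }
    }
}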
Output
No console output. The store now holds one User node: a 15-byte node record plus its property chain.
Overflow Chain Trap
If you store a large array or long string on a node, Neo4j creates an overflow record chain. Each extra record can cost an additional page read. A 10KB property spans on the order of 250 chained records at roughly 41 bytes each (the exact count depends on the store's block size), so every retrieval of that property may touch hundreds of records. Move large data to external storage, or model it as separate connected nodes.
Production Insight
Node record size is fixed. Storing many small properties is fine, but one large string property forces an overflow chain (extra I/O).
Never store blobs in property values; use external storage and store a reference.
If the dynamic string and array store files (neostore.propertystore.db.strings and neostore.propertystore.db.arrays) grow much faster than the node count, large values are overflowing; consider redesigning the schema.
Check the size of those store files periodically.
Key Takeaway
Neo4j pre-links relationships as physical pointers.
Constant-time per hop, regardless of graph size.
Plan for property size to avoid overflow chains.
Respect record size limits — overflow kills performance.
Storage Decision: Fixed vs Variable Properties
If: You have nodes with 10+ properties and most are rarely queried
Use: A simpler model. Store frequently accessed properties on the node, and move rarely used ones to a separate 'profile' node connected by a HAS_PROFILE relationship.
If: Property values are consistently < 40 characters
Use: Native types (String, Integer, Double); they fit inline in the property record.
If: You need to store large text or binary data
Use: Never store it in a property. Use a reference (URL or object key) and fetch on demand.

Cypher Execution: How Neo4j Plans and Runs Your Queries

Cypher is a declarative query language, like SQL for graphs. When you send a Cypher query, three steps happen: parsing (syntax tree), semantic analysis (type/scope checking), and query planning. The planner reads the AST and builds a set of possible execution plans using graph statistics — label counts, degree distributions, index selectivity — to estimate cost. It picks the cheapest plan (by default). The plan is a tree of operators like NodeByLabelScan, NodeIndexScan, ExpandAll, Filter, Projection.

The planner uses a cost model based on cardinality estimates from stored statistics. Index samples are refreshed in the background once enough of an index has changed, or on demand with CALL db.resampleIndex('index_name') or CALL db.resampleOutdatedIndexes(). If statistics are stale, the planner may pick a terrible strategy. For example, if it thinks a label has 100 nodes but it actually has 10 million, scanning that label becomes catastrophic.

Execution happens in one of Neo4j's runtimes: interpreted, slotted, or (in Enterprise Edition) pipelined, which is usually the fastest. In production, use PROFILE to compare estimated vs actual rows. A 10x mismatch means stale stats or a bad query shape.

Here's a common trap: the planner cannot see correlations between properties. So WHERE n.city = 'Berlin' AND n.status = 'active' will multiply selectivities even if all active users are in Berlin. That leads to underestimates.
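The arithmetic behind that underestimate is easy to reproduce. The sketch below assumes a label with 1,000,000 nodes, a selectivity of 0.2 for city = 'Berlin' and 0.1 for status = 'active'; all three numbers, and the `CardinalityEstimate` class itself, are hypothetical values chosen for illustration. It models only the independence assumption, not the real planner.

```java
// Sketch only: multiplies selectivities as if predicates were independent.
final class CardinalityEstimate {

    /** Estimated rows when every predicate is assumed independent of the others. */
    static long independentEstimate(long labelCount, double... selectivities) {
        double rows = labelCount;
        for (double s : selectivities) {
            rows *= s;
        }
        return Math.round(rows);
    }

    public static void main(String[] args) {
        long users = 1_000_000;
        long estimated = independentEstimate(users, 0.2, 0.1);
        // If every active user lives in Berlin, the city predicate filters nothing
        // beyond what status already did, so the true count is users * 0.1.
        long actual = independentEstimate(users, 0.1);
        System.out.println(estimated); // 20000
        System.out.println(actual);    // 100000
    }
}
```

A fivefold gap like this is enough to flip the planner's choice between two otherwise similar plans, which is why comparing estimated and actual rows in PROFILE is worth the time.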

query_profile.cypher · CYPHER
// Step 1: See the plan without running (EXPLAIN)
EXPLAIN MATCH (u:User {email: 'alice@corp.com'}) RETURN u;

// Step 2: Execute and get actual row counts (PROFILE)
PROFILE MATCH (u:User {email: 'alice@corp.com'}) RETURN u;

// Output of PROFILE shows:
// +---------------+----------------+------+---------+----------------+
// | Operator      | Estimated Rows | Rows | DB Hits | Memory (Bytes) |
// +---------------+----------------+------+---------+----------------+
// | NodeIndexSeek | 1              | 1    | 2       | 10             |
// ...
Output
If Estimated Rows = 1 but actual Rows = 5000, your statistics are stale.
Cardinality Estimation Traps
The planner doesn't know about correlations between properties. For MATCH (u:User {city: 'Berlin', status: 'active'}), it multiplies selectivities (e.g., 0.2 * 0.1 = 0.02) even if all active users are in Berlin, in which case the true selectivity is 0.1. This leads to underestimates and bad index choices. Fix: restructure the query so the selective predicate anchors the match, or force index usage with a USING INDEX hint.
Production Insight
Stale stats cause the planner to pick NodeByLabelScan when an index exists.
This is a common cause of surprise production slowdowns.
Schedule CALL db.resampleOutdatedIndexes() after any bulk write (import, large delete).
Use PROFILE after schema changes to detect plan degradation.
Key Takeaway
EXPLAIN guesses, PROFILE tells the truth.
Stale statistics are the silent killer of query performance.
Force index hints only as a temporary escape hatch.
Plan Diagnosis
If: PROFILE shows NodeByLabelScan but you have an index
Use: Check predicate structure: does it use a function on the indexed property? Does the label match exactly? Is the index online?
If: Estimated rows vs actual rows differ by 100x or more
Use: Refresh index statistics with CALL db.resampleOutdatedIndexes(). If still off, consider a USING INDEX hint to override the planner.
If: ExpandAll operator has high DB hits (millions)
Use: You're traversing high-degree nodes. Add LIMIT and reorder the MATCH to filter earlier.

Indexes in Neo4j: Types, Use Cases and How to Choose

Neo4j offers four index types: B-tree (the default in 4.x; Neo4j 5 replaces it with range and point indexes), Full-Text, Lookup, and Text (for CONTAINS). B-tree indexes are the workhorse: they support equality, range, and prefix searches. Full-text indexes use Lucene under the hood for tokenised queries. Lookup indexes back scans by label (NodeByLabelScan) or relationship type (RelationshipTypeScan). Text indexes are a specialised variant for CONTAINS and ENDS WITH matching on strings.

You create indexes for labels-property pairs that appear in WHERE clauses. The index stores the property value in sorted order with a pointer to the node record. When you query with WHERE n.email = 'x', the planner can seek directly to the leaf page.

Composite indexes (multiple properties) are useful when queries always specify those properties together. Order matters: put the most selective property first. In production, check index state with CALL db.indexes() and track on-disk index size separately (the procedure does not report it): a fragmented B-tree index can double the number of leaf pages, degrading reads.
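The "sorted values with a pointer to the node" idea can be sketched with a sorted map from the Java standard library. This is an analogy for how equality and prefix seeks work on ordered keys, not Neo4j's index implementation; `SortedIndexSketch` is a hypothetical class, and unlike a real index it keeps only one node ID per value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: a sorted map standing in for an ordered property index.
final class SortedIndexSketch {
    private final NavigableMap<String, Long> entries = new TreeMap<>();

    /** Registers a property value and the ID of the node that holds it. */
    void put(String value, long nodeId) {
        entries.put(value, nodeId);
    }

    /** Equality seek: descends the sorted structure straight to the key. */
    Long seek(String value) {
        return entries.get(value);
    }

    /** Prefix seek: a contiguous slice of the sorted keys, returned in key order. */
    List<Long> prefix(String prefix) {
        String upperBound = prefix + Character.MAX_VALUE;
        return new ArrayList<>(entries.subMap(prefix, true, upperBound, false).values());
    }

    public static void main(String[] args) {
        SortedIndexSketch index = new SortedIndexSketch();
        index.put("alice@corp.com", 1L);
        index.put("bob@corp.com", 2L);
        index.put("alicia@corp.com", 3L);
        System.out.println(index.seek("bob@corp.com")); // 2
        System.out.println(index.prefix("ali"));        // [1, 3]
    }
}
```

Equality, range, and prefix predicates all map to one descent plus a contiguous scan, which is why they can use an ordered index. A predicate such as CONTAINS has no contiguous slice to scan, which is the gap text and full-text indexes fill.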

create_indexes.cypher · CYPHER
// B-tree index (default)
CREATE INDEX user_email_idx FOR (u:User) ON (u.email);

// Composite index: put the high-selectivity property first
CREATE INDEX user_city_status_idx FOR (u:User) ON (u.city, u.status);

// Full-text index for string searching
CREATE FULLTEXT INDEX user_name_ft FOR (u:User) ON EACH [u.name];

// Text index for CONTAINS (faster than full-text for exact substring)
CREATE TEXT INDEX user_bio_text FOR (u:User) ON (u.bio);

// Check index status
CALL db.indexes() YIELD name, state, type, labelsOrTypes, properties
RETURN name, state, type, labelsOrTypes, properties;
Output
All indexes created. Use `CALL db.indexes()` to verify ONLINE state.
One Index per Predicate Pattern
Don't index every property. Index only those used in high-volume WHERE predicates, as anchor points for relationship traversals, or in ORDER BY on string properties. Each index adds write overhead and occupies page cache memory.
Production Insight
Full-text indexes are updated synchronously by default, but they can be created in eventually consistent mode, which applies updates in a background thread.
In that mode, a query issued immediately after a write may miss results.
Use db.index.fulltext.awaitEventuallyConsistentIndexRefresh() before querying if consistency is critical.
Monitor on-disk index size growth alongside CALL db.indexes() to catch fragmentation.
Key Takeaway
Match index type to query pattern — not just existence.
Composite indexes save when predicates are always together.
Full-text indexes can be eventually consistent; know the trade-off.
Index Selection Guide
If: Query uses equality (=) or range (<, >) on a property
Use: A standard B-tree index on that property.
If: Query uses CONTAINS or ENDS WITH on a large string property
Use: A TEXT index (for CONTAINS) or FULLTEXT (for tokenised search).
If: Query filters on two properties that always appear together
Use: A composite B-tree index with the most selective property first.
If: You frequently scan all nodes of a label with no filter
Use: A lookup index (automatic) helps NodeByLabelScan, but consider adding a dummy property filtered by existence.

Production Performance Tuning: Memory, Cache, and Configuration

Neo4j runs on the JVM, so heap and garbage collection matter. Two critical memory pools: page cache (caches graph records from disk) and heap (query execution, transactions). The page cache should be large enough to fit your entire graph (or at least the hot set). Heap is for query results, transaction state, and JVM overhead.

Start with these settings:
  • dbms.memory.pagecache.size: aim for graph store size * 1.2 (headroom for growth), capped at whatever RAM remains after heap and OS overhead on a dedicated server.
  • dbms.memory.heap.max_size: a small heap such as 512M is too low for any production workload. Start at 4GB and monitor GC through the GC log (dbms.logs.gc.enabled=true) or JMX.
  • dbms.memory.heap.initial_size: set equal to max to avoid heap resizing pauses.
  • dbms.memory.transaction.max_size: cap per transaction to prevent runaway queries from OOMing the heap.

G1GC is the default and works well with large heaps. Watch for full GC pauses caused by the heap filling before a concurrent cycle finishes (increase heap or tune -XX:InitiatingHeapOccupancyPercent).

In production, use neo4j-admin memrec to get recommended memory settings based on your store size.
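The sizing rule above (store size times 1.2, but never more than RAM minus heap and OS overhead) can be written down as a small calculation. This is a back-of-the-envelope sketch, not a substitute for neo4j-admin memrec; the `MemorySizing` class and the `osReserveGb` parameter are hypothetical, and the 8GB reserve used in `main` is an assumption chosen to cover the OS and off-heap allocations.

```java
// Sketch only: page cache = min(store * 1.2, RAM - heap - OS reserve).
final class MemorySizing {

    static long pageCacheGb(long ramGb, long heapGb, long osReserveGb, long storeGb) {
        long wanted = Math.round(storeGb * 1.2);                      // store plus growth headroom
        long available = Math.max(0, ramGb - heapGb - osReserveGb);   // what physical RAM allows
        return Math.min(wanted, available);
    }

    public static void main(String[] args) {
        // 32GB server, 4GB heap, 8GB reserved, 200GB store: RAM is the limit.
        System.out.println(pageCacheGb(32, 4, 8, 200)); // 20
        // 64GB server, 8GB heap, 6GB reserved, 10GB store: the store is the limit.
        System.out.println(pageCacheGb(64, 8, 6, 10));  // 12
    }
}
```

When the first case applies, the graph does not fit in memory and the page cache can only cover the working set, so hit ratio monitoring becomes the signal to watch.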

neo4j.conf · PROPERTIES
# TheCodeForge: Production Neo4j Memory Configuration (Neo4j 4.x setting names)
# Example for a 32GB RAM server with 200GB store

dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=4G
dbms.memory.pagecache.size=20G

# Cap the memory a single transaction may use
dbms.memory.transaction.max_size=512m

dbms.memory.off_heap.max_size=2G

# Cap the memory used by all transactions combined so queries cannot consume all heap
dbms.memory.transaction.global_max_size=2g

# G1GC tuning (G1 is already the default collector)
dbms.jvm.additional=-XX:MaxGCPauseMillis=100
dbms.jvm.additional=-XX:G1HeapRegionSize=32m
Output
Restart Neo4j to apply. Verify memory usage with `CALL dbms.listConfig() YIELD name, value WHERE name CONTAINS 'memory' RETURN name, value`.
Swap Is the Enemy
If your page cache exceeds physical RAM, the OS swaps. Swapping destroys performance — graph traversals become disk-bound. Never set page cache size above available RAM minus heap and OS overhead. Use free -m to check before deploying.
Production Insight
A G1 full GC (often logged as "to-space exhausted") happens when the heap fills before the concurrent GC cycle finishes.
Solution: increase heap size or lower -XX:InitiatingHeapOccupancyPercent (default 45).
Monitor GC logs with gcviewer or export via JMX to Prometheus.
Use neo4j-admin memrec for baseline recommendations.
Also monitor page fault rate with perf stat -e major-faults,minor-faults to see if page cache is too small.
Key Takeaway
Page cache > heap for graph workloads.
G1GC needs tuning for large heaps.
Swap kills performance — stay in physical memory.
Memory Allocation Decision
If: Your graph fits entirely in page cache (hot set < RAM)
Use: Set page cache to fit all store files. Queries will run at memory speed.
If: Graph is larger than available RAM
Use: Set page cache to cover the working set. Use dbms.memory.pagecache.warmup.enabled=true to load hot pages on startup.
If: Frequent OutOfMemoryErrors during complex queries
Use: Reduce dbms.memory.transaction.max_size, add LIMIT on queries, and consider splitting large traversals into batches.

Common Production Gotchas: Mistakes That Sabotage Neo4j Performance

Even with perfect schema and indexes, several patterns routinely cause production pain:

  1. Accidental Cartesian Products: When a MATCH contains disconnected patterns, the planner generates a cross product. For example, MATCH (a:User), (b:User) without a relationship returns N*N rows. Always verify with PROFILE: a CartesianProduct operator and a huge DB Hits spike are the clues.
  2. Unbounded Variable-Length Paths: MATCH (x)-[*]->(y) without a bound can traverse the entire graph, exhausting heap. Always specify a range: [*1..5].
  3. Stale Statistics: Already discussed, but note that index samples are only refreshed after enough of the index has changed, so a large DELETE or import can leave estimates off for a while. Schedule a periodic CALL db.resampleOutdatedIndexes().
  4. Large Property Lists: Storing an array of 10,000 IDs on a node looks convenient but causes massive property record chains. Normalise into separate relationship-connected nodes.
  5. Over-indexing: Too many indexes increase write latency and page cache pressure. An index for every property is wasteful. Index only the predicates used in hot queries.
  6. Not using batch operations for large imports: Using separate CREATE statements for each node/relationship causes massive transaction overhead. Use UNWIND or the LOAD CSV command for bulk imports.
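For the last item, the client-side half of a batched import is just splitting the input into fixed-size chunks, each of which would be sent as the list parameter of one UNWIND statement. The sketch below shows only that splitting step and does not talk to Neo4j; `BatchSplitter` and the batch size of 1,000 are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: splits rows into batches suitable for one UNWIND call each.
final class BatchSplitter {

    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            out.add(new ArrayList<>(rows.subList(from, to))); // copy so batches are independent
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) rows.add(i);
        for (List<Integer> batch : batches(rows, 1000)) {
            System.out.println(batch.size()); // 1000, 1000, 500
        }
    }
}
```

Each batch then becomes one transaction instead of a thousand, which removes most of the per-statement overhead. The right batch size depends on row width and heap, so treat 1,000 as a starting point to tune.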
gotchas.cypher · CYPHER
// Gotcha 1: Cartesian product (DON'T)
MATCH (u:User), (p:Product)
WHERE u.email = 'alice@corp.com'
RETURN u, p;

// Fix: Add relationship
MATCH (u:User)-[:BOUGHT]->(p:Product)
WHERE u.email = 'alice@corp.com'
RETURN u, p;

// Gotcha 2: Unbounded var-length path (DON'T)
MATCH (a)-[*]->(b)
RETURN a, b;

// Fix: Always specify max depth
MATCH (a)-[*1..5]->(b)
RETURN a, b;

// Gotcha 3: Checking for index being used
PROFILE MATCH (u:User {email: 'alice@corp.com'}) RETURN u;
// Look for NodeIndexSeek in plan

// Gotcha 4: Slow bulk import (DON'T)
CREATE (:User {name: 'Alice'});
CREATE (:User {name: 'Bob'});
// ... 10,000 separate CREATEs

// Fix: Use UNWIND for batch insert ($users is a list of maps passed as a parameter)
UNWIND $users AS user
CREATE (:User {name: user.name});
Output
With PROFILE, compare estimated and actual rows. If rows = product of label sizes, you have a Cartesian.
The Single-Query Guard
Before deploying any query, run PROFILE on it against realistic data. Check for CartesianProduct operators that indicate unintended cross products. Also verify that the estimated rows match the actual rows within 2x.
Production Insight
The costliest mistake is deploying a query that worked in dev (with 1k nodes) but explodes in prod (1M nodes).
Always test queries against production-scale data, even if it's a restored subset.
Use LIMIT in development to cap accidental explosions.
Automate PROFILE checks in CI pipeline.
Key Takeaway
Unbounded paths are time bombs.
Always LIMIT your queries in production.
Test at scale before deploying.
Query Safety Pre-flight
If: Query takes > 1s in production but was < 10ms in dev
Use: Check for Cartesian product or unbounded traversal. Add LIMIT and re-evaluate.
If: After bulk insert, previously fast queries slow down
Use: Check index fragmentation (index size against entry count) and statistics (db.stats.retrieve('GRAPH COUNTS')). Rebuild the index if needed.
If: Write performance degraded after adding indexes
Use: Too many indexes slow writes. List indexes with db.indexes(), correlate them with your query patterns, and drop the unused ones.

Graph Data Modeling Best Practices for Production

Good graph modeling is the difference between a smooth production system and a tangled mess. Three rules: avoid supernodes (nodes with tens of thousands of relationships), model actions as relationships not properties, and use labels to group nodes logically.

A supernode, like an 'Everyone' node connected to all users, kills traversal performance because ExpandAll on that node reads millions of relationships. Solution: break it into domain-specific star nodes or use index-assisted lookups instead of direct traversal.
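Spotting supernodes comes down to counting relationships per node and flagging the outliers. The sketch below does that over a plain in-memory edge list; in practice the degrees would come from the database, and both the `SupernodeFinder` class and the threshold you pass in are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: counts degree per node from {startId, endId} pairs.
final class SupernodeFinder {

    /** Returns the IDs of nodes whose degree exceeds threshold, in ascending order. */
    static List<Long> supernodes(long[][] relationships, int threshold) {
        Map<Long, Integer> degree = new HashMap<>();
        for (long[] r : relationships) {
            degree.merge(r[0], 1, Integer::sum); // outgoing side
            degree.merge(r[1], 1, Integer::sum); // incoming side
        }
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : degree.entrySet()) {
            if (e.getValue() > threshold) result.add(e.getKey());
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        long[][] rels = { {0, 1}, {0, 2}, {0, 3}, {1, 2} };
        System.out.println(supernodes(rels, 2)); // [0]
    }
}
```

Node 0 touches three relationships and is the only one above the threshold of 2. On a real graph the same check with a threshold in the thousands points at the nodes worth remodelling.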

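Modeling tip: if you find yourself storing 'transaction_date' as a node property and then querying by day, consider making 'Date' a node and connecting transactions to it. That turns a property filter into a relationship traversal, which can be more natural for time-series patterns. Keep an eye on the degree of each Date node, though: a busy day can itself become a supernode, and an indexed date property is often the better choice for range queries.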

Also, use constraints to enforce schema at the database level, for example a uniqueness constraint: CREATE CONSTRAINT FOR (u:User) REQUIRE u.email IS UNIQUE. This also creates an index: two birds with one stone.

io/thecodeforge/neo4j/ModelingBestPractices.cypher · CYPHER
// Avoid supernodes: don't connect all users to a single 'AllUsers' node
// Instead, use label-based indexes

// Good: enforce uniqueness and create index
CREATE CONSTRAINT user_email_unique IF NOT EXISTS FOR (u:User) REQUIRE u.email IS UNIQUE;

// Model time as a node for date lookups
// MERGE avoids duplicate Date nodes; WITH is required before the following MATCH
MERGE (d:Date {date: '2026-01-01'})
WITH d
MATCH (t:Transaction {date: '2026-01-01'})
MERGE (t)-[:OCCURRED_ON]->(d);

// Query: all transactions on specific date
MATCH (t:Transaction)-[:OCCURRED_ON]->(d:Date {date: '2026-01-01'})
RETURN t;
Output
Constraint created. Time node used for efficient traversal.
Supernode Watch
If a single node has more than 10,000 relationships, you have a supernode. Profile your ExpandAll operator — if DB hits are in the millions, that node is the culprit. Remodel to distribute the degree.
Production Insight
Supernodes are the silent killer of graph performance — one node with 100k relationships can slow every traversal through it.
Monitor using MATCH (n) RETURN labels(n), size((n)--()) as deg ORDER BY deg DESC LIMIT 10.
If you see a node with degree > 10k, redesign your model.
Key Takeaway
Avoid supernodes; use labels and constraints.
Model connections as relationships, not properties.
Test your model against production data volume before deploying.
Modeling Choice: Property vs Relationship
If: You need to query by the value frequently
Use: Store as property and create an index. The index seek is fast.
If: The value represents a connection between two entities
Use: Model as a relationship. That's what the graph is for.
If: You have a time series or hierarchy
Use: Model as separate nodes chained by relationships. Enables efficient traversal without property scans.

Monitoring and Alerting for Neo4j Production

Even with a well-tuned graph, production incidents happen. You need visibility into four key areas: query performance, index health, memory pressure, and replication lag (if clustered).

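For query performance, set up a Prometheus exporter and capture query execution time and query memory metrics (shown here as neo4j_query_execution_time and neo4j_query_memory; exact names depend on your Neo4j version and exporter configuration). Create alerts for queries that exceed 500ms p99. Use CALL dbms.listQueries() to capture slow queries while they are still running.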

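Index health: track each index's size-to-entries ratio. The examples here read it from a db.index.status() procedure; stock Neo4j does not ship a procedure with that name, so treat it as a custom procedure or script that reports index size and entry count. A ratio above 1.5 indicates fragmentation. Alert on that.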

Memory: track page cache hit ratio (neo4j_page_cache_hits / total). A ratio below 99% means you need more page cache or a smaller hot set.

Log tailing: grep for 'OutOfMemoryError' in /var/log/neo4j/debug.log to catch OOMs early. Where the JMX-over-HTTP endpoint is available, use it for real-time metrics: GET /db/manage/server/jmx/domain/org.neo4j/bean%3Aname%3DPageCache.

monitor.sh · BASH
#!/bin/bash
# TheCodeForge: Neo4j Monitoring Script
# Capture key metrics every 60 seconds.
# Assumes the JMX-over-HTTP endpoint and the bean names below are available on your
# Neo4j version, and that db.index.status() is a custom procedure; adjust both as needed.

LOG=/var/log/neo4j_monitor.log

while true; do
  # Query performance: p99 latency
  echo "--- $(date) ---" >> "$LOG"
  curl -s "http://localhost:7474/db/manage/server/jmx/domain/org.neo4j/bean%3Aname%3DQueryExecution" | jq '.beans[].queryExecutionTime.p99' >> "$LOG"

  # Page cache hit ratio
  curl -s "http://localhost:7474/db/manage/server/jmx/domain/org.neo4j/bean%3Aname%3DPageCache" | jq '.beans[].hitRatio' >> "$LOG"

  # Index fragmentation check (requires admin authentication; password read from NEO4J_PASSWORD)
  # toFloat() avoids integer division when comparing the ratio against 1.5
  cypher-shell -u neo4j -p "$NEO4J_PASSWORD" "CALL db.index.status() YIELD index_name, size, num_entries WHERE toFloat(size) / num_entries > 1.5 RETURN index_name" >> "$LOG"

  sleep 60
done
Output
Logs captured. Use Grafana dashboards with Prometheus for real-time alerts.
Key Metrics Dashboard
Focus on three panels: Query Latency (p50, p95, p99), Page Cache Hit Ratio, and Index Fragmentation Score. If fragmentation exceeds 1.5 or hit ratio drops below 99%, page the on-call engineer.
Production Insight
Most teams don't monitor index fragmentation until a 'sudden' slowdown triggers an incident.
Set up proactive alerts: if index size grows by 20% in a day without corresponding data growth, investigate.
Use neo4j-admin check-consistency weekly to catch store corruption early.
Replication lag in clusters: if more than 10 seconds behind, your read replicas are stale — redirect reads to the leader.
Key Takeaway
Monitor page cache hit ratio — it's your canary.
Index fragmentation is invisible until it bites.
Alert on query p99, not average — p99 protects your users.
Alert Priority Triage
If: Page cache hit ratio < 90%
Use: Critical. The application is disk-bound. Increase page cache or reduce load immediately.
If: Index fragmentation > 2.0 AND query latency > 1s
Use: High. Drop and recreate the fragmented index. Schedule during low-write window.
If: p99 query latency > 500ms but no index issues
Use: Medium. Profile top queries; likely stale statistics or a high-degree node traversal.
If: Replication lag > 30s
Use: Medium. Check network bandwidth or leader write load. Consider adding read replicas.
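The triage table above translates directly into a small rule function, which is handy if alerts are evaluated in code rather than in a dashboard. The sketch below encodes the four rules in priority order; `AlertTriage`, its method names, and the input units are hypothetical, and the case the table does not cover (fragmentation above 2.0 with p99 between 500ms and 1s) falls through to the medium rule here by assumption.

```java
// Sketch only: encodes the alert triage rules in priority order.
final class AlertTriage {

    enum Priority { CRITICAL, HIGH, MEDIUM, OK }

    /** Page cache hit ratio from raw counters; an idle cache counts as fully warm. */
    static double hitRatio(long hits, long faults) {
        long total = hits + faults;
        return total == 0 ? 1.0 : (double) hits / total;
    }

    static Priority classify(double hitRatio, double fragmentation,
                             double p99Millis, double replicationLagSeconds) {
        if (hitRatio < 0.90) return Priority.CRITICAL;                       // disk-bound
        if (fragmentation > 2.0 && p99Millis > 1000) return Priority.HIGH;   // rebuild the index
        if (p99Millis > 500 || replicationLagSeconds > 30) return Priority.MEDIUM;
        return Priority.OK;
    }

    public static void main(String[] args) {
        System.out.println(classify(hitRatio(850, 150), 1.0, 100, 0)); // CRITICAL
        System.out.println(classify(0.995, 2.5, 1500, 0));             // HIGH
        System.out.println(classify(0.995, 1.0, 600, 0));              // MEDIUM
        System.out.println(classify(0.995, 1.0, 100, 0));              // OK
    }
}
```

Ordering the checks from most to least severe means one incident raises one alert at the right level instead of several overlapping ones.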
● Production incident · POST-MORTEM · severity: high

Index Fragmentation Slowed Read Queries 10x in a Recommendation Engine

Symptom
Cypher queries with node lookups by property (e.g., MATCH (u:User {email: $email})) degraded from ~20ms to ~2000ms. Other queries unaffected.
Assumption
Assumed the index was being used correctly because it existed. No one checked index fragmentation after the bulk load.
Root cause
The bulk import inserted 10 million users keyed by random UUIDs. Because random keys land all over the B-tree, the inserts caused constant page splits and left Neo4j's default B-tree index heavily fragmented: leaf pages were sparsely filled, causing excessive disk reads per lookup.
Fix
Dropped and recreated the affected indexes (DROP INDEX followed by CREATE INDEX ... IF NOT EXISTS), then waited for them to come ONLINE with CALL db.awaitIndexes(). Then switched to sequential internal IDs for bulk loads by using db.ids.reuse_types_over_deleted_nodes configuration.
Key lesson
  • Index fragmentation happens silently. Track index size against entry count (the db.index.status() calls in this write-up stand for a custom procedure; stock Neo4j does not provide one).
  • Prefer sequential IDs (like auto-increment or timestamp-based) for bulk inserts to reduce fragmentation.
  • Always rebuild indexes after large bulk loads, especially for high-selectivity properties used in lookups.
  • Use PROFILE regularly, but remember that the query plan won't tell you about physical index health.
Production debug guide
Symptom → Action for the four most common production issues
Symptom · 01
Query is slow but EXPLAIN shows index usage
Fix
Switch to PROFILE to compare estimated vs actual rows. Large disparity means cardinality estimates are off. Inspect the current counts with db.stats.retrieve('GRAPH COUNTS') and refresh index samples with db.resampleOutdatedIndexes().
Symptom · 02
Index exists but is not used by the planner
Fix
Check predicate shape: the indexed property must be compared directly (equality, IN, range, or STARTS WITH), not wrapped in functions like toUpper() or substring(). Also ensure the label is present in the MATCH pattern.
Symptom · 03
Out of memory during large traversal
Fix
Raise dbms.memory.heap.max_size if the heap is undersized, and check dbms.memory.pagecache.size. For traversals, throttle with LIMIT and use UNWIND to batch. Check for accidental cartesian products in the query.
Symptom · 04
Full scan on label despite existing index
Fix
Verify the index is ONLINE via CALL db.indexes(). Check if the predicate uses a function (e.g., toUpper) or if the type is wrong. Force index usage with USING INDEX as a temporary measure.
★ Quick Reference: Debugging Slow Cypher Queries
Start here when production queries degrade. Each row is a symptom with commands to run.
Single node lookup is slow (<100ms becomes >500ms)
Immediate action
Check index existence and fragmentation
Commands
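CALL db.indexes() YIELD name, state, type WHERE state = 'ONLINE' RETURN name, state, type
CALL db.index.status('index_name') YIELD index_name, num_entries, size RETURN index_name, num_entries, size  // custom procedure, not part of stock Neo4j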
Fix now
Drop and recreate the index: DROP INDEX index_name; CREATE INDEX FOR (n:Label) ON (n.prop)
Cardinality mismatch: PROFILE shows rows 100x estimate
Immediate action
Refresh graph statistics
Commands
CALL db.stats.retrieve('GRAPH COUNTS')
CALL db.resampleOutdatedIndexes()
Fix now
Run CALL db.resampleIndex('index_name') on the affected index, then CALL db.prepareForReplanning() so cached plans are rebuilt
Query runs out of heap memory (OOM)
Immediate action
Identify the query that consumed memory
Commands
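CALL dbms.listQueries() YIELD queryId, query, elapsedTimeMillis, allocatedBytes RETURN queryId, query, elapsedTimeMillis, allocatedBytes ORDER BY allocatedBytes DESC
CALL dbms.killQuery('<queryId from the previous result>')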
Fix now
Add LIMIT clause and ensure no unbounded variable-length paths
Full scan on large label (NodeByLabelScan shown in PROFILE)
Immediate action
Determine if existing index could match the predicate
Commands
CALL db.labels() YIELD label WHERE size([index in db.indexes() WHERE index.labels[0]=label]) = 0
EXPLAIN MATCH (n:Label) WHERE n.prop = 'value'
Fix now
Create index: CREATE INDEX FOR (n:Label) ON (n.prop)
Write performance degraded after adding indexes
Immediate action
Audit index usage and drop unused ones
Commands
CALL db.indexes() YIELD name, state, type, labelsOrTypes, properties
PROFILE MATCH (n:Label) WHERE n.prop = 'value' RETURN n LIMIT 1
Fix now
Drop unused indexes: DROP INDEX unused_index_name
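The drop-and-recreate fix from the first row can be sketched as a short sequence. It assumes Neo4j 4.2 or later and a hypothetical index named user_id on :User(id); adjust the names to your own schema. While the index is being rebuilt, lookups on that property fall back to label scans, so this is best run in a quiet window.

```cypher
// 1. Confirm which indexes cover the label and whether they are ONLINE.
SHOW INDEXES YIELD name, state, populationPercent, labelsOrTypes, properties
WHERE 'User' IN labelsOrTypes;

// 2. Rebuild the index (user_id is an assumed name).
DROP INDEX user_id IF EXISTS;
CREATE INDEX user_id FOR (u:User) ON (u.id);

// 3. Block until population finishes; the argument is a timeout in seconds.
CALL db.awaitIndexes(300);

// 4. Re-measure the lookup and compare db hits against the earlier PROFILE.
PROFILE
MATCH (u:User {id: $id})
RETURN u;
```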
Neo4j Key Concepts Comparison
Concept | Use Case | Example
Neo4j Graph Database Basics | Core usage | See code above
Index-Free Adjacency | Fast graph traversal | 10k hops = 10k pointer dereferences
B-tree Index | Equality/range lookups | CREATE INDEX FOR (n:User) ON (n.email)
Full-Text Index | Tokenised search | CREATE FULLTEXT INDEX FOR (n:User) ON EACH [n.name]
Page Cache | Caching graph records | dbms.memory.pagecache.size=20G

Key takeaways

1
Neo4j stores relationships as physical pointers, enabling constant-time traversal per hop regardless of graph size.
2
Cypher query planning depends on cardinality estimates. Stale statistics are a leading cause of bad plans, alongside missing or mismatched indexes.
3
B-tree indexes for equality/range, Full-Text for search, TEXT for CONTAINS. Match index type to your predicate.
4
Memory configuration: the page cache should dominate heap (roughly 80% of RAM as a starting point). Page cache plus heap must never exceed physical RAM, and some headroom has to remain for the OS.
5
Always use PROFILE to validate plans before deploying to production.
6
Unbounded variable-length paths are production-time bombs—always specify a max depth.
7
Index fragmentation is silent—monitor and rebuild after bulk loads.
8
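Supernodes make every traversal through them expensive, because expanding one touches all of its relationships, and the cost multiplies with each further hop. Design your graph to avoid them.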
9
Monitor the page cache hit ratio: a value below 99% means you need more memory or a smaller hot set.
10
Set up alerts on query p99 latency, not average—average hides the outliers that kill user experience.

Common mistakes to avoid

9 patterns

Memorising syntax before understanding the concept

Symptom
You can recite API calls but cannot design a schema for a production scenario. Your queries work in tutorials but fail under real data distributions.
Fix
Start every new concept by asking: 'What problem does this solve in production?' Write a test case that fails before you look up the syntax.

Skipping practice and only reading theory

Symptom
You understand the concepts but freeze when facing a real performance problem. Your theoretical knowledge doesn't translate to debugging.
Fix
After reading each section, run the example code locally. Modify it. Break it. Fix it. Practice is the only way to internalize.

Assuming the planner will use an index just because one exists

Symptom
Queries suddenly slow after data growth; PROFILE shows NodeByLabelScan instead of index seek.
Fix
Always verify index existence and predicate shape. Use USING INDEX in the query as a temporary hint, but fix the underlying issue (stale stats or missing index).

Creating indexes on every property without considering query patterns

Symptom
Write throughput drops drastically; page cache fills with index data instead of graph data.
Fix
Audit indexes with CALL db.indexes() (SHOW INDEXES in Neo4j 4.2 and later) and compare them against the predicates that appear in your query logs. Drop indexes that no WHERE clause uses. Index only the predicates in your 10 most critical queries.

Not limiting variable-length path ranges

Symptom
Query runs out of memory on production graphs with high degree nodes (e.g., 'MATCH (a)-[*]->(b)').
Fix
Always specify a max depth: [*1..5]. If unbounded is truly needed, use breadth-first traversal via shortestPath or allShortestPaths.
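A bounded version of the pattern above might look like this. It is a sketch only: the FOLLOWS relationship type and the $id, $fromId and $toId parameters are hypothetical.

```cypher
// Risky: no relationship type, no depth limit, no LIMIT.
// MATCH (a)-[*]->(b) RETURN b

// Bounded: explicit relationship type, max depth of 5, capped result size.
MATCH (a:User {id: $id})-[:FOLLOWS*1..5]->(b:User)
RETURN DISTINCT b
LIMIT 100;

// When only connectivity matters, shortestPath with an upper bound
// avoids enumerating every possible path.
MATCH (a:User {id: $fromId}), (b:User {id: $toId})
MATCH p = shortestPath((a)-[:FOLLOWS*..10]-(b))
RETURN p;
```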

Overlooking index fragmentation after bulk imports

Symptom
Lookup queries degrade 5–10x after large inserts, even though indexes exist.
Fix
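Neo4j does not report index fragmentation through a standard procedure, so measure it indirectly: PROFILE the same lookup before and after the load and compare db hits, page cache misses, and latency. If lookups degraded, drop and recreate the index. Prefer sequential or time-ordered IDs over random UUIDs for bulk loads.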

Not adjusting page cache when adding new data or increasing RAM

Symptom
After adding 100GB of new data, previously fast queries become disk-bound.
Fix
Check free memory and increase dbms.memory.pagecache.size accordingly. Use neo4j-admin memrec (neo4j-admin server memory-recommendation in Neo4j 5) for recommendations.

Ignoring supernodes during schema design

Symptom
Traversals that pass through a central node (e.g., 'all users') become extremely slow; PROFILE shows millions of DB hits on ExpandAll.
Fix
Run a degree check query to identify high-degree nodes. Remodel to distribute connections across domain-specific nodes or use index-based lookups instead of full traversal.
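One way to write the degree check is shown below. It is a sketch that works across Neo4j versions but expands every relationship, so run it off-peak on large graphs. The :User label and the 10000 threshold are assumptions to adapt.

```cypher
// List the highest-degree nodes for one label.
// In Neo4j 5, COUNT { (n)--() } is an alternative way to get the degree
// without materialising each relationship in the query.
MATCH (n:User)-[r]-()
WITH n, count(r) AS degree
WHERE degree > 10000
RETURN n.id AS id, labels(n) AS labels, degree
ORDER BY degree DESC
LIMIT 20;
```

Nodes that show up here are candidates for remodelling, for example by splitting one hub into several domain-specific nodes.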

Not using batch operations for large imports

Symptom
Bulk insert of millions of nodes/relationships takes hours instead of minutes; transaction log grows excessively.
Fix
Use UNWIND with parameter arrays, or LOAD CSV with USING PERIODIC COMMIT (CALL { ... } IN TRANSACTIONS from Neo4j 4.4 onward). Avoid issuing individual CREATE statements in a client-side loop.
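Both batching styles can be sketched as follows. The $rows parameter, the users.csv file and the id/email columns are hypothetical; the second query assumes Neo4j 4.4 or later and must run in an auto-commit transaction (prefix it with :auto in Neo4j Browser).

```cypher
// Client-side batching: send a few thousand rows per call as one parameter,
// e.g. $rows = [{id: 1, email: 'a@example.com'}, ...].
UNWIND $rows AS row
MERGE (u:User {id: row.id})
SET u.email = row.email;

// Server-side batching: commit every 10000 rows while streaming a CSV file.
// CSV values arrive as strings, so convert numeric IDs explicitly.
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
CALL {
  WITH row
  MERGE (u:User {id: toInteger(row.id)})
  SET u.email = row.email
} IN TRANSACTIONS OF 10000 ROWS;
```

An index or uniqueness constraint on :User(id) should exist before the load; without one, each MERGE has to scan the label to check for an existing node.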
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
Explain how index-free adjacency works in Neo4j and why it matters for p...
Q02 · SENIOR
How do you debug a Cypher query that suddenly becomes slow in production...
Q03 · SENIOR
What is the difference between B-tree and Full-Text indexes in Neo4j? Wh...
Q04 · SENIOR
How does Neo4j handle concurrent writes? Explain the locking strategy.
Q05 · JUNIOR
What is a node in Neo4j and how is it different from a row in a relation...
Q06 · SENIOR
Explain how Neo4j's page cache interacts with the operating system's pag...
Q07 · SENIOR
How do you detect and fix a supernode in a production graph?
Q08 · SENIOR
How do you handle read replicas in a Neo4j cluster?
Q01 of 08 · SENIOR

Explain how index-free adjacency works in Neo4j and why it matters for performance.

ANSWER
Index-free adjacency means each node and relationship record stores direct pointers (IDs) to its connected relationships. To traverse from one node to its neighbor, Neo4j reads the relationship record directly using the pointer—no B-tree lookup or hash index required. This makes traversal cost per hop constant regardless of graph size. In production, this allows queries that navigate many hops (e.g., 10 hops) to stay fast even as the graph grows to billions of nodes. The trade-off is that storage is rigid (fixed record sizes) and write operations must update multiple pointers, but reads benefit enormously.
FAQ · 8 QUESTIONS

Frequently Asked Questions

01
Is Neo4j faster than a relational database for all types of queries?
02
When should I use a composite index in Neo4j?
03
How do I handle a query that returns too many results and causes OOM?
04
What is the best way to back up a Neo4j database?
05
Can Neo4j run on multiple machines (clustering)?
06
What are the symptoms of a supernode and how do you fix it?
07
How do you monitor Neo4j health in production?
08
What is the difference between EXPLAIN and PROFILE?