Imagine you're trying to find 'friends of friends who live in London and like jazz' in a spreadsheet. You'd need multiple VLOOKUPs, joins, and temp tables — and it'd take 20 minutes to write. In a graph database, you draw the pattern: (me)-[:FRIEND]->(friend)-[:FRIEND]->(fof) WHERE fof.city = 'London' AND fof.likes = 'Jazz'. The database walks the graph like following breadcrumbs — it doesn't scan tables, it hops from node to node. That's Cypher: a language that lets you describe the shape of the data you want, and the engine finds every instance of that shape in the graph.
The key difference from SQL: in SQL, relationships are inferred through foreign keys and reconstructed with joins at query time. In Cypher, relationships are first-class citizens — they're stored directly on disk as physical connections between nodes. When you traverse from one node to another, you're following a pointer, not scanning a join table. This is why a 6-level deep friend-of-friend query in SQL takes minutes but in Cypher takes milliseconds.
A recommendation engine I built for an e-commerce platform was generating 'customers who bought X also bought Y' queries using SQL. With 2 million products and 50 million purchase records, the query took 45 seconds — too slow for real-time recommendations on the product page. We rewrote the data layer in Neo4j with Cypher. Same logic, same data. The query dropped to 12 milliseconds. Not 12 seconds — 12 milliseconds. The difference wasn't better hardware or clever caching. It was that the graph stored relationships as direct pointers, so finding 'people who bought this product and what else they bought' was a two-hop traversal instead of a three-table join with a GROUP BY on 50 million rows.
Cypher is the query language for Neo4j — the most widely-used graph database. It's declarative (you describe what you want, not how to get it), pattern-based (you draw the shape of your data using ASCII art), and designed specifically for traversing relationships. If you've written SQL, the mental model shift is: stop thinking in tables and joins, start thinking in nodes and paths.
This guide covers every clause you'll use daily — CREATE, MATCH, WHERE, RETURN, SET, DELETE, MERGE, WITH, UNWIND, and CALL — with production examples from real systems. We'll also cover variable-length paths, aggregation, subqueries, indexing, and the six mistakes that cause the most pain in production.
Cypher Syntax Fundamentals: Nodes, Relationships, and Patterns
Before writing any query, you need to understand the three building blocks: nodes, relationships, and properties. Everything in Cypher is built on these.
Nodes represent entities — people, products, cities, orders. Written as (variable:Label {properties}). The parentheses look like circles (nodes in a graph). The label is a type tag (like a table name). Properties are key-value pairs stored on the node.
Relationships connect nodes — FRIEND, PURCHASED, LOCATED_IN. Written as -[variable:TYPE {properties}]->. The arrow indicates direction. Relationships always have exactly one type and exactly one direction. They can also store properties (like a 'since' date on a FRIEND relationship, or a 'quantity' on a PURCHASED relationship).
Patterns combine nodes and relationships into shapes: (a:Person)-[:FRIEND]->(b:Person). This is the core of Cypher — you write a pattern that describes the shape of data you want, and the engine finds every instance of that pattern in the graph.
Variables bind to nodes or relationships so you can reference them later in WHERE, RETURN, or WITH. If you don't need to reference something, you can leave it anonymous: (a)-[:FRIEND]->() finds all people that 'a' is friends with, without naming the destination.
Three rules that save you hours of debugging: 1. Node labels start with uppercase: :Person, :Product 2. Relationship types start with uppercase: :FRIEND, :PURCHASED 3. Property keys start with lowercase: name, email, created_at
// io.thecodeforge: Node syntax — (variable:Label {key: value})
// This creates a node with label 'Person' and two properties
CREATE (alice:Person {name: 'Alice', age: 32, city: 'London'})
RETURN alice;
// io.thecodeforge: Relationship syntax — -[variable:TYPE {key: value}]->
// This creates a directed relationship with a property
CREATE (bob:Person {name: 'Bob', age: 28, city: 'Paris'})
CREATE (alice)-[r:FRIEND {since: date('2019-06-15'), context: 'university'}]->(bob)
RETURN alice, r, bob;
// io.thecodeforge: Pattern — the fundamental unit of Cypher queries
// This pattern matches any Person who is FRIENDs with any other PersonMATCH (p1:Person)-[:FRIEND]->(p2:Person)
RETURN p1.name AS person, p2.name AS friend;
// io.thecodeforge: Anonymous nodes — when you don't need to reference them
// Finds all people Alice is friends with, without naming the relationship
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend)
RETURN friend.name, friend.city;
// io.thecodeforge: Multiple labels — a node can have more than one
CREATE (charlie:Person:Developer {name: 'Charlie', language: 'Java'})
RETURN charlie;
// io.thecodeforge: Matching multiple labels — find all Developer-PeopleMATCH (dev:Person:Developer)
RETURN dev.name, dev.language;
Output
╒══════════╤═══════════╤═══════════════╕
│"person" │"friend" │"friend.city" │
╞══════════╪═══════════╪═══════════════╡
│"Alice" │"Bob" │"Paris" │
└──────────┴───────────┴───────────────┘
Think in Shapes, Not Tables:
The single biggest mental shift from SQL to Cypher: stop thinking in 'what columns do I need?' and start thinking in 'what shape does my data look like?' If you can draw the pattern on a whiteboard, you can write it in Cypher. (a:Person)-[:BOUGHT]->(p:Product)<-[:BOUGHT]-(b:Person) reads exactly like what it finds: two people who bought the same product.
CREATE: Building Nodes and Relationships
CREATE is how you insert data into the graph. It always creates new nodes and relationships — even if duplicates already exist. This is the key difference from MERGE (covered later), which checks for existence first.
You can create single nodes, multiple nodes, nodes with relationships in one statement, and even entire subgraphs. The most common production pattern: create a node, create a relationship to an existing node, return what you created. This is how you'd model a new user signing up and connecting to an existing organization.
Properties can be any JSON-compatible type: strings, numbers, booleans, lists, dates, and temporal types. Neo4j stores them as key-value pairs on the node or relationship — there's no schema enforcement, which makes it easy to evolve your data model without migrations.
One gotcha: CREATE always creates. If you run the same CREATE statement twice, you get duplicate nodes. Use MERGE when you need idempotency (creating a node only if it doesn't already exist).
io/thecodeforge/graph/CreateNodes.cypherCYPHER
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
// io.thecodeforge: Create a single node with properties
CREATE (order:Order {
order_id: 'ORD-2024-001',
total: 149.99,
currency: 'GBP',
status: 'pending',
created_at: datetime()
})
RETURN order;
// io.thecodeforge: Create nodes and a relationship in one statement
// This is the most common production pattern — create entity + link to existing
CREATE (user:User {name: 'Dana', email: 'dana@thecodeforge.io'})
WITH user
MATCH (org:Organization {name: 'TheCodeForge'})
CREATE (user)-[rel:MEMBER_OF {role: 'engineer', joined: date('2024-01-15')}]->(org)
RETURN user.name, rel.role, org.name;
// io.thecodeforge: Create multiple nodes at once
CREATE
(london:City {name: 'London', country: 'UK', population: 8982000}),
(paris:City {name: 'Paris', country: 'France', population: 2161000}),
(berlin:City {name: 'Berlin', country: 'Germany', population: 3645000});
// io.thecodeforge: Create an entire subgraph in one statement
// Product with variants, categories, and supplier — all linked
CREATE (p:Product {sku: 'WIDGET-PRO', name: 'Widget Pro', price: 29.99})
CREATE (cat:Category {name: 'Electronics'})
CREATE (sup:Supplier {name: 'AcmeCorp', country: 'China'})
CREATE (p)-[:IN_CATEGORY]->(cat)
CREATE (p)-[:SUPPLIED_BY {lead_time_days: 14}]->(sup)
RETURN p.name, cat.name, sup.name;
// io.thecodeforge: Create a relationship between EXISTING nodes
// This is how you link entities after initial creation
MATCH (alice:Person {name: 'Alice'})
MATCH (bob:Person {name: 'Bob'})
CREATE (alice)-[:FRIEND {since: date('2023-03-10')}]->(bob)
RETURN'Friendship created'AS result;
Output
╒════════════════╤══════════╤═══════════════════╕
│"user.name" │"rel.role"│"org.name" │
╞════════════════╪══════════╪═══════════════════╡
│"Dana" │"engineer"│"TheCodeForge" │
└────────────────┴──────────┴───────────────────┘
CREATE Always Creates — Even Duplicates:
Running the same CREATE statement twice creates duplicate nodes. In production, this corrupts your graph — you'll have two 'Alice' nodes with no way to tell which is real. Use MERGE when you need idempotency. Rule of thumb: use CREATE for event-based data (orders, log entries) where duplicates are impossible by design. Use MERGE for entity-based data (users, products) where the same entity might be referenced multiple times.
MATCH and WHERE: Reading and Filtering Data
MATCH is the read operation in Cypher — equivalent to SELECT in SQL. You write a pattern, and the engine finds every instance of that pattern in the graph. MATCH alone returns everything matching the pattern. WHERE filters the results.
The WHERE clause supports the full set of comparison and logical operators: =, <>, <, >, <=, >=, AND, OR, NOT, IN, STARTS WITH, ENDS WITH, CONTAINS, IS NULL, IS NOT NULL, and regular expressions (=~ 'regex').
Performance note: always start your MATCH from a specific node, not an anonymous scan. (user:User {email: 'x'})-[:ORDERED]->(order) is fast because it starts from the User index. (n)-[:ORDERED]->(order) without a starting point forces a full graph scan.
Multiple MATCH clauses act like joins — each MATCH finds a pattern, and Cypher joins them on shared variables. If you want an outer join effect, use OPTIONAL MATCH (returns null when the pattern isn't found).
io/thecodeforge/graph/MatchAndFilter.cypherCYPHER
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
// io.thecodeforge: BasicMATCH — find all developers
MATCH (dev:Developer)
RETURN dev.name, dev.language, dev.years_experience;
// io.thecodeforge: MATCH with WHERE — filtering on properties
MATCH (dev:Developer)
WHERE dev.language = 'Java'AND dev.years_experience >= 5RETURN dev.name AS senior_java_dev;
// io.thecodeforge: Inline property matching — shorthand for simple WHERE
// Equivalent to the query above but more concise
MATCH (dev:Developer {language: 'Java'})
WHERE dev.years_experience >= 5RETURN dev.name AS senior_java_dev;
// io.thecodeforge: String matching — STARTSWITH, ENDSWITH, CONTAINSMATCH (user:User)
WHERE user.email ENDSWITH'@thecodeforge.io'AND user.name STARTSWITH'A'RETURN user.name, user.email;
// io.thecodeforge: IN operator — matching against a list
MATCH (dev:Developer)
WHERE dev.language IN ['Java', 'Kotlin', 'Scala']
RETURN dev.name, dev.language
ORDERBY dev.language;
// io.thecodeforge: ISNULL / ISNOTNULL — checking for missing properties
MATCH (order:Order)
WHERE order.shipped_at ISNULLAND order.status = 'confirmed'RETURN order.order_id, order.total
ORDERBY order.created_at DESCLIMIT10;
// io.thecodeforge: Regular expression matching
MATCH (product:Product)
WHERE product.sku =~ 'WIDGET-.*-PRO'RETURN product.name, product.sku;
// io.thecodeforge: OPTIONALMATCH — like a LEFTJOIN in SQL
// Returns users even if they have no orders
MATCH (user:User)
OPTIONALMATCH (user)-[:PLACED]->(order:Order)
RETURN user.name,
count(order) AS order_count,
CASEWHEN order ISNULLTHEN'no orders'ELSE'has orders'ENDAS status;
// io.thecodeforge: MultipleMATCH clauses — chained traversal
// Find developers who work on projects in a specific category
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)
MATCH (proj)-[:IN_CATEGORY]->(cat:Category {name: 'Backend'})
WHERE dev.years_experience >= 3RETURN dev.name, proj.name, cat.name
ORDERBY dev.years_experience DESC;
Output
╒═══════════════════╤══════════════╤═══════════╕
│"dev.name" │"proj.name" │"cat.name" │
╞═══════════════════╪══════════════╪═══════════╡
│"Alice" │"ForgeAPI" │"Backend" │
│"Charlie" │"DataPipeline"│"Backend" │
└───────────────────┴──────────────┴───────────┘
Start Your MATCH From an Index:
Cypher's query planner looks for the most selective starting point. If you MATCH (dev:Developer {email: 'x'})-[:WORKS_ON]->(proj), it starts from the email index (O(log n)) and traverses outward. If you MATCH (dev)-[:WORKS_ON]->(proj:Project {name: 'x'}), it starts from the project index. But if you MATCH (dev)-[:WORKS_ON]->(proj) with no indexed starting point, it scans every developer in the graph. Always anchor your pattern on a node you can index.
RETURN, ORDER BY, LIMIT, and DISTINCT: Shaping Your Results
RETURN defines what the query gives back — equivalent to SELECT in SQL. You can return nodes, relationships, properties, computed values, and aggregations. ORDER BY sorts results (ascending by default, use DESC for descending). LIMIT caps the result set. DISTINCT removes duplicates.
You can alias results with AS for cleaner output. You can use CASE expressions for conditional logic. You can call functions like toString(), toInteger(), datetime(), and coalesce() inside RETURN.
Pagination pattern: use SKIP and LIMIT together for cursor-based pagination. Don't use SKIP for large offsets (skip 10000 rows is slow) — use a WHERE clause on the sort key instead (WHERE user.created_at < last_seen_date).
io/thecodeforge/graph/ReturnAndSort.cypherCYPHER
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
// io.thecodeforge: RETURN with aliases and computed values
MATCH (dev:Developer)
RETURN dev.name AS developer,
dev.language AS primary_language,
dev.years_experience AS experience,
CASEWHEN dev.years_experience >= 10THEN'Principal'WHEN dev.years_experience >= 5THEN'Senior'WHEN dev.years_experience >= 2THEN'Mid'ELSE'Junior'ENDAS level
ORDERBY dev.years_experience DESC;
// io.thecodeforge: DISTINCT — remove duplicate results
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)
RETURNDISTINCT dev.name AS developer
ORDERBY dev.name;
// io.thecodeforge: Pagination with SKIP and LIMIT
// Page1: first 10 results
MATCH (user:User)
RETURN user.name, user.email, user.created_at
ORDERBY user.created_at DESCLIMIT10;
// Page2: skip first 10, return next 10MATCH (user:User)
RETURN user.name, user.email, user.created_at
ORDERBY user.created_at DESCSKIP10LIMIT10;
// io.thecodeforge: RETURN relationships and their properties
MATCH (a:Person)-[r:FRIEND]->(b:Person)
RETURN a.name AS person,
b.name AS friend,
r.since AS friends_since,
duration.between(r.since, date()).years AS years_of_friendship
ORDERBY r.since ASC;
// io.thecodeforge: Coalesce — provide defaults for missing properties
MATCH (product:Product)
RETURN product.name,
coalesce(product.description, 'No description available') AS description,
coalesce(product.rating, 0) AS rating
ORDERBY product.rating DESC;
SKIP 10000 LIMIT 10 forces the engine to process and discard 10,000 rows before returning 10. For large datasets, use keyset pagination instead: WHERE user.created_at < $last_seen_date ORDER BY user.created_at DESC LIMIT 10. This lets the engine use the index to jump directly to the right position. Same result, 1000x faster on large graphs.
SET and REMOVE: Updating Nodes and Relationships
SET adds or updates properties on nodes and relationships. REMOVE deletes properties or removes labels from nodes. These are the UPDATE and ALTER equivalents in Cypher.
SET can assign a single property, multiple properties at once, or copy properties from a map. It can also add labels to a node (SET n:NewLabel). REMOVE does the inverse — removes properties or labels.
Production pattern: combine MATCH + SET for targeted updates. Always filter by an indexed property (like an ID or email) to avoid scanning the entire graph. Use coalesce() with SET to preserve existing values when partial updates arrive from an API.
io/thecodeforge/graph/SetAndRemove.cypherCYPHER
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
// io.thecodeforge: SET — update a single property
MATCH (user:User {email: 'alice@thecodeforge.io'})
SET user.last_login = datetime()
RETURN user.name, user.last_login;
// io.thecodeforge: SET — update multiple properties at once
MATCH (order:Order {order_id: 'ORD-2024-001'})
SET order.status = 'shipped',
order.shipped_at = datetime(),
order.tracking_number = 'TRACK-99821'RETURN order.order_id, order.status, order.tracking_number;
// io.thecodeforge: SET — update from a map (common in RESTAPI handlers)
// The map comes from the request body in production
WITH {status: 'delivered', delivered_at: '2024-03-15T14:30:00Z'} AS updates
MATCH (order:Order {order_id: 'ORD-2024-001'})
SET order += updates // += merges the map into existing properties
RETURN order.order_id, order.status, order.delivered_at;
// io.thecodeforge: SET — add a label to an existing node
MATCH (user:User {email: 'alice@thecodeforge.io'})
SET user:AdminRETURNlabels(user) AS labels; // ['User', 'Admin']
// io.thecodeforge: REMOVE — delete a property
MATCH (user:User {email: 'alice@thecodeforge.io'})
REMOVE user.temporary_token
RETURN user.name, user.temporary_token; // null — property is gone
// io.thecodeforge: REMOVE — remove a label from a node
MATCH (user:User {email: 'alice@thecodeforge.io'})
REMOVE user:AdminRETURNlabels(user) AS labels; // ['User'] — Admin label removed
// io.thecodeforge: Production pattern — partial update with coalesce
// Preserve existing rating if the update doesn't include one
MATCH (product:Product {sku: 'WIDGET-PRO'})
SET product.name = coalesce($update_name, product.name),
product.price = coalesce($update_price, product.price),
product.updated_at = datetime()
RETURN product.name, product.price;
SET user += {name: 'Alice V2'} merges the map into existing properties — other properties stay. SET user = {name: 'Alice V2'} replaces ALL properties with just the ones in the map — everything else is deleted. In a REST PATCH endpoint, use +=. In a REST PUT endpoint, use =. Getting this wrong in production silently deletes user data.
DELETE and DETACH DELETE: Removing Data
DELETE removes nodes and relationships. DETACH DELETE removes a node and ALL its relationships in one operation. This distinction matters — you cannot delete a node that still has relationships. If you try, Neo4j throws an error. DETACH DELETE bypasses this by removing relationships first, then the node.
In production, soft deletes (setting a 'deleted: true' property) are almost always better than hard deletes. Hard deletes break referential integrity in the graph — other nodes that referenced the deleted node now have dangling relationships. Soft deletes preserve the graph structure while hiding the node from queries (WHERE n.deleted IS NULL OR n.deleted = false).
When you must hard delete, always use DETACH DELETE for nodes. Using plain DELETE on a node with relationships will fail.
// io.thecodeforge: DELETE — remove a relationship only
// The nodes remain; only the connection is severed
MATCH (alice:Person {name: 'Alice'})-[r:FRIEND]->(bob:Person {name: 'Bob'})
DELETE r
RETURN'Friendship removed'AS result;
// io.thecodeforge: DETACHDELETE — remove a node and ALL its relationships
// Usethis when deleting an entity entirely
MATCH (user:User {email: 'spam@bot.net'})
DETACHDELETE user;
// io.thecodeforge: Soft delete — production pattern
// Mark as deleted instead of removing; preserves graph integrity
MATCH (order:Order {order_id: 'ORD-2024-999'})
SET order.deleted = true,
order.deleted_at = datetime(),
order.deleted_by = 'admin@thecodeforge.io'RETURN order.order_id, order.deleted;
// io.thecodeforge: Filtering out soft-deleted nodes in queries
// EveryMATCH in your app should include this filter
MATCH (order:Order)
WHERE order.deleted ISNULLRETURN order.order_id, order.total
ORDERBY order.created_at DESCLIMIT20;
// io.thecodeforge: Delete all relationships of a specific type
// Usefulfor bulk cleanup during data migrations
MATCH ()-[r:TEMPORARY_LINK]->()
DELETE r
RETURNcount(r) AS deleted_count;
// io.thecodeforge: Conditional delete — only delete if certain conditions met
MATCH (session:Session)
WHERE session.last_active < datetime() - duration('P30D')
AND session.active = falseDETACHDELETE session
RETURNcount(session) AS expired_sessions_deleted;
Output
╒═══════════════════════════╕
│"expired_sessions_deleted" │
╞═══════════════════════════╡
│147 │
└───────────────────────────┘
Never DELETE a Node Without DETACH:
MATCH (n:User {id: 'x'}) DELETE n will throw 'Cannot delete node, because it still has relationships.' You must either delete the relationships first or use DETACH DELETE. In production, forgetting DETACH is the #1 cause of delete failures in batch jobs. If you're unsure, always use DETACH DELETE — removing orphaned relationships is almost always the right behavior.
MERGE: Idempotent Create-or-Find
MERGE is CREATE + MATCH in one operation. It checks if a pattern exists — if yes, it matches it; if no, it creates it. This makes MERGE the idempotent alternative to CREATE, essential for scripts that might run multiple times, ETL pipelines, and any scenario where duplicate data is dangerous.
MERGE has two companion clauses: ON CREATE SET (properties to set only when creating a new node) and ON MATCH SET (properties to set only when matching an existing node). Together, these give you upsert behavior: insert if new, update if existing.
Critical gotcha: MERGE matches the ENTIRE pattern, including relationships. If you MERGE (a)-[:FRIEND]->(b) and the nodes exist but the relationship doesn't, MERGE creates the relationship. But if only (a) exists and (b) doesn't, MERGE creates both (a) and (b) — it doesn't look up (b) by name first. This is the most common source of MERGE confusion.
MERGE (a:Person {name: 'Alice'})-[:FRIEND]->(b:Person {name: 'Bob'}) checks if the ENTIRE pattern exists. If Alice exists but Bob doesn't, MERGE creates a NEW Bob node — it doesn't look up Bob by name in the broader graph. To avoid phantom nodes, MATCH the nodes first, then MERGE the relationship: MATCH (a:Person {name: 'Alice'}) MATCH (b:Person {name: 'Bob'}) MERGE (a)-[:FRIEND]->(b).
WITH: Chaining Subqueries and Aggregation Pipelines
WITH is the pipe operator of Cypher — it passes results from one part of the query to the next. Think of it as the equivalent of a subquery boundary or a GROUP BY + HAVING in SQL. Every clause after WITH operates on the results of the clause before it.
WITH has three critical uses: 1. Aggregation pipelines — aggregate data, then filter or sort on the aggregated result. In SQL, you'd use HAVING. In Cypher, you aggregate in one WITH, then filter in the next WHERE. 2. Query chaining — break a complex query into readable stages. Each WITH acts as a checkpoint. 3. Limiting scope — pass only specific variables forward, which the engine uses to optimize the query plan.
The rule: any variable not listed in the WITH clause is no longer available downstream. This is deliberate — it lets the query planner know exactly what data flows forward, enabling optimizations.
io/thecodeforge/graph/WithClause.cypherCYPHER
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
// io.thecodeforge: WITHfor aggregation — find prolific contributors
// Count projects per developer, then filter on the count
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)
WITH dev, count(proj) AS project_count
WHERE project_count >= 3RETURN dev.name AS developer, project_count
ORDERBY project_count DESC;
// io.thecodeforge: WITHfor query chaining — multi-stage filtering
// Stage1: Find active projects
// Stage2: Find developers on those projects
// Stage3: Aggregate and filter
MATCH (proj:Project {status: 'active'})
WITH proj
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj)
WITH dev, collect(proj.name) AS projects, count(proj) AS num_projects
WHERE num_projects >= 2RETURN dev.name, projects, num_projects
ORDERBY num_projects DESC;
// io.thecodeforge: WITH with LIMIT — process in batches
// Find the 100 most recent users, then do expensive operations on them
MATCH (user:User)
WITH user
ORDERBY user.created_at DESCLIMIT100MATCH (user)-[:PLACED]->(order:Order)
WITH user, count(order) AS order_count, sum(order.total) AS total_spent
WHERE total_spent > 500RETURN user.name, user.email, order_count, total_spent
ORDERBY total_spent DESC;
// io.thecodeforge: WITH as a scope boundary — only pass what you need
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)
WITH dev.name AS name, dev.language AS lang, count(proj) AS projects
// dev is no longer available here — only name, lang, projects
WHERE lang = 'Java'RETURN name, lang, projects
ORDERBY projects DESC;
// io.thecodeforge: WITH + COLLECT — build lists per group
MATCH (cat:Category)<-[:IN_CATEGORY]-(product:Product)
WITH cat, collect(product.name) AS products, count(product) AS product_count
WHERE product_count >= 5RETURN cat.name AS category, products, product_count
ORDERBY product_count DESC;
You cannot chain a WHERE clause directly after an aggregation like count(). The aggregation result must pass through a WITH (or RETURN) first. MATCH (dev)-[:CONTRIBUTES_TO]->(proj) WITH dev, count(proj) AS cnt WHERE cnt > 3 works. MATCH (dev)-[:CONTRIBUTES_TO]->(proj) WHERE count(proj) > 3 does NOT work — it's a syntax error. This is the Cypher equivalent of SQL's HAVING vs WHERE distinction.
Aggregation Functions: COUNT, COLLECT, SUM, AVG, and Friends
Cypher supports the standard set of aggregation functions: count(), collect(), sum(), avg(), min(), max(), percentileCont(), percentileDisc(), and stDev(). The two you'll use most are count() and collect().
count() has two forms: count(n) counts non-null values of n, count() counts all rows regardless of nulls. This distinction matters — count(user) skips rows where user is null (like from an OPTIONAL MATCH), while count() includes them.
collect() is the most powerful aggregation function in Cypher — it gathers all values into a list. This is how you build nested results: group by a category, collect the items in that category into a list, return the category with its items. It replaces SQL's GROUP_CONCAT and subqueries for building nested JSON responses.
All aggregation functions ignore null values except count(*).
// io.thecodeforge: COUNT — two forms matter
// count(*) counts all rows including nulls
// count(order) counts only non-null orders
MATCH (user:User)
OPTIONALMATCH (user)-[:PLACED]->(order:Order)
RETURN user.name,
count(*) AS total_rows, // includes users with no orders
count(order) AS orders_placed // excludes null orders
ORDERBY orders_placed DESC;
// io.thecodeforge: COLLECT — build nested lists
// Group products by category, return category with its products
MATCH (cat:Category)<-[:IN_CATEGORY]-(product:Product)
WITH cat.name AS category, collect(product) AS products
RETURN category, size(products) AS count, products
ORDERBY count DESC;
// io.thecodeforge: COLLECT with property extraction — cleaner output
// Collect just the names, not full node objects
MATCH (proj:Project)<-[:CONTRIBUTES_TO]-(dev:Developer)
WITH proj.name AS project, collect(dev.name) AS contributors
RETURN project, contributors
ORDERBYsize(contributors) DESC;
// io.thecodeforge: SUM and AVG — numeric aggregations
MATCH (order:Order)-[:CONTAINS]->(item:OrderItem)
WITH order,
sum(item.quantity * item.unit_price) AS order_total
RETURNavg(order_total) AS avg_order_value,
min(order_total) AS min_order,
max(order_total) AS max_order,
sum(order_total) AS total_revenue,
count(order) AS total_orders;
// io.thecodeforge: PERCENTILE — distribution analysis
MATCH (user:User)-[:PLACED]->(order:Order)
WITH user, count(order) AS order_count
RETURNpercentileCont(order_count, 0.5) AS median_orders,
percentileCont(order_count, 0.95) AS p95_orders,
percentileCont(order_count, 0.99) AS p99_orders,
avg(order_count) AS mean_orders;
// io.thecodeforge: COLLECTfor top-N per group
// Top3 products per category by rating
MATCH (cat:Category)<-[:IN_CATEGORY]-(product:Product)
WITH cat, product
ORDERBY product.rating DESCWITH cat.name AS category, collect(product)[0..3] AS top_products
RETURN category, top_products;
The pattern collect(x) ORDER BY x.score DESC then collect(x)[0..3] is Cypher's equivalent of a window function with ROW_NUMBER(). It gives you the top N items per group without a subquery. This is the standard pattern for leaderboards, top products per category, and most-active users per team.
Variable-Length Paths: Traversing Multiple Hops
Variable-length paths let you traverse a variable number of relationships in one pattern. Instead of writing out each hop explicitly, you specify a range: [1..3] means 1 to 3 hops, [] means any number of hops, [*3] means exactly 3 hops.
This is the feature that makes graph databases shine for social networks (friends of friends), supply chains (supplier of supplier), and recommendation engines (users who bought what you bought bought what?).
Critical warning: unbounded paths ([*] with no upper limit) will explore every reachable node in the graph and can cause out-of-memory errors on large graphs. ALWAYS set a maximum depth. For most use cases, 1-5 hops is sufficient. Beyond 6 hops, the results are usually meaningless (everyone on Facebook is within 4.7 hops of everyone else).
Use shortestPath() and allShortestPaths() when you specifically want the shortest route between two nodes, not all possible paths.
// io.thecodeforge: Friends of friends — 2 hops exactly
MATCH (me:Person {name: 'Alice'})-[:FRIEND*2]->(fof)
WHERE fof <> me // exclude Alice herself
RETURNDISTINCT fof.name AS friend_of_friend
ORDERBY fof.name;
// io.thecodeforge: Friends up to 3 levels deep — 1 to 3 hops
MATCH (me:Person {name: 'Alice'})-[:FRIEND*1..3]->(connection)
WHERE connection <> me
RETURNDISTINCT connection.name AS connection,
min(length(shortestPath((me)-[:FRIEND*]->(connection)))) AS degrees
ORDERBY degrees, connection.name;
// io.thecodeforge: Supply chain — find suppliers of suppliers
MATCH (product:Product {name: 'Widget Pro'})-[:SUPPLIED_BY*1..4]->(supplier)
RETURN product.name AS product,
supplier.name AS ultimate_supplier,
supplier.country AS origin_country;
// io.thecodeforge: Shortest path between two people
MATCH path = shortestPath(
(alice:Person {name: 'Alice'})-[:FRIEND*]-(bob:Person {name: 'Bob'})
)
RETURN [n INnodes(path) | n.name] AS path_names,
length(path) AS degrees_of_separation;
// io.thecodeforge: All shortest paths (if multiple equally-short routes exist)
MATCH path = allShortestPaths(
(alice:Person {name: 'Alice'})-[:FRIEND*]-(bob:Person {name: 'Bob'})
)
RETURN [n INnodes(path) | n.name] AS path_names,
length(path) AS hops;
// io.thecodeforge: Bounded traversal with filtering at each hop
// Find paths where every intermediate person is over 25MATCH (alice:Person {name: 'Alice'})-[:FRIEND*1..3]->(target:Person)
WHEREall(node INnodes(path) WHERE node.age >= 25OR node = alice OR node = target)
RETURNDISTINCT target.name, length(path) AS hops;
MATCH (a)-[]->(b) with no upper limit explores every reachable node from 'a'. On a graph with 1 million nodes and dense relationships, this can consume all available memory and crash the server. Always use [1..5] or similar bounded ranges. If you need to find all reachable nodes, use the APOC procedure apoc.path.expandConfig with a maxDepth parameter and a timeout.
UNWIND: Working with Lists and Expanding Collections
UNWIND takes a list and expands it into individual rows — the opposite of collect(). If collect() gathers rows into a list, UNWIND spreads a list into rows. This is essential for processing batch data, transforming aggregated results back into individual records, and working with JSONarrays passed from application code.
The most common production use: UNWIND a parameter list to create or merge multiple nodes in one query. Instead of executing 100 separate CREATE statements (100 round trips), you UNWIND a list of 100 items and create all nodes in one query (1 round trip).
UNWIND + MERGE is the standard ETL pattern for loading data into Neo4j.
// io.thecodeforge: UNWIND — expand a list into rows
UNWIND ['Alice', 'Bob', 'Charlie'] AS name
MERGE (p:Person {name: name})
RETURN p.name;
// io.thecodeforge: UNWINDfor batch creation — 1 query, 100 nodes
// In production, $users comes from a JSON request body
UNWIND $users AS user_data
MERGE (u:User {email: user_data.email})
ONCREATESET u.name = user_data.name,
u.created_at = datetime()
RETURNcount(u) AS users_processed;
// io.thecodeforge: UNWIND to create relationships in batch
// Connect a list of skills to a developer in one query
MATCH (dev:Developer {name: 'Alice'})
UNWIND ['Java', 'Spring Boot', 'Neo4j', 'Kubernetes'] AS skill_name
MERGE (s:Skill {name: skill_name})
MERGE (dev)-[:HAS_SKILL {level: 'expert'}]->(s)
RETURN dev.name, collect(s.name) AS skills;
// io.thecodeforge: UNWIND after COLLECT — flatten and reprocess
// Find all categories, collect their products, then flatten for analysis
MATCH (cat:Category)<-[:IN_CATEGORY]-(product:Product)
WITH cat, collect(product) AS products
UNWIND products AS product
RETURN cat.name AS category,
product.name AS product,
product.price AS price
ORDERBY cat.name, product.price DESC;
// io.thecodeforge: UNWIND with index — track position in the list
UNWIND ['alpha', 'beta', 'gamma', 'delta'] AS item
WITH item, range(0, 3)[indexOf(['alpha','beta','gamma','delta'], item)] AS idx
MERGE (n:Item {name: item})
SET n.sort_order = idx
RETURN n.name, n.sort_order
ORDERBY n.sort_order;
// io.thecodeforge: UNWINDfor data cleanup — split comma-separated values
MATCH (legacy:LegacyRecord)
WHERE legacy.tags ISNOTNULLWITH legacy, split(legacy.tags, ',') AS tag_list
UNWIND tag_list AS tag
MERGE (t:Tag {name: trim(tag)})
MERGE (legacy)-[:TAGGED_WITH]->(t)
RETURNcount(DISTINCT legacy) AS records_processed, count(DISTINCT t) AS tags_created;
Output
╒═══════════════════════╤═════════════════╕
│"records_processed" │"tags_created" │
╞═══════════════════════╪═════════════════╡
│4,291 │187 │
└───────────────────────┴─────────────────┘
UNWIND $list Is the Batch API Pattern:
Every Neo4j driver (Java, Python, JavaScript) supports parameterized queries. Pass a list as $items, UNWIND it in Cypher, and process all items in a single database round trip. This is 100x faster than executing individual queries in a loop from your application code. In Spring Data Neo4j, use @Query with UNWIND $batch AS row MERGE ... for batch operations.
Subqueries with CALL: Encapsulating Complex Logic
CALL introduces subqueries in Cypher — self-contained query blocks that execute independently and return results to the outer query. There are two forms: CALL {} for inline subqueries (like SQL subqueries) and CALL procedure.name() for invoking stored procedures (like stored procedures in SQL).
CALL {} subqueries are essential for isolating aggregation, avoiding variable scope conflicts, and breaking complex queries into readable chunks. Each CALL {} block has its own scope — variables must be explicitly passed in and out.
CALL procedure is how you invoke APOC procedures, Graph Data Science algorithms, and custom procedures. The syntax is CALL db.indexes() YIELD indexName, state RETURN indexName.
// io.thecodeforge: CALL {} — inline subquery for isolated aggregation
// Find developers, and for each, count their projects in a subquery
MATCH (dev:Developer)
CALL {
WITH dev
MATCH (dev)-[:CONTRIBUTES_TO]->(proj:Project)
RETURNcount(proj) AS project_count, collect(proj.name) AS project_names
}
RETURN dev.name, project_count, project_names
ORDERBY project_count DESC;
// io.thecodeforge: CALL {} — find the most recent order per user
MATCH (user:User)
CALL {
WITH user
MATCH (user)-[:PLACED]->(order:Order)
RETURN order
ORDERBY order.created_at DESCLIMIT1
}
RETURN user.name, order.order_id, order.total, order.created_at
ORDERBY order.created_at DESC;
// io.thecodeforge: CALL db.indexes — list all indexes in the database
CALL db.indexes()
YIELD name, type, entityType, labelsOrTypes, properties, state
RETURN name, type, entityType, labelsOrTypes, properties, state
ORDERBY entityType, name;
// io.thecodeforge: CALL db.schema.visualization — see the graph schema
CALL db.schema.visualization()
YIELD nodes, relationships
RETURN nodes, relationships;
// io.thecodeforge: CALLapoc (ifAPOC is installed) — advanced procedures
// Export a subgraph to JSONfor backup or migration
CALL apoc.export.json.query(
'MATCH (u:User)-[:PLACED]->(o:Order) RETURN u, o LIMIT 100',
null,
{stream: true}
)
YIELD data
RETURN data;
// io.thecodeforge: CALL with UNION — combine results from multiple patterns
MATCH (dev:Developer {name: 'Alice'})
CALL {
WITH dev
MATCH (dev)-[:CONTRIBUTES_TO]->(proj:Project)
RETURN proj.name AS item, 'project'AS type
UNIONWITH dev
MATCH (dev)-[:HAS_SKILL]->(skill:Skill)
RETURN skill.name AS item, 'skill'AS type
}
RETURN dev.name, collect({name: item, type: type}) AS contributions;
CALL {} Isolates Scope — Use It to Avoid Variable Conflicts:
If your query has multiple MATCH clauses that reuse the same variable names, the query planner can get confused about which variable refers to what. Wrapping each section in CALL {} gives it a clean scope. This is especially important in complex queries with 4+ MATCH clauses — the readability improvement alone is worth the extra syntax.
Indexes and Constraints: Schema Management for Performance
Neo4j is often described as 'schema-optional,' but that doesn't mean you should skip indexing. Without indexes, every MATCH on a property scans every node with that label — O(n). With a range index, the same MATCH uses a B-tree lookup — O(log n). With a uniqueness constraint, you get both an index AND a guarantee that no duplicates can be created.
Three types of indexes: 1. Range index — the default, supports equality and range queries (=, <, >, IN). CREATE INDEX FOR (n:Label) ON (n.property) 2. Text index — optimized for CONTAINS, ENDS WITH, and full-text search. CREATE TEXT INDEX FOR (n:Label) ON (n.property) 3. Point index — for geospatial queries. CREATE POINT INDEX FOR (n:Label) ON (n.location)
Constraints enforce data integrity: 1. Uniqueness — no two nodes with the same label can have the same property value. CREATE CONSTRAINT FOR (n:User) REQUIRE n.email IS UNIQUE 2. Node key — uniqueness + existence (the property must exist and be unique). CREATE CONSTRAINT FOR (n:User) REQUIRE (n.email, n.tenant_id) IS NODE KEY 3. Existence — the property must exist (not null). CREATE CONSTRAINT FOR (n:User) REQUIRE n.email IS NOT NULL
Production rule: create uniqueness constraints on every entity's primary identifier before loading data. This prevents duplicates at the database level and gives you a free index for lookups.
// io.thecodeforge: Create a range index — for fast property lookups
CREATEINDEX user_email_index FOR (u:User) ON (u.email);
CREATEINDEX product_sku_index FOR (p:Product) ON (p.sku);
CREATEINDEX order_status_index FOR (o:Order) ON (o.status);
// io.thecodeforge: Create a composite index — for queries that filter on multiple props
CREATEINDEX user_tenant_email_index FOR (u:User) ON (u.tenant_id, u.email);
// io.thecodeforge: Create a text index — forCONTAINS and ENDSWITH queries
CREATETEXTINDEX product_name_text_index FOR (p:Product) ON (p.name);
// io.thecodeforge: Create a uniqueness constraint — prevents duplicate entities
CREATECONSTRAINT user_email_unique FOR (u:User) REQUIRE u.email ISUNIQUE;
CREATECONSTRAINT product_sku_unique FOR (p:Product) REQUIRE p.sku ISUNIQUE;
CREATECONSTRAINT order_id_unique FOR (o:Order) REQUIRE o.order_id ISUNIQUE;
// io.thecodeforge: Create a node key — composite uniqueness + existence
CREATECONSTRAINT user_tenant_key FOR (u:User) REQUIRE (u.email, u.tenant_id) ISNODEKEY;
// io.thecodeforge: Create an existence constraint — property must exist
CREATECONSTRAINT user_email_exists FOR (u:User) REQUIRE u.email ISNOTNULL;
// io.thecodeforge: List all indexes and their status
SHOWINDEXESYIELD name, type, entityType, labelsOrTypes, properties, state
ORDERBY entityType, name;
// io.thecodeforge: List all constraints
SHOWCONSTRAINTSYIELD name, type, entityType, labelsOrTypes, properties
ORDERBY entityType, name;
// io.thecodeforge: Drop an index (if you need to recreate it)
DROPINDEX user_email_index;
// io.thecodeforge: Drop a constraint
DROPCONSTRAINT user_email_unique;
A uniqueness constraint automatically creates a range index on the same property. If you create both CREATE INDEX FOR (u:User) ON (u.email) and CREATE CONSTRAINT FOR (u:User) REQUIRE u.email IS UNIQUE, you have two indexes doing the same work — wasting memory and slowing down writes. Rule: if you need uniqueness, create the constraint (it includes an index). If you only need fast lookups without uniqueness, create the index.
Real-World Patterns: Recommendation Engine, Fraud Detection, and Social Graph
The value of Cypher becomes clear when you see it applied to real problems. Here are three production patterns that demonstrate why graph databases outperform relational databases for connected data.
Recommendation Engine — 'Customers who bought X also bought Y.' In SQL, this requires a self-join on the purchases table with a GROUP BY, which gets exponentially slower as purchase history grows. In Cypher, it's a two-hop traversal: find everyone who bought the same product, then find what else they bought.
Fraud Detection — Find rings of accounts that share addresses, phone numbers, or devices. In SQL, this requires multiple joins across identity tables. In Cypher, you match a pattern where multiple accounts connect to the same shared resource.
Social Graph — Find the shortest path between two people, suggest connections based on mutual friends, identify communities. These are all natural graph traversals that Cypher handles natively.
// io.thecodeforge: RECOMMENDATIONENGINE
// 'Customers who bought this also bought' — collaborative filtering
MATCH (customer:Customer)-[:PURCHASED]->(product:Product {sku: 'WIDGET-PRO'})
<-[:PURCHASED]-(other:Customer)-[:PURCHASED]->(recommended:Product)
WHERE recommended <> product // exclude the product they already bought
ANDNOT (customer)-[:PURCHASED]->(recommended) // exclude what they already own
RETURN recommended.name AS product,
recommended.sku AS sku,
count(other) AS bought_by_others
ORDERBY bought_by_others DESCLIMIT5;
// io.thecodeforge: FRAUDDETECTION — shared identity signals
// Find accounts that share a phone number OR address — potential fraud ring
MATCH (a1:Account)-[:HAS_PHONE]->(phone:Phone)<-[:HAS_PHONE]-(a2:Account)
WHERE a1 <> a2
WITH a1, a2, collect(DISTINCT phone.number) AS shared_phones
MATCH (a1)-[:HAS_ADDRESS]->(addr:Address)<-[:HAS_ADDRESS]-(a2)
WITH a1, a2, shared_phones, collect(DISTINCT addr.full_address) AS shared_addresses
WHEREsize(shared_phones) >= 1ANDsize(shared_addresses) >= 1RETURN a1.account_id AS account_1,
a2.account_id AS account_2,
shared_phones,
shared_addresses,
size(shared_phones) + size(shared_addresses) AS risk_score
ORDERBY risk_score DESCLIMIT20;
// io.thecodeforge: SOCIALGRAPH — mutual friends suggestion
MATCH (me:Person {name: 'Alice'})-[:FRIEND]->(mutual:Person)-[:FRIEND]->(suggestion:Person)
WHERENOT (me)-[:FRIEND]->(suggestion) // not already friends
AND suggestion <> me // not yourself
RETURN suggestion.name AS suggested_friend,
count(mutual) AS mutual_friends,
collect(mutual.name) AS through
ORDERBY mutual_friends DESCLIMIT10;
// io.thecodeforge: SOCIALGRAPH — community detection (simplified)
// Find groups of 3+ people who are all friends with each other
MATCH (a:Person)-[:FRIEND]->(b:Person)-[:FRIEND]->(c:Person)-[:FRIEND]->(a)
WHEREid(a) < id(b) ANDid(b) < id(c) // avoid counting same triangle multiple times
RETURN a.name, b.name, c.name AS triangle
ORDERBY a.name;
// io.thecodeforge: KNOWLEDGEGRAPH — find all entities connected to a concept
MATCH (concept:Concept {name: 'Microservices'})-[:RELATED_TO*1..2]->(related)
WHERE related <> concept
RETURNDISTINCTlabels(related)[0] AS type,
related.name AS entity,
related.description AS description
ORDERBY type, entity
LIMIT20;
The entire 'customers who bought X also bought Y' feature is: (me)-[:PURCHASED]->(product)<-[:PURCHASED]-(other)-[:PURCHASED]->(rec). That's it. Four nodes, three relationships, one pattern. In SQL, this is a self-join with aggregation and exclusion subqueries — 15+ lines of SQL that gets slower as the purchase table grows. In Cypher, it's a single pattern that runs in milliseconds because the graph stores the relationships as direct pointers.
PROFILE and EXPLAIN: Query Performance Analysis
Every query you write for production should be profiled first. Neo4j provides two tools: EXPLAIN shows the execution plan without running the query (fast, safe for production). PROFILE runs the query and shows the execution plan with actual row counts and database hits (slower, shows real performance).
The execution plan is a tree of operators: NodeByLabelScan (scans all nodes with a label), NodeIndexSeek (uses an index), Expand (traverses a relationship), Filter (applies WHERE conditions), ProduceResult (returns data). Your goal: minimize NodeByLabelScan operations and maximize NodeIndexSeek operations.
The 'Eager' operator is a red flag — it means the engine had to load all intermediate results into memory before continuing. This happens with certain aggregation patterns and can cause memory issues on large graphs. If you see Eager, consider restructuring the query.
Database hits are the unit of work — fewer hits = faster query. A query with 100 database hits is fast. A query with 10 million database hits is slow, regardless of how many rows it returns.
io/thecodeforge/graph/QueryProfiling.cypherCYPHER
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
// io.thecodeforge: EXPLAIN — show plan without executing (safe for production)
EXPLAINMATCH (dev:Developer {language: 'Java'})-[:CONTRIBUTED_TO]->(proj:Project)
WHERE proj.status = 'active'RETURN dev.name, proj.name;
// io.thecodeforge: PROFILE — execute and show real metrics
PROFILEMATCH (dev:Developer {language: 'Java'})-[:CONTRIBUTED_TO]->(proj:Project)
WHERE proj.status = 'active'RETURN dev.name, proj.name;
// io.thecodeforge: BADQUERY — no label, forces full scan
// This scans every node in the database
EXPLAINMATCH (n)
WHERE n.email = 'alice@thecodeforge.io'RETURN n;
// io.thecodeforge: GOODQUERY — uses label + index
// This uses the User label index and the email property index
EXPLAINMATCH (u:User {email: 'alice@thecodeforge.io'})
RETURN u;
// io.thecodeforge: Identifying the Eager operator
// If you see Eager in the plan, the engine is buffering all results
EXPLAINMATCH (order:Order)
WITH order
ORDERBY order.created_at DESCMATCH (order)-[:CONTAINS]->(item:OrderItem)
RETURN order, collect(item) AS items;
// io.thecodeforge: Checkingif an index is being used
// Runthis after creating an index to verify the plan uses it
EXPLAINMATCH (u:User {email: 'test@thecodeforge.io'})
RETURN u.name;
// If the plan shows NodeIndexSeek, the index is working.
// If it shows NodeByLabelScan, the index isn't being used — check your query.
Total database hits: 5,694,587 ← This is slow. Add an index.
NodeIndexSeek Good, NodeByLabelScan Bad:
When reading a PROFILE or EXPLAIN output, look for these two operators. NodeIndexSeek means the engine used an index — fast. NodeByLabelScan means it scanned every node with that label — slow. If your query shows NodeByLabelScan where you expected NodeIndexSeek, check that the index exists (SHOW INDEXES) and that your query filters on the indexed property.
Aspect
Relational (SQL)
Graph (Cypher)
Data Model
Tables and Rows
Nodes and Relationships
Relationship Logic
Foreign Keys & Joins
Direct Traversal (Pointers)
Query Syntax
Verbose SELECT/JOIN
ASCII-Art Pattern Matching
Performance at Depth
Degrades exponentially with join depth
Constant for local traversals (index-free adjacency)
Schema
Rigid — ALTER TABLE for changes
Schema-optional — add properties without migration
Many-to-Many
Requires junction table
Native — just draw multiple relationships
Aggregation
GROUP BY + HAVING
WITH + WHERE (after aggregation)
Subqueries
Nested SELECT / CTE
CALL {} blocks
Batch Operations
INSERT INTO ... SELECT
UNWIND $list MERGE ...
Full-Text Search
Full-Text Index / LIKE '%x%'
TEXT INDEX + CONTAINS / apoc.text
Key takeaways
1
Cypher is a pattern-based query language
you describe the SHAPE of your data using ASCII art, and the engine finds every instance of that shape in the graph. Think in nodes and paths, not tables and joins.
2
CREATE always creates new nodes, even if duplicates exist. MERGE checks for existence first and creates only if missing. Use MERGE for entities, CREATE for events.
3
Always use labels in your MATCH patterns
matching without a label forces a full database scan (O(n)), while matching with a label uses the index (O(log n)).
4
WITH is the pipe operator of Cypher
it chains query stages, enables aggregation filtering (the HAVING equivalent), and limits variable scope for query optimization.
5
Variable-length paths ([*1..5]) are Cypher's superpower for social networks, supply chains, and recommendation engines
but NEVER use unbounded paths ([*]) in production.
6
UNWIND expands lists into rows
use it with MERGE for batch operations that process thousands of records in a single database round trip.
7
Profile every production query with EXPLAIN or PROFILE. Look for NodeIndexSeek (good) vs NodeByLabelScan (bad). If you see Eager, restructure the query.
8
Create uniqueness constraints on every entity's primary identifier before loading data
this prevents duplicates at the database level and gives you a free index.
9
collect() gathers rows into a list, UNWIND spreads a list into rows. Together they're the foundation of nested result structures and batch processing in Cypher.
10
Soft deletes (setting a 'deleted' property) are almost always better than hard deletes (DETACH DELETE) in production
hard deletes break graph integrity.
11
Cypher's recommendation engine query is just two hops
(me)-[:PURCHASED]->(product)<-[:PURCHASED]-(other)-[:PURCHASED]->(rec). In SQL, this is a 15-line self-join. In Cypher, it's a single pattern.
12
Use keyset pagination (WHERE created_at < $cursor LIMIT 10) instead of SKIP-based pagination (SKIP 100000 LIMIT 10)
SKIP forces the engine to discard rows, keyset jumps directly to the right position.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
FAQ · 10 QUESTIONS
Frequently Asked Questions
01
Does Cypher support ACID transactions like SQL?
Yes, Neo4j is fully ACID-compliant. Every Cypher query runs within a transaction — if any part of the query fails, the entire transaction is rolled back. For multi-statement transactions, you can use BEGIN and COMMIT explicitly through the driver API. Neo4j also uses optimistic locking: if two transactions modify the same node concurrently, the second one fails with a deadlock exception and must be retried. In your application code, wrap Cypher calls in a retry loop with exponential backoff.
Was this helpful?
02
What is the difference between CREATE and MERGE?
CREATE always creates a new node or relationship, even if an identical one already exists — this leads to duplicates. MERGE checks if the pattern exists first: if it does, MERGE matches it (does nothing); if it doesn't, MERGE creates it. MERGE is the idempotent version of CREATE. Use CREATE for event-based data (orders, log entries) where each record is inherently unique. Use MERGE for entity-based data (users, products) where the same entity might be referenced multiple times. MERGE also supports ON CREATE SET and ON MATCH SET for upsert behavior.
Was this helpful?
03
How do I handle many-to-many relationships in Cypher?
Many-to-many relationships are natural in graph databases — no junction table needed. Just create direct relationships between the nodes: (student:Student)-[:ENROLLED_IN]->(course:Course). If a student is enrolled in 5 courses, they have 5 direct relationships. If a course has 200 students, it has 200 incoming relationships. Querying is a simple pattern match: MATCH (s:Student)-[:ENROLLED_IN]->(c:Course) RETURN s.name, c.name. No join tables, no foreign keys, no performance degradation as the dataset grows.
Was this helpful?
04
What is the 'ASCII-art' syntax exactly?
Cypher uses text characters to represent graph elements. Parentheses (node) represent a node (they look like circles). Brackets -[rel:TYPE]-> represent a directed relationship (the arrow shows direction). Combining them: (a:Person)-[:FRIEND]->(b:Person) visually represents 'a Person node connected by a FRIEND relationship to another Person node.' This makes the query look like the data it queries — if you can draw it on a whiteboard, you can write it in Cypher.
Was this helpful?
05
Can I use Cypher to query non-graph data?
Cypher is optimized for graph traversals — using it for flat, non-connected data (like a simple key-value store) is inefficient. If your data has no relationships or the relationships are trivial (one-to-many without traversal needs), a relational database or document store is simpler and faster. Cypher's advantage kicks in when you need to traverse relationships: friends of friends, supply chain hops, recommendation paths, fraud ring detection. If your queries are mostly 'give me row X where column Y = Z,' you don't need a graph.
Was this helpful?
06
How do I import data from CSV into Neo4j?
Use LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row to read CSV files. The WITH HEADERS clause treats the first row as column names, accessible as row.column_name. Combine with MERGE for idempotent imports: LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row MERGE (u:User {email: row.email}) SET u.name = row.name. For large files (>100K rows), use PERIODIC COMMIT (Neo4j 4.x) or CALL { } IN TRANSACTIONS (Neo4j 5.x) to batch the writes and avoid running out of memory.
Was this helpful?
07
How do I paginate results efficiently in Cypher?
Don't use SKIP for large offsets — SKIP 100000 LIMIT 10 forces the engine to process and discard 100,000 rows. Use keyset pagination instead: pass the last seen value and filter on it. For example: MATCH (u:User) WHERE u.created_at < $last_seen_date RETURN u ORDER BY u.created_at DESC LIMIT 10. This lets the engine use the index to jump directly to the right position. Same pagination result, 1000x faster on large datasets. Store $last_seen_date from the last item of the previous page.
Was this helpful?
08
What is the WITH clause and when do I need it?
WITH is Cypher's pipe operator — it passes results from one query stage to the next. You need it in three situations: (1) after aggregation to filter on the aggregated result (like SQL's HAVING), (2) to chain multiple query stages for readability, and (3) to limit which variables pass forward (the engine uses this for optimization). Any variable not listed in the WITH clause becomes unavailable downstream. Example: MATCH (dev)-[:CONTRIBUTES_TO]->(proj) WITH dev, count(proj) AS cnt WHERE cnt > 3 — the count must pass through a WITH before you can filter on it.
Was this helpful?
09
How do I optimize a slow Cypher query?
Run PROFILE on the query and look for three things: (1) NodeByLabelScan — the engine is scanning all nodes with a label instead of using an index. Fix: create an index on the filtered property. (2) Eager operator — the engine is buffering all intermediate results in memory. Fix: restructure the query to reduce intermediate result size. (3) High database hits — the engine is doing too much work. Fix: start your MATCH from the most selective node (the one with the fewest matches), use labels everywhere, and avoid unbounded variable-length paths.
Was this helpful?
10
Can I use Cypher with Spring Data Neo4j?
Yes, Spring Data Neo4j (SDN) is the official integration. You define entity classes with @Node annotations, repositories that extend Neo4jRepository, and custom queries with @Query("cypher here"). SDN handles object-graph mapping — it converts Neo4j nodes and relationships into Java objects and back. For complex queries, use @Query with native Cypher. For batch operations, use UNWIND $batch AS row MERGE ... in a @Query method. SDN works with both Neo4j Community and Enterprise editions.