Beginner 13 min · March 09, 2026

Cypher Query Language Basics

Cypher Query Language: 45s SQL to 12ms in Neo4j

Q: Does Cypher support ACID transactions like SQL?

Yes, Neo4j is fully ACID-compliant. Every Cypher query runs within a transaction — if any part of the query fails, the entire transaction is rolled back. For multi-statement transactions, you can use BEGIN and COMMIT explicitly through the driver API. Neo4j also uses optimistic locking: if two transactions modify the same node concurrently, the second one fails with a deadlock exception and must be retried. In your application code, wrap Cypher calls in a retry loop with exponential backoff.

Q: What is the difference between CREATE and MERGE?

CREATE always creates a new node or relationship, even if an identical one already exists — this leads to duplicates. MERGE checks if the pattern exists first: if it does, MERGE matches it (does nothing); if it doesn't, MERGE creates it. MERGE is the idempotent version of CREATE. Use CREATE for event-based data (orders, log entries) where each record is inherently unique. Use MERGE for entity-based data (users, products) where the same entity might be referenced multiple times. MERGE also supports ON CREATE SET and ON MATCH SET for upsert behavior.

Q: How do I handle many-to-many relationships in Cypher?

Many-to-many relationships are natural in graph databases — no junction table needed. Just create direct relationships between the nodes: (student:Student)-[:ENROLLED_IN]->(course:Course). If a student is enrolled in 5 courses, they have 5 direct relationships. If a course has 200 students, it has 200 incoming relationships. Querying is a simple pattern match: MATCH (s:Student)-[:ENROLLED_IN]->(c:Course) RETURN s.name, c.name. No join tables, no foreign keys, no performance degradation as the dataset grows.

Q: What is the 'ASCII-art' syntax exactly?

Cypher uses text characters to represent graph elements. Parentheses (node) represent a node (they look like circles). Brackets -[rel:TYPE]-> represent a directed relationship (the arrow shows direction). Combining them: (a:Person)-[:FRIEND]->(b:Person) visually represents 'a Person node connected by a FRIEND relationship to another Person node.' This makes the query look like the data it queries — if you can draw it on a whiteboard, you can write it in Cypher.

Q: Can I use Cypher to query non-graph data?

Cypher is optimized for graph traversals — using it for flat, non-connected data (like a simple key-value store) is inefficient. If your data has no relationships or the relationships are trivial (one-to-many without traversal needs), a relational database or document store is simpler and faster. Cypher's advantage kicks in when you need to traverse relationships: friends of friends, supply chain hops, recommendation paths, fraud ring detection. If your queries are mostly 'give me row X where column Y = Z,' you don't need a graph.

Q: How do I import data from CSV into Neo4j?

Use LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row to read CSV files. The WITH HEADERS clause treats the first row as column names, accessible as row.column_name. Combine with MERGE for idempotent imports: LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row MERGE (u:User {email: row.email}) SET u.name = row.name. For large files (>100K rows), use PERIODIC COMMIT (Neo4j 4.x) or CALL { } IN TRANSACTIONS (Neo4j 5.x) to batch the writes and avoid running out of memory.

Q: How do I paginate results efficiently in Cypher?

Don't use SKIP for large offsets — SKIP 100000 LIMIT 10 forces the engine to process and discard 100,000 rows. Use keyset pagination instead: pass the last seen value and filter on it. For example: MATCH (u:User) WHERE u.created_at < $last_seen_date RETURN u ORDER BY u.created_at DESC LIMIT 10. This lets the engine use the index to jump directly to the right position. Same pagination result, 1000x faster on large datasets. Store $last_seen_date from the last item of the previous page.

Q: What is the WITH clause and when do I need it?

WITH is Cypher's pipe operator — it passes results from one query stage to the next. You need it in three situations: (1) after aggregation to filter on the aggregated result (like SQL's HAVING), (2) to chain multiple query stages for readability, and (3) to limit which variables pass forward (the engine uses this for optimization). Any variable not listed in the WITH clause becomes unavailable downstream. Example: MATCH (dev)-[:CONTRIBUTES_TO]->(proj) WITH dev, count(proj) AS cnt WHERE cnt > 3 — the count must pass through a WITH before you can filter on it.

Q: How do I optimize a slow Cypher query?

Run PROFILE on the query and look for three things: (1) NodeByLabelScan — the engine is scanning all nodes with a label instead of using an index. Fix: create an index on the filtered property. (2) Eager operator — the engine is buffering all intermediate results in memory. Fix: restructure the query to reduce intermediate result size. (3) High database hits — the engine is doing too much work. Fix: start your MATCH from the most selective node (the one with the fewest matches), use labels everywhere, and avoid unbounded variable-length paths.

Q: Can I use Cypher with Spring Data Neo4j?

Yes, Spring Data Neo4j (SDN) is the official integration. You define entity classes with @Node annotations, repositories that extend Neo4jRepository, and custom queries with @Query("cypher here"). SDN handles object-graph mapping — it converts Neo4j nodes and relationships into Java objects and back. For complex queries, use @Query with native Cypher. For batch operations, use UNWIND $batch AS row MERGE ... in a @Query method. SDN works with both Neo4j Community and Enterprise editions.

SQL recommendation query took 45 seconds; Cypher on Neo4j cut it to 12 milliseconds.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

✓ Verified

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 20 min

✓Basic programming fundamentals
✓A computer with internet access
✓Willingness to follow along with examples

⚡Quick Answer

Cypher is Neo4j's declarative, pattern-matching query language for property graphs, using ASCII-art syntax like (node)-[:REL]->(node).
Pattern matching replaces SQL joins: a 45-second SQL recommendation query dropped to 12ms in Cypher by traversing direct pointers instead of computing table joins.
Relationships are first-class citizens stored as doubly-linked lists, enabling O(1) per-hop traversal — no recursive CTEs or application-layer loops needed.
Use Cypher for connected data (recommendations, fraud, social graphs); avoid it for flat, aggregate-heavy OLAP workloads better suited to columnar stores.
Key clauses are MATCH (pattern binding), WHERE (subgraph pruning), CREATE/MERGE (mutations), and RETURN (output shaping with ORDER BY, LIMIT, DISTINCT).

✦ Definition~90s read

What is Cypher Query Language Basics?

Cypher is a declarative, pattern-matching query language purpose-built for property graph databases, most notably Neo4j. Unlike SQL, which operates on tabular joins and set theory, Cypher lets you express graph traversals using ASCII-art syntax that mirrors the graph structure itself: (node)-[:RELATIONSHIP]->(otherNode).

★

Imagine you're trying to find 'friends of friends who live in London and like jazz' in a spreadsheet.

This isn't a cosmetic difference—it fundamentally changes how you model and query connected data. A SQL query that requires five JOINs and a recursive CTE to find friends-of-friends can become a single Cypher pattern match, often dropping from 45 seconds to 12 milliseconds on a well-indexed Neo4j instance because the database traverses pointers directly rather than computing Cartesian products.

Cypher exists because relational databases punish relationship-heavy queries. If your data is inherently a graph—social networks, recommendation engines, fraud detection, supply chains—SQL forces you to denormalize or pay exponential join costs. Cypher solves this by treating relationships as first-class citizens, not foreign key constraints.

You don't join tables; you walk edges. This makes it ideal for pathfinding, variable-length traversals, and multi-hop queries where SQL would require recursive CTEs or application-layer loops. That said, Cypher is not a replacement for SQL—don't use it for flat, aggregate-heavy reporting or OLAP workloads where columnar stores like ClickHouse or DuckDB dominate.

Under the hood, Neo4j compiles Cypher into a traversal plan that exploits index-backed adjacency lists. The MATCH clause is the heart of this: it binds variables to nodes and relationships that match your pattern, then filters with WHERE. Unlike SQL's WHERE which evaluates row-by-row, Cypher's WHERE can prune entire subgraphs early. CREATE and SET handle mutations, while RETURN shapes output with ORDER BY, LIMIT, and DISTINCT—familiar concepts, but operating on graph paths rather than flat rows.

The key insight: Cypher isn't SQL with different syntax; it's a language optimized for the traversal cost model, where the fastest query is the one that touches the fewest nodes along the shortest paths.

Plain-English First

Imagine you're trying to find 'friends of friends who live in London and like jazz' in a spreadsheet. You'd need multiple VLOOKUPs, joins, and temp tables — and it'd take 20 minutes to write. In a graph database, you draw the pattern: (me)-[:FRIEND]->(friend)-[:FRIEND]->(fof) WHERE fof.city = 'London' AND fof.likes = 'Jazz'. The database walks the graph like following breadcrumbs — it doesn't scan tables, it hops from node to node. That's Cypher: a language that lets you describe the shape of the data you want, and the engine finds every instance of that shape in the graph.

The key difference from SQL: in SQL, relationships are inferred through foreign keys and reconstructed with joins at query time. In Cypher, relationships are first-class citizens — they're stored directly on disk as physical connections between nodes. When you traverse from one node to another, you're following a pointer, not scanning a join table. This is why a 6-level deep friend-of-friend query in SQL takes minutes but in Cypher takes milliseconds.

A recommendation engine I built for an e-commerce platform was generating 'customers who bought X also bought Y' queries using SQL. With 2 million products and 50 million purchase records, the query took 45 seconds — too slow for real-time recommendations on the product page. We rewrote the data layer in Neo4j with Cypher. Same logic, same data. The query dropped to 12 milliseconds. Not 12 seconds — 12 milliseconds. The difference wasn't better hardware or clever caching. It was that the graph stored relationships as direct pointers, so finding 'people who bought this product and what else they bought' was a two-hop traversal instead of a three-table join with a GROUP BY on 50 million rows.

Cypher is the query language for Neo4j — the most widely-used graph database. It's declarative (you describe what you want, not how to get it), pattern-based (you draw the shape of your data using ASCII art), and designed specifically for traversing relationships. If you've written SQL, the mental model shift is: stop thinking in tables and joins, start thinking in nodes and paths.

This guide covers every clause you'll use daily — CREATE, MATCH, WHERE, RETURN, SET, DELETE, MERGE, WITH, UNWIND, and CALL — with production examples from real systems. We'll also cover variable-length paths, aggregation, subqueries, indexing, and the six mistakes that cause the most pain in production.

Why Cypher Isn't Just SQL for Graphs

Cypher is a declarative graph query language that uses pattern matching to traverse nodes and relationships, not joins. Where SQL requires O(n) table scans or index lookups for each join hop, Cypher expresses a multi-hop traversal in a single ASCII-art pattern like (a)-[:KNOWS]->(b)-[:WORKS_AT]->(c). The database engine compiles that pattern into an internal traversal plan, walking edges in O(1) per hop via pointer chasing — no hash joins, no nested loops. This is the core mechanic that turns a 45-second SQL query into a 12-millisecond Cypher query on a graph of 10 million nodes.

Cypher works by anchoring on a starting node (via label+property index), then expanding relationships in a depth-first or breadth-first walk. Each relationship is a doubly-linked list in storage, so following an edge is a direct memory offset — not a key lookup. The language is declarative: you describe the shape of the subgraph you want, and the planner chooses the traversal order. Critical property: Cypher is not relational algebra; it's graph algebra. There is no cross product, no GROUP BY in the traditional sense — aggregation happens over paths, not rows.

Use Cypher when your query's complexity is driven by the number of hops, not the number of rows. Real systems — fraud detection, recommendation engines, access control — need to answer "find all nodes within 5 hops of this user" or "does this path exist?". SQL solves these with recursive CTEs that degrade to O(n^depth). Cypher solves them in O(depth * branching_factor) with constant-time edge traversal. If your data is inherently relational and you query by join keys, stay on SQL. If your data is inherently connected and you query by traversal, Cypher is not optional — it's the difference between a timeout and a real-time response.

⚠ Pattern Matching Is Not Regex

Cypher patterns look like ASCII art but are compiled into traversal plans. Writing (a)-[:REL*]->(b) without a bound start node forces a full graph scan — same as a table scan in SQL.

📊 Production Insight

Teams migrating from SQL to Cypher often write unbounded variable-length path patterns like (a)-[:REL*]->(b) without a label or property filter on 'a'. This triggers a full-node scan over millions of nodes, causing query timeouts in production. Rule: always bind the start node with a label and a property predicate — e.g., (a:User {id: $userId}) — before any variable-length expansion.

🎯 Key Takeaway

Cypher is pattern-based graph algebra, not relational algebra — no joins, only traversals.

Every relationship is a pointer, making multi-hop queries O(hops) instead of O(joins).

Always bind your start node with a label+property filter to avoid full graph scans.

thecodeforge.io

Neo4J Cypher Basics

Cypher Syntax Fundamentals: Nodes, Relationships, and Patterns

Before writing any query, you need to understand the three building blocks: nodes, relationships, and properties. Everything in Cypher is built on these.

Nodes represent entities — people, products, cities, orders. Written as (variable:Label {properties}). The parentheses look like circles (nodes in a graph). The label is a type tag (like a table name). Properties are key-value pairs stored on the node.

Relationships connect nodes — FRIEND, PURCHASED, LOCATED_IN. Written as -[variable:TYPE {properties}]->. The arrow indicates direction. Relationships always have exactly one type and exactly one direction. They can also store properties (like a 'since' date on a FRIEND relationship, or a 'quantity' on a PURCHASED relationship).

Patterns combine nodes and relationships into shapes: (a:Person)-[:FRIEND]->(b:Person). This is the core of Cypher — you write a pattern that describes the shape of data you want, and the engine finds every instance of that pattern in the graph.

Variables bind to nodes or relationships so you can reference them later in WHERE, RETURN, or WITH. If you don't need to reference something, you can leave it anonymous: (a)-[:FRIEND]->() finds all people that 'a' is friends with, without naming the destination.

Three rules that save you hours of debugging: 1. Node labels start with uppercase: :Person, :Product 2. Relationship types start with uppercase: :FRIEND, :PURCHASED 3. Property keys start with lowercase: name, email, created_at

io/thecodeforge/graph/SyntaxFundamentals.cypherCYPHER

// io.thecodeforge: Node syntax — (variable:Label {key: value})
// This creates a node with label 'Person' and two properties
CREATE (alice:Person {name: 'Alice', age: 32, city: 'London'})
RETURN alice;

// io.thecodeforge: Relationship syntax — -[variable:TYPE {key: value}]->
// This creates a directed relationship with a property
CREATE (bob:Person {name: 'Bob', age: 28, city: 'Paris'})
CREATE (alice)-[r:FRIEND {since: date('2019-06-15'), context: 'university'}]->(bob)
RETURN alice, r, bob;

// io.thecodeforge: Pattern — the fundamental unit of Cypher queries
// This pattern matches any Person who is FRIENDs with any other Person
MATCH (p1:Person)-[:FRIEND]->(p2:Person)
RETURN p1.name AS person, p2.name AS friend;

// io.thecodeforge: Anonymous nodes — when you don't need to reference them
// Finds all people Alice is friends with, without naming the relationship
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend)
RETURN friend.name, friend.city;

// io.thecodeforge: Multiple labels — a node can have more than one
CREATE (charlie:Person:Developer {name: 'Charlie', language: 'Java'})
RETURN charlie;

// io.thecodeforge: Matching multiple labels — find all Developer-People
MATCH (dev:Person:Developer)
RETURN dev.name, dev.language;

Output

╒══════════╤═══════════╤═══════════════╕

│"person" │"friend" │"friend.city" │

╞══════════╪═══════════╪═══════════════╡

│"Alice" │"Bob" │"Paris" │

└──────────┴───────────┴───────────────┘

💡Think in Shapes, Not Tables:

The single biggest mental shift from SQL to Cypher: stop thinking in 'what columns do I need?' and start thinking in 'what shape does my data look like?' If you can draw the pattern on a whiteboard, you can write it in Cypher. (a:Person)-[:BOUGHT]->(p:Product)<-[:BOUGHT]-(b:Person) reads exactly like what it finds: two people who bought the same product.

CREATE: Building Nodes and Relationships

CREATE is how you insert data into the graph. It always creates new nodes and relationships — even if duplicates already exist. This is the key difference from MERGE (covered later), which checks for existence first.

You can create single nodes, multiple nodes, nodes with relationships in one statement, and even entire subgraphs. The most common production pattern: create a node, create a relationship to an existing node, return what you created. This is how you'd model a new user signing up and connecting to an existing organization.

Properties can be any JSON-compatible type: strings, numbers, booleans, lists, dates, and temporal types. Neo4j stores them as key-value pairs on the node or relationship — there's no schema enforcement, which makes it easy to evolve your data model without migrations.

One gotcha: CREATE always creates. If you run the same CREATE statement twice, you get duplicate nodes. Use MERGE when you need idempotency (creating a node only if it doesn't already exist).

io/thecodeforge/graph/CreateNodes.cypherCYPHER

// io.thecodeforge: Create a single node with properties
CREATE (order:Order {
  order_id: 'ORD-2024-001',
  total: 149.99,
  currency: 'GBP',
  status: 'pending',
  created_at: datetime()
})
RETURN order;

// io.thecodeforge: Create nodes and a relationship in one statement
// This is the most common production pattern — create entity + link to existing
CREATE (user:User {name: 'Dana', email: 'dana@thecodeforge.io'})
WITH user
MATCH (org:Organization {name: 'TheCodeForge'})
CREATE (user)-[rel:MEMBER_OF {role: 'engineer', joined: date('2024-01-15')}]->(org)
RETURN user.name, rel.role, org.name;

// io.thecodeforge: Create multiple nodes at once
CREATE
  (london:City {name: 'London', country: 'UK', population: 8982000}),
  (paris:City {name: 'Paris', country: 'France', population: 2161000}),
  (berlin:City {name: 'Berlin', country: 'Germany', population: 3645000});

// io.thecodeforge: Create an entire subgraph in one statement
// Product with variants, categories, and supplier — all linked
CREATE (p:Product {sku: 'WIDGET-PRO', name: 'Widget Pro', price: 29.99})
CREATE (cat:Category {name: 'Electronics'})
CREATE (sup:Supplier {name: 'AcmeCorp', country: 'China'})
CREATE (p)-[:IN_CATEGORY]->(cat)
CREATE (p)-[:SUPPLIED_BY {lead_time_days: 14}]->(sup)
RETURN p.name, cat.name, sup.name;

// io.thecodeforge: Create a relationship between EXISTING nodes
// This is how you link entities after initial creation
MATCH (alice:Person {name: 'Alice'})
MATCH (bob:Person {name: 'Bob'})
CREATE (alice)-[:FRIEND {since: date('2023-03-10')}]->(bob)
RETURN 'Friendship created' AS result;

Output

╒════════════════╤══════════╤═══════════════════╕

│"user.name" │"rel.role"│"org.name" │

╞════════════════╪══════════╪═══════════════════╡

│"Dana" │"engineer"│"TheCodeForge" │

└────────────────┴──────────┴───────────────────┘

⚠ CREATE Always Creates — Even Duplicates:

Running the same CREATE statement twice creates duplicate nodes. In production, this corrupts your graph — you'll have two 'Alice' nodes with no way to tell which is real. Use MERGE when you need idempotency. Rule of thumb: use CREATE for event-based data (orders, log entries) where duplicates are impossible by design. Use MERGE for entity-based data (users, products) where the same entity might be referenced multiple times.

thecodeforge.io

Neo4J Cypher Basics

MATCH and WHERE: Reading and Filtering Data

MATCH is the read operation in Cypher — equivalent to SELECT in SQL. You write a pattern, and the engine finds every instance of that pattern in the graph. MATCH alone returns everything matching the pattern. WHERE filters the results.

The WHERE clause supports the full set of comparison and logical operators: =, <><>, <, >, <=, >=, AND, OR, NOT, IN, STARTS WITH, ENDS WITH, CONTAINS, IS NULL, IS NOT NULL, and regular expressions (=~ 'regex').

Performance note: always start your MATCH from a specific node, not an anonymous scan. (user:User {email: 'x'})-[:ORDERED]->(order) is fast because it starts from the User index. (n)-[:ORDERED]->(order) without a starting point forces a full graph scan.

Multiple MATCH clauses act like joins — each MATCH finds a pattern, and Cypher joins them on shared variables. If you want an outer join effect, use OPTIONAL MATCH (returns null when the pattern isn't found).

io/thecodeforge/graph/MatchAndFilter.cypherCYPHER

// io.thecodeforge: Basic MATCH — find all developers
MATCH (dev:Developer)
RETURN dev.name, dev.language, dev.years_experience;

// io.thecodeforge: MATCH with WHERE — filtering on properties
MATCH (dev:Developer)
WHERE dev.language = 'Java' AND dev.years_experience >= 5
RETURN dev.name AS senior_java_dev;

// io.thecodeforge: Inline property matching — shorthand for simple WHERE
// Equivalent to the query above but more concise
MATCH (dev:Developer {language: 'Java'})
WHERE dev.years_experience >= 5
RETURN dev.name AS senior_java_dev;

// io.thecodeforge: String matching — STARTS WITH, ENDS WITH, CONTAINS
MATCH (user:User)
WHERE user.email ENDS WITH '@thecodeforge.io'
  AND user.name STARTS WITH 'A'
RETURN user.name, user.email;

// io.thecodeforge: IN operator — matching against a list
MATCH (dev:Developer)
WHERE dev.language IN ['Java', 'Kotlin', 'Scala']
RETURN dev.name, dev.language
ORDER BY dev.language;

// io.thecodeforge: IS NULL / IS NOT NULL — checking for missing properties
MATCH (order:Order)
WHERE order.shipped_at IS NULL
  AND order.status = 'confirmed'
RETURN order.order_id, order.total
ORDER BY order.created_at DESC
LIMIT 10;

// io.thecodeforge: Regular expression matching
MATCH (product:Product)
WHERE product.sku =~ 'WIDGET-.*-PRO'
RETURN product.name, product.sku;

// io.thecodeforge: OPTIONAL MATCH — like a LEFT JOIN in SQL
// Returns users even if they have no orders
MATCH (user:User)
OPTIONAL MATCH (user)-[:PLACED]->(order:Order)
RETURN user.name,
       count(order) AS order_count,
       CASE WHEN order IS NULL THEN 'no orders' ELSE 'has orders' END AS status;

// io.thecodeforge: Multiple MATCH clauses — chained traversal
// Find developers who work on projects in a specific category
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)
MATCH (proj)-[:IN_CATEGORY]->(cat:Category {name: 'Backend'})
WHERE dev.years_experience >= 3
RETURN dev.name, proj.name, cat.name
ORDER BY dev.years_experience DESC;

Output

╒═══════════════════╤══════════════╤═══════════╕

│"dev.name" │"proj.name" │"cat.name" │

╞═══════════════════╪══════════════╪═══════════╡

│"Alice" │"ForgeAPI" │"Backend" │

│"Charlie" │"DataPipeline"│"Backend" │

└───────────────────┴──────────────┴───────────┘

🔥Start Your MATCH From an Index:

Cypher's query planner looks for the most selective starting point. If you MATCH (dev:Developer {email: 'x'})-[:WORKS_ON]->(proj), it starts from the email index (O(log n)) and traverses outward. If you MATCH (dev)-[:WORKS_ON]->(proj:Project {name: 'x'}), it starts from the project index. But if you MATCH (dev)-[:WORKS_ON]->(proj) with no indexed starting point, it scans every developer in the graph. Always anchor your pattern on a node you can index.

RETURN, ORDER BY, LIMIT, and DISTINCT: Shaping Your Results

RETURN defines what the query gives back — equivalent to SELECT in SQL. You can return nodes, relationships, properties, computed values, and aggregations. ORDER BY sorts results (ascending by default, use DESC for descending). LIMIT caps the result set. DISTINCT removes duplicates.

You can alias results with AS for cleaner output. You can use CASE expressions for conditional logic. You can call functions like toString(), toInteger(), datetime(), and coalesce() inside RETURN.

Pagination pattern: use SKIP and LIMIT together for cursor-based pagination. Don't use SKIP for large offsets (skip 10000 rows is slow) — use a WHERE clause on the sort key instead (WHERE user.created_at < last_seen_date).

io/thecodeforge/graph/ReturnAndSort.cypherCYPHER

// io.thecodeforge: RETURN with aliases and computed values
MATCH (dev:Developer)
RETURN dev.name AS developer,
       dev.language AS primary_language,
       dev.years_experience AS experience,
       CASE
         WHEN dev.years_experience >= 10 THEN 'Principal'
         WHEN dev.years_experience >= 5 THEN 'Senior'
         WHEN dev.years_experience >= 2 THEN 'Mid'
         ELSE 'Junior'
       END AS level
ORDER BY dev.years_experience DESC;

// io.thecodeforge: DISTINCT — remove duplicate results
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)
RETURN DISTINCT dev.name AS developer
ORDER BY dev.name;

// io.thecodeforge: Pagination with SKIP and LIMIT
// Page 1: first 10 results
MATCH (user:User)
RETURN user.name, user.email, user.created_at
ORDER BY user.created_at DESC
LIMIT 10;

// Page 2: skip first 10, return next 10
MATCH (user:User)
RETURN user.name, user.email, user.created_at
ORDER BY user.created_at DESC
SKIP 10 LIMIT 10;

// io.thecodeforge: RETURN relationships and their properties
MATCH (a:Person)-[r:FRIEND]->(b:Person)
RETURN a.name AS person,
       b.name AS friend,
       r.since AS friends_since,
       duration.between(r.since, date()).years AS years_of_friendship
ORDER BY r.since ASC;

// io.thecodeforge: Coalesce — provide defaults for missing properties
MATCH (product:Product)
RETURN product.name,
       coalesce(product.description, 'No description available') AS description,
       coalesce(product.rating, 0) AS rating
ORDER BY product.rating DESC;

Output

╒══════════════╤══════════════════╤════════════╤═════════╕

│"developer" │"primary_language"│"experience"│"level" │

╞══════════════╪══════════════════╪════════════╪═════════╡

│"Alice" │"Java" │12 │Principal│

│"Charlie" │"Java" │8 │Senior │

│"Dana" │"Python" │4 │Mid │

└──────────────┴──────────────────┴────────────┴─────────┘

💡Paginate With WHERE, Not SKIP:

SKIP 10000 LIMIT 10 forces the engine to process and discard 10,000 rows before returning 10. For large datasets, use keyset pagination instead: WHERE user.created_at < $last_seen_date ORDER BY user.created_at DESC LIMIT 10. This lets the engine use the index to jump directly to the right position. Same result, 1000x faster on large graphs.

SET and REMOVE: Updating Nodes and Relationships

SET adds or updates properties on nodes and relationships. REMOVE deletes properties or removes labels from nodes. These are the UPDATE and ALTER equivalents in Cypher.

SET can assign a single property, multiple properties at once, or copy properties from a map. It can also add labels to a node (SET n:NewLabel). REMOVE does the inverse — removes properties or labels.

Production pattern: combine MATCH + SET for targeted updates. Always filter by an indexed property (like an ID or email) to avoid scanning the entire graph. Use coalesce() with SET to preserve existing values when partial updates arrive from an API.

io/thecodeforge/graph/SetAndRemove.cypherCYPHER

// io.thecodeforge: SET — update a single property
MATCH (user:User {email: 'alice@thecodeforge.io'})
SET user.last_login = datetime()
RETURN user.name, user.last_login;

// io.thecodeforge: SET — update multiple properties at once
MATCH (order:Order {order_id: 'ORD-2024-001'})
SET order.status = 'shipped',
    order.shipped_at = datetime(),
    order.tracking_number = 'TRACK-99821'
RETURN order.order_id, order.status, order.tracking_number;

// io.thecodeforge: SET — update from a map (common in REST API handlers)
// The map comes from the request body in production
WITH {status: 'delivered', delivered_at: '2024-03-15T14:30:00Z'} AS updates
MATCH (order:Order {order_id: 'ORD-2024-001'})
SET order += updates  // += merges the map into existing properties
RETURN order.order_id, order.status, order.delivered_at;

// io.thecodeforge: SET — add a label to an existing node
MATCH (user:User {email: 'alice@thecodeforge.io'})
SET user:Admin
RETURN labels(user) AS labels;  // ['User', 'Admin']

// io.thecodeforge: REMOVE — delete a property
MATCH (user:User {email: 'alice@thecodeforge.io'})
REMOVE user.temporary_token
RETURN user.name, user.temporary_token;  // null — property is gone

// io.thecodeforge: REMOVE — remove a label from a node
MATCH (user:User {email: 'alice@thecodeforge.io'})
REMOVE user:Admin
RETURN labels(user) AS labels;  // ['User'] — Admin label removed

// io.thecodeforge: Production pattern — partial update with coalesce
// Preserve existing rating if the update doesn't include one
MATCH (product:Product {sku: 'WIDGET-PRO'})
SET product.name = coalesce($update_name, product.name),
    product.price = coalesce($update_price, product.price),
    product.updated_at = datetime()
RETURN product.name, product.price;

Output

╒══════════════╤══════════╤═════════════════════════╕

│"order.order_id"│"order.status"│"order.tracking_number"│

╞══════════════╪══════════╪═════════════════════════╡

│"ORD-2024-001"│"shipped" │"TRACK-99821" │

└──────────────┴──────────┴─────────────────────────┘

⚠ SET += Merges, SET = Replaces:

SET user += {name: 'Alice V2'} merges the map into existing properties — other properties stay. SET user = {name: 'Alice V2'} replaces ALL properties with just the ones in the map — everything else is deleted. In a REST PATCH endpoint, use +=. In a REST PUT endpoint, use =. Getting this wrong in production silently deletes user data.

DELETE and DETACH DELETE: Removing Data

DELETE removes nodes and relationships. DETACH DELETE removes a node and ALL its relationships in one operation. This distinction matters — you cannot delete a node that still has relationships. If you try, Neo4j throws an error. DETACH DELETE bypasses this by removing relationships first, then the node.

In production, soft deletes (setting a 'deleted: true' property) are almost always better than hard deletes. Hard deletes break referential integrity in the graph — other nodes that referenced the deleted node now have dangling relationships. Soft deletes preserve the graph structure while hiding the node from queries (WHERE n.deleted IS NULL OR n.deleted = false).

When you must hard delete, always use DETACH DELETE for nodes. Using plain DELETE on a node with relationships will fail.

io/thecodeforge/graph/DeleteOperations.cypherCYPHER

// io.thecodeforge: DELETE — remove a relationship only
// The nodes remain; only the connection is severed
MATCH (alice:Person {name: 'Alice'})-[r:FRIEND]->(bob:Person {name: 'Bob'})
DELETE r
RETURN 'Friendship removed' AS result;

// io.thecodeforge: DETACH DELETE — remove a node and ALL its relationships
// Use this when deleting an entity entirely
MATCH (user:User {email: 'spam@bot.net'})
DETACH DELETE user;

// io.thecodeforge: Soft delete — production pattern
// Mark as deleted instead of removing; preserves graph integrity
MATCH (order:Order {order_id: 'ORD-2024-999'})
SET order.deleted = true,
    order.deleted_at = datetime(),
    order.deleted_by = 'admin@thecodeforge.io'
RETURN order.order_id, order.deleted;

// io.thecodeforge: Filtering out soft-deleted nodes in queries
// Every MATCH in your app should include this filter
MATCH (order:Order)
WHERE order.deleted IS NULL
RETURN order.order_id, order.total
ORDER BY order.created_at DESC
LIMIT 20;

// io.thecodeforge: Delete all relationships of a specific type
// Useful for bulk cleanup during data migrations
MATCH ()-[r:TEMPORARY_LINK]->()
DELETE r
RETURN count(r) AS deleted_count;

// io.thecodeforge: Conditional delete — only delete if certain conditions met
MATCH (session:Session)
WHERE session.last_active < datetime() - duration('P30D')
  AND session.active = false
DETACH DELETE session
RETURN count(session) AS expired_sessions_deleted;

Output

╒═══════════════════════════╕

│"expired_sessions_deleted" │

╞═══════════════════════════╡

│147 │

└───────────────────────────┘

⚠ Never DELETE a Node Without DETACH:

MATCH (n:User {id: 'x'}) DELETE n will throw 'Cannot delete node, because it still has relationships.' You must either delete the relationships first or use DETACH DELETE. In production, forgetting DETACH is the #1 cause of delete failures in batch jobs. If you're unsure, always use DETACH DELETE — removing orphaned relationships is almost always the right behavior.

MERGE: Idempotent Create-or-Find

MERGE is CREATE + MATCH in one operation. It checks if a pattern exists — if yes, it matches it; if no, it creates it. This makes MERGE the idempotent alternative to CREATE, essential for scripts that might run multiple times, ETL pipelines, and any scenario where duplicate data is dangerous.

MERGE has two companion clauses: ON CREATE SET (properties to set only when creating a new node) and ON MATCH SET (properties to set only when matching an existing node). Together, these give you upsert behavior: insert if new, update if existing.

Critical gotcha: MERGE matches the ENTIRE pattern, including relationships. If you MERGE (a)-[:FRIEND]->(b) and the nodes exist but the relationship doesn't, MERGE creates the relationship. But if only (a) exists and (b) doesn't, MERGE creates both (a) and (b) — it doesn't look up (b) by name first. This is the most common source of MERGE confusion.

io/thecodeforge/graph/MergeOperations.cypherCYPHER

// io.thecodeforge: MERGE — find or create a node
// First run: creates the node. Second run: finds it.
MERGE (p:Project {name: 'ForgeAPI'})
ON CREATE SET p.created_at = datetime(),
              p.version = '1.0',
              p.status = 'active'
ON MATCH SET p.last_accessed = datetime()
RETURN p.name, p.version, p.status;

// io.thecodeforge: MERGE — idempotent relationship creation
// Prevents duplicate FRIEND relationships between the same two people
MATCH (alice:Person {name: 'Alice'})
MATCH (bob:Person {name: 'Bob'})
MERGE (alice)-[r:FRIEND]->(bob)
ON CREATE SET r.since = date(), r.source = 'manual'
RETURN alice.name, type(r) AS relationship, bob.name, r.since;

// io.thecodeforge: MERGE with unique constraint — production pattern
// Requires: CREATE CONSTRAINT unique_email FOR (u:User) REQUIRE u.email IS UNIQUE
MERGE (user:User {email: 'dana@thecodeforge.io'})
ON CREATE SET user.name = 'Dana',
              user.created_at = datetime(),
              user.signup_source = 'organic'
ON MATCH SET user.last_login = datetime()
RETURN user.name, user.email, user.created_at;

// io.thecodeforge: MERGE on a relationship path — create if not exists
// This is the upsert pattern for connection-based data
MATCH (product:Product {sku: 'WIDGET-PRO'})
MATCH (category:Category {name: 'Electronics'})
MERGE (product)-[r:IN_CATEGORY]->(category)
ON CREATE SET r.assigned_at = datetime(),
              r.assigned_by = 'system'
RETURN product.name, type(r), category.name;

// io.thecodeforge: MERGE for ETL — batch upsert from external data
// Load a list of users and upsert each one
UNWIND [
  {email: 'alice@thecodeforge.io', name: 'Alice', score: 95},
  {email: 'bob@thecodeforge.io', name: 'Bob', score: 87},
  {email: 'newuser@thecodeforge.io', name: 'Eve', score: 92}
] AS row
MERGE (u:User {email: row.email})
ON CREATE SET u.name = row.name,
              u.score = row.score,
              u.first_seen = datetime()
ON MATCH SET u.score = row.score,
             u.last_updated = datetime()
RETURN u.email, u.name, u.score;

Output

╒══════════════════════════╤══════════╤═════════╤═════════════════════╕

│"u.email" │"u.name" │"u.score"│"u.first_seen" │

╞══════════════════════════╪══════════╪═════════╪═════════════════════╡

│"alice@thecodeforge.io" │"Alice" │95 │2024-01-10T08:00:00Z │

│"bob@thecodeforge.io" │"Bob" │87 │2024-01-10T08:00:01Z │

│"newuser@thecodeforge.io" │"Eve" │92 │2024-03-15T14:30:00Z │

└──────────────────────────┴──────────┴─────────┴─────────────────────┘

⚠ MERGE Creates What Doesn't Exist — Be Specific:

MERGE (a:Person {name: 'Alice'})-[:FRIEND]->(b:Person {name: 'Bob'}) checks if the ENTIRE pattern exists. If Alice exists but Bob doesn't, MERGE creates a NEW Bob node — it doesn't look up Bob by name in the broader graph. To avoid phantom nodes, MATCH the nodes first, then MERGE the relationship: MATCH (a:Person {name: 'Alice'}) MATCH (b:Person {name: 'Bob'}) MERGE (a)-[:FRIEND]->(b).

WITH: Chaining Subqueries and Aggregation Pipelines

WITH is the pipe operator of Cypher — it passes results from one part of the query to the next. Think of it as the equivalent of a subquery boundary or a GROUP BY + HAVING in SQL. Every clause after WITH operates on the results of the clause before it.

WITH has three critical uses: 1. Aggregation pipelines — aggregate data, then filter or sort on the aggregated result. In SQL, you'd use HAVING. In Cypher, you aggregate in one WITH, then filter in the next WHERE. 2. Query chaining — break a complex query into readable stages. Each WITH acts as a checkpoint. 3. Limiting scope — pass only specific variables forward, which the engine uses to optimize the query plan.

The rule: any variable not listed in the WITH clause is no longer available downstream. This is deliberate — it lets the query planner know exactly what data flows forward, enabling optimizations.

io/thecodeforge/graph/WithClause.cypherCYPHER

// io.thecodeforge: WITH for aggregation — find prolific contributors
// Count projects per developer, then filter on the count
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)
WITH dev, count(proj) AS project_count
WHERE project_count >= 3
RETURN dev.name AS developer, project_count
ORDER BY project_count DESC;

// io.thecodeforge: WITH for query chaining — multi-stage filtering
// Stage 1: Find active projects
// Stage 2: Find developers on those projects
// Stage 3: Aggregate and filter
MATCH (proj:Project {status: 'active'})
WITH proj
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj)
WITH dev, collect(proj.name) AS projects, count(proj) AS num_projects
WHERE num_projects >= 2
RETURN dev.name, projects, num_projects
ORDER BY num_projects DESC;

// io.thecodeforge: WITH with LIMIT — process in batches
// Find the 100 most recent users, then do expensive operations on them
MATCH (user:User)
WITH user
ORDER BY user.created_at DESC
LIMIT 100
MATCH (user)-[:PLACED]->(order:Order)
WITH user, count(order) AS order_count, sum(order.total) AS total_spent
WHERE total_spent > 500
RETURN user.name, user.email, order_count, total_spent
ORDER BY total_spent DESC;

// io.thecodeforge: WITH as a scope boundary — only pass what you need
MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)
WITH dev.name AS name, dev.language AS lang, count(proj) AS projects
// dev is no longer available here — only name, lang, projects
WHERE lang = 'Java'
RETURN name, lang, projects
ORDER BY projects DESC;

// io.thecodeforge: WITH + COLLECT — build lists per group
MATCH (cat:Category)<-[:IN_CATEGORY]-(product:Product)
WITH cat, collect(product.name) AS products, count(product) AS product_count
WHERE product_count >= 5
RETURN cat.name AS category, products, product_count
ORDER BY product_count DESC;

Output

╒══════════════╤══════════════════════════════════════════════╤═══════════════╕

│"category" │"products" │"product_count"│

╞══════════════╪══════════════════════════════════════════════╪═══════════════╡

│"Electronics" │["Widget Pro","Gadget X","Sensor V2",...] │12 │

│"Clothing" │["Tee Basic","Hoodie Dev","Cap Logo",...] │8 │

└──────────────┴──────────────────────────────────────────────┴───────────────┘

🔥WITH Is Mandatory After Aggregation:

You cannot chain a WHERE clause directly after an aggregation like count(). The aggregation result must pass through a WITH (or RETURN) first. MATCH (dev)-[:CONTRIBUTES_TO]->(proj) WITH dev, count(proj) AS cnt WHERE cnt > 3 works. MATCH (dev)-[:CONTRIBUTES_TO]->(proj) WHERE count(proj) > 3 does NOT work — it's a syntax error. This is the Cypher equivalent of SQL's HAVING vs WHERE distinction.

Aggregation Functions: COUNT, COLLECT, SUM, AVG, and Friends

Cypher supports the standard set of aggregation functions: count(), collect(), sum(), avg(), min(), max(), percentileCont(), percentileDisc(), and stDev(). The two you'll use most are count() and collect().

count() has two forms: count(n) counts non-null values of n, count() counts all rows regardless of nulls. This distinction matters — count(user) skips rows where user is null (like from an OPTIONAL MATCH), while count() includes them.

collect() is the most powerful aggregation function in Cypher — it gathers all values into a list. This is how you build nested results: group by a category, collect the items in that category into a list, return the category with its items. It replaces SQL's GROUP_CONCAT and subqueries for building nested JSON responses.

All aggregation functions ignore null values except count(*).

io/thecodeforge/graph/AggregationFunctions.cypherCYPHER

// io.thecodeforge: COUNT — two forms matter
// count(*) counts all rows including nulls
// count(order) counts only non-null orders
MATCH (user:User)
OPTIONAL MATCH (user)-[:PLACED]->(order:Order)
RETURN user.name,
       count(*) AS total_rows,           // includes users with no orders
       count(order) AS orders_placed      // excludes null orders
ORDER BY orders_placed DESC;

// io.thecodeforge: COLLECT — build nested lists
// Group products by category, return category with its products
MATCH (cat:Category)<-[:IN_CATEGORY]-(product:Product)
WITH cat.name AS category, collect(product) AS products
RETURN category, size(products) AS count, products
ORDER BY count DESC;

// io.thecodeforge: COLLECT with property extraction — cleaner output
// Collect just the names, not full node objects
MATCH (proj:Project)<-[:CONTRIBUTES_TO]-(dev:Developer)
WITH proj.name AS project, collect(dev.name) AS contributors
RETURN project, contributors
ORDER BY size(contributors) DESC;

// io.thecodeforge: SUM and AVG — numeric aggregations
MATCH (order:Order)-[:CONTAINS]->(item:OrderItem)
WITH order,
     sum(item.quantity * item.unit_price) AS order_total
RETURN avg(order_total) AS avg_order_value,
       min(order_total) AS min_order,
       max(order_total) AS max_order,
       sum(order_total) AS total_revenue,
       count(order) AS total_orders;

// io.thecodeforge: PERCENTILE — distribution analysis
MATCH (user:User)-[:PLACED]->(order:Order)
WITH user, count(order) AS order_count
RETURN percentileCont(order_count, 0.5) AS median_orders,
       percentileCont(order_count, 0.95) AS p95_orders,
       percentileCont(order_count, 0.99) AS p99_orders,
       avg(order_count) AS mean_orders;

// io.thecodeforge: COLLECT for top-N per group
// Top 3 products per category by rating
MATCH (cat:Category)<-[:IN_CATEGORY]-(product:Product)
WITH cat, product
ORDER BY product.rating DESC
WITH cat.name AS category, collect(product)[0..3] AS top_products
RETURN category, top_products;

Output

╒════════════════╤═════════════╤══════════╤══════════╤══════════════════╤═════════════╕

│"avg_order_value"│"min_order" │"max_order"│"total_revenue"│"total_orders" │

╞════════════════╪═════════════╪══════════╪══════════════════╪═══════════════╡

│73.42 │12.99 │892.50 │1,847,293.00 │25,162 │

└────────────────┴─────────────┴──────────┴──────────────────┴───────────────┘

💡collect()[0..N] for Top-N Per Group:

The pattern collect(x) ORDER BY x.score DESC then collect(x)[0..3] is Cypher's equivalent of a window function with ROW_NUMBER(). It gives you the top N items per group without a subquery. This is the standard pattern for leaderboards, top products per category, and most-active users per team.

Variable-Length Paths: Traversing Multiple Hops

Variable-length paths let you traverse a variable number of relationships in one pattern. Instead of writing out each hop explicitly, you specify a range: [1..3] means 1 to 3 hops, [] means any number of hops, [*3] means exactly 3 hops.

This is the feature that makes graph databases shine for social networks (friends of friends), supply chains (supplier of supplier), and recommendation engines (users who bought what you bought bought what?).

Critical warning: unbounded paths ([*] with no upper limit) will explore every reachable node in the graph and can cause out-of-memory errors on large graphs. ALWAYS set a maximum depth. For most use cases, 1-5 hops is sufficient. Beyond 6 hops, the results are usually meaningless (everyone on Facebook is within 4.7 hops of everyone else).

Use shortestPath() and allShortestPaths() when you specifically want the shortest route between two nodes, not all possible paths.

io/thecodeforge/graph/VariableLengthPaths.cypherCYPHER

// io.thecodeforge: Friends of friends — 2 hops exactly
MATCH (me:Person {name: 'Alice'})-[:FRIEND*2]->(fof)
WHERE fof <> me  // exclude Alice herself
RETURN DISTINCT fof.name AS friend_of_friend
ORDER BY fof.name;

// io.thecodeforge: Friends up to 3 levels deep — 1 to 3 hops
MATCH (me:Person {name: 'Alice'})-[:FRIEND*1..3]->(connection)
WHERE connection <> me
RETURN DISTINCT connection.name AS connection,
       min(length(shortestPath((me)-[:FRIEND*]->(connection)))) AS degrees
ORDER BY degrees, connection.name;

// io.thecodeforge: Supply chain — find suppliers of suppliers
MATCH (product:Product {name: 'Widget Pro'})-[:SUPPLIED_BY*1..4]->(supplier)
RETURN product.name AS product,
       supplier.name AS ultimate_supplier,
       supplier.country AS origin_country;

// io.thecodeforge: Shortest path between two people
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[:FRIEND*]-(bob:Person {name: 'Bob'})
)
RETURN [n IN nodes(path) | n.name] AS path_names,
       length(path) AS degrees_of_separation;

// io.thecodeforge: All shortest paths (if multiple equally-short routes exist)
MATCH path = allShortestPaths(
  (alice:Person {name: 'Alice'})-[:FRIEND*]-(bob:Person {name: 'Bob'})
)
RETURN [n IN nodes(path) | n.name] AS path_names,
       length(path) AS hops;

// io.thecodeforge: Bounded traversal with filtering at each hop
// Find paths where every intermediate person is over 25
MATCH (alice:Person {name: 'Alice'})-[:FRIEND*1..3]->(target:Person)
WHERE all(node IN nodes(path) WHERE node.age >= 25 OR node = alice OR node = target)
RETURN DISTINCT target.name, length(path) AS hops;

Output

╒════════════════════════╤═══════════════════════╕

│"path_names" │"degrees_of_separation"│

╞════════════════════════╪═══════════════════════╡

│["Alice","Charlie","Bob"]│2 │

└────────────────────────┴───────────────────────┘

⚠ Never Use Unbounded Paths in Production:

MATCH (a)-[]->(b) with no upper limit explores every reachable node from 'a'. On a graph with 1 million nodes and dense relationships, this can consume all available memory and crash the server. Always use [1..5] or similar bounded ranges. If you need to find all reachable nodes, use the APOC procedure apoc.path.expandConfig with a maxDepth parameter and a timeout.

UNWIND: Working with Lists and Expanding Collections

UNWIND takes a list and expands it into individual rows — the opposite of collect(). If collect() gathers rows into a list, UNWIND spreads a list into rows. This is essential for processing batch data, transforming aggregated results back into individual records, and working with JSON arrays passed from application code.

The most common production use: UNWIND a parameter list to create or merge multiple nodes in one query. Instead of executing 100 separate CREATE statements (100 round trips), you UNWIND a list of 100 items and create all nodes in one query (1 round trip).

UNWIND + MERGE is the standard ETL pattern for loading data into Neo4j.

io/thecodeforge/graph/UnwindOperations.cypherCYPHER

// io.thecodeforge: UNWIND — expand a list into rows
UNWIND ['Alice', 'Bob', 'Charlie'] AS name
MERGE (p:Person {name: name})
RETURN p.name;

// io.thecodeforge: UNWIND for batch creation — 1 query, 100 nodes
// In production, $users comes from a JSON request body
UNWIND $users AS user_data
MERGE (u:User {email: user_data.email})
ON CREATE SET u.name = user_data.name,
              u.created_at = datetime()
RETURN count(u) AS users_processed;

// io.thecodeforge: UNWIND to create relationships in batch
// Connect a list of skills to a developer in one query
MATCH (dev:Developer {name: 'Alice'})
UNWIND ['Java', 'Spring Boot', 'Neo4j', 'Kubernetes'] AS skill_name
MERGE (s:Skill {name: skill_name})
MERGE (dev)-[:HAS_SKILL {level: 'expert'}]->(s)
RETURN dev.name, collect(s.name) AS skills;

// io.thecodeforge: UNWIND after COLLECT — flatten and reprocess
// Find all categories, collect their products, then flatten for analysis
MATCH (cat:Category)<-[:IN_CATEGORY]-(product:Product)
WITH cat, collect(product) AS products
UNWIND products AS product
RETURN cat.name AS category,
       product.name AS product,
       product.price AS price
ORDER BY cat.name, product.price DESC;

// io.thecodeforge: UNWIND with index — track position in the list
UNWIND ['alpha', 'beta', 'gamma', 'delta'] AS item
WITH item, range(0, 3)[indexOf(['alpha','beta','gamma','delta'], item)] AS idx
MERGE (n:Item {name: item})
SET n.sort_order = idx
RETURN n.name, n.sort_order
ORDER BY n.sort_order;

// io.thecodeforge: UNWIND for data cleanup — split comma-separated values
MATCH (legacy:LegacyRecord)
WHERE legacy.tags IS NOT NULL
WITH legacy, split(legacy.tags, ',') AS tag_list
UNWIND tag_list AS tag
MERGE (t:Tag {name: trim(tag)})
MERGE (legacy)-[:TAGGED_WITH]->(t)
RETURN count(DISTINCT legacy) AS records_processed, count(DISTINCT t) AS tags_created;

Output

╒═══════════════════════╤═════════════════╕

│"records_processed" │"tags_created" │

╞═══════════════════════╪═════════════════╡

│4,291 │187 │

└───────────────────────┴─────────────────┘

💡UNWIND $list Is the Batch API Pattern:

Every Neo4j driver (Java, Python, JavaScript) supports parameterized queries. Pass a list as $items, UNWIND it in Cypher, and process all items in a single database round trip. This is 100x faster than executing individual queries in a loop from your application code. In Spring Data Neo4j, use @Query with UNWIND $batch AS row MERGE ... for batch operations.

Subqueries with CALL: Encapsulating Complex Logic

CALL introduces subqueries in Cypher — self-contained query blocks that execute independently and return results to the outer query. There are two forms: CALL {} for inline subqueries (like SQL subqueries) and CALL procedure.name() for invoking stored procedures (like stored procedures in SQL).

CALL {} subqueries are essential for isolating aggregation, avoiding variable scope conflicts, and breaking complex queries into readable chunks. Each CALL {} block has its own scope — variables must be explicitly passed in and out.

CALL procedure is how you invoke APOC procedures, Graph Data Science algorithms, and custom procedures. The syntax is CALL db.indexes() YIELD indexName, state RETURN indexName.

io/thecodeforge/graph/SubqueriesAndCall.cypherCYPHER

// io.thecodeforge: CALL {} — inline subquery for isolated aggregation
// Find developers, and for each, count their projects in a subquery
MATCH (dev:Developer)
CALL {
  WITH dev
  MATCH (dev)-[:CONTRIBUTES_TO]->(proj:Project)
  RETURN count(proj) AS project_count, collect(proj.name) AS project_names
}
RETURN dev.name, project_count, project_names
ORDER BY project_count DESC;

// io.thecodeforge: CALL {} — find the most recent order per user
MATCH (user:User)
CALL {
  WITH user
  MATCH (user)-[:PLACED]->(order:Order)
  RETURN order
  ORDER BY order.created_at DESC
  LIMIT 1
}
RETURN user.name, order.order_id, order.total, order.created_at
ORDER BY order.created_at DESC;

// io.thecodeforge: CALL db.indexes — list all indexes in the database
CALL db.indexes()
YIELD name, type, entityType, labelsOrTypes, properties, state
RETURN name, type, entityType, labelsOrTypes, properties, state
ORDER BY entityType, name;

// io.thecodeforge: CALL db.schema.visualization — see the graph schema
CALL db.schema.visualization()
YIELD nodes, relationships
RETURN nodes, relationships;

// io.thecodeforge: CALL apoc (if APOC is installed) — advanced procedures
// Export a subgraph to JSON for backup or migration
CALL apoc.export.json.query(
  'MATCH (u:User)-[:PLACED]->(o:Order) RETURN u, o LIMIT 100',
  null,
  {stream: true}
)
YIELD data
RETURN data;

// io.thecodeforge: CALL with UNION — combine results from multiple patterns
MATCH (dev:Developer {name: 'Alice'})
CALL {
  WITH dev
  MATCH (dev)-[:CONTRIBUTES_TO]->(proj:Project)
  RETURN proj.name AS item, 'project' AS type
  UNION
  WITH dev
  MATCH (dev)-[:HAS_SKILL]->(skill:Skill)
  RETURN skill.name AS item, 'skill' AS type
}
RETURN dev.name, collect({name: item, type: type}) AS contributions;

Output

╒══════════╤══════════════╤═══════════════════════════════════════════╕

│"dev.name"│"project_count"│"project_names" │

╞══════════╪══════════════╪═══════════════════════════════════════════╡

│"Alice" │4 │["ForgeAPI","DataPipeline","AuthSvc","..."│

│"Charlie" │3 │["DataPipeline","ETLRunner","Monitoring"] │

│"Dana" │1 │["ForgeAPI"] │

└──────────┴──────────────┴───────────────────────────────────────────┘

🔥CALL {} Isolates Scope — Use It to Avoid Variable Conflicts:

If your query has multiple MATCH clauses that reuse the same variable names, the query planner can get confused about which variable refers to what. Wrapping each section in CALL {} gives it a clean scope. This is especially important in complex queries with 4+ MATCH clauses — the readability improvement alone is worth the extra syntax.

Indexes and Constraints: Schema Management for Performance

Neo4j is often described as 'schema-optional,' but that doesn't mean you should skip indexing. Without indexes, every MATCH on a property scans every node with that label — O(n). With a range index, the same MATCH uses a B-tree lookup — O(log n). With a uniqueness constraint, you get both an index AND a guarantee that no duplicates can be created.

Three types of indexes: 1. Range index — the default, supports equality and range queries (=, <, >, IN). CREATE INDEX FOR (n:Label) ON (n.property) 2. Text index — optimized for CONTAINS, ENDS WITH, and full-text search. CREATE TEXT INDEX FOR (n:Label) ON (n.property) 3. Point index — for geospatial queries. CREATE POINT INDEX FOR (n:Label) ON (n.location)

Constraints enforce data integrity: 1. Uniqueness — no two nodes with the same label can have the same property value. CREATE CONSTRAINT FOR (n:User) REQUIRE n.email IS UNIQUE 2. Node key — uniqueness + existence (the property must exist and be unique). CREATE CONSTRAINT FOR (n:User) REQUIRE (n.email, n.tenant_id) IS NODE KEY 3. Existence — the property must exist (not null). CREATE CONSTRAINT FOR (n:User) REQUIRE n.email IS NOT NULL

Production rule: create uniqueness constraints on every entity's primary identifier before loading data. This prevents duplicates at the database level and gives you a free index for lookups.

io/thecodeforge/graph/IndexesAndConstraints.cypherCYPHER

// io.thecodeforge: Create a range index — for fast property lookups
CREATE INDEX user_email_index FOR (u:User) ON (u.email);
CREATE INDEX product_sku_index FOR (p:Product) ON (p.sku);
CREATE INDEX order_status_index FOR (o:Order) ON (o.status);

// io.thecodeforge: Create a composite index — for queries that filter on multiple props
CREATE INDEX user_tenant_email_index FOR (u:User) ON (u.tenant_id, u.email);

// io.thecodeforge: Create a text index — for CONTAINS and ENDS WITH queries
CREATE TEXT INDEX product_name_text_index FOR (p:Product) ON (p.name);

// io.thecodeforge: Create a uniqueness constraint — prevents duplicate entities
CREATE CONSTRAINT user_email_unique FOR (u:User) REQUIRE u.email IS UNIQUE;
CREATE CONSTRAINT product_sku_unique FOR (p:Product) REQUIRE p.sku IS UNIQUE;
CREATE CONSTRAINT order_id_unique FOR (o:Order) REQUIRE o.order_id IS UNIQUE;

// io.thecodeforge: Create a node key — composite uniqueness + existence
CREATE CONSTRAINT user_tenant_key FOR (u:User) REQUIRE (u.email, u.tenant_id) IS NODE KEY;

// io.thecodeforge: Create an existence constraint — property must exist
CREATE CONSTRAINT user_email_exists FOR (u:User) REQUIRE u.email IS NOT NULL;

// io.thecodeforge: List all indexes and their status
SHOW INDEXES YIELD name, type, entityType, labelsOrTypes, properties, state
ORDER BY entityType, name;

// io.thecodeforge: List all constraints
SHOW CONSTRAINTS YIELD name, type, entityType, labelsOrTypes, properties
ORDER BY entityType, name;

// io.thecodeforge: Drop an index (if you need to recreate it)
DROP INDEX user_email_index;

// io.thecodeforge: Drop a constraint
DROP CONSTRAINT user_email_unique;

Output

╒══════════════════════════╤═════════╤════════════╤═══════════════╤══════════════╤═════════╕

│"name" │"type" │"entityType"│"labelsOrTypes"│"properties" │"state" │

╞══════════════════════════╪═════════╪════════════╪═══════════════╪══════════════╪═════════╡

│"user_email_index" │"RANGE" │"NODE" │["User"] │["email"] │"ONLINE" │

│"product_sku_index" │"RANGE" │"NODE" │["Product"] │["sku"] │"ONLINE" │

│"order_status_index" │"RANGE" │"NODE" │["Order"] │["status"] │"ONLINE" │

└──────────────────────────┴─────────┴────────────┴───────────────┴──────────────┴─────────┘

⚠ Constraints Create Indexes — Don't Duplicate:

A uniqueness constraint automatically creates a range index on the same property. If you create both CREATE INDEX FOR (u:User) ON (u.email) and CREATE CONSTRAINT FOR (u:User) REQUIRE u.email IS UNIQUE, you have two indexes doing the same work — wasting memory and slowing down writes. Rule: if you need uniqueness, create the constraint (it includes an index). If you only need fast lookups without uniqueness, create the index.

The value of Cypher becomes clear when you see it applied to real problems. Here are three production patterns that demonstrate why graph databases outperform relational databases for connected data.

Recommendation Engine — 'Customers who bought X also bought Y.' In SQL, this requires a self-join on the purchases table with a GROUP BY, which gets exponentially slower as purchase history grows. In Cypher, it's a two-hop traversal: find everyone who bought the same product, then find what else they bought.

Fraud Detection — Find rings of accounts that share addresses, phone numbers, or devices. In SQL, this requires multiple joins across identity tables. In Cypher, you match a pattern where multiple accounts connect to the same shared resource.

Social Graph — Find the shortest path between two people, suggest connections based on mutual friends, identify communities. These are all natural graph traversals that Cypher handles natively.

io/thecodeforge/graph/RealWorldPatterns.cypherCYPHER

// io.thecodeforge: RECOMMENDATION ENGINE
// 'Customers who bought this also bought' — collaborative filtering
MATCH (customer:Customer)-[:PURCHASED]->(product:Product {sku: 'WIDGET-PRO'})
<-[:PURCHASED]-(other:Customer)-[:PURCHASED]->(recommended:Product)
WHERE recommended <> product  // exclude the product they already bought
  AND NOT (customer)-[:PURCHASED]->(recommended)  // exclude what they already own
RETURN recommended.name AS product,
       recommended.sku AS sku,
       count(other) AS bought_by_others
ORDER BY bought_by_others DESC
LIMIT 5;

// io.thecodeforge: FRAUD DETECTION — shared identity signals
// Find accounts that share a phone number OR address — potential fraud ring
MATCH (a1:Account)-[:HAS_PHONE]->(phone:Phone)<-[:HAS_PHONE]-(a2:Account)
WHERE a1 <> a2
WITH a1, a2, collect(DISTINCT phone.number) AS shared_phones
MATCH (a1)-[:HAS_ADDRESS]->(addr:Address)<-[:HAS_ADDRESS]-(a2)
WITH a1, a2, shared_phones, collect(DISTINCT addr.full_address) AS shared_addresses
WHERE size(shared_phones) >= 1 AND size(shared_addresses) >= 1
RETURN a1.account_id AS account_1,
       a2.account_id AS account_2,
       shared_phones,
       shared_addresses,
       size(shared_phones) + size(shared_addresses) AS risk_score
ORDER BY risk_score DESC
LIMIT 20;

// io.thecodeforge: SOCIAL GRAPH — mutual friends suggestion
MATCH (me:Person {name: 'Alice'})-[:FRIEND]->(mutual:Person)-[:FRIEND]->(suggestion:Person)
WHERE NOT (me)-[:FRIEND]->(suggestion)  // not already friends
  AND suggestion <> me                  // not yourself
RETURN suggestion.name AS suggested_friend,
       count(mutual) AS mutual_friends,
       collect(mutual.name) AS through
ORDER BY mutual_friends DESC
LIMIT 10;

// io.thecodeforge: SOCIAL GRAPH — community detection (simplified)
// Find groups of 3+ people who are all friends with each other
MATCH (a:Person)-[:FRIEND]->(b:Person)-[:FRIEND]->(c:Person)-[:FRIEND]->(a)
WHERE id(a) < id(b) AND id(b) < id(c)  // avoid counting same triangle multiple times
RETURN a.name, b.name, c.name AS triangle
ORDER BY a.name;

// io.thecodeforge: KNOWLEDGE GRAPH — find all entities connected to a concept
MATCH (concept:Concept {name: 'Microservices'})-[:RELATED_TO*1..2]->(related)
WHERE related <> concept
RETURN DISTINCT labels(related)[0] AS type,
       related.name AS entity,
       related.description AS description
ORDER BY type, entity
LIMIT 20;

Output

╒═══════════════════╤══════════╤═══════════════════════════════════════╕

│"product" │"sku" │"bought_by_others" │

╞═══════════════════╪══════════╪═══════════════════════════════════════╡

│"Gadget X" │"GADG-001"│847 │

│"Sensor V2" │"SENS-002"│623 │

│"Widget Mini" │"WIDM-003"│412 │

│"Power Pack" │"POWR-004"│298 │

│"Cable Pro" │"CABL-005"│156 │

└───────────────────┴──────────┴───────────────────────────────────────┘

💡The Recommendation Query Is Just Two Hops:

The entire 'customers who bought X also bought Y' feature is: (me)-[:PURCHASED]->(product)<-[:PURCHASED]-(other)-[:PURCHASED]->(rec). That's it. Four nodes, three relationships, one pattern. In SQL, this is a self-join with aggregation and exclusion subqueries — 15+ lines of SQL that gets slower as the purchase table grows. In Cypher, it's a single pattern that runs in milliseconds because the graph stores the relationships as direct pointers.

PROFILE and EXPLAIN: Query Performance Analysis

Every query you write for production should be profiled first. Neo4j provides two tools: EXPLAIN shows the execution plan without running the query (fast, safe for production). PROFILE runs the query and shows the execution plan with actual row counts and database hits (slower, shows real performance).

The execution plan is a tree of operators: NodeByLabelScan (scans all nodes with a label), NodeIndexSeek (uses an index), Expand (traverses a relationship), Filter (applies WHERE conditions), ProduceResult (returns data). Your goal: minimize NodeByLabelScan operations and maximize NodeIndexSeek operations.

The 'Eager' operator is a red flag — it means the engine had to load all intermediate results into memory before continuing. This happens with certain aggregation patterns and can cause memory issues on large graphs. If you see Eager, consider restructuring the query.

Database hits are the unit of work — fewer hits = faster query. A query with 100 database hits is fast. A query with 10 million database hits is slow, regardless of how many rows it returns.

io/thecodeforge/graph/QueryProfiling.cypherCYPHER

// io.thecodeforge: EXPLAIN — show plan without executing (safe for production)
EXPLAIN
MATCH (dev:Developer {language: 'Java'})-[:CONTRIBUTED_TO]->(proj:Project)
WHERE proj.status = 'active'
RETURN dev.name, proj.name;

// io.thecodeforge: PROFILE — execute and show real metrics
PROFILE
MATCH (dev:Developer {language: 'Java'})-[:CONTRIBUTED_TO]->(proj:Project)
WHERE proj.status = 'active'
RETURN dev.name, proj.name;

// io.thecodeforge: BAD QUERY — no label, forces full scan
// This scans every node in the database
EXPLAIN
MATCH (n)
WHERE n.email = 'alice@thecodeforge.io'
RETURN n;

// io.thecodeforge: GOOD QUERY — uses label + index
// This uses the User label index and the email property index
EXPLAIN
MATCH (u:User {email: 'alice@thecodeforge.io'})
RETURN u;

// io.thecodeforge: Identifying the Eager operator
// If you see Eager in the plan, the engine is buffering all results
EXPLAIN
MATCH (order:Order)
WITH order
ORDER BY order.created_at DESC
MATCH (order)-[:CONTAINS]->(item:OrderItem)
RETURN order, collect(item) AS items;

// io.thecodeforge: Checking if an index is being used
// Run this after creating an index to verify the plan uses it
EXPLAIN
MATCH (u:User {email: 'test@thecodeforge.io'})
RETURN u.name;
// If the plan shows NodeIndexSeek, the index is working.
// If it shows NodeByLabelScan, the index isn't being used — check your query.

Output

PROFILE output for the good query:

+------------------+----------------+---------+-----------+

+------------------+----------------+---------+-----------+

| ProduceResult | 1 | 0 | 0 |

| +NodeIndexSeek | 1 | 4 | 120 |

+------------------+----------------+---------+-----------+

Total database hits: 4 ← This is fast.

PROFILE output for the bad query:

+----------------------+----------------+-----------+---------------+

+----------------------+----------------+-----------+---------------+

| ProduceResult | 1 | 0 | 0 |

| +Filter | 1 | 2,847,293 | 480 |

| +NodeByLabelScan | 2,847,293 | 2,847,294 | 32,768 |

+----------------------+----------------+-----------+---------------+

Total database hits: 5,694,587 ← This is slow. Add an index.

🔥NodeIndexSeek Good, NodeByLabelScan Bad:

When reading a PROFILE or EXPLAIN output, look for these two operators. NodeIndexSeek means the engine used an index — fast. NodeByLabelScan means it scanned every node with that label — slow. If your query shows NodeByLabelScan where you expected NodeIndexSeek, check that the index exists (SHOW INDEXES) and that your query filters on the indexed property.

Who This S**t Is For (And Who Should Walk Away)

This page is for developers who have already written a few SQL JOINs and wonder why their relational model is melting under 200ms queries. You know query performance matters. You’ve shipped code that didn’t break in production — yet.

You’ll get maximum value if you’re building recommendation engines, fraud detection pipelines, or social graph features where traversal depth > table width. If you’ve never written a WHERE clause, go learn SQL first. Cypher won’t save you from not understanding data modeling.

What you won’t find: fluff, hand-holding, or Java Spring setup. That’s the framework’s problem, not yours. You’ll leave here able to write production-grade Cypher that doesn’t blow up your heap.

WhoShouldReadThis.sqlSQL

// io.thecodeforge — database tutorial

// Fool: runs this and wonders why 10M nodes crash
MATCH (n) RETURN n LIMIT 1000;

// Senior: knows to filter first
MATCH (n:User {status: 'active'})
RETURN n.name, n.created_at
LIMIT 1000;

Output

╒══════════╤══════════════╕

│n.name │n.created_at │

╞══════════╪══════════════╡

│'elena_m' │2024-03-15 │

│'bot_42' │2024-03-14 │

└──────────┴──────────────┘

⚠ Production Trap:

If you skip this section and start MERGE-ing blindly, you’ll create ghost nodes from misspelled labels. Filter first, then act.

🎯 Key Takeaway

Cypher punishes schema ignorance harder than SQL — know your data shape before you query.

What You Must Have Before You Touch Cypher

You need three things: a running Neo4j instance (4.x or later — the 3.x era is dead), a basic grip on graph theory (nodes, relationships, properties), and tolerance for failure. If you’re hoping Cypher will fix your broken data model, it won’t. Garbage in, garbage out — but faster.

Connect via browser at localhost:7474 or use the bolt protocol on port 7687 for apps. Install the APOC plugin from day one — it’s the standard library Cypher should have been born with.

Don’t spend time setting up Java Spring before you can write a single MATCH. Learn the query language first. Your ORM layer won’t abstract away bad Cypher; it will just hide the stack traces until production pager duty.

PrerequisitesCheck.sqlSQL

// io.thecodeforge — database tutorial

// Verify your setup before writing anything
RETURN neo4j.version() AS version,
       apoc.version() AS apoc_version;

// Expected output (yours will differ by minor version)
// ╒═════════════════╤════════════════╕
//│version           │apoc_version    │
//╞═════════════════╪════════════════╡
//│'5.21.0'          │'5.21.0'        │
//└─────────────────┴────────────────┘

Output

╒═════════════════╤════════════════╕

│version │apoc_version │

╞═════════════════╪════════════════╡

│'5.21.0' │'5.21.0' │

└─────────────────┴────────────────┘

🔥Senior Shortcut:

If APOC is missing, run 'CALL dbms.listConfig() YIELD name WHERE name CONTAINS "apoc"'. If empty, you’re about to write painful manual Cypher. Fix this before you start.

🎯 Key Takeaway

Neo4j 5.x + APOC is your baseline — anything older is tech debt waiting to happen.

Stop Hardcoding Parameters — Pass Them Like a Pro

Hardcoding query values is how you lose your job. Not today, but soon. Every string interpolation opens a door for injection attacks, forces query recompilation, and makes your codebase look like a script kiddie's notebook. Production systems pass parameters as separate objects — Cypher has native parameter syntax for this.

The WHY is simple: parameterized queries get cached. Neo4J compiles the plan once and reuses it for every distinct parameter set. That means faster execution, reduced memory pressure, and zero injection risk. The HOW is even simpler: use $paramName in your Cypher string, then pass a map with matching keys when you execute. Works the same way in Python, JavaScript, Java, and Go drivers.

You're not building a toy. Treat your queries like they'll see production traffic on Monday. Parameters are not optional — they're the minimum bar for professional database access.

ParameterizedQuery.cypherSQL

// io.thecodeforge — database tutorial

// BAD: string interpolation — don't do this
MATCH (u:User {name: 'Alice'})
RETURN u

// GOOD: parameterized query — do this
MATCH (u:User {name: $userName})
RETURN u

// Pass parameters as a map:
// { "userName": "Alice" }

Output

╒══════════════════════════════════════════╕

│u │

╞══════════════════════════════════════════╡

│(:User {name: "Alice", age: 34, city: │

│"Berlin"}) │

└──────────────────────────────────────────┘

⚠ Production Trap:

Passing parameters via string concatenation in any driver language (Python f-strings, JS template literals) bypasses Neo4J's built-in injection prevention. Use driver-level parameter objects — always.

🎯 Key Takeaway

Never interpolate values into Cypher strings. Use $paramName syntax and pass a parameters map.

Where to Go Next — Don't Stagnate on Basics

You've got nodes, relationships, indexes, and parameterization under your belt. That's the foundation — not the finish line. The gap between knowing Cypher syntax and writing performant graph queries at scale is wider than most juniors realize.

Your next moves: learn APOC — Neo4J's standard procedure library. It handles everything from date math to full-text search to CSV import. Master the Cypher planner: read EXPLAIN output like a profiler. Understand when to use USING INDEX, when to restructure your graph model, and when to reach for a graph-native algorithm like PageRank or Louvain community detection.

If you're serious, build something real: a recommendation engine that uses cosine similarity on embedding vectors. Or a fraud detection pattern with variable-length paths and time windows. The basics get you hired. The next level keeps you employed.

NextSteps.cypherSQL

// io.thecodeforge — database tutorial

// Install APOC (run once in Neo4J Browser or via config)
CALL dbms.listComponents() YIELD versions
WHERE name = 'APOC'
RETURN versions

// Example: use APOC for CSV import
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
CALL apoc.create.node(['User'], {id: row.id, name: row.name}) YIELD node
RETURN count(node)

Output

╒══════════════════════════════════════════╕

│versions │

╞══════════════════════════════════════════╡

│["5.23.0"] │

└──────────────────────────────────────────┘

╒══════════════════════════════════════════╕

│count(node) │

╞══════════════════════════════════════════╡

│1247 │

└──────────────────────────────────────────┘

🔥Senior Shortcut:

Before writing a complex native Cypher query, check if APOC has a procedure that does it in one line. apoc.periodic.commit alone will save you hours on batch updates.

🎯 Key Takeaway

Master APOC and the Cypher planner. The basics are the floor, not the ceiling.

Populating an AuraDB Instance with Football Data

You need real data to test queries at scale, not a two-node toy graph. AuraDB's free tier gives you a live Neo4j instance; loading football data from a CSV dump or a public dataset trains you on batch ingestion, constraints, and indexing. Start by spinning up an AuraDB instance and grabbing the connection URI. Download the European football dataset (matches, teams, players) as CSVs. Use LOAD CSV with a periodic commit to stream rows into nodes and relationships. Map each match to a Game node, each team to a Team, and create PLAYED_IN or SCORED relationships. Add uniqueness constraints on team IDs before ingesting to prevent duplicates. This hands-on workflow forces you to handle data types, missing fields, and relationship semantics—skills that translate directly to production graph pipelines.

load_football_data.cypherSQL

// io.thecodeforge — database tutorial

CREATE CONSTRAINT team_id_unique IF NOT EXISTS
FOR (t:Team) REQUIRE t.id IS UNIQUE;

LOAD CSV WITH HEADERS FROM 'https://example.com/football/teams.csv' AS row
WITH row
WHERE row.id IS NOT NULL
MERGE (t:Team {id: toInteger(row.id)})
SET t.name = row.name, t.country = row.country
RETURN count(t) AS teams_created;

LOAD CSV WITH HEADERS FROM 'https://example.com/football/matches.csv' AS row
WITH row
WHERE row.home_team_id IS NOT NULL
MATCH (home:Team {id: toInteger(row.home_team_id)})
MATCH (away:Team {id: toInteger(row.away_team_id)})
CREATE (g:Game {date: date(row.date), home_score: toInteger(row.home_score), away_score: toInteger(row.away_score)})
CREATE (home)-[:PLAYED_AT_HOME]->(g)
CREATE (away)-[:PLAYED_AWAY]->(g);

Output

teams_created: 32

Added 32 labels, created 32 nodes, set 96 properties, completed after 215ms.

⚠ Production Trap:

Using MERGE on every row with no primary key constraint creates duplicates under concurrent writes. Always enforce uniqueness with CREATE CONSTRAINT before bulk loading.

🎯 Key Takeaway

Always index or constrain your merge keys before bulk loading CSV data into AuraDB.

Conclusion and Next Steps

You now know how to create, read, update, and delete graph data, traverse variable-length paths, aggregate across relationships, and optimize queries with profiles and indexes. That foundation alone beats 80% of developers who treat Cypher like SQL with arrows. Your next move: pick a real dataset—your email inbox, GitHub commit history, or a public knowledge graph like Wikidata—and model it as nodes and relationships. Write queries that answer questions you actually have: "Who introduced me to my top collaborator?" or "Which code module has the highest bug cascade?" Then learn APOC and GDS libraries for advanced graph algorithms like PageRank or community detection. Finally, wrap your Cypher queries in a driver (Java, Python, Node.js) and build a CRUD API. That path turns basic pattern matching into a full-stack graph application.

next_steps_example.cypherSQL

// io.thecodeforge — database tutorial

// Find the shortest path between two people in a social graph
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[*..6]-(bob:Person {name: 'Bob'})
)
RETURN [node IN nodes(path) | node.name] AS chain,
       length(path) AS degrees_of_separation;

// Run PageRank on a collaboration network (requires GDS plugin)
CALL gds.pageRank.write({
  nodeProjection: 'Person',
  relationshipProjection: 'KNOWS',
  writeProperty: 'pagerank'
});

Output

chain: ['Alice', 'Charlie', 'Diana', 'Bob']

degrees_of_separation: 3

---

PageRank completed — wrote 'pagerank' property to 1502 nodes.

⚠ Production Trap:

ShortestPath with unlimited depth [*] on a dense graph can blow memory. Always cap traversal depth and use WHERE to filter early.

🎯 Key Takeaway

Master Cypher basics, then automate with a driver and layer on graph algorithms for real insight.

Aspect	Relational (SQL)	Graph (Cypher)
Data Model	Tables and Rows	Nodes and Relationships
Relationship Logic	Foreign Keys & Joins	Direct Traversal (Pointers)
Query Syntax	Verbose SELECT/JOIN	ASCII-Art Pattern Matching
Performance at Depth	Degrades exponentially with join depth	Constant for local traversals (index-free adjacency)
Schema	Rigid — ALTER TABLE for changes	Schema-optional — add properties without migration
Many-to-Many	Requires junction table	Native — just draw multiple relationships
Aggregation	GROUP BY + HAVING	WITH + WHERE (after aggregation)
Subqueries	Nested SELECT / CTE	CALL {} blocks
Batch Operations	INSERT INTO ... SELECT	UNWIND $list MERGE ...
Full-Text Search	Full-Text Index / LIKE '%x%'	TEXT INDEX + CONTAINS / apoc.text

⚙ Quick Reference

21 commands from this guide

File	Command / Code	Purpose
iothecodeforgegraphSyntaxFundamentals.cypher	CREATE (alice:Person {name: 'Alice', age: 32, city: 'London'})	Cypher Syntax Fundamentals
iothecodeforgegraphCreateNodes.cypher	CREATE (order:Order {	CREATE
iothecodeforgegraphMatchAndFilter.cypher	MATCH (dev:Developer)	MATCH and WHERE
iothecodeforgegraphReturnAndSort.cypher	MATCH (dev:Developer)	RETURN, ORDER BY, LIMIT, and DISTINCT
iothecodeforgegraphSetAndRemove.cypher	MATCH (user:User {email: 'alice@thecodeforge.io'})	SET and REMOVE
iothecodeforgegraphDeleteOperations.cypher	MATCH (alice:Person {name: 'Alice'})-[r:FRIEND]->(bob:Person {name: 'Bob'})	DELETE and DETACH DELETE
iothecodeforgegraphMergeOperations.cypher	MERGE (p:Project {name: 'ForgeAPI'})	MERGE
iothecodeforgegraphWithClause.cypher	MATCH (dev:Developer)-[:CONTRIBUTES_TO]->(proj:Project)	WITH
iothecodeforgegraphAggregationFunctions.cypher	MATCH (user:User)	Aggregation Functions
iothecodeforgegraphVariableLengthPaths.cypher	MATCH (me:Person {name: 'Alice'})-[:FRIEND*2]->(fof)	Variable-Length Paths
iothecodeforgegraphUnwindOperations.cypher	UNWIND ['Alice', 'Bob', 'Charlie'] AS name	UNWIND
iothecodeforgegraphSubqueriesAndCall.cypher	MATCH (dev:Developer)	Subqueries with CALL
iothecodeforgegraphIndexesAndConstraints.cypher	CREATE INDEX user_email_index FOR (u:User) ON (u.email);	Indexes and Constraints
iothecodeforgegraphRealWorldPatterns.cypher	MATCH (customer:Customer)-[:PURCHASED]->(product:Product {sku: 'WIDGET-PRO'})	Real-World Patterns
iothecodeforgegraphQueryProfiling.cypher	EXPLAIN	PROFILE and EXPLAIN
WhoShouldReadThis.sql	MATCH (n) RETURN n LIMIT 1000;	Who This S**t Is For (And Who Should Walk Away)
PrerequisitesCheck.sql	RETURN neo4j.version() AS version,	What You Must Have Before You Touch Cypher
ParameterizedQuery.cypher	MATCH (u:User {name: 'Alice'})	Stop Hardcoding Parameters
NextSteps.cypher	CALL dbms.listComponents() YIELD versions	Where to Go Next
load_football_data.cypher	CREATE CONSTRAINT team_id_unique IF NOT EXISTS	Populating an AuraDB Instance with Football Data
next_steps_example.cypher	MATCH path = shortestPath(	Conclusion and Next Steps

Key takeaways

Cypher is a pattern-based query language

you describe the SHAPE of your data using ASCII art, and the engine finds every instance of that shape in the graph. Think in nodes and paths, not tables and joins.

CREATE always creates new nodes, even if duplicates exist. MERGE checks for existence first and creates only if missing. Use MERGE for entities, CREATE for events.

Always use labels in your MATCH patterns

matching without a label forces a full database scan (O(n)), while matching with a label uses the index (O(log n)).

WITH is the pipe operator of Cypher

it chains query stages, enables aggregation filtering (the HAVING equivalent), and limits variable scope for query optimization.

Variable-length paths ([*1..5]) are Cypher's superpower for social networks, supply chains, and recommendation engines

but NEVER use unbounded paths ([*]) in production.

UNWIND expands lists into rows

use it with MERGE for batch operations that process thousands of records in a single database round trip.

Profile every production query with EXPLAIN or PROFILE. Look for NodeIndexSeek (good) vs NodeByLabelScan (bad). If you see Eager, restructure the query.

Create uniqueness constraints on every entity's primary identifier before loading data

this prevents duplicates at the database level and gives you a free index.

collect() gathers rows into a list, UNWIND spreads a list into rows. Together they're the foundation of nested result structures and batch processing in Cypher.

Soft deletes (setting a 'deleted' property) are almost always better than hard deletes (DETACH DELETE) in production

hard deletes break graph integrity.

Cypher's recommendation engine query is just two hops

(me)-[:PURCHASED]->(product)<-[:PURCHASED]-(other)-[:PURCHASED]->(rec). In SQL, this is a 15-line self-join. In Cypher, it's a single pattern.

Use keyset pagination (WHERE created_at < $cursor LIMIT 10) instead of SKIP-based pagination (SKIP 100000 LIMIT 10)

SKIP forces the engine to discard rows, keyset jumps directly to the right position.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

FAQ · 10 QUESTIONS

Frequently Asked Questions

Does Cypher support ACID transactions like SQL?

What is the difference between CREATE and MERGE?

How do I handle many-to-many relationships in Cypher?

What is the 'ASCII-art' syntax exactly?

Can I use Cypher to query non-graph data?

How do I import data from CSV into Neo4j?

How do I paginate results efficiently in Cypher?

What is the WITH clause and when do I need it?

How do I optimize a slow Cypher query?

Can I use Cypher with Spring Data Neo4j?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

✓ Verified

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

🔥

That's Neo4j. Mark it forged?

13 min read · try the examples if you haven't

Cypher Query Language: 45s SQL to 12ms in Neo4j

Why Cypher Isn't Just SQL for Graphs

Cypher Syntax Fundamentals: Nodes, Relationships, and Patterns

CREATE: Building Nodes and Relationships

MATCH and WHERE: Reading and Filtering Data

RETURN, ORDER BY, LIMIT, and DISTINCT: Shaping Your Results

SET and REMOVE: Updating Nodes and Relationships

DELETE and DETACH DELETE: Removing Data

MERGE: Idempotent Create-or-Find

WITH: Chaining Subqueries and Aggregation Pipelines

Aggregation Functions: COUNT, COLLECT, SUM, AVG, and Friends

Variable-Length Paths: Traversing Multiple Hops

UNWIND: Working with Lists and Expanding Collections

Subqueries with CALL: Encapsulating Complex Logic

Indexes and Constraints: Schema Management for Performance

Real-World Patterns: Recommendation Engine, Fraud Detection, and Social Graph

PROFILE and EXPLAIN: Query Performance Analysis

Who This S**t Is For (And Who Should Walk Away)

What You Must Have Before You Touch Cypher

Stop Hardcoding Parameters — Pass Them Like a Pro

Where to Go Next — Don't Stagnate on Basics

Populating an AuraDB Instance with Football Data

Conclusion and Next Steps

Key takeaways

Interview Questions on This Topic

Frequently Asked Questions

That's Neo4j. Mark it forged?