Matrix wastes memory on sparse graphs; list dominates real-world apps
Performance insight: matrix BFS on 10K nodes scans 100M cells; list BFS traverses only edges
Production insight: wrong representation turns a seconds-long algorithm into hours
Biggest mistake: using matrix for sparse graph without checking E vs V²
✦ Definition~90s read
What is Graph Representation?
Graph representation is the foundational decision in graph algorithms—choosing between an adjacency matrix and an adjacency list determines whether your BFS finishes in 45 minutes or 45 seconds. An adjacency matrix stores edges in a V×V grid, giving O(1) edge lookups but O(V²) memory, which kills you at 10,000+ nodes (100 million entries, ~800 MB for ints).
★
Imagine a city map.
An adjacency list stores edges per vertex as linked lists or arrays, using O(V+E) memory—typically 10-100x less for sparse graphs—but costs O(degree) per neighbor iteration. This tradeoff is why production graph databases (Neo4j, Amazon Neptune) and frameworks (NetworkX, igraph) default to adjacency lists for real-world graphs, which are almost always sparse (E << V²).
The mistake that costs 45 minutes is using a matrix for a sparse graph with 50,000 nodes: you allocate 2.5 billion cells, thrash cache, and turn a linear BFS into a quadratic memory disaster. Conversely, for dense graphs (E ≈ V²/2) like social networks with 10,000 users where everyone knows everyone, the matrix wins on iteration speed.
The decision framework boils down to: if your graph fits in RAM as a matrix and you need constant-time edge checks, use it; otherwise, the list is your only scalable option.
Plain-English First
Imagine a city map. Every city is a dot, and every road connecting two cities is a line. A graph is exactly that — dots (called nodes) connected by lines (called edges). Now, how do you store that map inside a computer's memory? That's the whole game of graph representation. You're essentially choosing the best filing system for your map so that lookups, additions, and traversals are as fast as possible.
Graphs power more of your daily software than you'd guess. Google Maps finds your fastest route using graph traversal. Facebook's friend suggestions use graph connectivity. LinkedIn's 'second connection' feature? Graph theory. Every time you see a network of relationships — people, cities, servers, web pages — there's a graph underneath. Understanding how to store that graph isn't a minor detail; it's the decision that determines whether your algorithm runs in milliseconds or minutes.
The core problem is this: a graph is an abstract concept, but your computer only understands memory — arrays, pointers, contiguous blocks of data. You need a concrete data structure that faithfully represents which nodes exist and which nodes are connected, without wasting memory or making lookups unnecessarily slow. Two approaches dominate: the adjacency matrix and the adjacency list. Each is a completely different philosophy about the trade-off between memory usage and access speed, and choosing the wrong one for your problem is a classic, costly mistake.
By the end of this article you'll be able to build both representations from scratch in Java, explain in plain English why each one exists, look at a problem description and immediately know which structure to reach for, and walk into an interview and answer the follow-up questions that trip most candidates up.
Graph Representation: The Memory vs Speed Tradeoff You Can't Ignore
Graph representation is how you store relationships between nodes in memory. The two dominant strategies are adjacency matrices and adjacency lists. A matrix uses a V×V boolean or integer array where cell [i][j] is 1 if edge i→j exists. A list stores for each vertex a collection of its neighbors — typically an ArrayList or LinkedList. That's it. The choice determines whether your BFS runs in O(V²) memory or O(V+E) memory, and whether edge existence checks are O(1) or O(degree).
In practice, adjacency matrices waste space on sparse graphs: a social network with 10⁶ users and ~100 friends each would need 10¹² entries (terabytes) vs 10⁸ entries (megabytes) for lists. But matrices give constant-time edge queries, which matters for dense graphs like fully connected layers in neural networks or all-pairs shortest paths on small graphs (V < 1000). Lists dominate in real-world systems because most graphs are sparse — roads, web links, social connections.
Use adjacency lists for BFS/DFS traversal, shortest path on sparse graphs, and any algorithm that iterates neighbors. Use matrices only when V is small (under a few thousand) or you need O(1) edge existence checks and can afford the memory. The wrong choice here is the most common memory mistake in graph algorithms — picking a matrix for a sparse graph will crash your process or exhaust heap.
Sparse vs Dense: The 5% Rule
If your graph has fewer than 5% of possible edges, an adjacency matrix wastes at least 95% of its allocated memory — and that's before considering cache misses.
Production Insight
Teams building a recommendation engine stored user-item interactions in a 500k×500k adjacency matrix — 250 billion entries, 1 TB of memory. The symptom was repeated OutOfMemoryError during BFS-based collaborative filtering. The rule: if E < V²/10, use adjacency lists; if E > V²/2, consider a matrix.
Key Takeaway
Adjacency lists are the default for sparse graphs (most real-world graphs).
Adjacency matrices are only justified for dense graphs or when O(1) edge lookup is critical.
Memory footprint scales as O(V²) for matrix vs O(V+E) for list — choose based on edge density, not convenience.
thecodeforge.io
Graph Repr: Adjacency Matrix vs List Memory Mistake
Graph Representation
Worked Example — Building and Comparing Graph Representations
Check if 0-3 are connected: matrix[0][3] = 0 → No. O(1). Space: O(V^2) = O(16). For dense graphs this is fine; for 1000 nodes with 10 edges, this wastes 999,990 cells.
Check if 0-3 connected: scan 0's list [1,2] → not found. O(degree(0)). Space: O(V + E) = O(4 + 4) = O(8). Much better for sparse graphs.
Edge List: [(0,1),(0,2),(1,2),(2,3)]. Space: O(E). Used in Kruskal's MST algorithm where edges are sorted by weight.
Production Insight
The space difference in this tiny example is 2x (16 vs 8).
For a graph with 50K nodes and 100K edges, the difference is 2.5 billion cells vs 100K entries.
Rule: always estimate E before choosing — the matrix is a memory bomb for sparse graphs.
Key Takeaway
Small examples hide the real cost.
Always scale the space calculation to your actual problem size.
If V is large and E is small, adjacency list is the only viable option.
Adjacency Matrix — The 'Spreadsheet' Approach to Storing Edges
An adjacency matrix is a 2D grid where both the rows and columns represent nodes. If there's an edge between node A and node B, you put a 1 (or the edge weight) at position [A][B]. If there's no edge, you store 0. That's it.
Why would you choose a grid? Because checking whether two specific nodes are connected is O(1) — instantaneous. You just index directly into the array like matrix[3][7]. There's no searching, no traversal. That property is invaluable when your core operation is 'does this edge exist?'
The cost you pay is memory. If your graph has N nodes, you always allocate N × N space — even if your graph only has a handful of edges. A social network with 100 million users storing an adjacency matrix would need 10 quadrillion cells. That's why adjacency matrices are reserved for dense graphs — graphs where most possible edges actually exist — and for smaller graphs where the O(1) edge lookup is worth the memory price.
For weighted graphs (like road distances), you simply store the weight instead of 1. For unweighted undirected graphs, the matrix is always symmetric — matrix[i][j] always equals matrix[j][i].
AdjacencyMatrix.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
publicclassAdjacencyMatrix {
private final int[][] matrix; // The actual grid storing edge data
private final int nodeCount; // Total number of nodes in the graphpublicAdjacencyMatrix(int nodeCount) {
this.nodeCount = nodeCount;
// Allocate a nodeCount x nodeCount grid, all initialised to 0 (no edges)this.matrix = newint[nodeCount][nodeCount];
}
// Add an undirected edge between two nodes (both directions matter)
public void addEdge(int fromNode, int toNode) {\n // Validate to avoid silent index-out-of-bounds bugs in production\n if (fromNode < 0 || toNode < 0 || fromNode >= nodeCount || toNode >= nodeCount) {\n throw new IllegalArgumentException(\n \"Node index out of range: \" + fromNode + \", \" + toNode\n );\n }\n matrix[fromNode][toNode] = 1; // Mark edge in one direction\n matrix[toNode][fromNode] = 1; // Mark edge in the other (undirected)\n }\n\n // O(1) check — this is the matrix's superpower\n public boolean hasEdge(int fromNode, int toNode) {\n return matrix[fromNode][toNode] == 1;\n }\n\n // Print the full grid so we can visualise the structure\n public void printMatrix() {\n System.out.print(\" \");\n for (int col = 0; col < nodeCount; col++) {\n System.out.printf(\"%3d\", col); // Column headers\n }\n System.out.println();\n\n for (int row = 0; row < nodeCount; row++) {\n System.out.printf(\"%3d |\", row); // Row header\n for (int col = 0; col < nodeCount; col++) {\n System.out.printf(\"%3d\", matrix[row][col]);\n }\n System.out.println();\n }\n }\n\n public static void main(String[] args) {\n // Modelling a small city road network with 5 intersections (nodes 0-4)\n AdjacencyMatrix cityRoads = new AdjacencyMatrix(5);\n\n cityRoads.addEdge(0, 1); // Road between intersection 0 and 1\n cityRoads.addEdge(0, 4); // Road between intersection 0 and 4\n cityRoads.addEdge(1, 2); // Road between intersection 1 and 2\n cityRoads.addEdge(1, 3); // Road between intersection 1 and 3\n cityRoads.addEdge(3, 4); // Road between intersection 3 and 4\n\n cityRoads.printMatrix();\n\n System.out.println();\n // O(1) lookup — is there a direct road from 0 to 3?\n System.out.println(\"Direct road from 0 to 3? \" + cityRoads.hasEdge(0, 3));\n System.out.println(\"Direct road from 1 to 3? \" + cityRoads.hasEdge(1, 3));\n }\n}","output": " 0 1 2 3 4\n 0 | 0 1 0 0 1\n 1 | 1 0 1 1 0\n 2 | 0 1 0 0 0\n 3 | 0 1 0 0 1\n 4 | 1 0 0 1 0\n\nDirect road from 0 to 3? false\nDirect road from 1 to 3? true"
}
Adjacency List — The 'Contacts App' Approach That Scales
An adjacency list stores a graph differently: instead of a giant grid, each node gets its own personal list of the nodes it's directly connected to. If node 3 connects to nodes 1, 5, and 7, then node 3's list contains [1, 5, 7]. Nothing more.
Think of it like a contacts app. Each person (node) has a list of their direct friends (neighbours). You don't store a slot for every person on the planet that says 'not friends'. You only store the actual connections.
This is why adjacency lists dominate in the real world: most graphs are sparse — they have far fewer edges than the theoretical maximum. A social network with a billion users might have each user averaging 200 friends. An adjacency matrix would waste an astronomical amount of memory on zeros. An adjacency list stores only the real edges, giving you O(V + E) space where V is vertices and E is edges.
The trade-off: checking if a specific edge exists is now O(degree of node) — you have to scan the neighbour list. But for most graph algorithms like BFS and DFS, you're iterating over neighbours anyway, which is exactly what adjacency lists are optimised for.
AdjacencyList.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
publicclassAdjacencyList {
// Each node maps to a list of its direct neighbours// Using a Map allows nodes to have any label (String, Integer, etc.)privatefinalMap<Integer, List<Integer>> adjacencyList;
publicAdjacencyList() {
this.adjacencyList = newHashMap<>();
}
// Register a node so it exists in the graph even with no edgespublicvoidaddNode(int node) {
adjacencyList.putIfAbsent(node, newArrayList<>());
}
// Add an undirected edge — each side adds the other to its neighbour list
public void addEdge(int fromNode, int toNode) {\n addNode(fromNode); // Ensure both nodes exist before adding edges\n addNode(toNode);\n\n adjacencyList.get(fromNode).add(toNode); // fromNode now knows about toNode\n adjacencyList.get(toNode).add(fromNode); // toNode now knows about fromNode\n }// O(degree) check — scan the neighbour list of fromNodepublicbooleanhasEdge(int fromNode, int toNode) {\n List<Integer> neighbours = adjacencyList.getOrDefault(fromNode, newArrayList<>());\n return neighbours.contains(toNode);\n }
// Get all direct neighbours — this is the operation BFS/DFS uses constantlypublicList<Integer> getNeighbours(int node) {
return adjacencyList.getOrDefault(node, newArrayList<>());
}
// Print each node alongside its neighbour listpublicvoidprintList() {
for (Map.Entry<Integer, List<Integer>> entry : adjacencyList.entrySet()) {
System.out.println("Node " + entry.getKey() + " --> " + entry.getValue());
}
}
publicstaticvoidmain(String[] args) {
// Modelling a simplified version of a flight route networkAdjacencyList flightRoutes = newAdjacencyList();
flightRoutes.addEdge(0, 1); // London -- Paris
flightRoutes.addEdge(0, 4); // London -- Dubai
flightRoutes.addEdge(1, 2); // Paris -- New York
flightRoutes.addEdge(1, 3); // Paris -- Tokyo
flightRoutes.addEdge(3, 4); // Tokyo -- DubaiSystem.out.println("=== Flight Route Adjacency List ===");
flightRoutes.printList();
System.out.println();
System.out.println("Neighbours of node 1 (Paris): " + flightRoutes.getNeighbours(1));
System.out.println("Direct flight from 0 to 3? " + flightRoutes.hasEdge(0, 3));
System.out.println("Direct flight from 1 to 4? " + flightRoutes.hasEdge(1, 4));
}
}
Output
=== Flight Route Adjacency List ===
Node 0 --> [1, 4]
Node 1 --> [0, 2, 3]
Node 2 --> [1]
Node 3 --> [1, 4]
Node 4 --> [0, 3]
Neighbours of node 1 (Paris): [0, 2, 3]
Direct flight from 0 to 3? false
Direct flight from 1 to 4? false
Why BFS and DFS Love Adjacency Lists:
Both BFS and DFS need to repeatedly ask 'give me all neighbours of this node'. With an adjacency list, that operation is instant — you just return the pre-built list. With a matrix, you'd scan an entire row of N cells just to find the non-zero entries. For traversal-heavy algorithms, the list wins every time.
Production Insight
Adjacency list iteration is the bottleneck for graph algorithms.
If you store neighbours in an ArrayList, contains() is O(degree) and may be slow.
For frequent edge existence checks, use HashSet per node instead — O(1) but more memory.
Rule: pick the inner collection based on your dominant operation: list for iteration, set for existence.
Key Takeaway
Adjacency list = O(V+E) space, O(degree) neighbor access.
The default choice for 99% of real-world graphs.
Optimise the inner collection based on whether you traverse or check existence.
Directed and Weighted Graphs — When Edges Have Direction and Cost
So far the edges in our examples have been undirected — a road from A to B means you can also go from B to A. But real-world graphs often have direction. Twitter follows are directed: you following someone doesn't mean they follow you back. One-way streets are directed. Dependencies in a build system are directed.
A directed graph (digraph) changes one thing in the adjacency list: when you add an edge from A to B, you only add B to A's neighbour list — not the reverse. In the matrix, you only set matrix[A][B] = 1, not matrix[B][A].
Weighted graphs add a cost to each edge. Google Maps doesn't just care that roads exist — it cares how long each road takes. For an adjacency list, you swap the simple integer neighbour for an Edge object that holds both the destination and the weight. For a matrix, you store the weight value instead of 1.
The code below combines both concepts — a directed, weighted graph — which is the most realistic structure you'll implement in real interview problems and production systems.
DirectedWeightedGraph.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
publicclassDirectedWeightedGraph {
// Inner class representing a directed edge with a weight (travel cost)staticclassEdge {
int destination; // Where this edge points TO
int weight; // The cost to traverse this edge (e.g. distance in km)Edge(int destination, int weight) {\n this.destination = destination;\n this.weight = weight;\n }
@OverridepublicStringtoString() {
return"->" + destination + "(cost:" + weight + ")";
}
}
// Each node has a list of outgoing weighted edgesprivatefinalMap<Integer, List<Edge>> graph;
publicDirectedWeightedGraph() {
this.graph = newHashMap<>();
}
publicvoidaddNode(int node) {
graph.putIfAbsent(node, newArrayList<>());
}
// Directed: ONLY adds toNode to fromNode's list, not the reversepublicvoidaddDirectedEdge(int fromNode, int toNode, int weight) {
addNode(fromNode);
addNode(toNode);
graph.get(fromNode).add(newEdge(toNode, weight));
// Notice: we do NOT add fromNode to toNode's list here
}
publicList<Edge> getOutgoingEdges(int node) {
return graph.getOrDefault(node, newArrayList<>());
}
publicvoidprintGraph() {
for (Map.Entry<Integer, List<Edge>> entry : graph.entrySet()) {
System.out.print("Node " + entry.getKey() + " : ");
for (Edge edge : entry.getValue()) {
System.out.print(edge + " ");
}
System.out.println();
}
}
publicstaticvoidmain(String[] args) {
// Modelling a package delivery network where roads are one-directional// and weight represents delivery time in minutesDirectedWeightedGraph deliveryNetwork = newDirectedWeightedGraph();
deliveryNetwork.addDirectedEdge(0, 1, 10); // Warehouse -> Depot A (10 mins)
deliveryNetwork.addDirectedEdge(0, 2, 30); // Warehouse -> Depot B (30 mins)
deliveryNetwork.addDirectedEdge(1, 3, 15); // Depot A -> Customer X (15 mins)
deliveryNetwork.addDirectedEdge(2, 3, 5); // Depot B -> Customer X (5 mins)
deliveryNetwork.addDirectedEdge(1, 2, 8); // Depot A -> Depot B (8 mins)System.out.println("=== Delivery Network (Directed + Weighted) ===");
deliveryNetwork.printGraph();
System.out.println();
System.out.println("Routes out of Depot A (node 1): "
+ deliveryNetwork.getOutgoingEdges(1));
// Customer X (node 3) has no outgoing edges — it's a terminal nodeSystem.out.println("Routes out of Customer X (node 3): "
+ deliveryNetwork.getOutgoingEdges(3));
}
}
Output
=== Delivery Network (Directed + Weighted) ===
Node 0 : ->1(cost:10) ->2(cost:30)
Node 1 : ->3(cost:15) ->2(cost:8)
Node 2 : ->3(cost:5)
Node 3 :
Routes out of Depot A (node 1): [->3(cost:15), ->2(cost:8)]
Routes out of Customer X (node 3): []
Watch Out — Directed vs Undirected Bugs Are Silent:
If you accidentally add both directions when implementing a directed graph, your algorithm won't crash — it'll just give wrong answers. Dijkstra's on a directed graph with accidental reverse edges will find paths that don't actually exist. Always verify directionality before running any shortest-path algorithm.
Production Insight
Directed graph bugs are silent and deadly.
A missing reverse edge in an undirected graph is a bug; a missing reverse edge in a directed graph is correct.
The real production pitfall: accidentally adding both directions when the graph should be directed.
Rule: make addEdge() enforce directionality explicitly — don't rely on the caller to remember.
Key Takeaway
Directed graphs: addEdge(A,B) adds only A→B.
Always verify direction before running shortest-path algorithms.
Use Edge objects for weights; never store weight in a separate map — it breaks encapsulation.
Choosing the Right Representation: A Decision Framework
You don't need to memorise every trade-off — you need a mental checklist. Here's the decision framework used by senior engineers when starting a new graph problem:
Count V and E: If you don't know them, estimate. For most real problems, E is much less than V².
Check density: density = E / (V*(V-1)/2). If density > 20%, consider matrix. If density < 5%, definitely list.
Identify dominant operations:
- Edge existence checks: matrix wins.
- Iterating neighbors: list wins.
- Adding new nodes dynamically: list wins (matrix resizing is O(V²)).
UseUse adjacency matrix – the algorithm already accesses every pair.
IfFrequent dynamic node addition/deletion
→
UseUse adjacency list. Matrix resizes are expensive.
IfYou need to sort edges by weight (Kruskal's MST)
→
UseUse edge list. Neither matrix nor list gives you edges as a flat array.
Real-World Performance Benchmark: Matrix vs List on a 10,000 Node Graph
Let's compare both representations on a realistic scenario: a graph with 10,000 nodes.
Sparse case: 15,000 edges (average degree 3). - Matrix: 100 million cells (10,000 × 10,000). At 4 bytes per int = 400 MB. BFS visits each node, scanning 10,000 cells per visit = 100 million checks. - List: 15,000 edge entries plus 10,000 list headers ≈ 25,000 references ≈ 200 KB (plus node objects). BFS visits each neighbor directly, total 15,000 iterations.
Dense case: 50 million edges (density ~100%). - Matrix: 400 MB, same as before. BFS still scans 10,000 per node. - List: 50 million entries ≈ 400 MB (each entry a reference to node object). BFS does 50 million iterations. Here the memory is similar, but list still gives faster neighbor iteration for traversal algorithms.
Key insight: For traversal-heavy algorithms, adjacency list is faster regardless of density because you only touch actual neighbors. The matrix forces you to scan zeros.
For edge-existence-heavy algorithms on dense graphs, matrix wins — but only if you can fit it in memory.
GraphBenchmark.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// Snippet: Time to find all neighbors of every node// Matrix approachlong start = System.nanoTime();
for (int i = 0; i < V; i++) {
List<Integer> neighbors = newArrayList<>();
for (int j = 0; j < V; j++) {
if (matrix[i][j] != 0) neighbors.add(j);
}
}
long matrixTime = System.nanoTime() - start;
// List approach
start = System.nanoTime();
for (int i = 0; i < V; i++) {
List<Integer> neighbors = adjList.get(i); // O(1)
}
long listTime = System.nanoTime() - start;
System.out.println("Matrix neighbor collection: " + matrixTime + " ns");
System.out.println("List neighbor collection: " + listTime + " ns");
Edge List — The Representation Nobody Talks About (Until Query Performance Bites You)
You've seen the matrix and the list. There's a third player that senior engineers keep in their back pocket: the edge list. It's exactly what it sounds like — a flat list of edges, each entry containing (source, destination, weight).
Why would you ever use something so primitive? Two reasons: serialization and graph construction. When you're building a graph from a CSV file, a database query, or a network stream, the data arrives as edges. You don't need a matrix or adjacency list yet. The edge list is your staging area.
More importantly, the edge list dominates when you need to iterate over every edge in the graph. In the matrix, that's O(V²) — you're checking cells that don't exist. In the adjacency list, you're still traversing per-vertex overhead. The edge list gives you O(E) iteration with the tightest possible memory footprint: 3E integers (or 2E for unweighted). For static analysis, batch processing, or graph algorithms that need edge-level parallelism, this is the default.
The catch? Checking if an edge exists between two vertices is O(E). That's prohibitive for interactive queries. But for offline computation? Edge list is faster than both alternatives.
EdgeListBuild.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// io.thecodeforge — dsa tutorialimport java.util.*;
publicclassEdgeListBuild {
staticclassEdge {
int src, dest, weight;
Edge(int s, int d, int w) { src = s; dest = d; weight = w; }
}
publicstaticvoidmain(String[] args) {
// Build from CSV: nodeA,nodeB,weightString[] raw = {"2,5,3", "7,1,8", "3,9,2"};
List edges = newArrayList<>();
for (String line : raw) {
String[] parts = line.split(",");
edges.add(newEdge(
Integer.parseInt(parts[0]),
Integer.parseInt(parts[1]),
Integer.parseInt(parts[2])
));
}
// Iterate all edges — O(E) and no overheadint totalWeight = 0;
for (Edge e : edges) totalWeight += e.weight;
System.out.println("Edges parsed: " + edges.size());
System.out.println("Total weight: " + totalWeight);
}
}
Output
Edges parsed: 3
Total weight: 13
Senior Shortcut:
Edge lists serialize to JSON/CSV in one pass. Adjacency lists need nested loops. For any system that exchanges graph data, the edge list is your wire format — convert to matrix or list only when the query pattern demands it.
Key Takeaway
Use edge lists for construction and batch processing; convert to matrix or adjacency list only when queries need random access.
Compressed Sparse Row (CSR) — The Memory Monster's Worst Nightmare
Adjacency lists waste memory on pointer overhead. Each edge in a linked-list-based adjacency list needs a node object with two references and the destination. On a 64-bit JVM, that's 24-32 bytes per edge — just for the node, before you store the actual vertex ID.
Compressed Sparse Row (CSR) kills that overhead dead. It stores graph data in three flat arrays:
rowIndex[V+1] — the start and end offsets into columnIndex for each vertex
columnIndex[E] — the destination vertices, sorted by source vertex
value[E] — edge weights (optional for unweighted)
Lookup is O(log degree) thanks to sorted neighbors. Total memory for an unweighted graph: V+1 + E integers. That's it. No objects, no pointers, no cache misses from chasing references. CSR is the standard in high-performance graph frameworks like GraphX, SuiteSparse, and every production graph database worth its salt.
Building CSR from an edge list is a 3-step sort-and-compress. The construction cost is O(E log E), but you pay it once. Every traversal after that hits contiguous memory — your L1 cache thanks you.
For graphs with millions of edges, adjacency lists will page-fault you to death. CSR maps to a single malloc call. Your cloud bill will notice the difference.
CSRBuild.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
// io.thecodeforge — dsa tutorialimport java.util.*;
publicclassCSRBuild {
publicstaticvoidmain(String[] args) {
// Raw edges: (src, dest)int[][] edges = {{0,2}, {1,2}, {1,3}, {2,3}, {4,0}};
int V = 5, E = edges.length;
// Determine degree of each vertexint[] degree = newint[V];
for (int[] e : edges) degree[e[0]]++;
// Build rowIndex from prefix sum of degreesint[] rowIndex = newint[V + 1];
for (int i = 0; i < V; i++)
rowIndex[i + 1] = rowIndex[i] + degree[i];
// Fill columnIndexint[] columnIndex = newint[E];
int[] cursor = Arrays.copyOf(rowIndex, V); // temp pointersfor (int[] e : edges) {
int src = e[0], dest = e[1];
columnIndex[cursor[src]++] = dest;
}
// Query neighbors of vertex 1int start = rowIndex[1], end = rowIndex[2];
System.out.print("Neighbors of 1: ");
for (int i = start; i < end; i++)
System.out.print(columnIndex[i] + " ");
System.out.println();
long memory = (V + 1 + E) * 4L; // ints = 4 bytesSystem.out.println("CSR memory: " + memory + " bytes");
}
}
Output
Neighbors of 1: 2 3
CSR memory: 44 bytes
Production Trap:
CSR is read-optimized. Inserting edges requires rebuilding the entire structure. If your graph mutates at runtime, stick with adjacency lists. CSR is for static graphs that you query like crazy — social network analysis, road networks, static call graphs.
Key Takeaway
CSR gives you adjacency list speed with array-level memory density. Use it when your graph is static and query-heavy.
Topological Sorting — Dependency Hell, Solved in Linear Time
If your graph represents dependencies — build systems, course prerequisites, deployment pipelines — you need topological ordering. It's a linear sequence of vertices where every edge goes from earlier to later. No cycles allowed. If there's a cycle, your system deadlocks. Full stop.
Topological sort runs in O(V+E) using Kahn's algorithm. You count indegrees, push zero-in-degree nodes into a queue, process them, decrement neighbors. If the queue empties before you've processed all nodes, you've got a cycle. That's your alarm bell.
Production teams use this for scheduling job DAGs. Airflow, Makefile dependencies, Spark stages — all rely on topological ordering. If your graph doesn't have a valid topological order, your dependency graph is broken. Fix the cycle first, then deploy.
TopologicalSort.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
// io.thecodeforge — dsa tutorialimport java.util.*;
publicclassTopologicalSort {
publicstaticList<Integer> sort(int n, int[][] edges) {
List<Integer>[] g = newArrayList[n];
int[] indeg = newint[n];
for (int i = 0; i < n; i++) g[i] = newArrayList<>();
for (int[] e : edges) {
g[e[0]].add(e[1]);
indeg[e[1]]++;
}
Queue<Integer> q = newLinkedList<>();
for (int i = 0; i < n; i++) if (indeg[i] == 0) q.add(i);
List<Integer> order = newArrayList<>();
while (!q.isEmpty()) {
int u = q.poll();
order.add(u);
for (int v : g[u]) if (--indeg[v] == 0) q.add(v);
}
return order.size() == n ? order : List.of(); // cycle
}
publicstaticvoidmain(String[] args) {
int[][] edges = {{5,0},{4,0},{5,2},{2,3},{3,1},{4,1}};
System.out.println(sort(6, edges));
}
}
Output
[4, 5, 0, 2, 3, 1]
Production Trap:
Never trust a topological sort that doesn't return all nodes. An empty result means you ignored a cycle. Always validate the output length against vertex count before scheduling downstream tasks.
Key Takeaway
Topological order exists only in DAGs. If you can't order it, you've got a cycle — fix that before anything else.
Cycles — The Silent Killer of Every Graph-Based System
A cycle is a path of edges where the first and last vertex are the same. Sounds academic. In production, cycles mean deadlocks. Circular dependencies in your microservices. Infinite loops in state machines. Garbage collection root leaks.
Detecting cycles is cheap. For an undirected graph, run DFS and check if you hit a visited neighbor that isn't your parent. For a directed graph, track a recursion stack — gray set in DFS. If you back-edge to a node still in the recursion stack, you've found a cycle. That's O(V+E). No excuses.
Most teams bake cycle detection into their build pipeline. They refuse to deploy if the dependency graph has a cycle. Why? Because if your service depends on A, which depends on B, which depends on A, you can't start either one. Detection is free. Downtime is not.
CycleDetection.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
// io.thecodeforge — dsa tutorialimport java.util.*;
publicclassCycleDetection {
staticbooleanhasCycle(int n, int[][] edges) {
List<Integer>[] g = newArrayList[n];
for (int i = 0; i < n; i++) g[i] = newArrayList<>();
for (int[] e : edges) g[e[0]].add(e[1]);
int[] state = new int[n]; // 0=unvisited,1=visiting,2=donefor (int i = 0; i < n; i++) if (dfs(i, g, state)) returntrue;
returnfalse;
}
staticbooleandfs(int u, List<Integer>[] g, int[] state) {
if (state[u] == 1) returntrue;
if (state[u] == 2) returnfalse;
state[u] = 1;
for (int v : g[u]) if (dfs(v, g, state)) returntrue;
state[u] = 2;
returnfalse;
}
publicstaticvoidmain(String[] args) {
int[][] cyclic = {{0,1},{1,2},{2,0}};
int[][] acyclic = {{0,1},{1,2}};
System.out.println("Cyclic: " + hasCycle(3, cyclic));
System.out.println("Acyclic: " + hasCycle(3, acyclic));
}
}
Output
Cyclic: true
Acyclic: false
Senior Shortcut:
Use Union-Find for undirected cycle detection — simpler than DFS, no recursion depth issues. For directed graphs, stick with DFS + state array. Don't confuse the two; they're fundamentally different problems.
Key Takeaway
A cycle in production means a deadlock waiting to happen. Detect it at compile time, not at 3 AM during an outage.
● Production incidentPOST-MORTEMseverity: high
The Slow Graph Traversal Incident
Symptom
BFS traversal that should complete in seconds was taking over 45 minutes. Memory usage was at 99% of heap.
Assumption
The team assumed adjacency matrix would give fast edge lookups for shortest path calculations.
Root cause
The graph had 2 million nodes but only ~10 million edges (average degree 5). Matrix size: 4 trillion cells (2M x 2M). Even though each cell was a byte, that's 4 TB of memory. JVM couldn't allocate it, so they used a sparse matrix library that still caused O(n²) traversal cost.
Fix
Switched to adjacency list. Space dropped to ~80 MB. BFS completed in under 3 seconds.
Key lesson
Always estimate E vs V² before choosing representation. If E << V², adjacency list wins.
Never assume O(1) edge lookup is worth the memory cost until you compute the actual matrix size.
When memory is tight, adjacency list is the default. Only use matrix when density exceeds ~20%.
Production debug guideSymptom → Action for common graph representation problems in production5 entries
Symptom · 01
OutOfMemoryError when creating graph
→
Fix
Check if you're using adjacency matrix for a large graph. Calculate V² in bytes. If V > 10,000 and E << V², switch to adjacency list immediately.
Symptom · 02
BFS/DFS misses nodes that should be reachable
→
Fix
Verify undirected edges are added both ways. Print adjacency list for suspect nodes to see missing reverse edges.
Symptom · 03
Edge existence check returns false for existing edge
→
Fix
For adjacency list, check if you're using contains() on a list – that's O(n). Ensure equals() is correctly implemented if storing custom Edge objects.
Symptom · 04
Algorithms run significantly slower on 'small' graph
→
Fix
Count the actual number of edges. If matrix is used, BFS scans entire row per node – O(V²). Compare actual neighbors per node.
When would you choose an adjacency matrix over an adjacency list?
Q02JUNIOR
How do you implement a weighted adjacency list in Java?
Q03SENIOR
What is the space complexity of an adjacency matrix? What happens if the...
Q04SENIOR
How would you find all neighbours of a node using an adjacency matrix vs...
Q05SENIOR
Can you convert an adjacency list to a matrix efficiently? What's the ti...
Q01 of 05SENIOR
When would you choose an adjacency matrix over an adjacency list?
ANSWER
When the graph is dense (E close to V²), when the number of nodes is small enough to fit in memory (e.g., V < 10,000), and when the dominant operation is checking edge existence (O(1) vs O(degree)). Also for algorithms like Floyd-Warshall that inherently operate on the full matrix.
But in real-world production systems, adjacency list is almost always the safer default because most graphs are sparse. The matrix's O(V²) memory cost grows quadratically, and it's easy to underestimate.
Q02 of 05JUNIOR
How do you implement a weighted adjacency list in Java?
ANSWER
Create an Edge class with fields for destination and weight. Then use Map<Integer, List<Edge>> where each key is a node, and the list contains outgoing edges. For directed graphs, only add to the 'from' node's list. For undirected, add both. Example:
``java
class Edge {
int dest;
int weight;
Edge(int dest, int weight) { this.dest = dest; this.weight = weight; }
}
Map<Integer, List<Edge>> graph = new HashMap<>();
`
Then addEdge(a, b, w) does graph.get(a).add(new Edge(b, w)). Don't forget to initialize lists for each node. We often use computeIfAbsent` for that.
Q03 of 05SENIOR
What is the space complexity of an adjacency matrix? What happens if the graph is undirected?
ANSWER
Space complexity is O(V²) regardless of actual edges. For an undirected graph, the matrix is symmetric along the main diagonal, so you could technically store only the upper triangular half to halve space, but that complicates indexing. In practice, we allocate the full V×V array.
If V = 100,000, the matrix would contain 10 billion entries. At 4 bytes per int, that's 40 GB — completely impractical. That's why adjacency list is preferred for large or sparse graphs.
For undirected graphs, both matrix[i][j] and matrix[j][i] must be set to 1 (or weight) — forgetting the symmetric update is a common bug.
Q04 of 05SENIOR
How would you find all neighbours of a node using an adjacency matrix vs list? What are the time complexities?
ANSWER
With adjacency matrix: you must scan the entire row for that node — O(V). With adjacency list: you simply return or iterate the list for that node — O(degree). Since degree is often much smaller than V in real-world graphs, adjacency list is orders of magnitude faster for neighbor iteration.
For graph algorithms like BFS and DFS, which repeatedly ask for neighbors, the adjacency list wins every time. Matrix-based BFS on a graph with 10,000 nodes will scan 10,000 columns per node, even if the node has only 3 neighbors. That's 100 million operations vs 15,000.
Q05 of 05SENIOR
Can you convert an adjacency list to a matrix efficiently? What's the time complexity?
ANSWER
You can convert by first allocating a V×V matrix (O(V²) space). Then iterate over each node's adjacency list, setting matrix[from][neighbor] = 1 (or weight). That takes O(V + E) time. So total: O(V²) space and O(V² + E) time. The allocation dominates.
Similarly, matrix to list: scan all V² cells and for each non-zero entry add to the list. That's O(V²) time and O(V + E) space. This is why you should never start with a matrix if you might later need a list — matrix scans are painful.
01
When would you choose an adjacency matrix over an adjacency list?
SENIOR
02
How do you implement a weighted adjacency list in Java?
JUNIOR
03
What is the space complexity of an adjacency matrix? What happens if the graph is undirected?
SENIOR
04
How would you find all neighbours of a node using an adjacency matrix vs list? What are the time complexities?
SENIOR
05
Can you convert an adjacency list to a matrix efficiently? What's the time complexity?
SENIOR
FAQ · 5 QUESTIONS
Frequently Asked Questions
01
What is the main difference between adjacency matrix and adjacency list?
The adjacency matrix uses O(V²) space and gives O(1) edge existence checks. The adjacency list uses O(V+E) space and gives O(degree) neighbor access. The matrix is better for dense graphs with many edge checks; the list is better for sparse graphs and iterative traversal.
Was this helpful?
02
When should I use an edge list instead of adjacency list?
Use an edge list when you need to process edges globally without regard to node adjacency — for example, in Kruskal's Minimum Spanning Tree algorithm, where you sort all edges by weight. Edge lists are also useful for serialisation or when you need to store the graph in a file.
Was this helpful?
03
What happens if I use an adjacency matrix for a graph with 100,000 nodes and 150,000 edges?
The matrix would have 10 billion cells (100K × 100K). At 4 bytes per int, that's 40 GB — far beyond typical JVM heap limits. The adjacency list would need only about 150K edge entries plus headers, well under 10 MB. The algorithm would be unusably slow or crash with OutOfMemoryError.
Was this helpful?
04
Is it possible to store an adjacency matrix using less memory for undirected graphs?
Technically, since the matrix is symmetric for undirected graphs, you could store only the upper triangular half (including diagonal). That roughly halves memory to O(V²/2). However, code becomes more complex to index correctly, and in practice, most developers just allocate a full V×V array unless memory is extremely constrained.
Was this helpful?
05
How do I decide which representation to use in an interview?
Start by asking the interviewer: 'What is the range of V and E?' If they say it's a large or sparse graph, immediately say adjacency list. If they mention dense or small graph, consider matrix. Usually 80% of graph interview problems are solved with adjacency list. The decision framework in this article covers it systematically.