Floyd-Warshall — int Overflow Causes Negative Distances
Integer overflow in Floyd-Warshall's dist matrix yields negative distances—use long and INF=Long.
- Floyd-Warshall finds shortest paths between ALL pairs of nodes in O(V³) time
- Works with negative edge weights — fails if negative cycles exist
- Core idea: for each node k, treat it as a potential intermediate stop on every i→j path
- Space: O(V²) for the distance matrix — problematic beyond ~10k nodes
- Production trap: integer overflow when adding distances; use long or check bounds
- Best for dense graphs (E ≈ V²); for sparse graphs run Dijkstra per source instead
Imagine you're a travel agent with a map of every city and every flight route between them. A customer asks: 'What's the cheapest way to get from ANY city to ANY other city?' Instead of asking a pilot to fly every possible route one at a time, Floyd-Warshall says: 'Let me check — what if we're allowed to stop in City 1? Does that make any pair cheaper? Now what if we can also stop in City 2?' It asks that question for every city, one by one, until it's considered every possible layover — and at the end, you have the cheapest route between every pair of cities on the entire map.
Navigation apps, network routers, ride-sharing platforms, and game AI all share one brutal requirement: they don't just need the shortest path from A to B — they need the shortest path between every pair of nodes in the graph. Dijkstra's algorithm is brilliant for single-source shortest paths, but running it once per node in a dense graph gets expensive fast. Floyd-Warshall was built for exactly this problem, and it powers real infrastructure you use every day.
The core problem Floyd-Warshall solves is called the All-Pairs Shortest Path (APSP) problem. Given a weighted graph with V vertices, find the shortest distance between every possible (source, destination) pair — including graphs with negative edge weights, as long as there are no negative-weight cycles. It achieves this through an elegant three-nested-loop dynamic programming recurrence that's deceptively simple to implement yet surprisingly deep to understand correctly.
By the end of this article you'll understand exactly why the recurrence works (not just that it does), how to implement it with proper negative-cycle detection, when Floyd-Warshall beats alternatives like repeated Dijkstra or Bellman-Ford, what the real memory and time costs look like at scale, and the production-level gotchas that will catch you out in an interview or on the job.
What is Floyd-Warshall? — Plain English
Floyd-Warshall solves the All-Pairs Shortest Path (APSP) problem: given a weighted graph, find the shortest distance between every pair of nodes, not just from one source. Dijkstra solves single-source shortest paths; Floyd-Warshall solves ALL pairs in one pass. It handles negative edge weights (but not negative cycles).
The real-world analogy: imagine a travel agent with a complete distance table between every city. Instead of running a GPS route for each origin city one at a time, Floyd-Warshall fills the entire table simultaneously by asking: 'Would going through city k make the trip from city i to city j cheaper?' It asks this question for every possible intermediate city k, and the table is complete when every city has been considered as a potential stopover.
How Floyd-Warshall Works — Step by Step
Floyd-Warshall is a dynamic programming algorithm. Define dist[i][j] as the shortest distance from node i to node j.
Initialization: 1. Set dist[i][i] = 0 for all nodes. 2. For each edge (i, j) with weight w, set dist[i][j] = w. 3. For all other pairs, set dist[i][j] = INF (a large number).
Main loop — consider each node k as a potential intermediate stop: 4. For k from 0 to V-1: For i from 0 to V-1: For j from 0 to V-1: If dist[i][k] + dist[k][j] < dist[i][j]: Update dist[i][j] = dist[i][k] + dist[k][j]
After the triple loop, dist[i][j] holds the shortest path considering all possible intermediate nodes.
Negative cycle detection: after the main loop, if dist[i][i] < 0 for any node i, a negative cycle exists. Walking that cycle repeatedly makes costs arbitrarily small, so shortest paths are undefined.
continue) saves ~V³ iterations of addition when INF values dominate early.dist[k][j] == INF check might miss valid updates if INF is not handled. Our safe-guard uses continue.C++ Implementation with Overflow Guard
In C++, the default integer type is 32-bit int, and overflow when adding distances is a subtle but critical bug. Use 64-bit long long and set INF to a value that won't overflow when doubled. A common choice is LLONG_MAX / 2. The code below mirrors the Java version but with C++ syntax, including the INF guard that skips updates when either intermediate distance is INF.
This implementation also includes a safe_add that returns INF if the sum would overflow, though the guard approach already prevents overflow by not adding when either operand is INF. However, in graphs with large finite weights (e.g., 10^12), dist[i][k] + dist[k][j] could still overflow. The production-safe approach is to check before adding: if dist[i][k] > INF - dist[k][j], then the sum equals INF and should be skipped.
The template below uses std::vector<std::vector<long long>> for the matrix, which also uses dynamic allocation — important for large V.
-Wall -Wextra -Wconversion to catch implicit narrowing. The overflow check before adding is critical when edge weights are large (e.g., 10^9). In C++, LLONG_MAX is 9.22e18, so even two 10^12 weights sum to 2e12, safe. But if you accidentally use int INF = 1e9, adding two 1e9 values gives 2e9 which overflows signed int. Always use long long for distances.long long with LLONG_MAX/2 as INF. Add an explicit overflow guard before addition. This prevents undefined behavior the same way Java's long does.Worked Example — Tracing the Algorithm
Consider a graph with 4 nodes (0-3) and these directed edges: 0->1 weight 3, 0->3 weight 7, 1->0 weight 8, 1->2 weight 2, 2->0 weight 5, 2->3 weight 1, 3->0 weight 2
Initial dist matrix (INF = infinity): 0 1 2 3 0 [ 0, 3, INF, 7 ] 1 [ 8, 0, 2, INF] 2 [ 5, INF, 0, 1 ] 3 [ 2, INF, INF, 0 ]
After k=0 (using node 0 as intermediate): dist[1][3]: 8+7=15 < INF -> update to 15 dist[2][1]: 5+3=8 < INF -> update to 8 dist[2][3]: 5+7=12 > 1 -> no update dist[3][1]: 2+3=5 < INF -> update to 5 dist[3][2]: 2+INF -> no update
After k=1 (node 1 as intermediate): dist[0][2]: 3+2=5 < INF -> update to 5 dist[0][3]: 3+15=18> 7 -> no update dist[3][2]: 5+2=7 < INF -> update to 7
After k=2 (node 2 as intermediate): dist[0][3]: 5+1=6 < 7 -> update to 6 dist[1][3]: 2+1=3 < 15 -> update to 3 dist[3][3]: 7+1=8 > 0 -> no update
After k=3 (node 3 as intermediate): dist[0][0]: 6+2=8 > 0 -> no update dist[1][0]: 3+2=5 < 8 -> update to 5 dist[2][0]: 1+2=3 < 5 -> update to 3
Final dist matrix: 0 1 2 3 0 [ 0, 3, 5, 6 ] 1 [ 5, 0, 2, 3 ] 2 [ 3, 6, 0, 1 ] 3 [ 2, 5, 7, 0 ]
For example, shortest path 1->0 is 5 (via 1->2->3->0: weight 2+1+2=5).
next matrix alongside.Matrix Evolution: From A₀ to A₄
Each iteration of the outer loop over k produces a new distance matrix, denoted Aₖ, where Aₖ[i][j] is the shortest path from i to j that only uses intermediate nodes in {0..k-1}. The initial matrix A₀ uses no intermediate nodes (only direct edges). After k=0, A₁ considers node 0 as intermediate. After k=3, A₄ considers all nodes. Below is the evolution for the 4-node example from the worked example.
A₀ (direct edges only): [0, 3, ∞, 7] [8, 0, 2, ∞] [5, ∞, 0, 1] [2, ∞, ∞, 0]
A₁ (intermediate node 0 allowed): [0, 3, ∞, 7] [8, 0, 2, 15] [5, 8, 0, 1] (via 0: 5+3=8 for (2,1)) [2, 5, ∞, 0] (via 0: 2+3=5 for (3,1))
A₂ (intermediate nodes 0,1 allowed): [0, 3, 5, 7] (via 1: 3+2=5 for (0,2)) [8, 0, 2, 15] [5, 8, 0, 1] [2, 5, 7, 0] (via 1: 5+2=7 for (3,2))
A₃ (intermediate nodes 0,1,2 allowed): [0, 3, 5, 6] (via 2: 5+1=6 for (0,3)) [8, 0, 2, 3] (via 2: 2+1=3 for (1,3)) [5, 8, 0, 1] [2, 5, 7, 0]
A₄ (all nodes 0,1,2,3 allowed) — final: [0, 3, 5, 6] [5, 0, 2, 3] (via 3: 3+2=5 for (1,0)) [3, 6, 0, 1] (via 3: 1+2=3 for (2,0)) [2, 5, 7, 0]
Notice how each matrix builds on the previous one. The DP invariant is that after processing k, the matrix holds shortest paths using only nodes 0..k as intermediates. This phase evolution is exactly what makes Floyd-Warshall a dynamic programming algorithm.
Negative Cycle Detection and Handling
A negative weight cycle makes shortest paths undefined because you can loop infinitely to reduce cost. Floyd-Warshall can detect them easily.
After the main triple loop, check the diagonal of the distance matrix. If any dist[i][i] < 0, node i is part of a negative cycle. However, this only tells you that a cycle exists, not which nodes are affected. In fact, every node that can reach and be reached from that cycle also has undefined shortest paths (distances become -∞).
To fully identify affected nodes, run a reachability analysis: find all nodes that can reach the cycle and all nodes the cycle can reach. Those nodes' distances are effectively -∞.
In code, after detecting dist[i][i] < 0, mark node i as 'bad'. Then propagate: if dist[bad][j] < INF then j is also bad (can reach from bad). Also if dist[k][bad] < INF then k is bad (can reach bad). Combine.
Using Floyd-Warshall for Transitive Closure
Transitive closure asks: for every pair (i,j), is there a path from i to j? This is a boolean version of APSP. Floyd-Warshall can compute the transitive closure in O(V³) time by using a boolean reachability matrix instead of a distance matrix. Initialize reach[i][i] = true, and for each edge (u,v), set reach[u][v] = true. Then the core update becomes: reach[i][j] = reach[i][j] OR (reach[i][k] AND reach[k][j]).
This is exactly the same DP structure, but with boolean logic. It's faster to implement than a DFS/BFS from every node (O(V·(V+E))) and uses less code. However, for large graphs, you could use bitset optimisation (V³/wordsize) if the graph is dense.
One production trick: if you already have the shortest path matrix with INF, you can derive transitive closure by replacing INF with 0 and any finite distance with 1. But the dedicated boolean version saves memory and avoids overflow concerns.
When to Use Floyd-Warshall vs Alternatives
Floyd-Warshall is not always the answer. Here's when to pick it:
Use Floyd-Warshall when: - Graph is dense (E ~ V²) - V ≤ 1000 (otherwise memory becomes huge) - Negative edge weights are present (Dijkstra can't handle them) - All-pairs results are required (not just single source)
Use repeated Dijkstra when: - Graph is sparse (E << V²) - No negative edges - V is large (10k+)
Use Bellman-Ford per source when: - Negative edges exist but graph is very sparse - V ≤ 500
Use Johnson's algorithm when: - Sparse graph with negative edges and large V - It reweights via Bellman-Ford then runs Dijkstra per source
Real performance numbers: For V=1000, E=10000 (dense-ish), Floyd-Warshall is about 1 second in Java (JIT warmed). Repeated Dijkstra (binary heap) takes ~0.3 seconds. For V=5000, Floyd-Warshall takes ~100 seconds and 200 MB memory; Dijkstra takes ~2 seconds. The crossover point depends on graph density.
Advantages and Disadvantages of Floyd-Warshall
| Advantage | Disadvantage |
|---|---|
| Simple to implement: only three nested loops | O(V³) time complexity — too slow for V > 1000 in practice |
| Handles negative edge weights (unlike Dijkstra) | Fails on graphs with negative cycles (detectable) |
| Guarantees correct APSP after exactly V iterations of outer loop | O(V²) memory — cannot scale to graphs with 10k+ nodes |
| No need for complex data structures (heaps, adjacency lists) | On sparse graphs, repeated Dijkstra or Johnson's is faster |
| Can be adapted for transitive closure, minimax path, etc. | Not suitable for streaming or dynamic graphs (no incremental update) |
The most significant pro is simplicity: Floyd-Warshall is often the first all-pairs algorithm taught in algorithms courses. Its deterministic runtime (always V³) makes it predictable for small graphs. The biggest con is scalability — both time and memory explode with V. In production, a V=2000 graph with Floyd-Warshall takes 8 seconds (O(V³) = 8e9 operations) and 32 MB for a long matrix. V=5000 takes 125 seconds and 200 MB — already problematic. V=10000 would take 1000 seconds and 800 MB — infeasible for most real-time systems.
Handling Floating-Point Weights with EPS
When edge weights are floating-point numbers (e.g., travel time in hours), directly comparing < can produce wrong results due to tiny arithmetic errors. For example, after many updates, two paths may have distances 0.9999999 and 1.0000001 due to floating-point precision. A strict < comparison would treat the first as shorter, but the true difference is noise. The standard fix is to use an epsilon (EPS) when comparing distances. Typically, EPS = 1e-9 or 1e-12 for double-precision.
The production guard becomes: if (fabs(via - dist[i][j]) < EPS) then consider them equal and keep the existing distance to avoid oscillating updates. Or use via < dist[i][j] - EPS to consider only significant improvements. This prevents the algorithm from infinitely tightening distances due to floating-point drift.
Floating-point Floyd-Warshall is rare in competitive programming (where weights are integers), but common in real-world simulations, GIS, and physics engines. If all weights are integers, prefer integers to avoid EPS headaches entirely.
double, be aware that adding many small numbers can cause accumulation errors. Prefer long for integer weights whenever possible.Practice Problems
Solidify your understanding of Floyd-Warshall with these problems that test each aspect of the algorithm: basic implementation, negative cycle detection, transitive closure, and production-like scenarios.
- CSES Shortest Routes II — The classic Floyd-Warshall problem: compute all-pairs shortest paths for a directed graph with positive and negative weights. Must handle up to V=500, E=2500. Validate your implementation against INF guards and negative cycle detection. [Link to CSES](https://cses.fi/problemset/task/1674)
- Codeforces 295B: Greg and Graph — An offline version where nodes are removed one by one and you need the sum of all shortest paths after each removal. Reverse the process and build the graph by adding nodes in reverse order, applying Floyd-Warshall incrementally. Tests understanding of the DP phase property. [Link to Codeforces](https://codeforces.com/problemset/problem/295/B)
- UVA 104: Arbitrage — Use Floyd-Warshall (or Floyd-like) to find a sequence of currency exchanges that yields a profit (positive cycle). This is a classic application of detecting positive cycles using multiplication or logarithms. [Link to UVA](https://onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=24&page=show_problem&problem=40)
- SPOJ TRANSP — Transitive closure on a directed graph. Use the boolean Floyd-Warshall to determine if there is a path between every pair. Test your implementation of the OR-AND variation. [Link to SPOJ](https://www.spoj.com/problems/TRANSP/)
- LeetCode 1462: Course Schedule IV — Equivalent to transitive closure: given prerequisites (edges), answer queries whether a course is a prerequisite of another. Use Floyd-Warshall with boolean matrix. V ≤ 100. [Link to LeetCode](https://leetcode.com/problems/course-schedule-iv/)
- CSES Flight Routes — Given a flight network with costs, find the cheapest route between all city pairs. Must output the routes (path reconstruction) and use long to avoid overflow. [Link to CSES](https://cses.fi/problemset/task/1666)
- Codeforces 1140C: Playlist — While not directly APSP, this problem can be solved with a variant of Floyd-Warshall to compute maximum path attribute ; practice adapting the DP pattern. [Link to Codeforces](https://codeforces.com/problemset/problem/1140/C)
Integer Overflow in Shortest Path Server Took Down Routing
- Always use long for shortest path distances when edge weights can be >1 or V > 1000.
- Check for overflow before adding — not after. Post-fix detection is too late.
- Set INF to a value that won't overflow when doubled, e.g., Long.MAX_VALUE / 2.
Key takeaways
Common mistakes to avoid
4 patternsUsing Floyd-Warshall on a sparse graph
Not checking for integer overflow in the distance matrix
Reordering the loops (i,k,j or i,j,k)
Assuming all nodes are affected if a negative cycle exists
Interview Questions on This Topic
What is the time and space complexity of Floyd-Warshall?
Frequently Asked Questions
That's Graphs. Mark it forged?
11 min read · try the examples if you haven't