Newton-Raphson Divergence: Near-Zero Vega Kills Pricing
- Newton-Raphson finds roots by iteratively following the tangent line to zero.
- Quadratic convergence: error squares each iteration, reaching ~15 decimal digits in 5 steps.
- Requires derivative f'(x); fails when derivative is near zero, initial guess is poor, or multiple roots exist.
- Hardware square root and Quake III's fast inverse sqrt are built on Newton-Raphson.
- Production rule: always combine with a bracketing method (Brent's) for guaranteed convergence.
Newton-Raphson Quick Debug Cheat Sheet

| Symptom | Action |
|---|---|
| No convergence after max iterations | Log the current iterate: `print(f(x), df(x))`. Visualize the function: `import matplotlib.pyplot as plt; plt.plot(x_range, f(x_range))` |
| Division by zero error | Guard the step: `if abs(dfx) < 1e-12: raise ValueError('derivative zero')`, or take a bisection step instead of the Newton step |
| Slow convergence (linear) | Check `print(f(x), df(x))`: are both near zero? If so, use modified Newton: x_{n+1} = x_n - m*f(x_n)/f'(x_n) with multiplicity m |
The floating-point sqrt() function in your processor uses a variant of Newton-Raphson. Fast inverse square root — the infamous Quake III hack — is Newton-Raphson with a clever initial guess via bit manipulation. Modern calculator chips use Newton-Raphson for division, square root, and transcendental functions.
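The Quake III trick is compact enough to sketch in Python: reinterpret the float's bits as an integer, subtract from the magic constant 0x5f3759df to get a remarkably good initial guess, then refine with a single Newton step for f(y) = 1/y^2 - x. This is a sketch, not the game's code (the original is C with pointer casts; `struct` stands in for the bit reinterpretation, and `fast_inv_sqrt` is an illustrative name):

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x): bit-level initial guess plus one Newton step."""
    # Reinterpret the 32-bit float's bits as an unsigned integer
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The famous magic constant yields a good first guess for 1/sqrt(x)
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson step for f(y) = 1/y^2 - x
    y = y * (1.5 - 0.5 * x * y * y)
    return y

print(fast_inv_sqrt(4.0))  # ≈ 0.5, within about 0.2%
```

After the single Newton step the relative error is bounded by roughly 0.18%, which was good enough for lighting calculations at a fraction of the cost of a division and a sqrt.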
Beyond numerical computation, Newton-Raphson generalises to optimisation: find the minimum of f by finding the zero of f'. The generalisation to multiple variables gives Newton's method for optimisation (second-order: it uses the Hessian). Quasi-Newton methods like L-BFGS, a workhorse for classical ML models and the default solver in scikit-learn's logistic regression, approximate the Hessian to avoid the O(n^3) cost of inverting it exactly. Every second-order optimiser in modern machine learning traces back to the method Newton described in 1669.
The Algorithm
Derive from Taylor series: f(x + h) ≈ f(x) + h·f'(x). Set this to zero: h = -f(x)/f'(x). The next iterate is x + h = x - f(x)/f'(x).
```python
import math

def newton_raphson(f, df, x0: float, tol: float = 1e-10, max_iter: int = 100) -> float:
    """Find root of f using Newton-Raphson method."""
    x = x0
    for i in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            print(f'Converged in {i} iterations')
            return x
        dfx = df(x)
        if abs(dfx) < 1e-14:
            raise ValueError('Derivative near zero — method fails')
        x = x - fx / dfx
    raise ValueError(f'Did not converge after {max_iter} iterations')

# Find sqrt(2): solve f(x) = x^2 - 2 = 0
root = newton_raphson(
    f=lambda x: x**2 - 2,
    df=lambda x: 2*x,
    x0=1.0,
)
print(f'sqrt(2) = {root}')
print(f'Error: {abs(root - math.sqrt(2)):.2e}')
```
sqrt(2) = 1.4142135623730951
Error: 0.00e+00
Quadratic Convergence
Newton-Raphson converges quadratically: the error at step n+1 is proportional to the square of the error at step n. If you have 2 correct decimal places, the next step gives ~4, then ~8, then ~16.
```python
import math

x = 1.0  # initial guess for sqrt(2)
target = math.sqrt(2)
print('Iteration  x                   Error')
for i in range(6):
    error = abs(x - target)
    print(f'{i:9}  {x:.16f}  {error:.2e}')
    x = x - (x**2 - 2) / (2*x)
```
0 1.0000000000000000 4.14e-01
1 1.5000000000000000 8.58e-02
2 1.4166666666666667 2.45e-03
3 1.4142156862745099 2.12e-06
4 1.4142135623746899 1.60e-12
5 1.4142135623730951 0.00e+00
When Newton-Raphson Fails
Newton-Raphson can fail in several ways:
Derivative is zero: f'(x) = 0 causes division by zero. This happens whenever an iterate lands on a local extremum of f.
Oscillation: for some functions the iterates cycle between two values without converging. The classic example is f(x) = x^3 - 2x + 2 starting at x_0 = 0, which bounces between 0 and 1 forever. (f(x) = x^(1/3) fails even harder: the Newton step simplifies to -2x, so the iterates flip sign and diverge.)
Divergence: if the initial guess is too far from the root, the method can diverge or converge to a different root entirely.
Multiple roots: convergence is only linear (not quadratic) at a root r where f(r) = f'(r) = 0 (multiplicity greater than one).
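Two of these failure modes are easy to reproduce in a few lines (the `newton_step` helper is just for illustration):

```python
import math

def newton_step(f, df, x):
    """One Newton-Raphson update."""
    return x - f(x) / df(x)

# Oscillation: f(x) = x^3 - 2x + 2 from x0 = 0 cycles 0 -> 1 -> 0 -> ...
f = lambda x: x**3 - 2*x + 2
df = lambda x: 3*x**2 - 2
x = 0.0
history = []
for _ in range(6):
    history.append(x)
    x = newton_step(f, df, x)
print(history)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

# Divergence: for f(x) = x^(1/3) the step simplifies to x - 3x = -2x,
# so each iteration flips the sign and doubles the magnitude.
g = lambda x: math.copysign(abs(x) ** (1/3), x)  # real cube root
dg = lambda x: (1/3) * abs(x) ** (-2/3)
x = 1.0
for _ in range(4):
    x = newton_step(g, dg, x)
print(x)  # 1 -> -2 -> 4 -> -8 -> ~16: running away from the root at 0
```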
Applications in ML
Newton-Raphson generalises to optimisation: to minimise f(x), find the zero of f'(x) by applying Newton-Raphson to f': x_{n+1} = x_n - f'(x_n)/f''(x_n)
In matrix form (Newton's method for optimisation): x_{n+1} = x_n - H^(-1) ∇f(x_n), where H is the Hessian matrix. This is the second-order Newton step; quasi-Newton methods like L-BFGS approximate H^(-1) from gradient history rather than forming and inverting the Hessian exactly.
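In one dimension the optimisation version is simply Newton-Raphson applied to f'. A minimal sketch with a hypothetical `newton_minimize` helper, minimising f(x) = cosh(x - 1), whose unique minimum sits at x = 1:

```python
import math

def newton_minimize(df, d2f, x0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Minimise a smooth convex function: Newton-Raphson applied to its derivative."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)  # 1-D Newton step: f'(x)/f''(x)
        x -= step
        if abs(step) < tol:
            return x
    return x

# f(x) = cosh(x - 1): f'(x) = sinh(x - 1), f''(x) = cosh(x - 1), minimum at x = 1
x_min = newton_minimize(df=lambda x: math.sinh(x - 1),
                        d2f=lambda x: math.cosh(x - 1),
                        x0=0.0)
print(x_min)  # ≈ 1.0
```

Note the update uses the second derivative where root-finding Newton uses the first: we are finding a root of f', so f'' plays the role of the slope.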
```python
def fast_sqrt(n: float) -> float:
    """Fast square root using Newton-Raphson — how calculators work."""
    if n < 0:
        raise ValueError('cannot take the square root of a negative number')
    if n == 0:
        return 0.0
    x = n  # initial guess
    while True:
        x_new = (x + n / x) / 2  # Newton step for f(x) = x^2 - n
        if abs(x_new - x) < 1e-15 * x:
            return x_new
        x = x_new

print(fast_sqrt(2))    # 1.4142135623730951
print(fast_sqrt(144))  # 12.0
print(fast_sqrt(0.5))  # 0.7071067811865476
```

1.4142135623730951
12.0
0.7071067811865476
Practical Implementation and Safeguards
For production code, never rely on bare Newton-Raphson. Combine it with a bracketing method. Brent's method (scipy.optimize.brentq) is the gold standard: it uses inverse quadratic interpolation when possible, falls back to bisection when needed, and guarantees convergence for continuous functions provided a bracket exists.
- Always require a bracket [a,b] where f(a)*f(b) < 0 (ensures a root exists by intermediate value theorem).
- Detect near-zero derivative and switch to bisection step.
- Limit iterations and detect divergence (monitor f(x) increasing instead of decreasing).
- For multiple roots, use the modified Newton x_{n+1} = x_n - m*f(x_n)/f'(x_n), where m is the multiplicity.
```python
def safe_newton_brent(f, df, a: float, b: float, tol: float = 1e-10,
                      max_iter: int = 100) -> float:
    """Newton's method with a bisection fallback inside a bracket.

    Part of io.thecodeforge.numerical package.
    """
    if f(a) * f(b) > 0:
        raise ValueError('No root in bracket: f(a)*f(b) must be negative')
    x = (a + b) / 2  # initial guess
    for i in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        dfx = df(x)
        if abs(dfx) < 1e-12:
            # Fall back to bisection
            x = (a + b) / 2
        else:
            x_new = x - fx / dfx
            # If the Newton step jumps outside the bracket, fall back to bisection
            if x_new < a or x_new > b:
                x = (a + b) / 2
            else:
                x = x_new
        # Update the bracket
        if f(a) * f(x) < 0:
            b = x
        else:
            a = x
    raise ValueError(f'Did not converge in {max_iter} iterations')

# Example: find root with bracket
result = safe_newton_brent(lambda x: x**2 - 2, lambda x: 2*x, 1.0, 2.0)
print(f'sqrt(2) = {result}')
```
| Method | Convergence Rate | Requires Derivative? | Bracket Needed? | Guaranteed Convergence? |
|---|---|---|---|---|
| Bisection | Linear (1 bit/iteration) | No | Yes | Yes |
| Newton-Raphson | Quadratic | Yes | No | No |
| Secant | Superlinear (~1.618) | No (finite-difference slope) | No | No |
| Brent | Superlinear (hybrid) | No (interpolation) | Yes | Yes |
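The secant row of the table can be made concrete. A minimal sketch (the `secant_method` function is illustrative, not from a library): the slope through the last two iterates stands in for f'(x), trading quadratic for superlinear convergence in exchange for needing no derivative:

```python
def secant_method(f, x0: float, x1: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Secant method: Newton-Raphson with f'(x) replaced by a finite-difference slope."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            raise ZeroDivisionError('secant line is flat')
        # Slope through the last two iterates stands in for the derivative
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise ValueError(f'no convergence in {max_iter} iterations')

root = secant_method(lambda x: x**2 - 2, 1.0, 2.0)
print(root)  # ≈ 1.4142135623730951
```

The order ~1.618 in the table is the golden ratio: each error is roughly the product of the previous two, the same recurrence that generates Fibonacci growth.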
🎯 Key Takeaways
- x_{n+1} = x_n - f(x_n)/f'(x_n). Derived from Taylor series by zeroing the linear approximation. Geometrically: follow the tangent line to where it crosses zero.
- Quadratic convergence: error squares each iteration. 2 correct digits → 4 → 8 → 16. For 64-bit floating point precision (15 significant digits), ~5 iterations from a decent starting guess.
- Fails at: derivative zero or near-zero (division by zero), bad initial guess (diverges or finds wrong root), multiple roots (only linear convergence near roots where f'=0 too).
- The Babylonian square root algorithm, x_{k+1} = (x_k + n/x_k)/2, is Newton-Raphson applied to f(x) = x^2 - n. Hardware sqrt units are built on variants of this iteration.
- For robust root finding in production, bracket the root and use scipy.optimize.brentq, which combines bisection (guaranteed convergence) with fast interpolation near the root. Never ship bare Newton-Raphson in production numerical code.
Interview Questions on This Topic
- Q: Derive the Newton-Raphson update rule from the Taylor series. (Mid-level)
- Q: Why does Newton-Raphson converge quadratically? (Senior)
- Q: What are three situations where Newton-Raphson can fail? (Mid-level)
- Q: How is the Babylonian square root algorithm related to Newton-Raphson? (Junior)
- Q: Explain why Newton's method for optimization uses the Hessian and how quasi-Newton methods approximate it. (Senior)
Frequently Asked Questions
How does Newton-Raphson compare to the bisection method?
Newton-Raphson converges quadratically (much faster) but requires differentiability and a good starting point. Bisection converges linearly but is guaranteed to converge for any continuous function on a bracketed interval. Brent's method combines both for robust practical root-finding.
What is the difference between Newton-Raphson and the secant method?
The secant method approximates the derivative using a finite difference, so it does not require an explicit f'(x). Its convergence is superlinear (order ~1.618) instead of quadratic. It still needs a good initial guess but avoids computing derivatives.
Can Newton-Raphson be used to find complex roots?
Yes, if f and f' are complex differentiable (analytic). The method extends to complex numbers, but convergence basins can be fractal-like (see Newton fractals). Initial guess selection becomes even more critical.
Why does Newton-Raphson fail for multiple roots?
At a root where f(r)=0 and f'(r)=0, the Taylor expansion lacks the linear term; convergence becomes linear (not quadratic). The modified Newton method x_{n+1} = x_n - m f(x_n)/f'(x_n) restores quadratic convergence if the multiplicity m is known.
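The contrast is easy to reproduce for f(x) = (x - 1)^2, a double root at x = 1. A minimal sketch comparing the plain and modified steps:

```python
f = lambda x: (x - 1) ** 2   # double root at x = 1 (multiplicity m = 2)
df = lambda x: 2 * (x - 1)

# Plain Newton: each step only halves the error (linear convergence)
x = 2.0
for _ in range(5):
    x = x - f(x) / df(x)
print(x)  # 1.03125: still about 3e-2 away after 5 steps

# Modified Newton with m = 2: the full step lands exactly on the root here
m = 2
y = 2.0 - m * f(2.0) / df(2.0)
print(y)  # 1.0
```

For this particular polynomial m*f/f' is exactly (x - 1), so the modified step is exact in one iteration; in general it restores quadratic rather than one-shot convergence.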
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.