Newton-Raphson Divergence: Near-Zero Vega Kills Pricing
Deep OTM options with <1 day expiry cause near-zero vega, making Newton-Raphson diverge.
- Newton-Raphson finds roots by iteratively following the tangent line to zero.
- Quadratic convergence: error squares each iteration, reaching ~15 decimal digits in 5 steps.
- Requires derivative f'(x); fails when derivative is near zero, initial guess is poor, or multiple roots exist.
- Hardware square root and Quake III's fast inverse sqrt are built on Newton-Raphson.
- Production rule: always combine with a bracketing method (Brent's) for guaranteed convergence.
The floating-point sqrt() function in your processor uses a variant of Newton-Raphson. Fast inverse square root — the infamous Quake III hack — is Newton-Raphson with a clever initial guess via bit manipulation. Modern calculator chips use Newton-Raphson for division, square root, and transcendental functions.
Beyond numerical computation, Newton-Raphson generalises to optimisation: find the minimum of f by finding the zero of f'. The generalisation to multiple variables gives Newton's optimisation method (second-order: uses the Hessian). Quasi-Newton methods like L-BFGS — the default optimiser for large-scale ML models — approximate the Hessian to avoid the O(n^3) cost of inverting it exactly. Every major machine learning framework's default optimizer traces back to Newton's 1669 method.
The Algorithm
Derive from Taylor series: f(x + h) ≈ f(x) + h·f'(x). Set this to zero: h = -f(x)/f'(x). The next iterate is x + h = x - f(x)/f'(x).
Quadratic Convergence
Newton-Raphson converges quadratically: the error at step n+1 is proportional to the square of the error at step n. If you have 2 correct decimal places, the next step gives ~4, then ~8, then ~16.
When Newton-Raphson Fails
Newton-Raphson can fail in several ways:
Derivative is zero: f'(x)=0 causes division by zero. Happens at local extrema.
Oscillation: For some functions, x oscillates between two values without converging. f(x) = x^(1/3) at x=1 oscillates.
Divergence: If initial guess is too far from root, the method can diverge or find a different root.
Multiple roots: Convergence is only linear (not quadratic) at roots where f(x0)=f'(x0)=0 (multiple roots).
Applications in ML
Newton-Raphson generalises to optimisation: minimise f(x) by finding f'(x)=0, applying NR to f': x_{n+1} = x_n - f'(x_n)/f''(x_n)
In matrix form (Newton's method for optimisation): x_{n+1} = x_n - H^(-1) ∇f(x_n) where H is the Hessian matrix. This is the second-order optimisation method (Newton step) used in quasi-Newton methods like L-BFGS.
Practical Implementation and Safeguards
For production code, never rely on bare Newton-Raphson. Combine it with a bracketing method. Brent's method (scipy.optimize.brentq) is the gold standard: it uses inverse quadratic interpolation when possible, falls back to bisection when needed, and guarantees convergence for continuous functions provided a bracket exists.
- Always require a bracket [a,b] where f(a)*f(b) < 0 (ensures a root exists by intermediate value theorem).
- Detect near-zero derivative and switch to bisection step.
- Limit iterations and detect divergence (monitor f(x) increasing instead of decreasing).
- For multiple roots, use the modified Newton x_{n+1} = x_n
- m*f(x_n)/f'(x_n) where m is the multiplicity.
| Method | Convergence Rate | Requires Derivative? | Bracket Needed? | Guaranteed Convergence? |
|---|---|---|---|---|
| Bisection | Linear (1 bit/iteration) | No | Yes | Yes |
| Newton-Raphson | Quadratic | Yes | No | No |
| Secant | Superlinear (~1.618) | No (uses secant) | No | No |
| Brent | Superlinear (hybrid) | No (interpolation) | Yes | Yes |
Key Takeaways
- x_{n+1} = x_n - f(x_n)/f'(x_n). Derived from Taylor series by zeroing the linear approximation. Geometrically: follow the tangent line to where it crosses zero.
- Quadratic convergence: error squares each iteration. 2 correct digits → 4 → 8 → 16. For 64-bit floating point precision (15 significant digits), ~5 iterations from a decent starting guess.
- Fails at: derivative zero or near-zero (division by zero), bad initial guess (diverges or finds wrong root), multiple roots (only linear convergence near roots where f'=0 too).
- The Babylonian square root algorithm (x_{n+1} = (x + n/x) / 2) is Newton-Raphson applied to f(x) = x^2 - n. This is how hardware sqrt is implemented.
- For robust root finding in production: scipy.optimize.brentq combines bisection (guaranteed convergence) with Newton (fast near the root). Never implement raw Newton-Raphson for production numerical code.
- In production numerical code, always bracket the root and use a fallback method (Brent's) to handle edge cases gracefully.
Common Mistakes to Avoid
- Not checking derivative near zero
Symptom: Division by zero or overflow in production code
Fix: Add check: if abs(dfx) < epsilon: switch to bisection or fallback. - Using a fixed initial guess for all inputs
Symptom: Method diverges for some inputs, works for others
Fix: Use adaptive initial guess (e.g., from function range) or bracket first. - No iteration limit
Symptom: Infinite loop in production, CPU spike
Fix: Always set max_iter and raise error after limit. - Assuming quadratic convergence near multiple roots
Symptom: Converges slowly but still works for simple examples
Fix: Detect root multiplicity by checking f'(x) near root, use modified Newton.
Interview Questions on This Topic
- QDerive the Newton-Raphson update rule from the Taylor series.Mid-levelReveal
- QWhy does Newton-Raphson converge quadratically?SeniorReveal
- QWhat are three situations where Newton-Raphson can fail?Mid-levelReveal
- QHow is the Babylonian square root algorithm related to Newton-Raphson?JuniorReveal
- QExplain why Newton's method for optimization uses the Hessian and how quasi-Newton methods approximate it.SeniorReveal
Frequently Asked Questions
How does Newton-Raphson compare to the bisection method?
Newton-Raphson converges quadratically (much faster) but requires differentiability and a good starting point. Bisection converges linearly but is guaranteed to converge for any continuous function on a bracketed interval. Brent's method combines both for robust practical root-finding.
What is the difference between Newton-Raphson and the secant method?
The secant method approximates the derivative using a finite difference, so it does not require an explicit f'(x). Its convergence is superlinear (order ~1.618) instead of quadratic. It still needs a good initial guess but avoids computing derivatives.
Can Newton-Raphson be used to find complex roots?
Yes, if f and f' are complex differentiable (analytic). The method extends to complex numbers, but convergence basins can be fractal-like (see Newton fractals). Initial guess selection becomes even more critical.
Why does Newton-Raphson fail for multiple roots?
At a root where f(r)=0 and f'(r)=0, the Taylor expansion lacks the linear term; convergence becomes linear (not quadratic). The modified Newton method x_{n+1} = x_n - m f(x_n)/f'(x_n) restores quadratic convergence if the multiplicity m is known.
That's Numerical Analysis. Mark it forged?
3 min read · try the examples if you haven't