Intermediate 3 min · March 24, 2026

Newton-Raphson Divergence: Near-Zero Vega Kills Pricing

Deep OTM options with <1 day expiry cause near-zero vega, making Newton-Raphson diverge.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
Quick Answer
  • Newton-Raphson finds roots by iteratively following the tangent line to zero.
  • Quadratic convergence: error squares each iteration, reaching ~15 decimal digits in 5 steps.
  • Requires derivative f'(x); fails when derivative is near zero, initial guess is poor, or multiple roots exist.
  • Hardware square root and Quake III's fast inverse sqrt are built on Newton-Raphson.
  • Production rule: always combine with a bracketing method (Brent's) for guaranteed convergence.

The floating-point sqrt() function in your processor uses a variant of Newton-Raphson. Fast inverse square root — the infamous Quake III hack — is Newton-Raphson with a clever initial guess via bit manipulation. Modern calculator chips use Newton-Raphson for division, square root, and transcendental functions.

Beyond numerical computation, Newton-Raphson generalises to optimisation: find the minimum of f by finding the zero of f'. The generalisation to multiple variables gives Newton's optimisation method (second-order: uses the Hessian). Quasi-Newton methods like L-BFGS — the default optimiser for large-scale ML models — approximate the Hessian to avoid the O(n^3) cost of inverting it exactly. Every major machine learning framework's default optimizer traces back to Newton's 1669 method.

The Algorithm

Derive from Taylor series: f(x + h) ≈ f(x) + h·f'(x). Set this to zero: h = -f(x)/f'(x). The next iterate is x + h = x - f(x)/f'(x).

Quadratic Convergence

Newton-Raphson converges quadratically: the error at step n+1 is proportional to the square of the error at step n. If you have 2 correct decimal places, the next step gives ~4, then ~8, then ~16.

When Newton-Raphson Fails

Derivative is zero: f'(x)=0 causes division by zero. Happens at local extrema.

Oscillation: For some functions, x oscillates between two values without converging. f(x) = x^(1/3) at x=1 oscillates.

Divergence: If initial guess is too far from root, the method can diverge or find a different root.

Multiple roots: Convergence is only linear (not quadratic) at roots where f(x0)=f'(x0)=0 (multiple roots).

Applications in ML

Newton-Raphson generalises to optimisation: minimise f(x) by finding f'(x)=0, applying NR to f': x_{n+1} = x_n - f'(x_n)/f''(x_n)

In matrix form (Newton's method for optimisation): x_{n+1} = x_n - H^(-1) ∇f(x_n) where H is the Hessian matrix. This is the second-order optimisation method (Newton step) used in quasi-Newton methods like L-BFGS.

Practical Implementation and Safeguards

For production code, never rely on bare Newton-Raphson. Combine it with a bracketing method. Brent's method (scipy.optimize.brentq) is the gold standard: it uses inverse quadratic interpolation when possible, falls back to bisection when needed, and guarantees convergence for continuous functions provided a bracket exists.

Key implementation rules
  • Always require a bracket [a,b] where f(a)*f(b) < 0 (ensures a root exists by intermediate value theorem).
  • Detect near-zero derivative and switch to bisection step.
  • Limit iterations and detect divergence (monitor f(x) increasing instead of decreasing).
  • For multiple roots, use the modified Newton x_{n+1} = x_n
  • m*f(x_n)/f'(x_n) where m is the multiplicity.
Newton-Raphson vs Other Root-Finding Methods
MethodConvergence RateRequires Derivative?Bracket Needed?Guaranteed Convergence?
BisectionLinear (1 bit/iteration)NoYesYes
Newton-RaphsonQuadraticYesNoNo
SecantSuperlinear (~1.618)No (uses secant)NoNo
BrentSuperlinear (hybrid)No (interpolation)YesYes

Key Takeaways

  • x_{n+1} = x_n - f(x_n)/f'(x_n). Derived from Taylor series by zeroing the linear approximation. Geometrically: follow the tangent line to where it crosses zero.
  • Quadratic convergence: error squares each iteration. 2 correct digits → 4 → 8 → 16. For 64-bit floating point precision (15 significant digits), ~5 iterations from a decent starting guess.
  • Fails at: derivative zero or near-zero (division by zero), bad initial guess (diverges or finds wrong root), multiple roots (only linear convergence near roots where f'=0 too).
  • The Babylonian square root algorithm (x_{n+1} = (x + n/x) / 2) is Newton-Raphson applied to f(x) = x^2 - n. This is how hardware sqrt is implemented.
  • For robust root finding in production: scipy.optimize.brentq combines bisection (guaranteed convergence) with Newton (fast near the root). Never implement raw Newton-Raphson for production numerical code.
  • In production numerical code, always bracket the root and use a fallback method (Brent's) to handle edge cases gracefully.

Common Mistakes to Avoid

  • Not checking derivative near zero
    Symptom: Division by zero or overflow in production code
    Fix: Add check: if abs(dfx) < epsilon: switch to bisection or fallback.
  • Using a fixed initial guess for all inputs
    Symptom: Method diverges for some inputs, works for others
    Fix: Use adaptive initial guess (e.g., from function range) or bracket first.
  • No iteration limit
    Symptom: Infinite loop in production, CPU spike
    Fix: Always set max_iter and raise error after limit.
  • Assuming quadratic convergence near multiple roots
    Symptom: Converges slowly but still works for simple examples
    Fix: Detect root multiplicity by checking f'(x) near root, use modified Newton.

Interview Questions on This Topic

  • QDerive the Newton-Raphson update rule from the Taylor series.Mid-levelReveal
    Start with first-order Taylor approximation: f(x + h) ≈ f(x) + h f'(x). Set f(x + h) = 0 to find h, giving h = -f(x)/f'(x). Then update x_{new} = x + h = x - f(x)/f'(x).
  • QWhy does Newton-Raphson converge quadratically?SeniorReveal
    Assume e_n = x_n - r is the error. Using Taylor expansion around r, one can show e_{n+1} ≈ (f''(r) / (2f'(r))) e_n^2, so the error is squared each step, provided f'(r) ≠ 0 and the initial guess is sufficiently close.
  • QWhat are three situations where Newton-Raphson can fail?Mid-levelReveal
    1. Derivative zero at a point (division by zero). 2. Poor initial guess leading to divergence or convergence to a different root. 3. Multiple roots where convergence degrades to linear.
  • QHow is the Babylonian square root algorithm related to Newton-Raphson?JuniorReveal
    The Babylonian method x_{new} = (x + n/x)/2 is exactly Newton-Raphson applied to f(x) = x^2 - n, where f'(x) = 2x.
  • QExplain why Newton's method for optimization uses the Hessian and how quasi-Newton methods approximate it.SeniorReveal
    Newton's method for optimization solves f'(x)=0 by applying Newton-Raphson to f'. The update becomes x_{new} = x - f'(x)/f''(x) or in vector case x_{new} = x - H^{-1}∇f. Quasi-Newton methods like L-BFGS approximate the inverse Hessian using rank-1 updates from gradient differences, avoiding O(n^3) inversion.

Frequently Asked Questions

How does Newton-Raphson compare to the bisection method?

Newton-Raphson converges quadratically (much faster) but requires differentiability and a good starting point. Bisection converges linearly but is guaranteed to converge for any continuous function on a bracketed interval. Brent's method combines both for robust practical root-finding.

What is the difference between Newton-Raphson and the secant method?

The secant method approximates the derivative using a finite difference, so it does not require an explicit f'(x). Its convergence is superlinear (order ~1.618) instead of quadratic. It still needs a good initial guess but avoids computing derivatives.

Can Newton-Raphson be used to find complex roots?

Yes, if f and f' are complex differentiable (analytic). The method extends to complex numbers, but convergence basins can be fractal-like (see Newton fractals). Initial guess selection becomes even more critical.

Why does Newton-Raphson fail for multiple roots?

At a root where f(r)=0 and f'(r)=0, the Taylor expansion lacks the linear term; convergence becomes linear (not quadratic). The modified Newton method x_{n+1} = x_n - m f(x_n)/f'(x_n) restores quadratic convergence if the multiplicity m is known.

🔥

That's Numerical Analysis. Mark it forged?

3 min read · try the examples if you haven't

Previous
Arithmetic Coding — Beyond Huffman
1 / 3 · Numerical Analysis
Next
Bisection Method — Numerical Root Finding