Newton-Raphson Method: Root Finding Algorithm
- x_{n+1} = x_n - f(x_n)/f'(x_n). Derived from Taylor series by zeroing the linear approximation. Geometrically: follow the tangent line to where it crosses zero.
- Quadratic convergence: error squares each iteration. 2 correct digits → 4 → 8 → 16. For 64-bit floating point precision (15 significant digits), ~5 iterations from a decent starting guess.
- Fails at: derivative zero or near-zero (division by zero), bad initial guess (diverges or finds wrong root), multiple roots (only linear convergence near roots where f'=0 too).
The floating-point sqrt() function in your processor uses a variant of Newton-Raphson. Fast inverse square root (the infamous Quake III hack) is Newton-Raphson with a clever initial guess via bit manipulation. Modern calculator chips use Newton-Raphson for division, square root, and transcendental functions.
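The Quake III trick is short enough to sketch. This is an illustrative Python port of the C original (function name is mine), using `struct` to reinterpret the 32-bit float's bit pattern:

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x): bit-level initial guess plus one Newton step."""
    # Reinterpret the 32-bit float's bits as an unsigned integer
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The famous magic constant yields a surprisingly good first guess
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson step on f(y) = 1/y^2 - x refines the guess:
    # y_{n+1} = y * (1.5 - 0.5 * x * y^2)
    y = y * (1.5 - 0.5 * x * y * y)
    return y

print(fast_inv_sqrt(4.0))  # ≈ 0.5 (within about 0.2%)
```

A second Newton step would tighten the result further; the original code famously left it commented out because one step was accurate enough for lighting calculations.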
Beyond numerical computation, Newton-Raphson generalises to optimisation: find the minimum of f by finding the zero of f'. The generalisation to multiple variables gives Newton's optimisation method (second-order: it uses the Hessian). Quasi-Newton methods like L-BFGS, a workhorse optimiser for classical large-scale problems, approximate the Hessian to avoid the O(n^3) cost of solving with it exactly. The second-order ideas behind these optimisers trace straight back to Newton's method, first described around 1669.
The Algorithm
Derive from the Taylor series: f(x + h) ≈ f(x) + h·f'(x). Set this to zero: h = -f(x)/f'(x). The next iterate is x + h = x - f(x)/f'(x).
```python
import math

def newton_raphson(f, df, x0: float, tol: float = 1e-10,
                   max_iter: int = 100) -> float:
    """Find a root of f using the Newton-Raphson method."""
    x = x0
    for i in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            print(f'Converged in {i} iterations')
            return x
        dfx = df(x)
        if abs(dfx) < 1e-14:
            raise ValueError('Derivative near zero: method fails')
        x = x - fx / dfx
    raise ValueError(f'Did not converge after {max_iter} iterations')

# Find sqrt(2): solve f(x) = x^2 - 2 = 0
root = newton_raphson(f=lambda x: x**2 - 2, df=lambda x: 2*x, x0=1.0)
print(f'sqrt(2) = {root}')
print(f'Error: {abs(root - math.sqrt(2)):.2e}')
```

Output:

```
Converged in 4 iterations
sqrt(2) = 1.4142135623746899
Error: 1.59e-12
```
Quadratic Convergence
Newton-Raphson converges quadratically: the error at step n+1 is proportional to the square of the error at step n. If you have 2 correct decimal places, the next step gives ~4, then ~8, then ~16.
```python
import math

x = 1.0  # initial guess for sqrt(2)
target = math.sqrt(2)

print('Iteration  x                   Error')
for i in range(6):
    error = abs(x - target)
    print(f'{i:9}  {x:.16f}  {error:.2e}')
    x = x - (x**2 - 2) / (2*x)
```

Output:

```
Iteration  x                   Error
        0  1.0000000000000000  4.14e-01
        1  1.5000000000000000  8.58e-02
        2  1.4166666666666667  2.45e-03
        3  1.4142156862745099  2.12e-06
        4  1.4142135623746899  1.59e-12
        5  1.4142135623730951  0.00e+00
```
When Newton-Raphson Fails
Newton-Raphson can fail in several ways:
- Derivative is zero: f'(x) = 0 causes division by zero. This happens at local extrema.
- Oscillation: for some functions the iterates cycle between values without converging. The classic example is f(x) = x^3 - 2x + 2 starting at x = 0, which bounces between 0 and 1 forever. For f(x) = x^(1/3) the update simplifies to x_{n+1} = -2x_n, so the iterates alternate in sign while doubling in magnitude: they oscillate and diverge.
- Divergence: if the initial guess is too far from the root, the method can diverge or find a different root.
- Multiple roots: convergence is only linear (not quadratic) at a root r where f(r) = f'(r) = 0.
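The first two failure modes are easy to reproduce. A small sketch (helper name is mine):

```python
def newton_step(f, df, x):
    """One Newton-Raphson update."""
    return x - f(x) / df(x)

# Derivative zero: at a local extremum of f(x) = x^2 - 2, the step blows up.
try:
    newton_step(lambda t: t**2 - 2, lambda t: 2*t, 0.0)
except ZeroDivisionError:
    print("fails: f'(0) = 0")

# Sign-flipping divergence: for f(x) = x^(1/3) the Newton update simplifies to
# x - x^(1/3) / ((1/3) * x^(-2/3)) = x - 3x = -2x, so |x| doubles every step.
x = 1.0
for _ in range(4):
    x = -2 * x  # closed form of the Newton step for x^(1/3)
print(x)        # 16.0: alternating sign, growing magnitude
```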
Applications in ML
Newton-Raphson generalises to optimisation: minimise f(x) by finding f'(x)=0, applying NR to f': x_{n+1} = x_n - f'(x_n)/f''(x_n)
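Applying that update to a concrete 1-D case (the toy function and helper name below are mine, for illustration):

```python
import math

def newton_minimise(df, d2f, x0, tol=1e-12, max_iter=50):
    """Minimise a function by applying Newton-Raphson to its derivative."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)  # f'(x) / f''(x)
        x -= step
        if abs(step) < tol:
            return x
    return x

# Toy example: f(x) = e^x - 2x has its minimum where f'(x) = e^x - 2 = 0,
# i.e. at x = ln 2.
x_min = newton_minimise(df=lambda x: math.exp(x) - 2,
                        d2f=math.exp,
                        x0=0.0)
print(x_min, math.log(2))  # both ≈ 0.6931471805599453
```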
In matrix form (Newton's method for optimisation): x_{n+1} = x_n - H^(-1) ∇f(x_n), where H is the Hessian matrix. This is the second-order Newton step that quasi-Newton methods like L-BFGS approximate.
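For intuition, here is a hand-rolled 2-D Newton step (the 2x2 solve and the quadratic test function are mine). On a quadratic, one step lands exactly on the minimum:

```python
def newton_step_2d(grad, hess, x, y):
    """One Newton step: solve H @ delta = grad via the 2x2 inverse, subtract."""
    gx, gy = grad(x, y)
    (a, b), (c, d) = hess(x, y)
    det = a * d - b * c           # assumes the Hessian is invertible
    dx = ( d * gx - b * gy) / det # delta = H^(-1) @ grad
    dy = (-c * gx + a * gy) / det
    return x - dx, y - dy

# Toy quadratic: f(x, y) = x^2 + x*y + y^2 - 3x, whose minimum is at (2, -1).
grad = lambda x, y: (2*x + y - 3, x + 2*y)
hess = lambda x, y: ((2, 1), (1, 2))

print(newton_step_2d(grad, hess, 0.0, 0.0))  # (2.0, -1.0) in a single step
```

For non-quadratic functions the step must be iterated, and for n variables the 2x2 inverse becomes a linear solve, which is exactly the O(n^3) cost that quasi-Newton methods avoid.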
```python
def fast_sqrt(n: float) -> float:
    """Fast square root using Newton-Raphson: how calculators work."""
    if n < 0:
        raise ValueError
    if n == 0:
        return 0
    x = n  # initial guess
    while True:
        x_new = (x + n/x) / 2  # Newton step for f(x) = x^2 - n
        if abs(x_new - x) < 1e-15 * x:
            return x_new
        x = x_new

print(fast_sqrt(2))    # 1.4142135623730951
print(fast_sqrt(144))  # 12.0
print(fast_sqrt(0.5))  # 0.7071067811865476
```

Output:

```
1.4142135623730951
12.0
0.7071067811865476
```
Key Takeaways
- x_{n+1} = x_n - f(x_n)/f'(x_n). Derived from Taylor series by zeroing the linear approximation. Geometrically: follow the tangent line to where it crosses zero.
- Quadratic convergence: error squares each iteration. 2 correct digits → 4 → 8 → 16. For 64-bit floating point precision (15 significant digits), ~5 iterations from a decent starting guess.
- Fails at: derivative zero or near-zero (division by zero), bad initial guess (diverges or finds wrong root), multiple roots (only linear convergence near roots where f'=0 too).
- The Babylonian square root algorithm (x_{n+1} = (x + n/x) / 2) is Newton-Raphson applied to f(x) = x^2 - n. This is how hardware sqrt is implemented.
- For robust root finding in production: scipy.optimize.brentq combines bisection (guaranteed convergence) with Newton (fast near the root). Never implement raw Newton-Raphson for production numerical code.
Interview Questions on This Topic
- Q: Derive the Newton-Raphson update rule from the Taylor series.
- Q: Why does Newton-Raphson converge quadratically?
- Q: What are three situations where Newton-Raphson can fail?
- Q: How is the Babylonian square root algorithm related to Newton-Raphson?
Frequently Asked Questions
How does Newton-Raphson compare to the bisection method?
Newton-Raphson converges quadratically (much faster) but requires differentiability and a good starting point. Bisection converges linearly but is guaranteed to converge for any continuous function on a bracketed interval. Brent's method combines both for robust practical root-finding.
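The trade-off is easy to see by counting iterations for f(x) = x^2 - 2 (the counting helpers below are mine):

```python
def bisection_iters(f, lo, hi, tol=1e-10):
    """Count bisection iterations to shrink [lo, hi] below tol."""
    n = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:  # root is in the left half
            hi = mid
        else:
            lo = mid
        n += 1
    return n

def newton_iters(f, df, x, tol=1e-10):
    """Count Newton iterations until |f(x)| < tol."""
    n = 0
    while abs(f(x)) > tol:
        x = x - f(x) / df(x)
        n += 1
    return n

f, df = lambda x: x**2 - 2, lambda x: 2*x
print(bisection_iters(f, 1.0, 2.0))  # ~34: one bit of accuracy per halving
print(newton_iters(f, df, 1.0))      # a handful of steps
```

Bisection gains one binary digit per iteration; Newton roughly doubles the number of correct digits per iteration, which is why the gap widens so quickly at tight tolerances.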