
Simulated Annealing — Probabilistic Optimisation

📍 Part of: Genetic Algorithms → Topic 2 of 2
Learn simulated annealing — how accepting worse solutions with decreasing probability escapes local optima.
⚙️ Intermediate — basic DSA knowledge assumed
In this tutorial, you'll learn:
  • SA accepts worse solutions with probability exp(-Δcost/T). High T: explore freely. Low T: only accept improvements.
  • Cooling schedule is the critical parameter — too fast: trapped in local optima. Too slow: computationally expensive.
  • Advantage over hill climbing: escapes local optima. Advantage over GA: simpler implementation, single solution.
⚡ Quick Answer
When you heat metal and let it cool slowly (annealing), atoms arrange into a low-energy crystalline structure — the global minimum. Simulated annealing applies this to optimisation: start hot (accept bad moves freely), cool slowly (accept bad moves less and less often), freeze (only accept improvements). The random acceptance at high temperature lets the algorithm escape local optima that trap pure hill-climbing.

Simulated annealing (SA) was introduced by Kirkpatrick, Gelatt, and Vecchi in 1983 as a method for solving combinatorial optimisation problems. The key insight from statistical mechanics: systems in thermal equilibrium at temperature T have state energies distributed according to the Boltzmann distribution — higher energy states are possible but less probable, with probability proportional to exp(-ΔE/kT).

Applied to optimisation: at high temperature T, moves that make the objective worse are accepted with probability exp(-Δcost/T). As T decreases, worsening moves are accepted less and less often, until effectively only improvements survive. This controlled randomness escapes local optima that would trap pure greedy search.
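To make the acceptance rule concrete, here is a small sketch (the Δcost of 10 and the temperatures are illustrative numbers) showing how the same worsening move goes from almost always accepted to essentially never accepted as T falls:

```python
import math

def acceptance_probability(delta, T):
    """Boltzmann acceptance criterion: always accept improvements
    (delta <= 0); accept worsening moves with probability exp(-delta/T)."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / T)

delta = 10.0  # candidate is 10 cost units worse than current
for T in (100.0, 10.0, 1.0):
    print(f"T={T:6.1f}  P(accept) = {acceptance_probability(delta, T):.5f}")
# T= 100.0  P(accept) = 0.90484
# T=  10.0  P(accept) = 0.36788
# T=   1.0  P(accept) = 0.00005
```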

Simulated Annealing Algorithm

simulated_annealing.py · PYTHON
import math, random

def simulated_annealing(initial_solution, neighbour_fn, cost_fn,
                         T_start=1000.0, T_end=0.1,
                         cooling_rate=0.995, iterations_per_temp=100):
    """
    General SA framework.
    neighbour_fn(solution) -> new solution (small random change)
    cost_fn(solution) -> float (lower = better)
    """
    current = initial_solution
    current_cost = cost_fn(current)
    best = current
    best_cost = current_cost
    T = T_start
    while T > T_end:
        for _ in range(iterations_per_temp):
            candidate = neighbour_fn(current)
            candidate_cost = cost_fn(candidate)
            delta = candidate_cost - current_cost
            # Accept if better, or with probability exp(-delta/T) if worse
            if delta < 0 or random.random() < math.exp(-delta / T):
                current = candidate
                current_cost = candidate_cost
                if current_cost < best_cost:
                    best = current
                    best_cost = current_cost
        T *= cooling_rate
    return best, best_cost

# Minimise a simple function with many local minima
def rastrigin(x): return x**2 + 10 - 10*math.cos(2*math.pi*x)
def neighbour(x): return x + random.uniform(-0.5, 0.5)

best_x, best_val = simulated_annealing(5.0, neighbour, rastrigin)
print(f'Best x: {best_x:.4f}, f(x): {best_val:.4f}')  # Near x=0, f=0
▶ Output
Best x: 0.0012, f(x): 0.0001

Cooling Schedules

The cooling schedule determines convergence quality vs speed:

Geometric cooling: T *= α (where α ≈ 0.9-0.99). Simple, widely used. Too fast → gets trapped. Too slow → computationally expensive.

Linear cooling: T -= ΔT. Rarely used — spends too much time at low temperatures.

Logarithmic cooling: T = T0/log(1+t). Theoretically guarantees convergence to global optimum but impractically slow.

Adaptive: Adjust cooling rate based on acceptance rate — target ~50% acceptance at each temperature.
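The speed difference between these schedules is easy to quantify. A quick sketch (T_start, T_end, α, and ΔT are illustrative values, matching the defaults in the code above) counts how many temperature steps each fixed schedule needs to cool from 1000 to 0.1:

```python
import math

T0, T_end = 1000.0, 0.1

# Geometric cooling: T *= alpha each step, so n = log(T_end/T0) / log(alpha)
alpha = 0.995
geometric_steps = math.ceil(math.log(T_end / T0) / math.log(alpha))

# Linear cooling: T -= dT each step, so n = (T0 - T_end) / dT
dT = 0.5
linear_steps = math.ceil((T0 - T_end) / dT)

# Logarithmic cooling: T = T0/log(1+t) drops below T_end only once
# t > e^(T0/T_end) - 1. That number overflows a float, so we report
# its order of magnitude instead.
log10_steps = (T0 / T_end) / math.log(10)

print(f"geometric (alpha={alpha}): {geometric_steps} steps")
print(f"linear (dT={dT}):          {linear_steps} steps")
print(f"logarithmic:               ~10^{log10_steps:.0f} steps")
```

The last line is why logarithmic cooling's convergence guarantee is only of theoretical interest: reaching T = 0.1 would take roughly 10^4343 steps.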

🔥 SA in Production
Simulated annealing is used in VLSI chip design (placing transistors to minimise wire length), protein folding (finding the minimum-energy 3D structure), and circuit board layout optimisation. The original 1983 paper applied it to VLSI placement at IBM — a trillion-dollar application area.

🎯 Key Takeaways

  • SA accepts worse solutions with probability exp(-Δcost/T). High T: explore freely. Low T: only accept improvements.
  • Cooling schedule is the critical parameter — too fast: trapped in local optima. Too slow: computationally expensive.
  • Advantage over hill climbing: escapes local optima. Advantage over GA: simpler implementation, single solution.
  • Guarantee: with logarithmic cooling, converges to global optimum — but impractically slow in practice.
  • Applications: VLSI placement, TSP, protein folding, circuit routing, timetabling.

Interview Questions on This Topic

  • Q: Why does simulated annealing escape local optima while hill climbing cannot?
  • Q: What is the Boltzmann acceptance criterion, and what does the temperature parameter control?
  • Q: How does the cooling rate affect solution quality vs computation time?

Frequently Asked Questions

How do you choose the initial temperature?

Rule of thumb: T_start should be high enough that ~80% of random moves are accepted initially. To find it: sample random moves, compute average cost increase Δcost_avg, set T_start = -Δcost_avg / ln(0.8). This ensures the algorithm starts in 'exploration mode' rather than immediately hill-climbing.
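This rule of thumb can be sketched in a few lines. The helper below is illustrative; it reuses the neighbour and rastrigin functions from the code earlier in this article, but any neighbour_fn/cost_fn pair works:

```python
import math, random

def estimate_T_start(solution, neighbour_fn, cost_fn,
                     samples=100, target_acceptance=0.8):
    """Sample random moves from `solution` and choose T_start so that a
    worsening move of average size is accepted with probability
    ~target_acceptance: solve exp(-avg/T) = target for T."""
    base = cost_fn(solution)
    increases = []
    for _ in range(samples):
        delta = cost_fn(neighbour_fn(solution)) - base
        if delta > 0:  # only worsening moves inform the temperature
            increases.append(delta)
    if not increases:
        raise ValueError("no worsening moves sampled; try more samples")
    avg_increase = sum(increases) / len(increases)
    return -avg_increase / math.log(target_acceptance)

# Using the Rastrigin setup from the main example:
def rastrigin(x): return x**2 + 10 - 10*math.cos(2*math.pi*x)
def neighbour(x): return x + random.uniform(-0.5, 0.5)

print(f"T_start ≈ {estimate_T_start(5.0, neighbour, rastrigin):.2f}")
```

Note that since ln(0.8) is negative, the formula yields a positive temperature; a higher target acceptance (more exploration) gives a higher T_start.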

Naren · Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
