Simulated Annealing — Cooling Schedule Caused Unrouted Nets
Production VLSI layout faced 15% wire-length over budget due to aggressive cooling frozen search.
- Simulated annealing (SA) escapes local optima by accepting worse moves with probability exp(-ΔE/T)
- Starts at high temperature (exploration) and slowly cools to greedy (exploitation)
- Cooling schedule is the critical knob: too fast traps you, too slow wastes compute
- Geometric cooling (T *= 0.995) is the workhorse in production — simple and effective
- Real-world cost: each temperature stage runs ~100 iterations, total ~10,000 evaluations for medium problems
- Production gotcha: SA needs a good initial temperature — set it so ~80% of moves are accepted at start
When you heat metal and let it cool slowly (annealing), atoms arrange into a low-energy crystalline structure — the global minimum. Simulated annealing applies this to optimisation: start hot (accept bad moves freely), cool slowly (accept bad moves less and less often), freeze (only accept improvements). The random acceptance at high temperature lets the algorithm escape local optima that trap pure hill-climbing.
Simulated annealing (SA) was introduced by Kirkpatrick, Gelatt, and Vecchi in 1983 as a method for solving combinatorial optimisation problems. The key insight from statistical mechanics: systems in thermal equilibrium at temperature T have state energies distributed according to the Boltzmann distribution — higher energy states are possible but less probable, with probability proportional to exp(-ΔE/kT).
Applied to optimisation: at high temperature T, accept moves that make the objective worse with probability exp(-Δcost/T). As T decreases, only improvements are accepted. This controlled randomness escapes local optima that would trap pure greedy search.
Simulated Annealing Algorithm
Simulated annealing generalises gradient-free optimization. At each step, you generate a neighbour of the current solution. If the neighbour improves the cost (ΔE < 0), accept it always. If it worsens the cost, accept it with probability exp(-ΔE/T). This 'case acceptance' at high temperature lets you explore globally.
Here's a production-ready implementation in Python. It uses geometric cooling and tracks the best solution seen (not just the current one) because you can wander into a bad region and still remember the best discovered peak.
- At high T: you're a drunkard wandering the solution space — every move is possible
- At medium T: you're a tipsy climber — you mostly go uphill but sometimes take a detour
- At low T: you're a sober mountaineer — you only move to higher ground
- The cooling schedule dictates how long you stay in each state of sobriety
neighbour() must generate a small perturbation — like swapping two cities. A neighbour that's too large (random permutation) breaks SA's local structure and kills convergence.Cooling Schedules: Which One for Production?
The cooling schedule determines how fast the temperature falls. The four main types:
Geometric cooling: T ← T * α, where α ∈ (0.9, 0.999). Simple and efficient. α close to 0.999 gives near-logarithmic behaviour but with constant computational cost per step.
Logarithmic cooling: T = T₀ / log(1 + t). Theoretically guarantees convergence to global optimum (Hajek, 1988) but needs trillions of steps for complex problems — impractical.
Adaptive cooling: Adjust cooling rate based on the observed acceptance rate. Target ~50% acceptance at each temperature. If acceptance > 50%, cool faster; if < 20%, reheat slightly. More robust for unknown problem landscapes.
Exponential cooling (fast): T = T₀ * exp(-β t). Used when you need a quick answer — converges fast but may get stuck.
Simulated Annealing vs Hill Climbing
Both algorithms start from an initial solution and iteratively improve it. The critical difference: hill climbing only moves to better solutions. This greediness gets it stuck in the first local optimum. SA accepts worse moves with decreasing probability, letting it climb out of local pits.
When hill climbing works: convex or near-convex landscapes, small search spaces, where the first local optimum is good enough.
When SA wins: multimodal landscapes with many local optima, large search spaces (TSP, VLSI, protein folding), when quality matters more than speed.
Hybrid approach: Run SA for global search, then hill climb from the best SA solution to fine-tune. This is common in production — hill climbing is cheap, and it tightens the final solution.
Convergence Guarantees and Practical Reality
Hajek's theorem (1988) proves that SA converges to the global optimum if the cooling schedule satisfies: T(t) ≥ c / log(t+2) for some c > d, where d is the maximum 'depth' of a local optimum. This is logarithmic cooling.
In practice, you cannot run infinitely many steps. Instead, you estimate the critical depth d* by running SA multiple times with different random seeds and check if the final cost variance is small. A variance < 2% across 10 runs suggests you're close to the global optimum.
Practical stopping criteria: - Number of iterations (e.g., 1000 problem size) - No improvement in best cost for n consecutive temperatures (e.g., 20 steps) - Temperature drops below a threshold (T < 0.001 T_start) - Time budget (e.g., 30 minutes)
Applications and Where SA Fails
Simulated annealing shines in combinatorial optimisation with rugged landscapes. Key applications:
- VLSI placement (original 1983 paper): minimise wire length in chip design
- Traveling Salesman Problem: standard benchmark — SA can find near-optimal tours for up to 10⁵ cities
- Protein folding: find minimum energy 3D structure (slow, but better bounds than greedy)
- Circuit board routing: place tracks to minimise area and crosstalk
- Machine learning: hyperparameter tuning for expensive models (e.g., neural architecture search)
Where SA fails: - Problems with long-range dependencies (e.g., graph partitioning where swaps affect many edges). The neighbour function becomes too expensive to evaluate. - Problems where you need a deterministic answer (healthcare, finance audit). Use branch-and-bound or integer programming instead. - Problems with many equivalent global optima (flat plateaus). SA wastes time jumping between equally good solutions — use a hill climber from a good initial point instead.
VLSI Chip Layout: The Unrouted Net Debacle
- Cooling schedule is the single most impactful parameter — more than iterations per temperature
- Logarithmic cooling guarantees convergence to global optimum but is impractically slow; hybrid schedules are the production standard
- Always verify with random restarts: if the final cost varies by more than 5% across runs, your schedule is too fast
Key takeaways
Common mistakes to avoid
5 patternsStarting temperature too low
Forgetting to track the best solution seen
Using overly large neighbour steps
Not running multiple restarts to assess variability
Using logarithmic cooling for a production problem
Interview Questions on This Topic
Why does simulated annealing escape local optima while hill climbing cannot?
Frequently Asked Questions
That's Genetic Algorithms. Mark it forged?
3 min read · try the examples if you haven't