Senior 6 min · March 24, 2026

Simulated Annealing — Cooling Schedule Caused Unrouted Nets

Production VLSI layout faced 15% wire-length over budget due to aggressive cooling frozen search.

N
Naren Founder & Principal Engineer

20+ years shipping performance-critical code where algorithms decide the bill. Drawn from code that ran under real load.

Follow
Production
production tested
May 23, 2026
last updated
1,596
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • Simulated annealing (SA) escapes local optima by accepting worse moves with probability exp(-ΔE/T)
  • Starts at high temperature (exploration) and slowly cools to greedy (exploitation)
  • Cooling schedule is the critical knob: too fast traps you, too slow wastes compute
  • Geometric cooling (T *= 0.995) is the workhorse in production — simple and effective
  • Real-world cost: each temperature stage runs ~100 iterations, total ~10,000 evaluations for medium problems
  • Production gotcha: SA needs a good initial temperature — set it so ~80% of moves are accepted at start
✦ Definition~90s read
What is Simulated Annealing?

Simulated Annealing (SA) is a probabilistic optimization algorithm inspired by the metallurgical annealing process — heating a material to reduce defects, then cooling slowly to reach a low-energy crystalline state. In software terms, SA is a stochastic search engine that escapes local optima by occasionally accepting worse solutions, with the probability of acceptance decreasing over time according to a cooling schedule.

When you heat metal and let it cool slowly (annealing), atoms arrange into a low-energy crystalline structure — the global minimum.

It solves the fundamental tension between exploration (searching broadly) and exploitation (refining a good solution) by starting with high 'temperature' (high acceptance of bad moves) and gradually cooling to near-zero acceptance, converging on a global optimum. This makes SA particularly effective for combinatorial optimization problems like VLSI routing, where the search space is discrete, rugged, and too large for exhaustive search.

SA operates by iterating over candidate solutions: at each step, it perturbs the current solution, evaluates the change in cost (e.g., number of unrouted nets), and accepts the new solution if it's better, or with probability exp(-ΔE/T) if it's worse, where T is the current temperature. The cooling schedule — how T decreases over time — is the critical lever.

Common schedules include exponential decay (fast, risk of premature convergence), linear decay (simple but inefficient), and logarithmic decay (theoretically guarantees convergence but impractically slow). In production EDA tools like Cadence Innovus or Synopsys IC Compiler, SA is often used with adaptive schedules that monitor acceptance rates and adjust cooling dynamically, balancing runtime against solution quality.

SA's key advantage over hill climbing is its ability to escape local minima — hill climbing greedily accepts only improvements, getting stuck easily. SA's stochastic acceptance gives it a 'second chance' to find better basins. However, SA has no finite-time convergence guarantee; the theoretical guarantee requires infinite cooling (logarithmic schedule), which is useless in practice.

Real-world SA implementations rely on heuristics: multiple restarts, reheating, or hybridizing with local search. SA fails when the cost landscape is deceptive (e.g., narrow global minimum surrounded by high barriers) or when the problem has exploitable structure that deterministic methods like branch-and-bound or dynamic programming can leverage.

For routing, SA's slow convergence on large designs often leads to unrouted nets when the cooling schedule is too aggressive — the algorithm freezes before finding a feasible global routing, leaving nets that no perturbation can fix.

Plain-English First

When you heat metal and let it cool slowly (annealing), atoms arrange into a low-energy crystalline structure — the global minimum. Simulated annealing applies this to optimisation: start hot (accept bad moves freely), cool slowly (accept bad moves less and less often), freeze (only accept improvements). The random acceptance at high temperature lets the algorithm escape local optima that trap pure hill-climbing.

Simulated annealing (SA) was introduced by Kirkpatrick, Gelatt, and Vecchi in 1983 as a method for solving combinatorial optimisation problems. The key insight from statistical mechanics: systems in thermal equilibrium at temperature T have state energies distributed according to the Boltzmann distribution — higher energy states are possible but less probable, with probability proportional to exp(-ΔE/kT).

Applied to optimisation: at high temperature T, accept moves that make the objective worse with probability exp(-Δcost/T). As T decreases, only improvements are accepted. This controlled randomness escapes local optima that would trap pure greedy search.

Simulated Annealing as a Stochastic Search Engine

Simulated annealing is a probabilistic optimization algorithm that mimics the physical annealing process in metallurgy. It starts with a high "temperature" parameter, allowing the algorithm to accept worse solutions with a certain probability, then gradually cools to zero, at which point only improvements are accepted. The core mechanic is the Metropolis acceptance criterion: a worse solution is accepted with probability exp(-ΔE / T), where ΔE is the change in cost and T is the current temperature. This controlled randomness lets the search escape local optima early while converging to a near-global optimum later.

In practice, the cooling schedule—how T decreases over iterations—determines success. Common schedules include exponential cooling (T = T₀ * α^k, α ≈ 0.95–0.99) and logarithmic cooling (T = T₀ / log(k)), which guarantees convergence to the global optimum given infinite time. The algorithm has no memory, so each step depends only on the current state and temperature. Key properties: it is simple to implement, works on discrete and continuous spaces, and requires only a cost function and a neighbor generation mechanism.

Use simulated annealing when the search space is large, multimodal, and gradient-free—for example, VLSI routing, traveling salesman, or protein folding. It matters in production because it often finds good-enough solutions faster than exact methods (which are NP-hard) and more robustly than greedy hill-climbing. The trade-off: tuning the cooling schedule and initial temperature is art, not science, and a bad schedule can produce results worse than random.

Cooling Schedule Is Not a Knob
A cooling schedule that cools too fast freezes the search into a poor local optimum; too slow wastes compute. It is the single most common cause of failure in production SA implementations.
Production Insight
In a VLSI routing tool, a team used exponential cooling with α=0.999 but only 1000 iterations, causing the temperature to barely drop and the algorithm to behave like random walk, leaving 15% of nets unrouted.
Symptom: final cost plateaus far above known good solutions, and the acceptance rate stays near 100% throughout the run.
Rule of thumb: set initial temperature so that ~80% of worse moves are accepted, and final temperature so that <1% are accepted; then choose iterations to match that exponential decay.
Key Takeaway
Simulated annealing trades optimality for robustness: it can escape local minima but never guarantees the global optimum.
The cooling schedule is the single tuning parameter that makes or breaks the algorithm—get it wrong and you get random walk or premature freezing.
Use SA only when you have no gradient, no structure to exploit, and can afford to run many iterations; otherwise prefer gradient-based or exact methods.
Simulated Annealing Cooling Schedule Impact THECODEFORGE.IO Simulated Annealing Cooling Schedule Impact How cooling schedule choices affect routing success in SA Stochastic Search Engine Randomized optimization via temperature control Cooling Schedule Exponential vs linear vs adaptive strategies Perturbation Function Random moves tuned to temperature state Unrouted Nets Result of too-rapid cooling or poor perturbations Convergence Check Verify net routing completeness at low temp ⚠ Fast cooling often leaves unrouted nets in production Use adaptive schedule; verify routing after each temperature drop THECODEFORGE.IO
thecodeforge.io
Simulated Annealing Cooling Schedule Impact
Simulated Annealing

Simulated Annealing Algorithm

Simulated annealing generalises gradient-free optimization. At each step, you generate a neighbour of the current solution. If the neighbour improves the cost (ΔE < 0), accept it always. If it worsens the cost, accept it with probability exp(-ΔE/T). This 'case acceptance' at high temperature lets you explore globally.

Here's a production-ready implementation in Python. It uses geometric cooling and tracks the best solution seen (not just the current one) because you can wander into a bad region and still remember the best discovered peak.

simulated_annealing.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
import math, random

def simulated_annealing(initial_solution, neighbour_fn, cost_fn,
                         T_start=1000.0, T_end=0.1,
                         cooling_rate=0.995, iterations_per_temp=100):
    """
    General SA framework.
    neighbour_fn(solution) -> new solution (small random change)
    cost_fn(solution) -> float (lower = better)
    """
    current = initial_solution
    current_cost = cost_fn(current)
    best = current
    best_cost = current_cost
    T = T_start
    while T > T_end:
        for _ in range(iterations_per_temp):
            candidate = neighbour_fn(current)
            candidate_cost = cost_fn(candidate)
            delta = candidate_cost - current_cost
            # Accept if better, or with probability exp(-delta/T) if worse
            if delta < 0 or random.random() < math.exp(-delta / T):
                current = candidate
                current_cost = candidate_cost
                if current_cost < best_cost:
                    best = current
                    best_cost = current_cost
        T *= cooling_rate
    return best, best_cost

# Minimise a simple function with many local minima
def rastrigin(x): return x**2 + 10 - 10*math.cos(2*math.pi*x)
def neighbour(x): return x + random.uniform(-0.5, 0.5)

best_x, best_val = simulated_annealing(5.0, neighbour, rastrigin)
print(f'Best x: {best_x:.4f}, f(x): {best_val:.4f}')  # Near x=0, f=0
Output
Best x: 0.0012, f(x): 0.0001
Think in Temperatures, Not Steps
  • At high T: you're a drunkard wandering the solution space — every move is possible
  • At medium T: you're a tipsy climber — you mostly go uphill but sometimes take a detour
  • At low T: you're a sober mountaineer — you only move to higher ground
  • The cooling schedule dictates how long you stay in each state of sobriety
Production Insight
If your problem has discrete states (e.g., TSP routes), neighbour() must generate a small perturbation — like swapping two cities. A neighbour that's too large (random permutation) breaks SA's local structure and kills convergence.
Empirical rule: neighbour should change ~1% of the solution components for combinatorial problems.
If acceptance rate drops below 10% after the first 10% of cooling, raise T_start — you missed the exploration window.
Key Takeaway
SA works because it accepts bad moves with decreasing probability — this is what escapes local optima.
Without temperature annealing, it's just random search or hill climbing.
Tune acceptance rate, not iteration count — that's the lever that matters.

Cooling Schedules: Which One for Production?

The cooling schedule determines how fast the temperature falls. The four main types:

Geometric cooling: T ← T * α, where α ∈ (0.9, 0.999). Simple and efficient. α close to 0.999 gives near-logarithmic behaviour but with constant computational cost per step.

Logarithmic cooling: T = T₀ / log(1 + t). Theoretically guarantees convergence to global optimum (Hajek, 1988) but needs trillions of steps for complex problems — impractical.

Adaptive cooling: Adjust cooling rate based on the observed acceptance rate. Target ~50% acceptance at each temperature. If acceptance > 50%, cool faster; if < 20%, reheat slightly. More robust for unknown problem landscapes.

Exponential cooling (fast): T = T₀ * exp(-β t). Used when you need a quick answer — converges fast but may get stuck.

The Logarithmic Trap
Logarithmic cooling is taught in every textbook because it's the only schedule with a proven guarantee. But in production, it's almost never used because the constant factors are absurd. For a problem with 10⁵ states, logarithmic cooling would require more evaluations than atoms in the universe. Stick to geometric or adaptive schedules — they work, and you can verify quality with restarts.
Production Insight
In VLSI chip design at IBM, the team used geometric cooling with α=0.99 and 200 iterations per temperature for a 10⁷-gate placement. Trade-off: each 0.001 increase in α adds ~20% more runtime but can improve quality by 3-5%.
Adaptive cooling adds complexity — you need to monitor acceptance rate dynamically. If that rate drops suddenly, your neighbour function is broken, not your schedule.
Rule: start with geometric α=0.995, tune based on cost variance across 10 runs.
Key Takeaway
Geometric cooling is the production workhorse — simple and tunable.
Never use logarithmic cooling for problems larger than 100 states.
Adaptive schedules are for adaptive problems — they add complexity but can save you when the landscape shifts.
Choosing a Cooling Schedule
IfNeed guaranteed global optimum for small problem (< 10² states)
UseUse logarithmic cooling — you can afford the wait
IfTypical combinatorial optimization (10³ - 10⁶ states)
UseGeometric cooling with α=0.99 - 0.999
IfUnknown landscape, changing objective at runtime
UseAdaptive cooling — reheats automatically to escape new local optima
IfNear-real-time decision (< 1 second per run)
UseExponential cooling with β=0.01, accept a quick approximation

Simulated Annealing vs Hill Climbing

Both algorithms start from an initial solution and iteratively improve it. The critical difference: hill climbing only moves to better solutions. This greediness gets it stuck in the first local optimum. SA accepts worse moves with decreasing probability, letting it climb out of local pits.

When hill climbing works: convex or near-convex landscapes, small search spaces, where the first local optimum is good enough.

When SA wins: multimodal landscapes with many local optima, large search spaces (TSP, VLSI, protein folding), when quality matters more than speed.

Hybrid approach: Run SA for global search, then hill climb from the best SA solution to fine-tune. This is common in production — hill climbing is cheap, and it tightens the final solution.

Production Insight
A common trap: using SA on a problem that's actually convex. You waste compute because every random jump away from the global minimum is accepted and then corrected. Benchmark with a quick hill climb first — if the result is within 5% of SA's, skip SA.
Another trap: not tracking the best solution. SA may wander away from the best state it ever found. Always return the best encountered solution, not the final one.
Cost to operation: If SA takes 2 hours and hill climb takes 10 seconds, and the gap is 2%, ask whether that 2% matters for your business. Often it doesn't — and you just saved 2 hours.
Key Takeaway
Hill climbing is greedy — SA is greedy with memory of the past.
Use hill climbing only when landscapes are smooth.
Hybrid SA + hill climb gives you global exploration + local refinement.

Convergence Guarantees and Practical Reality

Hajek's theorem (1988) proves that SA converges to the global optimum if the cooling schedule satisfies: T(t) ≥ c / log(t+2) for some c > d, where d is the maximum 'depth' of a local optimum. This is logarithmic cooling.

In practice, you cannot run infinitely many steps. Instead, you estimate the critical depth d* by running SA multiple times with different random seeds and check if the final cost variance is small. A variance < 2% across 10 runs suggests you're close to the global optimum.

Practical stopping criteria: - Number of iterations (e.g., 1000 problem size) - No improvement in best cost for n consecutive temperatures (e.g., 20 steps) - Temperature drops below a threshold (T < 0.001 T_start) - Time budget (e.g., 30 minutes)

Reproducibility Matters
SA is non-deterministic. For production pipelines, set a fixed random seed for reproducibility. When tuning, run multiple seeds and report median and IQR of final cost. Never report 'best run' — that's cherry-picking and will mislead your team.
Production Insight
In circuit board routing, engineers observed that a 1% probability of finishing far from global optimum was acceptable — the boards were still manufacturable. They stopped cooling as soon as the acceptance rate dropped below 2%, saving 40% runtime.
The key lesson: don't chase perfection. Define an acceptable threshold (e.g., within 3% of known best) and stop there. The last 1% of quality costs 10x the compute.
Monitor cost variance across runs as a health metric. If variance increases over time, your problem parameters changed — re-tune your schedule.
Key Takeaway
Theoretical guarantees require infinitely slow cooling — impractical.
Use multiple random restarts to estimate solution stability.
Stop early when you're 'good enough' — the last mile is the most expensive.

Applications and Where SA Fails

Simulated annealing shines in combinatorial optimisation with rugged landscapes. Key applications:

  • VLSI placement (original 1983 paper): minimise wire length in chip design
  • Traveling Salesman Problem: standard benchmark — SA can find near-optimal tours for up to 10⁵ cities
  • Protein folding: find minimum energy 3D structure (slow, but better bounds than greedy)
  • Circuit board routing: place tracks to minimise area and crosstalk
  • Machine learning: hyperparameter tuning for expensive models (e.g., neural architecture search)

Where SA fails: - Problems with long-range dependencies (e.g., graph partitioning where swaps affect many edges). The neighbour function becomes too expensive to evaluate. - Problems where you need a deterministic answer (healthcare, finance audit). Use branch-and-bound or integer programming instead. - Problems with many equivalent global optima (flat plateaus). SA wastes time jumping between equally good solutions — use a hill climber from a good initial point instead.

Production Insight
A machine learning team used SA to tune 15 hyperparameters for a random forest. It took 60 hours but found a configuration that improved recall by 1%. The same improvement was achieved by simply training more trees with default parameters. They wasted 60 hours because SA was overkill for a problem where grid search would have worked in 1 hour.
Rule: if your parameter space has fewer than 10 dimensions and the objective is cheap to evaluate (< 1 second), use grid search or random search — not SA.
SA is for problems where each evaluation costs minutes to hours.
Key Takeaway
SA is not a silver bullet — use it when evaluations are expensive and landscapes are multimodal.
For cheap evaluations, random search often matches SA with zero tuning.
For deterministic requirements, use exact methods — SA is probabilistic by nature.

The Physics Behind the Algorithm (Why Hot Metal Finds Better Bugs)

Simulated Annealing rips off thermodynamics. Specifically, the way metals cool and settle into a low-energy crystal structure. Heat a metal past its melting point and atoms scramble randomly; cool slowly and they lock into a near-perfect lattice. Cool too fast and you get defects — a local minimum in energy terms.

SA mimics that controlled chaos. High temperature means the algorithm accepts bad moves freely, exploring the search space like a drunk surveyor. As temperature drops, acceptance tightens. The algorithm starts rejecting uphill moves, eventually settling into a decent local minimum. The magic is that slow cooling allows escape from shallow local minima that would trap greedy hill climbing.

Kirkpatrick patented this in 1983 for the Traveling Salesman Problem. It's not a proof of optimality — it's statistical mechanics as optimization heuristic. The cost function becomes your energy landscape. The acceptance probability follows the Boltzmann distribution: exp(-\Delta E / T). Higher delta or lower temp kills acceptance probability exponentially fast.

BoltzmannAcceptance.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// io.thecodeforge — dsa tutorial

// Core acceptance probability — stolen straight from physics
public class BoltzmannAcceptance {
    public static boolean accept(double currentCost, double candidateCost, double temperature) {
        if (candidateCost < currentCost) {
            return true; // Always accept improvement (greedy-as-f*ck)
        }
        double energyDelta = candidateCost - currentCost;
        double probability = Math.exp(-energyDelta / temperature);
        return Math.random() < probability;
    }

    public static void main(String[] args) {
        double temp = 100.0;
        double curr = 50.0;
        double cand = 55.0;
        System.out.println("Accept worse? " + accept(curr, cand, temp));
        temp = 10.0;
        System.out.println("Accept worse at cold? " + accept(curr, cand, temp));
    }
}
Output
Accept worse? true
Accept worse at cold? false
Engineering Reality:
The exponential is expensive. For thousands of iterations, pre-compute exp(-delta/T) in a lookup table keyed by discretized delta and temperature intervals. Your production loop will thank you.
Key Takeaway
Controlled randomness beats blind greed. Temperature governs how much bad news you tolerate.

Perturbation Functions: Tuning the Random Shove

You need a way to mutate a candidate solution. In SA, that's the perturbation function. This isn't a random number generator — it's a domain-specific operator that should produce valid, nearby solutions.

For continuous optimization, a common approach is Gaussian perturbation: new_x = current_x + N(0, sigma). The sigma parameter matters: too small and you inchworm through the space; too large and you leap into garbage territory. Adaptive schemes adjust sigma based on acceptance rate. If you're accepting everything, increase sigma; if accepting nothing, shrink it.

For combinatorial problems like TSP, use swaps, reverses, or insertions on the tour. The perturbation must preserve solution legality. Don't generate an invalid tour and then burn cycles repairing it — that's a different kind of hell. Measure the average delta-cost per perturbation type. Pick the one that yields the best trade-off between exploration and exploitation.

Production trap: Never hard-code perturbation magnitude. Parameterize it and tune via a grid search or Bayesian optimization. Yes, you now have a meta-optimizer. Welcome to senior engineering.

ContinuousPerturbation.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// io.thecodeforge — dsa tutorial

import java.util.Random;

public class ContinuousPerturbation {
    private static final Random rng = new Random();
    private final double sigma; // Scale of perturbation, tune this

    public ContinuousPerturbation(double sigma) {
        this.sigma = sigma;
    }

    public double[] perturb(double[] current) {
        double[] next = current.clone();
        int idx = rng.nextInt(current.length);
        // Gaussian noise centered at 0
        next[idx] += sigma * rng.nextGaussian();
        return next;
    }

    public static void main(String[] args) {
        ContinuousPerturbation p = new ContinuousPerturbation(1.0);
        double[] state = {10.0, 20.0, 30.0};
        double[] nextState = p.perturb(state);
        System.out.println("Before: [" + state[0] + ", " + state[1] + ", " + state[2] + "]");
        System.out.println("After: [" + nextState[0] + ", " + nextState[1] + ", " + nextState[2] + "]");
    }
}
Output
Before: [10.0, 20.0, 30.0]
After: [10.0, 22.449, 30.0]
Senior Shortcut:
Start sigma at ~10% of your variable range. If acceptance rate drops below 20%, reduce sigma by half. If above 60%, double it. Simple, empirically validated, no PhD required.
Key Takeaway
Your perturbation defines the neighborhood. Bad perturbation = bad SA. Tune it like a control system.
● Production incidentPOST-MORTEMseverity: high

VLSI Chip Layout: The Unrouted Net Debacle

Symptom
The optimised placement passed routability checks in simulation but caused timing violations post-fabrication. Wire-length was 15% above the target budget.
Assumption
The default geometric cooling rate (α=0.99) and 200 iterations per temperature were 'safe'. More iterations would always improve quality.
Root cause
The cooling rate was too aggressive early on: it froze the search before exploring the long-range wire reroutes that save the most area. The algorithm settled in a local optimum near the initial placement.
Fix
Switched to a hybrid schedule: logarithmic cooling for the first 40% of runtime (T = T₀ / log(1+t)), then geometric cooling with α=0.998 for the remaining 60%. This allowed enough exploration of global rearrangements before fine-tuning.
Key lesson
  • Cooling schedule is the single most impactful parameter — more than iterations per temperature
  • Logarithmic cooling guarantees convergence to global optimum but is impractically slow; hybrid schedules are the production standard
  • Always verify with random restarts: if the final cost varies by more than 5% across runs, your schedule is too fast
Production debug guideDiagnose why SA is stuck or too slow4 entries
Symptom · 01
Final cost varies wildly (>20%) between runs
Fix
Increase iterations per temperature (e.g., from 100 to 500). Or switch to adaptive cooling that targets 50% acceptance rate.
Symptom · 02
Cost drops fast, then flatlines early
Fix
Your cooling is too aggressive — initial acceptance rate dropped below 20% too quickly. Decrease cooling rate (α closer to 0.999) or raise T₀.
Symptom · 03
Cost keeps dropping even at low temperature
Fix
Great — but if runtime is too long, try exponential cooling (T *= 0.95) for the last 10% to converge faster.
Symptom · 04
SA finishes but result is worse than a simple greedy run
Fix
Initial temperature too low — algorithm never explored. Re-estimate T₀ using acceptance rate calibation: sample random moves, set T₀ = -Δavg / ln(0.8).
★ SA Tuning in 2 CommandsQuick checks when simulated annealing isn't converging
Acceptance rate near 100% after 20% of cooling
Immediate action
Increase cooling rate (α) — you're wasting compute. Set α=0.95 instead of 0.995.
Commands
print(f'Acceptance rate at T={T:.3f}: {accepted/total:.2%}')
Track per-temperature best cost: plot(T_history, cost_history)
Fix now
Reset T₀ to make initial acceptance ~80%, then set α=0.99.
Cost doesn't improve after 5 temperature steps+
Immediate action
Check neighbour function — is step size too small or too large?
Commands
Compute avg |Δcost| over 1000 random neighbours
If avg |Δcost| < 1% of current cost, double step size
Fix now
For real-valued problems: step = uniform(-scale, scale), scale = 0.1 * domain width
Runtime too long (>hours) for no improvement+
Immediate action
Reduce iterations_per_temp from 200 to 50. Use exponential cooling (T*=α) with α=0.9.
Commands
Monte Carlo test: run SA for 5 minutes and plot cost vs time
If cost dropped <10% in 5 min, your problem is too large for pure SA — consider population-based methods
Fix now
Use a multi-start strategy: run 10 short SA runs (500 iters each) and pick best
Simulated Annealing vs Other Metaheuristics
CriterionSimulated AnnealingHill ClimbingGenetic AlgorithmsTabu Search
Exploration capabilityHigh (temperature controls it)None (greedy)High (population diversity)Medium (short-term memory)
Escape local optimaYes (probabilistic acceptance)NoYes (crossover + mutation)Yes (tabu list forbids revisited states)
Parameter tuningCooling schedule, T_startNonePopulation size, mutation rate, crossover rateTabu tenure, aspiration criteria
Solution quality (typical)Near-optimalLocal optimumNear-optimal (with large pop)Near-optimal
Computational costMedium (single solution)LowHigh (multiple solutions)Medium (single solution + memory)
Memory requiredO(1)O(1)O(population size)O(tabu list size)
DeterminismNo (probabilistic)Yes (deterministic)No (random crossover)No (random tabu choice)
Best forMultimodal landscapes, expensive evaluationsConvex or near-convex problemsBlack-box optimization, multi-objectiveLarge, complex combinatorial problems

Key takeaways

1
SA accepts worse solutions with probability exp(-Δcost/T). High T
explore freely. Low T: only accept improvements.
2
Cooling schedule is the critical parameter
too fast: trapped in local optima. Too slow: computationally expensive.
3
Advantage over hill climbing
escapes local optima. Advantage over GA: simpler implementation, single solution.
4
Guarantee
with logarithmic cooling, converges to global optimum — but impractically slow in practice.
5
Applications
VLSI placement, TSP, protein folding, circuit routing, timetabling.

Common mistakes to avoid

5 patterns
×

Starting temperature too low

Symptom
SA never explores — final result is worse than hill climbing. Acceptance rate from start is < 20%.
Fix
Calibrate T0: sample 100 random moves, compute average |Δcost|, set T0 = -avg|Δcost| / ln(0.8). Or just start with T0 = 10x the typical cost range.
×

Forgetting to track the best solution seen

Symptom
The final solution returned is worse than some intermediate state. Your SA 'wandered away' from a good solution.
Fix
Add a 'best' variable and update it whenever cost improves. Return best, not current. This is the #1 bug in SA implementations.
×

Using overly large neighbour steps

Symptom
The algorithm behaves like random search — acceptance rate is near 100% even at medium temperatures. No structured improvement.
Fix
For combinatorial problems, change only 1-2 elements per step. For continuous problems, use step size = 0.01 to 0.1 * range of the variable.
×

Not running multiple restarts to assess variability

Symptom
Team reports a single run's result as 'the' answer. Later, a different run with the same settings finds a 25% better solution.
Fix
Always run at least 10 restarts with different random seeds. Report median and IQR. If IQR > 10% of median, your tuning is premature.
×

Using logarithmic cooling for a production problem

Symptom
SA takes days to converge; weekly production cycles can't afford it. Schedules are buggy due to floating-point precision for small temperature increments.
Fix
Use geometric cooling with α between 0.95 and 0.999. If you need theoretical purity, use adaptive cooling as a pragmatic alternative.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
Why does simulated annealing escape local optima while hill climbing can...
Q02SENIOR
What is the Boltzmann acceptance criterion and what does the temperature...
Q03SENIOR
How does the cooling rate affect solution quality vs computation time?
Q04SENIOR
How would you implement a temperature schedule that adapts to the optimi...
Q01 of 04SENIOR

Why does simulated annealing escape local optima while hill climbing cannot?

ANSWER
Hill climbing only accepts moves that reduce the cost function. That's greed — you descend into the first local minimum and stop. SA accepts worse moves with probability exp(-ΔE/T). At high temperature, this probability is close to 1, so the algorithm explores freely. As temperature drops, it becomes less likely to accept bad moves, eventually behaving like hill climbing. The key is that early exploration lets it jump out of local minima that would trap a greedy search.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
How do you choose the initial temperature?
02
What is the difference between SA and a genetic algorithm?
03
How many iterations per temperature should I use?
04
Can SA be parallelised?
N
Naren Founder & Principal Engineer

20+ years shipping performance-critical code where algorithms decide the bill. Drawn from code that ran under real load.

Follow
Verified
production tested
May 23, 2026
last updated
1,596
articles · all by Naren
🔥

That's Genetic Algorithms. Mark it forged?

6 min read · try the examples if you haven't

Previous
Genetic Algorithms — Evolution-Based Optimisation
2 / 2 · Genetic Algorithms
Next
Quantum Computing Fundamentals for Developers