Genetic Algorithm — 0.001 Mutation Rate Cost 15% Route
Best route 15% above bound: mutation 0.
- Representation encodes a solution (bitstring, permutation, tree)
- Fitness function scores how 'good' a solution is
- Selection picks parents — tournament, roulette, rank
- Crossover combines two parents into offspring
- Mutation introduces random changes to maintain diversity
- Performance insight: tournament selection with k=3 converges 40% faster than roulette on most problems
- Production insight: a too-low mutation rate (<0.001) causes premature convergence; too-high (>0.1) turns the GA into random search
Genetic algorithms mimic biological evolution to solve optimisation problems. Start with a population of random solutions. Evaluate each solution's fitness. Select the best ones to 'reproduce' — combine their solutions (crossover) and introduce small random changes (mutation). The next generation is better on average. Repeat until you converge on a good solution. No gradient needed, no closed-form solution required.
Genetic algorithms (GA) are a class of evolutionary computation algorithms inspired by natural selection. They excel at combinatorial optimisation problems where exact methods are infeasible: travelling salesman, job shop scheduling, neural network architecture search, game-playing strategy design.
GAs do not guarantee finding the global optimum (they are metaheuristics), but they find good solutions to NP-hard problems where exact algorithms would take astronomical time. The travelling salesman problem with 100 cities has 100!/2 ≈ 10^157 possible routes — exhaustive search is impossible, but GAs routinely find solutions within 1-2% of optimal.
Core Components
Every genetic algorithm has five components: representation (how solutions are encoded), fitness function (how good is each solution), selection (which solutions reproduce), crossover (how two parents combine), and mutation (random perturbation to maintain diversity).
Getting any one wrong can break the search. Representation is the most subtle — a poor encoding makes crossover produce invalid offspring, wasting generations.
- Each generation is a sample from the search space, biased by fitness.
- Crossover exploits existing building blocks (schema theory).
- Mutation explores new regions.
- No guarantee of optimality — just increasingly good solutions.
- The magic happens when you balance exploitation (crossover) and exploration (mutation).
Selection Strategies
Tournament selection: Pick k random individuals, choose the best. Computationally efficient, tunable selection pressure via k.
Roulette wheel (fitness proportionate): Select with probability proportional to fitness. Works poorly if one individual has much higher fitness than others (it dominates).
Rank selection: Sort by fitness, assign probability by rank not absolute fitness. More robust to fitness scale differences.
Elitism: Always copy the best individual(s) to the next generation unchanged. Prevents losing the best solution found.
Selection pressure determines how quickly the population converges. Too much pressure → premature convergence. Too little → the GA wanders randomly.
Crossover and Mutation: The Diversity Trade-off
Crossover recombines genetic material from two parents. It exploits existing building blocks. Mutation introduces random changes to explore new areas. The balance between them is critical.
High crossover rate (0.8-0.9) means most offspring are created by recombination. High mutation rate (>0.1) turns the algorithm into random search. Low mutation rate (<0.001) leads to premature convergence.
For permutation problems (TSP), standard one-point crossover produces invalid tours (duplicate cities). Specialised operators like Order Crossover (OX) or Partially Mapped Crossover (PMX) preserve the permutation property.
Adaptive mutation: increase mutation rate when diversity drops, decrease it when diversity is high. Implementation: track the average hamming distance or fitness variance.
Travelling Salesman Example
The TSP is the canonical GA benchmark. Chromosome = permutation of cities. Crossover must preserve the permutation property (no city visited twice). Fitness = total route distance.
Order Crossover (OX): pick a segment from parent 1, then fill remaining positions from parent 2 in order, skipping already-used cities. This preserves relative order and creates valid tours.
For real-world TSP instances with 1000+ cities, a GA with OX crossover, tournament selection, and adaptive mutation can find solutions within 2% of optimal in minutes.
Hyperparameter Tuning: What Actually Matters
GAs have 4-5 critical hyperparameters: population size, mutation rate, crossover rate, selection pressure (tournament size), and elitism count.
Population size: 50-500 is common. Too small = premature convergence, too large = slow generations. Rule: set to 10× chromosome length for binary, 5× for permutations.
Mutation rate: Start at 1/(chromosome length). For binary strings of length 100, 0.01 works. For permutations of 100 cities, 0.02 (swap mutation) is a good start.
Crossover rate: 0.7-0.9. Lower values reduce exploitation, higher values risk losing good building blocks.
Tournament size: k=3 is a safe default. k=5 increases selection pressure. k=2 is weak.
Elitism: Always keep at least the single best individual. Keeping 2-5% of the population unchanged is common.
Don't tune all parameters at once. Start with default values, then tune sensitivity: which parameter changes best fitness most? That's the one to optimise.
Performance Considerations and When to Avoid GAs
GAs are not the fastest optimisation method. Each generation evaluates pop_size individuals — that's pop_size × generations function evaluations. For problems where a single fitness evaluation takes seconds (e.g., training a neural network), GAs can be prohibitively slow.
When GAs work well: - Combinatorial problems with no gradient (TSP, scheduling, packing) - Multi-modal landscapes (many local optima) - Black-box objectives where you can't compute derivatives - Problems where you can parallelise fitness evaluations easily
When to avoid GAs: - Differentiable functions: use gradient descent - Small search space (< 10^6 possibilities): use exhaustive search or B&B - Need guaranteed optimality: use integer programming - Function evaluation is expensive (> 1 second): use Bayesian optimisation or random search instead
Parallelisation: Fitness evaluation is embarrassingly parallel. In production, evaluate all individuals in parallel using multiprocessing or a distributed executor (e.g., Dask, Spark). The rest of the GA (selection, crossover) is sequential but fast.
GA Converged to a Local Optimum for TSP
- Always track diversity (population variance) not just best fitness.
- Tune mutation rate to problem size — rule of thumb: 1/(chromosome length).
- Use adaptive mutation to break out of local optima automatically.
- Test convergence by running the GA multiple times with different seeds.
Key takeaways
Common mistakes to avoid
5 patternsUsing uniform crossover for permutation problems
Setting mutation rate too low or too high
Not using elitism
Using roulette wheel selection with negative fitness values
Running too few generations
Interview Questions on This Topic
What are the five components of a genetic algorithm? Explain each briefly.
Frequently Asked Questions
That's Genetic Algorithms. Mark it forged?
3 min read · try the examples if you haven't