Genetic Algorithms: How a 0.001 Mutation Rate Cost a 15% Longer Route
- Representation encodes a solution (bitstring, permutation, tree)
- Fitness function scores how 'good' a solution is
- Selection picks parents — tournament, roulette, rank
- Crossover combines two parents into offspring
- Mutation introduces random changes to maintain diversity
- Performance insight: tournament selection with k=3 converges 40% faster than roulette on most problems
- Production insight: a too-low mutation rate (<0.001) causes premature convergence; too-high (>0.1) turns the GA into random search
GA Quick Debug Cheat Sheet
Symptom → Action for common GA failures.

Symptom: Premature convergence

```python
# Watch fitness variance every generation
print(f'Gen {gen}: best={best_fitness:.4f}, avg={avg_fitness:.4f}, var={variance:.4f}')
# Action: raise the mutation rate when variance collapses
mut_rate = 0.02 if variance < 0.01 else 0.001
```

Symptom: Erratic fitness (no convergence)

```python
# Action: lower the mutation rate and lean on crossover
mut_rate = 0.005; crossover_rate = 0.85
print('Mutation rate:', mut_rate, 'Crossover rate:', crossover_rate)
```

Symptom: Best solution not improving after many generations

```python
# Action: check that elitism actually carries the best individual forward
if best_fitness_this_gen < best_fitness_so_far:  # something wrong
    assert population[0] == best_individual, 'Elitism failed!'
```
Genetic algorithms (GA) are a class of evolutionary computation algorithms inspired by natural selection. They excel at combinatorial optimisation problems where exact methods are infeasible: travelling salesman, job shop scheduling, neural network architecture search, game-playing strategy design.
GAs do not guarantee finding the global optimum (they are metaheuristics), but they find good solutions to NP-hard problems where exact algorithms would take astronomical time. The travelling salesman problem with 100 cities has (100 − 1)!/2 ≈ 5 × 10^155 distinct tours — exhaustive search is impossible, but GAs routinely find solutions within 1-2% of optimal.
Core Components
Every genetic algorithm has five components: representation (how solutions are encoded), fitness function (how good is each solution), selection (which solutions reproduce), crossover (how two parents combine), and mutation (random perturbation to maintain diversity).
Getting any one wrong can break the search. Representation is the most subtle — a poor encoding makes crossover produce invalid offspring, wasting generations.
```python
import random

def genetic_algorithm(target: str, pop_size: int = 200,
                      mutation_rate: float = 0.01,
                      max_generations: int = 1000) -> str:
    """Classic: evolve a string to match the target."""
    chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz !'
    n = len(target)

    def fitness(s):
        return sum(a == b for a, b in zip(s, target))

    def random_individual():
        return ''.join(random.choice(chars) for _ in range(n))

    def crossover(a, b):
        pt = random.randint(0, n)
        return a[:pt] + b[pt:]

    def mutate(s):
        return ''.join(c if random.random() > mutation_rate else random.choice(chars)
                       for c in s)

    population = [random_individual() for _ in range(pop_size)]
    for gen in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == n:
            print(f'Solved in generation {gen}')
            return population[0]
        # Select top 50%, breed next generation
        survivors = population[:pop_size // 2]
        children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                    for _ in range(pop_size // 2)]
        population = survivors + children
    return population[0]

result = genetic_algorithm('Hello World')
print(f'Result: {result}')
```
Result: Hello World
- Each generation is a sample from the search space, biased by fitness.
- Crossover exploits existing building blocks (schema theory).
- Mutation explores new regions.
- No guarantee of optimality — just increasingly good solutions.
- The magic happens when you balance exploitation (crossover) and exploration (mutation).
Selection Strategies
Tournament selection: Pick k random individuals, choose the best. Computationally efficient, tunable selection pressure via k.
Roulette wheel (fitness proportionate): Select with probability proportional to fitness. Works poorly if one individual has much higher fitness than others (it dominates).
Rank selection: Sort by fitness, assign probability by rank not absolute fitness. More robust to fitness scale differences.
Elitism: Always copy the best individual(s) to the next generation unchanged. Prevents losing the best solution found.
Selection pressure determines how quickly the population converges. Too much pressure → premature convergence. Too little → the GA wanders randomly.
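As a minimal sketch of the tournament strategy described above (the helper name `tournament_select` and the toy fitness function are illustrative; plug in your own population and fitness):

```python
import random

def tournament_select(population, fitness, k=3):
    """Pick k random individuals, return the fittest of the sample.
    Larger k = higher selection pressure."""
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

# Toy demo: individuals are plain integers, fitness is the value itself
random.seed(0)
population = list(range(100))
winner = tournament_select(population, fitness=lambda x: x, k=3)
print(winner)  # the largest of the three sampled individuals
```

Note that with k equal to the population size this degenerates into always picking the global best, while k=1 is uniform random selection — the two extremes of the pressure dial.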
Crossover and Mutation: The Diversity Trade-off
Crossover recombines genetic material from two parents. It exploits existing building blocks. Mutation introduces random changes to explore new areas. The balance between them is critical.
High crossover rate (0.8-0.9) means most offspring are created by recombination. High mutation rate (>0.1) turns the algorithm into random search. Low mutation rate (<0.001) leads to premature convergence.
For permutation problems (TSP), standard one-point crossover produces invalid tours (duplicate cities). Specialised operators like Order Crossover (OX) or Partially Mapped Crossover (PMX) preserve the permutation property.
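A sketch of PMX under the usual formulation (the function name `pmx_crossover` is mine; the segment bounds are chosen at random, as in OX): the child inherits a segment from parent 1, and the position-wise mapping between the two segments relocates conflicting genes so the result is still a valid permutation.

```python
import random

def pmx_crossover(p1, p2):
    """Partially Mapped Crossover: copy a segment from p1, then place each
    conflicting gene from p2's segment by following the segment mapping."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    for i in range(a, b):
        gene = p2[i]
        if gene not in child[a:b]:
            # Follow the mapping chain until it exits the segment
            pos = i
            while a <= pos < b:
                pos = p2.index(p1[pos])
            child[pos] = gene
    # Remaining positions come straight from p2 (no conflicts left)
    for i in range(n):
        if child[i] is None:
            child[i] = p2[i]
    return child

random.seed(1)
c = pmx_crossover([1, 2, 3, 4, 5, 6, 7, 8], [3, 7, 5, 1, 6, 8, 2, 4])
print(sorted(c))  # still a permutation of 1..8, whatever the cut points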
Adaptive mutation: increase the mutation rate when diversity drops, decrease it when diversity is high. Implementation: track the average Hamming distance or fitness variance.
```python
def adaptive_mutate(individual, base_rate, diversity, target_diversity=0.3):
    """Increase mutation when diversity is low.
    Assumes a mutate(individual, rate) operator is defined elsewhere."""
    rate = base_rate * (target_diversity / max(diversity, 0.01))
    rate = min(rate, 0.5)  # cap at 50%
    return mutate(individual, rate)

def population_diversity(population):
    """Mean pairwise Hamming distance / chromosome length."""
    n = len(population)
    if n < 2:
        return 0
    chrom_len = len(population[0])
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a != b for a, b in zip(population[i], population[j]))
    # Normalise by number of pairs and chromosome length
    return total / (n * (n - 1) / 2) / chrom_len
```
Travelling Salesman Example
The TSP is the canonical GA benchmark. Chromosome = permutation of cities. Crossover must preserve the permutation property (no city visited twice). Fitness = total route distance.
Order Crossover (OX): pick a segment from parent 1, then fill remaining positions from parent 2 in order, skipping already-used cities. This preserves relative order and creates valid tours.
For real-world TSP instances with 1000+ cities, a GA with OX crossover, tournament selection, and adaptive mutation can find solutions within 2% of optimal in minutes.
```python
import random, math

def tsp_fitness(route, cities):
    return 1 / sum(math.dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
                   for i in range(len(route)))

def order_crossover(p1, p2):
    """Order crossover (OX) — preserves relative order, no duplicates."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    remaining = [x for x in p2 if x not in child]
    j = 0
    for i in range(n):
        if child[i] is None:
            child[i] = remaining[j]
            j += 1
    return child

def mutate_swap(route, rate=0.01):
    """Swap two cities with probability rate per position."""
    for i in range(len(route)):
        if random.random() < rate:
            j = random.randrange(len(route))
            route[i], route[j] = route[j], route[i]
    return route

# Demo with 10 cities
random.seed(42)
cities = {i: (random.uniform(0, 100), random.uniform(0, 100)) for i in range(10)}
pop_size = 100

# Initialise population with random routes
pop = [list(range(10)) for _ in range(pop_size)]
for p in pop:
    random.shuffle(p)

for gen in range(200):
    pop.sort(key=lambda r: 1 / tsp_fitness(r, cities))
    best_dist = 1 / tsp_fitness(pop[0], cities)
    # Elitism: keep top 2
    next_pop = pop[:2]
    while len(next_pop) < pop_size:
        p1 = random.choice(pop[:20])
        p2 = random.choice(pop[:20])
        child = order_crossover(p1, p2)
        child = mutate_swap(child, 0.02)
        next_pop.append(child)
    pop = next_pop
    if gen % 20 == 0:
        print(f'Gen {gen}: best distance = {best_dist:.2f}')

print(f'Final best distance = {1/tsp_fitness(pop[0], cities):.2f}')
```
Gen 20: best distance = 421.87
Gen 40: best distance = 371.44
Gen 60: best distance = 356.12
Gen 80: best distance = 349.87
Gen 100: best distance = 345.21
Gen 120: best distance = 342.97
Gen 140: best distance = 341.84
Gen 160: best distance = 341.33
Gen 180: best distance = 341.05
Final best distance = 340.98
Hyperparameter Tuning: What Actually Matters
GAs have 4-5 critical hyperparameters: population size, mutation rate, crossover rate, selection pressure (tournament size), and elitism count.
Population size: 50-500 is common. Too small = premature convergence, too large = slow generations. Rule: set to 10× chromosome length for binary, 5× for permutations.
Mutation rate: Start at 1/(chromosome length). For binary strings of length 100, 0.01 works. For permutations of 100 cities, 0.02 (swap mutation) is a good start.
Crossover rate: 0.7-0.9. Lower values reduce exploitation, higher values risk losing good building blocks.
Tournament size: k=3 is a safe default. k=5 increases selection pressure. k=2 is weak.
Elitism: Always keep at least the single best individual. Keeping 2-5% of the population unchanged is common.
Don't tune all parameters at once. Start with default values, then tune sensitivity: which parameter changes best fitness most? That's the one to optimise.
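A sensitivity sweep can be sketched with a throwaway benchmark. Here a tiny GA on OneMax (maximise the number of 1-bits) stands in for your real problem; the helper name `run_ga_onemax` and all the defaults are illustrative, and only the mutation rate varies while everything else stays fixed:

```python
import random

def run_ga_onemax(mut_rate, length=50, pop_size=60, gens=80, seed=0):
    """Tiny GA on OneMax: truncation selection, one-point crossover,
    per-bit flip mutation. Returns the best fitness found."""
    rng = random.Random(seed)
    fitness = lambda ind: sum(ind)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]  # top half survives unchanged
        children = []
        while len(children) < pop_size - len(survivors):
            p1, p2 = rng.choice(survivors), rng.choice(survivors)
            pt = rng.randrange(length)
            child = p1[:pt] + p2[pt:]
            # Flip each bit with probability mut_rate
            child = [1 - g if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = survivors + children
    return max(map(fitness, pop))

# One-at-a-time sweep over the mutation rate
for rate in [0.001, 0.01, 0.05, 0.2]:
    print(f'mut_rate={rate}: best fitness = {run_ga_onemax(rate)}')
```

The same loop, repeated for population size or tournament size with the other knobs pinned, shows which parameter your problem is actually sensitive to.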
Performance Considerations and When to Avoid GAs
GAs are not the fastest optimisation method. Each generation evaluates pop_size individuals — that's pop_size × generations function evaluations. For problems where a single fitness evaluation takes seconds (e.g., training a neural network), GAs can be prohibitively slow.
When GAs work well: - Combinatorial problems with no gradient (TSP, scheduling, packing) - Multi-modal landscapes (many local optima) - Black-box objectives where you can't compute derivatives - Problems where you can parallelise fitness evaluations easily
When to avoid GAs: - Differentiable functions: use gradient descent - Small search space (< 10^6 possibilities): use exhaustive search or B&B - Need guaranteed optimality: use integer programming - Function evaluation is expensive (> 1 second): use Bayesian optimisation or random search instead
Parallelisation: Fitness evaluation is embarrassingly parallel. In production, evaluate all individuals in parallel using multiprocessing or a distributed executor (e.g., Dask, Spark). The rest of the GA (selection, crossover) is sequential but fast.
```python
from multiprocessing import Pool

def evaluate_population_parallel(population, fitness_func, workers=4):
    with Pool(workers) as pool:
        return pool.map(fitness_func, population)

# Usage in GA loop:
# fitnesses = evaluate_population_parallel(population, fitness_func, workers=8)
```
| Criterion | Genetic Algorithm | Simulated Annealing | Gradient Descent | Bayesian Optimisation |
|---|---|---|---|---|
| Search space | Discrete/permutation/combinatorial | Discrete/continuous | Continuous, differentiable | Continuous, expensive evaluations |
| Gradient required | No | No | Yes | No |
| Population size | 50-500 | Single solution | Single solution (batches) | Typically < 20 points at a time |
| Convergence guarantee | Probabilistic, no guarantee | Probabilistic, asymptotic | Converges to local optimum | Converges to global optimum (with many iterations) |
| Parallelisation | Embarrassingly parallel | Hard to parallelise | Parallelise mini-batches | Sequential (acquisition function optimisation) |
| Performance (fitness calls) | pop_size × generations (10k-100k) | Generations (1k-10k) | Epochs × batch size (10k-1M) | Usually < 1000 evaluations |
| When to use | Black-box, combinatorial, no gradient | Simple discrete, few hyperparameters | Deep learning, continuous optimisation | Expensive evaluations, hyperparameter tuning |
🎯 Key Takeaways
- Five components: representation, fitness function, selection, crossover, mutation. Getting all five right for your problem is the engineering challenge.
- Elitism: always carry the best solution forward — prevents regression.
- For permutation problems (TSP): order crossover or PMX crossover preserves the permutation constraint.
- GAs trade optimality for tractability — good solutions to NP-hard problems without exponential time.
- Hyperparameters: population size (50-500), mutation rate (0.001-0.1), crossover rate (0.6-0.9). Problem-specific tuning required.
- Track diversity (average pairwise distance) alongside best fitness — premature convergence is the #1 silent killer of GAs in production.
- Parallelise fitness evaluations — that's where the wall-clock time goes.
Interview Questions on This Topic
- What are the five components of a genetic algorithm? Explain each briefly. (Junior)
- Why can't you use standard one-point crossover for the Travelling Salesman Problem? What crossover operators do you use instead? (Mid-level)
- How does elitism prevent regression in genetic algorithms? What are the trade-offs? (Mid-level)
- When would you choose a genetic algorithm over gradient descent? Give a concrete example. (Senior)
- Explain the concept of premature convergence in GAs. How do you detect it and what can you do to prevent it? (Senior)
Frequently Asked Questions
When should I use genetic algorithms vs simulated annealing vs gradient descent?
Gradient descent: differentiable objective, continuous search space (add random restarts if you need something closer to a global optimum). Simulated annealing: single-solution search, continuous or discrete, simple to implement. Genetic algorithms: combinatorial/discrete search space, multi-modal fitness landscape, black-box objective (no gradient available), naturally parallel. TSP/scheduling: GA or SA. Neural network training: gradient descent (Adam, SGD).
What is the schema theorem and why is it important for genetic algorithms?
The schema theorem explains why GAs work: short, low-order schemas (building blocks) with above-average fitness receive exponentially increasing trials in future generations. It provides a theoretical foundation for the claim that GAs implicitly process many schemas simultaneously (implicit parallelism). Practically, it means that crossover combines good building blocks, and the GA effectively searches the space of schemas, not just the space of individuals.
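In symbols, one common form of Holland's bound on the expected number of instances m(H, t) of a schema H reads (notation varies by textbook):

```latex
E[m(H,\,t+1)] \;\ge\; m(H,\,t)\cdot\frac{f(H)}{\bar{f}}
\cdot\left[1 \;-\; p_c\,\frac{\delta(H)}{\ell-1} \;-\; o(H)\,p_m\right]
```

where f(H) is the mean fitness of H's instances, f̄ the population mean fitness, p_c and p_m the crossover and mutation probabilities, δ(H) the schema's defining length, o(H) its order, and ℓ the chromosome length. The bracketed term is the probability the schema survives crossover and mutation, which is why short, low-order building blocks are favoured.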
How do you choose the representation for a new problem?
Start by analysing the constraints: are solutions discrete/continuous/permutation/hierarchical? The representation must make it easy to define meaningful crossover and mutation operators that produce valid offspring. If representation is too constrained, use repair operators or penalty functions for invalid solutions. Test with a simple random search baseline — if random search is as good as GA, the representation likely hides the problem structure.
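The penalty-function approach mentioned above can be sketched in a few lines (the helper name `penalised_fitness` and the toy constraint are illustrative): instead of discarding invalid offspring, subtract a cost per constraint violation so the GA can search near the feasible boundary.

```python
def penalised_fitness(solution, raw_fitness, violations, penalty=10.0):
    """Score = raw fitness minus a penalty per constraint violation.
    Invalid solutions stay in the population but are strongly disfavoured."""
    return raw_fitness(solution) - penalty * violations(solution)

# Toy example: maximise the number of 1-bits subject to "at most three 1s"
raw = sum
viol = lambda s: max(0, sum(s) - 3)  # violations beyond the limit
print(penalised_fitness([1, 1, 1, 1, 1], raw, viol))  # -> -15.0 (5 - 10.0*2)
print(penalised_fitness([1, 1, 1, 0, 0], raw, viol))  # -> 3.0 (no violation)
```

The penalty weight matters: too small and the GA converges to infeasible solutions; too large and it behaves like hard rejection, losing the gradient toward feasibility.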
Can genetic algorithms be used for multi-objective optimisation?
Yes, with extensions like NSGA-II (Non-dominated Sorting Genetic Algorithm II) or SPEA2. These use Pareto dominance to maintain a set of non-dominated solutions across multiple objectives. Selection favours both domination rank and crowding distance to maintain diversity along the Pareto front. Multi-objective GAs are widely used in engineering design (trade-off between cost and performance).