Genetic Algorithm — 0.001 Mutation Rate Cost 15% Route
Best route 15% above bound: mutation 0.001, pop 50 caused GA plateau for 200 gens.
20+ years shipping performance-critical code where algorithms decide the bill. Written from production experience, not tutorials.
- Representation encodes a solution (bitstring, permutation, tree)
- Fitness function scores how 'good' a solution is
- Selection picks parents — tournament, roulette, rank
- Crossover combines two parents into offspring
- Mutation introduces random changes to maintain diversity
- Performance insight: tournament selection with k=3 converges 40% faster than roulette on most problems
- Production insight: a too-low mutation rate (<0.001) causes premature convergence; too-high (>0.1) turns the GA into random search
Genetic algorithms mimic biological evolution to solve optimisation problems. Start with a population of random solutions. Evaluate each solution's fitness. Select the best ones to 'reproduce' — combine their solutions (crossover) and introduce small random changes (mutation). The next generation is better on average. Repeat until you converge on a good solution. No gradient needed, no closed-form solution required.
Genetic algorithms (GA) are a class of evolutionary computation algorithms inspired by natural selection. They excel at combinatorial optimisation problems where exact methods are infeasible: travelling salesman, job shop scheduling, neural network architecture search, game-playing strategy design.
GAs do not guarantee finding the global optimum (they are metaheuristics), but they find good solutions to NP-hard problems where exact algorithms would take astronomical time. The travelling salesman problem with 100 cities has 100!/2 ≈ 10^157 possible routes — exhaustive search is impossible, but GAs routinely find solutions within 1-2% of optimal.
Why Genetic Algorithms Work Despite Randomness
A genetic algorithm (GA) is a search heuristic that mimics natural selection: a population of candidate solutions evolves over generations through selection, crossover, and mutation. The core mechanic is that fitter individuals have higher probability of reproducing, gradually improving the population's average fitness. Unlike gradient-based methods, GAs require no derivative information — they explore the solution space through stochastic variation and selection pressure.
In practice, a GA maintains a population of N individuals (typically 50–500), each encoded as a chromosome (e.g., a bit string or permutation). Selection picks parents proportional to fitness (roulette wheel or tournament). Crossover recombines two parents to produce offspring, and mutation randomly flips a small fraction of genes — typically 0.1%–1% per generation. The algorithm converges when the population's diversity collapses or a fitness plateau is reached. Key hyperparameters: population size, crossover rate (0.7–0.9), mutation rate (0.001–0.01), and selection pressure.
Use GAs when the search space is large, non-convex, or discrete — e.g., TSP, scheduling, hyperparameter tuning, or circuit design. They shine where gradient information is unavailable or expensive. The trade-off: GAs are sample-inefficient (require many fitness evaluations) and offer no optimality guarantees. In production, they are often hybridized with local search (memetic algorithms) to accelerate convergence.
Core Components
Every genetic algorithm has five components: representation (how solutions are encoded), fitness function (how good is each solution), selection (which solutions reproduce), crossover (how two parents combine), and mutation (random perturbation to maintain diversity).
Getting any one wrong can break the search. Representation is the most subtle — a poor encoding makes crossover produce invalid offspring, wasting generations.
- Each generation is a sample from the search space, biased by fitness.
- Crossover exploits existing building blocks (schema theory).
- Mutation explores new regions.
- No guarantee of optimality — just increasingly good solutions.
- The magic happens when you balance exploitation (crossover) and exploration (mutation).
Selection Strategies
Tournament selection: Pick k random individuals, choose the best. Computationally efficient, tunable selection pressure via k.
Roulette wheel (fitness proportionate): Select with probability proportional to fitness. Works poorly if one individual has much higher fitness than others (it dominates).
Rank selection: Sort by fitness, assign probability by rank not absolute fitness. More robust to fitness scale differences.
Elitism: Always copy the best individual(s) to the next generation unchanged. Prevents losing the best solution found.
Selection pressure determines how quickly the population converges. Too much pressure → premature convergence. Too little → the GA wanders randomly.
Crossover and Mutation: The Diversity Trade-off
Crossover recombines genetic material from two parents. It exploits existing building blocks. Mutation introduces random changes to explore new areas. The balance between them is critical.
High crossover rate (0.8-0.9) means most offspring are created by recombination. High mutation rate (>0.1) turns the algorithm into random search. Low mutation rate (<0.001) leads to premature convergence.
For permutation problems (TSP), standard one-point crossover produces invalid tours (duplicate cities). Specialised operators like Order Crossover (OX) or Partially Mapped Crossover (PMX) preserve the permutation property.
Adaptive mutation: increase mutation rate when diversity drops, decrease it when diversity is high. Implementation: track the average hamming distance or fitness variance.
Travelling Salesman Example
The TSP is the canonical GA benchmark. Chromosome = permutation of cities. Crossover must preserve the permutation property (no city visited twice). Fitness = total route distance.
Order Crossover (OX): pick a segment from parent 1, then fill remaining positions from parent 2 in order, skipping already-used cities. This preserves relative order and creates valid tours.
For real-world TSP instances with 1000+ cities, a GA with OX crossover, tournament selection, and adaptive mutation can find solutions within 2% of optimal in minutes.
Hyperparameter Tuning: What Actually Matters
GAs have 4-5 critical hyperparameters: population size, mutation rate, crossover rate, selection pressure (tournament size), and elitism count.
Population size: 50-500 is common. Too small = premature convergence, too large = slow generations. Rule: set to 10× chromosome length for binary, 5× for permutations.
Mutation rate: Start at 1/(chromosome length). For binary strings of length 100, 0.01 works. For permutations of 100 cities, 0.02 (swap mutation) is a good start.
Crossover rate: 0.7-0.9. Lower values reduce exploitation, higher values risk losing good building blocks.
Tournament size: k=3 is a safe default. k=5 increases selection pressure. k=2 is weak.
Elitism: Always keep at least the single best individual. Keeping 2-5% of the population unchanged is common.
Don't tune all parameters at once. Start with default values, then tune sensitivity: which parameter changes best fitness most? That's the one to optimise.
Performance Considerations and When to Avoid GAs
GAs are not the fastest optimisation method. Each generation evaluates pop_size individuals — that's pop_size × generations function evaluations. For problems where a single fitness evaluation takes seconds (e.g., training a neural network), GAs can be prohibitively slow.
When GAs work well: - Combinatorial problems with no gradient (TSP, scheduling, packing) - Multi-modal landscapes (many local optima) - Black-box objectives where you can't compute derivatives - Problems where you can parallelise fitness evaluations easily
When to avoid GAs: - Differentiable functions: use gradient descent - Small search space (< 10^6 possibilities): use exhaustive search or B&B - Need guaranteed optimality: use integer programming - Function evaluation is expensive (> 1 second): use Bayesian optimisation or random search instead
Parallelisation: Fitness evaluation is embarrassingly parallel. In production, evaluate all individuals in parallel using multiprocessing or a distributed executor (e.g., Dask, Spark). The rest of the GA (selection, crossover) is sequential but fast.
Fitness Functions: The Objective That Makes or Breaks Your GA
The fitness function is the oracle your GA worships. It's the single function that scores every chromosome — and if it's wrong, your entire evolution optimises for garbage. Real-world GAs don't fail because crossover is buggy. They fail because the fitness function rewards the wrong thing.
You're not maximising fitness. You're minimising the gap between your function and reality. Every selection, crossover, and mutation step is just a search guided by this function. If your fitness landscape has plateaus — regions where every candidate scores identically — selection becomes random walks. If it has sharp spikes, you converge on local optima and never escape.
Practical rule: your fitness function should be differentiable in spirit if not in code. Small changes in gene values should produce small, meaningful changes in fitness. Flat regions kill gradients. Steep cliffs blind the search. Design your scoring to reward partial progress, not just perfection.
For combinatorial problems like TSP, that means penalising constraint violations with a severity that scales — not a binary dead-or-alive flag. For continuous optimisation, normalise gene contributions so no single variable dominates the score.
Termination Criteria: How to Stop Before You Burn CPU Cycles on Nothing
A GA that runs forever isn't an algorithm — it's an infinite loop with delusions of progress. You need hard stop conditions that balance solution quality against compute budget. The three real-world triggers: stagnation threshold, budget exhaustion, and early exit on good enough.
Stagnation matters most. Track the best fitness over the last N generations. If it doesn't improve by at least epsilon (say 0.1%) for 50 generations, the population has converged. More mutation won't save it — the search space is explored. Kill it gracefully.
Budget exhaustion is your safety net. Set a hard cap on generations or wall-clock time. In production, I've seen teams let GAs run for 12 hours chasing a 0.01% gain. Stop at generation 5000 or 10 minutes, whichever hits first. Log the convergence curve so you can tune next time.
Good enough: if your fitness hits a threshold that solves the business problem — like delivery routes under a cost target — stop immediately. Don't optimise for perfection when 95% solves the Monday morning logistics nightmare.
Combine all three: stop on stagnation OR max generations OR target fitness reached. And always serialise the best solution before terminating.
Encoding Methods: How You Squeeze Your Problem Into Chromosomes
Encoding is the translation layer between your problem domain and the GA's chromosome string. Pick wrong, and your crossover operator will spawn dead babies — invalid solutions that fail before selection gets a chance. Three patterns dominate production code: binary, permutation, and real-valued.
Binary encoding is the classic — each gene is a bit. Works for feature selection (gene i = include feature i?), combinatorial knapsack, or any yes/no decision. Crossover is trivial: cut and swap bit strings. Mutation flips bits. But for continuous problems, binary encoding wastes precision — you need many bits per variable, and Hamming distance doesn't map cleanly to solution distance.
Permutation encoding is your go-to for ordering problems like TSP, scheduling, or routing. Each chromosome is a permutation of city or job indices. Standard crossover will break the permutation constraint (duplicates, missing elements). Use PMX (Partially Mapped Crossover) or ordered crossover. Mutation swaps two positions, never flips bits.
Real-valued encoding is for continuous optimisation — tune 20 parameters of a PID controller? Each gene is a double. Crossover does arithmetic blending (BLX-alpha), not cut-and-swap. Mutation adds Gaussian noise. No invalid solutions possible, but you lose the crispness of binary search. Always clamp genes to domain bounds after mutation.
Never default to binary "because it's simple." The encoding determines what "close to optimal" looks like in genospace. Align it with your problem's natural structure.
Evaluating the Results: Metrics That Separate Real Progress From Noise
Most GA tutorials stop at showing you the best fitness value. In production, that tells you almost nothing. The question isn't "did the fitness go up" but "did the population actually converge on a valid solution or just memorize a local optimum?". You need three metrics: average population fitness (tells you if the whole gene pool is improving), diversity index (proportion of unique chromosomes), and stagnation count (generations since best fitness changed). Track these per generation. A rising average with dropping diversity means premature convergence — your crossover is too aggressive. A flat average with high diversity means mutation is drowning out selection. Plot all three. If best fitness plateaus while average keeps climbing, your selection pressure is too weak. If both plateau but diversity is still high, your encoding can't represent the true optimum — that's a fatal encoding problem, not a tuning issue.
Making It All Run: Async Evaluation and Distributed Workers
GAs are embarrassingly parallel: each fitness evaluation is independent. But most tutorials show synchronous loops that evaluate one chromosome at a time. That wastes GPU clusters and multicore machines. The fix: pool your workers (threads or remote servers) and batch-send chromosomes as soon as they're generated. Use a completion callback per chromosome, not a global barrier. Your main loop should do three things: spawn evaluation tasks, collect results, and generate next population — all overlapping. In Java, use a ForkJoinPool with RecursiveAction per evaluation. Set pool size to number of cores minus one. For distributed workers, use a message queue (RabbitMQ, Redis) with backpressure: if queue size exceeds 10x population size, pause generation. Never block the main thread waiting for all evaluations — that sequentializes your GA. The generation clock starts when the last slow worker finishes; optimize your slowest chromosome, not the average.
GA Converged to a Local Optimum for TSP
- Always track diversity (population variance) not just best fitness.
- Tune mutation rate to problem size — rule of thumb: 1/(chromosome length).
- Use adaptive mutation to break out of local optima automatically.
- Test convergence by running the GA multiple times with different seeds.
print(f'Gen {gen}: best={best_fitness:.4f}, avg={avg_fitness:.4f}, var={variance:.4f}')mut_rate = 0.02 if variance < 0.01 else 0.001Key takeaways
Common mistakes to avoid
5 patternsUsing uniform crossover for permutation problems
Setting mutation rate too low or too high
Not using elitism
Using roulette wheel selection with negative fitness values
Running too few generations
Practice These on LeetCode
Interview Questions on This Topic
What are the five components of a genetic algorithm? Explain each briefly.
Frequently Asked Questions
20+ years shipping performance-critical code where algorithms decide the bill. Written from production experience, not tutorials.
That's Genetic Algorithms. Mark it forged?
9 min read · try the examples if you haven't