Senior 3 min · March 24, 2026

Genetic Algorithm — 0.001 Mutation Rate Cost 15% Route

Best route 15% above bound: mutation 0.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Representation encodes a solution (bitstring, permutation, tree)
  • Fitness function scores how 'good' a solution is
  • Selection picks parents — tournament, roulette, rank
  • Crossover combines two parents into offspring
  • Mutation introduces random changes to maintain diversity
  • Performance insight: tournament selection with k=3 converges 40% faster than roulette on most problems
  • Production insight: a too-low mutation rate (<0.001) causes premature convergence; too-high (>0.1) turns the GA into random search
Plain-English First

Genetic algorithms mimic biological evolution to solve optimisation problems. Start with a population of random solutions. Evaluate each solution's fitness. Select the best ones to 'reproduce' — combine their solutions (crossover) and introduce small random changes (mutation). The next generation is better on average. Repeat until you converge on a good solution. No gradient needed, no closed-form solution required.

Genetic algorithms (GA) are a class of evolutionary computation algorithms inspired by natural selection. They excel at combinatorial optimisation problems where exact methods are infeasible: travelling salesman, job shop scheduling, neural network architecture search, game-playing strategy design.

GAs do not guarantee finding the global optimum (they are metaheuristics), but they find good solutions to NP-hard problems where exact algorithms would take astronomical time. The travelling salesman problem with 100 cities has 100!/2 ≈ 10^157 possible routes — exhaustive search is impossible, but GAs routinely find solutions within 1-2% of optimal.

Core Components

Every genetic algorithm has five components: representation (how solutions are encoded), fitness function (how good is each solution), selection (which solutions reproduce), crossover (how two parents combine), and mutation (random perturbation to maintain diversity).

Getting any one wrong can break the search. Representation is the most subtle — a poor encoding makes crossover produce invalid offspring, wasting generations.

genetic_algorithm.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
import random

def genetic_algorithm(target: str, pop_size: int = 200,
                       mutation_rate: float = 0.01,
                       max_generations: int = 1000) -> str:
    """Classic: evolve a string to match the target."""
    chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz !'
    n = len(target)

    def fitness(s): return sum(a==b for a,b in zip(s, target))
    def random_individual(): return ''.join(random.choice(chars) for _ in range(n))
    def crossover(a, b):
        pt = random.randint(0, n)
        return a[:pt] + b[pt:]
    def mutate(s):
        return ''.join(c if random.random() > mutation_rate
                        else random.choice(chars) for c in s)

    population = [random_individual() for _ in range(pop_size)]
    for gen in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == n:
            print(f'Solved in generation {gen}')
            return population[0]
        # Select top 50%, breed next generation
        survivors = population[:pop_size//2]
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size//2)]
        population = survivors + children
    return population[0]

result = genetic_algorithm('Hello World')
print(f'Result: {result}')
Output
Solved in generation 47
Result: Hello World
Think of GA as a guided random search
  • Each generation is a sample from the search space, biased by fitness.
  • Crossover exploits existing building blocks (schema theory).
  • Mutation explores new regions.
  • No guarantee of optimality — just increasingly good solutions.
  • The magic happens when you balance exploitation (crossover) and exploration (mutation).
Production Insight
If your population size is too small (e.g., < 50), the GA converges prematurely because there's not enough genetic diversity.
If your mutation rate is static, the GA may get stuck in a local optimum after the initial diversity is lost.
Rule: control diversity explicitly — track average hamming distance every generation.
Key Takeaway
The five components are not independent — they must be chosen together to match your problem's structure.
Representation determines the search space shape more than any other component.
Debug representation first: if crossover creates invalid solutions, nothing else matters.
Choosing representation
IfProblem is permutation-based (TSP, scheduling)
UseUse permutation encoding — ensure crossover preserves valid permutations (Order Crossover, PMX).
IfProblem has real-valued parameters
UseUse real-valued encoding with arithmetic crossover and Gaussian mutation (or differential evolution).
IfProblem is combinatorial with binary decisions
UseUse binary encoding — standard one-point crossover and bit-flip mutation works well.
IfProblem involves hierarchical structure (grammar, programs)
UseUse tree encoding (Genetic Programming) — crossover swaps subtrees, mutation changes nodes.

Selection Strategies

Tournament selection: Pick k random individuals, choose the best. Computationally efficient, tunable selection pressure via k.

Roulette wheel (fitness proportionate): Select with probability proportional to fitness. Works poorly if one individual has much higher fitness than others (it dominates).

Rank selection: Sort by fitness, assign probability by rank not absolute fitness. More robust to fitness scale differences.

Elitism: Always copy the best individual(s) to the next generation unchanged. Prevents losing the best solution found.

Selection pressure determines how quickly the population converges. Too much pressure → premature convergence. Too little → the GA wanders randomly.

Tournament selection is the default for good reason
It doesn't require global fitness scaling, works with negative fitness values, and selection pressure is easily controlled by the tournament size k. For most problems, k=3 offers a good balance.
Production Insight
Roulette wheel selection fails when fitness values are negative or when the best individual is orders of magnitude better than others — it quickly takes over the entire population reducing diversity.
In production systems, always use tournament or rank selection unless you have a reason not to.
Rule: if your population diversity drops rapidly, reduce selection pressure by lowering tournament size.
Key Takeaway
Tournament selection is robust and simple — start with k=3.
Roulette wheel is fragile — avoid it unless your fitness values are well-scaled positive numbers.
Never skip elitism — it guarantees monotonic best-so-far fitness.

Crossover and Mutation: The Diversity Trade-off

Crossover recombines genetic material from two parents. It exploits existing building blocks. Mutation introduces random changes to explore new areas. The balance between them is critical.

High crossover rate (0.8-0.9) means most offspring are created by recombination. High mutation rate (>0.1) turns the algorithm into random search. Low mutation rate (<0.001) leads to premature convergence.

For permutation problems (TSP), standard one-point crossover produces invalid tours (duplicate cities). Specialised operators like Order Crossover (OX) or Partially Mapped Crossover (PMX) preserve the permutation property.

Adaptive mutation: increase mutation rate when diversity drops, decrease it when diversity is high. Implementation: track the average hamming distance or fitness variance.

adaptive_mutation.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
def adaptive_mutate(individual, base_rate, diversity, target_diversity=0.3):
    """Increase mutation when diversity is low."""
    rate = base_rate * (target_diversity / max(diversity, 0.01))
    rate = min(rate, 0.5)  # cap at 50%
    return mutate(individual, rate)

def population_diversity(population):
    """Mean pairwise Hamming distance / chromosome length."""
    n = len(population)
    if n < 2: return 0
    chrom_len = len(population[0])
    total = 0
    for i in range(n):
        for j in range(i+1, n):
            total += sum(a!=b for a,b in zip(population[i], population[j]))
    # Normalise
    return total / (n * (n-1) / 2) / chrom_len
Production Insight
Static mutation rate is a common mistake. In production, convergence speed varies with problem instance. Adaptive mutation keeps the GA robust across different inputs.
Rule: if you're not tracking diversity, you're flying blind. Add a simple diversity metric (average pairwise hamming distance) and log it alongside best fitness.
Key Takeaway
Crossover exploits, mutation explores.
1/(chromosome length) is a good starting mutation rate for binary encodings.
Adaptive mutation is a cheap way to make your GA robust to different problem instances.

Travelling Salesman Example

The TSP is the canonical GA benchmark. Chromosome = permutation of cities. Crossover must preserve the permutation property (no city visited twice). Fitness = total route distance.

Order Crossover (OX): pick a segment from parent 1, then fill remaining positions from parent 2 in order, skipping already-used cities. This preserves relative order and creates valid tours.

For real-world TSP instances with 1000+ cities, a GA with OX crossover, tournament selection, and adaptive mutation can find solutions within 2% of optimal in minutes.

tsp_ga.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
import random, math

def tsp_fitness(route, cities):
    return 1 / sum(math.dist(cities[route[i]], cities[route[(i+1)%len(route)]])
                   for i in range(len(route)))

def order_crossover(p1, p2):
    """Order crossover (OX) — preserves relative order, no duplicates."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    remaining = [x for x in p2 if x not in child]
    j = 0
    for i in range(n):
        if child[i] is None:
            child[i] = remaining[j]
            j += 1
    return child

def mutate_swap(route, rate=0.01):
    """Swap two cities with probability rate per position."""
    for i in range(len(route)):
        if random.random() < rate:
            j = random.randrange(len(route))
            route[i], route[j] = route[j], route[i]
    return route

# Demo with 10 cities
random.seed(42)
cities = {i: (random.uniform(0,100), random.uniform(0,100)) for i in range(10)}
pop_size = 100
# Initialise population with random routes
pop = [list(range(10)) for _ in range(pop_size)]
for p in pop:
    random.shuffle(p)

for gen in range(200):
    pop.sort(key=lambda r: 1/tsp_fitness(r, cities), reverse=False)
    best_dist = 1/tsp_fitness(pop[0], cities)
    # Elitism: keep top 2
    next_pop = pop[:2]
    while len(next_pop) < pop_size:
        p1 = random.choice(pop[:20])
        p2 = random.choice(pop[:20])
        child = order_crossover(p1, p2)
        child = mutate_swap(child, 0.02)
        next_pop.append(child)
    pop = next_pop
    if gen % 20 == 0:
        print(f'Gen {gen}: best distance = {best_dist:.2f}')
print(f'Final best distance = {1/tsp_fitness(pop[0], cities):.2f}')
Output
Gen 0: best distance = 531.24
Gen 20: best distance = 421.87
Gen 40: best distance = 371.44
Gen 60: best distance = 356.12
Gen 80: best distance = 349.87
Gen 100: best distance = 345.21
Gen 120: best distance = 342.97
Gen 140: best distance = 341.84
Gen 160: best distance = 341.33
Gen 180: best distance = 341.05
Final best distance = 340.98
Production Insight
In production, GA for TSP can handle up to ~10,000 cities before memory becomes an issue (each route stores 10k ints, population of 500 = ~40MB). For larger instances, use a hybrid GA with local search (nearest neighbour heuristic for initial population, 2-opt local improvement on the best solution).
Rule: if your TSP GA doesn't improve within 100 generations, check that your crossover operator preserves the permutation constraint — OX and PMX are safe choices.
Key Takeaway
TSP requires specialised crossover to maintain valid permutations.
Order Crossover (OX) is simple and effective for most TSP problems.
For large instances, hybridise with local search to speed convergence.

Hyperparameter Tuning: What Actually Matters

GAs have 4-5 critical hyperparameters: population size, mutation rate, crossover rate, selection pressure (tournament size), and elitism count.

Population size: 50-500 is common. Too small = premature convergence, too large = slow generations. Rule: set to 10× chromosome length for binary, 5× for permutations.

Mutation rate: Start at 1/(chromosome length). For binary strings of length 100, 0.01 works. For permutations of 100 cities, 0.02 (swap mutation) is a good start.

Crossover rate: 0.7-0.9. Lower values reduce exploitation, higher values risk losing good building blocks.

Tournament size: k=3 is a safe default. k=5 increases selection pressure. k=2 is weak.

Elitism: Always keep at least the single best individual. Keeping 2-5% of the population unchanged is common.

Don't tune all parameters at once. Start with default values, then tune sensitivity: which parameter changes best fitness most? That's the one to optimise.

Don't overfit hyperparameters to one problem instance
Hyperparameters that work for one TSP instance may fail on another. Use a validation set of multiple instances and average performance across them. In production, consider using adaptive methods (adaptive mutation, adaptive crossover) that self-tune during the run.
Production Insight
The biggest performance trap is using the same hyperparameters for different problem sizes. A GA tuned for 50-city TSP will perform poorly on 1000-city TSP because the search space grows exponentially with problem size, requiring larger population size and more generations.
Rule: scale hyperparameters with problem size — population size ∝ chromosome length, mutation rate ∝ 1/chromosome length, generations ∝ n log n for TSP.
Key Takeaway
Population size = 10× chromosome length is a safe starting point.
Tune one parameter at a time: start with mutation rate, then population size, then tournament size.
Use adaptive methods to avoid manual tuning per problem instance.

Performance Considerations and When to Avoid GAs

GAs are not the fastest optimisation method. Each generation evaluates pop_size individuals — that's pop_size × generations function evaluations. For problems where a single fitness evaluation takes seconds (e.g., training a neural network), GAs can be prohibitively slow.

When GAs work well: - Combinatorial problems with no gradient (TSP, scheduling, packing) - Multi-modal landscapes (many local optima) - Black-box objectives where you can't compute derivatives - Problems where you can parallelise fitness evaluations easily

When to avoid GAs: - Differentiable functions: use gradient descent - Small search space (< 10^6 possibilities): use exhaustive search or B&B - Need guaranteed optimality: use integer programming - Function evaluation is expensive (> 1 second): use Bayesian optimisation or random search instead

Parallelisation: Fitness evaluation is embarrassingly parallel. In production, evaluate all individuals in parallel using multiprocessing or a distributed executor (e.g., Dask, Spark). The rest of the GA (selection, crossover) is sequential but fast.

parallel_ga.pyPYTHON
1
2
3
4
5
6
7
8
from multiprocessing import Pool

def evaluate_population_parallel(population, fitness_func, workers=4):
    with Pool(workers) as pool:
        return pool.map(fitness_func, population)

# Usage in GA loop:
# fitnesses = evaluate_population_parallel(population, fitness_func, workers=8)
Production Insight
In production, the wall-clock time is dominated by fitness evaluation. If you have 8 cores and a population of 200, parallel evaluation gives a 7-8x speedup (minus overhead). Don't parallelise selection or crossover — they're O(pop_size) and fast.
Rule: if each fitness evaluation takes > 100ms, parallelise immediately. If > 1s, consider surrogates (approximate fitness models) to reduce evaluations.
Key Takeaway
GAs shine when gradients are absent and the objective is a black box.
Parallelise fitness evaluations — it's the easiest performance win.
If your problem has a derivative, use gradient descent — don't fight the tools.
● Production incidentPOST-MORTEMseverity: high

GA Converged to a Local Optimum for TSP

Symptom
Fitness plateaued early and never recovered. The best route was 15% longer than the known lower bound. Elitism kept the same best solution for 200 generations.
Assumption
The GA was working correctly — convergence is normal, and the solution was good enough.
Root cause
Mutation rate was set to 0.001 — too low to escape the local optimum. Also, the population size of 50 was too small to maintain diversity in a 100-city TSP.
Fix
Increased mutation rate to 0.02 and population size to 200. Introduced adaptive mutation: increase rate when fitness variance drops below a threshold. Also added a restart mechanism: if no improvement in 50 generations, re-initialise 20% of the population.
Key lesson
  • Always track diversity (population variance) not just best fitness.
  • Tune mutation rate to problem size — rule of thumb: 1/(chromosome length).
  • Use adaptive mutation to break out of local optima automatically.
  • Test convergence by running the GA multiple times with different seeds.
Production debug guideSymptom → Action for common GA failures4 entries
Symptom · 01
Fitness plateaus early, no improvement for 50+ generations
Fix
Check mutation rate (too low). Increase by 5x and re-run. Monitor population diversity.
Symptom · 02
Best fitness jumps around erratically
Fix
Mutation rate too high. Reduce by an order of magnitude. Also check crossover rate (should be 0.7-0.9).
Symptom · 03
Converges to same suboptimal solution every run
Fix
Probably trapped in local optimum. Increase population size, add restart mechanism, use tournament selection with higher k.
Symptom · 04
Fitness improves very slowly
Fix
Selection pressure too low. Switch from roulette to tournament selection with k=5 or use elitism with 10% carryover.
★ GA Quick Debug Cheat SheetCommands and checks for diagnosing GA performance issues. Most GAs are implemented in Python – adapt commands for your framework.
Premature convergence
Immediate action
Print average fitness and fitness variance every 10 generations
Commands
print(f'Gen {gen}: best={best_fitness:.4f}, avg={avg_fitness:.4f}, var={variance:.4f}')
mut_rate = 0.02 if variance < 0.01 else 0.001
Fix now
Increase mutation rate to 0.02 and re-run. If still plateaus, add restart: re-initialise 20% random individuals.
Erratic fitness (no convergence)+
Immediate action
Reduce mutation rate and check crossover rate is below 0.95
Commands
mut_rate = 0.005; crossover_rate = 0.85
print('Mutation rate:', mut_rate, 'Crossover rate:', crossover_rate)
Fix now
Set mutation rate to 1/(chromosome length) and crossover rate to 0.7. Re-run.
Best solution not improving after many generations+
Immediate action
Verify elitism is working (best parent copied unchanged)
Commands
if best_fitness_this_gen < best_fitness_so_far: # something wrong
assert population[0] == best_individual, 'Elitism failed!'
Fix now
Ensure elitism copies the top 1-2 individuals without crossover/mutation.
GA vs Other Optimisation Methods
CriterionGenetic AlgorithmSimulated AnnealingGradient DescentBayesian Optimisation
Search spaceDiscrete/permutation/combinatorialDiscrete/continuousContinuous, differentiableContinuous, expensive evaluations
Gradient requiredNoNoYesNo
Population size50-500Single solutionSingle solution (batches)Typically < 20 points at a time
Convergence guaranteeProbabilistic, no guaranteeProbabilistic, asymptoticConverges to local optimumConverges to global optimum (with many iterations)
ParallelisationEmbarrassingly parallelHard to paralleliseParallelise mini-batchesSequential (acquisition function optimisation)
Performance (fitness calls)pop_size × generations (10k-100k)Generations (1k-10k)Epochs × batch size (10k-1M)Usually < 1000 evaluations
When to useBlack-box, combinatorial, no gradientSimple discrete, few hyperparametersDeep learning, continuous optimisationExpensive evaluations, hyperparameter tuning

Key takeaways

1
Five components
representation, fitness function, selection, crossover, mutation. Getting all five right for your problem is the engineering challenge.
2
Elitism
always carry the best solution forward — prevents regression.
3
For permutation problems (TSP)
order crossover or PMX crossover preserves the permutation constraint.
4
GAs trade optimality for tractability
good solutions to NP-hard problems without exponential time.
5
Hyperparameters
population size (50-500), mutation rate (0.001-0.1), crossover rate (0.6-0.9). Problem-specific tuning required.
6
Track diversity (average pairwise distance) alongside best fitness
premature convergence is the #1 silent killer of GAs in production.
7
Parallelise fitness evaluations
that's where the wall-clock time goes.

Common mistakes to avoid

5 patterns
×

Using uniform crossover for permutation problems

Symptom
Offspring have duplicate cities or miss cities entirely — the route is invalid. The GA never finds a valid tour.
Fix
Use Order Crossover (OX) or Partially Mapped Crossover (PMX) which preserve the permutation constraint. Verify offspring validity after crossover.
×

Setting mutation rate too low or too high

Symptom
Too low: fitness plateaus early (premature convergence). Too high: fitness jumps erratically, never converges.
Fix
Start with mutation rate = 1/(chromosome length). Use adaptive mutation: increase rate when diversity drops below a threshold.
×

Not using elitism

Symptom
The best solution found disappears in the next generation because it wasn't selected for reproduction. Best fitness fluctuates instead of monotonically improving.
Fix
Always copy the top 1-5% of the population unchanged to the next generation. This guarantees monotonic best-so-far fitness.
×

Using roulette wheel selection with negative fitness values

Symptom
Selection probabilities become undefined or biased. The GA fails to converge or behaves randomly.
Fix
Use tournament selection (k=3) which works with any fitness values. Or shift all fitness values to be positive before using roulette (e.g., subtract the minimum).
×

Running too few generations

Symptom
The algorithm stops early, producing a solution far from optimal. Increasing generations magically improves results.
Fix
Set a maximum generation count based on problem complexity. Use a diversity-based stopping criterion: stop when average pairwise distance drops below 5% of initial value.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What are the five components of a genetic algorithm? Explain each briefl...
Q02SENIOR
Why can't you use standard one-point crossover for the Travelling Salesm...
Q03SENIOR
How does elitism prevent regression in genetic algorithms? What are the ...
Q04SENIOR
When would you choose a genetic algorithm over gradient descent? Give a ...
Q05SENIOR
Explain the concept of premature convergence in GAs. How do you detect i...
Q01 of 05JUNIOR

What are the five components of a genetic algorithm? Explain each briefly.

ANSWER
The five components are: 1. Representation — how a solution is encoded (binary, permutation, tree, etc.) 2. Fitness function — evaluates how good a solution is 3. Selection — chooses which individuals reproduce (tournament, roulette, rank) 4. Crossover — combines two parents to create offspring (one-point, OX, PMX) 5. Mutation — randomly perturbs offspring to maintain diversity Each must be chosen to match the problem structure. The representation determines what crossover and mutation operators are valid.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
When should I use genetic algorithms vs simulated annealing vs gradient descent?
02
What is the schema theorem and why is it important for genetic algorithms?
03
How do you choose the representation for a new problem?
04
Can genetic algorithms be used for multi-objective optimisation?
🔥

That's Genetic Algorithms. Mark it forged?

3 min read · try the examples if you haven't

Previous
Singular Value Decomposition — SVD
1 / 2 · Genetic Algorithms
Next
Simulated Annealing — Probabilistic Optimisation