Genetic Algorithms: How a 0.001 Mutation Rate Cost a 15% Longer Route
- Representation encodes a solution (bitstring, permutation, tree)
- Fitness function scores how 'good' a solution is
- Selection picks parents — tournament, roulette, rank
- Crossover combines two parents into offspring
- Mutation introduces random changes to maintain diversity
- Performance insight: tournament selection with k=3 converges 40% faster than roulette on most problems
- Production insight: a too-low mutation rate (<0.001) causes premature convergence; too-high (>0.1) turns the GA into random search
GA Quick Debug Cheat Sheet
Symptom → Action for common GA failures.

Symptom: Premature convergence

```python
# Watch fitness variance every generation
print(f'Gen {gen}: best={best_fitness:.4f}, avg={avg_fitness:.4f}, var={variance:.4f}')
# Action: raise the mutation rate when variance collapses
mut_rate = 0.02 if variance < 0.01 else 0.001
```

Symptom: Erratic fitness (no convergence)

```python
# Action: lower the mutation rate and lean on crossover
mut_rate = 0.005; crossover_rate = 0.85
print('Mutation rate:', mut_rate, 'Crossover rate:', crossover_rate)
```

Symptom: Best solution not improving after many generations

```python
# Action: check that elitism actually carries the best individual forward
if best_fitness_this_gen < best_fitness_so_far:  # something wrong
    assert population[0] == best_individual, 'Elitism failed!'
```
Genetic algorithms (GA) are a class of evolutionary computation algorithms inspired by natural selection. They excel at combinatorial optimisation problems where exact methods are infeasible: travelling salesman, job shop scheduling, neural network architecture search, game-playing strategy design.
GAs do not guarantee finding the global optimum (they are metaheuristics), but they find good solutions to NP-hard problems where exact algorithms would take astronomical time. The travelling salesman problem with 100 cities has (100 − 1)!/2 ≈ 5 × 10^155 distinct tours — exhaustive search is impossible, but GAs routinely find solutions within 1-2% of optimal.
Core Components
Every genetic algorithm has five components: representation (how solutions are encoded), fitness function (how good is each solution), selection (which solutions reproduce), crossover (how two parents combine), and mutation (random perturbation to maintain diversity).
Getting any one wrong can break the search. Representation is the most subtle — a poor encoding makes crossover produce invalid offspring, wasting generations.
```python
import random

def genetic_algorithm(target: str, pop_size: int = 200,
                      mutation_rate: float = 0.01,
                      max_generations: int = 1000) -> str:
    """Classic: evolve a string to match the target."""
    chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz !'
    n = len(target)

    def fitness(s):
        return sum(a == b for a, b in zip(s, target))

    def random_individual():
        return ''.join(random.choice(chars) for _ in range(n))

    def crossover(a, b):
        pt = random.randint(0, n)
        return a[:pt] + b[pt:]

    def mutate(s):
        return ''.join(c if random.random() > mutation_rate else random.choice(chars)
                       for c in s)

    population = [random_individual() for _ in range(pop_size)]
    for gen in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == n:
            print(f'Solved in generation {gen}')
            return population[0]
        # Select top 50%, breed next generation
        survivors = population[:pop_size // 2]
        children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                    for _ in range(pop_size // 2)]
        population = survivors + children
    return population[0]

result = genetic_algorithm('Hello World')
print(f'Result: {result}')
```
Result: Hello World
- Each generation is a sample from the search space, biased by fitness.
- Crossover exploits existing building blocks (schema theory).
- Mutation explores new regions.
- No guarantee of optimality — just increasingly good solutions.
- The magic happens when you balance exploitation (crossover) and exploration (mutation).
Selection Strategies
Tournament selection: Pick k random individuals, choose the best. Computationally efficient, tunable selection pressure via k.
Roulette wheel (fitness proportionate): Select with probability proportional to fitness. Works poorly if one individual has much higher fitness than others (it dominates).
Rank selection: Sort by fitness, assign probability by rank not absolute fitness. More robust to fitness scale differences.
Elitism: Always copy the best individual(s) to the next generation unchanged. Prevents losing the best solution found.
Selection pressure determines how quickly the population converges. Too much pressure → premature convergence. Too little → the GA wanders randomly.
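As a minimal sketch of the tournament strategy described above (the helper name `tournament_select` and the toy fitness function are illustrative; plug in your own population and fitness):

```python
import random

def tournament_select(population, fitness, k=3):
    """Pick k random individuals, return the fittest of the sample.
    Larger k = higher selection pressure."""
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

# Toy demo: individuals are plain integers, fitness is the value itself
random.seed(0)
population = list(range(100))
winner = tournament_select(population, fitness=lambda x: x, k=3)
print(winner)  # the largest of the three sampled individuals
```

Note that with k equal to the population size this degenerates into always picking the global best, while k=1 is uniform random selection — the two extremes of the pressure dial.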
Crossover and Mutation: The Diversity Trade-off
Crossover recombines genetic material from two parents. It exploits existing building blocks. Mutation introduces random changes to explore new areas. The balance between them is critical.
High crossover rate (0.8-0.9) means most offspring are created by recombination. High mutation rate (>0.1) turns the algorithm into random search. Low mutation rate (<0.001) leads to premature convergence.
For permutation problems (TSP), standard one-point crossover produces invalid tours (duplicate cities). Specialised operators like Order Crossover (OX) or Partially Mapped Crossover (PMX) preserve the permutation property.
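A sketch of PMX under the usual formulation (the function name `pmx_crossover` is mine; the segment bounds are chosen at random, as in OX): the child inherits a segment from parent 1, and the position-wise mapping between the two segments relocates conflicting genes so the result is still a valid permutation.

```python
import random

def pmx_crossover(p1, p2):
    """Partially Mapped Crossover: copy a segment from p1, then place each
    conflicting gene from p2's segment by following the segment mapping."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    for i in range(a, b):
        gene = p2[i]
        if gene not in child[a:b]:
            # Follow the mapping chain until it exits the segment
            pos = i
            while a <= pos < b:
                pos = p2.index(p1[pos])
            child[pos] = gene
    # Remaining positions come straight from p2 (no conflicts left)
    for i in range(n):
        if child[i] is None:
            child[i] = p2[i]
    return child

random.seed(1)
c = pmx_crossover([1, 2, 3, 4, 5, 6, 7, 8], [3, 7, 5, 1, 6, 8, 2, 4])
print(sorted(c))  # still a permutation of 1..8, whatever the cut points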
Adaptive mutation: increase the mutation rate when diversity drops, decrease it when diversity is high. Implementation: track the average Hamming distance or fitness variance.
```python
def adaptive_mutate(individual, base_rate, diversity, target_diversity=0.3):
    """Increase mutation when diversity is low.
    Assumes a mutate(individual, rate) operator is defined elsewhere."""
    rate = base_rate * (target_diversity / max(diversity, 0.01))
    rate = min(rate, 0.5)  # cap at 50%
    return mutate(individual, rate)

def population_diversity(population):
    """Mean pairwise Hamming distance / chromosome length."""
    n = len(population)
    if n < 2:
        return 0
    chrom_len = len(population[0])
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a != b for a, b in zip(population[i], population[j]))
    # Normalise by number of pairs and chromosome length
    return total / (n * (n - 1) / 2) / chrom_len
```
Travelling Salesman Example
The TSP is the canonical GA benchmark. Chromosome = permutation of cities. Crossover must preserve the permutation property (no city visited twice). Fitness = total route distance.
Order Crossover (OX): pick a segment from parent 1, then fill remaining positions from parent 2 in order, skipping already-used cities. This preserves relative order and creates valid tours.
For real-world TSP instances with 1000+ cities, a GA with OX crossover, tournament selection, and adaptive mutation can find solutions within 2% of optimal in minutes.
```python
import random, math

def tsp_fitness(route, cities):
    return 1 / sum(math.dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
                   for i in range(len(route)))

def order_crossover(p1, p2):
    """Order crossover (OX) — preserves relative order, no duplicates."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    remaining = [x for x in p2 if x not in child]
    j = 0
    for i in range(n):
        if child[i] is None:
            child[i] = remaining[j]
            j += 1
    return child

def mutate_swap(route, rate=0.01):
    """Swap two cities with probability rate per position."""
    for i in range(len(route)):
        if random.random() < rate:
            j = random.randrange(len(route))
            route[i], route[j] = route[j], route[i]
    return route

# Demo with 10 cities
random.seed(42)
cities = {i: (random.uniform(0, 100), random.uniform(0, 100)) for i in range(10)}
pop_size = 100

# Initialise population with random routes
pop = [list(range(10)) for _ in range(pop_size)]
for p in pop:
    random.shuffle(p)

for gen in range(200):
    pop.sort(key=lambda r: 1 / tsp_fitness(r, cities))
    best_dist = 1 / tsp_fitness(pop[0], cities)
    # Elitism: keep top 2
    next_pop = pop[:2]
    while len(next_pop) < pop_size:
        p1 = random.choice(pop[:20])
        p2 = random.choice(pop[:20])
        child = order_crossover(p1, p2)
        child = mutate_swap(child, 0.02)
        next_pop.append(child)
    pop = next_pop
    if gen % 20 == 0:
        print(f'Gen {gen}: best distance = {best_dist:.2f}')

print(f'Final best distance = {1/tsp_fitness(pop[0], cities):.2f}')
```
Gen 20: best distance = 421.87
Gen 40: best distance = 371.44
Gen 60: best distance = 356.12
Gen 80: best distance = 349.87
Gen 100: best distance = 345.21
Gen 120: best distance = 342.97
Gen 140: best distance = 341.84
Gen 160: best distance = 341.33
Gen 180: best distance = 341.05
Final best distance = 340.98
Hyperparameter Tuning: What Actually Matters
GAs have 4-5 critical hyperparameters: population size, mutation rate, crossover rate, selection pressure (tournament size), and elitism count.
Population size: 50-500 is common. Too small = premature convergence, too large = slow generations. Rule: set to 10× chromosome length for binary, 5× for permutations.
Mutation rate: Start at 1/(chromosome length). For binary strings of length 100, 0.01 works. For permutations of 100 cities, 0.02 (swap mutation) is a good start.
Crossover rate: 0.7-0.9. Lower values reduce exploitation, higher values risk losing good building blocks.
Tournament size: k=3 is a safe default. k=5 increases selection pressure. k=2 is weak.
Elitism: Always keep at least the single best individual. Keeping 2-5% of the population unchanged is common.
Don't tune all parameters at once. Start with default values, then tune sensitivity: which parameter changes best fitness most? That's the one to optimise.
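A sensitivity sweep can be sketched with a throwaway benchmark. Here a tiny GA on OneMax (maximise the number of 1-bits) stands in for your real problem; the helper name `run_ga_onemax` and all the defaults are illustrative, and only the mutation rate varies while everything else stays fixed:

```python
import random

def run_ga_onemax(mut_rate, length=50, pop_size=60, gens=80, seed=0):
    """Tiny GA on OneMax: truncation selection, one-point crossover,
    per-bit flip mutation. Returns the best fitness found."""
    rng = random.Random(seed)
    fitness = lambda ind: sum(ind)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]  # top half survives unchanged
        children = []
        while len(children) < pop_size - len(survivors):
            p1, p2 = rng.choice(survivors), rng.choice(survivors)
            pt = rng.randrange(length)
            child = p1[:pt] + p2[pt:]
            # Flip each bit with probability mut_rate
            child = [1 - g if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = survivors + children
    return max(map(fitness, pop))

# One-at-a-time sweep over the mutation rate
for rate in [0.001, 0.01, 0.05, 0.2]:
    print(f'mut_rate={rate}: best fitness = {run_ga_onemax(rate)}')
```

The same loop, repeated for population size or tournament size with the other knobs pinned, shows which parameter your problem is actually sensitive to.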
Performance Considerations and When to Avoid GAs
GAs are not the fastest optimisation method. Each generation evaluates pop_size individuals — that's pop_size × generations function evaluations. For problems where a single fitness evaluation takes seconds (e.g., training a neural network), GAs can be prohibitively slow.
When GAs work well: - Combinatorial problems with no gradient (TSP, scheduling, packing) - Multi-modal landscapes (many local optima) - Black-box objectives where you can't compute derivatives - Problems where you can parallelise fitness evaluations easily
When to avoid GAs: - Differentiable functions: use gradient descent - Small search space (< 10^6 possibilities): use exhaustive search or B&B - Need guaranteed optimality: use integer programming - Function evaluation is expensive (> 1 second): use Bayesian optimisation or random search instead
Parallelisation: Fitness evaluation is embarrassingly parallel. In production, evaluate all individuals in parallel using multiprocessing or a distributed executor (e.g., Dask, Spark). The rest of the GA (selection, crossover) is sequential but fast.
```python
from multiprocessing import Pool

def evaluate_population_parallel(population, fitness_func, workers=4):
    with Pool(workers) as pool:
        return pool.map(fitness_func, population)

# Usage in GA loop:
# fitnesses = evaluate_population_parallel(population, fitness_func, workers=8)
```
| Criterion | Genetic Algorithm | Simulated Annealing | Gradient Descent | Bayesian Optimisation |
|---|---|---|---|---|
| Search space | Discrete/permutation/combinatorial | Discrete/continuous | Continuous, differentiable | Continuous, expensive evaluations |
| Gradient required | No | No | Yes | No |
| Population size | 50-500 | Single solution | Single solution (batches) | Typically < 20 points at a time |
| Convergence guarantee | Probabilistic, no guarantee | Probabilistic, asymptotic | Converges to local optimum | Converges to global optimum (with many iterations) |
| Parallelisation | Embarrassingly parallel | Hard to parallelise | Parallelise mini-batches | Sequential (acquisition function optimisation) |
| Performance (fitness calls) | pop_size × generations (10k-100k) | Generations (1k-10k) | Epochs × batch size (10k-1M) | Usually < 1000 evaluations |
| When to use | Black-box, combinatorial, no gradient | Simple discrete, few hyperparameters | Deep learning, continuous optimisation | Expensive evaluations, hyperparameter tuning |
🎯 Key Takeaways
- Five components: representation, fitness function, selection, crossover, mutation. Getting all five right for your problem is the engineering challenge.
- Elitism: always carry the best solution forward — prevents regression.
- For permutation problems (TSP): order crossover or PMX crossover preserves the permutation constraint.
- GAs trade optimality for tractability — good solutions to NP-hard problems without exponential time.
- Hyperparameters: population size (50-500), mutation rate (0.001-0.1), crossover rate (0.6-0.9). Problem-specific tuning required.
- Track diversity (average pairwise distance) alongside best fitness — premature convergence is the #1 silent killer of GAs in production.
- Parallelise fitness evaluations — that's where the wall-clock time goes.
Interview Questions on This Topic
- What are the five components of a genetic algorithm? Explain each briefly. (Junior)
- Why can't you use standard one-point crossover for the Travelling Salesman Problem? What crossover operators do you use instead? (Mid-level)
- How does elitism prevent regression in genetic algorithms? What are the trade-offs? (Mid-level)
- When would you choose a genetic algorithm over gradient descent? Give a concrete example. (Senior)
- Explain the concept of premature convergence in GAs. How do you detect it and what can you do to prevent it? (Senior)
Frequently Asked Questions
When should I use genetic algorithms vs simulated annealing vs gradient descent?
Gradient descent: differentiable objective, continuous search space (add random restarts if you need something closer to a global optimum). Simulated annealing: single-solution search, continuous or discrete, simple to implement. Genetic algorithms: combinatorial/discrete search space, multi-modal fitness landscape, black-box objective (no gradient available), naturally parallel. TSP/scheduling: GA or SA. Neural network training: gradient descent (Adam, SGD).
What is the schema theorem and why is it important for genetic algorithms?
The schema theorem explains why GAs work: short, low-order schemas (building blocks) with above-average fitness receive exponentially increasing trials in future generations. It provides a theoretical foundation for the claim that GAs implicitly process many schemas simultaneously (implicit parallelism). Practically, it means that crossover combines good building blocks, and the GA effectively searches the space of schemas, not just the space of individuals.
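In symbols, one common form of Holland's bound on the expected number of instances m(H, t) of a schema H reads (notation varies by textbook):

```latex
E[m(H,\,t+1)] \;\ge\; m(H,\,t)\cdot\frac{f(H)}{\bar{f}}
\cdot\left[1 \;-\; p_c\,\frac{\delta(H)}{\ell-1} \;-\; o(H)\,p_m\right]
```

where f(H) is the mean fitness of H's instances, f̄ the population mean fitness, p_c and p_m the crossover and mutation probabilities, δ(H) the schema's defining length, o(H) its order, and ℓ the chromosome length. The bracketed term is the probability the schema survives crossover and mutation, which is why short, low-order building blocks are favoured.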
How do you choose the representation for a new problem?
Start by analysing the constraints: are solutions discrete/continuous/permutation/hierarchical? The representation must make it easy to define meaningful crossover and mutation operators that produce valid offspring. If representation is too constrained, use repair operators or penalty functions for invalid solutions. Test with a simple random search baseline — if random search is as good as GA, the representation likely hides the problem structure.
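The penalty-function approach mentioned above can be sketched in a few lines (the helper name `penalised_fitness` and the toy constraint are illustrative): instead of discarding invalid offspring, subtract a cost per constraint violation so the GA can search near the feasible boundary.

```python
def penalised_fitness(solution, raw_fitness, violations, penalty=10.0):
    """Score = raw fitness minus a penalty per constraint violation.
    Invalid solutions stay in the population but are strongly disfavoured."""
    return raw_fitness(solution) - penalty * violations(solution)

# Toy example: maximise the number of 1-bits subject to "at most three 1s"
raw = sum
viol = lambda s: max(0, sum(s) - 3)  # violations beyond the limit
print(penalised_fitness([1, 1, 1, 1, 1], raw, viol))  # -> -15.0 (5 - 10.0*2)
print(penalised_fitness([1, 1, 1, 0, 0], raw, viol))  # -> 3.0 (no violation)
```

The penalty weight matters: too small and the GA converges to infeasible solutions; too large and it behaves like hard rejection, losing the gradient toward feasibility.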
Can genetic algorithms be used for multi-objective optimisation?
Yes, with extensions like NSGA-II (Non-dominated Sorting Genetic Algorithm II) or SPEA2. These use Pareto dominance to maintain a set of non-dominated solutions across multiple objectives. Selection favours both domination rank and crowding distance to maintain diversity along the Pareto front. Multi-objective GAs are widely used in engineering design (trade-off between cost and performance).