Overfitting - When Fraud Detection Blocks All Transactions
A fraud model's alert rate jumped from 2% to 80% within hours because it had overfit to imbalanced data. Never trust training accuracy alone; this guide shows how to diagnose and fix the problem.
- Overfitting: training error low, validation error high — model memorises noise
- Underfitting: both training and validation errors high — model misses signal
- Use learning curves: training/validation error vs training set size
- Fix overfitting: regularisation, more data, simplify model, early stopping
- Fix underfitting: increase complexity, add features, train longer
- Biggest mistake: adding more data when the model is underfitting — it won't help
Imagine you're studying for a history exam. If you memorise every question from last year's paper word-for-word, you'll ace a re-run but completely blank on any new question — that's overfitting. If you barely glance at the textbook and just guess 'World War 2' for everything, you'll fail because you learned too little — that's underfitting. A great student learns the patterns and principles, not the exact answers. Your ML model needs to do exactly the same thing.
Every ML model you build has one job: make good predictions on data it has never seen before. It sounds simple, but the single biggest reason models fail in production isn't bad algorithms or messy data — it's getting the balance of learning wrong. A model that learns too much from its training data becomes obsessed with noise and quirks that don't generalise. A model that learns too little never captures the real signal in the first place. Both failures have names, both are measurable, and both are fixable once you understand what's actually happening inside the model.
Overfitting and underfitting sit at opposite ends of a spectrum called the bias-variance tradeoff. Understanding this tradeoff is what separates engineers who tune models by intuition from those who tune them systematically. When you know WHY a model overfits, you stop throwing random regularisation at it and start making deliberate, principled decisions about complexity, data size, and training strategy.
By the end of this article you'll be able to plot a learning curve and diagnose whether your model is overfitting or underfitting just by looking at it. You'll have working Python code that deliberately creates both problems and then fixes them — so the concepts stick in your hands, not just your head. And you'll walk away knowing exactly which levers to pull in each scenario.
The Technical Root: Bias vs. Variance
To fix a failing model, you first have to diagnose which way it is failing. Underfitting is caused by high bias: the model makes overly simplistic assumptions about the data. Overfitting is caused by high variance: the model is overly sensitive to small fluctuations in the training set. Learning curves, which plot error against training set size, make both failure modes visible.
The bias-variance tradeoff is fundamental. A model with high bias pays little attention to data and consistently underfits. A model with high variance pays too much attention to data, including noise. Finding the sweet spot is what model tuning is all about.
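For squared-error loss the tradeoff has a standard closed form. Assuming the data follow y = f(x) + ε with zero-mean noise of variance σ² (notation introduced here for illustration, not taken from the text above), the expected error of a fitted model at a point x splits into three terms:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The expectations are taken over training sets and noise. Underfitting is the regime where the first term dominates, overfitting the regime where the second does. No amount of tuning removes the third.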
On a learning curve, three shapes cover most cases:
- Both error curves high and close together: high bias (underfitting)
- Training curve low, validation curve high with a gap: high variance (overfitting)
- Both curves low and close together: good fit
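The three shapes above can be reproduced with scikit-learn's `learning_curve`. This is a minimal sketch under stated assumptions: the data is a synthetic noisy sine wave, the helper names `error_curves` and `diagnose` are hypothetical, and the two thresholds in `diagnose` are illustrative values picked for this dataset, not standard cut-offs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=200)


def error_curves(degree):
    """Return (sizes, train_mse, val_mse) for a polynomial model of this degree."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.2, 1.0, 5),
        cv=5,
        scoring="neg_mean_squared_error",
    )
    # scikit-learn reports negated MSE; flip the sign and average over folds.
    return sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)


def diagnose(train_err, val_err, high_err=0.15, max_gap=0.05):
    """Map a pair of errors to a label. Both thresholds are illustrative."""
    if val_err - train_err > max_gap:
        return "high variance (overfitting)"
    if train_err > high_err:
        return "high bias (underfitting)"
    return "good fit"


for degree in (1, 5, 15):
    sizes, train_mse, val_mse = error_curves(degree)
    print(f"degree={degree}")
    for n, tr, va in zip(sizes, train_mse, val_mse):
        print(f"  n={int(n):3d}  train MSE={tr:.3f}  validation MSE={va:.3f}")
    print("  largest training size ->", diagnose(train_mse[-1], val_mse[-1]))
```

Plotting `train_mse` and `val_mse` against `sizes` gives the usual picture. The printed table carries the same information: for the degree-1 model both errors stay high at every size, while for the degree-15 model the gap is widest at the smallest training sizes.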
Detecting Overfitting Early with Validation Curves
Validation curves show how model performance changes with a hyperparameter (e.g., polynomial degree, tree depth). They help you find the sweet spot before overfitting takes hold. Plotting validation curves during development is far cheaper than discovering the problem in production.
In production systems, we automate this with CI/CD pipelines that generate validation curves for every candidate model. The pipeline rejects any model whose training and validation scores differ by more than a configurable threshold (typically 10-15 percentage points for classification accuracy).
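The rejection rule itself is small. The sketch below is one possible shape for it, assuming accuracy-style scores where higher is better. The function name `passes_gap_gate`, the candidate names, and their scores are all hypothetical.

```python
def passes_gap_gate(train_score, val_score, max_gap=0.10):
    """Return True when the train/validation gap is within max_gap.

    max_gap is the configurable threshold; 0.10 means ten percentage
    points of accuracy and is only an illustrative default.
    """
    return (train_score - val_score) <= max_gap


# Hypothetical candidates: name -> (training accuracy, validation accuracy).
candidates = {
    "shallow_tree": (0.91, 0.89),
    "deep_tree": (0.99, 0.80),
}

accepted = [
    name for name, (train_acc, val_acc) in candidates.items()
    if passes_gap_gate(train_acc, val_acc)
]
print("accepted:", accepted)  # accepted: ['shallow_tree']
```

A real pipeline would compute the two scores from cross-validation rather than hard-code them, and would usually fail the build when `accepted` is empty.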
Here's how to generate a validation curve for polynomial degree using scikit-learn:
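A minimal sketch follows, assuming synthetic noisy-sine data and a plain polynomial regression pipeline. The step names `poly` and `linreg` are arbitrary labels chosen for this example; `poly__degree` is how scikit-learn addresses the `degree` parameter of the `poly` step.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a small, noisy sine sample.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=60)

model = Pipeline([
    ("poly", PolynomialFeatures()),
    ("linreg", LinearRegression()),
])

degrees = list(range(1, 13))
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="poly__degree",   # the hyperparameter being swept
    param_range=degrees,
    cv=5,
    scoring="neg_mean_squared_error",
)

# Scores come back as negated MSE with one column per fold.
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)

for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"degree={d:2d}  train MSE={tr:.3f}  validation MSE={va:.3f}")

best_degree = degrees[int(np.argmin(val_mse))]
print("lowest validation error at degree", best_degree)
```

Training error can only fall as the degree grows, because each model contains the previous one. Validation error falls and then rises again, and the degree at its minimum is the sweet spot for this dataset.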
Fixing Overfitting: Regularisation, Dropout, and Pruning
When you've confirmed overfitting, the toolkit is broad but principled. The most effective levers are:
- L1/L2 Regularisation: Adds a penalty to large weights. L1 drives weights to zero (feature selection), L2 shrinks them.
- Dropout: Randomly drops neurons during training — forces the network to learn redundant representations.
- Early Stopping: Monitors validation loss and stops training when it starts increasing.
- Reduce Model Complexity: Fewer layers, fewer trees, lower polynomial degree.
- Increase Training Data: More data reduces the impact of noise.
- Feature Selection: Remove irrelevant features that introduce noise.
Here's a Python example that applies three of these techniques: L1/L2 regularisation, dropout, and early stopping:
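The sketch below shows each technique in isolation rather than inside one network, so the effect of each is easy to see. It rests on several assumptions: the data is synthetic, L1/L2 are demonstrated with scikit-learn's `Lasso` and `Ridge` on a linear model, dropout is a hand-written NumPy function (the "inverted" variant, which rescales surviving units), and the validation-loss history fed to the early-stopping helper is scripted rather than produced by real training. The names `dropout` and `early_stopping_epoch` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)

# --- 1. L1 / L2 regularisation on a linear model ---
# Synthetic data: 20 features, only the first three carry signal.
X = rng.normal(size=(100, 20))
true_coef = np.zeros(20)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.5, size=100)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty: shrinks every weight
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: zeroes out weak weights

for name, fitted in [("none", plain), ("L2", ridge), ("L1", lasso)]:
    norm = np.linalg.norm(fitted.coef_)
    zeros = int(np.sum(fitted.coef_ == 0.0))
    print(f"{name:>4}: weight norm={norm:.2f}, exact zeros={zeros}")


# --- 2. Dropout ---
def dropout(activations, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


hidden = np.ones((1000, 50))          # stand-in for a hidden layer's output
dropped = dropout(hidden, rate=0.5)
print("fraction of units kept:", float((dropped > 0).mean()))


# --- 3. Early stopping ---
def early_stopping_epoch(val_losses, patience=3):
    """Scan losses until `patience` epochs pass without improvement.

    Returns (best_epoch, best_loss).
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss


# A scripted validation-loss history standing in for a real training run.
history = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9, 1.0]
print("best epoch, loss:", early_stopping_epoch(history))  # (3, 0.5)
```

In a real training loop, early stopping would also restore the weights saved at the best epoch. Deep learning frameworks provide dropout and early stopping as built-in components, so hand-written versions like these are mainly useful for understanding what they do.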
- L2 regularisation adds a penalty proportional to the square of weights — common in neural nets.
- L1 regularisation adds penalty proportional to absolute weight — useful for feature selection.
- Dropout randomly turns off neurons — like training an ensemble of simpler models.
- Early stopping cuts training when validation stops improving — prevents memorisation.
Fixing Underfitting: Complexity, Features, and Training Time
Underfitting means your model is too simple. The solution is to give it more capacity. But adding complexity blindly can tip into overfitting — you need a controlled approach.
- Increase Model Complexity: Use higher-degree polynomials, deeper trees, more layers.
- Engineer Better Features: Interactions, polynomial features, domain-specific aggregations.
- Reduce Regularisation: Too much regularisation can itself cause underfitting.
- Train Longer: Sometimes the model just needs more epochs to converge.
- Reduce Feature Noise: Remove irrelevant features that dilute signal.
Here's a Python workflow that detects underfitting and applies fixes systematically:
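One way to sketch that workflow is shown below, assuming synthetic data with a quadratic signal and a ridge-regularised polynomial pipeline. The helper names `fit_and_score` and `is_underfitting` and the 0.8 R² threshold are illustrative assumptions, not established conventions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for illustration): y depends on x squared.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def fit_and_score(degree, alpha):
    """Fit a ridge polynomial model; return (train R^2, validation R^2)."""
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),
        Ridge(alpha=alpha),
    )
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_val, y_val)


def is_underfitting(train_r2, val_r2, min_r2=0.8):
    """Flag underfitting when both scores are poor. min_r2 is illustrative."""
    return train_r2 < min_r2 and val_r2 < min_r2


experiments = [
    ("baseline: degree 1, alpha=1", 1, 1.0),
    ("more capacity: degree 2, alpha=1", 2, 1.0),
    ("degree 2, alpha=1e4 (over-regularised)", 2, 1e4),
]
for label, degree, alpha in experiments:
    train_r2, val_r2 = fit_and_score(degree, alpha)
    status = "UNDERFITTING" if is_underfitting(train_r2, val_r2) else "ok"
    print(f"{label:<42} train R2={train_r2:.2f}  val R2={val_r2:.2f}  {status}")
```

The linear baseline cannot represent a symmetric curve, so both scores are poor. Adding the squared feature fixes that. The third run has enough capacity but is held back by a very large penalty, which is the "too much regularisation" case from the list above.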
Production-Grade Monitoring and Alerting for Model Drift
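Once your model is in production, its performance can still degrade as data distributions change. A change in the distribution of the inputs is called covariate shift; a change in the relationship between inputs and the target is called concept drift. Either one can leave a once well-fitted model behaving as if it were overfit or underfit, so you need automated monitoring. Three signals are worth tracking: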
- Prediction distribution: Does the average prediction stay stable?
- Feature distribution: Are incoming features within the training range?
- Error rate over time: Is the model's error creeping up?
Here's a Java implementation that logs warnings when drift indicators trigger:
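The same three checks can be sketched in Python, matching the other examples in this article. This is a minimal illustration using only the standard library. The function name `check_drift`, the logger name, and every threshold are assumptions chosen for the example, not values from a real deployment.

```python
import logging
import statistics

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("drift_monitor")  # hypothetical logger name


def check_drift(train_feature, live_feature, train_preds, live_preds,
                live_errors, baseline_error,
                mean_shift_tol=0.1, out_of_range_tol=0.01, error_tol=0.05):
    """Return the names of triggered drift indicators, logging each one.

    live_errors is a sequence of 0/1 flags (1 = wrong prediction).
    All three tolerances are illustrative defaults.
    """
    alerts = []

    # 1. Prediction distribution: has the average prediction moved?
    shift = abs(statistics.fmean(live_preds) - statistics.fmean(train_preds))
    if shift > mean_shift_tol:
        alerts.append("prediction_mean_shift")

    # 2. Feature distribution: are live values inside the training range?
    lo, hi = min(train_feature), max(train_feature)
    outside = sum(1 for v in live_feature if v < lo or v > hi)
    if outside / len(live_feature) > out_of_range_tol:
        alerts.append("feature_out_of_range")

    # 3. Error rate over time: is the live error above the baseline?
    if statistics.fmean(live_errors) - baseline_error > error_tol:
        alerts.append("error_rate_increase")

    for name in alerts:
        log.warning("drift indicator triggered: %s", name)
    return alerts


# Made-up numbers: a stable window, then a drifted one.
stable = check_drift(
    train_feature=[1, 2, 3, 4, 5], live_feature=[2, 3, 4],
    train_preds=[0.02, 0.03, 0.01], live_preds=[0.02, 0.02, 0.03],
    live_errors=[0] * 98 + [1] * 2, baseline_error=0.02,
)
drifted = check_drift(
    train_feature=[1, 2, 3, 4, 5], live_feature=[2, 9, 12],
    train_preds=[0.02, 0.03, 0.01], live_preds=[0.8, 0.9, 0.7],
    live_errors=[1] * 50 + [0] * 50, baseline_error=0.02,
)
print("stable window:", stable)
print("drifted window:", drifted)
```

Comparing means and ranges is the crudest useful check. A production monitor would more likely compare whole distributions per feature, and the error-rate check only works once true labels arrive, which for fraud can lag by days or weeks.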
When a Fraud Detection Model Starts Calling Every Transaction Fraudulent
- Never trust training accuracy alone — especially on imbalanced data.
- Always monitor validation metrics during retraining.
- If the validation curve starts diverging from training, stop and check.
- More complexity isn't automatically better; without more data or regularisation it mostly raises the risk of overfitting.
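The first point is easy to verify with arithmetic. The sketch below uses synthetic labels, not data from the incident: 1,000 transactions with a 2% fraud rate, scored by two hypothetical models. One never raises an alert and the other flags 80% of all traffic. The helper name `summarise` is an assumption.

```python
def summarise(y_true, y_pred):
    """Accuracy, precision, recall and alert rate for labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    flagged = tp + fp
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / flagged if flagged else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "alert_rate": flagged / len(pairs),
    }


# Synthetic ground truth: 20 fraudulent and 980 legitimate transactions.
y_true = [1] * 20 + [0] * 980

# Model A never flags anything.
never_flags = [0] * 1000
# Model B flags every fraud plus 780 legitimate transactions (80% of traffic).
over_flags = [1] * 20 + [1] * 780 + [0] * 200

for name, preds in [("never flags", never_flags), ("over-flags", over_flags)]:
    m = summarise(y_true, preds)
    print(f"{name:>12}: " + "  ".join(f"{k}={v:.3f}" for k, v in m.items()))
```

Model A scores 98% accuracy while catching no fraud at all. Model B catches every fraud but only 2.5% of its alerts are real. Neither failure is visible from accuracy alone, which is why precision, recall, and the alert rate belong on the validation dashboard.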
Key takeaways
Common mistakes to avoid
- Throwing more data at an underfitting model
- Ignoring the validation set until the end of the project
- Over-tuning hyperparameters on the test set (data leakage)
- Using a high-degree polynomial on a small dataset without regularisation
- Assuming more features always improve performance
Interview Questions on This Topic
How do you distinguish between high bias and high variance using a learning curve?
Frequently Asked Questions