Junior 3 min · March 06, 2026

Overfitting - When Fraud Detection Blocks All Transactions

Fraud alert rate jumped from 2% to 80% in hours because the model overfit imbalanced data. Never trust training accuracy alone; use our production debug guide.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Overfitting: training error low, validation error high — model memorises noise
  • Underfitting: both training and validation errors high — model misses signal
  • Use learning curves: training/validation error vs training set size
  • Fix overfitting: regularisation, more data, simplify model, early stopping
  • Fix underfitting: increase complexity, add features, train longer
  • Biggest mistake: adding more data when the model is underfitting — it won't help
Plain-English First

Imagine you're studying for a history exam. If you memorise every question from last year's paper word-for-word, you'll ace a re-run but completely blank on any new question — that's overfitting. If you barely glance at the textbook and just guess 'World War 2' for everything, you'll fail because you learned too little — that's underfitting. A great student learns the patterns and principles, not the exact answers. Your ML model needs to do exactly the same thing.

Every ML model you build has one job: make good predictions on data it has never seen before. It sounds simple, but the single biggest reason models fail in production isn't bad algorithms or messy data — it's getting the balance of learning wrong. A model that learns too much from its training data becomes obsessed with noise and quirks that don't generalise. A model that learns too little never captures the real signal in the first place. Both failures have names, both are measurable, and both are fixable once you understand what's actually happening inside the model.

Overfitting and underfitting sit at opposite ends of a spectrum called the bias-variance tradeoff. Understanding this tradeoff is what separates engineers who tune models by intuition from those who tune them systematically. When you know WHY a model overfits, you stop throwing random regularisation at it and start making deliberate, principled decisions about complexity, data size, and training strategy.

By the end of this article you'll be able to plot a learning curve and diagnose whether your model is overfitting or underfitting just by looking at it. You'll have working Python code that deliberately creates both problems and then fixes them — so the concepts stick in your hands, not just your head. And you'll walk away knowing exactly which levers to pull in each scenario.

The Technical Root: Bias vs. Variance

To fix a failing model, you must first diagnose why it fails. Underfitting is caused by high bias: the model makes overly simplistic assumptions about the data. Overfitting is caused by high variance: the model is overly sensitive to small fluctuations in the training set. In a production environment, we use learning curves (error vs. training set size) to visualise which of the two is at play.

The bias-variance tradeoff is fundamental. A model with high bias pays little attention to data and consistently underfits. A model with high variance pays too much attention to data, including noise. Finding the sweet spot is what model tuning is all about.
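For squared-error loss the tradeoff can be written down exactly. Assuming the data are generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and taking expectations over random training sets and noise, the expected error of a fitted model $\hat f$ at a point $x$ splits into three terms:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Underfitting inflates the first term, overfitting inflates the second, and no model can remove the third.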

In practice, you'll see three patterns:
  • Both curves high and close together: high bias (underfitting)
  • Training curve low, validation curve high with a gap: high variance (overfitting)
  • Both curves low and close together: good fit
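These three patterns can be turned into a rough rule of thumb. The sketch below is illustrative only: `diagnose_fit`, `acceptable_err` and `max_gap` are hypothetical names, and the thresholds must be chosen per problem.

```python
def diagnose_fit(train_err, val_err, acceptable_err, max_gap):
    """Map final learning-curve errors onto the three patterns.

    acceptable_err and max_gap are hypothetical, problem-specific thresholds.
    """
    gap = val_err - train_err
    if gap > max_gap:
        return "high variance (overfitting)"  # validation error well above training error
    if val_err > acceptable_err:
        return "high bias (underfitting)"     # curves close together but both high
    return "good fit"                         # curves close together and both low


print(diagnose_fit(0.30, 0.32, acceptable_err=0.10, max_gap=0.05))  # high bias (underfitting)
print(diagnose_fit(0.02, 0.25, acceptable_err=0.10, max_gap=0.05))  # high variance (overfitting)
print(diagnose_fit(0.06, 0.08, acceptable_err=0.10, max_gap=0.05))  # good fit
```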
model_diagnostics.py
```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# io.thecodeforge approach: Systematic Diagnostic
def generate_learning_curve(model, X, y):
    train_sizes, train_scores, test_scores = learning_curve(
        model, X, y, cv=5, scoring='neg_mean_squared_error',
        train_sizes=np.linspace(0.1, 1.0, 5)
    )

    # Average over the CV folds; scores are negated MSE, so flip the sign
    train_mean = -np.mean(train_scores, axis=1)
    test_mean = -np.mean(test_scores, axis=1)

    return train_sizes, train_mean, test_mean

# Illustrative non-linear data: y = 0.5x^2 + x + 2 + noise
rng = np.random.default_rng(42)
X = 6 * rng.random((200, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(0, 1, 200)

# Underfitting: Linear model on non-linear data
underfit_model = LinearRegression()

# Overfitting: High-degree polynomial
overfit_model = Pipeline([
    ("poly_features", PolynomialFeatures(degree=15, include_bias=False)),
    ("std_scaler", StandardScaler()),
    ("lin_reg", LinearRegression()),
])

# (sizes, train MSE, validation MSE) for each model, ready to plot
curves = {name: generate_learning_curve(m, X, y)
          for name, m in [("underfit", underfit_model), ("overfit", overfit_model)]}
print("[Learning Curve Data Generated]")
```
Output
[Learning Curve Data Generated]
The Convergence Trap
With underfitting, the training and validation curves converge quickly but at a high error rate. Adding more data won't help; you need a more complex model.
Production Insight
A common production mistake is training a linear model on non-linear data and then throwing more data at it.
The curves will converge at high error — clear high bias.
Rule: if both errors are high and close, increase model complexity, not data size.
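You can check this rule directly. The sketch below uses synthetic quadratic data; the generating function, the noise level and the sample sizes are all illustrative assumptions. The linear model's validation error stays on a plateau however much data it sees. A degree-2 model trained on the same data drops to roughly the noise floor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    # Illustrative quadratic target with Gaussian noise (sd 0.5)
    x = rng.uniform(-3, 3, size=(n, 1))
    y = x[:, 0] ** 2 + rng.normal(0, 0.5, size=n)
    return x, y

X_val, y_val = make_data(2000)
results = {}
for n in (100, 1000, 10000):
    X_tr, y_tr = make_data(n)
    linear = LinearRegression().fit(X_tr, y_tr)
    quad = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X_tr, y_tr)
    results[n] = (mean_squared_error(y_val, linear.predict(X_val)),
                  mean_squared_error(y_val, quad.predict(X_val)))
    print(f"n={n:>5}  linear val MSE={results[n][0]:.2f}  "
          f"quadratic val MSE={results[n][1]:.2f}")
```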
Key Takeaway
Bias trades off against variance.
Underfitting = high bias (curves high and close).
Overfitting = high variance (curves diverge).

Detecting Overfitting Early with Validation Curves

Validation curves show how model performance changes with a hyperparameter (e.g., polynomial degree, tree depth). They help you find the sweet spot before overfitting takes hold. Plotting validation curves during development is far cheaper than discovering the problem in production.

In production systems, we automate this with CI/CD pipelines that generate validation curves for every candidate model. The pipeline rejects any model whose gap between training and validation scores exceeds a configurable threshold (typically 0.10 to 0.15 in classification accuracy).

Here's how to generate a validation curve for polynomial degree using scikit-learn:

validation_curve.py
```python
from sklearn.model_selection import validation_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

def generate_validation_curve(X, y, param_range):
    model = Pipeline([
        ("poly", PolynomialFeatures(include_bias=False)),
        ("reg", LinearRegression()),
    ])
    train_scores, test_scores = validation_curve(
        model, X, y,
        param_name="poly__degree",
        param_range=param_range,
        cv=5,
        scoring="neg_mean_squared_error"
    )
    train_mean = -np.mean(train_scores, axis=1)
    test_mean = -np.mean(test_scores, axis=1)
    return param_range, train_mean, test_mean
```
Automated Thresholding
In our CI/CD pipelines, we set a max allowed gap of 0.15 (15%) between training and validation accuracy. Any model exceeding this is automatically rejected and logged for review.
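A minimal sketch of such a gate as a Python function that a CI step could call. `passes_overfit_gate`, the model IDs and the logger name are hypothetical. Only the 0.15 threshold comes from the text.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("overfit_gate")

MAX_GAP = 0.15  # max allowed (training accuracy - validation accuracy)

def passes_overfit_gate(model_id, train_acc, val_acc, max_gap=MAX_GAP):
    """Return True if the model may proceed; log and reject it otherwise."""
    gap = train_acc - val_acc
    if gap > max_gap:
        logger.warning("Rejecting %s: train/val accuracy gap %.3f exceeds %.2f",
                       model_id, gap, max_gap)
        return False
    logger.info("%s passed the overfitting gate (gap %.3f)", model_id, gap)
    return True


print(passes_overfit_gate("candidate-a", train_acc=0.99, val_acc=0.78))  # False (gap 0.21)
print(passes_overfit_gate("candidate-b", train_acc=0.91, val_acc=0.86))  # True (gap 0.05)
```

In a real CI job you would typically exit with a non-zero status when the gate returns False, so the pipeline stops before deployment.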
Production Insight
Validation curves are not just for research — they catch overfitting before deployment.
Automate the gap check: if training accuracy - validation accuracy > 0.15, the model likely overfits.
Rule: reject models with high variance early, before they hit production.
Key Takeaway
Validation curves reveal the optimal hyperparameter range.
Stop before the validation error starts rising.
Automate the gap threshold check in your MLOps pipeline.
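The same idea applies to any complexity hyperparameter. Here is a minimal sketch for decision-tree depth; the sine-wave data and the depth range are illustrative assumptions. The deepest trees drive training error towards zero while validation error climbs back up, and the depth with the lowest mean validation MSE is where you stop.

```python
import numpy as np
from sklearn.model_selection import KFold, validation_curve
from sklearn.tree import DecisionTreeRegressor

# Illustrative data: a sine wave plus noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)

depths = np.arange(1, 16)
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
train_mse = -train_scores.mean(axis=1)  # negated MSE -> MSE
val_mse = -val_scores.mean(axis=1)

best_depth = int(depths[np.argmin(val_mse)])
print("Best max_depth:", best_depth)
print("Train/val gap at max_depth=15:", round(val_mse[-1] - train_mse[-1], 3))
```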

Fixing Overfitting: Regularisation, Dropout, and Pruning

When you've confirmed overfitting, the toolkit is broad but principled. The most effective levers are:

  • L1/L2 Regularisation: Adds a penalty to large weights. L1 drives weights to zero (feature selection), L2 shrinks them.
  • Dropout: Randomly drops neurons during training — forces the network to learn redundant representations.
  • Early Stopping: Monitors validation loss and stops training when it starts increasing.
  • Reduce Model Complexity: Fewer layers, fewer trees, lower polynomial degree.
  • Increase Training Data: More data reduces the impact of noise.
  • Feature Selection: Remove irrelevant features that introduce noise.
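Here is the L1/L2 difference in a minimal scikit-learn sketch. It uses synthetic data in which only two of ten features carry signal; the data and the `alpha` values are illustrative. Lasso (L1) sets the irrelevant coefficients exactly to zero, while Ridge (L2) only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Illustrative data: 10 features, only the first two actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=200)

lasso = Lasso(alpha=0.2).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty

print("L1 (Lasso) coefficients:", np.round(lasso.coef_, 2))
print("L2 (Ridge) coefficients:", np.round(ridge.coef_, 2))
print("Exact zeros, Lasso:", int(np.sum(lasso.coef_ == 0)),
      "Ridge:", int(np.sum(ridge.coef_ == 0)))
```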

Here's a Keras example that combines three of these techniques: L2 regularisation, dropout, and early stopping:

io/thecodeforge/fix_overfitting.py
```python
import numpy as np
from tensorflow import keras

def build_regularised_model(input_dim):
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        # L2 penalty on each hidden layer's weights
        keras.layers.Dense(64, activation='relu',
                           kernel_regularizer=keras.regularizers.l2(0.01)),
        keras.layers.Dropout(0.5),  # active only during training
        keras.layers.Dense(32, activation='relu',
                           kernel_regularizer=keras.regularizers.l2(0.01)),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# Placeholder regression data so the script runs; replace with your own split
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)).astype('float32')
y = (X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.1, 500)).astype('float32')
X_train, X_val, y_train, y_val = X[:400], X[400:], y[:400], y[400:]

# Early stopping: halt when val_loss hasn't improved for 10 epochs,
# then roll back to the weights from the best epoch
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True
)

model = build_regularised_model(X_train.shape[1])
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200, callbacks=[early_stop], verbose=0)
print("Epochs run:", len(history.history['loss']))
```
Regularisation as a Constraint
  • L2 regularisation adds a penalty proportional to the square of weights — common in neural nets.
  • L1 regularisation adds penalty proportional to absolute weight — useful for feature selection.
  • Dropout randomly turns off neurons — like training an ensemble of simpler models.
  • Early stopping cuts training when validation stops improving — prevents memorisation.
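Below is a minimal NumPy sketch of the dropout mechanism. It uses the "inverted dropout" formulation that Keras's `Dropout` layer also follows: kept units are scaled by 1/(1 - rate) during training, and nothing is dropped at inference. The `dropout` function itself is a hypothetical illustration, not a library call.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a random fraction `rate` of units during training
    and scale survivors by 1/(1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return activations  # at inference time every unit is kept
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
h = np.ones((4, 8))
out = dropout(h, rate=0.5, training=True, rng=rng)
print(out)  # each entry is either 0.0 or 2.0
print(dropout(h, rate=0.5, training=False, rng=rng).mean())  # 1.0
```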
Production Insight
Dropout should typically be 0.2-0.5 — too high and the model underfits.
L2 regularisation coefficient (λ) is often tuned via cross-validation; 0.01 is a common starting point.
Rule: always combine regularisation with early stopping in production pipelines.
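Tuning λ by cross-validation can be sketched with scikit-learn's `RidgeCV`, where the L2 strength is called `alpha`. The synthetic data and the candidate grid are illustrative assumptions. Note that scikit-learn's `alpha` and Keras's `l2(λ)` scale the penalty differently, so a value tuned in one framework does not transfer directly to the other.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Illustrative data: 20 features, 5 with non-zero true weights
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
true_w = np.zeros(20)
true_w[:5] = [2.0, -1.0, 0.5, 1.5, -2.0]
y = X @ true_w + rng.normal(0, 1.0, size=300)

alphas = np.logspace(-3, 3, 13)  # candidate penalty strengths
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("Chosen alpha:", model.alpha_)
```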
Key Takeaway
Regularisation reduces variance.
Dropout, L2, and early stopping are the three pillars.
Always validate on a holdout set after applying them.
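The patience logic behind early stopping, as configured in the Keras example with `EarlyStopping(patience=10, restore_best_weights=True)`, can be sketched in plain Python. This is a simplified illustration, not Keras's actual implementation; for example, it ignores Keras's `min_delta` and `baseline` options. `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (best_epoch, stop_epoch), both 0-based, for a sequence of
    per-epoch validation losses. stop_epoch is the epoch after which training halts."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset patience
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # restore weights from best_epoch
    return best_epoch, len(val_losses) - 1


# Validation loss falls, bottoms out at epoch 3, then rises (illustrative numbers)
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.68, 0.7, 0.75]
print(early_stopping_epoch(losses, patience=3))  # (3, 6)
```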

Fixing Underfitting: Complexity, Features, and Training Time

Underfitting means your model is too simple. The solution is to give it more capacity. But adding complexity blindly can tip into overfitting — you need a controlled approach.

Common fixes for underfitting:
  • Increase Model Complexity: Use higher-degree polynomials, deeper trees, more layers.
  • Engineer Better Features: Interactions, polynomial features, domain-specific aggregations.
  • Reduce Regularisation: Too much regularisation can itself cause underfitting.
  • Train Longer: Sometimes the model just needs more epochs to converge.
  • Reduce Feature Noise: Remove irrelevant features that dilute signal.
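"Engineer better features" often means giving a simple model a term it cannot build itself. In the illustrative data below, the target depends only on the product of two inputs. A linear model explains almost none of the variance until an interaction term is added, and the algorithm never changes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative target that depends only on the product of two features
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X[:, 0] * X[:, 1] + rng.normal(0, 0.1, size=500)

plain = LinearRegression().fit(X, y)
X_int = PolynomialFeatures(degree=2, interaction_only=True,
                           include_bias=False).fit_transform(X)  # x1, x2, x1*x2
with_interaction = LinearRegression().fit(X_int, y)

print("R^2 without interaction:", round(plain.score(X, y), 3))
print("R^2 with interaction:   ", round(with_interaction.score(X_int, y), 3))
```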

Here's a Python workflow that compares a linear baseline against a polynomial-feature model and keeps the richer model only if it clearly reduces cross-validated error:

io/thecodeforge/fix_underfitting.py
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

def diagnose_and_fix_underfitting(X, y):
    # Baseline linear model: cross-validated MSE (sign flipped from neg MSE)
    linear_model = LinearRegression()
    linear_score = -cross_val_score(linear_model, X, y, cv=5,
                                    scoring='neg_mean_squared_error').mean()

    # Try polynomial features (degree 3)
    poly_model = Pipeline([
        ('poly', PolynomialFeatures(degree=3, include_bias=False)),
        ('scaler', StandardScaler()),
        ('reg', LinearRegression())
    ])
    poly_score = -cross_val_score(poly_model, X, y, cv=5,
                                  scoring='neg_mean_squared_error').mean()

    # Accept the richer model only if it cuts CV error by at least 20%
    if poly_score < linear_score * 0.8:
        print("Underfitting fixed with polynomial features. Using poly model.")
        return poly_model.fit(X, y)  # refit on all data before returning
    else:
        print("Still underfitting. Try feature engineering or a more complex algorithm.")
        return None
```
The Complexity Trap
Don't jump from a linear model straight to a 10-layer neural net. Increase complexity step by step and validate at each step. Otherwise you'll overshoot and end up overfitting.
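One way to make "step by step" concrete is to raise the polynomial degree one notch at a time and stop once cross-validated error stops improving meaningfully. This is a sketch under illustrative assumptions: the 5% minimum-improvement rule, the degree cap and the quadratic data are all choices made for the example, and `escalate_degree` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def escalate_degree(X, y, max_degree=6, min_improvement=0.05):
    """Raise polynomial degree one step at a time; stop when CV MSE improves
    by less than min_improvement (relative). Thresholds are illustrative."""
    best_degree, best_mse = None, np.inf
    for degree in range(1, max_degree + 1):
        model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                              StandardScaler(), LinearRegression())
        mse = -cross_val_score(model, X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        print(f"degree={degree}  CV MSE={mse:.3f}")
        if mse > best_mse * (1 - min_improvement):
            break  # not enough gain: keep the simpler model
        best_degree, best_mse = degree, mse
    return best_degree


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=300)
print("Chosen degree:", escalate_degree(X, y))
```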
Production Insight
In production, I've seen teams add neural networks to solve underfitting when polynomial features would have done the job.
Start with the simplest fix (polynomial features, interaction terms) and escalate.
Rule: underfitting is often a feature engineering problem, not a model problem.
Key Takeaway
Underfitting = not enough capacity.
Fix: more features, more complexity, less regularisation.
Validate with cross-validation after each change.

Production-Grade Monitoring and Alerting for Model Drift

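Once your model is in production, its performance can still degrade as data distributions change, even if it generalised well when it was trained. A change in the input distribution is called covariate shift; a change in the relationship between inputs and target is called concept drift. You need automated monitoring.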

At TheCodeForge, we deploy a model health monitor that checks:
  • Prediction distribution: Does the average prediction stay stable?
  • Feature distribution: Are incoming features within the training range?
  • Error rate over time: Is the model's error creeping up?
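The feature-distribution check can be sketched in Python with SciPy's two-sample Kolmogorov-Smirnov test plus a simple out-of-range count. The p-value threshold, the feature and the simulated shift are illustrative assumptions, and `feature_drift_report` is a hypothetical helper. With very large samples even tiny, harmless shifts become statistically significant, so you may also want to threshold the KS statistic itself.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift_report(train_col, live_col, p_threshold=0.01):
    """Compare one feature's live distribution with its training distribution."""
    result = ks_2samp(train_col, live_col)
    lo, hi = np.min(train_col), np.max(train_col)
    out_of_range = np.mean((live_col < lo) | (live_col > hi))
    return {
        "ks_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "drifted": bool(result.pvalue < p_threshold),
        "fraction_outside_training_range": float(out_of_range),
    }


rng = np.random.default_rng(0)
train_amounts = rng.normal(100, 15, size=5000)  # illustrative training feature
live_amounts = rng.normal(120, 15, size=1000)   # same feature after a shift
report = feature_drift_report(train_amounts, live_amounts)
print(report)
```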

Here's a Java implementation of the error-rate check that logs a warning when the current error rate rises too far above its baseline:

io/thecodeforge/ml/ModelHealthMonitor.java
```java
package io.thecodeforge.ml;

import java.util.logging.Logger;

public class ModelHealthMonitor {
    private static final Logger logger = Logger.getLogger(ModelHealthMonitor.class.getName());
    private final double baselineErrorRate;
    private final double driftThreshold = 0.2; // alert on a >20% relative increase

    public ModelHealthMonitor(double baselineErrorRate) {
        if (baselineErrorRate <= 0) {
            throw new IllegalArgumentException("Baseline error rate must be positive");
        }
        this.baselineErrorRate = baselineErrorRate;
    }

    public void checkCurrentErrorRate(double currentErrorRate) {
        // Relative change versus the error rate measured at deployment time
        double relativeChange = (currentErrorRate - baselineErrorRate) / baselineErrorRate;
        if (relativeChange > driftThreshold) {
            logger.warning("CRITICAL: Model error rate increased by " +
                           Math.round(relativeChange * 100) + "%. Overfitting? Underfitting? Check learning curves.");
        } else {
            logger.info("Model error rate within bounds.");
        }
    }

    public static void main(String[] args) {
        ModelHealthMonitor monitor = new ModelHealthMonitor(0.10);
        monitor.checkCurrentErrorRate(0.18);
    }
}
```
Output
CRITICAL: Model error rate increased by 80%. Overfitting? Underfitting? Check learning curves.
Retrain Cadence
Most models need retraining when error increases by more than 15-20% from baseline. Automate retraining triggers, but always validate on a holdout set before deploying the new model.
Production Insight
Drift detection is not optional — it's a production requirement.
I've seen teams lose millions because they didn't monitor for covariate shift.
Rule: deploy a monitoring endpoint that returns current error rates and distribution stats.
Key Takeaway
Models degrade in production.
Monitor error rates and feature distributions.
Alert when the error rate rises more than 20% above baseline, and trigger retraining.
Production incident · Post-mortem · Severity: high

When a Fraud Detection Model Starts Calling Every Transaction Fraudulent

Symptom
Fraud alert rate jumped from 2% to 80% in hours. Customer complaints flooded in. The model's training accuracy was 99.9%.
Assumption
Higher training accuracy means better performance. The team assumed a more complex model would catch more fraud.
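A few lines are enough to show why accuracy misleads on imbalanced data. With 2% fraud, a model that flags nothing is 98% accurate and catches no fraud at all. The numbers below are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 1,000 transactions, 2% fraudulent (illustrative)
y_true = np.zeros(1000, dtype=int)
y_true[:20] = 1

# A useless "model" that never flags anything
y_pred = np.zeros(1000, dtype=int)

print("Accuracy:", accuracy_score(y_true, y_pred))    # 0.98
print("Fraud recall:", recall_score(y_true, y_pred))  # 0.0
```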
Root cause
The model was a deep neural network trained on historical data with a massive class imbalance. During retraining, the team added more layers and trained for 500 epochs without early stopping. The model memorised the exact fraud patterns from the training set, including noise, but failed on new transaction patterns.
Fix
Rolled back to the previous, simpler model. Then applied L2 regularisation (λ=0.01), added dropout (0.5), and used early stopping with a patience of 10 epochs based on validation AUC. Retained only the top 50 features. The fraud alert rate dropped back to 2%.
Key lesson
  • Never trust training accuracy alone — especially on imbalanced data.
  • Always monitor validation metrics during retraining.
  • If the validation curve starts diverging from training, stop and check.
  • More complexity isn't better; it's just more dangerous.
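Here is a sketch of the monitoring part of that fix in Keras, assuming a binary classifier. AUC is tracked as a metric, and `EarlyStopping` is pointed at it with `mode="max"`. The synthetic data (about 2% positives), the layer sizes and the epoch budget are illustrative assumptions. Only λ=0.01, dropout 0.5 and patience 10 come from the incident's fix.

```python
import numpy as np
from tensorflow import keras

# Illustrative imbalanced data: roughly 2% positives
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20)).astype("float32")
y = (X[:, 0] + 0.5 * X[:, 1] > 2.3).astype("float32")
X_train, X_val, y_train, y_val = X[:4000], X[4000:], y[:4000], y[4000:]

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.01)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])

# Stop on validation AUC (higher is better), not on training accuracy
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_auc", mode="max", patience=10, restore_best_weights=True)

history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=100, batch_size=128, callbacks=[early_stop], verbose=0)
print("Best validation AUC:", max(history.history["val_auc"]))
```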
Production debug guide
Use these symptom-action pairs when a model's performance degrades after deployment.
Symptom · 01
Model accuracy is high on training but low on live traffic
Fix
Plot learning curves: training and validation error vs training set size. A large gap indicates overfitting.
Symptom · 02
Model is consistently wrong on both training and validation data
Fix
Check feature engineering: are you using enough informative features? Try a more complex model (e.g., a random forest instead of linear regression). This is underfitting.
Symptom · 03
Validation error oscillates wildly from run to run
Fix
Likely high variance. Reduce model complexity or increase regularisation. Check if training set is too small.
Symptom · 04
Adding more training data barely changes validation accuracy
Fix
If both curves are flat and close together at high error, you have high bias. More data won't help; you need better features or a different algorithm.
Quick Debug Cheat Sheet for Overfitting/Underfitting
When you suspect your model is overfitting or underfitting, run these commands and checks immediately.
Training accuracy > 95%, validation accuracy < 70%
Immediate action
Stop training and inspect last epochs for divergence
Commands
print('Gap:', abs(train_acc[-1] - val_acc[-1]))  # train_acc, val_acc: per-epoch accuracies from your training history
plot_learning_curve(model, X_train, y_train, X_val, y_val)  # project-specific helper (assumed)
Fix now
Add L2 regularisation (alpha=0.01) or reduce the number of layers/features
Training and validation accuracy both below 60%
Immediate action
Check whether the model is too simple (e.g., linear on non-linear data)
Commands
print('Training accuracy:', train_acc[-1], 'Validation:', val_acc[-1])
model_complexity_curve(model, X_train, y_train, X_val, y_val, param_range)  # project-specific helper (assumed)
Fix now
Increase model complexity: add polynomial features or use a more powerful algorithm (tree-based)
Validation loss starts rising after epoch N
Immediate action
Stop training immediately and restore the checkpoint from epoch N, where validation loss was lowest
Commands
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
print('Best epoch:', int(np.argmin(history.history['val_loss'])) + 1, 'Val loss:', min(history.history['val_loss']))
Fix now
Retrain with early stopping and halve the learning rate (multiply it by 0.5)
Overfitting vs Underfitting at a Glance
| Feature | Underfitting (High Bias) | Overfitting (High Variance) |
|---|---|---|
| Training error | High | Very low |
| Validation error | High | High |
| Cause | Model is too simple (e.g., linear model on non-linear data) | Model is too complex (e.g., deep tree on noisy data) |
| Primary fix | Increase complexity, add features | Regularisation (L1/L2), pruning, more data |
| Learning curve pattern | Both curves high and close together | Training low, validation high, with a gap |
| Effect of more data | Little to no improvement | Reduces the gap, helps generalisation |

Key takeaways

1. Underfitting = High Bias. The model is too dumb for the data.
2. Overfitting = High Variance. The model is too 'clever' and sees patterns in noise.
3. The validation set is your compass; never tune hyperparameters on the test set.
4. Regularisation (dropout, L1, L2) is the primary weapon against overfitting.
5. For underfitting, increase model complexity or engineer better features, not more data.
6. Automate model monitoring in production to catch drift early.

Common mistakes to avoid


Throwing more data at an underfitting model

Symptom
Validation error remains high despite increasing dataset size. Training and validation errors stay close and high.
Fix
Stop adding data. Instead, increase model complexity (e.g., add polynomial features, use a tree-based model) or engineer better features.

Ignoring the validation set until the end of the project

Symptom
Model achieves 99% accuracy on the test set but fails in production, because the test set was used for hyperparameter tuning and leaked information into model selection.
Fix
Split data into training, validation, and test sets upfront. Never tune hyperparameters on the test set. Use cross-validation for tuning.

Over-tuning hyperparameters on the test set (data leakage)

Symptom
Model performs great on held-out test set but fails on new data. Hidden overfitting to the test set.
Fix
Use a separate validation set for tuning. The test set should only be used once at the end to estimate real-world performance.

Using a high-degree polynomial on a small dataset without regularization

Symptom
Training error near zero, but validation error is enormous. The model follows every data point.
Fix
Reduce polynomial degree, add L2 regularization, or use cross-validation to select optimal degree.

Assuming more features always improve performance

Symptom
Model complexity increases, validation error rises due to noise features.
Fix
Perform feature selection (e.g., L1 regularization, mutual information) to keep only relevant features.

Interview Questions on This Topic

Q01 · Junior
How do you distinguish between high bias and high variance using a learn...
Q02 · Junior
What is the 'Double Descent' phenomenon in modern Deep Learning?
Q03 · Junior
Explain the role of Early Stopping in preventing overfitting.
Q04 · Junior
How would you set up a CI/CD pipeline to automatically reject overfitted...
Q01 of 04 · Junior

How do you distinguish between high bias and high variance using a learning curve?

ANSWER
High Bias is identified when both training and validation errors are high and close to each other, indicating the model hasn't captured the underlying trend. High Variance is identified by a large 'gap' between a low training error and a significantly higher validation error. In a production debugging scenario, you'd plot learning curves for different training set sizes and look for convergence or divergence.

Frequently Asked Questions

01. Can a model be both overfit and underfit?
02. Why does regularization reduce overfitting?
03. Does increasing the number of features cause overfitting?
04. How do I know if I need more data or a different model?
05. What is the difference between cross-validation and a validation set?