Senior 4 min · April 15, 2026

Visualization — ROC Perfection Hid 3-Week Fraud Failure

False positive rate tripled, complaints spiked 400% because a perfect ROC curve (AUC=1.

Naren · Founder
Quick Answer
  • Matplotlib is the foundation: Seaborn and pandas plotting both build on its figure/axes model
  • Seaborn wraps Matplotlib with statistical defaults and far less boilerplate code
  • Confusion matrices, ROC curves, and residual plots reveal model flaws numbers hide
  • Use fig.savefig() at 300 DPI — screen-resolution plots break in reports and slides
  • Production rule: never present raw accuracy alone — always pair with precision, recall, or error distribution
  • Biggest mistake: choosing the wrong chart type for the data relationship you want to communicate
  • Always call plt.close(fig) after saving — open figures leak memory and crash long-running pipelines
Plain-English First

Machine learning outputs are numbers. Visualization turns those numbers into stories that humans can act on. A confusion matrix is not decoration — it is the difference between knowing your model is '95% accurate' and knowing it misses 67% of the fraud cases you actually care about. This guide teaches you the specific charts that reveal whether your model is actually working, how to build them properly in Python, and how to format them so they survive the journey from notebook to boardroom slide deck.

Model metrics like accuracy and F1-score tell you the score. Visualizations tell you why. A confusion matrix shows exactly which classes your model confuses. A residual plot reveals systematic prediction errors that RMSE averages away. A learning curve tells you whether collecting more data will help or whether you need a fundamentally different model. These are not decorative — they are diagnostic tools.

Matplotlib provides the rendering engine. Seaborn provides statistical awareness on top of it. You need both: Matplotlib for full control over publication-quality figures, and Seaborn for rapid exploratory analysis with sensible defaults. They are not competitors. Seaborn is literally built on Matplotlib: its axes-level functions (histplot, boxplot, heatmap) return a Matplotlib Axes, and its figure-level functions (pairplot, relplot, catplot) return a grid object that exposes the underlying Figure and Axes, so you can always customize further.

The common mistake is treating visualization as an afterthought — something you do after the model is trained and shipped. In production, a well-designed diagnostic dashboard catches model degradation weeks before aggregate metrics move. The charts you build during evaluation become your monitoring tools after deployment. Skip them, and you are flying blind.

Matplotlib Fundamentals: Figure and Axes

Every Matplotlib chart lives inside a Figure that contains one or more Axes. The Figure is the canvas — it controls overall dimensions, background color, and file output. The Axes is the actual plot area with its own x-axis, y-axis, title, and data layers.

Understanding this hierarchy prevents most of the layout confusion beginners hit. When you call plt.plot(), Matplotlib implicitly creates a Figure and Axes behind the scenes. This works for quick exploration but falls apart the moment you need multiple subplots, consistent sizing, or saved files. The object-oriented interface, fig, ax = plt.subplots(), gives you explicit handles to both objects and should be your default for anything beyond throwaway exploration.

io/thecodeforge/viz/matplotlib_basics.py (Python)
import matplotlib.pyplot as plt
import numpy as np


# --- Method 1: pyplot interface (quick exploration only) ---
# Implicitly creates a Figure and Axes. Fine for throwaway cells.
plt.plot([1, 2, 3], [4, 5, 6])
plt.title('Simple Line Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()


# --- Method 2: object-oriented interface (production standard) ---
# Explicitly creates Figure and Axes. Use this for everything you save.
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot([1, 2, 3], [4, 5, 6], marker='o', linewidth=2, label='Series A')
ax.set_title('Production-Ready Line Plot', fontsize=14, fontweight='bold')
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.legend()
ax.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig('plot.png', dpi=300, bbox_inches='tight')
plt.close(fig)  # Free memory — critical in loops and pipelines


# --- Multi-panel figure: the pattern you will use most ---
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
np.random.seed(42)
data = np.random.randn(200)

# Panel 1: Distribution
axes[0, 0].hist(data, bins=30, edgecolor='black', alpha=0.7, color='steelblue')
axes[0, 0].set_title('Distribution')
axes[0, 0].set_xlabel('Value')
axes[0, 0].set_ylabel('Frequency')

# Panel 2: Sequential scatter
axes[0, 1].scatter(np.arange(len(data)), data, alpha=0.4, s=12, color='coral')
axes[0, 1].axhline(y=0, color='black', linestyle='--', alpha=0.3)
axes[0, 1].set_title('Sequential Scatter')
axes[0, 1].set_xlabel('Index')

# Panel 3: Box plot
axes[1, 0].boxplot(data, vert=True, patch_artist=True,
                    boxprops=dict(facecolor='lightblue'))
axes[1, 0].set_title('Box Plot')

# Panel 4: Cumulative sum
axes[1, 1].plot(np.cumsum(data), color='seagreen', linewidth=1.5)
axes[1, 1].set_title('Cumulative Sum')
axes[1, 1].set_xlabel('Index')

fig.suptitle('Exploratory Data Summary', fontsize=16, fontweight='bold')
fig.tight_layout()
fig.savefig('dashboard.png', dpi=300, bbox_inches='tight')
plt.close(fig)
Figure vs Axes
  • Figure = the full canvas. Controls overall size (figsize), background, DPI, and file saving.
  • Axes = one plot area. Has its own x-axis, y-axis, title, legend, and data layers. A Figure can hold many Axes.
  • fig, ax = plt.subplots() creates one Figure with one Axes. This is your starting point for every chart.
  • fig, axes = plt.subplots(2, 3) creates a 2×3 grid. Access individual plots with axes[row, col].
  • Always use the object-oriented interface (ax.plot, ax.set_title) for anything you save or present. The pyplot interface (plt.plot, plt.title) operates on an implicit 'current axes' that causes bugs in multi-panel figures.
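The 'current axes' pitfall in the last bullet is easy to reproduce. The following is a minimal sketch, assuming a headless Agg backend so it runs without a display; the panel names left and right are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI window needed
import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
left.plot([1, 2, 3], [4, 5, 6])

# pyplot call: targets the implicit current axes, which after
# plt.subplots(1, 2) is the last axes created (the right panel)
plt.title("Meant for the left panel")
title_on_left_after_pyplot = left.get_title()
title_on_right_after_pyplot = right.get_title()

# Object-oriented calls: the target is always explicit
right.set_title("")
left.set_title("Meant for the left panel")
title_on_left_after_oo = left.get_title()

plt.close(fig)
```

The pyplot call silently titles the wrong panel, while the ax.set_title call cannot, because the target axes is named in the code.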
Production Insight
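In a script, plt.show() blocks until the window is closed, after which pyplot no longer tracks that figure. With the inline notebook backend, the figure is closed as soon as the cell has rendered. If you then call plt.savefig(), pyplot creates a fresh, empty current figure and you save a blank file with no error message.
Always save before showing: fig.savefig() first, plt.show() second. Saving through the explicit fig handle is also safer than plt.savefig(), and in automated pipelines you can skip plt.show() entirely.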
Rule: in production scripts, scheduled jobs, and CI/CD pipelines, never call plt.show(). Use fig.savefig() and plt.close(fig) to render and release memory. Open figures accumulate and will eventually crash long-running processes.
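As a rough illustration of that rule, here is a minimal sketch of a batch rendering loop. It assumes a headless job using the Agg backend; the function name render_reports and the random-walk data are hypothetical stand-ins for a real report generator.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: scheduled job with no display attached
import matplotlib.pyplot as plt
import numpy as np


def render_reports(n_reports, out_dir):
    """Render n_reports charts to out_dir, closing each figure after saving."""
    rng = np.random.default_rng(0)
    paths = []
    for i in range(n_reports):
        fig, ax = plt.subplots(figsize=(6, 4))
        ax.plot(rng.standard_normal(50).cumsum())
        ax.set_title(f"Report {i}")
        path = os.path.join(out_dir, f"report_{i}.png")
        fig.savefig(path, dpi=100, bbox_inches="tight")
        plt.close(fig)  # release the figure before the next iteration
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = render_reports(5, tmp)
    saved_sizes = [os.path.getsize(p) for p in paths]

# pyplot should be tracking no figures once the loop has finished
open_figures = plt.get_fignums()
```

Checking plt.get_fignums() at the end of a job is a cheap way to confirm that nothing was left open.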
Key Takeaway
Figure is the canvas, Axes is the plot. Always use the object-oriented interface.
fig, ax = plt.subplots() is your starting point for every chart — no exceptions for production code.
Save with fig.savefig('name.png', dpi=300, bbox_inches='tight') and always call plt.close(fig) afterward.

Seaborn for Statistical Visualization

Seaborn builds on Matplotlib with high-level functions that understand DataFrames natively. Pass column names directly, and Seaborn handles grouping, aggregation, statistical estimation, and legend creation automatically. Where Matplotlib requires 20 lines for a grouped bar chart with confidence intervals, Seaborn does it in 3.

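The key insight is that Seaborn is not a replacement for Matplotlib. It is an accelerator for the statistical plotting patterns you use most often. Axes-level functions such as sns.histplot and sns.boxplot return a Matplotlib Axes object, and figure-level functions such as sns.pairplot return a grid object (PairGrid, FacetGrid) whose .figure and .axes attributes expose the underlying Matplotlib objects. Either way, you can always drop down to Matplotlib for fine-grained customization after Seaborn does the heavy lifting.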

io/thecodeforge/viz/seaborn_basics.py (Python)
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


# Set Seaborn theme once at the top of your notebook or script
sns.set_theme(style='whitegrid', palette='muted', font_scale=1.1)

# Generate example data
np.random.seed(42)
df = pd.DataFrame({
    'feature_a': np.random.randn(200),
    'feature_b': np.random.randn(200) * 2 + 1,
    'category': np.random.choice(['Class A', 'Class B', 'Class C'], 200),
    'target': np.random.choice([0, 1], 200)
})


# --- Distribution plots: understand feature spread ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

sns.histplot(data=df, x='feature_a', hue='category', kde=True, ax=axes[0])
axes[0].set_title('Feature A Distribution by Category')

sns.boxplot(data=df, x='category', y='feature_b', ax=axes[1])
axes[1].set_title('Feature B Spread by Category')

fig.tight_layout()
fig.savefig('distributions.png', dpi=300, bbox_inches='tight')
plt.close(fig)


# --- Correlation heatmap: find feature relationships ---
fig, ax = plt.subplots(figsize=(8, 6))
numeric_df = df.select_dtypes(include=[np.number])
corr_matrix = numeric_df.corr()

sns.heatmap(
    corr_matrix, annot=True, fmt='.2f', cmap='RdBu_r',
    center=0, vmin=-1, vmax=1, ax=ax,
    linewidths=0.5, square=True
)
ax.set_title('Feature Correlation Matrix')

fig.tight_layout()
fig.savefig('correlation.png', dpi=300, bbox_inches='tight')
plt.close(fig)


# --- Pair plot: explore all pairwise relationships at once ---
# Useful for small feature sets (<10 features). Slow for large ones.
pair = sns.pairplot(
    df, hue='category', diag_kind='kde',
    plot_kws={'alpha': 0.4, 's': 15}
)
pair.figure.suptitle('Pairwise Feature Relationships', y=1.02)
pair.savefig('pairplot.png', dpi=150, bbox_inches='tight')
plt.close('all')


# --- Seaborn + Matplotlib customization: the practical pattern ---
fig, ax = plt.subplots(figsize=(10, 6))
sns.violinplot(data=df, x='category', y='feature_a', ax=ax, inner='quartile')

# Drop down to Matplotlib for fine-tuning
ax.set_title('Feature A Violin Plot', fontsize=14, fontweight='bold')
ax.set_xlabel('Category', fontsize=12)
ax.set_ylabel('Feature A Value', fontsize=12)
ax.axhline(y=0, color='red', linestyle='--', alpha=0.5, label='Zero baseline')
ax.legend()

fig.tight_layout()
fig.savefig('violin_customized.png', dpi=300, bbox_inches='tight')
plt.close(fig)
When to Use Seaborn vs Matplotlib
  • Seaborn excels at: grouped plots, statistical overlays (confidence intervals, KDE curves), automatic legend handling, DataFrame-native column references.
  • Matplotlib excels at: precise axis control, custom annotations and arrows, multi-panel layouts with unequal sizing, publication-quality formatting.
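  • Axes-level Seaborn functions return the Matplotlib axes they drew on: ax = sns.histplot(...); ax.set_xlim(0, 100). Figure-level functions (pairplot, relplot, catplot) return a grid object; reach the axes through its .axes attribute.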
  • Rule of thumb: prototype in Seaborn, polish in Matplotlib. Start fast, refine as needed.
Production Insight
sns.set_theme() affects all subsequent plots globally in the current Python process. In shared notebooks or multi-team environments, this can silently change the appearance of other people's charts.
Call sns.set_theme() once at the very top of your notebook or script, and document the style choice.
For production pipelines that generate multiple report types, use matplotlib.rcParams context managers to scope style changes: with plt.rc_context({'font.size': 12}): ...
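A minimal sketch of scoping a style change with plt.rc_context, so that one report's settings do not leak into the next. The specific rcParams keys and values are arbitrary examples, and the Agg backend is an assumption for running without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

size_before = plt.rcParams["font.size"]

# Style overrides apply only inside the with-block
with plt.rc_context({"font.size": 16, "axes.grid": True}):
    size_inside = plt.rcParams["font.size"]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot([0, 1, 2], [1, 4, 9])
    ax.set_title("Scoped style")
    buffer = io.BytesIO()  # in-memory target instead of a file on disk
    fig.savefig(buffer, format="png", dpi=100)
    plt.close(fig)

# On exit, the previous global settings are restored
size_after = plt.rcParams["font.size"]
```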
Key Takeaway
Seaborn wraps Matplotlib with DataFrame awareness and statistical defaults. Use it for exploration.
sns.histplot, sns.boxplot, sns.heatmap, and sns.pairplot cover most everyday ML visualization needs.
Axes-level Seaborn functions return a Matplotlib axes object and figure-level ones return a grid that exposes its axes, so you can drop down to Matplotlib for final polish.

Confusion Matrix: Where Your Model Gets Confused

The confusion matrix is the single most important diagnostic chart for classification models. It shows exactly which classes your model confuses with which — information that a scalar metric like accuracy or F1 compresses into a single number and loses.

A model with 95% accuracy might be completely failing on one class. In a fraud detection system where only 2% of transactions are fraudulent, a model that predicts 'not fraud' for every single input achieves 98% accuracy while catching zero fraud. Only the confusion matrix reveals this. Always plot it. Always.

io/thecodeforge/viz/confusion_matrix.py (Python)
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.metrics import confusion_matrix


def plot_confusion_matrix(
    y_true, y_pred, labels=None, title='Confusion Matrix'
):
    """Production-grade confusion matrix with both counts and percentages.

    Displays two panels side by side:
    - Left: raw counts (useful for understanding volume)
    - Right: row-normalized percentages (useful for understanding recall per class)

    Args:
        y_true: ground truth labels
        y_pred: predicted labels
        labels: list of class names for axis labels
        title: figure title

    Returns:
        Matplotlib Figure object (caller saves and closes).
    """
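    cm = confusion_matrix(y_true, y_pred)
    # Row-normalize to percentages; rows with no true samples stay at 0
    # instead of triggering a divide-by-zero
    row_sums = cm.sum(axis=1, keepdims=True)
    cm_percent = np.divide(
        cm * 100.0, row_sums,
        out=np.zeros(cm.shape, dtype=float), where=row_sums != 0
    )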

    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Left panel: raw counts
    sns.heatmap(
        cm, annot=True, fmt='d', cmap='Blues',
        xticklabels=labels, yticklabels=labels,
        ax=axes[0], linewidths=0.5
    )
    axes[0].set_xlabel('Predicted')
    axes[0].set_ylabel('Actual')
    axes[0].set_title(f'{title} (Counts)')

    # Right panel: row-normalized percentages (each row sums to 100%)
    sns.heatmap(
        cm_percent, annot=True, fmt='.1f', cmap='Blues',
        xticklabels=labels, yticklabels=labels,
        ax=axes[1], linewidths=0.5, vmin=0, vmax=100
    )
    axes[1].set_xlabel('Predicted')
    axes[1].set_ylabel('Actual')
    axes[1].set_title(f'{title} (Row %, i.e., Recall)')

    fig.tight_layout()
    return fig


# Example usage
np.random.seed(42)
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2] * 20)
y_pred = np.array([0, 0, 1, 1, 1, 0, 2, 2, 2] * 20)
labels = ['Cat', 'Dog', 'Bird']

fig = plot_confusion_matrix(y_true, y_pred, labels=labels, title='Animal Classifier')
fig.savefig('confusion_matrix.png', dpi=300, bbox_inches='tight')
plt.close(fig)
Accuracy Hides Class-Level Failures
A model predicting 990 correct out of 1000 samples has 99% accuracy. But if those 10 errors are all in the fraud class (which has only 15 total samples), the model missed 67% of all fraud cases. The confusion matrix shows this immediately — the fraud row will have a large off-diagonal value. A single accuracy number never reveals this. On imbalanced datasets, accuracy is almost meaningless. The confusion matrix is not.
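The arithmetic in this example can be checked directly. A short sketch using synthetic labels that match the numbers above (985 legitimate, 15 fraud, 10 fraud cases missed):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# 985 legitimate transactions (0) followed by 15 fraud cases (1)
y_true = np.array([0] * 985 + [1] * 15)

# Every prediction is correct except 10 fraud cases labelled as legitimate
y_pred = y_true.copy()
y_pred[985:995] = 0

accuracy = accuracy_score(y_true, y_pred)     # 990 / 1000
fraud_recall = recall_score(y_true, y_pred)   # 5 / 15
cm = confusion_matrix(y_true, y_pred)         # rows = actual, columns = predicted

print(f"accuracy={accuracy:.3f}, fraud recall={fraud_recall:.3f}")
print(cm)
```

The fraud row of the matrix holds 10 misses against 5 hits, which is the 67% miss rate that the 99% accuracy figure conceals.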
Production Insight
Always display both raw counts and row-normalized percentages in your confusion matrix.
Raw counts mislead on imbalanced datasets because most predictions naturally land in the majority class, making the diagonal look strong even when minority class recall is terrible.
Row percentages show recall per class — how much of each true class the model actually captures.
Column percentages show precision per class — of everything predicted as class X, how much is correct.
Rule: for production monitoring dashboards, plot the row-normalized version by default and provide the raw count version as a drill-down.
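Row and column normalization do not need to be computed by hand: scikit-learn's confusion_matrix accepts a normalize argument. A small sketch with made-up labels, showing that the diagonal of the row-normalized matrix is per-class recall and the diagonal of the column-normalized matrix is per-class precision:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Synthetic example: 90 negatives, 10 positives
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.array([0] * 85 + [1] * 5 + [0] * 4 + [1] * 6)

row_norm = confusion_matrix(y_true, y_pred, normalize="true")  # each row sums to 1
col_norm = confusion_matrix(y_true, y_pred, normalize="pred")  # each column sums to 1

recall_per_class = np.diag(row_norm)      # share of each actual class captured
precision_per_class = np.diag(col_norm)   # share of each predicted class that is right

print("recall:   ", recall_per_class.round(3))
print("precision:", precision_per_class.round(3))
```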
Key Takeaway
The confusion matrix is the most important classification diagnostic — plot it for every model, every time.
Always show both counts and row-normalized percentages. Counts tell you volume; percentages tell you recall.
Off-diagonal patterns reveal exactly which classes your model cannot distinguish and guide targeted improvements.
Confusion Matrix Interpretation
If: Diagonal cells are strong, off-diagonal cells are near zero
Use: Model separates classes well. Verify that performance is consistent across all classes; a strong overall diagonal can mask one weak class.
If: One row has high off-diagonal values (model confuses class A with class B specifically)
Use: Classes A and B share similar features. Consider feature engineering to surface distinguishing characteristics, collecting more training data for the confused class, or merging the classes if they are semantically close.
If: All predictions cluster into one class (entire column is dark, rest of matrix is blank)
Use: Model is degenerate: it predicts the majority class for every input. Check class balance, lower the decision threshold, or apply class weights during training.
If: Matrix looks good on test data but deteriorates on production data
Use: Data distribution shift. Plot prediction probability distributions over time to detect when the drift started. Compare feature distributions between training data and recent production data.

ROC and Precision-Recall Curves

ROC curves plot the true positive rate against the false positive rate across all possible classification thresholds. They answer the question: as I lower the threshold to catch more positives, how many false positives do I accept?

Precision-Recall curves are more informative for imbalanced datasets because they focus exclusively on the positive class. On a dataset where only 1% of samples are positive, ROC can show an impressive AUC of 0.95 while the model's precision at useful recall levels is actually terrible. Precision-Recall curves expose this directly.

Both curves let you visualize the tradeoff space and choose the optimal threshold for your specific business requirements — something a single F1 score cannot do.

io/thecodeforge/viz/roc_pr_curves.py (Python)
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import (
    roc_curve, auc, precision_recall_curve, average_precision_score
)


def plot_roc_and_pr(y_true, y_proba, title='Model Evaluation'):
    """Plot ROC and Precision-Recall curves side by side.

    Both curves visualize model performance across all possible
    classification thresholds. Together they give a complete picture
    that no single metric can provide.

    Args:
        y_true: ground truth binary labels (0 or 1)
        y_proba: predicted probabilities for the positive class
        title: figure title prefix

    Returns:
        Matplotlib Figure object.
    """
    # Compute ROC curve
    fpr, tpr, roc_thresholds = roc_curve(y_true, y_proba)
    roc_auc = auc(fpr, tpr)

    # Compute Precision-Recall curve
    precision, recall, pr_thresholds = precision_recall_curve(y_true, y_proba)
    avg_precision = average_precision_score(y_true, y_proba)

    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    # --- ROC Curve ---
    axes[0].plot(fpr, tpr, linewidth=2, label=f'Model (AUC = {roc_auc:.3f})')
    axes[0].plot([0, 1], [0, 1], 'k--', alpha=0.5, label='Random (AUC = 0.5)')
    axes[0].fill_between(fpr, tpr, alpha=0.1)
    axes[0].set_xlabel('False Positive Rate')
    axes[0].set_ylabel('True Positive Rate (Recall)')
    axes[0].set_title(f'{title} — ROC Curve')
    axes[0].legend(loc='lower right')
    axes[0].grid(True, alpha=0.3)
    axes[0].set_xlim([-0.02, 1.02])
    axes[0].set_ylim([-0.02, 1.02])

    # --- Precision-Recall Curve ---
    axes[1].plot(
        recall, precision, linewidth=2, color='orange',
        label=f'Model (AP = {avg_precision:.3f})'
    )
    # Positive-class prevalence; np.mean also works when y_true is a plain list
    baseline = float(np.mean(y_true))
    axes[1].axhline(
        y=baseline, color='k', linestyle='--', alpha=0.5,
        label=f'Random baseline = {baseline:.3f}'
    )
    axes[1].fill_between(recall, precision, alpha=0.1, color='orange')
    axes[1].set_xlabel('Recall')
    axes[1].set_ylabel('Precision')
    axes[1].set_title(f'{title} — Precision-Recall Curve')
    axes[1].legend(loc='lower left')
    axes[1].grid(True, alpha=0.3)
    axes[1].set_xlim([-0.02, 1.02])
    axes[1].set_ylim([0, 1.05])

    fig.tight_layout()
    return fig


# Example: imbalanced fraud detection scenario
np.random.seed(42)
y_true = np.random.choice([0, 1], size=500, p=[0.95, 0.05])
y_proba = np.clip(y_true * 0.6 + np.random.randn(500) * 0.2, 0, 1)

fig = plot_roc_and_pr(y_true, y_proba, title='Fraud Detection')
fig.savefig('roc_pr_curves.png', dpi=300, bbox_inches='tight')
plt.close(fig)
ROC vs Precision-Recall: When to Use Which
Use ROC when classes are roughly balanced: it gives a clean summary of the true-positive vs false-positive tradeoff across all thresholds. Use Precision-Recall when the positive class is rare (fraud detection, disease screening, anomaly detection, conversion prediction). On highly imbalanced data, ROC can show AUC > 0.95 because the massive true negative count keeps the false positive rate, FP / (FP + TN), low even when false positives outnumber true positives. Meanwhile, Precision-Recall can reveal that the same model's precision collapses to around 10% at any useful recall level. Always plot both. A model can look excellent on one curve and mediocre on the other. If you only show one, you are hiding information.
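A worked example makes the gap concrete. The numbers below are illustrative assumptions, not measurements: 10,000 samples, 1% positive, and an operating point with 80% recall at a 5% false positive rate, which looks comfortable on a ROC plot.

```python
# Illustrative operating point on a heavily imbalanced dataset
n_total = 10_000
positive_rate = 0.01
tpr = 0.80   # recall at the chosen threshold
fpr = 0.05   # false positive rate at the same threshold

n_pos = round(n_total * positive_rate)   # 100 positives
n_neg = n_total - n_pos                  # 9,900 negatives

tp = tpr * n_pos   # 80 true positives
fp = fpr * n_neg   # 495 false positives

precision = tp / (tp + fp)
print(f"precision = {precision:.3f}")
```

A 5% false positive rate applied to 9,900 negatives yields roughly six false alarms for every true detection, so precision lands near 14% even though the ROC point looks strong.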
Production Insight
AUC and Average Precision summarize performance across all thresholds. In production, you deploy at one specific threshold.
The curve shape tells you where your operating point lives and what tradeoffs it forces. Two models with identical AUC can have very different characteristics at the threshold that matters for your business.
Rule: overlay the actual deployed threshold on the curve as a dot or vertical line. This makes it immediately clear how much performance room exists if you adjust the threshold — and what the cost of that adjustment is in the other metric.
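One way to follow that rule is sketched below. The helper names operating_point and plot_with_operating_point are hypothetical, the 0.5 threshold is an arbitrary stand-in for whatever value is actually deployed, and the sketch assumes binary labels with both classes present.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve


def operating_point(y_true, y_proba, threshold):
    """Return (fpr, tpr, precision) when predicting positive at proba >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_hat = np.asarray(y_proba) >= threshold
    tp = np.sum(y_hat & y_true)
    fp = np.sum(y_hat & ~y_true)
    fn = np.sum(~y_hat & y_true)
    tn = np.sum(~y_hat & ~y_true)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    return fp / (fp + tn), tp / (tp + fn), precision


def plot_with_operating_point(y_true, y_proba, threshold):
    """Draw ROC and PR curves with the deployed threshold marked as a dot."""
    fpr_curve, tpr_curve, _ = roc_curve(y_true, y_proba)
    prec_curve, rec_curve, _ = precision_recall_curve(y_true, y_proba)
    fpr, tpr, precision = operating_point(y_true, y_proba, threshold)
    label = f'Deployed threshold = {threshold:.2f}'

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    axes[0].plot(fpr_curve, tpr_curve, linewidth=2)
    axes[0].scatter([fpr], [tpr], color='red', zorder=3, label=label)
    axes[0].set_xlabel('False Positive Rate')
    axes[0].set_ylabel('True Positive Rate (Recall)')
    axes[0].legend(loc='lower right')

    axes[1].plot(rec_curve, prec_curve, linewidth=2, color='orange')
    axes[1].scatter([tpr], [precision], color='red', zorder=3, label=label)
    axes[1].set_xlabel('Recall')
    axes[1].set_ylabel('Precision')
    axes[1].legend(loc='lower left')
    fig.tight_layout()
    return fig


# Synthetic imbalanced data, for illustration only
rng = np.random.default_rng(42)
y_true = rng.choice([0, 1], size=500, p=[0.95, 0.05])
y_proba = np.clip(y_true * 0.6 + rng.normal(0, 0.2, size=500), 0, 1)

fig = plot_with_operating_point(y_true, y_proba, threshold=0.5)
plt.close(fig)

check = operating_point([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9], 0.5)
```

Computing the point directly from the threshold avoids indexing into the curve arrays, whose threshold vectors differ in length and ordering between roc_curve and precision_recall_curve.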
Key Takeaway
ROC curves work well for balanced classes. Precision-Recall curves are essential for imbalanced ones.
Always plot both side by side — a model can look good on one and poor on the other.
The curve shape reveals operating characteristics that a single AUC or AP number compresses away.

Residual Plots for Regression Models

Residual plots reveal systematic errors in regression models that aggregate metrics like RMSE and MAE completely hide. RMSE tells you the average error magnitude. Residual plots tell you whether those errors are random (acceptable) or structured (a sign your model is missing something).

If residuals show a pattern — a curve, a fan shape, clusters — your model is not capturing a relationship in the data. No amount of hyperparameter tuning will fix this. You need different features, a different transformation, or a different model family. The residual plot is the chart that tells you which.

io/thecodeforge/viz/residual_plots.py (Python)
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression


def plot_regression_diagnostics(y_true, y_pred, title='Regression Diagnostics'):
    """Four-panel diagnostic plot for regression models.

    Panels:
    1. Predicted vs Actual — overall fit quality
    2. Residuals vs Predicted — detect non-linearity, heteroscedasticity
    3. Residual Distribution — check normality assumption
    4. Q-Q Plot — sensitive normality check at distribution tails

    Args:
        y_true: actual target values (numpy array)
        y_pred: predicted target values (numpy array)
        title: overall figure title

    Returns:
        Matplotlib Figure object.
    """
    residuals = y_true - y_pred

    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # Panel 1: Predicted vs Actual
    axes[0, 0].scatter(y_true, y_pred, alpha=0.4, s=15, color='steelblue')
    min_val = min(y_true.min(), y_pred.min())
    max_val = max(y_true.max(), y_pred.max())
    axes[0, 0].plot(
        [min_val, max_val], [min_val, max_val],
        'r--', linewidth=2, label='Perfect prediction'
    )
    axes[0, 0].set_xlabel('Actual')
    axes[0, 0].set_ylabel('Predicted')
    axes[0, 0].set_title('Predicted vs Actual')
    axes[0, 0].legend()

    # Panel 2: Residuals vs Predicted (the most important panel)
    axes[0, 1].scatter(y_pred, residuals, alpha=0.4, s=15, color='coral')
    axes[0, 1].axhline(y=0, color='r', linestyle='--', linewidth=2)
    axes[0, 1].set_xlabel('Predicted Value')
    axes[0, 1].set_ylabel('Residual (Actual - Predicted)')
    axes[0, 1].set_title('Residuals vs Predicted')

    # Panel 3: Residual Distribution
    sns.histplot(residuals, kde=True, ax=axes[1, 0], bins=30, color='steelblue')
    axes[1, 0].axvline(x=0, color='r', linestyle='--')
    axes[1, 0].set_xlabel('Residual')
    axes[1, 0].set_title(f'Residual Distribution (mean={residuals.mean():.2f})')

    # Panel 4: Q-Q plot (normality check — deviations at tails matter most)
    stats.probplot(residuals, dist='norm', plot=axes[1, 1])
    axes[1, 1].set_title('Q-Q Plot (Normality Check)')

    fig.suptitle(title, fontsize=14, fontweight='bold')
    fig.tight_layout()
    return fig


# Example
X, y = make_regression(
    n_samples=300, n_features=3, noise=15, random_state=42
)
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

fig = plot_regression_diagnostics(y, y_pred, title='Linear Regression Diagnostics')
fig.savefig('residual_plots.png', dpi=300, bbox_inches='tight')
plt.close(fig)
What Good Residuals Look Like
  • Residuals vs Predicted: random scatter centered on zero. No fan shape, no curve, no clusters.
  • Residual Distribution: approximately normal, centered at zero. Skew or heavy tails indicate the model handles some value ranges worse than others.
  • Q-Q Plot: points follow the diagonal line closely. Deviations at the tails mean the model produces more extreme errors than a normal distribution predicts.
  • If you see any pattern in the residual plot, your model is missing a signal. Add features, apply transformations, or switch model families.
Production Insight
A fan-shaped residual plot, where residuals spread wider as predicted values increase, means heteroscedasticity. The model's error is not constant: it predicts well for small values and poorly for large ones, or vice versa.
This violates the constant-variance assumption of ordinary least squares. The coefficient estimates stay unbiased, but the standard errors, and therefore the confidence and prediction intervals, become unreliable.
Rule: apply a log transform to the target variable (np.log1p) or use weighted least squares to stabilize error variance. If the fan is severe, consider tree-based models, which make no constant-variance assumption and need no target transformation to fit.
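The effect of the log transform can be measured rather than eyeballed. The sketch below uses synthetic data with multiplicative noise (an assumption chosen to produce a fan shape) and a hypothetical helper, spread_ratio, that compares residual spread in the upper and lower halves of the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic target with multiplicative noise: error grows with the value
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=500)
y = np.exp(2 + 0.3 * x + rng.normal(0, 0.3, size=500))
X = x.reshape(-1, 1)


def spread_ratio(y_pred, residuals):
    """Residual std for the upper half of predictions divided by the lower half."""
    upper = y_pred >= np.median(y_pred)
    return residuals[upper].std() / residuals[~upper].std()


# Fit on the raw target: residual spread fans out with the prediction
raw_model = LinearRegression().fit(X, y)
raw_pred = raw_model.predict(X)
ratio_raw = spread_ratio(raw_pred, y - raw_pred)

# Fit on log1p(target): residual spread is roughly constant
log_model = LinearRegression().fit(X, np.log1p(y))
log_pred = log_model.predict(X)
ratio_log = spread_ratio(log_pred, np.log1p(y) - log_pred)

# Map predictions back to the original units with the inverse transform
pred_original_units = np.expm1(log_pred)

print(f"spread ratio, raw target:   {ratio_raw:.2f}")
print(f"spread ratio, log1p target: {ratio_log:.2f}")
```

A ratio near 1 indicates roughly constant variance; a ratio well above 1 is the numeric signature of the fan shape.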
Key Takeaway
Residual plots reveal errors that RMSE hides — always generate them for regression models.
Random scatter around zero means the model is well-specified. Any pattern means missing signal.
Four diagnostic panels: predicted vs actual, residuals vs predicted, residual histogram, Q-Q plot.
Residual Pattern Diagnosis
If: Residuals show a U-shape or curve against predicted values
Use: Model is missing a non-linear relationship. Add polynomial features (degree 2 or 3), interaction terms between features, or switch to a non-linear model like gradient boosted trees.
If: Residuals fan out (spread increases with predicted value)
Use: Heteroscedasticity. Log-transform the target variable with np.log1p(y), use weighted least squares, or switch to a model family that handles non-constant variance naturally (e.g., tree-based models).
If: Residuals are not centered at zero (consistent bias in one direction)
Use: Model has systematic bias. Check for a missing intercept term, incorrect feature encoding, or a target variable that needs transformation.
If: Residuals show a clear trend when plotted against time or row index
Use: Autocorrelation: your data has temporal structure that the model ignores. Add lag features, rolling statistics, or switch to a time-series model (ARIMA, Prophet, temporal neural networks).

Feature Importance Visualization

Feature importance plots show which inputs drive your model's predictions. For tree-based models, importance is built in via impurity reduction. For any model, permutation importance provides a model-agnostic alternative by measuring how much the model's score (accuracy, by default, for scikit-learn classifiers) drops when each feature's values are randomly shuffled.

Visualization makes these rankings immediately interpretable to non-technical stakeholders who need to understand why the model makes the decisions it does — not just what it predicts. A horizontal bar chart sorted by importance is the universal format that everyone from data scientists to product managers can read.
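To make the shuffling idea concrete, here is a from-scratch sketch of permutation importance. It is meant only to show the mechanism; the function name manual_permutation_importance is hypothetical, the data is synthetic with signal deliberately placed in column 0 only, and in practice sklearn.inspection.permutation_importance is the call to use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def manual_permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in model.score when each column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        shuffled_scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            shuffled_scores.append(model.score(X_perm, y))
        drops[j] = baseline - np.mean(shuffled_scores)
    return baseline, drops


# Synthetic data: only column 0 carries signal, columns 1-3 are pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + 0.3 * rng.normal(size=600) > 0).astype(int)

# Simple hold-out split: fit on the first 400 rows, evaluate on the rest
model = LogisticRegression().fit(X[:400], y[:400])
baseline, drops = manual_permutation_importance(model, X[400:], y[400:])

print(f"baseline accuracy: {baseline:.3f}")
print("score drop per feature:", drops.round(3))
```

Shuffling the informative column pushes accuracy toward chance, while shuffling a noise column barely moves it. That difference is the importance value.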

io/thecodeforge/viz/feature_importance.py (Python)
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.datasets import make_classification


def plot_feature_importance(
    model, feature_names, X_test, y_test, top_n=15
):
    """Plot built-in and permutation importance side by side.

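    Built-in importance (Gini) is fast but biased toward high-cardinality
    features. Permutation importance is slower but model-agnostic and
    not subject to that cardinality bias, although strongly correlated
    features can share, and so understate, importance. Showing both
    highlights discrepancies worth investigating.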

    Args:
        model: fitted sklearn estimator
        feature_names: list of feature name strings
        X_test: test features for permutation importance
        y_test: test labels for permutation importance
        top_n: number of top features to display

    Returns:
        Matplotlib Figure object.
    """
    fig, axes = plt.subplots(1, 2, figsize=(14, max(6, top_n * 0.4)))

    # Left panel: built-in importance (tree-based models only)
    if hasattr(model, 'feature_importances_'):
        importances = model.feature_importances_
        indices = np.argsort(importances)[::-1][:top_n]

        axes[0].barh(
            [feature_names[i] for i in indices][::-1],
            importances[indices][::-1],
            color='steelblue', edgecolor='black', alpha=0.8
        )
        axes[0].set_xlabel('Gini Importance (Impurity Reduction)')
        axes[0].set_title('Built-in Feature Importance')
    else:
        axes[0].text(
            0.5, 0.5, 'Not available\n(model has no feature_importances_)',
            ha='center', va='center', fontsize=12, transform=axes[0].transAxes
        )
        axes[0].set_title('Built-in Feature Importance (N/A)')

    # Right panel: permutation importance (model-agnostic)
    perm_result = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=42, n_jobs=-1
    )
    perm_mean = perm_result.importances_mean
    perm_std = perm_result.importances_std
    indices = np.argsort(perm_mean)[::-1][:top_n]

    axes[1].barh(
        [feature_names[i] for i in indices][::-1],
        perm_mean[indices][::-1],
        xerr=perm_std[indices][::-1],
        color='coral', edgecolor='black', alpha=0.8
    )
    axes[1].set_xlabel('Mean Accuracy Decrease When Shuffled')
    axes[1].set_title('Permutation Importance')

    fig.suptitle(
        'Feature Importance Comparison', fontsize=14, fontweight='bold'
    )
    fig.tight_layout()
    return fig


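# Example: fit on a training split, measure importance on held-out data
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=42
)
feature_names = [f'feature_{i}' for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = RandomForestClassifier(
    n_estimators=100, random_state=42
).fit(X_train, y_train)

fig = plot_feature_importance(model, feature_names, X_test, y_test)
fig.savefig('feature_importance.png', dpi=300, bbox_inches='tight')
plt.close(fig)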
Built-in Importance Can Mislead
Gini importance (feature_importances_) in tree-based models is biased toward high-cardinality features. A feature with 1,000 unique values, such as a raw ID column, can appear more important than a genuinely predictive binary feature because the tree has more possible split points to choose from. This is a measurement artifact, not real predictive value. Always validate with permutation importance computed on held-out data, which directly measures the score lost when each feature is shuffled and is far less sensitive to cardinality. Keep in mind that permutation importance can understate features that are strongly correlated with each other.
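The cardinality bias described above can be reproduced with a small experiment. The sketch below is illustrative only: it appends a pure-noise column (a hypothetical `random_noise` feature with a unique value in every row) to synthetic data and compares the two importance measures. The dataset sizes and the random seed are assumptions, not values from a real project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Five informative features plus one pure-noise column (hypothetical setup)
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=5,
    n_redundant=0, random_state=42
)
rng = np.random.default_rng(42)
noise = rng.normal(size=(X.shape[0], 1))  # unique value per row, no signal
X = np.hstack([X, noise])
names = [f'feature_{i}' for i in range(5)] + ['random_noise']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Built-in (Gini) importance comes from the training-time splits
gini = model.feature_importances_
# Permutation importance is measured on data the model has not seen
perm = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
).importances_mean

for name, g, p in zip(names, gini, perm):
    print(f'{name:>14}  gini={g:.3f}  permutation={p:+.3f}')
```

The noise column should receive a non-zero Gini score because deep trees still split on it, while its permutation score on the held-out split should hover around zero.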
Production Insight
Feature importance rankings can shift dramatically between model versions — not because the data changed, but because tree-based models have inherent randomness in split selection.
Track importance rankings over time across deployments. A feature that drops from top 3 to zero importance between versions may indicate data pipeline corruption (the column went null, changed format, or stopped updating).
Rule: store and compare feature importance snapshots as part of your model registry metadata. Unexpected ranking changes should trigger investigation before deployment, not after.
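One way to act on this rule is to compare importance snapshots by rank before each deployment. The following is a minimal sketch, assuming each snapshot is stored as a plain dict of feature name to importance; the helper names `rank_features` and `find_rank_shifts`, the `max_shift` threshold, and the feature names are all hypothetical.

```python
def rank_features(importances):
    """Return {feature: rank}, where rank 1 is the most important."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def find_rank_shifts(previous, current, max_shift=3):
    """List features whose rank moved more than max_shift positions,
    or that are missing from the new snapshot."""
    prev_ranks = rank_features(previous)
    curr_ranks = rank_features(current)
    flagged = []
    for name, prev_rank in prev_ranks.items():
        curr_rank = curr_ranks.get(name)
        if curr_rank is None:
            flagged.append((name, prev_rank, None))
        elif abs(curr_rank - prev_rank) > max_shift:
            flagged.append((name, prev_rank, curr_rank))
    return flagged


# Hypothetical snapshots stored with two model versions
v1 = {'amount': 0.40, 'merchant_age': 0.25, 'hour': 0.15,
      'country': 0.10, 'device': 0.06, 'channel': 0.04}
v2 = {'amount': 0.00, 'merchant_age': 0.35, 'hour': 0.25,
      'country': 0.20, 'device': 0.12, 'channel': 0.08}

print(find_rank_shifts(v1, v2))  # [('amount', 1, 6)]
```

Comparing ranks rather than raw values keeps the check stable against the small run-to-run noise that tree ensembles produce, while still catching a feature that collapses from the top to the bottom.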
Key Takeaway
Built-in importance is fast but biased toward high-cardinality features. Permutation importance is slower but reliable and model-agnostic.
Plot both side by side — significant disagreement between them signals a cardinality bias or data leakage problem.
Track importance rankings across model versions to detect data pipeline degradation early.

Learning Curves: Diagnosing Bias and Variance

Learning curves plot model performance against training set size. They answer the most fundamental question in model improvement: should I get more data, or should I change the model?

The gap between the training score and validation score at each data size reveals whether your model suffers from high bias (underfitting — both curves are low) or high variance (overfitting — training is high, validation is low). This is not an academic distinction. It directly determines whether spending three weeks collecting more data will help or be completely wasted effort.

io/thecodeforge/viz/learning_curves.py · PYTHON
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification


def plot_learning_curve(
    estimator, X, y, title='Learning Curve', cv=5, scoring='accuracy'
):
    """Plot learning curve showing the bias-variance tradeoff.

    The gap between training and validation curves tells you exactly
    what to fix: more data, more regularization, or a different model.

    Args:
        estimator: unfitted sklearn estimator (will be cloned internally)
        X: feature matrix
        y: target vector
        title: plot title
        cv: number of cross-validation folds
        scoring: sklearn scoring metric name

    Returns:
        Matplotlib Figure object.
    """
    train_sizes, train_scores, val_scores = learning_curve(
        estimator, X, y,
        cv=cv,
        n_jobs=-1,
        train_sizes=np.linspace(0.1, 1.0, 10),
        scoring=scoring
    )

    train_mean = train_scores.mean(axis=1)
    train_std = train_scores.std(axis=1)
    val_mean = val_scores.mean(axis=1)
    val_std = val_scores.std(axis=1)

    fig, ax = plt.subplots(figsize=(10, 6))

    # Confidence bands
    ax.fill_between(
        train_sizes, train_mean - train_std, train_mean + train_std,
        alpha=0.1, color='blue'
    )
    ax.fill_between(
        train_sizes, val_mean - val_std, val_mean + val_std,
        alpha=0.1, color='orange'
    )

    # Mean curves
    ax.plot(
        train_sizes, train_mean, 'o-', color='blue',
        linewidth=2, label='Training Score'
    )
    ax.plot(
        train_sizes, val_mean, 'o-', color='orange',
        linewidth=2, label='Validation Score'
    )

    ax.set_xlabel('Training Set Size')
    ax.set_ylabel(scoring.capitalize())
    ax.set_title(title, fontsize=14, fontweight='bold')
    ax.legend(loc='lower right')
    ax.grid(True, alpha=0.3)

    # Annotate the final gap between curves
    final_gap = train_mean[-1] - val_mean[-1]
    ax.annotate(
        f'Gap: {final_gap:.3f}',
        xy=(train_sizes[-1], (train_mean[-1] + val_mean[-1]) / 2),
        fontsize=11, fontweight='bold', color='red',
        ha='right'
    )

    fig.tight_layout()
    return fig


# Example
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)

fig = plot_learning_curve(model, X, y, title='Random Forest Learning Curve')
fig.savefig('learning_curve.png', dpi=300, bbox_inches='tight')
plt.close(fig)
Reading Learning Curves
  • Large gap (training high, validation low) = high variance (overfitting). Fix with: more data, stronger regularization, fewer features, simpler model.
  • Both curves low and converging together = high bias (underfitting). Fix with: more features, more complex model, less regularization. More data will NOT help here.
  • Both curves high and converging together = good fit. Model capacity is well matched to this data volume.
  • Validation curve still rising at the right edge = more data will help. Collecting additional training examples is a productive investment.
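The reading rules above can be turned into a rough automated check. This is a sketch under stated assumptions, not a standard scikit-learn utility: the function name `diagnose_learning_curve` and the three thresholds (`gap_tol`, `good_score`, `slope_tol`) are illustrative and should be tuned for your metric. It expects the `train_mean` and `val_mean` arrays computed in `plot_learning_curve`.

```python
import numpy as np


def diagnose_learning_curve(train_mean, val_mean,
                            gap_tol=0.05, good_score=0.85, slope_tol=0.005):
    """Label a learning curve using rule-of-thumb thresholds.

    All three thresholds are illustrative assumptions.
    Returns (verdict, still_rising).
    """
    train_mean = np.asarray(train_mean, dtype=float)
    val_mean = np.asarray(val_mean, dtype=float)

    final_gap = train_mean[-1] - val_mean[-1]
    # Validation gain over the last step of the training-size grid
    still_rising = bool((val_mean[-1] - val_mean[-2]) > slope_tol)

    if final_gap > gap_tol:
        verdict = 'high variance (overfitting)'
    elif val_mean[-1] < good_score:
        verdict = 'high bias (underfitting)'
    else:
        verdict = 'good fit'
    return verdict, still_rising


# Synthetic curves for each of the three shapes
print(diagnose_learning_curve([1.00, 1.00, 1.00], [0.70, 0.78, 0.84]))
print(diagnose_learning_curve([0.72, 0.71, 0.70], [0.66, 0.68, 0.68]))
print(diagnose_learning_curve([0.95, 0.93, 0.92], [0.88, 0.90, 0.90]))
```

A check like this is only a first filter. The plot itself remains the evidence, since a single gap number cannot show noisy folds or a curve that flattens and then rises again.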
Production Insight
Learning curves computed on a tiny subsample can be misleading about convergence behavior. If your full dataset has 1M rows but you compute the learning curve on a 5K sample, the curve might show convergence that disappears at full scale.
Always compute learning curves on a representative sample large enough to show the real convergence pattern — at least 10% of the full dataset or 10K samples, whichever is larger.
Rule: if the validation curve has not plateaued at the maximum training size, your model will measurably benefit from more training data. If it has plateaued, spending three weeks collecting more data is wasted effort — change the model instead.
Key Takeaway
Learning curves diagnose bias vs variance — the fundamental decision point for model improvement.
Large gap between curves = overfitting (needs regularization or more data). Both curves low = underfitting (needs more complexity).
The curve shape tells you whether to invest in more data or in a different model architecture.

Saving and Formatting for Production

Charts in notebooks are for exploration. Charts in reports, dashboards, presentations, and papers require consistent formatting, appropriate resolution, and accessible color choices. The gap between a notebook plot and a production-ready figure is not aesthetics — it is legibility, accessibility, and reproducibility.

A chart that looks fine on your 4K monitor becomes an unreadable blur when projected onto a conference room screen or embedded in a PDF at print resolution. This section covers the production formatting pipeline that ensures your figures survive every medium they encounter.

io/thecodeforge/viz/production_formatting.py · PYTHON
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np


def apply_production_style():
    """Apply a consistent, publication-quality style globally.

    Call this once at the top of your notebook or script.
    Overrides Matplotlib defaults with production-safe values.
    """
    mpl.rcParams.update({
        # Typography
        'font.size': 12,
        'axes.titlesize': 14,
        'axes.labelsize': 12,
        'xtick.labelsize': 10,
        'ytick.labelsize': 10,
        'legend.fontsize': 10,
        'figure.titlesize': 16,

        # Figure defaults
        'figure.figsize': (10, 6),
        'figure.dpi': 100,           # Screen display DPI
        'savefig.dpi': 300,          # Saved file DPI
        'savefig.bbox': 'tight',     # Prevent label clipping
        'savefig.pad_inches': 0.1,

        # Grid and spines
        'axes.grid': True,
        'grid.alpha': 0.3,
        'axes.spines.top': False,    # Remove top spine
        'axes.spines.right': False,  # Remove right spine

        # Lines and markers
        'lines.linewidth': 2,
        'lines.markersize': 6,
    })
    print("Production style applied.")


def save_publication(fig, filename, formats=None):
    """Save figure in multiple formats for different use cases.

    Args:
        fig: Matplotlib Figure object
        filename: base filename without extension
        formats: list of format strings. Defaults to PNG + SVG.
    """
    if formats is None:
        formats = ['png', 'svg']

    for fmt in formats:
        filepath = f"{filename}.{fmt}"
        fig.savefig(filepath, dpi=300, bbox_inches='tight', facecolor='white')
        print(f"Saved: {filepath}")


# --- Usage ---
apply_production_style()

fig, ax = plt.subplots()
colors = ['#0072B2', '#E69F00', '#009E73']  # Okabe-Ito blue, orange, green: colorblind-safe
ax.bar(
    ['Model A', 'Model B', 'Model C'],
    [0.89, 0.92, 0.87],
    color=colors, edgecolor='black', alpha=0.9
)
ax.set_ylabel('Accuracy')
ax.set_title('Model Comparison — Q1 2026')
ax.set_ylim(0.80, 0.95)

# Add value labels on bars
for i, v in enumerate([0.89, 0.92, 0.87]):
    ax.text(i, v + 0.003, f'{v:.2f}', ha='center', fontweight='bold')

save_publication(fig, 'model_comparison')
plt.close(fig)
Accessibility in Visualizations
  • Use colorblind-safe palettes such as sns.color_palette('colorblind'). The 'muted' palette is softer on the eye but is not designed for color vision deficiency. Avoid pure red/green combinations as the only differentiator.
  • Add patterns (hatching), markers, or line styles to distinguish series — not just color. ax.bar(..., hatch='//') adds visual texture.
  • Never use the 'jet' or 'rainbow' colormap for continuous data — they introduce perceptual artifacts. Use 'viridis', 'plasma', or 'cividis' instead.
  • Add direct value labels on bars and direct labels on lines instead of relying on a distant legend that requires color matching.
  • Test your charts in grayscale. If they still communicate the message, they are accessible.
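Several of the points above can be combined in one chart. The sketch below is an illustration, not a required recipe: it uses three colors from the Okabe-Ito palette, adds a distinct hatch per bar, labels values directly, and prints an approximate grayscale value for each color. The `luminance` helper and the model scores are assumptions made for the example.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

# Okabe-Ito colors: blue, orange, bluish green
colors = ['#0072B2', '#E69F00', '#009E73']
hatches = ['//', '..', 'xx']  # texture distinguishes bars without color
labels = ['Model A', 'Model B', 'Model C']
scores = [0.89, 0.92, 0.87]  # illustrative values

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(labels, scores, color=colors, edgecolor='black')
for bar, hatch, score in zip(bars, hatches, scores):
    bar.set_hatch(hatch)
    # Direct value label, so no legend lookup is needed
    ax.text(bar.get_x() + bar.get_width() / 2, score + 0.003,
            f'{score:.2f}', ha='center', fontweight='bold')
ax.set_ylabel('Accuracy')
ax.set_ylim(0.80, 0.95)


def luminance(color):
    """Approximate grayscale value using Rec. 601 luma weights."""
    r, g, b = to_rgb(color)
    return 0.299 * r + 0.587 * g + 0.114 * b


gray = [luminance(c) for c in colors]
print([round(v, 2) for v in gray])
plt.close(fig)
```

If two of the printed grayscale values land close together, the colors alone would be hard to tell apart in a grayscale print, which is exactly the case the hatching covers.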
Production Insight
PNG at 300 DPI is the standard for reports and presentations. SVG is best for web dashboards and documentation sites because it scales without pixelation and has a smaller file size for simple charts. PDF is best for print publications, LaTeX documents, and archival.
Rule: always save in at least two formats — PNG for immediate sharing and embedding, SVG or PDF for archival and web. The save_publication helper above handles this automatically.
In automated report generation pipelines, save figures to a versioned artifact directory alongside the model they evaluate. Figures and models should share the same version tag.
Key Takeaway
Apply a consistent style with mpl.rcParams at the top of every notebook or script — never rely on Matplotlib defaults.
Save at 300 DPI in PNG for reports, SVG for web, PDF for print. Always save before calling plt.show().
Use colorblind-safe palettes and direct value labels. Never rely on color alone to convey meaning.
● Production incident · POST-MORTEM · severity: high

Fraud Detection Model Degraded for 3 Weeks Because No One Plotted Predictions Over Time

Symptom
False positive rate tripled. Customer support received a 400% spike in fraud-flag complaints from legitimate merchants. The aggregate weekly accuracy metric — reported as a single number on the team dashboard — still showed 89%, masking the class-level collapse entirely.
Assumption
The team monitored a single aggregate accuracy number in their Grafana dashboard. They assumed stability because the headline metric had not moved more than 1% in either direction. No per-class breakdown existed. No prediction distribution plot existed.
Root cause
A new merchant category code (MCC 7399) was introduced by the payment processor three weeks prior. The model had never seen this code during training. It defaulted to high suspicion scores for all transactions with the unfamiliar code, flagging legitimate purchases as fraud. The aggregate accuracy stayed high because fraud cases represent only 1% of transactions — the model's correct predictions on the other 99% of normal transactions dominated the average, drowning out the class-level failure.
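The masking effect is easy to verify with arithmetic. The numbers below are hypothetical and are not the incident's actual figures; only the 1% fraud rate and the tripled false positive rate come from the description above. The `summarize` helper is likewise an assumption made for the example.

```python
def summarize(n_total, fraud_rate, recall, false_positive_rate):
    """Return (accuracy, precision) for a binary fraud classifier."""
    n_fraud = n_total * fraud_rate
    n_legit = n_total - n_fraud
    tp = n_fraud * recall                 # fraud correctly flagged
    fp = n_legit * false_positive_rate    # legitimate transactions flagged
    tn = n_legit - fp
    accuracy = (tp + tn) / n_total
    precision = tp / (tp + fp)
    return accuracy, precision


# Hypothetical baseline, then the same model with a tripled false positive rate
before = summarize(100_000, 0.01, recall=0.80, false_positive_rate=0.005)
after = summarize(100_000, 0.01, recall=0.80, false_positive_rate=0.015)

print(f'accuracy:  {before[0]:.4f} -> {after[0]:.4f}')
print(f'precision: {before[1]:.4f} -> {after[1]:.4f}')
```

Under these assumptions accuracy moves by about one percentage point while precision on the fraud class falls by more than a quarter, which is why a dashboard showing only accuracy stays quiet.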
Fix
Added daily confusion matrix heatmaps to the monitoring dashboard, broken down by predicted class. Implemented per-class precision and recall time-series plots with automated PagerDuty alerts when any class metric dropped below a configurable threshold for two consecutive days. Added a weekly prediction probability distribution plot (histogram of model confidence scores) to detect distribution shifts before they manifest as metric degradation.
Key lesson
  • Never monitor a single aggregate metric — break performance down by class, by segment, and over time.
  • Confusion matrices catch class-level failures that accuracy, F1, and even AUC hide when classes are imbalanced.
  • Plot prediction probability distributions weekly to detect distribution shift before downstream metrics degrade.
  • The charts you build during model evaluation should become your production monitoring dashboards — not throwaway notebook cells.
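The alerting rule from the fix (a class metric below threshold for two consecutive days) can be sketched in a few lines. This is a minimal illustration with a hypothetical `breaches_threshold` helper and made-up daily values; wiring it to a pager or dashboard is outside the sketch.

```python
def breaches_threshold(daily_values, threshold, consecutive_days=2):
    """Return True if the metric stayed below `threshold` for
    `consecutive_days` days in a row at any point in the series."""
    run = 0
    for value in daily_values:
        # Extend the current run of bad days, or reset it on a good day
        run = run + 1 if value < threshold else 0
        if run >= consecutive_days:
            return True
    return False


# Hypothetical daily precision for the fraud class
daily_precision = [0.61, 0.60, 0.44, 0.62, 0.43, 0.41, 0.40]

print(breaches_threshold(daily_precision, threshold=0.50))
```

Requiring consecutive days trades a one-day delay for fewer false alarms from a single noisy day. Running the same check per class and per segment is what would have surfaced the failure described here.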
Production debug guide
When your charts do not reveal what you expect, or when they reveal something you did not anticipate.
Symptom · 01
All points on a scatter plot overlap into a single blob
Fix
Use alpha transparency (alpha=0.05 to 0.2 depending on density), add jitter with np.random.normal(0, 0.1, size=len(x)), or switch to a 2D density plot with sns.kdeplot(x=x, y=y, fill=True). For very large datasets (>100K points), use datashader or hexbin plots (ax.hexbin) instead of scatter.
Symptom · 02
Bar chart error bars look identical across all groups
Fix
Check if you are plotting standard deviation on a log-scale axis, which compresses the visual differences. Switch to confidence intervals (errorbar=('ci', 95) in Seaborn 0.12 and later; ci=95 in older versions) or standard error of the mean instead of standard deviation. Also verify that your groups actually have different variances, because identical error bars might be correct.
Symptom · 03
ROC curve looks perfect (AUC = 1.0) but model performs poorly in production
Fix
This is almost certainly data leakage. Check for target-derived features in your training data, duplicates spanning train and test splits, or temporal leakage where future information bleeds into training rows. A perfect ROC on held-out data means the model has access to the answer, not that it learned the pattern.
Symptom · 04
Residual plot shows a clear curved or fan-shaped pattern instead of random scatter
Fix
A curve means missing non-linearity — add polynomial features, interaction terms, or switch to a non-linear model. A fan shape (residuals widening with predicted value) means heteroscedasticity — log-transform the target variable or use weighted regression.
Symptom · 05
Saved figure looks different from the notebook display — wrong size, cut-off labels, or blank
Fix
Always save before calling plt.show(). Once the window is closed, or the notebook cell finishes under the inline backend, the figure is typically released, so a later plt.savefig() writes a blank canvas. Use fig.savefig('name.png', dpi=300, bbox_inches='tight'); the bbox_inches parameter prevents label clipping. Set figsize explicitly in plt.subplots() rather than relying on notebook defaults.
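For the perfect-ROC symptom in entry 03, one of the quickest leakage checks is to look for identical rows on both sides of the split. The sketch below is an assumption-laden illustration: `count_shared_rows` is a hypothetical helper, it only detects exact duplicates in numeric arrays, and it does not cover target-derived features or temporal leakage.

```python
import numpy as np
from sklearn.datasets import make_classification


def count_shared_rows(X_train, X_test):
    """Count test rows whose exact feature values also appear in training."""
    train_rows = set(map(tuple, np.asarray(X_train)))
    return sum(tuple(row) in train_rows for row in np.asarray(X_test))


X, _ = make_classification(n_samples=200, n_features=8, random_state=42)

# Clean split: no row appears on both sides
clean = count_shared_rows(X[:150], X[150:])

# Contaminated split: 10 training rows were copied into the test set
X_test_leaky = np.vstack([X[150:], X[:10]])
leaky = count_shared_rows(X[:150], X_test_leaky)

print(f'clean split: {clean} shared rows, leaky split: {leaky} shared rows')
```

Any non-zero count on held-out data is worth investigating before trusting the ROC curve, since the model may simply be recalling rows it has already seen.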
★ ML Visualization Debug Cheat Sheet
Quick checks when your charts do not tell the right story or something looks suspicious.
Confusion matrix shows all predictions in one class
Immediate action
Check class balance and prediction threshold. The model is likely predicting the majority class for every input.
Commands
print(f'Positive predictions: {y_pred.sum()} / {len(y_pred)}')
print(df['target'].value_counts(normalize=True))
Fix now
Lower the decision threshold (e.g., from 0.5 to 0.3) and re-evaluate. If the problem persists, address class imbalance with SMOTE, class weights, or stratified sampling before retraining.
Learning curve shows training score much higher than validation score
Immediate action
Model is overfitting — it memorizes training data but cannot generalize.
Commands
from sklearn.model_selection import learning_curve
train_sizes, train_scores, val_scores = learning_curve(model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 10))
Fix now
Increase regularization, reduce model complexity (fewer trees, shallower depth), or collect more training data. If the validation curve is still rising at maximum data size, more data will help.
Feature importance plot shows one dominant feature at 95%+
Immediate action
Check for data leakage — the dominant feature may directly encode or derive from the target variable.
Commands
print(df.corrwith(df['target']).abs().sort_values(ascending=False).head(10))
# Retrain a fresh copy without the suspicious feature and compare performance
from sklearn.base import clone
model_no_leak = clone(model).fit(X.drop(columns=['suspicious_feature']), y)
Fix now
Remove the leaky feature and retrain. If accuracy collapses dramatically (e.g., from 99% to 60%), the original model learned nothing real — it was just memorizing the leaked signal.
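The threshold adjustment from the first cheat-sheet entry can be sketched as a small sweep. The data, the logistic regression model, and the three thresholds are assumptions chosen for illustration. The point is only that lowering the threshold produces more positive predictions and trades precision for recall.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 5% positives (illustrative)
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

results = {}
for threshold in (0.5, 0.3, 0.1):
    y_pred = (proba >= threshold).astype(int)
    results[threshold] = {
        'n_positive': int(y_pred.sum()),
        'precision': precision_score(y_test, y_pred, zero_division=0),
        'recall': recall_score(y_test, y_pred, zero_division=0),
    }
    print(threshold, results[threshold])
```

Plotting these rows as a confusion matrix per threshold makes the tradeoff visible to stakeholders, who can then pick the operating point based on the cost of a missed positive versus a false alarm.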
Matplotlib vs Seaborn: When to Use Which
Aspect | Matplotlib | Seaborn
Learning Curve | Steeper: more code required for statistical plots | Gentler: sensible defaults and fewer lines for common charts
Control Level | Full pixel-level control over every element | Less granular control, but faster to prototype
DataFrame Awareness | None: requires manual extraction of arrays from DataFrames | Native: pass column names directly via data= parameter
Statistical Plots | Manual: compute confidence intervals, KDE, regressions yourself | Built-in: automatic confidence intervals, KDE, regression lines
Multi-Panel Layouts | Excellent: full control over grid spacing and sizing | Limited: pairplot and FacetGrid handle specific patterns only
Customization | Unlimited: every element is individually addressable | Good via Matplotlib axes access, but some Seaborn elements resist customization
Production Formatting | Full control via rcParams and style sheets | Inherits Matplotlib settings, adds its own theme layer via set_theme()
Best For | Final figures, custom annotations, publication-quality output | Exploratory analysis, statistical summaries, rapid prototyping

Key takeaways

1. Every Matplotlib chart starts with fig, ax = plt.subplots(): use the object-oriented interface, always.
2. Seaborn handles DataFrame grouping and statistical estimation automatically: use it for rapid exploration, then drop down to Matplotlib for polish.
3. Confusion matrices reveal class-level failures that accuracy hides: always show both raw counts and row-normalized percentages.
4. ROC curves work for balanced data; Precision-Recall curves are essential for imbalanced data. Plot both.
5. Residual plots diagnose regression model errors that RMSE averages away: check for patterns, not just magnitude.
6. Learning curves tell you whether to invest in more data or a different model: read the gap between training and validation curves.
7. Save at 300 DPI with fig.savefig() and always call plt.close(fig) afterward to prevent memory leaks in pipelines.
8. Use perceptually uniform colormaps (viridis, plasma, cividis): never use jet or rainbow for continuous data.

Common mistakes to avoid

Using plt.plot() instead of the object-oriented ax.plot() interface

Symptom
Multi-panel figures break unpredictably. Titles, labels, and data end up on the wrong subplot. Saving produces blank files after calling plt.show().
Fix
Always use fig, ax = plt.subplots() and call methods on the ax object: ax.plot(), ax.set_title(), ax.set_xlabel(). The pyplot interface (plt.plot()) operates on an implicit 'current axes' that changes unpredictably in multi-panel figures. The object-oriented interface is explicit, debuggable, and production-safe.

Not calling plt.close(fig) after saving

Symptom
Memory usage climbs steadily during training loops or report generation scripts. After generating 50–100 figures, the process crashes with a memory error or slows to a crawl.
Fix
Always call plt.close(fig) after fig.savefig(). Each open figure consumes memory. In loops, use plt.close('all') as a safety net. In Jupyter notebooks, this matters less because %matplotlib inline auto-closes, but it is still good practice.

Using the 'jet' or 'rainbow' colormap for continuous data

Symptom
Charts create visual artifacts — bright yellow bands appear to be boundaries or features that do not exist in the data. Colorblind viewers cannot distinguish adjacent regions. Print outputs in grayscale are completely unreadable.
Fix
Use perceptually uniform colormaps: 'viridis' (default), 'plasma', 'inferno', or 'cividis' (designed specifically for colorblind accessibility). For diverging data (centered around zero), use 'RdBu_r' or 'coolwarm' and center the color scale on zero, for example with center=0 in sns.heatmap.

Presenting only accuracy without a confusion matrix or error distribution

Symptom
Stakeholders approve a model that is 95% accurate on an imbalanced dataset. In production, it misses 60% of the minority class (the class that actually matters for the business). Nobody knew because accuracy masked the class-level failure.
Fix
Always present the confusion matrix alongside any aggregate metric. For regression, always include a residual plot alongside RMSE. The aggregate metric is the headline; the visualization is the evidence. If the evidence contradicts the headline, the headline is wrong.

Saving figures at screen resolution (72–96 DPI)

Symptom
Charts look fine in the notebook but become pixelated and blurry when embedded in PDF reports, printed on paper, or projected in meeting rooms. Text labels become unreadable.
Fix
Always save with fig.savefig('name.png', dpi=300, bbox_inches='tight'). 300 DPI is the minimum for print and presentation quality. For posters or large-format prints, use 600 DPI. Set savefig.dpi in rcParams so you never forget.

Choosing the wrong chart type for the data relationship

Symptom
Pie chart used for 15 categories — impossible to compare slice sizes. Line chart used for categorical data — implies a continuous trend that does not exist. Scatter plot used for 1 million points — produces an opaque blob.
Fix
Match the chart to the relationship: histogram or KDE for distributions, bar chart for categorical comparisons, scatter plot for bivariate correlation (with alpha for large N), line chart for trends over time or ordered sequences, heatmap for matrices and correlations. When in doubt, ask: what question should this chart answer? Then pick the chart type that answers it most directly.

Interview Questions on This Topic

Q01 · SENIOR
Why is a Precision-Recall curve more informative than an ROC curve for i...
Q02 · SENIOR
Your residual plot shows a U-shaped pattern. What does this tell you abo...
Q03 · JUNIOR
How would you present model evaluation results to a non-technical stakeh...
Q04 · SENIOR
Explain the difference between built-in feature importance and permutati...
Q01 of 04 · SENIOR

Why is a Precision-Recall curve more informative than an ROC curve for imbalanced classification problems?

ANSWER
ROC curves plot true positive rate against false positive rate. On imbalanced datasets where negatives vastly outnumber positives, even a large number of false positives represents a small false positive rate because the denominator (total negatives) is enormous. This makes the ROC curve look deceptively good: AUC can exceed 0.95 while the model's precision at any useful recall level is actually poor.

Precision-Recall curves focus exclusively on the positive class. Precision measures what fraction of positive predictions are correct, and recall measures what fraction of actual positives are detected. Neither metric is inflated by the large pool of true negatives. On a dataset with a 1% positive rate, the PR curve might show, for example, that reaching 80% recall means accepting 30% precision, a tradeoff the ROC curve does not make visible.

In production, I always plot both side by side. If ROC looks excellent but PR looks mediocre, the model is benefiting from the imbalance, not from genuine discriminative power.
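The gap between the two summaries can be demonstrated without training a model. The sketch below draws synthetic scores for a 1% positive rate; the two normal distributions, their separation, and the sample sizes are assumptions chosen only to illustrate the effect.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(42)
n_neg, n_pos = 99_000, 1_000  # 1% positive rate

# Hypothetical classifier scores: positives are shifted upward by two
# standard deviations relative to negatives
scores = np.concatenate([
    rng.normal(0.0, 1.0, n_neg),
    rng.normal(2.0, 1.0, n_pos),
])
y_true = np.concatenate([
    np.zeros(n_neg, dtype=int),
    np.ones(n_pos, dtype=int),
])

roc_auc = roc_auc_score(y_true, scores)
avg_precision = average_precision_score(y_true, scores)  # area under the PR curve

print(f'ROC AUC:           {roc_auc:.3f}')
print(f'Average precision: {avg_precision:.3f}')
```

With this much imbalance the ROC AUC should come out high while average precision stays far lower, because the many negatives that outscore the positives barely move the false positive rate but heavily dilute precision.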

Frequently Asked Questions

01. Should I use Matplotlib or Seaborn?
02. How do I choose the right chart type for my data?
03. Why do my saved plots look different from what I see in the notebook?
04. How many charts should I include in a model evaluation report?
05. How do I make my charts accessible to colorblind viewers?