Seaborn for Data Visualisation: Charts That Actually Tell a Story

📅 March 2026 ⏱ 8 min read 🎯 Intermediate

In Plain English 🔥

Imagine you have a spreadsheet of 10,000 sales records and your boss asks 'is there a pattern here?' You could stare at the numbers, or you could hand them to an artist who instantly draws a picture that makes the pattern obvious. Seaborn is that artist for Python. It takes raw data — messy, tabular, full of columns — and turns it into publication-quality charts in just a few lines of code. It sits on top of Matplotlib the way a power drill sits on top of a motor: the motor does the hard work, but the drill makes it actually usable.

⚡ Quick Answer

Every data project hits the same wall: you have the numbers, but you can't see them. A DataFrame full of customer ages, purchase values, and churn flags is just a rectangle of digits until someone visualises it. Seaborn exists precisely for that moment — the moment between 'I have data' and 'I understand data'. It's used daily by data scientists at companies like Spotify and Airbnb to explore datasets before modelling and to communicate findings to non-technical stakeholders.

The real problem Seaborn solves isn't just aesthetics, though its defaults are beautiful. It solves the complexity problem. Drawing a grouped bar chart with error bars and a sensible colour palette in pure Matplotlib can take dozens of lines and a lot of Stack Overflow. In Seaborn it takes a single function call. More importantly, Seaborn understands the concept of 'tidy data': it knows what a DataFrame is, it reads column names directly, and it maps statistical relationships onto visual properties automatically. That's a fundamentally different abstraction level.

By the end of this article you'll know which Seaborn chart to reach for in six real-world scenarios, why the Figure-level vs Axes-level distinction matters when you're building dashboards, how to customise without fighting the library, and the three mistakes that silently ruin charts for beginners. You'll also be ready to answer the Seaborn questions that come up in data analyst and data science interviews.

Seaborn's Mental Model: Tidy Data, Figure-Level vs Axes-Level

Before you write a single line of Seaborn, you need to understand its two core assumptions, because breaking either one causes confusing bugs.

First: Seaborn expects tidy data. That means one observation per row and one variable per column. If your DataFrame has columns called 'Jan_Sales', 'Feb_Sales', 'Mar_Sales', Seaborn will fight you. The correct shape has a 'Month' column and a 'Sales' column — one row per month per product. Pandas' melt() function is your friend here.
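The reshaping step described above can be sketched as follows. The `wide_sales` table, its product names, and its numbers are hypothetical and exist only to illustrate the shape change; the only library call relied on is pandas' `melt()`.

```python
import pandas as pd

# Hypothetical wide-format table: one column per month (illustrative data).
wide_sales = pd.DataFrame({
    'product':   ['Widget', 'Gadget'],
    'Jan_Sales': [120, 95],
    'Feb_Sales': [135, 90],
    'Mar_Sales': [150, 110],
})

# melt() stacks the month columns into two tidy columns: Month and Sales.
tidy_sales = wide_sales.melt(
    id_vars='product',
    value_vars=['Jan_Sales', 'Feb_Sales', 'Mar_Sales'],
    var_name='Month',
    value_name='Sales',
)

# Strip the '_Sales' suffix so Month holds 'Jan', 'Feb', 'Mar'.
tidy_sales['Month'] = tidy_sales['Month'].str.replace('_Sales', '', regex=False)

print(tidy_sales)
```

After this step there is one row per product per month, so a call such as sns.barplot(data=tidy_sales, x='Month', y='Sales', hue='product') has a single column to map to each visual property.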

Second: Seaborn has two tiers of functions. Axes-level functions like histplot(), scatterplot(), and boxplot() draw onto a single Matplotlib Axes object — they behave like normal Matplotlib and you can combine them freely. Figure-level functions like displot(), relplot(), and catplot() create their own Figure and can produce multi-panel grids via a 'col=' or 'row=' argument. They return a FacetGrid object, not an Axes, which is why calling plt.title() on one produces the wrong result.

Knowing this split stops you spending an hour wondering why your title is in the wrong place or why subplots won't cooperate.

seaborn_mental_model.py · PYTHON

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# --- Build a tidy sales DataFrame ---
# Each row = one record. No pivoted month columns.
sales_data = pd.DataFrame({
    'month':    ['Jan','Jan','Feb','Feb','Mar','Mar'] * 3,
    'region':   (['North','South'] * 9),
    'revenue':  [42000, 38000, 51000, 47000, 63000, 58000,
                 39000, 41000, 49000, 52000, 61000, 66000,
                 44000, 37000, 55000, 48000, 67000, 60000]
})

# --- AXES-LEVEL example: scatterplot onto an existing Axes ---
# We control the figure size ourselves before calling Seaborn.
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Axes-level: pass the target ax explicitly so Seaborn knows where to draw.
sns.boxplot(
    data=sales_data,
    x='month',
    y='revenue',
    hue='region',   # colour-codes boxes by region automatically
    ax=axes[0]      # <-- only axes-level functions accept an ax= target
)
axes[0].set_title('Revenue by Month and Region (Axes-level)')
axes[0].set_ylabel('Revenue (USD)')

# Second panel: bar chart of average revenue per month
sns.barplot(
    data=sales_data,
    x='month',
    y='revenue',
    hue='region',
    errorbar='sd',  # shows standard deviation as error bars
    ax=axes[1]
)
axes[1].set_title('Average Revenue with Std Dev (Axes-level)')
axes[1].set_ylabel('Mean Revenue (USD)')

plt.suptitle('Axes-Level Seaborn: We Own the Figure', fontsize=14, y=1.02)
plt.tight_layout()
plt.savefig('axes_level_demo.png', dpi=150, bbox_inches='tight')
plt.show()
print('Axes-level chart saved.')

# --- FIGURE-LEVEL example: catplot manages its own figure ---
# We do NOT create a fig/axes first. Seaborn does it.
grid = sns.catplot(
    data=sales_data,
    x='month',
    y='revenue',
    col='region',        # creates one panel per region automatically
    kind='box',
    height=4,
    aspect=0.9,
    # palette= omitted: passing palette without hue= is deprecated in seaborn 0.13+
)
# For figure-level functions, set the title on the FacetGrid object,
# NOT with plt.title() — that would go on the wrong axes.
grid.set_titles('Region: {col_name}')  # {col_name} is a Seaborn template token
grid.set_axis_labels('Month', 'Revenue (USD)')
grid.figure.suptitle('Figure-Level catplot: Seaborn Owns the Figure', y=1.03)
plt.savefig('figure_level_demo.png', dpi=150, bbox_inches='tight')
plt.show()
print('Figure-level chart saved.')

▶ Output

Axes-level chart saved.
Figure-level chart saved.

⚠️

Watch Out: plt.title() Doesn't Work on FacetGrid

After catplot(), relplot(), or displot(), calling plt.title('My Title') places the title on the last active Axes panel, not the whole figure. Use grid.figure.suptitle('My Title') instead, or grid.set_titles('{col_name}') for per-panel labels.

Choosing the Right Chart: Six Real-World Scenarios

The most common Seaborn mistake isn't bad syntax — it's reaching for the wrong chart. Here's the decision framework professionals actually use.

Distribution of a single numeric variable? Use histplot() with kde=True to overlay the density curve. It answers 'is this data normally distributed, skewed, or bimodal?' before you choose a statistical test.

Relationship between two numeric variables? scatterplot() with hue= for a third categorical dimension. Add a regression line with lmplot() when you want to communicate correlation to a non-technical audience.

Comparing a numeric variable across categories? boxplot() for showing spread and outliers, violinplot() when sample size is large enough to trust the density estimate (roughly n > 30 per group), and barplot() only when mean + uncertainty is the right summary.

Correlation across many numeric columns? heatmap() on a correlation matrix. This is the chart that identifies multicollinearity before you build a regression model.

Change over time? lineplot() with hue= for multiple groups. Seaborn automatically aggregates and draws confidence intervals when multiple observations exist per x value.

Distribution across two categorical dimensions? heatmap() on a pivot table, or pointplot() with both x= and hue= for overlapping line-point combos.

seaborn_chart_selection.py · PYTHON

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Set a clean, professional theme once at the top of your script.
# 'whitegrid' adds horizontal guide lines that help readers trace values.
sns.set_theme(style='whitegrid', palette='colorblind', font_scale=1.1)

# Load the built-in penguins dataset — real biological measurements.
# This is tidy data: one penguin per row.
penguins = sns.load_dataset('penguins').dropna()  # drop 11 rows with missing values

print(f"Dataset shape: {penguins.shape}")
print(penguins.head(3))

# ── SCENARIO 1: Distribution of flipper length ──────────────────────────────
fig, ax = plt.subplots(figsize=(8, 4))
sns.histplot(
    data=penguins,
    x='flipper_length_mm',
    hue='species',       # separate colour per species
    kde=True,            # overlay kernel density estimate
    bins=25,
    alpha=0.5,           # transparency so overlapping bars are still visible
    ax=ax
)
ax.set_title('Flipper Length Distribution by Species')
ax.set_xlabel('Flipper Length (mm)')
plt.tight_layout()
plt.savefig('scenario1_distribution.png', dpi=150)
plt.show()

# ── SCENARIO 2: Relationship — bill length vs bill depth ────────────────────
# lmplot is figure-level and adds a regression line per hue group.
lm_grid = sns.lmplot(
    data=penguins,
    x='bill_length_mm',
    y='bill_depth_mm',
    hue='species',
    height=5,
    aspect=1.3,
    scatter_kws={'alpha': 0.6, 's': 40}  # pass kwargs down to the scatter layer
)
lm_grid.set_axis_labels('Bill Length (mm)', 'Bill Depth (mm)')
lm_grid.figure.suptitle("Bill Dimensions: Within-Species Trends Are Positive (Simpson's Paradox)", y=1.02)
plt.savefig('scenario2_regression.png', dpi=150, bbox_inches='tight')
plt.show()

# ── SCENARIO 3: Numeric across categories — body mass by species and sex ────
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

sns.violinplot(
    data=penguins,
    x='species',
    y='body_mass_g',
    hue='sex',
    split=True,         # mirror halves on same violin — saves space
    inner='quartile',   # draw quartile lines inside the violin
    palette='Set2',
    ax=axes[0]
)
axes[0].set_title('Body Mass Distribution (Violin)')
axes[0].set_xlabel('')
axes[0].set_ylabel('Body Mass (g)')

sns.boxplot(
    data=penguins,
    x='species',
    y='body_mass_g',
    hue='sex',
    palette='Set2',
    ax=axes[1]
)
axes[1].set_title('Body Mass Distribution (Box)')
axes[1].set_xlabel('')
axes[1].set_ylabel('Body Mass (g)')

plt.suptitle('Same Data, Different Chart — Violin Shows Full Shape', fontsize=13)
plt.tight_layout()
plt.savefig('scenario3_violin_vs_box.png', dpi=150)
plt.show()

# ── SCENARIO 4: Correlation heatmap before modelling ────────────────────────
numeric_cols = penguins.select_dtypes(include='number')
correlation_matrix = numeric_cols.corr()  # Pearson correlation by default

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(
    correlation_matrix,
    annot=True,          # print correlation value inside each cell
    fmt='.2f',           # format to 2 decimal places
    cmap='coolwarm',     # red = positive, blue = negative correlation
    vmin=-1, vmax=1,     # pin the colour scale to the valid correlation range
    square=True,         # force square cells for readability
    linewidths=0.5,
    ax=ax
)
ax.set_title('Penguin Feature Correlations — Check Before Modelling')
plt.tight_layout()
plt.savefig('scenario4_heatmap.png', dpi=150)
plt.show()
print('All charts saved.')

▶ Output

Dataset shape: (333, 7)
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 Male
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 Female
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 Female
All charts saved.

⚠️

Pro Tip: The Heatmap That Saves Your Model

Always run the correlation heatmap before fitting a linear or logistic regression. If two features have an absolute correlation above roughly 0.85 (deep red or deep blue on coolwarm), you likely have multicollinearity. Consider keeping only one of them, or your coefficients may be unstable and hard to interpret.
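Reading a heatmap by eye gets harder as the number of columns grows, so it can help to list the offending pairs programmatically. This is a minimal sketch under stated assumptions: the `features` DataFrame is synthetic, and the 0.85 cut-off is just the rule of thumb from the tip above, not a universal threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical features: 'height_cm' and 'height_in' are near-duplicates.
rng = np.random.default_rng(1)
height_cm = rng.normal(170, 10, 200)
features = pd.DataFrame({
    'height_cm': height_cm,
    'height_in': height_cm / 2.54 + rng.normal(0, 0.1, 200),
    'age': rng.integers(18, 80, 200),
})

corr = features.corr()  # Pearson correlation by default

# Keep only the upper triangle (k=1 skips the diagonal) so each pair
# appears once and self-correlations are excluded.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Stack into a Series indexed by (feature_a, feature_b), then filter on
# absolute correlation so strong negative pairs are caught too.
THRESHOLD = 0.85  # rule-of-thumb cut-off, an assumption
pairs = upper.stack()
high_pairs = pairs[pairs.abs() > THRESHOLD]

print(high_pairs)
```

With this synthetic data only the height pair should be reported, which mirrors what the deep-coloured cell on the heatmap would show.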

Customising Seaborn Without Fighting It — Themes, Palettes, and Matplotlib Escape Hatches

Seaborn's defaults are intentionally good. The trap beginners fall into is immediately overriding everything and ending up with something worse than the default. The right mental model is: let Seaborn do 80%, then use Matplotlib for the final 20%.

The sns.set_theme() call at the top of your script is the single most powerful line. It sets the background, grid style, font scale, and colour palette for every chart that follows. Choose from five styles: 'darkgrid', 'whitegrid', 'dark', 'white', and 'ticks'. For presentations use 'white'; for exploratory analysis 'whitegrid' helps you read values.

Colour palettes deserve real thought. The 'colorblind' palette is a sensible professional default: it's designed to stay distinguishable for people with deuteranopia and protanopia (red-green colour vision deficiency affects about 8% of men). For sequential data (low to high) use 'Blues' or 'YlOrRd'. For diverging data (negative to positive, like correlations) use 'coolwarm' or 'RdBu_r'. Avoid rainbow colormaps such as 'jet' for categorical data: they imply an ordering where none exists.
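To see what each palette family actually contains before committing to one, you can ask Seaborn for the colours directly. A small sketch, assuming a reasonably recent Seaborn; the variable names are illustrative.

```python
import seaborn as sns

# Categorical: distinct hues with no implied order.
categorical = sns.color_palette('colorblind')

# Sequential: one hue running light to dark, for low-to-high values.
sequential = sns.color_palette('Blues', n_colors=5)

# Diverging: two hues meeting at a neutral midpoint, for signed values.
diverging = sns.color_palette('coolwarm', n_colors=7)

# as_hex() returns hex strings you can paste into a style guide or CSS.
print(categorical.as_hex())
print(sequential.as_hex())
print(diverging.as_hex())
```

In a notebook, evaluating a palette object on its own line renders the swatches, which is a quick way to compare candidates side by side.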

For anything Seaborn can't do natively, you always have access to the underlying Matplotlib object. Axes-level functions return the Axes; figure-level functions expose their Figure via grid.figure and individual axes via grid.axes_dict.
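The two escape hatches described above can be sketched side by side. The `orders` DataFrame and the 50,000 target line are hypothetical, and the Agg backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.ticker as mticker
import pandas as pd
import seaborn as sns

# Hypothetical order data: two observations per month per region.
orders = pd.DataFrame({
    'region':  ['North', 'North', 'South', 'South'] * 3,
    'month':   ['Jan'] * 4 + ['Feb'] * 4 + ['Mar'] * 4,
    'revenue': [42000, 39000, 38000, 41000,
                51000, 49000, 47000, 52000,
                63000, 61000, 58000, 66000],
})

# Axes-level: the call returns the Matplotlib Axes it drew on,
# so any Matplotlib method is available straight away.
ax = sns.barplot(data=orders, x='month', y='revenue')
ax.yaxis.set_major_formatter(
    mticker.FuncFormatter(lambda value, _: f'${value:,.0f}')
)

# Figure-level: the call returns a FacetGrid. Individual panels are
# reachable by facet value through axes_dict, the Figure through .figure.
grid = sns.catplot(
    data=orders, x='month', y='revenue',
    col='region', kind='bar', height=3,
)
south_ax = grid.axes_dict['South']
south_ax.axhline(50000, color='crimson', linestyle='--')  # hypothetical target
grid.figure.suptitle('Revenue by Region', y=1.03)
```

Looking panels up by name through axes_dict tends to be more robust than indexing grid.axes by position, because it keeps working if the facet order changes.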

seaborn_customisation.py · PYTHON

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import pandas as pd
import numpy as np

# ── Build a realistic e-commerce monthly metrics DataFrame ──────────────────
np.random.seed(42)  # reproducible random data
months = pd.date_range('2023-01', periods=12, freq='MS')
channels = ['Organic', 'Paid Search', 'Email', 'Social']

records = []
for channel in channels:
    base = {'Organic': 12000, 'Paid Search': 8000, 'Email': 5000, 'Social': 3000}[channel]
    for month in months:
        revenue = base + np.random.randint(-1500, 3000) + (months.get_loc(month) * 200)
        records.append({'month': month, 'channel': channel, 'revenue': revenue})

ecommerce_df = pd.DataFrame(records)

# ── Set a publication-ready theme ───────────────────────────────────────────
# rc= accepts any valid Matplotlib rcParam — great for font and line tweaks.
sns.set_theme(
    style='white',
    palette='colorblind',
    font='DejaVu Sans',
    font_scale=1.15,
    rc={
        'axes.spines.top': False,    # remove top spine for a cleaner look
        'axes.spines.right': False,  # remove right spine
        'lines.linewidth': 2.2
    }
)

# ── Line chart: revenue trend with confidence band ──────────────────────────
fig, ax = plt.subplots(figsize=(11, 5))

sns.lineplot(
    data=ecommerce_df,
    x='month',
    y='revenue',
    hue='channel',
    style='channel',  # markers= and dashes= only take effect when style= is set
    # Seaborn auto-computes mean + 95% CI when multiple y values share an x.
    # Here each month/channel has one value, so we see raw lines.
    markers=True,
    dashes=False,  # solid lines for all channels; easier to read in colour
    ax=ax
)

# ── Matplotlib escape hatch: format the y-axis as currency ──────────────────
# Seaborn gives us the Axes object — we use standard Matplotlib from here.
ax.yaxis.set_major_formatter(mticker.FuncFormatter(
    lambda value, _: f'${value:,.0f}'  # e.g. 12000 → $12,000
))

# Rotate month labels so they don't collide
plt.xticks(rotation=30, ha='right')

# Add a vertical line marking a 'Black Friday campaign' event
black_friday = pd.Timestamp('2023-11-01')
ax.axvline(x=black_friday, color='crimson', linestyle='--', linewidth=1.5, alpha=0.8)
ax.text(
    black_friday, ax.get_ylim()[1] * 0.97,
    ' Black Friday\n Campaign',
    color='crimson', fontsize=9, va='top'
)

ax.set_title('Monthly Revenue by Channel — 2023', fontsize=15, pad=12)
ax.set_xlabel('')
ax.set_ylabel('Revenue (USD)')
ax.legend(title='Channel', bbox_to_anchor=(1.01, 1), loc='upper left')

plt.tight_layout()
plt.savefig('ecommerce_revenue_trend.png', dpi=150, bbox_inches='tight')
plt.show()
print('Revenue trend chart saved.')

# ── Palette demo: sequential vs diverging ───────────────────────────────────
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

category_totals = ecommerce_df.groupby('channel')['revenue'].sum().reset_index()

# Default colorblind palette: categorical, no implied order.
# hue= repeats x= (with legend=False) because seaborn 0.13+ deprecates
# passing palette= without hue=.
sns.barplot(data=category_totals, x='channel', y='revenue',
            hue='channel', legend=False, palette='colorblind', ax=axes[0])
axes[0].set_title('Colorblind Palette\n(Categorical Data)')
axes[0].yaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: f'${v/1000:.0f}k'))

# Sequential palette: implies low-to-high ordering
sns.barplot(data=category_totals, x='channel', y='revenue',
            hue='channel', legend=False, palette='Blues_d', ax=axes[1])
axes[1].set_title('Blues_d Palette\n(Sequential — Implies Rank)')
axes[1].yaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: f'${v/1000:.0f}k'))

# Custom hex palette: brand colours
brand_palette = ['#0057FF', '#FF6B35', '#2EC4B6', '#FFBF00']
sns.barplot(data=category_totals, x='channel', y='revenue',
            hue='channel', legend=False, palette=brand_palette, ax=axes[2])
axes[2].set_title('Custom Brand Palette\n(Hex Codes)')
axes[2].yaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: f'${v/1000:.0f}k'))

for ax in axes:
    ax.set_xlabel('')
    ax.set_ylabel('Total Revenue')
    sns.despine(ax=ax)  # remove top and right spines

plt.suptitle('Palette Choice Changes the Story', fontsize=13, y=1.02)
plt.tight_layout()
plt.savefig('palette_comparison.png', dpi=150, bbox_inches='tight')
plt.show()
print('Palette comparison saved.')

▶ Output

Revenue trend chart saved.
Palette comparison saved.

🔥

Interview Gold: Why Colorblind Palette?

Interviewers love asking about accessibility in visualisation. The 'colorblind' palette in Seaborn is based on the colour set recommended by Wong (2011), chosen to remain distinguishable under the most common forms of colour vision deficiency. Default to it for any chart that goes into a report or dashboard.

Pairplots and FacetGrids — Exploring Entire Datasets in One Call

Once you've got individual charts under control, Seaborn's real superpower for exploratory data analysis is the multi-chart grid. Two functions deliver this: pairplot() and FacetGrid.

pairplot() is the tool you run on a new dataset before you do anything else. It draws every numeric column against every other numeric column — scatter plots off-diagonal, distributions on-diagonal — and colour-codes by a categorical variable. In five seconds you can see which pairs of features are linearly related, which ones cluster by class, and which ones are skewed. It's the fastest possible dataset overview.

FacetGrid is the manual version. You control exactly which variable goes on rows, which goes on columns, and then you map any Axes-level Seaborn or Matplotlib function onto every panel. This is how you build dashboards programmatically — one loop builds 12 charts, perfectly aligned, with shared axes.

Both are figure-level, so the plt.title() caveat from Section 1 applies. The payoff is that the layout, spacing, and legend are all handled for you.

seaborn_pairplot_facetgrid.py · PYTHON

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

sns.set_theme(style='ticks', palette='colorblind', font_scale=1.0)

# Load the diamonds dataset — price, carat, cut, colour, clarity
# We'll use a sample so the pairplot renders quickly
diamonds = sns.load_dataset('diamonds')
diamond_sample = diamonds.sample(n=1500, random_state=99).reset_index(drop=True)

print(f"Diamonds sample: {diamond_sample.shape}")
print(diamond_sample[['carat','price','depth','cut']].head(3))

# ── PAIRPLOT: dataset overview in one call ───────────────────────────────────
# We pick four of the seven numeric columns; plotting all of them would be illegible.
pair_grid = sns.pairplot(
    diamond_sample[['carat', 'price', 'depth', 'table', 'cut']],
    hue='cut',                   # colour-code points by diamond cut quality
    diag_kind='kde',             # kernel density on the diagonal instead of histogram
    plot_kws={'alpha': 0.4, 's': 15},  # small, transparent points avoid overplotting
    height=2.2
)
pair_grid.figure.suptitle(
    'Diamond Features Pairplot — Cut Quality Colour-Coded',
    y=1.01, fontsize=13
)
plt.savefig('diamond_pairplot.png', dpi=130, bbox_inches='tight')
plt.show()
print('Pairplot saved.')

# ── FACETGRID: custom multi-panel chart ─────────────────────────────────────
# We want one panel per cut quality, showing carat vs price scatter
# with a regression line. This is how reports are built programmatically.

# FacetGrid needs the variable that defines panels set in col= or row=
cut_grid = sns.FacetGrid(
    data=diamond_sample,
    col='cut',
    col_order=['Fair', 'Good', 'Very Good', 'Premium', 'Ideal'],
    height=3.5,
    aspect=0.75,
    sharey=True   # shared y-axis so panels are directly comparable
)

# .map_dataframe() applies a function to each panel's subset of data.
# The first arg is the function, subsequent args are column names.
cut_grid.map_dataframe(
    sns.scatterplot,
    x='carat',
    y='price',
    alpha=0.3,
    s=12,
    color='steelblue'
)

# Add regression line to every panel
cut_grid.map_dataframe(
    sns.regplot,
    x='carat',
    y='price',
    scatter=False,   # don't redraw dots — we already drew them above
    line_kws={'color': 'crimson', 'linewidth': 1.8}
)

cut_grid.set_titles('{col_name} Cut')    # panel titles like 'Ideal Cut'
cut_grid.set_axis_labels('Carat', 'Price (USD)')
cut_grid.set(xlim=(0.2, 3.0))

# Format y-axis as dollars on each panel
import matplotlib.ticker as mticker
for ax in cut_grid.axes.flat:
    ax.yaxis.set_major_formatter(
        mticker.FuncFormatter(lambda v, _: f'${v/1000:.0f}k')
    )

cut_grid.figure.suptitle(
    'Carat vs Price by Cut Quality: Steeper Slope = Higher Price Per Extra Carat',
    y=1.02, fontsize=12
)
plt.savefig('diamond_facetgrid.png', dpi=130, bbox_inches='tight')
plt.show()
print('FacetGrid saved.')

▶ Output

Diamonds sample: (1500, 10)
carat price depth cut
0 0.90 4954 62.5 Good
1 0.31 916 61.6 Ideal
2 1.01 6486 62.8 Premium
Pairplot saved.
FacetGrid saved.

⚠️

Pro Tip: Sample Before Pairplot

pairplot() on a full 50,000-row dataset can take minutes or exhaust memory: with k numeric columns it builds a k × k grid and draws every row in each off-diagonal panel. Always sample first: df.sample(n=2000, random_state=42). The broad patterns visible at 2,000 rows are usually the same as at 50,000, and the chart renders far faster.
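A quick back-of-the-envelope check shows why sampling helps and whether the sample still looks like the full data. The `big` DataFrame, its column names, and the helper `scatter_points()` are all hypothetical, written only for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical large dataset: four numeric columns plus one category.
rng = np.random.default_rng(42)
big = pd.DataFrame(rng.normal(size=(50_000, 4)), columns=['a', 'b', 'c', 'd'])
big['grade'] = rng.choice(['low', 'mid', 'high'], size=len(big), p=[0.7, 0.2, 0.1])


def scatter_points(n_rows, n_numeric):
    """Points drawn across the off-diagonal scatter panels of a pairplot.

    A grid over k numeric columns has k * (k - 1) off-diagonal panels,
    and each one plots every row once.
    """
    return n_rows * n_numeric * (n_numeric - 1)


sample = big.sample(n=2000, random_state=42)

print(scatter_points(len(big), 4))     # 600000
print(scatter_points(len(sample), 4))  # 24000

# Sanity check: the category mix in the sample should be close to the original.
print(big['grade'].value_counts(normalize=True).round(2))
print(sample['grade'].value_counts(normalize=True).round(2))
```

If a rare category matters to your analysis, a plain random sample may leave it with very few rows; sampling within each group is one alternative worth considering in that case.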

Feature / Aspect	Axes-Level Functions (e.g. boxplot)	Figure-Level Functions (e.g. catplot)
Returns	Matplotlib Axes object	FacetGrid object
Use plt.title()?	Yes — works as expected	No — use grid.figure.suptitle()
Multi-panel grids	Manual (plt.subplots)	Built-in via col=, row= params
Combine with other charts	Easy — pass ax= param	Harder — use .map_dataframe()
Best for	Dashboard panels, custom layouts	Exploratory faceting, quick multi-group views
Legend control	Full Matplotlib control	Via grid.add_legend() method
Figure size control	figsize on plt.subplots()	height= and aspect= params

🎯 Key Takeaways

⚠ Common Mistakes to Avoid

✕ Mistake 1: Calling plt.title() after a figure-level function. The title either doesn't appear or lands on only the last panel. Fix it by using grid.figure.suptitle('Your Title', y=1.02) for an overall title, or grid.set_titles('{col_name}') for per-panel labels.
✕ Mistake 2: Passing wide-format data directly to Seaborn. If your DataFrame has columns like 'Q1_Revenue', 'Q2_Revenue', there is no single column to map to x or y, so you either get an error about a column name Seaborn can't interpret or a nonsensical chart. Fix it with pd.melt(df, id_vars=['product'], value_vars=['Q1_Revenue','Q2_Revenue'], var_name='quarter', value_name='revenue') before plotting.
✕ Mistake 3: Forgetting dropna() before plotting. When the column mapped to hue= contains NaN values, Seaborn typically drops those rows silently, which gives misleading group sizes, and depending on the function and version it may raise an error instead. Always call df.dropna(subset=['your_hue_column']) explicitly so you know exactly which rows are excluded and can document that decision.
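For Mistake 3, the explicit version of the clean-up might look like the sketch below. The `customers` DataFrame and the 'segment' column are hypothetical; the point is simply to count and report the excluded rows before any chart is drawn.

```python
import numpy as np
import pandas as pd

# Hypothetical customer data with gaps in the column we plan to use for hue=.
customers = pd.DataFrame({
    'age':     [34, 45, 29, 52, 41, 38],
    'spend':   [120.0, 340.0, 90.0, 410.0, 275.0, 150.0],
    'segment': ['Basic', 'Premium', np.nan, 'Premium', None, 'Basic'],
})

hue_column = 'segment'

# Count the gaps first so the exclusion is a documented decision.
n_missing = int(customers[hue_column].isna().sum())

# Drop only rows missing the hue value; other columns are left alone.
plot_ready = customers.dropna(subset=[hue_column])

print(f"Dropping {n_missing} of {len(customers)} rows with missing '{hue_column}'")
print(plot_ready[hue_column].value_counts())
```

Passing plot_ready rather than customers to the plotting call means the group sizes on the chart match the counts you just printed.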

🔥

TheCodeForge Editorial Team Verified Author
