
Streamlit for Data Apps: Build Interactive Dashboards in Pure Python

In Plain English 🔥
Imagine you've baked an amazing cake (your data analysis) but it's sitting in your kitchen where nobody can see it. Streamlit is like a pop-up bakery window — it takes your Python code and instantly gives it a front door, a menu, and a way for customers to interact with what you made. You don't need to know how to build a shop; you just focus on the cake. That's Streamlit: a way to share your data work with the world without learning web development.
⚡ Quick Answer
Streamlit is an open-source Python library that turns a plain script into an interactive web app with no HTML, CSS, or JavaScript required. You write top-to-bottom Python, drop in widgets like st.slider and st.button, run `streamlit run app.py`, and anyone with a browser can use your analysis.

Most data insights die in Jupyter notebooks. A data scientist spends days building a brilliant sales forecasting model, but the only person who can see it is someone who can run Python. The business team — the people who actually need the insight — are locked out. Streamlit exists to fix exactly this problem. It's the fastest path from 'I have a Python analysis' to 'anyone with a browser can use this.'

How Streamlit's Execution Model Actually Works (This Changes Everything)

Before you write a single widget, you need to understand Streamlit's most important — and most surprising — design decision: every time a user interacts with your app, Streamlit re-runs your entire Python script from top to bottom. Every. Single. Time.

This is completely different from how you'd think about a web app. There's no event loop, no callbacks, no 'onclick' handler wiring. When a user moves a slider, Streamlit just re-executes your script with the new slider value baked in. It sounds expensive, but it's what makes Streamlit so simple to reason about.

The upside: your app logic stays linear and readable, just like a regular script. The downside: if you're loading a 2GB CSV on every re-run, your app will be unusably slow. That's why caching — which we'll cover next — isn't optional, it's essential.

Understanding this re-run model is what separates developers who fight Streamlit from those who work with it. Once it clicks, you'll structure your code completely differently.

execution_model_demo.py · PYTHON
import streamlit as st
import datetime

# This line runs EVERY time the user does anything in the app.
# Watch the timestamp change every time you interact with a widget.
st.write(f"Script last ran at: {datetime.datetime.now().strftime('%H:%M:%S')}")

st.title("Understanding Streamlit's Re-Run Model")

# When the user moves this slider, the ENTIRE script above AND below re-executes.
temperature_celsius = st.slider(
    label="Set temperature (°C)",
    min_value=-20,
    max_value=50,
    value=22  # This is the default value on first load
)

# By the time we reach this line, temperature_celsius already holds
# whatever value the user selected — no callbacks needed.
temperature_fahrenheit = (temperature_celsius * 9 / 5) + 32

st.metric(
    label="Temperature in Fahrenheit",
    value=f"{temperature_fahrenheit:.1f} °F",
    delta=f"{temperature_celsius} °C input"  # delta shows a small indicator arrow
)

# Conditional rendering is just plain Python — no special syntax.
if temperature_celsius > 35:
    st.warning("🌡️ That's dangerously hot! Stay hydrated.")
elif temperature_celsius < 0:
    st.info("❄️ Below freezing — roads may be icy.")
else:
    st.success("✅ Comfortable temperature range.")
▶ Output
Script last ran at: 14:32:07
[Title: Understanding Streamlit's Re-Run Model]
[Slider rendered at 22°C by default]
Temperature in Fahrenheit: 71.6 °F (+22 °C input)
[Green success box]: ✅ Comfortable temperature range.

--- After user drags slider to 38°C ---
Script last ran at: 14:32:11
Temperature in Fahrenheit: 100.4 °F (+38 °C input)
[Yellow warning box]: 🌡️ That's dangerously hot! Stay hydrated.
🔥
The Golden Rule: Put your expensive operations (database queries, model loading, file reads) inside @st.cache_data or @st.cache_resource functions. If you don't, they re-execute on every widget interaction and your app will grind to a halt.

Caching and State: Making Your App Fast and Stateful

Streamlit gives you two caching decorators, and they solve different problems. @st.cache_data is for functions that return data — CSVs, API responses, processed DataFrames. It serializes the return value and keys the cache on the function's arguments: call it again with the same arguments and you get the cached result instantly.

@st.cache_resource is for things you can't serialize — database connections, ML models loaded into memory, API clients. It stores the actual Python object and shares it across all users and sessions. Use this for anything that should be initialized once and reused.

Then there's st.session_state — Streamlit's answer to 'how do I remember something across re-runs?' Since every interaction wipes the local variable slate clean, session_state is a dictionary that persists for the lifetime of a user's session. It's how you build multi-step forms, track login state, or accumulate user inputs over time.

These three tools — cache_data, cache_resource, and session_state — unlock 90% of real-world Streamlit patterns. Get comfortable with all three.

caching_and_state_demo.py · PYTHON
import streamlit as st
import pandas as pd
import numpy as np
import time

# ── CACHING DATA ─────────────────────────────────────────────────────────────
# The @st.cache_data decorator means this function only runs ONCE per unique
# combination of arguments. After that, Streamlit returns the cached DataFrame.
# Remove this decorator and watch the app slow down on every widget interaction.
@st.cache_data
def load_sales_data(num_records: int) -> pd.DataFrame:
    """Simulate loading sales data from a slow data source."""
    time.sleep(2)  # Pretend this is a slow database query
    rng = np.random.default_rng(seed=42)  # Fixed seed for reproducibility
    return pd.DataFrame({
        "month": pd.date_range(start="2023-01-01", periods=num_records, freq="ME"),
        "revenue": rng.integers(50_000, 200_000, size=num_records),
        "units_sold": rng.integers(100, 1000, size=num_records),
        "region": rng.choice(["North", "South", "East", "West"], size=num_records)
    })

# ── CACHING RESOURCES ─────────────────────────────────────────────────────────
# Use cache_resource for objects that are expensive to create and shouldn't
# be duplicated — like an ML model or a database connection pool.
@st.cache_resource
def load_forecasting_model():
    """Load a pretrained model — only happens once, shared across all users."""
    time.sleep(3)  # Simulate loading a large model file
    st.toast("Model loaded into memory!", icon="🤖")
    # In a real app, this would be: return joblib.load("model.pkl")
    return {"model_name": "LinearForecaster_v2", "accuracy": 0.94}

# ── SESSION STATE ─────────────────────────────────────────────────────────────
# Initialize session state keys safely — only set them if they don't exist yet.
# Doing `st.session_state.click_count = 0` every run would reset the counter!
if "analysis_count" not in st.session_state:
    st.session_state.analysis_count = 0

if "selected_regions" not in st.session_state:
    st.session_state.selected_regions = ["North", "South"]

# ── APP LAYOUT ────────────────────────────────────────────────────────────────
st.title("Sales Dashboard — Caching & State Demo")

num_months = st.slider("How many months of data?", min_value=6, max_value=36, value=12)

# This call benefits from caching — change the slider and only new values
# trigger a fresh load; previously seen values are instant.
with st.spinner("Loading sales data..."):
    sales_df = load_sales_data(num_records=num_months)

# Multi-select widget whose default value is stored in session_state
region_filter = st.multiselect(
    "Filter by region:",
    options=["North", "South", "East", "West"],
    default=st.session_state.selected_regions
)
# Persist the user's selection across re-runs
st.session_state.selected_regions = region_filter

# Filter and display the data
filtered_df = sales_df[sales_df["region"].isin(region_filter)] if region_filter else sales_df

col_left, col_right = st.columns(2)
with col_left:
    st.metric("Total Revenue", f"${filtered_df['revenue'].sum():,.0f}")
with col_right:
    st.metric("Total Units Sold", f"{filtered_df['units_sold'].sum():,}")

st.line_chart(filtered_df.set_index("month")["revenue"], use_container_width=True)

# Track how many times the user has run an analysis in this session
if st.button("Run Forecast"):
    st.session_state.analysis_count += 1
    model = load_forecasting_model()  # Cached — instant after first load
    st.success(f"Forecast complete using {model['model_name']} (accuracy: {model['accuracy']:.0%})")

st.caption(f"You've run {st.session_state.analysis_count} analysis/analyses this session.")
▶ Output
Sales Dashboard — Caching & State Demo

[Slider: 12 months selected]
[Spinner appears for 2 seconds on first load, then instant on revisit]

[Multiselect: North, South selected]

Total Revenue Total Units Sold
$1,432,087 6,847

[Line chart of monthly revenue]

[After clicking 'Run Forecast' for the first time:]
🤖 Toast: Model loaded into memory!
✅ Forecast complete using LinearForecaster_v2 (accuracy: 94%)

You've run 1 analysis/analyses this session.

[Click 'Run Forecast' again — model loads instantly, count becomes 2]
You've run 2 analysis/analyses this session.
⚠️
Watch Out: The Initialization Trap
Never initialize session_state keys like `st.session_state.count = 0` at the top of your script unconditionally. Since the script re-runs on every interaction, you'll reset the value every time. Always wrap initialization in `if 'key' not in st.session_state:` — that's the correct pattern.

Building a Real Multi-Page Data App with Layout and Forms

Real data apps aren't single-page scripts — they have navigation, forms with submit buttons, and structured layouts. Streamlit handles all of this natively.

Multi-page apps work by creating a pages/ directory next to your main script. Streamlit auto-discovers any .py files there and adds them to a sidebar navigation menu. File naming controls the menu order: prefix with numbers like 1_Overview.py, 2_Analysis.py. Underscores become spaces in the sidebar.
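As a sketch, a minimal multi-page layout might look like this (directory and page names are illustrative, following the same file-structure-as-comments convention used later in this article):

```python
# ── Multi-page app layout (directory and page names are illustrative) ─────────
# sales_app/
# ├── Home.py                  ← entry point: `streamlit run Home.py`
# └── pages/
#     ├── 1_Overview.py        ← appears in the sidebar as "Overview"
#     └── 2_Deep_Dive.py       ← appears as "Deep Dive" (underscores → spaces)
#
# Each file under pages/ is an ordinary Streamlit script. For example,
# pages/1_Overview.py could be just:
#
#     import streamlit as st
#     st.title("Overview")
#     st.write("st.session_state is shared across pages within one session.")
```

One detail worth knowing: every page re-runs independently when visited, but st.session_state persists across page switches within the same user session, which is how you pass state between pages.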

Forms are critical when you don't want every keystroke to trigger a re-run. Wrap inputs in st.form() and nothing executes until the user clicks the submit button. This is essential for search interfaces, filter panels, and data entry screens.

Columns, tabs, and expanders give you layout control without CSS. st.columns([2, 1]) creates two columns where the left is twice as wide. st.tabs(["Raw Data", "Charts"]) creates a tabbed interface. st.expander() hides content behind a collapsible panel — perfect for advanced options.

The combination of these layout primitives lets you build genuinely professional dashboards entirely in Python.

professional_dashboard.py · PYTHON
import streamlit as st
import pandas as pd
import numpy as np

# ── PAGE CONFIG — must be the very first Streamlit call in your script ─────────
st.set_page_config(
    page_title="Sales Intelligence Hub",
    page_icon="📊",
    layout="wide",          # 'wide' uses the full browser width
    initial_sidebar_state="expanded"
)

# ── GENERATE SAMPLE DATASET ───────────────────────────────────────────────────
@st.cache_data
def generate_product_sales() -> pd.DataFrame:
    rng = np.random.default_rng(seed=7)
    products = ["Laptop Pro", "Wireless Mouse", "USB-C Hub", "Mechanical Keyboard", "Webcam HD"]
    records = []
    for month_offset in range(12):
        month = pd.Timestamp("2024-01-01") + pd.DateOffset(months=month_offset)
        for product in products:
            records.append({
                "month": month,
                "product": product,
                "units_sold": int(rng.integers(20, 300)),
                "unit_price": float(rng.uniform(15.0, 1200.0)),
                "return_rate": float(rng.uniform(0.01, 0.15))
            })
    df = pd.DataFrame(records)
    df["revenue"] = df["units_sold"] * df["unit_price"]
    return df

product_df = generate_product_sales()

# ── SIDEBAR — acts as a global filter panel ───────────────────────────────────
with st.sidebar:
    st.header("🔧 Filters")
    st.markdown("---")

    # Date range filter
    all_months = sorted(product_df["month"].unique())
    date_range = st.select_slider(
        "Select month range:",
        options=[m.strftime("%b %Y") for m in all_months],
        value=(all_months[0].strftime("%b %Y"), all_months[-1].strftime("%b %Y"))
    )

    selected_products = st.multiselect(
        "Products to include:",
        options=product_df["product"].unique().tolist(),
        default=product_df["product"].unique().tolist()
    )

    st.markdown("---")
    st.caption("Dashboard v1.0 — Sales Intelligence Hub")

# ── APPLY FILTERS ─────────────────────────────────────────────────────────────
start_label, end_label = date_range
start_month = pd.Timestamp(start_label)
end_month = pd.Timestamp(end_label)

filtered_df = product_df[
    (product_df["month"] >= start_month) &
    (product_df["month"] <= end_month) &
    (product_df["product"].isin(selected_products))
].copy()

# ── PAGE TITLE ────────────────────────────────────────────────────────────────
st.title("📊 Sales Intelligence Hub")
st.markdown(f"Showing data from **{start_label}** to **{end_label}** for **{len(selected_products)}** product(s).")

# ── KPI ROW — three equal-width columns ──────────────────────────────────────
kpi1, kpi2, kpi3 = st.columns(3)
with kpi1:
    total_revenue = filtered_df["revenue"].sum()
    st.metric("Total Revenue", f"${total_revenue:,.0f}",
              delta=f"+{total_revenue * 0.08:,.0f} vs last period")
with kpi2:
    total_units = filtered_df["units_sold"].sum()
    st.metric("Units Sold", f"{total_units:,}", delta="+12% MoM")
with kpi3:
    avg_return = filtered_df["return_rate"].mean()
    st.metric("Avg Return Rate", f"{avg_return:.1%}", delta="-0.3%", delta_color="inverse")

st.markdown("---")

# ── TABBED CONTENT ────────────────────────────────────────────────────────────
tab_overview, tab_by_product, tab_raw = st.tabs(["📈 Revenue Trend", "🏷️ By Product", "🗂️ Raw Data"])

with tab_overview:
    monthly_revenue = (
        filtered_df.groupby("month")["revenue"]
        .sum()
        .reset_index()
        .set_index("month")
    )
    st.line_chart(monthly_revenue, use_container_width=True)

with tab_by_product:
    # Side-by-side charts using columns
    chart_col, table_col = st.columns([3, 2])

    product_summary = (
        filtered_df.groupby("product")
        .agg(total_revenue=("revenue", "sum"), total_units=("units_sold", "sum"))
        .sort_values("total_revenue", ascending=False)
    )

    with chart_col:
        st.bar_chart(product_summary["total_revenue"], use_container_width=True)
    with table_col:
        st.dataframe(
            product_summary.style.format("{:,.0f}"),
            use_container_width=True
        )

    # Expander for advanced analytics — keeps the UI clean
    with st.expander("📐 Advanced: Return Rate Analysis"):
        return_df = (
            filtered_df.groupby("product")["return_rate"]
            .mean()
            .reset_index()
            .rename(columns={"return_rate": "avg_return_rate"})
        )
        st.dataframe(return_df.style.format({"avg_return_rate": "{:.1%}"}), use_container_width=True)
        st.caption("Products with return rates above 10% may need quality review.")

with tab_raw:
    # A form prevents re-runs on every keystroke in the search box
    with st.form("data_filter_form"):
        search_term = st.text_input("Search product name:", placeholder="e.g. Laptop")
        min_revenue = st.number_input("Minimum revenue per row ($):", min_value=0, value=0, step=500)
        submitted = st.form_submit_button("Apply Filter")  # Only triggers re-run when clicked

    display_df = filtered_df.copy()
    if submitted:
        if search_term:
            display_df = display_df[display_df["product"].str.contains(search_term, case=False)]
        display_df = display_df[display_df["revenue"] >= min_revenue]

    st.dataframe(
        display_df[["month", "product", "units_sold", "unit_price", "revenue", "return_rate"]]
        .sort_values("month")
        .style.format({
            "unit_price": "${:.2f}",
            "revenue": "${:,.0f}",
            "return_rate": "{:.1%}"
        }),
        use_container_width=True
    )
    st.caption(f"Showing {len(display_df):,} of {len(filtered_df):,} records.")
▶ Output
📊 Sales Intelligence Hub
Showing data from Jan 2024 to Dec 2024 for 5 product(s).

┌─────────────────┬─────────────────┬─────────────────┐
│ Total Revenue   │ Units Sold      │ Avg Return Rate │
│ $2,847,392      │ 10,284          │ 7.8%            │
│ +$227,791 vs .. │ +12% MoM        │ -0.3% ▼         │
└─────────────────┴─────────────────┴─────────────────┘

[Tabs: 📈 Revenue Trend | 🏷️ By Product | 🗂️ Raw Data]

[Revenue Trend tab active — line chart showing monthly revenue peaks in Q4]

[By Product tab — bar chart left, summary table right]
Product Rankings by Revenue:
1. Laptop Pro $1,204,847
2. Mechanical Kbd $ 498,231
3. USB-C Hub $ 392,104
...

[▶ Advanced: Return Rate Analysis — collapsed expander]

[Raw Data tab — form with search box and revenue filter]
[Submit 'Apply Filter' — shows filtered table with formatting]
Showing 60 of 60 records.
⚠️
Pro Tip: st.set_page_config() Must Come First
If you call any other Streamlit function before st.set_page_config(), you'll get a StreamlitAPIException. It must be the absolute first Streamlit command in your script. A common gotcha: importing a module that itself calls a Streamlit function at import time will trigger this error in a confusing way.

Deploying Your Streamlit App — From Local to Live in 10 Minutes

Building a great app locally is only half the job. Streamlit Community Cloud (formerly Streamlit Sharing) is the fastest way to publish — it's free for public repos and directly integrates with GitHub.

You need three things in your repo root: your main .py file, a requirements.txt listing your dependencies, and optionally a .streamlit/secrets.toml for API keys and credentials. Never hardcode secrets — use st.secrets["api_key"] in your code and add the values through the Streamlit Cloud dashboard UI.

For production or private apps, you have two solid options. Docker containerization gives you full control — create a Dockerfile that installs your requirements and runs streamlit run app.py --server.port 8080. Then deploy to any cloud provider (GCP Cloud Run, AWS ECS, Azure Container Apps) with automatic scaling.

Alternatively, Hugging Face Spaces supports Streamlit natively — just set sdk: streamlit in your README.md front matter and push. It's especially popular in the ML community.
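For illustration, the front matter block at the top of a Space's README.md looks roughly like this (the title, emoji, and version values are placeholders):

```python
# ── README.md front matter for a Hugging Face Space (values are placeholders) ──
# ---
# title: Sales Intelligence Hub
# emoji: 📊
# sdk: streamlit
# sdk_version: "1.35.0"      # optional — pin the Streamlit version
# app_file: app.py           # which script the Space should run
# ---
```

Push this alongside your code and requirements.txt, and the Space builds and serves the app automatically.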

Whichever route you pick, test your requirements.txt in a fresh virtual environment before deploying. The number one deployment failure is 'it works on my machine' because of unpinned or missing dependencies.
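One way to run that smoke test, sketched as shell commands in the comment style used elsewhere in this article (paths and names are illustrative):

```python
# ── Smoke-testing requirements.txt in a clean environment (shell, illustrative) ──
# python -m venv fresh_env
# source fresh_env/bin/activate        # Windows: fresh_env\Scripts\activate
# pip install -r requirements.txt
# streamlit run app.py                 # if this boots cleanly, deployment likely will too
# deactivate && rm -rf fresh_env       # clean up when done
```

If `pip install` fails or the app crashes on startup here, it would have failed on Streamlit Cloud too, where you'd have far less visibility into why.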

secrets_and_deployment.py · PYTHON
# ── FILE STRUCTURE FOR DEPLOYMENT ─────────────────────────────────────────────
# your-app/
# ├── app.py                    ← main Streamlit script
# ├── requirements.txt          ← pinned dependencies
# ├── .gitignore                ← must include .streamlit/secrets.toml!
# └── .streamlit/
#     ├── secrets.toml          ← LOCAL ONLY — never commit this
#     └── config.toml           ← safe to commit — UI theme settings

# ── .streamlit/secrets.toml (LOCAL DEVELOPMENT ONLY) ─────────────────────────
# [database]
# host = "prod-db.company.com"
# username = "readonly_user"
# password = "super_secret_password_123"
#
# [api]
# openai_key = "sk-..."

# ── .streamlit/config.toml (SAFE TO COMMIT) ───────────────────────────────────
# [theme]
# primaryColor = "#FF6B6B"
# backgroundColor = "#FFFFFF"
# secondaryBackgroundColor = "#F0F2F6"
# textColor = "#262730"
# font = "sans serif"
#
# [server]
# maxUploadSize = 50  # MB — increase for large file uploads

# ── requirements.txt (PIN YOUR VERSIONS!) ────────────────────────────────────
# streamlit==1.35.0
# pandas==2.2.2
# numpy==1.26.4
# plotly==5.22.0
# scikit-learn==1.5.0

# ── APP CODE — using secrets safely ──────────────────────────────────────────
import streamlit as st
import pandas as pd

st.set_page_config(page_title="Secrets Demo", page_icon="🔐")
st.title("🔐 Connecting Securely to External Services")

# st.secrets works identically in local dev (reads from secrets.toml)
# and in Streamlit Cloud (reads from the dashboard's secrets manager).
# Your code NEVER changes between environments — only the secrets storage does.
def get_database_connection_string() -> str:
    """Build connection string from secrets — never from hardcoded values."""
    try:
        db_host = st.secrets["database"]["host"]
        db_user = st.secrets["database"]["username"]
        db_pass = st.secrets["database"]["password"]
        # In a real app: return sqlalchemy.create_engine(f"postgresql://{db_user}:{db_pass}@{db_host}/sales")
        return f"postgresql://{db_user}:***@{db_host}/sales"  # Never log real passwords!
    except KeyError as missing_key:
        st.error(f"Missing secret: {missing_key}. Check your secrets.toml or Cloud dashboard.")
        st.stop()  # Halt execution cleanly — don't let the app run in a broken state

# Show connection status without exposing credentials
connection_string = get_database_connection_string()
st.success(f"✅ Database configured: `{connection_string}`")

# ── FILE UPLOADER — common in deployed data apps ──────────────────────────────
st.markdown("---")
st.subheader("Upload Your Own Data")

uploaded_csv = st.file_uploader(
    label="Upload a CSV file to analyze",
    type=["csv"],                          # Restrict to CSV only
    help="Max file size is 50MB. Ensure the first row contains column headers."
)

if uploaded_csv is not None:
    # st.file_uploader returns a BytesIO-like object — pandas reads it directly
    user_data = pd.read_csv(uploaded_csv)
    st.success(f"Loaded {len(user_data):,} rows and {len(user_data.columns)} columns.")
    st.dataframe(user_data.head(10), use_container_width=True)  # Preview only — never show all rows blindly

    # Download button — let users export processed results
    processed_csv = user_data.describe().to_csv().encode("utf-8")
    st.download_button(
        label="📥 Download Summary Statistics",
        data=processed_csv,
        file_name="summary_statistics.csv",
        mime="text/csv"
    )
else:
    st.info("👆 Upload a CSV file to get started.")

# ── DOCKERFILE (as a comment — for containerized deployment) ──────────────────
# FROM python:3.11-slim
# WORKDIR /app
# COPY requirements.txt .
# RUN pip install --no-cache-dir -r requirements.txt
# COPY . .
# EXPOSE 8080
# RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*  # slim images don't ship curl
# HEALTHCHECK CMD curl --fail http://localhost:8080/_stcore/health || exit 1
# ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8080", "--server.address=0.0.0.0"]
▶ Output
🔐 Connecting Securely to External Services

✅ Database configured: `postgresql://readonly_user:***@prod-db.company.com/sales`

───────────────────────────────────────────
Upload Your Own Data

[File uploader widget — drag & drop or browse]
👆 Upload a CSV file to get started.

--- After uploading sales_data.csv ---
✅ Loaded 15,432 rows and 8 columns.

[DataFrame preview showing first 10 rows]

[📥 Download Summary Statistics button]
⚠️
Watch Out: Secrets in Git History
Add `.streamlit/secrets.toml` to your `.gitignore` before your very first commit. If you accidentally commit secrets, removing them from the latest commit isn't enough — they live in git history. You'd need to rotate every exposed credential and rewrite history with `git filter-repo`. Prevention is infinitely easier than the cure.
| Feature / Aspect | Streamlit | Dash (Plotly) | Gradio |
| --- | --- | --- | --- |
| Learning curve | Minimal — pure Python script style | Moderate — callback-based reactive model | Minimal — but ML-interface focused |
| Re-run model | Full script re-runs on every interaction | Targeted callbacks — only affected components update | Event-driven per component |
| Best for | Data dashboards, internal tools, rapid prototyping | Complex, production-grade analytics apps | ML model demos and inference UIs |
| Layout control | Good — columns, tabs, expanders | Excellent — full CSS/HTML control | Limited — opinionated grid layout |
| State management | st.session_state dictionary | Explicit callback Output/Input wiring | Implicit — per-function state |
| Multi-page support | Native — pages/ directory convention | Native — dcc.Location + layout routing | Limited — tabs only |
| Deployment | Streamlit Community Cloud (free), Docker | Dash Enterprise or self-hosted | Hugging Face Spaces (free), Docker |
| Custom components | Yes — streamlit-component-lib in React | Yes — full React/JS support | Limited — custom JS blocks |
| Performance ceiling | Good for <100 concurrent users without tweaks | Higher — callback granularity helps | Good for ML inference demos |

🎯 Key Takeaways

  • Streamlit's re-run-on-interaction model is its superpower AND its main gotcha — understanding it is the difference between writing apps that fight you and apps that flow naturally.
  • @st.cache_data is for serializable return values (DataFrames, lists, dicts); @st.cache_resource is for non-serializable shared objects (ML models, DB connections). Using the wrong one causes silent bugs or crashes.
  • st.session_state keys must be initialized with an if 'key' not in st.session_state: guard — doing it unconditionally resets values on every user interaction, which is almost never what you want.
  • Never commit .streamlit/secrets.toml. Use st.secrets[] in code and store values in Streamlit Cloud's secrets dashboard or environment variables — this pattern works identically in both local and deployed environments with zero code changes.

⚠ Common Mistakes to Avoid

  • Mistake 1: Loading data without caching — Symptom: The app becomes noticeably slow after any user interaction (sliders, button clicks, dropdowns) because a database query or file read fires on every re-run — Fix: Wrap your data loading function with @st.cache_data. If the function has no arguments, it caches indefinitely; add a ttl parameter like @st.cache_data(ttl=600) to refresh every 10 minutes for live data sources.
  • Mistake 2: Mutating cached DataFrames — Symptom: You get a CachedObjectMutationWarning and see data corruption across sessions, or the warning disappears but users see each other's filter changes bleed through — Fix: Always work on a copy of cached data. After calling your cached function, do working_df = cached_df.copy() before any filtering, sorting, or column additions. Streamlit warns you for a reason — don't suppress the warning.
  • Mistake 3: Putting st.set_page_config() in the wrong place — Symptom: StreamlitAPIException: set_page_config() can only be called once per app page, and must be called as the first Streamlit command in your script — Fix: Make st.set_page_config() literally the first line of Streamlit code. Even a seemingly innocent st.spinner() or sidebar call in an imported utility module will break this. Check all your imports for any Streamlit calls at module level.

Interview Questions on This Topic

  • Q: Streamlit re-runs the entire script on every widget interaction. In a production app handling real user traffic, what strategies would you use to keep this performant, and when would you reach for a different framework entirely?
  • Q: What's the difference between @st.cache_data and @st.cache_resource, and what goes wrong if you use the wrong one — for example, caching a database connection with @st.cache_data?
  • Q: A user reports that changes they make in your Streamlit app seem to affect what another user sees — like their filter selections are showing up for someone else. What's the most likely cause and how do you fix it?

Frequently Asked Questions

Is Streamlit good for production apps or just prototyping?

Streamlit is genuinely production-ready for internal tools, data dashboards, and apps with moderate traffic (dozens to low hundreds of concurrent users). For high-traffic public apps or extremely complex interactivity, Dash or a full React frontend may serve you better. Many companies run Streamlit in production with Docker and Kubernetes successfully.

How do I add authentication to a Streamlit app?

Streamlit Community Cloud supports Google and GitHub OAuth out of the box via the 'Viewers' setting. For custom auth, the streamlit-authenticator library provides a full username/password flow with session management. For enterprise deployments, put Streamlit behind a reverse proxy (nginx or Caddy) with your existing SSO/OAuth2 provider.

Why does my Streamlit app lose all its data when I refresh the page?

Refreshing the browser starts a completely new session, which clears st.session_state. This is by design — Streamlit sessions are ephemeral. If you need to persist data across refreshes or between users, you need external storage: a database (SQLite, PostgreSQL), a file (saved to disk), or a cloud store (S3, GCS). Think of session_state as RAM and a database as a hard drive.
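As a minimal sketch of that "hard drive" layer, the helpers below persist user preferences in SQLite using only the standard library. The table schema, function names, and `app_data.db` path are all hypothetical, not a Streamlit API:

```python
import sqlite3

DB_PATH = "app_data.db"  # hypothetical filename — use any writable path


def _ensure_table(conn: sqlite3.Connection) -> None:
    # One row per (user, key) pair; the primary key enables upserts below.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prefs ("
        "user_id TEXT, key TEXT, value TEXT, PRIMARY KEY (user_id, key))"
    )


def save_preference(user_id: str, key: str, value: str) -> None:
    """Persist a key/value pair so it survives page refreshes and restarts."""
    with sqlite3.connect(DB_PATH) as conn:  # context manager commits on success
        _ensure_table(conn)
        conn.execute(
            "INSERT INTO prefs (user_id, key, value) VALUES (?, ?, ?) "
            "ON CONFLICT(user_id, key) DO UPDATE SET value = excluded.value",
            (user_id, key, value),
        )


def load_preference(user_id: str, key: str, default: str = "") -> str:
    """Read a previously saved value, falling back to `default` if absent."""
    with sqlite3.connect(DB_PATH) as conn:
        _ensure_table(conn)
        row = conn.execute(
            "SELECT value FROM prefs WHERE user_id = ? AND key = ?",
            (user_id, key),
        ).fetchone()
    return row[0] if row else default
```

In a Streamlit app you might call load_preference() near the top of the script and save_preference() when the user changes a setting: session_state still handles within-session state, while the database handles anything that must outlive it.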

TheCodeForge Editorial Team Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
