Streamlit for Data Apps: Build Interactive Dashboards in Pure Python
- Streamlit's re-run-on-interaction model is its superpower and its main operational risk — mastering caching is non-negotiable before any production deployment.
- @st.cache_data is for serializable values like DataFrames and API responses; @st.cache_resource is for shared non-serializable objects like database connection pools and ML models.
- st.session_state initialization must always be guarded with a 'not in' check — without it, every re-run resets the state and users lose their progress.
- Streamlit re-runs your entire Python script on every widget interaction — no callbacks, no event loop
- @st.cache_data caches serializable returns (DataFrames, API responses) — @st.cache_resource caches non-serializable objects (DB connections, ML models)
- st.session_state persists data across re-runs within a single browser session — but is lost on page refresh
- Use st.form() to batch widget inputs and prevent a full script re-run on every individual input change during data entry
- The #1 production mistake: loading data outside a cache decorator — every slider move re-queries the database
- Biggest misconception: Streamlit is only for prototypes. With proper caching and Docker deployment, it handles internal tools at enterprise scale
Production Debug Guide
Common symptoms when a Streamlit app is slow or broken in production.

Symptom: the app is slow and you need to identify which function is consuming the most execution time.

```shell
# Profile the full app run and show the 30 most expensive cumulative calls
python -m cProfile -s cumtime -m streamlit run app.py 2>&1 | head -30

# Or install the streamlit-profiler package and wrap your script body
# in its Profiler context manager for in-app profiling output
pip install streamlit-profiler
```

Symptom: the cache seems not to be working — data reloads on every interaction.
- Log a timestamp inside the cached function body; if it prints on every interaction, the cache is being missed.
- Check whether any argument to the cached function is unhashable (an open connection, a custom object) or changes identity between runs; prefix arguments that should be excluded from the cache key with a leading underscore, e.g. _conn.

Symptom: the Docker container runs but the browser shows connection refused.

```shell
# Check health from inside the container, then inspect recent logs
docker exec <container> curl -s http://localhost:8501/_stcore/health
docker logs <container> --tail 50
```

Production Incident
Remediation steps:
- Wrapped all input widgets in st.form() so re-runs only happen on explicit submit, not on every slider nudge.
- Moved the dashboard connection to a read replica instead of hammering the primary.
- Added a Streamlit-specific connection pooler using st.connection with SQLAlchemy, capped at 5 connections.
- For programmatic cache invalidation, call st.cache_data.clear(). For user-controlled refresh, add a clearly labeled 'Refresh Data' button that calls the clear function and immediately triggers a re-run.
- Wrap failure-prone calls in try/except and use st.error() to surface failures gracefully rather than crashing the session.

Most data insights die in Jupyter notebooks. A data scientist builds a forecasting model, but only someone who can run Python can actually see it. Streamlit fixes this — it turns any Python script into a live web app with zero frontend code.
The core trade-off: Streamlit re-runs your entire script on every interaction. This makes the programming model dead simple — your code stays linear, no callback wiring. But it also means un-cached operations like database queries, model loading, and file reads fire on every slider move. Without disciplined caching, your app grinds to a halt after the second click.
Productionizing Streamlit requires three things: caching decorators on every expensive operation, st.session_state for cross-interaction state, and a deployment strategy — Docker, Streamlit Community Cloud, or Kubernetes. Miss any of these and you have a prototype that breaks under real usage.
How Streamlit's Execution Model Actually Works (This Changes Everything)
Before you write a single widget, you need to understand Streamlit's most important — and most surprising — design decision: every time a user interacts with your app, Streamlit re-runs your entire Python script from top to bottom. Every. Single. Time.
This is completely different from how most web frameworks operate. There is no event loop, no callbacks, no onclick handler wiring. When a user moves a slider, Streamlit re-executes your script with the new slider value baked in as the widget's return value. It sounds expensive, and it can be — but it is also what makes Streamlit so easy to reason about.
The upside: your app logic stays linear and readable, exactly like a regular Python script. The downside: if you are loading a 2GB CSV or running a complex SQL query on every re-run, your app will be unusably slow within seconds. That is why caching is not optional — it is the single most critical design decision in any Streamlit app.
```python
import streamlit as st
import datetime

# io.thecodeforge: Tracking the execution lifecycle
# This line runs EVERY time the user interacts with anything in the app.
st.write(f"Script last ran at: {datetime.datetime.now().strftime('%H:%M:%S')}")

st.title("Understanding Streamlit's Re-Run Model")

# When the user moves this slider, the ENTIRE script above AND below re-executes.
temperature_celsius = st.slider(
    label="Set temperature (°C)",
    min_value=-20,
    max_value=50,
    value=22
)

temperature_fahrenheit = (temperature_celsius * 9 / 5) + 32

st.metric(
    label="Temperature in Fahrenheit",
    value=f"{temperature_fahrenheit:.1f} °F",
    delta=f"{temperature_celsius} °C input"
)

if temperature_celsius > 35:
    st.warning("That's dangerously hot. Stay hydrated and limit outdoor exposure.")
elif temperature_celsius < 0:
    st.info("Below freezing — roads may be icy. Check local transport advisories.")
else:
    st.success("Comfortable temperature range.")
```
[Title: Understanding Streamlit's Re-Run Model]
[Slider rendered at 22°C by default]
Temperature in Fahrenheit: 71.6 °F (+22 °C input)
[Green success box]: Comfortable temperature range.
- No callbacks, no event loop — your code runs top-to-bottom on every interaction
- Widget return values change between re-runs, but the code structure stays identical
- This makes the mental model dead simple: write a script, add widgets, done
- The cost: every uncached operation re-executes — this is why caching is mandatory, not optional
- Think of it as replaying your script with new inputs each time, not patching specific components
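The replay model above can be sketched in plain Python, with no Streamlit required. Treat the script as a function that gets called again with each new widget value, and watch how often the expensive load actually fires with and without caching. The names here (load_data, run_script) are illustrative stand-ins, not Streamlit APIs, and functools.lru_cache is a rough analogue of @st.cache_data:

```python
import functools

LOAD_CALLS = 0

def load_data():
    # Stands in for an expensive query; counts how often it actually runs.
    global LOAD_CALLS
    LOAD_CALLS += 1
    return list(range(100))

@functools.lru_cache(maxsize=None)  # rough stand-in for @st.cache_data
def load_data_cached():
    return tuple(load_data())

def run_script(slider_value, cached):
    # One "re-run": Streamlit executes the whole script with the new widget value.
    data = load_data_cached() if cached else load_data()
    return slider_value * sum(data) // len(data)

# Three interactions without caching: the expensive load fires three times.
for value in (1, 2, 3):
    run_script(value, cached=False)
print(LOAD_CALLS)  # 3

# Three more interactions with caching: the load fires only once more.
LOAD_CALLS = 0
for value in (1, 2, 3):
    run_script(value, cached=True)
print(LOAD_CALLS)  # 1
```

The point of the sketch: the script body re-executes every time either way; caching only changes whether the expensive call inside it does.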
Caching and State: Making Your App Fast and Stateful
Streamlit gives you two caching decorators and they solve different problems. @st.cache_data is for functions that return data — CSVs, API responses, processed DataFrames. It serializes the return value using pickle, which means every user session gets its own copy. @st.cache_resource is for non-serializable objects like database connections or ML models — it stores the object reference directly and shares it across all sessions.
Then there is st.session_state — a dictionary that persists for the lifetime of a user's session. It disappears on page refresh, but it survives every re-run within that session. It is how you build multi-step forms, track login status, or accumulate user inputs without losing them between interactions.
```python
import streamlit as st
import pandas as pd
import time

# ── CACHING DATA (io.thecodeforge standard) ──────────────────────────────────
@st.cache_data(ttl=3600)  # Cache expires after 1 hour — reduces DB load significantly
def load_sales_data(num_records: int) -> pd.DataFrame:
    """
    Simulates a slow data fetch. With @st.cache_data, this only runs once
    per hour regardless of how many times the user interacts with widgets.
    """
    time.sleep(2)  # Simulated network/DB latency
    return pd.DataFrame({
        "record_id": range(num_records),
        "revenue": [i * 1.5 for i in range(num_records)]
    })

# ── CACHING RESOURCES ────────────────────────────────────────────────────────
@st.cache_resource
def init_connection():
    """
    @st.cache_resource is for non-serializable infrastructure.
    This object is created once and shared across all user sessions.
    Use it for DB pools and ML models — never for DataFrames.
    """
    return {"status": "connected", "provider": "ForgeCloud"}

# ── SESSION STATE ────────────────────────────────────────────────────────────
# Always guard initialization — without this check, state resets on every re-run.
if "analysis_count" not in st.session_state:
    st.session_state.analysis_count = 0

st.title("Forge Analytics Dashboard")

col1, col2 = st.columns([3, 1])
with col1:
    record_count = st.slider("Number of records to load", 100, 10000, 1000)
with col2:
    st.metric("Analyses Run", st.session_state.analysis_count)

if st.button("Run Analysis"):
    st.session_state.analysis_count += 1
    df = load_sales_data(record_count)  # Cached — will not re-query on every click
    st.dataframe(df.head(10))
    st.write(f"Analysis #{st.session_state.analysis_count} complete. Loaded {len(df)} records.")

conn = init_connection()  # Shared resource — created once, reused across sessions
st.caption(f"Connection status: {conn['status']} via {conn['provider']}")
```
[Slider: Number of records to load — set at 1000]
[Metric: Analyses Run — 0]
[Button: Run Analysis]
After clicking Run Analysis:
Analysis #1 complete. Loaded 1000 records.
[DataFrame: first 10 rows displayed]
Connection status: connected via ForgeCloud
- Call st.cache_data.clear() from a refresh button for manual invalidation.

Building a Real Multi-Page Data App with Layout and Forms
Real data apps require navigation, structured layouts, and forms that do not re-run the entire script on every character the user types. Streamlit handles multi-page navigation through a pages/ directory — any Python file placed there is automatically discovered and shown in the sidebar. Layout primitives like st.columns() and st.tabs() handle visual organization within a page.
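A minimal multi-page layout following the pages/ convention might look like this (file names are placeholders; the numeric prefix controls sidebar order and is stripped from the displayed page name):

```
forge_app/
├── app.py              # entry point: streamlit run app.py
├── requirements.txt
└── pages/
    ├── 1_Overview.py   # appears as "Overview" in the sidebar
    └── 2_Forecast.py   # appears as "Forecast"
```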
Forms are particularly important for production apps. Without st.form(), every widget change (a slider nudge, a checkbox toggle, a committed text edit) triggers a full script re-run, which re-checks your cached data loader's cache key, re-renders the entire chart, and redraws the page. With st.form(), all widget changes inside the form are buffered locally and a single re-run fires only when the user explicitly clicks the submit button.
```python
import streamlit as st
import pandas as pd
import numpy as np

# io.thecodeforge: Professional dashboard layout
# st.set_page_config MUST be the first Streamlit call — anything before it raises StreamlitAPIException.
st.set_page_config(
    page_title="Forge Intelligence Hub",
    page_icon="🔬",
    layout="wide",
    initial_sidebar_state="expanded"
)

# ── SIDEBAR CONTROLS ─────────────────────────────────────────────────────────
with st.sidebar:
    st.header("Control Panel")
    st.divider()
    mode = st.radio(
        "Analysis Mode",
        ["Standard", "Advanced"],
        help="Advanced mode enables cohort segmentation and confidence intervals."
    )
    date_range = st.date_input("Reporting Period", [])

# ── KPI HEADER ROW ───────────────────────────────────────────────────────────
st.title("Forge Intelligence Hub")
kpi1, kpi2, kpi3, kpi4 = st.columns(4)
kpi1.metric("Revenue", "$1.24M", delta="+12.3%")
kpi2.metric("Active Users", "8,412", delta="+340")
kpi3.metric("Churn Rate", "2.4%", delta="-0.5%", delta_color="inverse")
kpi4.metric("Model Accuracy", "94.1%", delta="+1.2%")
st.divider()

# ── TABBED CONTENT ───────────────────────────────────────────────────────────
overview_tab, forecast_tab, settings_tab = st.tabs(["Overview", "Forecast", "Settings"])

with overview_tab:
    chart_data = pd.DataFrame(
        np.random.randn(30, 3),
        columns=["Revenue", "Cost", "Profit"]
    )
    st.line_chart(chart_data)

with forecast_tab:
    # st.form() buffers ALL widget interactions — re-run only fires on submit.
    # Without this, every slider nudge or input change triggers a full re-run.
    with st.form("forecast_parameters"):
        st.subheader("Configure Forecast")
        col_a, col_b = st.columns(2)
        with col_a:
            horizon = st.slider("Forecast horizon (days)", 7, 90, 30)
            confidence = st.selectbox("Confidence interval", ["80%", "90%", "95%"])
        with col_b:
            target_metric = st.text_input("Target metric", placeholder="e.g. daily_revenue")
            include_weekends = st.checkbox("Include weekends", value=True)
        submitted = st.form_submit_button("Run Forecast", type="primary")

    if submitted:
        if not target_metric:
            st.error("Target metric is required before running the forecast.")
        else:
            with st.spinner(f"Running {horizon}-day forecast for '{target_metric}'..."):
                # In production this would call your ML backend API
                st.success(
                    f"Forecast complete: {horizon} days, {confidence} CI, "
                    f"{'weekends included' if include_weekends else 'weekdays only'}."
                )

with settings_tab:
    st.info("Settings are persisted per session. Changes here reset on page refresh.")
    theme = st.selectbox("Dashboard theme", ["Light", "Dark", "System"])
    refresh_interval = st.number_input("Auto-refresh interval (seconds)", min_value=30, value=300)
```
[Sidebar: Analysis Mode radio, date range picker]
[KPI row: Revenue $1.24M +12.3%, Active Users 8412 +340, Churn 2.4% -0.5%, Accuracy 94.1% +1.2%]
[Tabs: Overview | Forecast | Settings]
[Overview tab: line chart rendered]
[Forecast tab: form with slider, selectbox, text input, checkbox, Submit button]
[Settings tab: theme selector, refresh interval input]
- Never call any Streamlit command before st.set_page_config(), or you will get a StreamlitAPIException that halts the entire app. This includes st.write(), st.title(), and even importing a module that calls a Streamlit function at import time. Make st.set_page_config() the absolute first Streamlit call after your imports.
- Use st.columns() for side-by-side layout — no multi-page complexity needed.
- Batch related inputs in st.form() to prevent cascading re-runs during data entry — the form fires once cleanly on submit.
- When you want live updates as the user adjusts a value, skip st.form() — use individual widgets outside a form so each change triggers an update.

Data Persistence: The SQL Backend
Streamlit's st.session_state is ephemeral by design. It lives in server memory, scoped to a single browser session, and disappears the moment the user refreshes the page, closes the tab, or the server restarts. For anything that needs to survive beyond a single session — audit logs, saved analysis results, user preferences, cross-session dashboards — you must write to an external persistent store.
At the enterprise level, a structured SQL backend is the standard approach. The pattern is straightforward: use @st.cache_resource to create a shared database connection pool once, and write session events to an audit table on key user actions. This gives you a complete record of dashboard activity without impacting the read performance of your main queries.
```sql
-- io.thecodeforge: Persistence Layer for Streamlit Session Activity
-- This table captures dashboard interactions that must survive beyond a single session.
-- st.session_state cannot be used for this — it is lost on every page refresh.
CREATE TABLE IF NOT EXISTS io.thecodeforge.dashboard_activity (
    id               SERIAL PRIMARY KEY,
    session_id       TEXT NOT NULL,
    user_email       TEXT,
    action_performed TEXT NOT NULL,
    widget_context   JSONB,  -- captures which filters/params were active
    interaction_ts   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index on session_id for fast per-session retrieval
CREATE INDEX IF NOT EXISTS idx_dashboard_activity_session
    ON io.thecodeforge.dashboard_activity (session_id);

-- Index on interaction_ts for time-range analysis of dashboard usage
CREATE INDEX IF NOT EXISTS idx_dashboard_activity_ts
    ON io.thecodeforge.dashboard_activity (interaction_ts DESC);

-- Example: record a forecast run triggered from the Streamlit UI
INSERT INTO io.thecodeforge.dashboard_activity
    (session_id, user_email, action_performed, widget_context)
VALUES (
    'sess_882',
    'editor@thecodeforge.io',
    'run_forecast',
    '{"horizon_days": 30, "confidence": "95%", "target_metric": "daily_revenue"}'
);

-- Retrieve activity for a specific session (useful for debugging user-reported issues)
SELECT action_performed, widget_context, interaction_ts
FROM io.thecodeforge.dashboard_activity
WHERE session_id = 'sess_882'
ORDER BY interaction_ts DESC;
```
Query executed: CREATE TABLE
Query executed: CREATE INDEX
Query executed: CREATE INDEX
Query executed: INSERT 1 row affected.
SELECT result:
action_performed | widget_context | interaction_ts
------------------+-------------------------------------------------------------+------------------------
run_forecast | {"horizon_days": 30, "confidence": "95%", "target_metric"...} | 2026-04-20 14:32:07
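On the app side, the insert above is typically wrapped in a small helper, with the connection shared via @st.cache_resource. The sketch below uses stdlib sqlite3 as a stand-in for the Postgres pool so it runs anywhere; log_activity is a hypothetical helper name, and the JSONB column becomes TEXT in SQLite:

```python
import json
import sqlite3

def get_connection(db_path=":memory:"):
    # In the real app this function would be decorated with @st.cache_resource
    # so the connection is created once and shared across sessions.
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS dashboard_activity (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            user_email TEXT,
            action_performed TEXT NOT NULL,
            widget_context TEXT,  -- JSON stored as TEXT in SQLite
            interaction_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    return conn

def log_activity(conn, session_id, user_email, action, widget_context):
    # Called on key user actions (e.g. inside the `if submitted:` branch of a form).
    conn.execute(
        "INSERT INTO dashboard_activity "
        "(session_id, user_email, action_performed, widget_context) "
        "VALUES (?, ?, ?, ?)",
        (session_id, user_email, action, json.dumps(widget_context)),
    )
    conn.commit()

conn = get_connection()
log_activity(conn, "sess_882", "editor@thecodeforge.io", "run_forecast",
             {"horizon_days": 30, "confidence": "95%"})
rows = conn.execute(
    "SELECT action_performed, widget_context FROM dashboard_activity "
    "WHERE session_id = ?",
    ("sess_882",),
).fetchall()
print(rows[0][0])  # run_forecast
```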
Java Integration: Consuming Dashboards via API
In hybrid infrastructure environments, your Streamlit app often serves as the UI layer for a Java or Go-based compute engine. The pattern is clean: the Java service owns the business logic and heavy computation, exposes it via a REST endpoint, and Streamlit calls that endpoint, caches the response, and handles visualization. This separation of concerns keeps your Streamlit script lightweight and your backend independently testable and deployable.
The critical rule: always cache the API call with @st.cache_data. Without it, Streamlit calls your Java backend on every single re-run, which means every slider move, every checkbox toggle, and every committed input change fires an HTTP request to your backend service. Under even modest concurrency, this becomes a self-inflicted DDoS.
```java
package io.thecodeforge.api;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.time.Instant;

/**
 * io.thecodeforge: REST API consumed by the Streamlit frontend.
 * Streamlit calls this endpoint via requests.get() with @st.cache_data applied.
 * All business logic and heavy computation lives here — not in the dashboard script.
 */
@RestController
@RequestMapping("/api/v1/forge-metrics")
public class DashboardController {

    /**
     * Returns summary metrics for the Streamlit dashboard KPI row.
     * Streamlit caches this response — this endpoint typically receives
     * 1 request per 5 minutes per user, not 1 per interaction.
     */
    @GetMapping("/summary")
    public ResponseEntity<MetricResponse> getSummary(
            @RequestParam(defaultValue = "30") int horizonDays) {

        // In production: query your data warehouse or aggregation service here.
        // The compute stays in Java; the visualization stays in Streamlit.
        MetricResponse response = new MetricResponse(
                1_204_847.50,
                "USD",
                0.941,
                horizonDays,
                Instant.now().toString()
        );
        return ResponseEntity.ok(response);
    }

    /**
     * Record used as the JSON response body.
     * Streamlit receives this as a dict after requests.get().json().
     */
    record MetricResponse(
            double revenue,
            String currency,
            double modelAccuracy,
            int forecastHorizonDays,
            String generatedAt
    ) {}
}
```
GET /api/v1/forge-metrics/summary?horizonDays=30
HTTP 200 OK
{
"revenue": 1204847.50,
"currency": "USD",
"modelAccuracy": 0.941,
"forecastHorizonDays": 30,
"generatedAt": "2026-04-20T14:32:07Z"
}
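The Streamlit side of this contract is a thin cached fetcher. In the app you would decorate a requests.get call with @st.cache_data(ttl=300); the stdlib sketch below mimics that TTL behavior so the caching logic is visible. The base_url value and the _urlopen injection point are illustrative, not part of any real API:

```python
import json
import time
import urllib.request

_CACHE = {}  # (base_url, horizon) -> (expiry_ts, payload); mimics @st.cache_data(ttl=...)

def fetch_summary(base_url, horizon_days=30, ttl=300, _urlopen=urllib.request.urlopen):
    """Fetch the KPI summary, reusing a cached payload until the TTL expires."""
    key = (base_url, horizon_days)
    now = time.time()
    if key in _CACHE and _CACHE[key][0] > now:
        return _CACHE[key][1]  # cache hit: no HTTP request fired
    url = f"{base_url}/api/v1/forge-metrics/summary?horizonDays={horizon_days}"
    with _urlopen(url) as resp:
        payload = json.loads(resp.read())
    _CACHE[key] = (now + ttl, payload)
    return payload

# Demo against a canned response instead of a live backend.
class _FakeResponse:
    def read(self):
        return b'{"revenue": 1204847.50, "currency": "USD", "modelAccuracy": 0.941}'
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

calls = []
def _fake_urlopen(url):
    calls.append(url)
    return _FakeResponse()

first = fetch_summary("http://forge-backend:8080", _urlopen=_fake_urlopen)
second = fetch_summary("http://forge-backend:8080", _urlopen=_fake_urlopen)
print(first["currency"], len(calls))  # USD 1 -> the second call was served from cache
```

This is exactly the behavior @st.cache_data gives you for free: identical arguments within the TTL return the cached payload without touching the backend.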
- For continuously updating data, use st.empty() with a polling loop as a stopgap, or switch to Dash or Panel for a proper streaming UI.

Deploying Your Streamlit App — From Local to Live
For production deployments, Docker is the standard. It guarantees that your runtime environment — including system-level dependencies for libraries like OpenCV, PyTorch, or GeoPandas — is identical from local development to production. It also makes secrets management, health checking, and container orchestration straightforward.
The single most common Docker deployment mistake with Streamlit: forgetting --server.address=0.0.0.0. Without it, Streamlit binds to 127.0.0.1 inside the container. The app starts, the process runs, but no external connection can reach it. You see 'connection refused' in the browser and nothing obviously wrong in the logs.
```dockerfile
# io.thecodeforge: Production Streamlit Container
# Built on python:3.11-slim to minimize image size while retaining pip and venv support.
FROM python:3.11-slim

WORKDIR /app

# Install system-level dependencies.
# curl is required for the HEALTHCHECK command below.
# Add any system packages your Python libraries need here (e.g., libgdal-dev for GeoPandas).
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first to leverage Docker layer caching.
# If requirements.txt does not change, this layer is reused on rebuild.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application source after dependencies to keep the source-change rebuild fast.
COPY . .

# Streamlit's default port. Expose it so orchestrators can route traffic correctly.
EXPOSE 8501

# Health check using Streamlit's built-in health endpoint.
# interval: how often to check. timeout: how long to wait. retries: failures before unhealthy.
HEALTHCHECK \
    --interval=30s \
    --timeout=5s \
    --start-period=10s \
    --retries=3 \
    CMD curl --fail http://localhost:8501/_stcore/health || exit 1

# --server.address=0.0.0.0 is REQUIRED in Docker.
# Without it, Streamlit binds to 127.0.0.1 inside the container
# and external connections get 'connection refused' with no obvious error.
ENTRYPOINT [ \
    "streamlit", "run", "app.py", \
    "--server.port=8501", \
    "--server.address=0.0.0.0", \
    "--server.headless=true" \
]
```
Step 2/8 : WORKDIR /app
Step 3/8 : RUN apt-get update && apt-get install -y --no-install-recommends curl
Step 4/8 : COPY requirements.txt .
Step 5/8 : RUN pip install --no-cache-dir -r requirements.txt
Step 6/8 : COPY . .
Step 7/8 : EXPOSE 8501
Step 8/8 : HEALTHCHECK ...
Successfully built a8f3c9d12e44
Successfully tagged thecodeforge/streamlit-dashboard:latest
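To run the built image, a compose file keeps the port mapping and secrets mount in one place. This is a sketch assuming the image tag from the build output above and a local secrets.toml kept out of version control:

```yaml
services:
  dashboard:
    image: thecodeforge/streamlit-dashboard:latest
    ports:
      - "8501:8501"  # host:container — Streamlit's default port
    environment:
      - STREAMLIT_SERVER_HEADLESS=true
    volumes:
      # Mount secrets at runtime instead of baking them into the image.
      - ./secrets.toml:/app/.streamlit/secrets.toml:ro
    restart: unless-stopped
```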
| Feature / Aspect | Streamlit | Dash (Plotly) | Gradio |
|---|---|---|---|
| Learning curve | Minimal — pure Python script style, no frontend knowledge required | Moderate — callback-based reactive model requires understanding Input/Output wiring | Minimal — but strongly opinionated toward ML inference interfaces |
| Re-run model | Full script re-runs on every interaction — simple but requires disciplined caching | Targeted callbacks — only the components affected by an Input update | Event-driven per component — each function maps to specific UI elements |
| Best for | Data dashboards, internal tools, rapid prototyping, ML result visualization | Complex production-grade analytics apps where fine-grained update control matters | ML model demos, inference UIs, sharing models with non-technical stakeholders |
| Layout control | Good — columns, tabs, expanders, sidebar. Limited CSS customization without components | Excellent — full CSS and HTML control, Bootstrap integration, arbitrary component placement | Limited — opinionated grid layout, not suitable for complex multi-section dashboards |
| State management | st.session_state dictionary — simple but ephemeral, lost on refresh | Explicit callback Output/Input wiring — more verbose but gives precise control over what updates | Implicit per-function state — simple for single-function interfaces, awkward for multi-step flows |
| Concurrency model | Each session runs in its own thread — shared objects need @st.cache_resource for safety | Async callbacks supported — better suited for high-concurrency production workloads | Single-user focus by default — sharing a model demo link spins up separate instances |
| Production deployment | Docker, Streamlit Community Cloud, Kubernetes with sticky sessions | Docker, Gunicorn/uWSGI, standard WSGI deployment — same as any Flask app | Hugging Face Spaces (native), Docker, or standalone server |
| Custom JavaScript | Supported via st.components.v1 — but requires wrapping components manually | Native — arbitrary Dash components can include React and JavaScript | Not supported — Gradio controls the entire frontend |
🎯 Key Takeaways
- Use st.form() to batch widget interactions into a single re-run on submit — it is the single most effective way to prevent re-run storms during multi-input data entry.
- Never commit .streamlit/secrets.toml to version control — manage credentials via your platform's native secrets manager and rotate anything that was ever exposed.
Interview Questions on This Topic
- Q: Describe three concrete strategies for optimizing a slow Streamlit app that has 50 concurrent users and re-runs taking 8+ seconds. (Mid-level)
- Q: Explain the difference between @st.cache_data and @st.cache_resource. What happens if you try to cache an open file handle or database connection with @st.cache_data? (Mid-level)
- Q: How does st.session_state facilitate the creation of multi-step wizards or complex data entry forms? (Junior)
- Q: What are the security implications of using st.file_uploader in a public-facing dashboard, and how do you mitigate them? (Mid-level)
- Q: Can Streamlit run on AWS Lambda or GCP Cloud Run in request-based serverless mode? Explain the architectural constraints. (Senior)
Frequently Asked Questions
Is Streamlit good for production apps or just prototyping?
Streamlit is production-ready for internal tools, data dashboards, and apps with moderate traffic. Teams run it at enterprise scale behind Docker and Kubernetes with proper caching in place. For very high-traffic public apps — thousands of concurrent users — or apps that need fine-grained component-level updates without full script re-runs, Dash or a React plus FastAPI stack may be more appropriate. The limiting factor is not Streamlit's code quality; it is the full-script re-run model, which does not fit every use case.
How do I add authentication to a Streamlit app?
For simple internal tools, the streamlit-authenticator library provides username/password flows with hashed credentials. For apps deployed on Streamlit Community Cloud, you can restrict access to specific GitHub accounts or use OAuth2 with Google or GitHub. For enterprise SSO — SAML, OIDC, Active Directory — the standard approach is to deploy Streamlit behind a reverse proxy like Nginx or Caddy with an authentication layer such as oauth2-proxy or Cloudflare Access. Streamlit itself has no built-in authentication mechanism and should not be exposed to the internet without one.
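As a concrete example of the reverse-proxy approach, a minimal Nginx server block might look like the following. The hostname and auth wiring are placeholders; the WebSocket upgrade headers are the essential part, because Streamlit streams all app traffic over a WebSocket and the page will load but never render without them:

```nginx
server {
    listen 443 ssl;
    server_name dashboard.example.com;  # placeholder hostname

    # Terminate authentication here (oauth2-proxy, Cloudflare Access, etc.)
    # before traffic ever reaches Streamlit.

    location / {
        proxy_pass http://127.0.0.1:8501;
        proxy_http_version 1.1;

        # Required: Streamlit runs over a WebSocket; without these
        # upgrade headers the app hangs on a blank loading screen.
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 86400;
    }
}
```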
Why does my Streamlit app lose all its data when I refresh the page?
Refreshing the browser starts a new session, which clears st.session_state entirely. Session state lives in server memory and is scoped to a single browser session — it was never designed to survive a refresh. For data that must persist across page refreshes, browser sessions, or server restarts, write it to an external store: PostgreSQL for structured data, Redis for short-lived key-value state, or S3 for large result files. Load it back at the start of each session.
How can I make my Streamlit app look more professional?
Start with st.set_page_config(layout='wide') to use the full browser width instead of Streamlit's default narrow column. Define a custom theme in .streamlit/config.toml — primary color, background color, and font. Organize content with st.tabs() to reduce vertical scrolling, st.columns() for side-by-side layouts, and st.expander() to hide secondary information behind a toggle. Use st.metric() for KPI cards instead of plain st.write() for numbers. For icons and logos, st.image() accepts URLs and local file paths. If you need more visual control than Streamlit's built-in components allow, st.components.v1.html() lets you inject raw HTML and CSS.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.