Streamlit Data Apps — Uncached Queries Exhaust DB Pool
Uncached queries per slider interaction cause 150+ DB connections and 503 errors.
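- Streamlit re-runs your entire Python script on every widget interaction, so there is no event loop or callback wiring to manage (widgets accept optional on_change callbacks, but the default model needs none)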
- @st.cache_data caches serializable returns (DataFrames, API responses) — @st.cache_resource caches non-serializable objects (DB connections, ML models)
- st.session_state persists data across re-runs within a single browser session — but is lost on page refresh
- Use st.form() to batch widget inputs so a multi-field data-entry form triggers one re-run on submit instead of one per widget change
- The #1 production mistake: loading data outside a cache decorator — every slider move re-queries the database
- Biggest misconception: Streamlit is only for prototypes. With proper caching and Docker deployment, it handles internal tools at enterprise scale
Most data insights die in Jupyter notebooks. A data scientist builds a forecasting model, but only someone who can run Python can actually see it. Streamlit fixes this — it turns any Python script into a live web app with zero frontend code.
The core trade-off: Streamlit re-runs your entire script on every interaction. This makes the programming model dead simple — your code stays linear, no callback wiring. But it also means un-cached operations like database queries, model loading, and file reads fire on every slider move. Without disciplined caching, your app grinds to a halt after the second click.
Productionizing Streamlit requires three things: caching decorators on every expensive operation, st.session_state for cross-interaction state, and a deployment strategy — Docker, Streamlit Community Cloud, or Kubernetes. Miss any of these and you have a prototype that breaks under real usage.
How Streamlit's Execution Model Actually Works (This Changes Everything)
Before you write a single widget, you need to understand Streamlit's most important — and most surprising — design decision: every time a user interacts with your app, Streamlit re-runs your entire Python script from top to bottom. Every. Single. Time.
This is completely different from how most web frameworks operate. You do not write an event loop or wire up onclick handlers; widgets do accept optional on_change and on_click callbacks, but the default model needs none. When a user moves a slider, Streamlit re-executes your script, and this time st.slider() returns the new value. It sounds expensive, and it can be, but it is also what makes Streamlit so easy to reason about.
The upside: your app logic stays linear and readable, exactly like a regular Python script. The downside: if you are loading a 2GB CSV or running a complex SQL query on every re-run, your app will be unusably slow within seconds. That is why caching is not optional — it is the single most critical design decision in any Streamlit app.
Caching and State: Making Your App Fast and Stateful
Streamlit gives you two caching decorators, and they solve different problems. @st.cache_data is for functions that return data: CSVs, API responses, processed DataFrames. It serializes the return value with pickle and hands every caller a fresh copy, so one session mutating its DataFrame cannot corrupt what another session sees. @st.cache_resource is for non-serializable objects like database connections or ML models: it stores the object reference directly and shares that single object across all sessions, which means the object itself must be safe to use from multiple threads.
Then there is st.session_state — a dictionary that persists for the lifetime of a user's session. It disappears on page refresh, but it survives every re-run within that session. It is how you build multi-step forms, track login status, or accumulate user inputs without losing them between interactions.
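A minimal sketch of the guarded-initialization pattern for a hypothetical three-step wizard. The helpers take any mutable mapping, so they work on st.session_state in the app and on a plain dict in unit tests:

```python
import copy
from collections.abc import MutableMapping
from typing import Any

# Hypothetical state for a three-step data-entry wizard.
DEFAULTS: dict[str, Any] = {"step": 1, "answers": {}}
LAST_STEP = 3


def init_state(state: MutableMapping[str, Any], defaults: dict[str, Any]) -> None:
    """Seed missing keys only, so a re-run never wipes a user's progress."""
    for key, value in defaults.items():
        if key not in state:
            # deepcopy so sessions never share the same mutable default object
            state[key] = copy.deepcopy(value)


def record_answer(state: MutableMapping[str, Any], field: str, value: Any) -> None:
    """Store the current step's answer and advance, capped at the final step."""
    state["answers"][field] = value
    state["step"] = min(state["step"] + 1, LAST_STEP)
```

In the app, call `init_state(st.session_state, DEFAULTS)` at the top of the script and `record_answer(st.session_state, ...)` when a step is submitted. Because init_state() skips keys that already exist, the next re-run keeps the user on their current step instead of resetting to step 1.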
Building a Real Multi-Page Data App with Layout and Forms
Real data apps require navigation, structured layouts, and forms that do not re-run the entire script on every character the user types. Streamlit handles multi-page navigation through a pages/ directory — any Python file placed there is automatically discovered and shown in the sidebar. Layout primitives like st.columns() and st.tabs() handle visual organization within a page.
Forms are particularly important for production apps. Without st.form(), every committed widget change (a slider release, a checkbox toggle, a text input confirmed with Enter or by clicking away) triggers a full script re-run, which repeats the cache-key checks for your data loaders, re-renders every chart, and redraws the page. A five-field filter panel therefore costs five re-runs. With st.form(), widget changes inside the form are buffered in the browser, and a single re-run fires only when the user clicks the submit button.
Data Persistence: The SQL Backend
Streamlit's st.session_state is ephemeral by design. It lives in server memory, scoped to a single browser session, and disappears the moment the user refreshes the page, closes the tab, or the server restarts. For anything that needs to survive beyond a single session — audit logs, saved analysis results, user preferences, cross-session dashboards — you must write to an external persistent store.
At the enterprise level, a structured SQL backend is the standard approach. The pattern is straightforward: use @st.cache_resource to create a shared database connection pool once, and write session events to an audit table on key user actions. Keeping each event to a single small insert gives you a complete record of dashboard activity while adding little load next to your main read queries.
Java Integration: Consuming a Java Backend via API
In hybrid infrastructure environments, your Streamlit app often serves as the UI layer for a Java or Go-based compute engine. The pattern is clean: the Java service owns the business logic and heavy computation, exposes it via a REST endpoint, and Streamlit calls that endpoint, caches the response, and handles visualization. This separation of concerns keeps your Streamlit script lightweight and your backend independently testable and deployable.
The critical rule: always cache the API call with @st.cache_data, usually with a ttl so results eventually refresh. Without it, Streamlit calls your Java backend on every single re-run: every slider move, every checkbox toggle, every submitted text input fires an HTTP request to your backend service. Under even modest concurrency, this becomes a self-inflicted DDoS.
Deploying Your Streamlit App — From Local to Live
For production deployments, Docker is the standard. It guarantees that your runtime environment — including system-level dependencies for libraries like OpenCV, PyTorch, or GeoPandas — is identical from local development to production. It also makes secrets management, health checking, and container orchestration straightforward.
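The most common Docker networking mistake with Streamlit is a server listening only on the loopback interface. If server.address resolves to 127.0.0.1 or localhost inside the container (for example from a config.toml copied from a developer laptop), the app starts and the process runs, but no connection from outside the container can reach it: you see 'connection refused' in the browser and nothing obviously wrong in the logs. Passing --server.address=0.0.0.0 explicitly, as Streamlit's own Docker deployment guide does, rules this out; remember to publish the port as well (for example docker run -p 8501:8501).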
| Feature / Aspect | Streamlit | Dash (Plotly) | Gradio |
|---|---|---|---|
| Learning curve | Minimal — pure Python script style, no frontend knowledge required | Moderate — callback-based reactive model requires understanding Input/Output wiring | Minimal — but strongly opinionated toward ML inference interfaces |
| Re-run model | Full script re-runs on every interaction — simple but requires disciplined caching | Targeted callbacks — only the components affected by an Input update | Event-driven per component — each function maps to specific UI elements |
| Best for | Data dashboards, internal tools, rapid prototyping, ML result visualization | Complex production-grade analytics apps where fine-grained update control matters | ML model demos, inference UIs, sharing models with non-technical stakeholders |
| Layout control | Good — columns, tabs, expanders, sidebar. Limited CSS customization without components | Excellent — full CSS and HTML control, Bootstrap integration, arbitrary component placement | Limited — opinionated grid layout, not suitable for complex multi-section dashboards |
| State management | st.session_state dictionary: simple but ephemeral, lost on refresh | Explicit callback Output/Input wiring: more verbose but gives precise control over what updates | gr.State component for per-session values: simple for single-function interfaces, awkward for multi-step flows |
| Concurrency model | Each session's script runs in its own thread; objects shared through @st.cache_resource must themselves be thread-safe | Stateless server that scales horizontally across Gunicorn workers, well suited to high-concurrency production workloads | Built-in request queue limits how many events run at once; a share link tunnels to the same running instance |
| Production deployment | Docker, Streamlit Community Cloud, Kubernetes with sticky sessions | Docker, Gunicorn/uWSGI, standard WSGI deployment — same as any Flask app | Hugging Face Spaces (native), Docker, or standalone server |
| Custom JavaScript | Supported via st.components.v1, but you wrap components manually | Native: arbitrary Dash components can include React and JavaScript | Limited: event-level js hooks and Gradio custom components, but Gradio owns the overall frontend |
Key Takeaways
- Streamlit's re-run-on-interaction model is its superpower and its main operational risk — mastering caching is non-negotiable before any production deployment.
- @st.cache_data is for serializable values like DataFrames and API responses; @st.cache_resource is for shared non-serializable objects like database connection pools and ML models.
- st.session_state initialization must always be guarded with a 'not in' check — without it, every re-run resets the state and users lose their progress.
- Use st.form() to batch widget interactions into a single re-run on submit; it is the single most effective way to prevent re-run storms during multi-input data entry.
- Never commit .streamlit/secrets.toml to version control; manage credentials via your platform's native secrets manager and rotate anything that was ever exposed.
Interview Questions on This Topic
- Describe three concrete strategies for optimizing a slow Streamlit app that has 50 concurrent users and re-runs taking 8+ seconds. (Mid-level)
- Explain the difference between @st.cache_data and @st.cache_resource. What happens if you try to cache an open file handle or database connection with @st.cache_data? (Mid-level)
- How does st.session_state facilitate the creation of multi-step wizards or complex data entry forms? (Junior)
- What are the security implications of using st.file_uploader in a public-facing dashboard, and how do you mitigate them? (Mid-level)
- Can Streamlit run on AWS Lambda or GCP Cloud Run in request-based serverless mode? Explain the architectural constraints. (Senior)
Frequently Asked Questions
Is Streamlit good for production apps or just prototyping?
Streamlit is production-ready for internal tools, data dashboards, and apps with moderate traffic. Teams run it at enterprise scale behind Docker and Kubernetes with proper caching in place. For very high-traffic public apps — thousands of concurrent users — or apps that need fine-grained component-level updates without full script re-runs, Dash or a React plus FastAPI stack may be more appropriate. The limiting factor is not Streamlit's code quality; it is the full-script re-run model, which does not fit every use case.
How do I add authentication to a Streamlit app?
For simple internal tools, the streamlit-authenticator library provides username/password flows with hashed credentials. For apps deployed on Streamlit Community Cloud, you can restrict access to specific GitHub accounts or use OAuth2 with Google or GitHub. For enterprise SSO (SAML, OIDC, Active Directory), the standard approach is to deploy Streamlit behind a reverse proxy like Nginx or Caddy with an authentication layer such as oauth2-proxy or Cloudflare Access. Recent Streamlit releases add st.login() and st.user for OpenID Connect sign-in, but Streamlit has no built-in user store or authorization layer, so never expose an app to the internet without authentication in front of it.
Why does my Streamlit app lose all its data when I refresh the page?
Refreshing the browser starts a new session, which clears st.session_state entirely. Session state lives in server memory and is scoped to a single browser session — it was never designed to survive a refresh. For data that must persist across page refreshes, browser sessions, or server restarts, write it to an external store: PostgreSQL for structured data, Redis for short-lived key-value state, or S3 for large result files. Load it back at the start of each session.
How can I make my Streamlit app look more professional?
Start with st.set_page_config(layout='wide') to use the full browser width instead of Streamlit's default narrow column. Define a custom theme in .streamlit/config.toml — primary color, background color, and font. Organize content with st.tabs() to reduce vertical scrolling, st.columns() for side-by-side layouts, and st.expander() to hide secondary information behind a toggle. Use st.metric() for KPI cards instead of plain st.write() for numbers. For icons and logos, st.image() accepts URLs and local file paths. If you need more visual control than Streamlit's built-in components allow, st.components.v1.html() lets you inject raw HTML and CSS.