Intermediate 4 min · March 06, 2026

Streamlit Data Apps — Uncached Queries Exhaust DB Pool

Uncached queries per slider interaction cause 150+ DB connections and 503 errors.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Streamlit re-runs your entire Python script on every widget interaction, with no callback wiring or event loop for you to manage
  • @st.cache_data caches serializable returns (DataFrames, API responses) — @st.cache_resource caches non-serializable objects (DB connections, ML models)
  • st.session_state persists data across re-runs within a single browser session — but is lost on page refresh
  • Use st.form() to batch widget inputs so data entry triggers one re-run on submit instead of one per widget change
  • The #1 production mistake: loading data outside a cache decorator — every slider move re-queries the database
  • Biggest misconception: Streamlit is only for prototypes. With proper caching and Docker deployment, it handles internal tools at enterprise scale

Most data insights die in Jupyter notebooks. A data scientist builds a forecasting model, but only someone who can run Python can actually see it. Streamlit fixes this — it turns any Python script into a live web app with zero frontend code.

The core trade-off: Streamlit re-runs your entire script on every interaction. This makes the programming model dead simple — your code stays linear, no callback wiring. But it also means un-cached operations like database queries, model loading, and file reads fire on every slider move. Without disciplined caching, your app grinds to a halt after the second click.

Productionizing Streamlit requires three things: caching decorators on every expensive operation, st.session_state for cross-interaction state, and a deployment strategy — Docker, Streamlit Community Cloud, or Kubernetes. Miss any of these and you have a prototype that breaks under real usage.

How Streamlit's Execution Model Actually Works (This Changes Everything)

Before you write a single widget, you need to understand Streamlit's most important — and most surprising — design decision: every time a user interacts with your app, Streamlit re-runs your entire Python script from top to bottom. Every. Single. Time.

This is completely different from how most web frameworks operate. You do not manage an event loop or wire up onclick handlers, and widget callbacks such as on_change are optional rather than required. When a user moves a slider, Streamlit re-executes your script and the widget call returns the new slider value. It sounds expensive, and it can be, but it is also what makes Streamlit so easy to reason about.

The upside: your app logic stays linear and readable, exactly like a regular Python script. The downside: if you are loading a 2GB CSV or running a complex SQL query on every re-run, your app will be unusably slow within seconds. That is why caching is not optional — it is the single most critical design decision in any Streamlit app.

Caching and State: Making Your App Fast and Stateful

Streamlit gives you two caching decorators, and they solve different problems. @st.cache_data is for functions that return data: CSVs, API responses, processed DataFrames. It serializes the return value using pickle and hands each caller a fresh copy, so one session cannot mutate another session's data. @st.cache_resource is for non-serializable objects like database connections or ML models. It stores the object reference directly and shares it across all sessions, so the object must be safe to use from multiple threads.

Then there is st.session_state — a dictionary that persists for the lifetime of a user's session. It disappears on page refresh, but it survives every re-run within that session. It is how you build multi-step forms, track login status, or accumulate user inputs without losing them between interactions.

Building a Real Multi-Page Data App with Layout and Forms

Real data apps require navigation, structured layouts, and forms that do not re-run the entire script on every character the user types. Streamlit handles multi-page navigation through a pages/ directory — any Python file placed there is automatically discovered and shown in the sidebar. Layout primitives like st.columns() and st.tabs() handle visual organization within a page.

Forms are particularly important for production apps. Without st.form(), every widget change triggers a full script re-run: each slider move, each checkbox toggle, and each text input the user commits by pressing Enter or leaving the field. Every one of those re-runs repeats the cache lookups, re-renders the charts, and redraws the page. With st.form(), all widget changes inside the form are buffered in the browser and a single re-run fires only when the user clicks the submit button.

Data Persistence: The SQL Backend

Streamlit's st.session_state is ephemeral by design. It lives in server memory, scoped to a single browser session, and disappears the moment the user refreshes the page, closes the tab, or the server restarts. For anything that needs to survive beyond a single session — audit logs, saved analysis results, user preferences, cross-session dashboards — you must write to an external persistent store.

At the enterprise level, a structured SQL backend is the standard approach. The pattern is straightforward: use @st.cache_resource to create a shared database connection pool once, and write session events to an audit table on key user actions. This gives you a complete record of dashboard activity without impacting the read performance of your main queries.

Java Integration: Consuming Dashboards via API

In hybrid infrastructure environments, your Streamlit app often serves as the UI layer for a Java or Go-based compute engine. The pattern is clean: the Java service owns the business logic and heavy computation, exposes it via a REST endpoint, and Streamlit calls that endpoint, caches the response, and handles visualization. This separation of concerns keeps your Streamlit script lightweight and your backend independently testable and deployable.

The critical rule: always cache the API call with @st.cache_data. Without it, Streamlit calls your Java backend on every single re-run, so every slider move, checkbox toggle, and committed text input fires an HTTP request to your backend service. Under even modest concurrency, this becomes a self-inflicted DDoS.

Deploying Your Streamlit App — From Local to Live

For production deployments, Docker is the standard. It guarantees that your runtime environment — including system-level dependencies for libraries like OpenCV, PyTorch, or GeoPandas — is identical from local development to production. It also makes secrets management, health checking, and container orchestration straightforward.

A common Docker deployment pitfall with Streamlit is the bind address. If the server is bound to 127.0.0.1 inside the container (for example, through a server.address value carried over from a local config.toml), the app starts and the process runs, but no external connection can reach it. You see 'connection refused' in the browser and nothing obviously wrong in the logs. Passing --server.address=0.0.0.0 explicitly in the container's start command removes the ambiguity.
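For the health checking mentioned above, a container probe can be written in a few lines of standard-library Python. This sketch assumes a recent Streamlit release that serves a health endpoint at `/_stcore/health` and the default port 8501; `health_url` and `is_healthy` are hypothetical helper names.

```python
import http.client
import urllib.request


def health_url(host, port):
    """Address of Streamlit's health endpoint (assumed path for recent versions)."""
    return f"http://{host}:{port}/_stcore/health"


def is_healthy(host="localhost", port=8501, timeout=2.0):
    """Return True only if the server answers the health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and HTTP errors all count as unhealthy.
        return False
```

A container health check could call `is_healthy()` and exit non-zero on `False`. Running the same probe from outside the container against the published port is a quick way to tell a bind-address problem apart from an app crash.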

Streamlit vs Dash vs Gradio
| Feature / Aspect | Streamlit | Dash (Plotly) | Gradio |
| --- | --- | --- | --- |
| Learning curve | Minimal: pure Python script style, no frontend knowledge required | Moderate: callback-based reactive model requires understanding Input/Output wiring | Minimal, but strongly opinionated toward ML inference interfaces |
| Re-run model | Full script re-runs on every interaction: simple but requires disciplined caching | Targeted callbacks: only the components affected by an Input update | Event-driven per component: each function maps to specific UI elements |
| Best for | Data dashboards, internal tools, rapid prototyping, ML result visualization | Complex production-grade analytics apps where fine-grained update control matters | ML model demos, inference UIs, sharing models with non-technical stakeholders |
| Layout control | Good: columns, tabs, expanders, sidebar. Limited CSS customization without components | Excellent: full CSS and HTML control, Bootstrap integration, arbitrary component placement | Limited: opinionated grid layout, not suitable for complex multi-section dashboards |
| State management | st.session_state dictionary: simple but ephemeral, lost on refresh | Explicit callback Output/Input wiring: more verbose but gives precise control over what updates | Implicit per-function state: simple for single-function interfaces, awkward for multi-step flows |
| Concurrency model | Each session runs in its own thread; shared objects need @st.cache_resource for safety | Async callbacks supported; better suited for high-concurrency production workloads | Single-user focus by default; sharing a model demo link spins up separate instances |
| Production deployment | Docker, Streamlit Community Cloud, Kubernetes with sticky sessions | Docker, Gunicorn/uWSGI, standard WSGI deployment, same as any Flask app | Hugging Face Spaces (native), Docker, or standalone server |
| Custom JavaScript | Supported via st.components.v1, but requires wrapping components manually | Native: arbitrary Dash components can include React and JavaScript | Not supported: Gradio controls the entire frontend |

Key Takeaways

  • Streamlit's re-run-on-interaction model is its superpower and its main operational risk — mastering caching is non-negotiable before any production deployment.
  • @st.cache_data is for serializable values like DataFrames and API responses; @st.cache_resource is for shared non-serializable objects like database connection pools and ML models.
  • st.session_state initialization must always be guarded with a 'not in' check — without it, every re-run resets the state and users lose their progress.
  • Use st.form() to batch widget interactions into a single re-run on submit — it is the single most effective way to prevent re-run storms during multi-input data entry.
  • Never commit .streamlit/secrets.toml to version control — manage credentials via your platform's native secrets manager and rotate anything that was ever exposed.

Interview Questions on This Topic

  • Q: Describe three concrete strategies for optimizing a slow Streamlit app that has 50 concurrent users and re-runs taking 8+ seconds. (Mid-level)
    First and most impactful: wrap every data loading function with @st.cache_data. Without caching, every re-run by every user re-executes the full data pipeline. With caching, 50 users sharing the same parameters hit the cache instead of the database, so one query can serve all 50 and database load drops by up to 98%. Second: create shared objects such as database connections and ML models once with @st.cache_resource instead of rebuilding them on every re-run. This is non-negotiable. Third: use st.form() for multi-input workflows so a re-run fires once on submit instead of once per widget change. Additionally, minimize work in the script body: move expensive setup into cached functions and keep the top-level script as lightweight as possible. Use st.empty() and st.container() for partial UI updates where appropriate, and point the dashboard at a read replica rather than the production primary database.
  • Q: Explain the difference between @st.cache_data and @st.cache_resource. What happens if you try to cache an open file handle or database connection with @st.cache_data? (Mid-level)
    @st.cache_data serializes the return value using pickle and stores a copy per cache key. It is designed for data: DataFrames, lists, dicts, primitives. If you try to cache an open file handle or database connection with @st.cache_data, it will raise a serialization error at runtime because file handles and connection objects cannot be pickled. @st.cache_resource does not serialize — it stores the original object reference directly in memory. It is designed for infrastructure objects: database connection pools, ML models loaded into GPU memory, open file handles. The critical behavioral difference: @st.cache_data creates a separate copy per session, making it safe for multi-user apps. @st.cache_resource shares the same object instance across all sessions — which is what you want for a single connection pool, but it also means you need thread-safe objects.
  • Q: How does st.session_state facilitate the creation of multi-step wizards or complex data entry forms? (Junior)
    st.session_state is a dictionary that persists across script re-runs within a single browser session. For a multi-step wizard, you store the current step index and all accumulated user inputs in session state. Each re-run reads the current step from state, renders the appropriate UI for that step, and updates state when the user advances or goes back. The critical implementation detail: always initialize session state keys with an 'if key not in st.session_state' guard. Without this guard, the initialization line executes on every re-run and resets whatever the user entered on previous steps. This is the most common bug in multi-step Streamlit flows. st.form() pairs naturally with this pattern by ensuring a re-run fires once on submit rather than once per widget change, so partial form data does not trigger premature state updates.
  • Q: What are the security implications of using st.file_uploader in a public-facing dashboard, and how do you mitigate them? (Mid-level)
    st.file_uploader allows users to upload arbitrary files to your server. The risks are: (1) malicious executables or scripts disguised as data files, where an attacker uploads a .csv that is actually a Python script and hopes the app executes it; (2) denial of service via extremely large uploads that exhaust disk space or server memory; (3) path traversal attacks if the file is saved to a predictable or user-controlled location on disk. Mitigations: validate the content type immediately after upload by inspecting the file's bytes, for example with the python-magic library, rather than trusting the filename extension, which is trivially spoofed. Set a hard file size limit using the server.maxUploadSize config option. Process uploaded files entirely in memory and never write them to a predictable path on disk. If disk persistence is required, write to a sandboxed temporary directory with a randomized name. For public-facing apps, add rate limiting at the reverse proxy layer to prevent upload flooding.
  • Q: Can Streamlit run on AWS Lambda or GCP Cloud Run in request-based serverless mode? Explain the architectural constraints. (Senior)
    Streamlit cannot run on AWS Lambda at all. Lambda is a request-response serverless model with a hard execution timeout of 15 minutes and no support for long-lived WebSocket connections. Streamlit's frontend communicates with the server via a persistent WebSocket — this is how widget interactions trigger re-runs without full HTTP round-trips. Lambda terminates connections between requests, which fundamentally breaks the Streamlit communication model. GCP Cloud Run can work, but only in always-allocated CPU mode — not in the default request-based mode. In request-based mode, Cloud Run pauses container CPU between HTTP requests, which drops the WebSocket connection and causes the browser to show 'Connection lost'. With always-allocated CPU and a minimum instance count of 1, Cloud Run keeps the container alive and WebSocket connections stay open. For production Streamlit deployments, the correct targets are ECS Fargate, GKE or Kubernetes, a plain VM behind Nginx, or Streamlit Community Cloud — all of which support long-lived TCP connections without the constraints of pure serverless.
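The upload mitigations from the st.file_uploader answer above can be sketched as a small in-memory validator. This version uses only the standard library and a few hard-coded byte signatures as a simplified stand-in for a library such as python-magic; `ALLOWED_EXTENSIONS`, `MAX_BYTES`, `EXECUTABLE_PREFIXES`, and `validate_upload` are hypothetical names and limits.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".csv"}  # hypothetical allow-list
MAX_BYTES = 5 * 1024 * 1024  # hypothetical 5 MiB application-level cap
# Windows PE, Linux ELF, and script shebang prefixes; a deliberately short list.
EXECUTABLE_PREFIXES = (b"MZ", b"\x7fELF", b"#!")


def validate_upload(filename, data):
    """Return a list of problems; an empty list means the upload is accepted."""
    problems = []
    if PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("extension not allowed")
    if len(data) > MAX_BYTES:
        problems.append("file too large")
    if data.startswith(EXECUTABLE_PREFIXES):
        problems.append("content looks executable")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("content is not UTF-8 text")
    return problems
```

In an app, the object returned by st.file_uploader exposes `name` and `getvalue()`, so the check could be called as `validate_upload(uploaded.name, uploaded.getvalue())` before any parsing. The bytes never touch disk in this flow.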

Frequently Asked Questions

Is Streamlit good for production apps or just prototyping?

Streamlit is production-ready for internal tools, data dashboards, and apps with moderate traffic. Teams run it at enterprise scale behind Docker and Kubernetes with proper caching in place. For very high-traffic public apps — thousands of concurrent users — or apps that need fine-grained component-level updates without full script re-runs, Dash or a React plus FastAPI stack may be more appropriate. The limiting factor is not Streamlit's code quality; it is the full-script re-run model, which does not fit every use case.

How do I add authentication to a Streamlit app?

For simple internal tools, the streamlit-authenticator library provides username/password flows with hashed credentials. For apps deployed on Streamlit Community Cloud, you can restrict access to specific GitHub accounts or use OAuth2 with Google or GitHub. For enterprise SSO (SAML, OIDC, Active Directory), the standard approach is to deploy Streamlit behind a reverse proxy like Nginx or Caddy with an authentication layer such as oauth2-proxy or Cloudflare Access. Recent Streamlit releases (1.42 and later) include st.login() for OpenID Connect providers; older versions have no built-in authentication mechanism. Either way, the app should not be exposed to the internet without an authentication layer.

Why does my Streamlit app lose all its data when I refresh the page?

Refreshing the browser starts a new session, which clears st.session_state entirely. Session state lives in server memory and is scoped to a single browser session — it was never designed to survive a refresh. For data that must persist across page refreshes, browser sessions, or server restarts, write it to an external store: PostgreSQL for structured data, Redis for short-lived key-value state, or S3 for large result files. Load it back at the start of each session.

How can I make my Streamlit app look more professional?

Start with st.set_page_config(layout='wide') to use the full browser width instead of Streamlit's default narrow column. Define a custom theme in .streamlit/config.toml — primary color, background color, and font. Organize content with st.tabs() to reduce vertical scrolling, st.columns() for side-by-side layouts, and st.expander() to hide secondary information behind a toggle. Use st.metric() for KPI cards instead of plain st.write() for numbers. For icons and logos, st.image() accepts URLs and local file paths. If you need more visual control than Streamlit's built-in components allow, st.components.v1.html() lets you inject raw HTML and CSS.
