Intermediate 3 min · July 05, 2026

FastAPI Streaming & File Responses: Stop Loading Everything Into Memory

Q: How do I stream a large CSV file from a database query in FastAPI without loading it all into memory?

Use StreamingResponse with an async generator that yields each CSV row as a string. For example, iterate over a database cursor (using async SQLAlchemy or databases library) and yield `f"{row.id},{row.email}\n"`. Set media_type to 'text/csv' and add Content-Disposition header for download.

Q: What's the difference between FileResponse and StreamingResponse in FastAPI?

FileResponse serves a file that already exists on disk. Starlette reads it in small chunks (or hands the path to the ASGI server when the server supports that), sets Content-Length for you, and in recent Starlette versions handles Range headers. StreamingResponse sends data from a generator chunk by chunk, which suits dynamically generated content. Use FileResponse for static files, StreamingResponse for live data.

Q: How do I set Content-Length header in a StreamingResponse?

Pass it in the headers parameter: `StreamingResponse(generator(), headers={'Content-Length': str(total_size)})`. You must know the exact size in bytes in advance (e.g., a file size, or a row count multiplied by a fixed row width); a row count alone is not enough. The value has to match the bytes actually sent. Without it, clients see 'chunked' encoding and can't show progress.

Q: Can I use StreamingResponse to serve a video file with seeking support?

No, StreamingResponse does not handle Range headers. For video seeking, use FileResponse, which in recent Starlette versions parses Range headers and returns 206 Partial Content. If you must use StreamingResponse, you'd need to manually parse the Range header and seek in your generator, which is complex and error-prone.

FastAPI streaming responses and file responses explained with production patterns.

Naren Founder & Principal Engineer

20+ years shipping production Python across data and backend systems. Lessons pulled from things that broke in production.

Last updated: July 05, 2026

Before you start (⏱ 25 min)

✓ Basic FastAPI app setup
✓ Understanding of async/await in Python
✓ Familiarity with HTTP responses

⚡Quick Answer

Use StreamingResponse when you need to stream dynamically generated or large data chunk by chunk. Use FileResponse to serve files that already exist on disk without loading the whole file into Python memory. Both prevent memory exhaustion under load.

Definition

What is FastAPI Streaming Responses and File Responses?

FastAPI StreamingResponse sends data in chunks as it becomes available, without loading the entire payload into memory. FileResponse serves a file from disk in small chunks, so memory stays flat regardless of file size; whether the transfer is truly zero-copy (sendfile) depends on the ASGI server.

Plain-English First

Imagine you're filling a swimming pool with a bucket. Normal Response fills the bucket completely, walks to the pool, dumps it, and repeats. StreamingResponse is a hose — water flows continuously as you turn the tap. FileResponse is a pipe connected directly to the reservoir — the water never touches your bucket at all.

Most FastAPI tutorials show you returning a list of dictionaries or a Pydantic model. That works fine for a JSON API returning 100 users. But the moment you try to return a 2GB CSV export or stream a live video feed, your server falls over. Memory spikes, workers crash, and you're debugging at 2 AM why your container got OOM-killed. The problem isn't FastAPI — it's that you're buffering the entire response in memory before sending a single byte. StreamingResponse and FileResponse are the tools that fix this. By the end of this article, you'll know exactly when to use each, how to avoid the common pitfalls that burn production systems, and how to serve large data without breaking a sweat.

Why You Shouldn't Return a List When You Can Stream

The default FastAPI response pattern (return a list or dict, let it serialize to JSON) works by building the entire response body in memory before sending. For small payloads, this is fine. For large ones, it's a disaster. Memory usage grows linearly with concurrency: at 100 concurrent requests for a 500MB CSV, you're looking at 50GB of RAM. StreamingResponse lets you send data as it's produced. The client sees the first bytes almost instantly, and your server memory stays flat regardless of response size. The trade-off is that you lose automatic Content-Length headers and error handling becomes trickier: if the stream fails mid-way, the status and headers are already sent, so the client gets a truncated response. But for large datasets, video, or real-time data, there's no alternative.

csv_streaming.py (Python)

# io.thecodeforge — Python tutorial

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_csv_rows():
    # Simulate a database cursor yielding rows
    for i in range(1000000):
        yield f"{i},user_{i}@example.com,active\n"

@app.get("/export/users")
async def export_users():
    # StreamingResponse takes an async generator or iterable
    # Content-Type is set manually; no Content-Length unless you know it
    return StreamingResponse(
        generate_csv_rows(),
        media_type="text/csv",
        headers={"Content-Disposition": "attachment; filename=users.csv"}
    )

Output

HTTP/1.1 200 OK
Content-Type: text/csv; charset=utf-8
Content-Disposition: attachment; filename=users.csv
Transfer-Encoding: chunked

0,user_0@example.com,active
1,user_1@example.com,active
... (streaming)
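
The f-string in the example above breaks as soon as a field contains a comma, a quote, or a newline. A minimal sketch of the same idea using the standard csv module for quoting follows; the helper name `csv_lines` and the row shape are assumptions for illustration, not part of the original example.

```python
import csv
import io
from typing import AsyncIterator, Iterable, Sequence


async def csv_lines(rows: Iterable[Sequence[object]]) -> AsyncIterator[str]:
    """Yield one properly quoted CSV line per input row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
        yield buffer.getvalue()
        # Reset the buffer so memory use stays at roughly one row.
        buffer.seek(0)
        buffer.truncate(0)
```

You would pass `csv_lines(rows)` to StreamingResponse exactly like `generate_csv_rows()` above. Note that csv.writer ends lines with `\r\n` by default.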

Production Trap: Missing Content-Length

Without Content-Length, clients can't show download progress and browsers report an unknown size. If you can compute the exact byte size up front (for example, a row count multiplied by a fixed row width), set it in headers. The value must match the bytes you actually send, otherwise the server or the client will treat the response as broken.
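
Content-Length is a byte count, not a row or character count. The sketch below shows two ways you might compute it, assuming either that the lines can be generated twice (once to measure, once to send) or that every row has a fixed encoded width. Both function names are hypothetical.

```python
from typing import Iterable


def exact_content_length(lines: Iterable[str], encoding: str = "utf-8") -> int:
    """Sum the encoded size of every line that will be streamed."""
    return sum(len(line.encode(encoding)) for line in lines)


def fixed_width_length(row_count: int, row_bytes: int, header_bytes: int = 0) -> int:
    """Size of a body made of an optional header plus equally sized rows."""
    return header_bytes + row_count * row_bytes
```

Measuring by iterating costs a second pass over the data, so it only pays off when the source is cheap to replay.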

FileResponse: The OS Does the Heavy Lifting

When you need to serve a file that already exists on disk, FileResponse is your friend. Starlette stats the file, sets Content-Length, Last-Modified, and ETag from the result, and then sends the contents in small fixed-size chunks read in a worker thread, so memory stays flat no matter how large the file is. Whether the transfer is truly zero-copy (sendfile) depends on the ASGI server; don't assume it. For large files (videos, ISOs, logs), this is still far cheaper than reading the file into a bytes object and returning it. Recent Starlette versions also handle range requests for partial content, enabling pause/resume downloads and video seeking. The catch: it only works with actual files on disk, not dynamically generated content. And on a slow filesystem (NFS, network mounts) every chunk read ties up a worker thread, so throughput under load still depends on your storage.

file_response_example.py (Python)

# io.thecodeforge — Python tutorial

from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse
import os

app = FastAPI()

@app.get("/download/{filename}")
async def download_file(filename: str):
    file_path = f"/data/uploads/{filename}"
    if not os.path.isfile(file_path):
        # Returning ({"error": ...}, 404) is a Flask idiom; FastAPI would send it as a 200
        raise HTTPException(status_code=404, detail="File not found")
    # FileResponse sets Content-Length and Last-Modified from the file's stat result
    return FileResponse(
        path=file_path,
        filename=filename,  # Name used in the Content-Disposition header
        media_type="application/octet-stream"  # Generic binary type, so browsers download it
    )

Output

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="report.pdf"
Content-Length: 2048576
Accept-Ranges: bytes

(binary data, sent in chunks)

Senior Shortcut: Range Requests for Free

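In recent Starlette versions, FileResponse handles HTTP Range headers for you. This means video players can seek, download managers can resume, and you don't write a single line of range-parsing code. Test it with curl -i -H "Range: bytes=0-1023" http://localhost:8000/download/video.mp4 and look for a 206 Partial Content status. A 200 with the full body means your installed version ignores the header.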
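
To make clear what you get for free, here is a rough sketch of the parsing you would have to write yourself if you served ranges from a StreamingResponse. It is an illustration under simplifying assumptions (single range only, no If-Range handling), not Starlette's implementation, and `parse_single_range` is a hypothetical name.

```python
from typing import Optional, Tuple


def parse_single_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) byte offsets, or None if the range can't be served."""
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # other units and multi-range requests are out of scope here
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form "bytes=-N": the last N bytes of the file
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = min(int(last), size - 1) if last else size - 1
    except ValueError:
        return None
    if size == 0 or start > end or start >= size:
        return None
    return start, end
```

A real handler would also need to return 206 with a Content-Range header, or 416 when the range can't be satisfied, which is exactly the bookkeeping FileResponse saves you.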

Streaming JSON: Don't Wait for the Whole Array

Returning a large JSON array as a single Response forces the client to wait for the entire serialization to complete. For paginated APIs, this is fine. But for real-time dashboards or log streams, you want to send JSON objects as they become available. The trick is to use StreamingResponse with a generator that yields JSON-encoded objects separated by newlines (JSON Lines format) or as a continuous array. This is how Twitter's streaming API works. The downside: a standard JSON parser expects one complete document. With JSON Lines the client splits the stream on newlines and parses each line on its own (json.loads per line in Python, JSON.parse per line in JavaScript). With a continuous array the client needs an incremental parser such as ijson.

json_streaming.py (Python)

# io.thecodeforge — Python tutorial

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import json
import asyncio

app = FastAPI()

async def event_stream():
    # Simulate a stream of events from a queue
    for i in range(10):
        event = {"id": i, "message": f"Event {i}"}
        yield json.dumps(event) + "\n"
        await asyncio.sleep(0.5)

@app.get("/events")
async def stream_events():
    return StreamingResponse(
        event_stream(),
        media_type="application/x-ndjson"  # Newline-delimited JSON
    )

Output

HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked

{"id": 0, "message": "Event 0"}
{"id": 1, "message": "Event 1"}
... (one per 0.5s)

Interview Gold: JSON Lines vs Regular JSON

JSON Lines (NDJSON) is streamable and append-friendly: every line is a complete document the client can parse on arrival. A regular JSON array is only valid once the closing bracket arrives, so a standard parser can't use any of it until the stream ends. For high-throughput APIs, prefer NDJSON or Server-Sent Events over a single JSON array.
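
On the client side, network chunks rarely line up with line boundaries, so you need to buffer until a newline shows up. A minimal sketch of that, assuming the chunks come from whatever HTTP client you use (`iter_ndjson` is a hypothetical helper name):

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Parse newline-delimited JSON from byte chunks with arbitrary boundaries."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently in the buffer
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Handle a final line that was not newline-terminated
    if buffer.strip():
        yield json.loads(buffer)
```

Each object is yielded as soon as its line is complete, so the consumer can start working long before the stream ends.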

When Streaming Breaks: The Gotchas You'll Hit

StreamingResponse isn't magic. It has sharp edges. First, if your generator raises an exception mid-stream, the status line and headers have already been sent, so the client gets a truncated 200 response with no error indication. Always wrap generator logic in try/except, log the error, and decide how you will signal failure in the body. Second, reverse proxies like Nginx buffer upstream responses by default, which delays delivery to the client and negates the latency benefit of streaming (your app's memory stays flat, but the proxy holds the data instead). Set the X-Accel-Buffering: no header for Nginx, or disable buffering in your reverse proxy. Third, middleware that wraps or inspects the response body can interfere with streaming; test compression and any custom middleware against a streaming endpoint before relying on them. Finally, async generators must not perform blocking I/O: use async libraries or run blocking calls in a thread pool.

streaming_with_error_handling.py (Python)

# io.thecodeforge — Python tutorial

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import logging

logger = logging.getLogger(__name__)

app = FastAPI()

async def safe_generator():
    i = -1  # Last row attempted; stays -1 if the failure happens before the loop
    try:
        for i in range(100):
            if i == 50:
                raise ValueError("Simulated failure at row 50")
            yield f"row {i}\n"
    except Exception as e:
        # logger.exception records the traceback along with the message
        logger.exception("Stream failed at row %d", i)
        # Headers are already sent, so the body is the only place left to signal failure
        yield f"\nERROR: {e}\n"

@app.get("/export/safe")
async def export_safe():
    return StreamingResponse(safe_generator(), media_type="text/plain")

Output

HTTP/1.1 200 OK
...

row 0
row 1
...
row 49

ERROR: Simulated failure at row 50

(Note: HTTP status is 200 even though stream failed. Client must parse error from body.)

Never Do This: Blocking I/O in Async Generator

If your generator calls time.sleep() or requests.get(), the entire event loop blocks. Use asyncio.sleep() and httpx.AsyncClient. For blocking calls you can't replace, use asyncio.to_thread(). For CPU-bound work, use a process pool, because threads still contend for the GIL. Otherwise, your streaming endpoint becomes a single-threaded bottleneck.
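
As a concrete case, here is a sketch of an async generator that reads a file with ordinary blocking calls but pushes each call onto a worker thread with asyncio.to_thread() (Python 3.9+). The function name and default chunk size are assumptions for illustration.

```python
import asyncio
from typing import AsyncIterator


async def read_in_chunks(path: str, chunk_size: int = 64 * 1024) -> AsyncIterator[bytes]:
    """Yield a file's contents in chunks without blocking the event loop."""
    # open(), read(), and close() are blocking, so each one runs in a worker thread
    f = await asyncio.to_thread(open, path, "rb")
    try:
        while True:
            chunk = await asyncio.to_thread(f.read, chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        await asyncio.to_thread(f.close)
```

For a plain file on disk FileResponse already does the equivalent for you; the pattern is mostly useful when the blocking call is something else, such as a legacy client library.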

FileResponse vs StreamingResponse: When to Use Which

FileResponse is for static files on disk. It keeps memory flat, handles range requests (in recent Starlette versions), and sets Content-Length and Content-Type automatically. Use it for serving uploaded files, static assets, or any file that exists before the request. StreamingResponse is for dynamically generated content: database query results, real-time data, or data that doesn't exist as a file. It gives you full control over the output format and timing. The rule of thumb: if the data is already on disk, use FileResponse. If you're generating it on the fly, use StreamingResponse. Never read a file into memory just to return it as a Response. That's the classic rookie mistake that costs you memory.

Production Patterns: Rate Limiting and Backpressure

Streaming responses can overwhelm clients or downstream services if you produce data faster than they can consume it. In a production system, you need backpressure. When your generator feeds StreamingResponse directly, you get some of it for free: the next chunk is only pulled after the previous one has been handed to the server, and servers such as Uvicorn pause writes when the socket buffer fills up. You have to build it yourself when production is decoupled from the response, for example a background task pushing into a queue. One pattern is to use an asyncio.Queue with a max size: if the queue is full, the producer waits. Another is to yield chunks at a controlled rate using asyncio.sleep(). I've seen a logging pipeline crash a Kafka cluster because the producer streamed events faster than the broker could ingest them. The fix was to add a sliding window that limited the number of outstanding unacknowledged events.

backpressure_example.py (Python)

# io.thecodeforge — Python tutorial

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import asyncio

app = FastAPI()

async def rate_limited_stream():
    queue = asyncio.Queue(maxsize=10)

    async def producer():
        for i in range(1000):
            # put() suspends the producer while the queue is full
            await queue.put(f"event {i}\n")
        await queue.put(None)  # Sentinel: no more items

    # Keep a reference so the task is not garbage-collected mid-stream
    task = asyncio.create_task(producer())
    try:
        while True:
            item = await queue.get()
            if item is None:
                break
            yield item
    finally:
        # Runs on normal completion and on client disconnect,
        # so the producer never stays parked on a full queue
        task.cancel()

@app.get("/stream/backpressure")
async def stream_with_backpressure():
    return StreamingResponse(rate_limited_stream(), media_type="text/plain")

Output

Streams events with backpressure. If client is slow, producer blocks when queue is full.
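
The other pattern mentioned above, yielding at a controlled rate with asyncio.sleep(), can be sketched as a wrapper around any async byte source. This is a deliberately simple version that sleeps after every chunk; the name `throttle` and its parameter are assumptions, and a production limiter would more likely use a token bucket.

```python
import asyncio
from typing import AsyncIterator


async def throttle(
    source: AsyncIterator[bytes], max_bytes_per_sec: float
) -> AsyncIterator[bytes]:
    """Re-yield chunks from source, pausing so the average rate stays under the cap."""
    async for chunk in source:
        yield chunk
        # Sleep in proportion to the chunk size that was just sent
        await asyncio.sleep(len(chunk) / max_bytes_per_sec)
```

You would wrap your generator before handing it to StreamingResponse, for example `throttle(my_generator(), 1_000_000)` for roughly 1 MB/s per connection.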

Production Trap: Unbounded Queues

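Never use an unbounded asyncio.Queue in a streaming producer. If the client is slow or disconnects, the producer keeps filling memory until OOM. Always set a maxsize and use await queue.put(), which suspends the producer while the queue is full. Only put_nowait() raises asyncio.QueueFull, so handle that exception if you use it. Cancel the producer task when the stream ends so it doesn't stay parked forever.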
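
You can check the bounded-queue behaviour without a web server. The sketch below runs a fast producer against a slower consumer and reports the deepest the queue ever got; the function name and defaults are made up for the demonstration.

```python
import asyncio


async def peak_queue_depth(maxsize: int = 3, items: int = 20) -> int:
    """Run a fast producer against a slower consumer; return the peak queue size."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    peak = 0

    async def producer() -> None:
        for i in range(items):
            await queue.put(i)  # suspends here while the queue is full
        await queue.put(None)  # sentinel: no more items

    task = asyncio.create_task(producer())
    try:
        while True:
            peak = max(peak, queue.qsize())
            item = await queue.get()
            if item is None:
                break
            await asyncio.sleep(0)  # give the producer a chance to run ahead
    finally:
        task.cancel()  # no-op if the producer already finished
    return peak
```

However many items the producer wants to send, the value returned by `asyncio.run(peak_queue_depth())` never exceeds `maxsize`, which is the whole point of the bound.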

When Not to Stream: The Case for Simple Responses

Streaming is not free. It adds complexity: error handling, missing Content-Length, proxy buffering issues, and client-side parsing requirements. If your response fits comfortably in memory (say, under 10MB), just return a normal Response. The overhead of setting up a generator and managing the stream isn't worth it. Also, if your API consumers expect a complete JSON document (most do), streaming JSON requires them to use a streaming parser. For internal APIs where both sides are under your control, streaming is fine. For public APIs, stick with pagination or standard responses unless you have a clear need for low latency or huge payloads.

Senior Shortcut: The 10MB Rule

If your response is under 10MB, don't stream. The memory cost is negligible, and you avoid all the streaming headaches. Profile your endpoints to know your typical payload sizes.

Production incident · Post-mortem · Severity: high

The 4GB Container That Kept Dying

Symptom

A CSV export endpoint crashed the server every time a user requested data spanning 6 months. The container had 4GB RAM and would OOM-kill after 30 seconds.

Assumption

The team assumed the database query was too slow. They added indexes and pagination, but the crash persisted.

Root cause

The endpoint was building the entire CSV in a list of strings in memory, then joining them into one giant string, and finally returning it as a Response. For a 3GB CSV, that meant 3GB for the list + 3GB for the joined string = 6GB peak memory. The container only had 4GB.

Fix

Replaced the Response with a StreamingResponse that yields CSV rows one at a time. Memory dropped from 6GB to under 50MB. Added a Content-Length header by pre-calculating the size from the query count.

Key lesson

Never build the entire response body in memory.
If you can generate it iteratively, stream it.

Production debug guide

Systematic recovery paths for the failure modes engineers actually hit.

Symptom 01: OOM killer kills container during large file download

Fix

1. Check whether the endpoint uses FileResponse (chunked reads) or reads the whole file into memory.
2. Replace return Response(content=open('file','rb').read()) with return FileResponse(path='file').
3. Monitor memory with docker stats or ps aux.

Symptom 02: Client gets truncated response without error

Fix

1. Check server logs for exceptions in the generator.
2. Add try/except in the generator and yield an error message.
3. Set Content-Length if possible so the client detects truncation.

Symptom 03: Streaming endpoint is slow under load

Fix

1. Check if the generator is doing blocking I/O.
2. Replace time.sleep() with asyncio.sleep().
3. Use asyncio.to_thread() for blocking calls and a process pool for CPU-bound work.
4. Profile with cProfile or py-spy.

FastAPI Streaming Responses and File Responses Triage Cheat Sheet

First-response commands for when things go wrong, copy-paste ready.

Memory grows with response size

Immediate action

Check if using Response instead of StreamingResponse or FileResponse

Commands

`curl -s -o /dev/null -w '%{size_download}' http://localhost:8000/export`

`docker stats <container>`

Fix now

Replace return Response(content=data) with return StreamingResponse(generator()) or return FileResponse(path)

Feature	FileResponse	StreamingResponse
Memory usage	Flat (chunked reads; zero-copy only if the server supports it)	Minimal (chunk buffer)
Range requests	Automatic (recent Starlette versions)	Manual implementation required
Dynamic content	No	Yes
Content-Length	Automatic	Must set manually if known
Error handling	HTTP error codes	Must handle in generator
Proxy compatibility	Works with buffering proxies	May need X-Accel-Buffering: no

Key takeaways

Never build the entire response body in memory: use StreamingResponse for dynamic data, FileResponse for static files.

FileResponse sends a file in chunks and sets Content-Length for you: it's far cheaper than reading the file into memory, and zero-copy only when the server supports it.

Streaming without backpressure or error handling fails in production: wrap generators in try/except and bound your queue sizes.

The 10MB rule: if your response fits in 10MB, don't stream. The complexity isn't worth it.

Interview Questions on This Topic

Q01 (Senior): How does FileResponse handle HTTP Range requests internally, and what happens if the file is modified between the request and the response?

Answer: Starlette's FileResponse stats the file when the response starts, derives Content-Length, Last-Modified, and ETag from that result, and then reads the file in chunks. In recent Starlette versions, a Range request makes it seek to the requested offset and send only the requested bytes with a 206 status. If the file is modified after the response starts, the client may receive a mix of old and new data, or a body whose length no longer matches the declared Content-Length. To prevent this, serve files from immutable storage and rely on ETag/Last-Modified so clients can detect a changed file before resuming.

Q02 (Senior): When would you choose StreamingResponse over FileResponse for serving a ...

Q03 (Senior): What happens to a StreamingResponse if the client disconnects mid-stream...

Q04 (Junior): What is the difference between StreamingResponse and a normal Response i...

Q05 (Senior): You're debugging a production issue where a StreamingResponse endpoint w...

Q06 (Senior): Design a system that serves 10GB video files to thousands of concurrent ...
