Python File Handling — 'w' Mode Truncates on Open
Using 'w' mode truncates files on open(), not write().
20+ years shipping production Python across data and backend systems. Everything here is grounded in real deployments.
- The with statement guarantees file close even on exceptions
- 'r' mode reads; 'w' truncates an existing file on open; 'a' appends
- Use encoding='utf-8' to avoid UnicodeDecodeError on Windows
- Iterating the file object reads one line at a time, so memory stays bounded by the longest line
- For large files, avoid read(): it loads the whole file into RAM
- Biggest trap: opening with 'w' and crashing, because the file is already empty
Think of a file on your computer like a physical notebook locked in a drawer. To use it, you need to unlock the drawer (open the file), do your work — read a page, write a note, add to the end — and then lock it back up (close the file). If you forget to lock the drawer and walk away, things can go missing or get corrupted. Python's file handling is just that workflow, but automated so you never forget to lock the drawer.
File handling in Python is the bridge between your runtime logic and persistent data. Without it, every variable you compute vanishes the second the script ends. Most developers learn open(), read(), and write() by rote, then ship code that silently corrupts files, hangs on large payloads, or crashes when a CSV lacks a header. Getting file I/O right means understanding modes, context managers, path objects, and the real gotchas that separate a prototype from production-ready code.
What File Handling in Python Actually Does
File handling in Python is the set of operations to read from or write to files on disk, mediated by the built-in open() function. The core mechanic is that open() returns a file object, which acts as a stream between your program and the underlying file descriptor. The mode string you pass — 'r', 'w', 'a', 'x', and their binary variants — determines the initial file pointer position, whether the file is truncated, and whether writes are appended or overwritten.
When you open a file in 'w' mode, Python truncates the file to zero length immediately upon opening, before any write() call. The file's existing content is destroyed the moment open() executes, not when you first write data. The file pointer starts at position 0, and each subsequent write continues from where the previous one ended. This behavior is the same on every major operating system: on POSIX platforms it corresponds to the open() system call with the O_WRONLY | O_CREAT | O_TRUNC flags, and Windows gives the same result.
Use 'w' mode when you intend to replace an entire file with new content — for example, writing a fresh configuration file, a log rotation output, or a serialized snapshot. Never use 'w' if you need to preserve existing data or append to a file; that's what 'a' (append) or 'r+' (read-write without truncation) are for. In production systems, accidentally using 'w' instead of 'a' is a common cause of data loss in log aggregators and state dumps.
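The truncate-on-open behavior is easy to confirm for yourself. The following is a minimal sketch, assuming a scratch file named settings.txt (a hypothetical name) inside a temporary directory; it opens the file in 'w' mode and never calls write().

```python
import tempfile
from pathlib import Path

# settings.txt is a hypothetical scratch file used only for this demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "settings.txt"
    target.write_text("important data\n", encoding="utf-8")
    size_before = target.stat().st_size

    # Open in 'w' mode and leave the block without ever calling write().
    with open(target, "w", encoding="utf-8"):
        pass

    size_after = target.stat().st_size

print("bytes before:", size_before, "bytes after:", size_after)
```

The file size drops to zero even though nothing was written, which is exactly why a crash right after open() leaves an empty file behind.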
If the program crashes after open() but before write(), the file is already empty and your data is gone. Truncation happens at open() time, not at write() time.
Opening and Reading Files, and Why the 'with' Statement Exists
Before you can do anything with a file, Python needs a file object: a live connection to that file on disk. You create one with the built-in open() function. The two most important arguments are the file path and the mode: 'r' for read, 'w' for write, 'a' for append, and 'b' tacked on for binary (e.g., 'rb').
Here's the thing most tutorials gloss over: open() acquires an operating-system resource. The OS gives your process a file descriptor, a numbered slot in a limited table. If you open files without closing them, you eventually exhaust that table and get a "Too many open files" error. Worse, unflushed writes may never reach disk.
The with statement solves this by acting as a guaranteed cleanup mechanism. It calls file.close() the instant the indented block exits, whether that exit is normal, via return, or via a crashing exception. You should use with open(...) every single time, with no exceptions. The only reason the two-line open() / close() pattern still exists in docs is historical; treat it as legacy.
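To see the cleanup guarantee in action, here is a small sketch that raises an exception inside a with block and then checks whether the file was closed. The file name notes.txt and the simulated RuntimeError are assumptions made for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "notes.txt"  # hypothetical demo file
    path.write_text("first line\n", encoding="utf-8")

    handle = None
    try:
        with open(path, encoding="utf-8") as f:
            handle = f  # keep a reference so we can inspect it afterwards
            raise RuntimeError("simulated failure inside the block")
    except RuntimeError:
        pass

    # The with statement closed the file while the exception propagated.
    closed_after_error = handle.closed

print("closed after error:", closed_after_error)
```

The reference kept in handle exists only so the demo can inspect the closed attribute; real code rarely needs to hold on to the file object outside the block.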
Reading has three flavours: read() loads the entire file into one string (fine for small files, dangerous for large ones), readline() fetches one line at a time, and readlines() returns a list of all lines. For most real work, iterating the file object directly is the cleanest and most memory-efficient approach: Python streams one line at a time without loading everything.
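The reading styles described above can be compared side by side. This is a sketch on a three-line scratch file (events.log is a hypothetical name); the last block streams the file and only counts lines, so it never holds more than one line in memory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "events.log"  # hypothetical demo file
    path.write_text("alpha\nbeta\ngamma\n", encoding="utf-8")

    with open(path, encoding="utf-8") as f:
        whole = f.read()        # entire file as one string

    with open(path, encoding="utf-8") as f:
        first = f.readline()    # one line, trailing newline included

    with open(path, encoding="utf-8") as f:
        as_list = f.readlines() # every line, all held in a list

    with open(path, encoding="utf-8") as f:
        # Iterating the file object yields one line at a time.
        line_count = sum(1 for _ in f)
```

Note that readline() and readlines() keep the trailing newline on each line, so strip it yourself when you need the bare text.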
Always pass encoding='utf-8' to open(). Without it, Python uses the platform's default encoding, which is UTF-8 on Mac/Linux but often CP1252 on Windows. That mismatch is the cause of countless UnicodeDecodeError bugs that only appear on certain machines. Make UTF-8 your default and move on.
A typical failure: the code opens a file without encoding='utf-8'. The file was actually UTF-8, but the default assumption broke everything.
For files of unknown origin, run chardet.detect() (from the third-party chardet package) on the first bytes and pass the result to open(). Never trust platform defaults.
- Use with open(...) every single time.
- Pass encoding='utf-8' explicitly.
- Avoid read() on large files.
Writing and Appending: Understanding Why 'w' Mode Is a Trap
Writing to a file sounds simple — and it is, once you understand the critical difference between 'w' (write) and 'a' (append) mode, because confusing them is one of the most common ways to destroy your own data.
Opening a file in 'w' mode does two things: it creates the file if it doesn't exist, and it truncates the file to zero bytes if it does exist — immediately, before you write a single character. That means opening an existing file with 'w' and then crashing before writing anything leaves you with an empty file. The original content is gone.
Append mode ('a') is safer for ongoing data: the file is created if absent, but if it exists, the write cursor starts at the very end. Existing content is untouched. This is exactly what you want for log files, audit trails, or any data you're accumulating over time.
For writing structured data — think generating a report or saving configuration — 'w' is correct because you intentionally want a fresh file each run. For recording events as they happen over time, 'a' is correct. Choosing wrong corrupts data silently, which is why understanding the WHY matters more than memorising the letters.
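When a file is regenerated with 'w' on every run, one way to protect the previous version is to write to a temporary file and swap it in only after the write succeeds. The sketch below assumes a helper named atomic_write_text (a hypothetical name, not a standard library function). It uses os.replace() because that call overwrites an existing target on Windows as well as on POSIX systems.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text to a temp file beside the target, then swap it into place."""
    # The temp file lives in the same directory so the swap stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk
        os.replace(tmp_name, target)  # the original is only replaced here
    except BaseException:
        # On any failure the original file is untouched; drop the partial temp file.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    report = Path(tmp_dir) / "report.txt"  # hypothetical demo file
    report.write_text("yesterday's data\n", encoding="utf-8")

    atomic_write_text(report, "today's data\n")

    final_text = report.read_text(encoding="utf-8")
    leftovers = [p.name for p in Path(tmp_dir).iterdir() if p.suffix == ".tmp"]
```

If the process dies before os.replace() runs, the worst case is a stray .tmp file next to an intact original, rather than an empty target.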
write() takes a single string. Unlike print(), it does not add a newline automatically: you must include '\n' yourself. writelines() accepts a list (or any iterable) of strings and also skips automatic newlines, so each string must already end with '\n' if you want line breaks.
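A short sketch makes the newline behavior concrete. The file name out.txt is a hypothetical scratch file; the point is that neither write() nor writelines() inserts line breaks for you.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "out.txt"  # hypothetical demo file

    with open(path, "w", encoding="utf-8") as f:
        f.write("no newline here")
        f.write(" so this continues the same line\n")
        # writelines() simply concatenates: "a", "b" and "c" share one line.
        f.writelines(["a", "b", "c\n"])
        # Adding "\n" to every item gives one line per item.
        f.writelines(f"{item}\n" for item in ["x", "y"])

    lines = path.read_text(encoding="utf-8").splitlines()

print(lines)
```

The generator expression in the last call is the usual way to add the missing newlines without building an intermediate list.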
'w' mode truncates the file the moment open() is called, not when you call write(). If your next line crashes, the file is already empty. For any file you care about, consider writing to a temporary file first and renaming it over the original only after a successful write. This is the atomic write pattern, and it's how professional tools like text editors protect your data.
A production example: the output file was opened in 'w' mode by default. A connection timeout hit between open() and write(). The output file was truncated, and the downstream system ingested an empty file, overwriting the previous day's correct data. Recovery required a database restore.
The fix: write to a .tmp path, then os.rename() to the final name (os.replace() also overwrites an existing target on Windows).
- 'w' truncates on open, not on write.
- 'a' preserves existing content.
Error Handling and File Existence: Writing Code That Doesn't Embarrass You in Production
A file operation in production will eventually fail. The path doesn't exist, the disk is full, the process doesn't have permission, the file is locked by another process. Ignoring this is fine in a throwaway script and a fireable offence in production code.
Python raises specific exceptions for file errors. FileNotFoundError fires when you try to read a file that doesn't exist. PermissionError fires when the OS blocks access. IsADirectoryError fires when you accidentally pass a directory path where a file path was expected. All three are subclasses of OSError, so catching OSError covers all of them — but catching the specific type gives you better error messages.
A common anti-pattern is checking os.path.exists() before opening a file. This looks safe but it's a race condition: between your check and your open(), another process can delete or create that file. The correct pattern is EAFP (Easier to Ask Forgiveness than Permission): just try to open it and handle the exception. Python was designed around this idiom.
For reading a file that might not exist yet (like a config file on first run), a clean pattern is to catch FileNotFoundError and return a sensible default rather than crashing. For writing, you often want the opposite: verify the destination directory exists and create it if not, using pathlib.Path.mkdir(parents=True, exist_ok=True) before writing.
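Both halves of that advice fit in a few lines. The sketch below assumes a JSON config file and two helper names, load_config and save_config, which are hypothetical; the default values are placeholders too.

```python
import json
import tempfile
from pathlib import Path

DEFAULT_CONFIG = {"debug": False}  # hypothetical defaults for a first run


def load_config(path: Path) -> dict:
    """EAFP read: try to open, fall back to defaults if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(DEFAULT_CONFIG)


def save_config(path: Path, config: dict) -> None:
    """Create the destination directory if needed, then write the file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f)


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "app" / "settings" / "config.json"

    first_run = load_config(config_path)       # nothing on disk yet
    save_config(config_path, {"debug": True})  # creates app/settings/ first
    second_run = load_config(config_path)
```

Catching only FileNotFoundError is deliberate here: a PermissionError or a corrupt JSON file still surfaces as an error instead of being silently replaced by defaults.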
LBYL (Look Before You Leap) checks first: if os.path.exists(path): open(path). EAFP (Easier to Ask Forgiveness than Permission) just tries it: try: open(path) except FileNotFoundError. EAFP is the style the Python community prefers because it skips the extra stat call, is free of race conditions, and is more readable. Interviewers love this distinction, and knowing the names and reasons will set you apart.
A production example: the code called os.path.exists() before opening a user-uploaded file. Under high load, a race condition occurred: two requests processed the same upload simultaneously, and one deleted the file after the other's existence check but before its open(). The second request crashed with a 500 error. The user got an opaque error page.
- Checking os.path.exists() before open() is a TOCTOU race condition.
Binary Files and pathlib: Handling Images, PDFs and Modern Path Management
Not all files are text. Images, PDFs, audio, compiled code, and serialised data are binary — they contain bytes that aren't valid UTF-8 text. Open them in text mode and you'll get a UnicodeDecodeError at best, or silently corrupted data at worst. Binary mode ('rb', 'wb') tells Python to skip all encoding/decoding and work with raw bytes.
A very common real-world task is copying or processing binary files — resizing images, attaching files to emails, or storing uploaded files from a web form. The pattern is identical to text mode but you read bytes objects instead of strings.
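Here is a minimal sketch of that pattern: a chunked binary copy. The helper name copy_binary, the 64 KiB chunk size, and the file names are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your workload


def copy_binary(source: Path, destination: Path) -> int:
    """Copy a file in fixed-size chunks and return the number of bytes copied."""
    copied = 0
    with open(source, "rb") as src, open(destination, "wb") as dst:
        # read() returns an empty bytes object at end of file, which ends the loop.
        while chunk := src.read(CHUNK_SIZE):
            dst.write(chunk)
            copied += len(chunk)
    return copied


with tempfile.TemporaryDirectory() as tmp_dir:
    original = Path(tmp_dir) / "image.bin"       # hypothetical binary file
    duplicate = Path(tmp_dir) / "image_copy.bin"
    payload = bytes(range(256)) * 1000           # every byte value, repeated
    original.write_bytes(payload)

    bytes_copied = copy_binary(original, duplicate)
    identical = duplicate.read_bytes() == payload
```

For a plain copy with no processing in between, shutil.copyfile() from the standard library does the same job; the manual loop earns its place when you need to hash, scan, or transform each chunk.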
For path manipulation, the modern way in Python 3.4+ is pathlib.Path. Forget string concatenation with os.path.join(): pathlib lets you build paths with the / operator, check existence with .exists(), get the file extension with .suffix, list directory contents with .iterdir(), and open files directly via path.open(). It's more readable, cross-platform by default, and object-oriented in a way that makes your intent clear.
When you're processing a directory full of files, a common automation task, pathlib with a generator expression is significantly cleaner than os.listdir() combined with string filtering. The pattern Path('data').glob('*.csv') gives you an iterator of all CSV files in the directory, ready to open.
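The pathlib features mentioned above can be tried together in one small sketch. The directory and file names are hypothetical and are created inside a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir) / "data"  # paths are built with the / operator
    data_dir.mkdir()
    for name in ("sales.csv", "users.csv", "notes.txt"):
        (data_dir / name).write_text("placeholder\n", encoding="utf-8")

    # glob() yields only the matching files; sorted() makes the order stable.
    csv_names = sorted(p.name for p in data_dir.glob("*.csv"))

    # iterdir() yields every entry; .suffix gives the extension.
    suffixes = sorted({p.suffix for p in data_dir.iterdir()})

    # Path objects can open themselves.
    with (data_dir / "notes.txt").open(encoding="utf-8") as f:
        first_line = f.readline()
```

glob() and iterdir() return entries in an arbitrary order, so sort the results whenever the processing order matters.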
If you're still writing os.path.join(base_dir, 'subfolder', filename), switch to pathlib today. Path(base_dir) / 'subfolder' / filename does exactly the same thing and is immediately readable. pathlib objects also work directly with open(), shutil, and most standard library functions, so there's no conversion overhead. It's a straight upgrade.
A production example: the code used os.path.join with hardcoded backslashes for a Windows path. After the move to Linux Docker containers, the path broke because os.path.join doesn't handle mixed separators. Building the path from its components with pathlib.Path fixed it, and the code now runs identically on both platforms.
Use pathlib for any path that might cross operating system boundaries. It's not just cleaner; it's portable.
- Open binary files with 'rb'/'wb'; text mode corrupts them.
- pathlib is the modern, cross-platform way to handle paths.
- Use .glob() and .iterdir() for bulk file operations.
Working with CSV Files: The Most Common Production File Format
CSV files are everywhere. Exports from databases, spreadsheets, logs, API responses — CSV is the lingua franca of data exchange. But CSV handling has traps: quoting, encoding, newlines inside fields, and missing headers.
Python's csv module handles most of this correctly if you use it right. The common beginner mistake is reading a CSV file manually with for line in file: and splitting on commas — this breaks the moment a field contains a comma or a quoted string. Always use csv.reader or csv.DictReader.
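The difference is easy to demonstrate on a single row. This sketch uses an in-memory sample (the names and values are made up) with a quoted field that contains a comma.

```python
import csv
import io

# Made-up sample data: the name field is quoted because it contains a comma.
raw = 'name,city\n"Doe, Jane","Paris"\n'

# Naive approach: split the data row on commas.
naive_fields = raw.splitlines()[1].split(",")

# csv.DictReader understands the quoting rules.
rows = list(csv.DictReader(io.StringIO(raw)))

print(len(naive_fields), "fields from split;", len(rows[0]), "fields from DictReader")
```

The manual split cuts the quoted name in two and leaves the quote characters in place, while DictReader returns two clean fields keyed by the header.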
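For production, keep quoting=csv.QUOTE_MINIMAL (the default) for writing. On the reading side, the default returns every field as a string; quoting=csv.QUOTE_NONNUMERIC instead converts every unquoted field to float, so use it only when the file quotes all of its non-numeric fields. Use newline='' when opening the file. Otherwise newlines embedded in quoted fields can be misread, and on Windows the writer's \r\n line endings pick up an extra \r.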
Encoding is the biggest silent killer. A CSV file from a French office might be in cp1252 or latin-1. Always pass encoding explicitly, and if you don't know the origin, use encoding='utf-8-sig' to handle the BOM that Excel loves to add.
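A round trip shows the newline and BOM handling working together. This is a sketch with made-up data: the file is written with encoding='utf-8-sig' to imitate an export that starts with a BOM, and one field deliberately contains a line break.

```python
import csv
import tempfile
from pathlib import Path

# Made-up row: an accented name and a field with an embedded newline.
rows = [{"name": "Zoé", "note": "line one\nline two"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "export.csv"  # hypothetical export file

    # newline='' lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "note"])
        writer.writeheader()
        writer.writerows(rows)

    # utf-8-sig wrote the three BOM bytes at the start of the file.
    starts_with_bom = path.read_bytes().startswith(b"\xef\xbb\xbf")

    # Reading with utf-8-sig strips the BOM, so the first header stays clean.
    with open(path, encoding="utf-8-sig", newline="") as f:
        round_trip = list(csv.DictReader(f))
```

utf-8-sig also reads UTF-8 files that have no BOM, which is what makes it a convenient default for Excel exports; it does not decode cp1252 or latin-1 content.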
Always open CSV files with newline=''. Without it, the csv module's internal newline handling conflicts with the file object's default newline translation on Windows, producing an extra \r before each \r\n on write or misreading newlines inside quoted fields. This bug is extremely subtle because it only shows up on Windows or when the file contains \r\n line endings.
A production example: a CSV exported on Windows arrived with a BOM. The pipeline assumed plain UTF-8 and failed on the accented characters. encoding='utf-8-sig' fixed it: it strips the BOM and decodes the rest as UTF-8. Note that utf-8-sig does not decode cp1252; a file that really is cp1252 needs encoding='cp1252'. Always probe the encoding, or use utf-8-sig for files that may come from Windows.
Use chardet for detection or default to utf-8-sig for Windows-origin files.
- Use csv.DictReader; never split manually.
- Always open with newline=''.
- Use utf-8-sig for Windows exports.
Closing Files Without `with`? You're Leaking Resources
Manually calling file.close() is the single biggest source of production file bugs I see. If an exception occurs before the close call, the file handle stays open. This silently locks resources, especially on Windows, where open files prevent renames or deletes. The with statement guarantees closure even if the block raises. It calls __exit__, which flushes buffers and releases the OS handle. Never open a file without with. It's not style; it's correctness. If you must manage the lifecycle manually (rare), wrap the operations in try/finally and close in the finally block. But with is idiomatic. Use it.
If you see file.close() outside a with block, that's a code smell. Refactor it.
File Existence Checks Are for Amateurs: Use Exception Handling
New devs check if a file exists before reading it. That's a TOCTOU bug: Time of Check to Time of Use. Between os.path.exists() and open(), the file can be deleted. Instead, open the file and catch FileNotFoundError (or FileExistsError for exclusive creation). This is atomic and safe. Also, never use os.path for paths. Python 3.4+ gives us pathlib, which is cleaner and cross-platform. Build paths with Path objects, check with path.exists() if you must, but prefer try/except. Production code is resilient against race conditions. Be resilient.
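Exclusive creation is the write-side counterpart of this advice. The sketch below assumes a helper named claim and a lock-style file called job.lock (both hypothetical); it relies on the built-in 'x' mode, which creates the file and fails if it already exists, so there is no separate existence check to race against.

```python
import tempfile
from pathlib import Path


def claim(path: Path) -> bool:
    """Try to create the file exclusively; report whether this call created it."""
    try:
        # 'x' mode raises FileExistsError if the file is already there.
        with open(path, "x", encoding="utf-8") as f:
            f.write("claimed\n")
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    lock_path = Path(tmp_dir) / "job.lock"  # hypothetical marker file

    first = claim(lock_path)   # creates the file
    second = claim(lock_path)  # the file exists now, so creation is refused

print("first claim:", first, "second claim:", second)
```

Because the check and the creation happen in one open() call, two competing callers cannot both succeed, which an os.path.exists() check followed by open(path, 'w') cannot promise.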
Silent Log Loss: The Production File That Vanished
The assumption: the log file was opened in 'a' mode, so log lines should accumulate. The rollback should only affect new code, not historical data.
The root cause: the mode had been changed to 'w' to regenerate a test file. The change was deployed, the file was truncated on the first open() call after deploy, and the rollback didn't recover the old content because 'w' destroyed it before any write completed.
The fix: a logging module handler that explicitly uses 'a', with no chance of mode override via config, plus a lint rule banning 'w' mode in audit paths.
- 'w' mode truncates the file at open(), not at write().
- If your application has any local file output that must survive across runs, use 'a' mode or date-stamped filenames.
- Better yet, centralize logging to a service; never trust file mode conventions.
- Too many open files: count open descriptors with lsof -p <pid> | wc -l. Check for missing with statements; every open() without with is a leak. Use ulimit -n to see the system limit. Add a monitoring alert on file descriptor usage.
- UnicodeDecodeError: detect the encoding with chardet.detect(open('file','rb').read(10000)). If you can't detect it, fall back to binary mode 'rb' and decode manually with error handling. Never assume UTF-8; always specify encoding in production.
- open() raises FileNotFoundError: print repr(path) to verify the path you are actually opening.
- File unexpectedly empty: you may have used 'w' mode by accident, so log the mode string at open time. Use file.tell() before writing to confirm the cursor position. For safety, always write to a .tmp path and then os.rename() to the target; rename is atomic on most filesystems.
- Quick checks: lsof -p $(pgrep -f your_script.py) | grep -c 'REG' counts open regular files, and ulimit -n shows the limit.
- Wrap all open() calls in a with statement. For long-running apps, use contextlib.closing() for resources that don't support with.
Key takeaways
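- 'w' mode truncates the file the moment open() is called, before any write() happens.
- Avoid read() on large files; iterate the file object instead.
- pathlib works with open(), glob(), mkdir(), and the rest of the standard library.
Common mistakes to avoid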
Opening a file without 'with' and forgetting to call close()
Fix: use with open(...) as f: with no exceptions. If you're in a class and must store the file object, implement __enter__ and __exit__ properly, or use contextlib.closing().
Using 'w' mode on a file you meant to append to
Fix: ask whether the existing content must survive. If yes, use 'a'. If no (you're regenerating the file intentionally), use 'w'. Never default to 'w' without thinking about it.
Reading an entire large file into memory with read()
Fix: iterate the file object (for line in file_object:) for text files, or use file.read(chunk_size) in a loop for binary files. This keeps memory usage flat regardless of file size.
Not specifying encoding and getting UnicodeDecodeError on a different platform
Fix: always pass encoding='utf-8' explicitly. For files from external sources, detect the encoding with chardet or use encoding='utf-8-sig' to handle a BOM.
Using `os.path.exists()` before opening a file (TOCTOU race)
The file can disappear between the check and open(). This is especially common in concurrent applications. Fix: replace if os.path.exists(path): open(path) with try: open(path) except FileNotFoundError: handle(), EAFP style.
Interview Questions on This Topic
What is the difference between opening a file in 'w' and 'a' mode, and what happens to an existing file's content the moment you call open() with 'w'?
'w' mode truncates the file to zero bytes immediately when open() is called, before any write. 'a' mode preserves existing content and positions the write cursor at the end. The key distinction is that truncation happens at open time, not at write time. So if you open in 'w' and then crash, the file is already empty.