Senior 7 min · March 05, 2026

Python File Handling — 'w' Mode Truncates on Open

Using 'w' mode truncates files on open(), not write().

Naren Founder & Principal Engineer

20+ years shipping production Python across data and backend systems. Everything here is grounded in real deployments.

Last updated: May 24, 2026
Quick Answer
  • The with statement guarantees file close even on exceptions
  • 'r' mode reads; 'w' truncates existing file on open; 'a' appends
  • Use encoding='utf-8' to avoid UnicodeDecodeError on Windows
  • Iterating the file object reads one line at a time, so memory use is bounded by the longest line, not the file size
  • For large files, avoid read() — it loads the whole file into RAM
  • Biggest trap: opening with 'w' and crashing — file is already empty
Definition (~90s read)
What is File Handling in Python?

File handling in Python is the mechanism for reading from and writing to files on disk, bridging your program's in-memory data with persistent storage. At its core, Python wraps operating system file descriptors into file objects, giving you methods like .read(), .write(), and .close().

The open() function is your entry point, accepting a file path and a mode string — 'r' for read, 'w' for write (which truncates the file on open), 'a' for append, and 'b' for binary. The 'w' mode is a common trap: it silently empties the file the moment you call open(), before you write a single byte.

This is by design for overwriting files, but it destroys data if you only intended to modify or append. The with statement solves the resource leak problem by ensuring the file is closed automatically, even if an exception occurs — without it, you risk hitting the OS file descriptor limit in production.

For modern Python (3.4+), pathlib.Path offers a cleaner, object-oriented approach to paths and file operations, replacing brittle string concatenation. In production, you'll pair file handling with error handling (try/except for FileNotFoundError, PermissionError) and often use the csv module for structured data, which handles quoting and delimiters that raw string splitting cannot.

The ecosystem alternatives include pandas for large datasets, sqlite3 for relational data, and io.BytesIO for in-memory binary streams — choose file handling only when you need direct, sequential access to flat files.

Plain-English First

Think of a file on your computer like a physical notebook locked in a drawer. To use it, you need to unlock the drawer (open the file), do your work — read a page, write a note, add to the end — and then lock it back up (close the file). If you forget to lock the drawer and walk away, things can go missing or get corrupted. Python's file handling is just that workflow, but automated so you never forget to lock the drawer.

File handling in Python is the bridge between your runtime logic and persistent data. Without it, every variable you compute vanishes the second the script ends. Most developers learn open(), read(), and write() by rote, then ship code that silently corrupts files, hangs on large payloads, or crashes when a CSV lacks a header. Getting file I/O right means understanding modes, context managers, path objects, and the real gotchas that separate a prototype from production-ready code.

What File Handling in Python Actually Does

File handling in Python is the set of operations to read from or write to files on disk, mediated by the built-in open() function. The core mechanic is that open() returns a file object, which acts as a stream between your program and the underlying file descriptor. The mode string you pass — 'r', 'w', 'a', 'x', and their binary variants — determines the initial file pointer position, whether the file is truncated, and whether writes are appended or overwritten.

When you open a file in 'w' mode, Python truncates the file to zero length immediately upon opening, before any write() call. The file's existing content is destroyed the moment open() executes, not when you first write data. The file pointer starts at position 0, so subsequent writes fill the now-empty file from the beginning. The behavior is the same on every major operating system: on POSIX systems 'w' corresponds to the open() system call with the O_WRONLY | O_CREAT | O_TRUNC flags, and Windows truncates on open in the same way.

Use 'w' mode when you intend to replace an entire file with new content — for example, writing a fresh configuration file, a log rotation output, or a serialized snapshot. Never use 'w' if you need to preserve existing data or append to a file; that's what 'a' (append) or 'r+' (read-write without truncation) are for. In production systems, accidentally using 'w' instead of 'a' is a common cause of data loss in log aggregators and state dumps.
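To make the timing concrete, here is a minimal sketch that opens an existing file with 'w' and fails before the first write(). The scratch directory, the file name, and the simulated RuntimeError are all illustrative assumptions, not part of any real system.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch_dir:
    snapshot_path = Path(scratch_dir) / "state_snapshot.txt"  # hypothetical file name
    snapshot_path.write_text("important existing data\n", encoding="utf-8")
    size_before = snapshot_path.stat().st_size

    try:
        with open(snapshot_path, "w", encoding="utf-8") as snapshot_file:
            # Simulate a failure before the first write() call
            raise RuntimeError("simulated crash before any write")
    except RuntimeError:
        pass

    # The file still exists, but open() already emptied it
    size_after = snapshot_path.stat().st_size

print(f"Size before open('w'): {size_before} bytes")
print(f"Size after the crash:  {size_after} bytes")
```

The second size is zero even though write() was never called, which is the whole trap in one run.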

Truncation Happens at open(), Not at write()
If you open a file with 'w' mode and the program crashes before any write(), the file is already empty — your data is gone.
Production Insight
A log shipping daemon opened its output file with 'w' mode on each rotation, but a race condition caused two instances to open the same file simultaneously — the second instance truncated the file that the first had just written, losing 30 seconds of logs.
Symptom: zero-byte output files at rotation boundaries with no error in logs.
Rule: For any file that multiple processes or threads may write to, use 'a' mode or implement file locking; never use 'w' unless you own the file exclusively.
Key Takeaway
1. 'w' mode truncates the file at open() time, not at write() time — the data is gone before your first write.
2. Use 'a' mode for appending, 'r+' for read-write without truncation, and 'x' for exclusive creation.
3. In production, 'w' mode is safe only when you are the sole writer and you intend to replace the entire file.
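As a rough illustration of the two non-truncating modes from the takeaway, the sketch below appends with 'a' and then overwrites in place with 'r+'. The file name and contents are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch_dir:
    notes_path = Path(scratch_dir) / "notes.txt"  # hypothetical file name
    notes_path.write_text("line one\n", encoding="utf-8")

    # 'a' keeps existing content and always writes at the end
    with open(notes_path, "a", encoding="utf-8") as notes_file:
        notes_file.write("line two\n")
    after_append = notes_path.read_text(encoding="utf-8")

    # 'r+' also keeps existing content, but the pointer starts at 0,
    # so a write replaces characters in place instead of inserting them
    with open(notes_path, "r+", encoding="utf-8") as notes_file:
        notes_file.write("LINE")
    after_read_write = notes_path.read_text(encoding="utf-8")

print(repr(after_append))
print(repr(after_read_write))
```

Note that 'r+' overwrites rather than inserts: the first four characters change and everything after them stays put.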
[Figure: Python File Modes: 'w' Truncates on Open. Flow from open to close, highlighting the truncation trap. Steps: open with 'w' mode (truncates file immediately on open); write data (overwrites content from start); close file (auto-closed via the with context manager). Side notes: a missing with leaks the resource and data may be lost; checking existence first is a race condition, so use try-except. Warning: 'w' mode truncates before you write anything; use 'x' or try-except to avoid data loss. Source: thecodeforge.io]

Opening and Reading Files — and Why the 'with' Statement Exists

Before you can do anything with a file, Python needs a file object — a live connection to that file on disk. You create one with the built-in open() function. The two most important arguments are the file path and the mode: 'r' for read, 'w' for write, 'a' for append, and 'b' tacked on for binary (e.g., 'rb').

Here's the thing most tutorials gloss over: open() acquires an operating-system resource. The OS gives your process a file descriptor — a numbered slot in a limited table. If you open files without closing them, you eventually exhaust that table and get a Too many open files error. Worse, unflushed writes may never reach disk.

The with statement solves this by acting as a guaranteed cleanup mechanism. It calls file.close() the instant the indented block exits — whether that exit is normal, via return, or via a crashing exception. You should use with open(...) every single time, with no exceptions. The only reason the two-line open() / close() pattern still exists in docs is historical; treat it as legacy.

Reading has three flavours: read() loads the entire file into one string (fine for small files, dangerous for large ones), readline() fetches one line at a time, and readlines() returns a list of all lines. For most real work, iterating the file object directly is the cleanest and most memory-efficient approach — Python streams one line at a time without loading everything.
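The reading flavours described above can be compared side by side in a short sketch. The three-line sample file is an assumption made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch_dir:
    sample_path = Path(scratch_dir) / "sample.txt"  # hypothetical file name
    sample_path.write_text("alpha\nbeta\ngamma\n", encoding="utf-8")

    with open(sample_path, "r", encoding="utf-8") as sample_file:
        whole_text = sample_file.read()        # one string holding the entire file

    with open(sample_path, "r", encoding="utf-8") as sample_file:
        first_line = sample_file.readline()    # next line, trailing newline included
        second_line = sample_file.readline()   # each call advances by one line

    with open(sample_path, "r", encoding="utf-8") as sample_file:
        all_lines = sample_file.readlines()    # list of every line, all in memory

    with open(sample_path, "r", encoding="utf-8") as sample_file:
        # Streaming: the loop holds one line at a time
        line_lengths = [len(line.rstrip("\n")) for line in sample_file]

print(repr(whole_text), repr(first_line), all_lines, line_lengths)
```

Only the last form scales to files larger than memory; read() and readlines() both hold the full content at once.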

read_server_log.py (Python)
# Scenario: parse a server log file and count how many lines are ERROR level

log_file_path = "server.log"

# Create a sample log file to work with so this script is self-contained
with open(log_file_path, "w", encoding="utf-8") as log_file:
    log_file.write("INFO  2024-06-01 08:00:01 Server started\n")
    log_file.write("INFO  2024-06-01 08:01:15 Request received from 192.168.1.10\n")
    log_file.write("ERROR 2024-06-01 08:01:16 Database connection timeout\n")
    log_file.write("INFO  2024-06-01 08:02:00 Retrying connection\n")
    log_file.write("ERROR 2024-06-01 08:02:05 Max retries exceeded\n")
    log_file.write("INFO  2024-06-01 08:02:06 Falling back to cache\n")

error_count = 0
error_lines = []

# The 'with' block guarantees the file is closed when we're done,
# even if an exception is raised inside the block.
with open(log_file_path, "r", encoding="utf-8") as log_file:
    # Iterating the file object directly reads ONE line at a time —
    # this works correctly even for a 10 GB log file because Python
    # never loads the whole thing into memory at once.
    for line in log_file:
        stripped_line = line.strip()  # remove trailing newline characters
        if stripped_line.startswith("ERROR"):
            error_count += 1
            error_lines.append(stripped_line)

print(f"Total ERROR lines found: {error_count}")
print("\nError details:")
for error in error_lines:
    print(f"  -> {error}")
Output
Total ERROR lines found: 2

Error details:
  -> ERROR 2024-06-01 08:01:16 Database connection timeout
  -> ERROR 2024-06-01 08:02:05 Max retries exceeded
Pro Tip: Always Specify Encoding
Always pass encoding='utf-8' to open(). Without it, Python uses the platform's default encoding — which is UTF-8 on Mac/Linux but often CP1252 on Windows. That mismatch is the cause of countless 'UnicodeDecodeError' bugs that only appear on certain machines. Make UTF-8 your default and move on.
Production Insight
A production batch job reading CSV files on a Windows server crashed daily at 2 PM. The input file was created on a Mac and contained en-dash characters (U+2013). Python's default encoding on that Windows server was CP1252, which misreads multi-byte UTF-8 sequences: some decode to mojibake, and any byte with no CP1252 mapping raises UnicodeDecodeError. The fix was adding encoding='utf-8'. The file was UTF-8 all along, but the default assumption broke everything.
Rule: if you don't control file creation, run chardet.detect() (from the third-party chardet package) on the first bytes and pass the detected encoding to open(). Treat the result as a guess and verify it. Never trust platform defaults.
Key Takeaway
Use with open(...) every single time.
Specify encoding='utf-8' explicitly.
Iterate lines, don't call read() on large files.
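To see why an explicit encoding matters, the sketch below decodes UTF-8 bytes as cp1252 using only in-memory data. The sample strings are invented for the example. It shows the two failure shapes you are likely to meet: silently wrong text, and a UnicodeDecodeError.

```python
# UTF-8 bytes for text containing an en dash (U+2013) and right double quotes (U+201D)
en_dash_bytes = "2019\u20132024".encode("utf-8")
quote_bytes = "she said \u201dok\u201d".encode("utf-8")

# Every byte of the en dash sequence has a cp1252 mapping, so decoding
# "succeeds" and returns wrong characters (mojibake) with no exception.
mojibake = en_dash_bytes.decode("cp1252")

# The quote's UTF-8 sequence contains byte 0x9D, which cp1252 leaves undefined.
try:
    quote_bytes.decode("cp1252")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the encoding the bytes were written in round-trips cleanly
round_trip = en_dash_bytes.decode("utf-8")

print(repr(mojibake), decode_failed, repr(round_trip))
```

The silent case is arguably the worse one, because nothing crashes and the corrupted text flows downstream.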

Writing and Appending — Understanding Why 'w' Mode Is a Trap

Writing to a file sounds simple — and it is, once you understand the critical difference between 'w' (write) and 'a' (append) mode, because confusing them is one of the most common ways to destroy your own data.

Opening a file in 'w' mode does two things: it creates the file if it doesn't exist, and it truncates the file to zero bytes if it does exist — immediately, before you write a single character. That means opening an existing file with 'w' and then crashing before writing anything leaves you with an empty file. The original content is gone.

Append mode ('a') is safer for ongoing data: the file is created if absent, but if it exists, the write cursor starts at the very end. Existing content is untouched. This is exactly what you want for log files, audit trails, or any data you're accumulating over time.

For writing structured data — think generating a report or saving configuration — 'w' is correct because you intentionally want a fresh file each run. For recording events as they happen over time, 'a' is correct. Choosing wrong corrupts data silently, which is why understanding the WHY matters more than memorising the letters.

write() takes a single string. Unlike print(), it does not add a newline automatically, so you must include '\n' yourself. writelines() accepts an iterable of strings and also skips automatic newlines, so each string must already end with '\n' if you want line breaks.

user_activity_logger.py (Python)
import datetime

activity_log_path = "user_activity.log"

def log_activity(username: str, action: str) -> None:
    """Append a timestamped activity record to the log file.

    We use 'a' mode so that every call adds to the existing log
    rather than overwriting it. This function is safe to call
    from multiple places in a long-running application.
    """
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Build the full log entry as a single string with its own newline
    log_entry = f"[{timestamp}] USER={username} ACTION={action}\n"

    with open(activity_log_path, "a", encoding="utf-8") as activity_log:
        activity_log.write(log_entry)  # 'a' mode: cursor is always at end of file

def generate_daily_report(report_date: str, summary_lines: list[str]) -> None:
    """Write a fresh daily report, replacing any previous report for the same name.

    We intentionally use 'w' mode here because the report is always
    regenerated from scratch — old content should not carry over.
    """
    report_path = f"report_{report_date}.txt"
    header = f"=== Daily Report for {report_date} ===\n\n"

    with open(report_path, "w", encoding="utf-8") as report_file:
        report_file.write(header)
        # writelines() writes each string as-is — no automatic newlines added
        # so we ensure each summary line already ends with '\n'
        report_file.writelines(line + "\n" for line in summary_lines)

    print(f"Report saved to {report_path}")

# Simulate three user events arriving over time
log_activity("alice", "LOGIN")
log_activity("bob", "VIEWED_DASHBOARD")
log_activity("alice", "EXPORTED_CSV")

# Read back the log to confirm all three entries were preserved
print("--- Current activity log ---")
with open(activity_log_path, "r", encoding="utf-8") as activity_log:
    print(activity_log.read())

# Generate a report (uses 'w' mode — intentionally fresh each time)
generate_daily_report(
    report_date="2024-06-01",
    summary_lines=[
        "Total logins: 1",
        "Total page views: 1",
        "Total exports: 1"
    ]
)
Output
--- Current activity log ---
[2024-06-01 09:15:01] USER=alice ACTION=LOGIN
[2024-06-01 09:15:02] USER=bob ACTION=VIEWED_DASHBOARD
[2024-06-01 09:15:03] USER=alice ACTION=EXPORTED_CSV
Report saved to report_2024-06-01.txt
Watch Out: 'w' Mode Deletes First, Asks Questions Later
The file truncation in 'w' mode happens the instant open() is called — not when you call write(). If your next line crashes, the file is already empty. For any file you care about, consider writing to a temporary file first and renaming it over the original only after a successful write. This is the atomic write pattern and it's how professional tools like text editors protect your data.
Production Insight
An ETL pipeline that wrote daily CSV exports used 'w' mode by default. A connection timeout hit between open() and write() — the output file was truncated, and the downstream system ingested an empty file, overwriting the previous day's correct data. Recovery required a database restore.
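Rule: use atomic writes for any file that another system consumes. Write to a .tmp path in the same directory, then call os.replace() to move it over the final name. os.replace() overwrites an existing destination on both POSIX and Windows, whereas os.rename() raises an error on Windows when the destination already exists.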
Key Takeaway
'w' truncates on open, not on write.
'a' preserves existing content.
For critical files, use atomic write pattern with rename.
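The atomic write pattern mentioned above can be sketched roughly as follows. The helper name atomic_write_text, the .tmp suffix, and the file names are assumptions for illustration; this is one way to do it, not a canonical implementation.

```python
import os
import tempfile
from pathlib import Path

def atomic_write_text(target_path: Path, content: str) -> None:
    """Hypothetical helper: write to a temp file, then swap it into place."""
    # Create the temp file in the SAME directory as the target so the
    # final rename stays on one filesystem.
    fd, temp_name = tempfile.mkstemp(dir=target_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as temp_file:
            temp_file.write(content)
            temp_file.flush()
            os.fsync(temp_file.fileno())  # push the bytes to disk before renaming
        os.replace(temp_name, target_path)  # replaces the target in one step
    except BaseException:
        # The original file is untouched; remove the partial temp file
        try:
            os.unlink(temp_name)
        except FileNotFoundError:
            pass
        raise

with tempfile.TemporaryDirectory() as scratch_dir:
    export_path = Path(scratch_dir) / "daily_export.csv"  # hypothetical file name
    export_path.write_text("id,total\n1,100\n", encoding="utf-8")

    atomic_write_text(export_path, "id,total\n1,100\n2,250\n")
    final_content = export_path.read_text(encoding="utf-8")
    leftover_temp_files = list(Path(scratch_dir).glob("*.tmp"))

print(repr(final_content), leftover_temp_files)
```

A reader of the target path sees either the old content or the new content, never a half-written or empty file, because the only operation that touches the final name is the rename.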

Error Handling and File Existence — Writing Code That Doesn't Embarrass You in Production

A file operation in production will eventually fail. The path doesn't exist, the disk is full, the process doesn't have permission, the file is locked by another process. Ignoring this is fine in a throwaway script and a fireable offence in production code.

Python raises specific exceptions for file errors. FileNotFoundError fires when you try to read a file that doesn't exist. PermissionError fires when the OS blocks access. IsADirectoryError fires when you accidentally pass a directory path where a file path was expected. All three are subclasses of OSError, so catching OSError covers all of them — but catching the specific type gives you better error messages.
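A small sketch of that hierarchy in practice, assuming a hypothetical helper named describe_open_failure. Specific handlers come first and a final OSError handler catches whatever else the OS reports.

```python
import tempfile
from pathlib import Path

# All three are OSError subclasses, so a broad handler still catches them
hierarchy_ok = all(
    issubclass(exc_type, OSError)
    for exc_type in (FileNotFoundError, PermissionError, IsADirectoryError)
)

def describe_open_failure(path: Path) -> str:
    """Hypothetical helper: map an open() failure to a readable message."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            handle.read()
        return "ok"
    except FileNotFoundError:
        return "missing"
    except IsADirectoryError:
        return "is a directory"
    except PermissionError:
        # Windows typically reports a directory path here instead
        return "permission denied"
    except OSError as other_error:
        return f"other OS error: {other_error}"

with tempfile.TemporaryDirectory() as scratch_dir:
    missing_result = describe_open_failure(Path(scratch_dir) / "nope.txt")
    directory_result = describe_open_failure(Path(scratch_dir))

print(hierarchy_ok, missing_result, directory_result)
```

Order matters: if the OSError handler came first, the specific branches would never run.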

A common anti-pattern is checking os.path.exists() before opening a file. This looks safe but it's a race condition: between your check and your open(), another process can delete or create that file. The correct pattern is EAFP (Easier to Ask Forgiveness than Permission) — just try to open it and handle the exception. Python was designed around this idiom.

For reading a file that might not exist yet (like a config file on first run), a clean pattern is to catch FileNotFoundError and return a sensible default rather than crashing. For writing, you often want the opposite: verify the destination directory exists and create it if not, using pathlib.Path.mkdir(parents=True, exist_ok=True) before writing.

config_manager.py (Python)
import json
from pathlib import Path

DEFAULT_CONFIG = {
    "theme": "dark",
    "language": "en",
    "auto_save_interval_seconds": 30
}

CONFIG_PATH = Path("app_data") / "config.json"

def load_config() -> dict:
    """Load config from disk. Returns defaults if the file doesn't exist yet.

    Uses EAFP style: try the operation, handle the specific failure.
    This avoids the race condition that os.path.exists() creates.
    """
    try:
        with open(CONFIG_PATH, "r", encoding="utf-8") as config_file:
            loaded_config = json.load(config_file)
            print(f"Config loaded from {CONFIG_PATH}")
            return loaded_config

    except FileNotFoundError:
        # This is expected on first run — not an error, just a first boot
        print(f"No config file found at {CONFIG_PATH}. Using defaults.")
        return DEFAULT_CONFIG.copy()

    except json.JSONDecodeError as parse_error:
        # The file exists but is malformed — this IS an error worth reporting
        print(f"WARNING: Config file is corrupted ({parse_error}). Using defaults.")
        return DEFAULT_CONFIG.copy()

    except PermissionError:
        # We can't read the file — fail loudly, don't silently use defaults
        raise RuntimeError(
            f"Cannot read config at {CONFIG_PATH}. Check file permissions."
        )

def save_config(config: dict) -> None:
    """Save config to disk, creating the directory structure if it doesn't exist."""
    # mkdir with parents=True creates 'app_data/' if it's missing
    # exist_ok=True means no error if the directory is already there
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)

    with open(CONFIG_PATH, "w", encoding="utf-8") as config_file:
        # indent=2 makes the JSON human-readable — important for config files
        json.dump(config, config_file, indent=2)
        print(f"Config saved to {CONFIG_PATH}")

# First load — file doesn't exist yet
current_config = load_config()
print(f"Theme: {current_config['theme']}")

# User changes a setting
current_config["theme"] = "light"
save_config(current_config)

# Second load — now reads from disk
current_config = load_config()
print(f"Theme after reload: {current_config['theme']}")
Output
No config file found at app_data/config.json. Using defaults.
Theme: dark
Config saved to app_data/config.json
Config loaded from app_data/config.json
Theme after reload: light
Interview Gold: EAFP vs LBYL
Python has two philosophies for handling uncertain operations. LBYL (Look Before You Leap) checks conditions first: if os.path.exists(path): open(path). EAFP (Easier to Ask Forgiveness than Permission) just tries it: try: open(path) except FileNotFoundError. The Python glossary describes EAFP as a common Python coding style, and for file access it has concrete advantages: no extra stat call, no window for a race between check and use, and the failure handling sits right next to the operation. Interviewers love this distinction, and knowing the names and reasons will set you apart.
Production Insight
A Django app used os.path.exists() before opening a user-uploaded file. Under high load, two requests processed the same upload simultaneously: one request deleted the file after the other had passed its existence check but before that request called open(). The second request crashed with a 500 error, and the user got an opaque error page.
Rule: never check-then-open. Always use try/except with the specific exception. For uploads, copy the file to a temp location before processing.
Key Takeaway
Use EAFP: try to open, catch exceptions.
Checking os.path.exists() before open() creates a TOCTOU race condition.
Always create parent directories before writing.

Binary Files and pathlib — Handling Images, PDFs and Modern Path Management

Not all files are text. Images, PDFs, audio, compiled code, and serialised data are binary — they contain bytes that aren't valid UTF-8 text. Open them in text mode and you'll get a UnicodeDecodeError at best, or silently corrupted data at worst. Binary mode ('rb', 'wb') tells Python to skip all encoding/decoding and work with raw bytes.

A very common real-world task is copying or processing binary files — resizing images, attaching files to emails, or storing uploaded files from a web form. The pattern is identical to text mode but you read bytes objects instead of strings.
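As a sketch of that binary pattern, the hypothetical helper below copies a file in fixed-size chunks so memory use stays flat regardless of file size. The chunk size and file names are arbitrary choices for the example; in everyday code shutil.copyfile() does the same job.

```python
import tempfile
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an arbitrary choice for this sketch

def copy_binary(source: Path, destination: Path) -> int:
    """Hypothetical helper: copy a file chunk by chunk, return bytes copied."""
    bytes_copied = 0
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while chunk := src.read(CHUNK_SIZE):  # read() returns b"" at end of file
            dst.write(chunk)
            bytes_copied += len(chunk)
    return bytes_copied

with tempfile.TemporaryDirectory() as scratch_dir:
    original = Path(scratch_dir) / "upload.bin"        # hypothetical file names
    duplicate = Path(scratch_dir) / "upload_copy.bin"
    payload = bytes(range(256)) * 1000                 # arbitrary bytes, not valid UTF-8
    original.write_bytes(payload)

    copied = copy_binary(original, duplicate)
    contents_match = duplicate.read_bytes() == payload

print(copied, contents_match)
```

No encoding argument appears anywhere: in binary mode you get bytes in and bytes out, untouched.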

For path manipulation, the modern way in Python 3.4+ is pathlib.Path. Forget string concatenation and os.path.join(): pathlib lets you build paths with the / operator, check existence with .exists(), get the file extension with .suffix, list directory contents with .iterdir(), and open files directly via path.open(). It's more readable, cross-platform by default, and object-oriented in a way that makes your intent clear.

When you're processing a directory full of files — a common automation task — pathlib with a generator expression is significantly cleaner than os.listdir() combined with string filtering. The pattern Path('data').glob('*.csv') gives you an iterator of all CSV files in the directory, ready to open.

file_organiser.py (Python)
from pathlib import Path
import shutil

# Scenario: scan a 'downloads' folder and copy image files into an 'images' archive.
# This pattern works identically on Windows, macOS and Linux because pathlib
# handles the slash vs backslash difference for you automatically.

DOWNLOADS_DIR = Path("sample_downloads")
IMAGES_ARCHIVE_DIR = Path("organised") / "images"

# Create sample directory with mixed file types so the script is self-contained
DOWNLOADS_DIR.mkdir(exist_ok=True)
(DOWNLOADS_DIR / "holiday_photo.jpg").write_bytes(b"\xff\xd8\xff" + b"\x00" * 10)
(DOWNLOADS_DIR / "budget.xlsx").write_bytes(b"PK\x03\x04" + b"\x00" * 10)
(DOWNLOADS_DIR / "profile_pic.png").write_bytes(b"\x89PNG" + b"\x00" * 10)
(DOWNLOADS_DIR / "notes.txt").write_text("Some text notes", encoding="utf-8")
(DOWNLOADS_DIR / "logo.jpg").write_bytes(b"\xff\xd8\xff" + b"\x00" * 10)

IMAGES_ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)

image_extensions = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
copy_count = 0

# Path.iterdir() yields Path objects for every item in the directory —
# no string path manipulation needed, no os.path.join() required.
for file_path in DOWNLOADS_DIR.iterdir():
    # .is_file() filters out subdirectories
    # .suffix gives the file extension including the dot, e.g. '.jpg'
    if file_path.is_file() and file_path.suffix.lower() in image_extensions:
        destination = IMAGES_ARCHIVE_DIR / file_path.name

        # shutil.copy2 copies file content AND metadata (timestamps etc.)
        shutil.copy2(file_path, destination)
        copy_count += 1
        print(f"  Copied: {file_path.name} -> {destination}")

print(f"\nDone. {copy_count} image file(s) archived to {IMAGES_ARCHIVE_DIR}")

# Demonstrate reading binary content back
for image_path in IMAGES_ARCHIVE_DIR.glob("*.jpg"):
    # 'rb' mode returns raw bytes — no encoding involved
    with image_path.open("rb") as image_file:
        first_bytes = image_file.read(3)  # JPEG magic bytes are FF D8 FF
        print(f"  {image_path.name} magic bytes: {first_bytes.hex().upper()}")
Output
  Copied: holiday_photo.jpg -> organised/images/holiday_photo.jpg
  Copied: profile_pic.png -> organised/images/profile_pic.png
  Copied: logo.jpg -> organised/images/logo.jpg

Done. 3 image file(s) archived to organised/images
  holiday_photo.jpg magic bytes: FFD8FF
  logo.jpg magic bytes: FFD8FF
Pro Tip: Use pathlib for Everything Path-Related
If you're still writing os.path.join(base_dir, 'subfolder', filename), switch to pathlib today. Path(base_dir) / 'subfolder' / filename does exactly the same thing and is immediately readable. pathlib objects also work directly with open(), shutil, and most standard library functions, so there's no conversion overhead — it's a straight upgrade.
Production Insight
A team's deployment script used os.path.join with hardcoded backslashes for a Windows path. When they moved to Linux Docker containers, the path broke: os.path.join does not translate separators embedded in its arguments, and on Linux a backslash is an ordinary filename character. Rebuilding the paths from their components with pathlib.Path and the / operator fixed it, and the code now runs identically on both platforms.
Rule: use pathlib for any path that might cross operating system boundaries. It's not just cleaner — it's portable.
Key Takeaway
Binary files need 'rb'/'wb' — text mode corrupts them.
pathlib is the modern, cross-platform way to handle paths.
Use .glob() and .iterdir() for bulk file operations.

Working with CSV Files — The Most Common Production File Format

CSV files are everywhere. Exports from databases, spreadsheets, logs, API responses — CSV is the lingua franca of data exchange. But CSV handling has traps: quoting, encoding, newlines inside fields, and missing headers.

Python's csv module handles most of this correctly if you use it right. The common beginner mistake is reading a CSV file manually with for line in file: and splitting on commas — this breaks the moment a field contains a comma or a quoted string. Always use csv.reader or csv.DictReader.
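Here is a quick sketch of that failure, using an in-memory string rather than a real file. The sample row is made up; it contains a comma inside a quoted field and a doubled quote, which is how CSV escapes a literal quote character.

```python
import csv
import io

raw_csv = 'name,city,note\r\n"Smith, Jane",Paris,"said ""hi"""\r\n'

# Naive approach: split every line on commas
naive_rows = [line.split(",") for line in raw_csv.splitlines()]

# csv.reader understands quoted fields and escaped quotes
parsed_rows = list(csv.reader(io.StringIO(raw_csv, newline="")))

print(naive_rows[1])   # four fragments, quotes still attached
print(parsed_rows[1])  # three clean fields
```

The naive split turns one three-field row into four fragments, so every column after the name shifts by one.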

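For production, csv.QUOTE_MINIMAL (the default) is a sound choice for writing. Be careful with quoting=csv.QUOTE_NONNUMERIC on a reader: it converts every unquoted field to float, so use it only when unquoted fields are guaranteed to be numeric. Without it, the reader returns every field as a string. Use newline='' when opening the file. Otherwise, on Windows the file object turns the \r\n that the csv writer emits into \r\r\n, and on reading, newlines embedded in quoted fields may not be interpreted correctly.

Encoding is the biggest silent killer. A CSV file from a French office might be in cp1252 or latin-1. Always pass encoding explicitly. For UTF-8 files that may start with a BOM, which Excel loves to add, use encoding='utf-8-sig': it strips the BOM when present and reads plain UTF-8 unchanged. It does not decode cp1252 or latin-1, so files in those encodings need their own encoding name.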

clean_csv.py (Python)
import csv
from pathlib import Path

# Scenario: read a messy CSV exported from Excel (has BOM, inconsistent quotes)
# and write a clean UTF-8 CSV with standard formatting.

input_path = Path("sales_export.csv")
output_path = Path("sales_cleaned.csv")

# Create a sample 'messy' CSV file for demonstration
with open(input_path, "w", encoding="utf-8-sig", newline='') as f:
    # 'utf-8-sig' writes the BOM itself, so the header must not add a second one
    f.write("Product,Price,Quantity\r\n")
    f.write('Widget A,"$12.50",3\r\n')
    f.write('Widget B,"$24.99",1\r\n')
    f.write('Widget C,"$5.00",10\r\n')

def clean_price(price_str: str) -> float:
    """Remove currency symbols and whitespace, return float."""
    return float(price_str.replace("$", "").replace(",", "").strip())

# Open with newline='' — critical for csv module portability
with open(input_path, "r", encoding="utf-8-sig", newline='') as infile, \
     open(output_path, "w", encoding="utf-8", newline='') as outfile:

    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    for row in reader:
        # Clean price field
        row["Price"] = f'{clean_price(row["Price"]):.2f}'
        writer.writerow(row)

print(f"Cleaned CSV written to {output_path}")

# Verify output
with open(output_path, "r", encoding="utf-8") as f:
    print(f.read())
Output
Cleaned CSV written to sales_cleaned.csv
Product,Price,Quantity
Widget A,12.50,3
Widget B,24.99,1
Widget C,5.00,10
CSV Pitfall: newline='' is Not Optional
When opening a CSV file for reading or writing, always pass newline=''. Without it, the file object's newline translation interferes with the csv module's own handling: on Windows, written rows end in \r\r\n, which shows up as blank lines between rows, and on reading, line breaks inside quoted fields can be mangled. The bug is subtle because it only shows up on Windows or when fields contain embedded line breaks.
Production Insight
A data pipeline ingested CSV files from a customer's ERP system. The files were generated on a Windows server, and the pipeline assumed plain UTF-8 and failed. For UTF-8 files that start with a BOM, encoding='utf-8-sig' is the fix: it strips the BOM and decodes the rest as UTF-8. It does not decode cp1252, so a file that really is cp1252 with accented characters needs encoding='cp1252'. Always probe the encoding of files that may come from Windows instead of guessing.
Rule: never assume the encoding of externally sourced CSV files. Use chardet for detection or default to utf-8-sig for Windows-origin files.
Key Takeaway
Use csv.DictReader — never split manually.
Open CSV files with newline=''.
Handle encoding with utf-8-sig for Windows exports.
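The BOM behaviour is easy to check for yourself. The sketch below writes a small CSV with a BOM and reads the header back twice, once as plain utf-8 and once as utf-8-sig. The file name and rows are assumptions for the demo.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch_dir:
    export_path = Path(scratch_dir) / "excel_export.csv"  # hypothetical file name

    # Writing with 'utf-8-sig' puts a BOM at the start of the file
    with open(export_path, "w", encoding="utf-8-sig", newline="") as out_file:
        csv.writer(out_file).writerows([["Product", "Price"], ["Widget", "4.50"]])

    # Plain 'utf-8' leaves the BOM glued to the first header name
    with open(export_path, "r", encoding="utf-8", newline="") as in_file:
        header_plain = next(csv.reader(in_file))

    # 'utf-8-sig' strips the BOM, and also works when no BOM is present
    with open(export_path, "r", encoding="utf-8-sig", newline="") as in_file:
        header_sig = next(csv.reader(in_file))

print(header_plain)  # first name carries an invisible '\ufeff' prefix
print(header_sig)
```

With the plain utf-8 read, row["Product"] on a DictReader raises KeyError, because the real key is '\ufeffProduct'.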

Closing Files Without `with`? You’re Leaking Resources

Manually calling file.close() is the single biggest source of production file bugs I see. If an exception occurs before the close call, the file handle stays open. This silently locks resources — especially on Windows where open files prevent renames or deletes. The with statement guarantees closure even if the block raises. It calls __exit__ which flushes buffers and releases the OS handle. Never open a file without with. It’s not style — it’s correctness. If you must manage lifecycle manually (rare), wrap operations in try/finally and close in the finally block. But with is idiomatic. Use it.

resource_leak.py (Python)
# Create the sample file so the script is self-contained
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("sample data\n")

# Bad: leak on exception
try:
    f = open("data.txt", "r", encoding="utf-8")
    data = f.read()
    raise ValueError("simulated crash")
    f.close()  # never reached, handle stays open
except ValueError:
    print(f"Manual close() pattern, file closed? {f.closed}")

# Good: guaranteed close
try:
    with open("data.txt", "r", encoding="utf-8") as f:
        data = f.read()
        raise ValueError("simulated crash")
except ValueError:
    # f is closed even after the exception
    print(f"with block, file closed? {f.closed}")
Output
Manual close() pattern, file closed? False
with block, file closed? True
Production Trap:
On Windows, an unclosed file handle prevents other processes from deleting or renaming the file. In CI/CD pipelines, this causes cryptic 'access denied' errors that waste hours debugging.
Key Takeaway
If you see file.close() outside a with block, that’s a code smell. Refactor it.
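For the rare case where manual lifecycle management is unavoidable, the try/finally form mentioned above might look like this. The file name and the simulated ValueError are illustrative only.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch_dir:
    data_path = Path(scratch_dir) / "data.txt"  # hypothetical file name
    data_path.write_text("payload\n", encoding="utf-8")

    crash_seen = False
    manual_file = open(data_path, "r", encoding="utf-8")
    try:
        manual_file.read()
        raise ValueError("simulated crash")
    except ValueError:
        crash_seen = True
    finally:
        # Runs on success and on failure, like the end of a with block
        manual_file.close()

    closed_after_error = manual_file.closed

print(crash_seen, closed_after_error)
```

It works, but it takes five lines of ceremony to reproduce what with gives you in one.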

File Existence Checks Are for Amateurs — Use Exception Handling

New devs check if a file exists before reading it. That’s a TOCTOU bug (Time of Check to Time of Use): between os.path.exists() and open(), the file can be deleted. Instead, open the file and catch FileNotFoundError (or FileExistsError for exclusive creation). The check and the open then happen in one step, so there is no window for a race. Also, prefer pathlib over os.path for building paths. Python 3.4+ ships pathlib, which is cleaner and cross-platform. Build paths with Path objects, check with path.exists() if you must, but prefer try/except. Production code is resilient against race conditions. Be resilient.

safe_open.py (Python)
from pathlib import Path

cfg_path = Path("/etc/app/config.json")

try:
    with open(cfg_path, "r", encoding="utf-8") as f:
        config = f.read()
except FileNotFoundError:
    print(f"Config not found at {cfg_path}. Using defaults.")
    config = "{}"
Output
Config not found at /etc/app/config.json. Using defaults.
Senior Dev Insight:
For write operations, use the 'x' mode (exclusive creation) instead of 'w'. It raises FileExistsError if the file exists — perfect for preventing accidental overwrites in log rotation or cache files.
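As a rough illustration of exclusive creation, the sketch below creates a file only when it does not already exist. The helper create_once and the cache.json file name are hypothetical.

```python
import tempfile
from pathlib import Path


def create_once(path, text):
    """Create path only if it does not exist yet; return True when created."""
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write(text)
        return True
    except FileExistsError:
        # Someone (or an earlier run) already created it; leave it alone.
        return False


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cache.json"
    first = create_once(target, '{"v": 1}')
    second = create_once(target, '{"v": 2}')
    content = target.read_text(encoding="utf-8")

print(first, second, content)
```

The second call leaves the original content in place, which is the behaviour 'w' cannot give you.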
Key Takeaway
Check by opening, not by asking permission. TOCTOU bugs are silent killers in distributed systems.
● Production incident · POST-MORTEM · severity: high

Silent Log Loss: The Production File That Vanished

Symptom
After a hotfix deploy and immediate rollback, the audit log file from the previous three hours was empty. New logs started appearing, but the prior window's data was gone.
Assumption
The audit logger used 'a' mode, so log lines should accumulate. The rollback should only affect new code, not historical data.
Root cause
During the hotfix, the dev mistakenly changed the file open mode to 'w' to regenerate a test file. The change was deployed, the file was truncated on the first open() call after deploy, and the rollback didn't recover the old content because 'w' destroyed it before any write completed.
Fix
Implemented an audit log rotation with date-based filenames and a dedicated logging module handler that explicitly uses 'a' with no chance of mode override via config. Added a lint rule banning 'w' mode in audit paths.
Key lesson
  • 'w' mode truncates the file at open(), not at write().
  • If your application has any local file output that must survive across runs, use 'a' mode or date-stamped filenames.
  • Better yet, centralize logging to a service — never trust file mode conventions.
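The first lesson is easy to verify. The sketch below, using a throwaway file with the hypothetical name audit.log, opens an existing file in 'w' mode, never calls write(), and checks the size on disk. It then shows 'a' mode leaving existing content alone.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "audit.log"
    log.write_text("three hours of audit lines\n", encoding="utf-8")
    size_before = log.stat().st_size

    # Open in 'w' and do nothing else: write() is never called.
    f = open(log, "w", encoding="utf-8")
    size_after_open = log.stat().st_size
    f.close()

    # 'a' keeps what is already there and adds to the end.
    log.write_text("line 1\n", encoding="utf-8")
    with open(log, "a", encoding="utf-8") as f:
        f.write("line 2\n")
    appended = log.read_text(encoding="utf-8")

print(size_before > 0, size_after_open)  # True 0
print(repr(appended))
```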
Production debug guide · Symptom → Action guide for real problems you'll face in production · 4 entries
Symptom · 01
Too many open files error (OSError: [Errno 24])
Fix
Count open file handles: lsof -p <pid> | wc -l. Check for missing with statements, since every open() without with is a potential leak. Use ulimit -n to see the per-process limit for the current shell. Add a monitoring alert on file descriptor usage.
Symptom · 02
UnicodeDecodeError when reading a file that looks like text
Fix
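Guess the file's encoding with the third-party chardet package: chardet.detect(open('file','rb').read(10000)). The result is a statistical guess, so check its confidence value. If detection fails, fall back to binary mode 'rb' and decode manually with error handling. Never rely on the platform default; always specify encoding explicitly in production.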
Symptom · 03
File exists but open() raises FileNotFoundError
Fix
Check whether it's a symlink pointing to a non-existent path, or the file is inside a directory mounted as a filesystem that's not available. Also check if the file path contains trailing whitespace or invisible characters — print repr(path) to verify.
Symptom · 04
File written but content is empty or truncated
Fix
Check if you're using 'w' mode by accident: log the mode string at open time. Use file.tell() before writing to confirm cursor position. For safety, write to a .tmp path in the same directory and then os.replace() it onto the target. The swap is atomic on POSIX filesystems, and unlike os.rename(), os.replace() also overwrites an existing target on Windows.
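The write-to-temp-then-swap pattern can be sketched as follows. This is one possible shape, not a definitive implementation: atomic_write_text and settings.json are hypothetical names, and the sketch uses os.replace() because it overwrites an existing target on Windows as well as POSIX.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target, text):
    """Write text to a temp file next to target, then swap it into place."""
    target = Path(target)
    # Same directory as the target, so the swap stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the swap
        os.replace(tmp_name, target)
    except BaseException:
        # The old target is untouched; remove the partial temp file.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "settings.json"
    cfg.write_text('{"old": true}', encoding="utf-8")
    atomic_write_text(cfg, '{"new": true}')
    result = cfg.read_text(encoding="utf-8")
    leftovers = [p.name for p in Path(tmp).iterdir() if p.suffix == ".tmp"]

print(result, leftovers)
```

If the process dies mid-write, readers still see the complete old file, because the target is only touched by the final swap.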
★ Quick Debug Cheat Sheet: File Handling · Run these commands when file operations behave unexpectedly in production
File descriptor limit exceeded
Immediate action
Find PID of the Python process, then list open files.
Commands
lsof -p $(pgrep -f your_script.py) | grep -c 'REG'
ulimit -n
Fix now
Wrap every open() call in a with statement. For long-running apps, use contextlib.closing() for resources that have close() but don't support with.
UnicodeDecodeError crash
Immediate action
Find the file encoding and open with the correct encoding.
Commands
python -c 'import chardet; print(chardet.detect(open("problematic.txt","rb").read()))'
file problematic.txt
Fix now
Always pass encoding='utf-8' as default. If the file might have mixed encodings, read in 'rb' and use .decode() with errors='backslashreplace'.
Write didn't appear in file
Immediate action
Check if file was flushed; ensure the `with` block completed.
Commands
python -c 'import os; print(os.stat("output.txt").st_size)'
cat output.txt | wc -l
Fix now
Wrap write logic in try/finally or use with to guarantee flush. For high-reliability writes, call file.flush() and os.fsync(file.fileno()) after critical writes.
`file.read()` uses 2 GB RAM for a 2 GB file
Immediate action
Stop using `read()` without argument. Switch to line-iterating or chunked reading.
Commands
wc -l hugefile.txt
python -c 'print(sum(1 for _ in open("hugefile.txt", "rb")))'
Fix now
Replace data = file.read() with for chunk in iter(lambda: file.read(65536), b''): process(chunk) for binary files (the sentinel must be b'' in binary mode and '' in text mode), or iterate lines for text.
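Both reading strategies can be sketched in a few lines. The example below hashes a file in 64 KiB chunks and counts lines by iterating the file object; the function names and the generated test file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

CHUNK_SIZE = 65536  # 64 KiB per read


def sha256_of_file(path, chunk_size=CHUNK_SIZE):
    """Hash a file without ever holding more than one chunk in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_lines(path):
    """Count lines by iterating the file object, one line at a time."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp:
    big = Path(tmp) / "hugefile.txt"
    big.write_bytes(b"line\n" * 50_000)  # 250,000 bytes, several chunks
    chunked_digest = sha256_of_file(big)
    whole_digest = hashlib.sha256(big.read_bytes()).hexdigest()
    n_lines = count_lines(big)

print(chunked_digest == whole_digest, n_lines)  # True 50000
```

The chunked digest matches the one computed from a single read(), so nothing is lost by streaming; only peak memory changes.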
Text Mode vs Binary Mode
Aspect | Text Mode ('r', 'w', 'a') | Binary Mode ('rb', 'wb', 'ab')
Data type returned | str | bytes
Encoding applied | Yes: uses the specified or platform encoding | No: raw bytes only
Newline translation | Yes: by default '\r\n' and '\r' are read as '\n', and '\n' is written as os.linesep | No: bytes are untouched
Use case | Logs, config, CSV, JSON, source code | Images, PDFs, audio, executables, pickled data
Error on bad bytes | UnicodeDecodeError if the file isn't valid in the chosen encoding | Never: all byte sequences are valid
Memory footprint | Depends on content: a str uses 1, 2, or 4 bytes per character | One byte in memory per byte on disk

Key takeaways

1. Always use 'with open(...) as f:'. It's not just style, it's a resource safety guarantee that prevents file descriptor leaks and ensures buffers are flushed to disk.
2. 'w' mode truncates the file to zero bytes the instant open() is called, before any write() happens. If you need existing content to survive, you want 'a' mode instead.
3. Iterate the file object directly ('for line in file:') rather than calling read(). Memory usage is then bounded by the longest line rather than the file size, which is the difference between a script that scales and one that doesn't.
4. Use pathlib.Path instead of os.path string operations. It's cross-platform, reads like English, and integrates cleanly with open(), glob(), mkdir(), and the rest of the standard library.
5. For CSV files, always open with newline='' and specify encoding explicitly. The defaults cause subtle cross-platform corruption.
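The CSV takeaway can be illustrated with a small round trip. The sketch below writes rows containing a comma and an embedded newline, then reads them back. The file name notes.csv and the sample rows are made up for the example.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    ["id", "note"],
    ["1", "plain"],
    ["2", "has, comma"],
    ["3", "two\nlines"],  # embedded newline inside a field
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.csv"

    # newline='' lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    with open(path, "r", newline="", encoding="utf-8") as f:
        round_trip = list(csv.reader(f))

print(round_trip == rows)  # True
```

Splitting lines on ',' by hand would break on rows 2 and 3, which is why the csv module is the safer default for structured data.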

Common mistakes to avoid

5 patterns
×

Opening a file without 'with' and forgetting to call close()

Symptom
Intermittent 'Too many open files' OSError in long-running apps, or writes that never appear on disk because the buffer was never flushed.
Fix
Always use with open(...) as f:, no carve-outs. If you're in a class and must store the file object, implement __enter__ and __exit__ properly, or use contextlib.closing().
×

Using 'w' mode on a file you meant to append to

Symptom
The file exists after your script runs but contains only the most recent run's data; all previous data is silently gone.
Fix
Ask yourself: should old content survive this write? If yes, use 'a'. If no (you're regenerating the file intentionally), use 'w'. Never default to 'w' without thinking about it.
×

Reading an entire large file into memory with read()

Symptom
Your script works fine on a 10 KB test file, then crashes with MemoryError or causes the server to swap when pointed at a 2 GB production log.
Fix
Iterate the file object line by line (for line in file_object:) for text files, or use file.read(chunk_size) in a loop for binary files. This keeps memory usage flat regardless of file size.
×

Not specifying encoding and getting UnicodeDecodeError on a different platform

Symptom
A script works on your Mac but throws UnicodeDecodeError when run on a Windows server or vice versa.
Fix
Always pass encoding='utf-8' explicitly. For files from external sources, detect encoding with chardet or use encoding='utf-8-sig' to handle BOM.
×

Using `os.path.exists()` before opening a file (TOCTOU race)

Symptom
Intermittent crash when a file is deleted or created between the existence check and open(). Especially common in concurrent applications.
Fix
Replace if os.path.exists(path): open(path) with try: open(path) except FileNotFoundError: handle() — EAFP style.
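For the encoding mistake in the list above, one standard-library-only fallback might look like the sketch below. It tries UTF-8 with BOM handling first and escapes undecodable bytes instead of crashing. The helper read_text_lenient is hypothetical, and because it reads the whole file it is meant for small files only.

```python
import tempfile
from pathlib import Path


def read_text_lenient(path):
    """Decode as UTF-8 (dropping a BOM if present); escape bad bytes otherwise."""
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Keep going, but make the offending bytes visible as \xNN escapes.
        return raw.decode("utf-8", errors="backslashreplace")


with tempfile.TemporaryDirectory() as tmp:
    bom_file = Path(tmp) / "bom.txt"
    bom_file.write_bytes(b"\xef\xbb\xbfhello")  # UTF-8 with BOM
    bad_file = Path(tmp) / "bad.txt"
    bad_file.write_bytes(b"caf\xe9")  # Latin-1 bytes, invalid as UTF-8

    clean = read_text_lenient(bom_file)
    escaped = read_text_lenient(bad_file)

print(repr(clean), repr(escaped))
```

The escaped output keeps the bad bytes visible in logs, which makes it easier to work out the real encoding afterwards.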
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
What is the difference between opening a file in 'w' and 'a' mode, and w...
Q02 · SENIOR
Why should you use a 'with' statement when working with files in Python,...
Q03 · SENIOR
A production script reads a config file that sometimes doesn't exist on ...
Q01 of 03 · JUNIOR

What is the difference between opening a file in 'w' and 'a' mode, and what happens to an existing file's content the moment you call open() with 'w'?

ANSWER
'w' mode truncates the file to zero bytes immediately when open() is called — before any write. 'a' mode preserves existing content and positions the write cursor at the end. The key distinction is that truncation happens at open time, not at write time. So if you open in 'w' and then crash, the file is already empty.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is the difference between read(), readline() and readlines() in Python?
02
How do I check if a file exists before opening it in Python?
03
Why do I get a UnicodeDecodeError when opening a file that looks like a text file?
04
Is 'a+' mode safe for reading and appending to the same file?
05
How do I handle binary files like images with Python?