Python File Handling Explained — Read, Write, Append and Real-World Patterns
Every meaningful program eventually needs to talk to the outside world — not just the screen, but actual storage. Log files, configuration files, CSV exports, user-uploaded text, cached API responses — all of it lives on disk as files. If your Python script can't read or write files, it's essentially a calculator that resets every time you switch it off. File handling is the bridge between your running program and data that survives after the program exits.
The problem Python file handling solves is deceptively simple: how do you safely open a resource, use it, and guarantee it gets released — even when something goes wrong? Languages that don't enforce this leave files locked open, causing data corruption, permission errors, and crashes that only appear under load. Python's answer is the context manager (the with statement), which makes safe file handling the path of least resistance rather than an afterthought.
By the end of this article you'll know how to open files for reading, writing, and appending; understand exactly what happens behind the scenes when you do; handle errors like a professional; work with both text and binary files; and recognise the two or three patterns that cover 95% of real-world file work. You'll also know the mistakes that trip up even experienced developers — so you can skip straight past them.
Opening and Reading Files — and Why the 'with' Statement Exists
Before you can do anything with a file, Python needs a file object — a live connection to that file on disk. You create one with the built-in open() function. The two most important arguments are the file path and the mode: 'r' for read, 'w' for write, 'a' for append, and 'b' tacked on for binary (e.g., 'rb').
Here's the thing most tutorials gloss over: open() acquires an operating-system resource. The OS gives your process a file descriptor — a numbered slot in a limited table. If you open files without closing them, you eventually exhaust that table and get a Too many open files error. Worse, unflushed writes may never reach disk.
The with statement solves this by acting as a guaranteed cleanup mechanism. It calls file.close() the instant the indented block exits — whether that exit is normal, via return, or via a crashing exception. Use with open(...) by default, every time. The manual open() / close() pattern survives mostly for the rare case where a file object must outlive a single block (a long-lived log handle stored on an object, say) — and even then you should pair it with try/finally or a context-manager helper rather than a bare close() call.
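To see what that guarantee actually buys you, here is a simplified sketch of what a with block is roughly equivalent to (the real protocol calls the file object's __enter__ and __exit__ methods; the file name here is illustrative):

```python
# Roughly what 'with open(...)' guarantees, spelled out by hand:
f = open("example.txt", "w", encoding="utf-8")
try:
    f.write("hello\n")   # the body of the would-be with-block
finally:
    f.close()            # runs even if write() raised an exception

# The equivalent context-manager form — same guarantee, less code:
with open("example.txt", "a", encoding="utf-8") as f:
    f.write("world\n")

# Both writes reached disk because close() flushed the buffers
with open("example.txt", "r", encoding="utf-8") as f:
    print(f.read())
```

The try/finally version works, but it puts the burden of remembering cleanup on every caller; the with form makes forgetting impossible.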
Reading has three flavours: read() loads the entire file into one string (fine for small files, dangerous for large ones), readline() fetches one line at a time, and readlines() returns a list of all lines. For most real work, iterating the file object directly is the cleanest and most memory-efficient approach — Python streams one line at a time without loading everything.
```python
# Scenario: parse a server log file and count how many lines are ERROR level

log_file_path = "server.log"

# Create a sample log file to work with so this script is self-contained
with open(log_file_path, "w", encoding="utf-8") as log_file:
    log_file.write("INFO 2024-06-01 08:00:01 Server started\n")
    log_file.write("INFO 2024-06-01 08:01:15 Request received from 192.168.1.10\n")
    log_file.write("ERROR 2024-06-01 08:01:16 Database connection timeout\n")
    log_file.write("INFO 2024-06-01 08:02:00 Retrying connection\n")
    log_file.write("ERROR 2024-06-01 08:02:05 Max retries exceeded\n")
    log_file.write("INFO 2024-06-01 08:02:06 Falling back to cache\n")

error_count = 0
error_lines = []

# The 'with' block guarantees the file is closed when we're done,
# even if an exception is raised inside the block.
with open(log_file_path, "r", encoding="utf-8") as log_file:
    # Iterating the file object directly reads ONE line at a time —
    # this works correctly even for a 10 GB log file because Python
    # never loads the whole thing into memory at once.
    for line in log_file:
        stripped_line = line.strip()  # remove trailing newline characters
        if stripped_line.startswith("ERROR"):
            error_count += 1
            error_lines.append(stripped_line)

print(f"Total ERROR lines found: {error_count}")
print("\nError details:")
for error in error_lines:
    print(f"  -> {error}")
```
```text
Total ERROR lines found: 2

Error details:
  -> ERROR 2024-06-01 08:01:16 Database connection timeout
  -> ERROR 2024-06-01 08:02:05 Max retries exceeded
```
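For completeness, here are the three reading flavours side by side against direct iteration. This is a minimal sketch using a small throwaway file of our own invention:

```python
sample_path = "flavours_demo.txt"
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")

# read(): the whole file as one string
with open(sample_path, encoding="utf-8") as f:
    whole = f.read()

# readline(): one line per call, cursor advances each time
with open(sample_path, encoding="utf-8") as f:
    first = f.readline()   # 'alpha\n'
    second = f.readline()  # 'beta\n'

# readlines(): a list of every line, trailing newlines kept
with open(sample_path, encoding="utf-8") as f:
    all_lines = f.readlines()

# Direct iteration: streams lazily — preferred for large files
with open(sample_path, encoding="utf-8") as f:
    stripped = [line.strip() for line in f]

print(repr(whole))
print(all_lines)
print(stripped)
```

Note that readline() and readlines() keep the trailing '\n' on each line, which is why .strip() shows up so often in file-reading loops.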
Writing and Appending — Understanding Why 'w' Mode Is a Trap
Writing to a file sounds simple — and it is, once you understand the critical difference between 'w' (write) and 'a' (append) mode, because confusing them is one of the most common ways to destroy your own data.
Opening a file in 'w' mode does two things: it creates the file if it doesn't exist, and it truncates the file to zero bytes if it does exist — immediately, before you write a single character. That means opening an existing file with 'w' and then crashing before writing anything leaves you with an empty file. The original content is gone.
Append mode ('a') is safer for ongoing data: the file is created if absent, but if it exists, the write cursor starts at the very end. Existing content is untouched. This is exactly what you want for log files, audit trails, or any data you're accumulating over time.
For writing structured data — think generating a report or saving configuration — 'w' is correct because you intentionally want a fresh file each run. For recording events as they happen over time, 'a' is correct. Choosing wrong corrupts data silently, which is why understanding the WHY matters more than memorising the letters.
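The truncation behaviour is easy to demonstrate. A short sketch (file name is illustrative) showing that 'w' destroys content at open() time, before any write, while 'a' preserves it:

```python
path = "truncation_demo.txt"

with open(path, "w", encoding="utf-8") as f:
    f.write("precious data\n")

# Opening in 'w' truncates IMMEDIATELY — even though we write nothing here,
# the old content is already gone the moment open() returns.
with open(path, "w", encoding="utf-8") as f:
    pass

with open(path, encoding="utf-8") as f:
    print(repr(f.read()))   # the file is now empty

# 'a' mode preserves existing content and positions the cursor at the end
with open(path, "a", encoding="utf-8") as f:
    f.write("entry 1\n")
with open(path, "a", encoding="utf-8") as f:
    f.write("entry 2\n")

with open(path, encoding="utf-8") as f:
    print(f.read())   # both entries survive across separate opens
```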
write() takes a single string. Unlike print(), it does not add a newline automatically — you must include '\n' yourself. writelines() accepts an iterable of strings and also skips automatic newlines, so each string you pass must already end with '\n' if you want line breaks.
```python
import datetime

activity_log_path = "user_activity.log"


def log_activity(username: str, action: str) -> None:
    """Append a timestamped activity record to the log file.

    We use 'a' mode so that every call adds to the existing log rather
    than overwriting it. This function is safe to call from multiple
    places in a long-running application.
    """
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Build the full log entry as a single string with its own newline
    log_entry = f"[{timestamp}] USER={username} ACTION={action}\n"
    with open(activity_log_path, "a", encoding="utf-8") as activity_log:
        activity_log.write(log_entry)  # 'a' mode: cursor is always at end of file


def generate_daily_report(report_date: str, summary_lines: list[str]) -> None:
    """Write a fresh daily report, replacing any previous report for the same name.

    We intentionally use 'w' mode here because the report is always
    regenerated from scratch — old content should not carry over.
    """
    report_path = f"report_{report_date}.txt"
    header = f"=== Daily Report for {report_date} ===\n\n"
    with open(report_path, "w", encoding="utf-8") as report_file:
        report_file.write(header)
        # writelines() writes each string as-is — no automatic newlines added,
        # so we ensure each summary line already ends with '\n'
        report_file.writelines(line + "\n" for line in summary_lines)
    print(f"Report saved to {report_path}")


# Simulate three user events arriving over time
log_activity("alice", "LOGIN")
log_activity("bob", "VIEWED_DASHBOARD")
log_activity("alice", "EXPORTED_CSV")

# Read back the log to confirm all three entries were preserved
print("--- Current activity log ---")
with open(activity_log_path, "r", encoding="utf-8") as activity_log:
    print(activity_log.read())

# Generate a report (uses 'w' mode — intentionally fresh each time)
generate_daily_report(
    report_date="2024-06-01",
    summary_lines=[
        "Total logins: 1",
        "Total page views: 1",
        "Total exports: 1",
    ],
)
```
```text
--- Current activity log ---
[2024-06-01 09:15:01] USER=alice ACTION=LOGIN
[2024-06-01 09:15:02] USER=bob ACTION=VIEWED_DASHBOARD
[2024-06-01 09:15:03] USER=alice ACTION=EXPORTED_CSV

Report saved to report_2024-06-01.txt
```
Error Handling and File Existence — Writing Code That Doesn't Embarrass You in Production
A file operation in production will eventually fail. The path doesn't exist, the disk is full, the process doesn't have permission, the file is locked by another process. Ignoring this is fine in a throwaway script and a fireable offence in production code.
Python raises specific exceptions for file errors. FileNotFoundError fires when you try to read a file that doesn't exist. PermissionError fires when the OS blocks access. IsADirectoryError fires when you accidentally pass a directory path where a file path was expected. All three are subclasses of OSError, so catching OSError covers all of them — but catching the specific type gives you better error messages.
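That hierarchy can be sketched in a few lines. The helper name describe_open_failure and the demo paths below are ours, purely illustrative — the point is the ordering: specific subclasses first, OSError as the catch-all parent:

```python
import os


def describe_open_failure(path):
    """Try to open a path for reading and report which failure occurred.

    Illustrative helper — the specific except clauses must come BEFORE
    the OSError clause, because OSError would match all of them first.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f"ok: {len(f.read())} chars"
    except FileNotFoundError:
        return "missing: no such file"
    except IsADirectoryError:
        return "directory: expected a file path"
    except PermissionError:
        return "forbidden: OS denied access"
    except OSError as err:  # parent class catches anything else (disk full, ...)
        return f"os error: {err}"


os.makedirs("demo_dir", exist_ok=True)
print(describe_open_failure("no_such_file.txt"))  # 'missing: no such file'
# Opening a directory raises IsADirectoryError on POSIX systems;
# on Windows this surfaces as PermissionError instead.
print(describe_open_failure("demo_dir"))
```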
A common anti-pattern is checking os.path.exists() before opening a file. This looks safe but it's a race condition: between your check and your open(), another process can delete or create that file. The correct pattern is EAFP (Easier to Ask Forgiveness than Permission) — just try to open it and handle the exception. Python was designed around this idiom.
For reading a file that might not exist yet (like a config file on first run), a clean pattern is to catch FileNotFoundError and return a sensible default rather than crashing. For writing, you often want the opposite: verify the destination directory exists and create it if not, using pathlib.Path.mkdir(parents=True, exist_ok=True) before writing.
```python
import json
from pathlib import Path

DEFAULT_CONFIG = {
    "theme": "dark",
    "language": "en",
    "auto_save_interval_seconds": 30,
}

CONFIG_PATH = Path("app_data") / "config.json"


def load_config() -> dict:
    """Load config from disk. Returns defaults if the file doesn't exist yet.

    Uses EAFP style: try the operation, handle the specific failure.
    This avoids the race condition that os.path.exists() creates.
    """
    try:
        with open(CONFIG_PATH, "r", encoding="utf-8") as config_file:
            loaded_config = json.load(config_file)
        print(f"Config loaded from {CONFIG_PATH}")
        return loaded_config
    except FileNotFoundError:
        # This is expected on first run — not an error, just a first boot
        print(f"No config file found at {CONFIG_PATH}. Using defaults.")
        return DEFAULT_CONFIG.copy()
    except json.JSONDecodeError as parse_error:
        # The file exists but is malformed — this IS an error worth reporting
        print(f"WARNING: Config file is corrupted ({parse_error}). Using defaults.")
        return DEFAULT_CONFIG.copy()
    except PermissionError:
        # We can't read the file — fail loudly, don't silently use defaults
        raise RuntimeError(
            f"Cannot read config at {CONFIG_PATH}. Check file permissions."
        )


def save_config(config: dict) -> None:
    """Save config to disk, creating the directory structure if it doesn't exist."""
    # mkdir with parents=True creates 'app_data/' if it's missing;
    # exist_ok=True means no error if the directory is already there
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with open(CONFIG_PATH, "w", encoding="utf-8") as config_file:
        # indent=2 makes the JSON human-readable — important for config files
        json.dump(config, config_file, indent=2)
    print(f"Config saved to {CONFIG_PATH}")


# First load — file doesn't exist yet
current_config = load_config()
print(f"Theme: {current_config['theme']}")

# User changes a setting
current_config["theme"] = "light"
save_config(current_config)

# Second load — now reads from disk
current_config = load_config()
print(f"Theme after reload: {current_config['theme']}")
```
```text
No config file found at app_data/config.json. Using defaults.
Theme: dark
Config saved to app_data/config.json
Config loaded from app_data/config.json
Theme after reload: light
```
Binary Files and pathlib — Handling Images, PDFs and Modern Path Management
Not all files are text. Images, PDFs, audio, compiled code, and serialised data are binary — they contain bytes that aren't valid UTF-8 text. Open them in text mode and you'll get a UnicodeDecodeError at best, or silently corrupted data at worst. Binary mode ('rb', 'wb') tells Python to skip all encoding/decoding and work with raw bytes.
A very common real-world task is copying or processing binary files — resizing images, attaching files to emails, or storing uploaded files from a web form. The pattern is identical to text mode but you read bytes objects instead of strings.
For path manipulation, the modern way in Python 3.4+ is pathlib.Path. Forget manual string concatenation and os.path.join() — pathlib lets you build paths with the / operator, check existence with .exists(), get the file extension with .suffix, list directory contents with .iterdir(), and open files directly via path.open(). It's more readable, cross-platform by default, and object-oriented in a way that makes your intent clear.
When you're processing a directory full of files — a common automation task — pathlib with a generator expression is significantly cleaner than os.listdir() combined with string filtering. The pattern Path('data').glob('*.csv') gives you an iterator of all CSV files in the directory, ready to open.
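As a quick tour of the pathlib operations mentioned above — directory and file names here are hypothetical, chosen just for the sketch:

```python
from pathlib import Path

data_dir = Path("pathlib_demo")          # illustrative directory for this sketch
data_dir.mkdir(exist_ok=True)

report = data_dir / "q1" / "sales.csv"   # '/' joins path segments, cross-platform
report.parent.mkdir(parents=True, exist_ok=True)
report.write_text("region,total\nnorth,42\n", encoding="utf-8")

print(report.name)      # 'sales.csv'  — final path component
print(report.suffix)    # '.csv'       — extension including the dot
print(report.stem)      # 'sales'      — name without the extension
print(report.exists())  # True

# glob() matches files in one directory; rglob() recurses into subdirectories
csv_files = sorted(p.name for p in data_dir.rglob("*.csv"))
print(csv_files)        # ['sales.csv']

# Path.open() takes the same mode/encoding arguments as the built-in open()
with report.open("r", encoding="utf-8") as f:
    print(f.readline().strip())   # 'region,total'
```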
```python
from pathlib import Path
import shutil

# Scenario: scan a 'downloads' folder and copy image files into an 'images' archive.
# This pattern works identically on Windows, macOS and Linux because pathlib
# handles the slash vs backslash difference for you automatically.

DOWNLOADS_DIR = Path("sample_downloads")
IMAGES_ARCHIVE_DIR = Path("organised") / "images"

# Create sample directory with mixed file types so the script is self-contained
DOWNLOADS_DIR.mkdir(exist_ok=True)
(DOWNLOADS_DIR / "holiday_photo.jpg").write_bytes(b"\xff\xd8\xff" + b"\x00" * 10)
(DOWNLOADS_DIR / "budget.xlsx").write_bytes(b"PK\x03\x04" + b"\x00" * 10)
(DOWNLOADS_DIR / "profile_pic.png").write_bytes(b"\x89PNG" + b"\x00" * 10)
(DOWNLOADS_DIR / "notes.txt").write_text("Some text notes", encoding="utf-8")
(DOWNLOADS_DIR / "logo.jpg").write_bytes(b"\xff\xd8\xff" + b"\x00" * 10)

IMAGES_ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)

image_extensions = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
copy_count = 0

# Path.iterdir() yields Path objects for every item in the directory —
# no string path manipulation needed, no os.path.join() required.
for file_path in DOWNLOADS_DIR.iterdir():
    # .is_file() filters out subdirectories;
    # .suffix gives the file extension including the dot, e.g. '.jpg'
    if file_path.is_file() and file_path.suffix.lower() in image_extensions:
        destination = IMAGES_ARCHIVE_DIR / file_path.name
        # shutil.copy2 copies file content AND metadata (timestamps etc.)
        shutil.copy2(file_path, destination)
        copy_count += 1
        print(f"  Copied: {file_path.name} -> {destination}")

print(f"\nDone. {copy_count} image file(s) archived to {IMAGES_ARCHIVE_DIR}")

# Demonstrate reading binary content back
for image_path in IMAGES_ARCHIVE_DIR.glob("*.jpg"):
    # 'rb' mode returns raw bytes — no encoding involved
    with image_path.open("rb") as image_file:
        first_bytes = image_file.read(3)  # JPEG magic bytes are FF D8 FF
        print(f"  {image_path.name} magic bytes: {first_bytes.hex().upper()}")
```
```text
  Copied: holiday_photo.jpg -> organised/images/holiday_photo.jpg
  Copied: profile_pic.png -> organised/images/profile_pic.png
  Copied: logo.jpg -> organised/images/logo.jpg

Done. 3 image file(s) archived to organised/images
  holiday_photo.jpg magic bytes: FFD8FF
  logo.jpg magic bytes: FFD8FF
```
| Aspect | Text Mode ('r', 'w', 'a') | Binary Mode ('rb', 'wb', 'ab') |
|---|---|---|
| Data type returned | str | bytes |
| Encoding applied | Yes — uses specified or platform encoding | No — raw bytes only |
| Newline translation | Yes — '\r\n' on Windows becomes '\n' | No — bytes are untouched |
| Use case | Logs, config, CSV, JSON, source code | Images, PDFs, audio, executables, pickled data |
| Error on bad bytes | UnicodeDecodeError if file isn't valid text | Never — all byte sequences are valid |
| File size consideration | Decoded text in memory can differ in size from the on-disk bytes | Exact byte-for-byte copy in memory |
🎯 Key Takeaways
- Always use 'with open(...) as f:' — it's not just style, it's a resource safety guarantee that prevents file descriptor leaks and ensures buffers are flushed to disk.
- 'w' mode truncates the file to zero bytes the instant open() is called, before any write() happens — if you need existing content to survive, you want 'a' mode instead.
- Iterate the file object directly ('for line in file:') rather than calling read() — this keeps memory usage constant for files of any size, which is the difference between a script that scales and one that doesn't.
- Use pathlib.Path instead of os.path string operations — it's cross-platform, reads like English, and integrates cleanly with open(), glob(), mkdir(), and the rest of the standard library.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Opening a file without 'with' and forgetting to call close() — Symptom: intermittent 'Too many open files' OSError in long-running apps, or writes that never appear on disk because the buffer was never flushed — Fix: always use 'with open(...) as f:'. If you're in a class and must store the file object, implement __enter__ and __exit__ properly, or use contextlib.closing().
- ✕ Mistake 2: Using 'w' mode on a file you meant to append to — Symptom: the file exists after your script runs but contains only the most recent run's data; all previous data is silently gone — Fix: ask yourself 'should old content survive this write?' If yes, use 'a'. If no (you're regenerating the file intentionally), use 'w'. Never default to 'w' without thinking about it.
- ✕ Mistake 3: Reading an entire large file into memory with read() — Symptom: your script works fine on a 10 KB test file, then crashes with MemoryError or causes the server to swap when pointed at a 2 GB production log — Fix: iterate the file object line by line ('for line in file_object:') for text files, or use 'file.read(chunk_size)' in a loop for binary files. This keeps memory usage flat regardless of file size.
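The chunked-read fix from Mistake 3 looks like this in practice. A minimal sketch — the file name, sample data, and chunk size are ours, chosen for illustration:

```python
# Chunked reading pattern for binary files: constant memory use at any size.
blob_path = "big_blob.bin"
with open(blob_path, "wb") as f:
    f.write(bytes(range(256)) * 40)   # 10,240 bytes of sample binary data

CHUNK_SIZE = 4096
total_bytes = 0
with open(blob_path, "rb") as blob:
    while True:
        chunk = blob.read(CHUNK_SIZE)   # returns at most CHUNK_SIZE bytes
        if not chunk:                   # empty bytes object b'' signals EOF
            break
        total_bytes += len(chunk)       # process each chunk here (hash, copy, ...)

print(f"Read {total_bytes} bytes in chunks of {CHUNK_SIZE}")
```

Only one chunk is ever held in memory at a time, so the loop behaves identically whether the file is 10 KB or 10 GB.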
Interview Questions on This Topic
- Q: What is the difference between opening a file in 'w' and 'a' mode, and what happens to an existing file's content the moment you call open() with 'w'?
- Q: Why should you use a 'with' statement when working with files in Python, and what specifically happens under the hood when the 'with' block exits — even if an exception is raised?
- Q: A production script reads a config file that sometimes doesn't exist on first boot. A junior dev wrote 'if os.path.exists(path): open(path)' to handle this. What is wrong with that approach, and how would you rewrite it correctly?
Frequently Asked Questions
What is the difference between read(), readline() and readlines() in Python?
read() loads the entire file into a single string — convenient for small files but dangerous for large ones. readline() fetches exactly one line and moves the cursor forward, useful when you need manual control. readlines() returns a list of all lines as strings. In practice, iterating the file object directly ('for line in file:') is better than all three for large files because it reads one line at a time without loading everything into memory.
How do I check if a file exists before opening it in Python?
The Pythonic way is not to check first — just try to open it and catch FileNotFoundError. This avoids a race condition where the file could be deleted between your check and your open(). Use 'try: open(path) except FileNotFoundError: handle_it()'. If you only need to check without opening, Path('yourfile.txt').exists() from pathlib is the cleanest syntax.
Why do I get a UnicodeDecodeError when opening a file that looks like a text file?
This happens when the file's actual encoding doesn't match what Python is using to decode it. Python defaults to the platform encoding (often Windows-1252 on Windows, UTF-8 elsewhere). The fix is to specify the encoding explicitly: open('file.txt', 'r', encoding='utf-8'). If you don't know the file's encoding, install the 'chardet' library and use it to detect the encoding before opening. For files that might contain arbitrary bytes, open in binary mode ('rb') and handle decoding yourself.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.