
Reading and Writing Files in Python — The Complete Practical Guide

In Plain English 🔥
Think of a file on your computer like a physical notebook. Opening a file in Python is like picking up that notebook. Reading it is like flipping through the pages. Writing to it is like picking up a pen and adding new content. And closing it is like putting the notebook back on the shelf so someone else can use it. Python just automates all of that for you.

Every real-world application eventually needs to talk to the file system. Whether you're saving user preferences, processing a CSV of sales data, writing application logs, or reading a configuration file on startup — file I/O is the plumbing that holds software together. Skip this skill and you're building programs that forget everything the moment they stop running.

Before Python's modern file handling existed, developers had to manually track whether files were open, remember to close them after every operation, and write error-handling boilerplate just to read a single line of text. One missed close() call could lock a file for the entire session, corrupt data, or leak memory. Python solved this with the context manager pattern — the 'with' statement — which handles cleanup automatically, no matter what goes wrong.

By the end of this article you'll know how to confidently open, read, write, and append to files in Python. You'll understand exactly which file mode to reach for in each situation, why the 'with' statement is non-negotiable in production code, and how to avoid the three silent mistakes that corrupt data and confuse even experienced developers.

File Modes Explained — Picking the Right Tool Before You Touch the File

Every time you open a file in Python, you're making a contract with the operating system. That contract is defined by the mode string you pass to open(). Get it wrong and you'll either overwrite data you meant to keep, get a FileNotFoundError you didn't expect, or silently append garbage to a file you thought was clean.

The four modes you'll use 90% of the time are: 'r' (read only — the file must already exist), 'w' (write — creates the file if it doesn't exist, but DESTROYS existing content if it does), 'a' (append — adds to the end without touching what's already there), and 'r+' (read and write — file must exist, cursor starts at the beginning).

There's also a binary variant for each: 'rb', 'wb', 'ab'. Use binary mode when working with images, PDFs, audio files, or any non-text data. Text mode ('r', 'w') automatically handles newline translation across operating systems — binary mode does not, which is exactly what you want for raw byte data.
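To illustrate binary mode, here's a minimal sketch that copies any file byte-for-byte in fixed-size chunks — the function name and chunk size are illustrative choices, not part of a standard API:

```python
# A copy helper using binary modes 'rb' and 'wb'.
# Reading in fixed-size chunks keeps memory flat even for huge files.
CHUNK_SIZE = 64 * 1024  # 64 KB per read — an arbitrary but sensible size

def copy_binary(source_path, dest_path):
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)  # returns b"" once the file is exhausted
            if not chunk:
                break
            dst.write(chunk)
```

Because no newline translation happens in binary mode, the copy is exact — which is precisely what you want for images, PDFs, and other raw byte data.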

The single most destructive mistake beginners make is opening a file in 'w' mode when they meant 'a'. Your log file from the last three months? Gone in one line. Understand the modes first — everything else builds on them.

file_modes_demo.py · PYTHON
import os

# --- Demonstrate all four primary file modes ---

log_file_path = "app_events.log"

# MODE 'w': Write mode — creates the file fresh every time.
# WARNING: If app_events.log already existed, its content is now gone.
with open(log_file_path, "w") as log_file:
    log_file.write("[INFO] Application started\n")
    log_file.write("[INFO] Loading configuration\n")
print("Step 1 — Written initial log entries.")

# MODE 'a': Append mode — adds to the END without touching existing lines.
# Safe to call multiple times; previous content is always preserved.
with open(log_file_path, "a") as log_file:
    log_file.write("[INFO] User logged in: alice@example.com\n")
print("Step 2 — Appended a new log entry.")

# MODE 'r': Read-only mode — safest for reading; raises FileNotFoundError
# if the file doesn't exist, which protects you from silent failures.
with open(log_file_path, "r") as log_file:
    full_contents = log_file.read()  # reads the entire file as one string
print("Step 3 — Full log file contents:")
print(full_contents)

# MODE 'r+': Read + Write — file must exist; cursor starts at position 0.
# Caution: write() in 'r+' writes AT the cursor and overwrites bytes in
# place — it does not append. Call seek() first if you need a specific spot.
with open(log_file_path, "r+") as log_file:
    first_line = log_file.readline()   # reads just the first line
    print(f"Step 4 — First log entry: {first_line.strip()}")
    log_file.write("[DEBUG] this write overwrites bytes at the cursor\n")

# Clean up the demo file so this script is safe to re-run
os.remove(log_file_path)
print("Step 5 — Demo file removed.")
▶ Output
Step 1 — Written initial log entries.
Step 2 — Appended a new log entry.
Step 3 — Full log file contents:
[INFO] Application started
[INFO] Loading configuration
[INFO] User logged in: alice@example.com

Step 4 — First log entry: [INFO] Application started
Step 5 — Demo file removed.
⚠️ Watch Out: Opening a file in 'w' mode is irreversible — Python does not ask for confirmation. If you're appending to a log or adding records to a data file, always use 'a'. Reserve 'w' only for cases where you intentionally want a clean slate, like regenerating a report from scratch.
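One more mode worth knowing in this context: 'x' (exclusive create) refuses to open a file that already exists, raising FileExistsError instead of destroying it. A minimal sketch — the function name and message are illustrative:

```python
# MODE 'x': exclusive creation — errors out instead of overwriting.
# A safer choice than 'w' when an existing file would signal a bug.
def create_fresh_report(path):
    try:
        with open(path, "x") as report_file:  # FileExistsError if path exists
            report_file.write("report generated\n")
        return True
    except FileExistsError:
        print(f"Refusing to overwrite existing file: {path}")
        return False
```

If accidental overwrites are your main worry, 'x' turns the silent data loss of 'w' into a loud, catchable exception.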

The 'with' Statement — Why Every Production File Open Uses It

Here's a scenario that breaks real applications: your code opens a file, starts processing its contents, and then raises an unexpected exception halfway through. If you opened the file with a plain open() call and relied on a manual file.close() at the bottom of your function, that close() never runs. The file handle stays open, the OS-level lock stays locked, and depending on your OS and how long the program runs, you can exhaust the system's limit on open file descriptors.

The 'with' statement solves this with a pattern called a context manager. The moment the 'with' block exits — whether normally or because an exception was raised — Python automatically calls the file's __exit__ method, which closes the file handle. You get guaranteed cleanup with zero extra code.
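Conceptually, the 'with' block is equivalent to a manual try/finally — here's a simplified sketch of what the context manager protocol does for you:

```python
# Roughly what 'with open(path) as f: return f.read()' does under the hood.
# The finally clause runs whether the body returns or raises — like __exit__.
def read_all(path):
    f = open(path, "r")
    try:
        return f.read()  # even if read() raises, the file still closes
    finally:
        f.close()        # guaranteed cleanup
```

The 'with' statement simply packages this try/finally dance so you can never forget it.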

This isn't just good practice — it's the difference between code that works in a demo and code that survives a production environment. A web server handling thousands of requests per minute that doesn't close file handles will degrade and crash. A data pipeline that processes millions of rows and leaks file descriptors will hit OS limits and silently fail.

Always use 'with'. No exceptions (pun intended).

with_statement_demo.py · PYTHON
# --- The RIGHT way vs the RISKY way to handle files ---

config_path = "server_config.txt"

# First, create a sample config file to work with
with open(config_path, "w") as config_file:
    config_file.write("host=localhost\n")
    config_file.write("port=8080\n")
    config_file.write("debug=True\n")


# ❌ THE RISKY WAY — manual open/close
# If an exception fires between open() and close(), close() never runs.
def read_config_risky(filepath):
    config_file = open(filepath, "r")          # file is now open
    raw_content = config_file.read()           # what if THIS raises an error?
    config_file.close()                        # this line might never execute
    return raw_content


# ✅ THE SAFE WAY — context manager guarantees cleanup
# The file is closed the instant the 'with' block ends, no matter what.
def read_config_safe(filepath):
    with open(filepath, "r") as config_file:  # __enter__ opens the file
        raw_content = config_file.read()       # reads all content at once
    # config_file.__exit__ has already been called — file is 100% closed here
    return raw_content


# ✅ PARSING CONFIG — turning raw text into a usable dictionary
def parse_config(filepath):
    settings = {}
    with open(filepath, "r") as config_file:
        for line in config_file:               # iterates line-by-line (memory efficient)
            line = line.strip()                # removes leading/trailing whitespace and \n
            if "=" in line:                    # skip blank lines or malformed entries
                key, value = line.split("=", 1)  # maxsplit=1 protects values containing "="
                settings[key] = value
    return settings


raw = read_config_safe(config_path)
print("Raw file content:")
print(raw)

parsed = parse_config(config_path)
print("Parsed config dictionary:")
for setting_key, setting_value in parsed.items():
    print(f"  {setting_key} → {setting_value}")

# Clean up
import os
os.remove(config_path)
▶ Output
Raw file content:
host=localhost
port=8080
debug=True

Parsed config dictionary:
host → localhost
port → 8080
debug → True
⚠️ Pro Tip: You can open multiple files in a single 'with' statement using a comma: 'with open(source) as src, open(dest, "w") as dst'. This is cleaner than nesting two 'with' blocks and guarantees both files close even if one raises an exception.
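A minimal sketch of that two-file form, copying one text file to another — the function name is illustrative:

```python
# Both files open in one 'with' statement; both are guaranteed to close,
# even if the loop body raises partway through.
def copy_text_file(source, dest):
    with open(source, "r") as src, open(dest, "w") as dst:
        for line in src:  # streams line-by-line — constant memory
            dst.write(line)
```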

Reading Strategies — readline vs readlines vs Iteration (and When to Use Each)

Python gives you three different ways to read a file's content, and picking the wrong one for your data size is one of the most common performance mistakes in Python scripts.

file.read() pulls the entire file into a single string in memory. It's convenient for small config files or templates, but loading a 2GB log file into a string will spike your RAM and potentially crash the process.

file.readlines() reads the entire file and returns a list of strings — one per line, newlines included. It has the same memory problem as read() because the whole file is loaded at once. It's handy when you need random access by line index, like 'give me line 47', but that use case is rarer than you'd think.

Iterating over the file object directly (for line in file) is almost always the right answer. Python reads the file one chunk at a time under the hood, so your memory usage stays flat regardless of file size. This is how you process gigabyte-scale files without breaking a sweat.
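For instance, scanning an arbitrarily large log for error lines never needs more than one line in memory at a time. A sketch — the function name and marker string are illustrative:

```python
# Stream the file and keep only matching lines; memory use stays constant
# no matter how large the log grows.
def find_error_lines(log_path, needle="ERROR"):
    matches = []
    with open(log_path, "r") as log_file:
        for line_number, line in enumerate(log_file, start=1):
            if needle in line:
                matches.append((line_number, line.rstrip("\n")))
    return matches
```

The result list only grows with the number of matches, not the size of the file — swap the list for a generator (`yield`) if even the matches are too numerous to hold at once.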

For writing, writelines() accepts an iterable of strings but does NOT add newlines for you — a detail that bites almost everyone the first time.

reading_strategies_demo.py · PYTHON
import os

sales_data_path = "quarterly_sales.csv"

# Create a realistic sample CSV file
with open(sales_data_path, "w") as sales_file:
    sales_file.write("date,product,units_sold,revenue\n")
    sales_file.write("2024-01-15,Widget Pro,120,2400.00\n")
    sales_file.write("2024-01-22,Widget Pro,95,1900.00\n")
    sales_file.write("2024-02-03,Gadget Plus,200,6000.00\n")
    sales_file.write("2024-02-18,Widget Pro,150,3000.00\n")
    sales_file.write("2024-03-07,Gadget Plus,175,5250.00\n")


# STRATEGY 1: read() — entire file as one string
# Best for: small files, templates, hashing file contents
with open(sales_data_path, "r") as sales_file:
    entire_content = sales_file.read()
print("=== Strategy 1: read() ===")
print(f"Type: {type(entire_content)}, Characters: {len(entire_content)}")
print()


# STRATEGY 2: readlines() — list of line strings (newlines included)
# Best for: when you need line numbers or random access by index
with open(sales_data_path, "r") as sales_file:
    all_lines = sales_file.readlines()         # ["date,product,...\n", "2024-01-15,...\n", ...]
print("=== Strategy 2: readlines() ===")
print(f"Type: {type(all_lines)}, Line count: {len(all_lines)}")
print(f"Line at index 2: {all_lines[2].strip()}")  # .strip() removes the trailing \n
print()


# STRATEGY 3: Line-by-line iteration — memory efficient for large files
# Best for: data processing, filtering rows, aggregation — use this by default
def calculate_total_revenue(filepath):
    total_revenue = 0.0
    with open(filepath, "r") as sales_file:
        next(sales_file)                        # skip the header row
        for data_line in sales_file:            # reads one line at a time — constant memory
            columns = data_line.strip().split(",")
            revenue_column = float(columns[3])  # index 3 is the 'revenue' column
            total_revenue += revenue_column
    return total_revenue

total = calculate_total_revenue(sales_data_path)
print("=== Strategy 3: Line-by-line iteration ===")
print(f"Total revenue across all sales: ${total:,.2f}")
print()


# writelines() DEMO — note: you must add your own newlines!
results_path = "revenue_summary.txt"
summary_lines = [
    "=== Q1 2024 Revenue Summary ===\n",  # \n must be explicit — writelines() won't add it
    f"Total Revenue: ${total:,.2f}\n",
    "Source: quarterly_sales.csv\n",
]
with open(results_path, "w") as results_file:
    results_file.writelines(summary_lines)     # writes each string in the list, no auto-newline

with open(results_path, "r") as results_file:
    print("=== Written summary file ===")
    print(results_file.read())

os.remove(sales_data_path)
os.remove(results_path)
▶ Output
=== Strategy 1: read() ===
Type: <class 'str'>, Characters: 203

=== Strategy 2: readlines() ===
Type: <class 'list'>, Line count: 6
Line at index 2: 2024-01-22,Widget Pro,95,1900.00

=== Strategy 3: Line-by-line iteration ===
Total revenue across all sales: $18,550.00

=== Written summary file ===
=== Q1 2024 Revenue Summary ===
Total Revenue: $18,550.00
Source: quarterly_sales.csv
🔥 Interview Gold: When asked 'how would you process a 10GB log file in Python?', the answer interviewers want is line-by-line iteration with a 'with' statement — not read() or readlines(). Explain that iterating over the file object streams data in chunks, keeping memory usage O(1) relative to file size.

Real-World Pattern — Building a Persistent Task Manager with File I/O

Reading and writing individual lines is one thing. Putting it together into a coherent, real application is what separates tutorial knowledge from practical skill. Let's build a minimal persistent task manager — one that saves tasks to a file, loads them on startup, marks them complete, and never loses data between runs.

This pattern appears everywhere: shopping cart persistence, user preference files, simple caches, CI pipeline state files, and configuration management. The core loop is always the same — load state from disk at startup, modify in memory, write back to disk on change.

The key design decision here is using 'a' mode for adding new tasks (safe, non-destructive) but 'w' mode to rewrite the entire file when we mark a task complete — because you can't efficiently delete a line from the middle of a file without rewriting it. This read-modify-write pattern is fundamental to file-based persistence.

task_manager.py · PYTHON
import os
from datetime import datetime

TASKS_FILE = "my_tasks.txt"
COMPLETED_MARKER = "[DONE]"
PENDING_MARKER = "[TODO]"


def load_tasks(filepath):
    """Read all tasks from disk. Returns an empty list if file doesn't exist yet."""
    if not os.path.exists(filepath):            # first run — no file yet, that's fine
        return []
    tasks = []
    with open(filepath, "r") as task_file:
        for raw_line in task_file:              # memory-efficient line iteration
            stripped = raw_line.strip()
            if stripped:                        # ignore any blank lines in the file
                tasks.append(stripped)
    return tasks


def add_task(filepath, task_description):
    """Append a new task to the file. Uses 'a' mode — never overwrites existing tasks."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    task_entry = f"{PENDING_MARKER} [{timestamp}] {task_description}\n"
    with open(filepath, "a") as task_file:     # 'a' is safe — no data loss possible
        task_file.write(task_entry)
    print(f"  ✅ Added: '{task_description}'")


def complete_task(filepath, task_index):
    """Mark a task as done. Must rewrite the whole file — no way to edit a single line in place."""
    all_tasks = load_tasks(filepath)
    if task_index < 0 or task_index >= len(all_tasks):
        print(f"  ❌ No task at index {task_index}")
        return
    target_task = all_tasks[task_index]
    if COMPLETED_MARKER in target_task:
        print(f"  ⚠️  Task {task_index} is already complete.")
        return
    # Replace the TODO marker with DONE — read-modify-write pattern
    all_tasks[task_index] = target_task.replace(PENDING_MARKER, COMPLETED_MARKER, 1)
    with open(filepath, "w") as task_file:     # 'w' intentionally overwrites — we have all data in memory
        for updated_task in all_tasks:
            task_file.write(updated_task + "\n")
    print(f"  ✅ Marked task {task_index} as complete.")


def display_tasks(filepath):
    """Print all tasks with their index so the user knows what index to pass to complete_task."""
    all_tasks = load_tasks(filepath)
    if not all_tasks:
        print("  📋 No tasks yet.")
        return
    print("  📋 Current tasks:")
    for index, task_line in enumerate(all_tasks):
        print(f"    [{index}] {task_line}")


# --- Run a simulated session ---
print("--- Adding tasks ---")
add_task(TASKS_FILE, "Write unit tests for the payment module")
add_task(TASKS_FILE, "Review PR #47 from the backend team")
add_task(TASKS_FILE, "Update README with new API endpoints")

print("\n--- Current task list ---")
display_tasks(TASKS_FILE)

print("\n--- Completing task 1 ---")
complete_task(TASKS_FILE, 1)

print("\n--- Updated task list ---")
display_tasks(TASKS_FILE)

# Clean up demo file
os.remove(TASKS_FILE)
print("\n--- Demo complete. Task file removed. ---")
▶ Output
--- Adding tasks ---
✅ Added: 'Write unit tests for the payment module'
✅ Added: 'Review PR #47 from the backend team'
✅ Added: 'Update README with new API endpoints'

--- Current task list ---
📋 Current tasks:
[0] [TODO] [2024-06-10 14:23] Write unit tests for the payment module
[1] [TODO] [2024-06-10 14:23] Review PR #47 from the backend team
[2] [TODO] [2024-06-10 14:23] Update README with new API endpoints

--- Completing task 1 ---
✅ Marked task 1 as complete.

--- Updated task list ---
📋 Current tasks:
[0] [TODO] [2024-06-10 14:23] Write unit tests for the payment module
[1] [DONE] [2024-06-10 14:23] Review PR #47 from the backend team
[2] [TODO] [2024-06-10 14:23] Update README with new API endpoints

--- Demo complete. Task file removed. ---
⚠️ Pro Tip: The read-modify-write pattern (load all → change in memory → write all back) is safe for small files but risky on large ones if your process crashes mid-write. For production persistence, write to a temp file first, then use os.replace() to atomically swap it in. This guarantees you never end up with a half-written, corrupted file.
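A sketch of that atomic-write pattern using the standard library's tempfile and os.replace — the helper name is illustrative:

```python
import os
import tempfile

# Write to a temp file in the SAME directory as the target, then swap it in.
# Readers see either the old file or the new one — never a half-written mix.
def atomic_write(path, content):
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as temp_file:
            temp_file.write(content)
        os.replace(temp_path, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.remove(temp_path)         # don't leave temp debris behind
        raise
```

The temp file must live on the same filesystem as the target — that's why it's created in the target's own directory rather than the system temp dir.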
Method | Returns | Memory Usage | Best For | Newlines Included?
file.read() | Single string | Entire file in RAM | Small files, hashing, templates | Yes — as \n in the string
file.readlines() | List of strings | Entire file in RAM | Line-indexed random access | Yes — each string ends with \n
for line in file | One string per iteration | Constant — one chunk at a time | Large files, data processing | Yes — strip() to remove
file.readline() | One line string | Single line in RAM | Reading headers, state machines | Yes — strip() to remove
file.write() | Characters written (int) | Only what you write | Writing single strings | No — you must add \n
file.writelines() | None | Only what you write | Writing a list of strings | No — you must add \n in each item

🎯 Key Takeaways

  • Always use the 'with' statement — it guarantees the file closes even if an exception fires, preventing file descriptor leaks that silently degrade production systems.
  • Mode 'w' destroys existing content the instant you call open() — no confirmation, no recovery. Use 'a' for appending and 'w' only when you explicitly need a clean file.
  • Iterating over a file object line-by-line is O(1) in memory — it's the correct default for any file processing. Only use read() or readlines() when the file is guaranteed to be small.
  • writelines() does not add newlines between items — you must include '\n' in each string yourself. Forgetting this merges all your lines into one long string with no separator.

⚠ Common Mistakes to Avoid

  • Mistake 1: Using 'w' mode instead of 'a' mode for appending — Symptom: every time the script runs, the file only contains content from the latest run; all previous data silently vanishes — Fix: audit every open() call and ask 'do I want to destroy existing content?' If no, change 'w' to 'a'.
  • Mistake 2: Calling file.read() on a large file — Symptom: script works fine on small test files, then crashes with a MemoryError or hangs indefinitely in production when the file is gigabytes in size — Fix: switch to iterating directly over the file object ('for line in file') which streams data in OS-level chunks and keeps memory usage flat.
  • Mistake 3: Forgetting to strip newlines after reading lines — Symptom: string comparisons fail silently, dictionary lookups return None, and printed output has unexpected blank lines — Fix: always call .strip() on lines you read (e.g., 'line.strip()'), or .rstrip('\n') if you want to preserve leading whitespace. This is especially critical when using readline values as dictionary keys.

Interview Questions on This Topic

  • Q: What is the difference between opening a file in 'r+' mode and 'w+' mode, and when would you choose one over the other?
  • Q: You have a 50GB log file and need to find all lines containing the string 'ERROR'. Walk me through how you'd write that in Python and explain why your approach is memory-efficient.
  • Q: If an exception is raised inside a 'with open(...)' block, is the file guaranteed to be closed? How does that work under the hood, and how does it compare to wrapping a manual close() in a finally block?

Frequently Asked Questions

What is the difference between read(), readline(), and readlines() in Python?

read() returns the entire file as a single string. readline() returns one line at a time — each call advances the cursor. readlines() returns a list where each element is one line including its newline character. For large files, iterating directly over the file object ('for line in file') is more memory-efficient than all three.

Do I need to close a file in Python if I use the 'with' statement?

No — that's the entire point of using 'with'. The context manager protocol guarantees that __exit__ is called when the block ends, which closes the file handle automatically. You only need an explicit close() if you opened the file without 'with', which you should avoid in production code.

How do I read a file that might not exist yet without getting an error?

Check for the file's existence before opening it using os.path.exists(filepath). Alternatively, wrap the open() call in a try/except block and catch FileNotFoundError specifically — this is actually the more Pythonic approach because it avoids a race condition where the file could be deleted between the check and the open.
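A sketch of that try/except approach — the function name and default value are illustrative:

```python
# EAFP style: attempt the open and handle the specific failure, which
# avoids the check-then-open race condition entirely.
def load_settings(path):
    try:
        with open(path, "r") as settings_file:
            return settings_file.read()
    except FileNotFoundError:
        return ""  # sensible default when the file hasn't been created yet
```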

TheCodeForge Editorial Team — Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
