Python File I/O — Descriptor Leaks Without `with`
- A bare open() without 'with' leaked 2 fds per rotation cycle; after 42 hours the server hit the OS limit of 1,024 open files and crashed.
- Always use the 'with' statement: it guarantees the file closes on all exit paths, including exceptions and early returns. Every bare open() call in production code is a potential file descriptor leak that accumulates silently until the OS limit is hit and everything fails simultaneously.
- Mode 'w' destroys existing content the instant you call open(): no confirmation, no warning, no recovery. Use 'a' for appending and 'w' only when you explicitly and intentionally need a clean file. When in doubt, use 'a' and verify the behavior is correct before switching to 'w'.
- Iterating over a file object line-by-line is O(1) in memory regardless of file size, which makes it the correct default for any file that might grow in production. Only use read() or readlines() when the file size is bounded, known, and genuinely small.
- Python file I/O uses open() with a mode string: 'r' (read), 'w' (write/destroy existing content), 'a' (append), 'r+' (read+write without truncation)
- Always use the 'with' statement — it guarantees file closure even when exceptions fire, preventing OS-level file descriptor leaks that silently degrade production systems
- Iterate line-by-line with 'for line in file' for O(1) memory usage — never use read() or readlines() on files larger than your available RAM
- Mode 'w' destroys existing content the instant open() is called with no confirmation and no recovery — use 'a' for appending unless you explicitly need a clean slate
- writelines() does NOT add newlines between items — you must include '\n' in each string yourself or all output merges into one unreadable line
- Biggest production mistake: using 'w' mode when you meant 'a' — months of log data vanishes in a single open() call with no error, no warning, and no undo
Python File I/O Debug Cheat Sheet
Process hitting 'Too many open files' error
ls -la /proc/<pid>/fd/ | wc -l
lsof -p <pid> | sort -k9 | head -50
File content appears corrupted, truncated, or partially written after a crash
python3 -c "import os; stat = os.stat('data.txt'); print(f'Size: {stat.st_size} bytes')"
xxd data.txt | tail -10
PermissionError: [Errno 13] Permission denied when writing to a file
ls -la /path/to/file.txt
namei -l /path/to/file.txt
OSError: [Errno 28] No space left on device during a file write
df -h /path/to/mount
du -sh /path/to/directory/* 2>/dev/null | sort -rh | head -10
Production Incident
The root cause was an open() call without a 'with' statement. Each rotation cycle, triggered every five minutes by a cron job, opened the old log file for reading and a new log file for writing. The manual close() calls were placed after a conditional return statement that fired when the old log file was detected as empty. When the rotation correctly identified an empty log, it returned early and skipped both close() calls. Two file handles leaked every five minutes. After approximately 500 rotation cycles, roughly 42 hours of cumulative uptime, the process hit the OS limit of 1,024 open file descriptors and all subsequent file operations failed simultaneously. The server had been leaking silently the entire time with no warning.
The fix replaced all bare open() calls in the log rotation module with 'with' statements. The context manager guarantees __exit__ is called and the file is closed regardless of whether the function returns early, raises an exception, or completes normally. The process ulimit was also raised to 4,096 (ulimit -n 4096) so that alerting has time to catch a future leak before it cascades.
A Prometheus gauge now tracks the process's open file descriptor count against the limit reported by os.sysconf('SC_OPEN_MAX'), with an alert threshold at 80% of the limit so the team gets warning long before the next hard failure. Note that os.sysconf('SC_OPEN_MAX') returns the limit, not the current count; the count has to come from the process's fd table, for example /proc/<pid>/fd.
- Every open() call without a 'with' statement is a potential file descriptor leak: even when close() exists, early returns and exceptions can bypass it entirely, and the OS will not warn you until the limit is hit
- File descriptor leaks are silent by design: the OS does not throttle or warn you as you approach the limit; it simply refuses new descriptors once you cross it, at which point every file operation in the process fails simultaneously
- Monitor open file descriptors in production as a first-class metric: ls -la /proc/<pid>/fd/ | wc -l or lsof -p <pid> | wc -l gives you a count; a count that grows monotonically over hours is a leak
- Raising ulimits proactively for file-heavy services buys time for alerting to catch leaks before they become incidents, but it is not a substitute for fixing the leak
Production Debug Guide
Common symptoms when Python file operations behave unexpectedly in production, ordered by frequency of occurrence
- 'Too many open files' errors: replace every bare open() call with a 'with' statement. The pattern is almost always a code path that returns early or raises an exception before reaching the manual close() call.
- Existing file content disappears: mode 'w' truncates the file the instant open() is called; there is no confirmation and no recovery. Search your codebase for open(filepath, 'w') or open(filepath, 'w+') in any context where you intend to preserve existing content. Change those to 'a' for append-only access.
- Memory grows with file size: the usual cause is file.read() or file.readlines(), both of which load the entire file into RAM before you can process any of it. Switch to line-by-line iteration: for line in file. Python reads the file in buffered chunks and yields one line at a time. Memory usage stays constant at a few kilobytes regardless of whether the file is 50MB or 50GB.
Every real-world application eventually needs to talk to the file system. Whether you're saving user preferences, processing a CSV of sales data, writing application logs, reading a configuration file on startup, or building a data pipeline, file I/O is the plumbing that holds software together. Skip this skill and you're building programs that forget everything the moment they stop running.
Before Python's modern file handling existed, developers had to manually track whether files were open, remember to close them after every operation, and write error-handling boilerplate just to read a single line of text safely. One missed close() call could lock a file for the entire session, corrupt data, or exhaust the operating system's limit on open file descriptors — often after hours of flawless operation, which is the worst possible time to discover the bug.
Python solved this with the context manager pattern — the 'with' statement — which handles cleanup automatically regardless of what goes wrong inside the block. It is not a stylistic preference. It is the difference between code that works in a demo environment and code that survives a production server running for weeks.
By the end of this guide you will know how to confidently open, read, write, and append to files. You will understand exactly which file mode to reach for in each situation, why the 'with' statement is non-negotiable in any code you ship, how to process files that are larger than your available RAM, and how to avoid the silent mistakes that corrupt data and confuse experienced developers who should have known better.
File Modes Explained — Picking the Right Tool Before You Touch the File
Every time you open a file in Python, you are making a contract with the operating system. That contract is defined by the mode string you pass to open(). Get it wrong and you will either overwrite data you meant to keep, get a FileNotFoundError you did not expect, or silently append garbage to a file you thought was clean. There is no undo button and no confirmation prompt.
The four modes you will use in 90% of real work are: 'r' (read only — the file must already exist and you cannot modify it), 'w' (write — creates the file if it does not exist, but destroys all existing content if it does, immediately, with no warning), 'a' (append — adds new content to the end without touching what is already there, and creates the file if it does not exist), and 'r+' (read and write — the file must exist, the cursor starts at position 0, and writing does not truncate existing content — it overwrites bytes at the current cursor position).
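The difference between 'r+' and 'w' is easiest to see on a tiny file. The sketch below is an illustration only: the file name state.txt and the helper demo_r_plus_overwrite are made up for this example, and it works in a temporary directory so nothing real is touched.

```python
import os
import tempfile

def demo_r_plus_overwrite():
    """Contrast in-place overwrite ('r+') with truncation ('w')."""
    # Hypothetical scratch file inside a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "state.txt")

        with open(path, "w") as f:
            f.write("AAAAAAAAAA")      # 10 characters on disk

        with open(path, "r+") as f:
            f.write("BB")              # cursor starts at 0: replaces 2 characters

        with open(path, "r") as f:
            after_r_plus = f.read()

        with open(path, "w") as f:     # the file is emptied right here
            f.write("BB")

        with open(path, "r") as f:
            after_w = f.read()

    return after_r_plus, after_w

print(demo_r_plus_overwrite())  # ('BBAAAAAAAA', 'BB')
```

With 'r+' the remaining eight characters survive; with 'w' they are gone before the first write happens.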
There is also a binary variant for each mode: 'rb', 'wb', 'ab', 'r+b'. Use binary mode when working with images, PDFs, audio files, pickled Python objects, or any data that is not human-readable text. Text mode ('r', 'w') automatically handles newline translation across operating systems: on Windows, '\n' in your Python string becomes '\r\n' on disk. Binary mode bypasses all of that, which is exactly what you need when you are working with raw bytes that must be preserved exactly as-is.
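Newline translation can be observed on any platform by asking for it explicitly. This is a minimal sketch, assuming a scratch file called notes.txt; passing newline="\r\n" to open() forces the Windows-style translation that text mode applies by default on Windows.

```python
import os
import tempfile

def newline_translation_demo():
    """Write in text mode with forced CRLF, then compare binary and text reads."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "notes.txt")   # hypothetical file name

        # newline="\r\n" makes every '\n' written become '\r\n' on disk
        with open(path, "w", newline="\r\n") as f:
            f.write("line one\nline two\n")

        with open(path, "rb") as f:                 # binary: bytes exactly as stored
            raw_bytes = f.read()

        with open(path, "r") as f:                  # text: '\r\n' is translated back to '\n'
            text = f.read()

    return raw_bytes, text

raw_bytes, text = newline_translation_demo()
print(raw_bytes)   # b'line one\r\nline two\r\n'
print(repr(text))  # 'line one\nline two\n'
```

The binary read shows what is really on disk, which is why formats that depend on exact byte offsets should never go through text mode.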
There is also 'x' mode — exclusive creation — which creates a new file but raises FileExistsError if the file already exists. This is useful when you need to guarantee you are creating a fresh file and want the operation to fail rather than silently overwrite something. It is the safe alternative to 'w' in situations where overwriting would be a bug rather than an intended operation.
The single most destructive mistake in Python file I/O is opening a file in 'w' mode when you meant 'a'. Your log file from the last three months? Gone in one open() call. Understand the modes before you write a single open() call — everything else in file I/O builds on top of this foundation.
import os

# --- Demonstrate all primary file modes with a realistic log file scenario ---

log_file_path = "app_events.log"

# MODE 'w': Write mode. Creates the file fresh every time it is called.
# CRITICAL: If app_events.log already existed with content, that content is
# now gone. Python does not ask. It does not warn. It just truncates.
with open(log_file_path, "w") as log_file:
    log_file.write("[INFO] Application started\n")
    log_file.write("[INFO] Loading configuration from /etc/app/config.yml\n")

print("Step 1 — Written initial log entries with 'w' mode.")

# MODE 'a': Append mode. The only safe mode for growing log files.
# Adds to the END of the file without touching any existing content.
# If the file does not exist, it creates it. If it does exist, it adds to it.
# This is what you want for every logging, audit trail, and event recording use case.
with open(log_file_path, "a") as log_file:
    log_file.write("[INFO] User authenticated: alice@example.com\n")
    log_file.write("[WARN] Rate limit approaching for endpoint /api/orders\n")

print("Step 2 — Appended new log entries with 'a' mode. Existing content intact.")

# MODE 'r': Read-only mode. The safest mode for reading.
# Raises FileNotFoundError if the file does not exist, which protects you
# from silently processing an empty or default state.
with open(log_file_path, "r") as log_file:
    full_contents = log_file.read()  # entire file as one string; fine for small files

print("Step 3 — Full log file contents via 'r' mode:")
print(full_contents)

# MODE 'r+': Read + Write. File must exist (no creation), no truncation.
# The cursor starts at position 0. Use when you need to read state
# and then update it within the same file handle.
with open(log_file_path, "r+") as log_file:
    first_line = log_file.readline()  # reads just the first line
    print(f"Step 4 — First log entry: {first_line.strip()}")
    # After a read in text mode, seek explicitly before writing. Without the
    # seek, where the write lands depends on how much the text layer has
    # already buffered, not on where the last line ended.
    log_file.seek(0, os.SEEK_END)
    log_file.write("[DEBUG] r+ mode writes at current cursor position\n")

# MODE 'x': Exclusive creation. Fails if the file already exists.
# Use this when you need to guarantee you are not overwriting an existing file.
# More defensive than 'w' for one-time setup files.
new_lock_path = "process.lock"
try:
    with open(new_lock_path, "x") as lock_file:
        lock_file.write(f"PID: {os.getpid()}\n")
    print("Step 5 — Created process lock file with 'x' mode.")
except FileExistsError:
    print("Step 5 — Lock file already exists — another process may be running.")

# Clean up demo files so this script is safe to re-run
for path in [log_file_path, new_lock_path]:
    if os.path.exists(path):
        os.remove(path)

print("Step 6 — Demo files removed.")
Step 1 — Written initial log entries with 'w' mode.
Step 2 — Appended new log entries with 'a' mode. Existing content intact.
Step 3 — Full log file contents via 'r' mode:
[INFO] Application started
[INFO] Loading configuration from /etc/app/config.yml
[INFO] User authenticated: alice@example.com
[WARN] Rate limit approaching for endpoint /api/orders
Step 4 — First log entry: [INFO] Application started
Step 5 — Created process lock file with 'x' mode.
Step 6 — Demo files removed.
Mode 'w' truncates the file the instant open() is called, before you write a single character. Python does not ask for confirmation, does not back up the existing content, and does not raise any exception. If you opened the wrong file or used the wrong mode, the data is gone. The only safe reflex: audit every open() call before you ship it. If the intent is to add new content without destroying old content (logs, audit trails, accumulated data), the mode must be 'a', not 'w'. Reserve 'w' exclusively for situations where you deliberately want a clean file: regenerating a report from scratch, creating a new configuration on first run, or rewriting a file whose previous state is no longer relevant.
The truncation happens the moment open() is called; there is no recovery path and no warning. A single wrong mode string in a log rotation script, a daily report generator, or an API response writer can silently wipe data that took months to accumulate.
Whenever you write a 'w' mode open() call, immediately ask yourself 'do I want to destroy existing content on every run?' If no, change 'w' to 'a'. If yes, add a comment documenting why 'w' is intentional. That comment serves as a speed bump for the next developer who sees it and instinctively wonders if it is a bug.
The 'with' Statement: Why Every Production File Open Uses It
Here is a scenario that breaks real applications: your code opens a file, starts processing its contents, and then raises an unexpected exception halfway through — maybe a network timeout, maybe a malformed record, maybe a KeyError on a dictionary lookup. If you opened the file with a plain open() call and relied on a manual file.close() at the end of the function, that close() never runs. The file handle stays open, the OS-level resource stays allocated, and the process accumulates open file descriptors with every subsequent error.
On Linux, the default soft limit for open file descriptors per process is typically 1,024, and the macOS default is often lower. That number sounds large until you have a web server handling 200 requests per minute, each of which opens a file without properly closing it on the error path. If every request leaks one descriptor, you hit the limit in under ten minutes and every subsequent file operation in the entire process starts failing.
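The limit and the current descriptor count can both be read from inside the process. The following is a rough sketch for Unix-like systems only: the resource module does not exist on Windows, and the helper names fd_limits and open_fd_count are invented for this example. Listing /proc/self/fd (Linux) or /dev/fd (macOS and BSD) is one common way to count descriptors, and the listing itself briefly uses one descriptor.

```python
import os
import resource  # Unix-only standard library module

def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def open_fd_count():
    """Count this process's open descriptors, or return None if unsupported."""
    for fd_dir in ("/proc/self/fd", "/dev/fd"):   # Linux first, then macOS/BSD
        try:
            return len(os.listdir(fd_dir))
        except OSError:
            continue
    return None

soft_limit, hard_limit = fd_limits()
before = open_fd_count()
leaked_handle = open(os.devnull, "r")   # deliberately left open for the measurement
after = open_fd_count()
leaked_handle.close()

print(f"soft limit: {soft_limit}, hard limit: {hard_limit}")
print(f"open descriptors before: {before}, with one unclosed file: {after}")
```

Sampling open_fd_count() periodically and comparing it with the soft limit is enough to spot a count that only ever grows.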
The 'with' statement solves this with the context manager protocol. When you enter a 'with' block, Python calls the object's __enter__ method. When the block exits — regardless of whether it exits normally, through a return statement, or because an exception was raised — Python calls the object's __exit__ method, which closes the file handle. Guaranteed. Every time.
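The guarantee can be checked directly by leaving a 'with' block in each of the three ways and inspecting file.closed afterwards. This is a small sketch; closed_after and sample.txt are names made up for the demonstration.

```python
import os
import tempfile

def closed_after(exit_style, path):
    """Leave a 'with' block in the requested way and report file.closed."""
    seen = {}

    def body():
        with open(path, "r") as f:
            seen["handle"] = f                 # keep a reference for inspection
            if exit_style == "return":
                return                         # early return from inside the block
            if exit_style == "exception":
                raise ValueError("simulated failure")
            f.read()                           # normal completion

    try:
        body()
    except ValueError:
        pass                                   # the exception left the block; __exit__ already ran
    return seen["handle"].closed

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w") as f:
        f.write("hello\n")
    results = {style: closed_after(style, sample_path)
               for style in ("normal", "return", "exception")}

print(results)  # {'normal': True, 'return': True, 'exception': True}
```

All three paths report a closed file, which is the behavior a manual close() can only match with a try/finally around every open().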
This is not a stylistic nicety or a PEP 8 preference. It is the mechanism that makes the difference between code that works in development and code that survives weeks of continuous operation in production. The production incident at the top of this guide happened because one engineer replaced a 'with' statement with a bare open() and a manual close(), the close() was on the wrong side of an early return, and the server degraded silently over 42 hours before crashing hard.
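The early-return failure mode is easy to reproduce. The code below is a hypothetical reconstruction, not the code from the incident: rotate_leaky and rotate_safe are invented names, and both functions return their file handles purely so the demo can inspect whether they were closed.

```python
import os
import tempfile

def rotate_leaky(old_path, new_path):
    """Manual close() placed after an early return: leaks on empty logs."""
    old_log = open(old_path, "r")
    new_log = open(new_path, "w")
    if os.path.getsize(old_path) == 0:
        return old_log, new_log            # early return: both close() calls are skipped
    new_log.write(old_log.read())
    old_log.close()
    new_log.close()
    return old_log, new_log

def rotate_safe(old_path, new_path):
    """Same logic with 'with': the early return still closes both files."""
    with open(old_path, "r") as old_log, open(new_path, "w") as new_log:
        if os.path.getsize(old_path) == 0:
            return old_log, new_log        # __exit__ runs for both before returning
        new_log.write(old_log.read())
    return old_log, new_log

with tempfile.TemporaryDirectory() as tmp_dir:
    empty_log = os.path.join(tmp_dir, "old.log")
    with open(empty_log, "w"):
        pass                               # create an empty log file

    leaky_handles = rotate_leaky(empty_log, os.path.join(tmp_dir, "new_leaky.log"))
    leaky_closed = [h.closed for h in leaky_handles]
    for handle in leaky_handles:
        handle.close()                     # clean up what the function leaked

    safe_handles = rotate_safe(empty_log, os.path.join(tmp_dir, "new_safe.log"))
    safe_closed = [h.closed for h in safe_handles]

print("leaky:", leaky_closed)  # leaky: [False, False]
print("safe: ", safe_closed)   # safe:  [True, True]
```

Two open handles per call on the empty-log path is exactly the two-descriptors-per-cycle leak described above.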
import os

# First, create a sample config file to work with
config_path = "server_config.txt"
with open(config_path, "w") as config_file:
    config_file.write("host=localhost\n")
    config_file.write("port=8080\n")
    config_file.write("debug=True\n")
    config_file.write("max_connections=100\n")


# ❌ THE RISKY WAY: bare open() with manual close()
# If any line between open() and close() raises an exception,
# close() is skipped. The file handle leaks into the process.
# Under enough load or enough failures, this exhausts the OS fd limit.
def read_config_risky(filepath):
    config_file = open(filepath, "r")   # handle is now open
    raw_content = config_file.read()    # what if the file is unreadable mid-read?
    config_file.close()                 # this line might NEVER execute
    return raw_content


# ✅ THE SAFE WAY: context manager guarantees cleanup on every exit path
# The file closes the instant the 'with' block ends, whether normally or on exception.
# You cannot accidentally leave it open; the protocol enforces closure.
def read_config_safe(filepath):
    with open(filepath, "r") as config_file:   # open() returns the file; 'with' guarantees __exit__
        raw_content = config_file.read()
    # config_file.__exit__ has been called, so the file is closed here.
    # The 'config_file' name is still in scope but using it raises ValueError:
    # 'I/O operation on closed file', a clear error rather than a silent leak
    return raw_content


# ✅ OPENING MULTIPLE FILES IN ONE 'with' STATEMENT
# Both files are guaranteed to close even if one raises an exception mid-operation.
# Cleaner than nesting two 'with' blocks.
def merge_configs(primary_path, override_path, output_path):
    with open(primary_path, "r") as primary, \
         open(override_path, "r") as override, \
         open(output_path, "w") as merged:
        merged.write(primary.read())
        merged.write(override.read())


# ✅ PARSING CONFIG: turning a raw text file into a usable dictionary
# Line-by-line iteration keeps memory usage flat, which matters if config files
# ever grow beyond a few kilobytes (templates, include directives, etc.)
def parse_config(filepath):
    settings = {}
    with open(filepath, "r") as config_file:
        for line in config_file:                     # reads one line at a time
            line = line.strip()                      # removes leading/trailing whitespace and \n
            if not line or line.startswith("#"):     # skip blank lines and comments
                continue
            if "=" in line:
                # maxsplit=1 protects values that themselves contain '=' characters
                # Without it, 'dsn=postgres://host/db?sslmode=require' would split into three parts
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings


# Run the demonstrations
raw = read_config_safe(config_path)
print("Raw file content:")
print(raw)

parsed = parse_config(config_path)
print("Parsed config dictionary:")
for setting_key, setting_value in parsed.items():
    print(f" {setting_key} → {setting_value}")

print()
print("Is config_file closed after 'with' block? True (cannot access it meaningfully)")

# Clean up
os.remove(config_path)
Raw file content:
host=localhost
port=8080
debug=True
max_connections=100
Parsed config dictionary:
host → localhost
port → 8080
debug → True
max_connections → 100
Is config_file closed after 'with' block? True (cannot access it meaningfully)
When several files are opened in one 'with' statement, the files already opened are still closed if a later open() raises an exception. The files close in reverse order of opening (in merge_configs above: merged first, then override, then primary), which is the safe order for copy and merge operations.
A bare open() without 'with' leaks file descriptors on every early return and every unhandled exception, not just on catastrophic failures. A function that returns early when input is invalid, skips close() on that path, and gets called a thousand times per hour will exhaust the OS file descriptor limit in under two hours with no error until the cliff.
Every bare open() call in production code is a potential file descriptor leak. Leaks are silent until the OS limit is hit, at which point the entire process fails simultaneously with no grace period.
A bare open() with a manual close() is a latent leak that only manifests under error conditions or high load.
Reading Strategies: read vs readline vs readlines vs Iteration
Python gives you four distinct ways to read file content, and picking the wrong one for your data size is one of the most common and most avoidable performance mistakes in Python scripts. The good news: the selection rule is simple once you understand what each method actually does.
file.read() pulls the entire file into a single string in memory. It is convenient for small configuration files, templates, and hash calculations, but loading a 2GB log file into a string will consume at least 2GB of RAM and potentially kill the process. read() is correct for files you know are bounded in size: a few megabytes at most.
file.readlines() reads the entire file and returns a list of strings — one string per line, each with its trailing newline character included. It has the same total memory cost as read() because everything loads at once. The advantage is that you get random line access: all_lines[47] gives you line 47 without reading anything else. Use it when you genuinely need index-based line access. Rarely needed in practice.
file.readline() reads exactly one line and advances the cursor. Each call returns the next line. Useful for reading a header row separately, implementing state machines over file content, or when you need fine-grained control over which lines you process. Low overhead per call but verbose for processing entire files.
Iterating over the file object directly — for line in file — is the correct default for almost everything. Python buffers the file in OS-level chunks (typically 8KB) and yields one line at a time. Your memory usage stays flat regardless of whether the file is 10MB or 10GB. This is how you process large files without ever thinking about RAM.
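A streaming scan over a log file might look like the sketch below. The function name count_matching_lines, the file name big.log, and the "[ERROR]" marker are assumptions made for the example; the point is only that a single line is held in memory at a time.

```python
import os
import tempfile

def count_matching_lines(filepath, needle):
    """Count lines containing `needle` without loading the file into memory."""
    matches = 0
    with open(filepath, "r") as log_file:
        for line in log_file:          # one line at a time from the read buffer
            if needle in line:
                matches += 1
    return matches

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "big.log")   # hypothetical log file

    # Build a sample log: every 100th line is an error
    with open(log_path, "w") as log_file:
        for i in range(10_000):
            level = "ERROR" if i % 100 == 0 else "INFO"
            log_file.write(f"[{level}] event number {i}\n")

    error_count = count_matching_lines(log_path, "[ERROR]")

print(error_count)  # 100
```

The same function works unchanged on a file a thousand times larger, because nothing in it depends on the file's total size.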
For writing, file.write() takes a single string and writes it exactly as given, with no automatic newlines added. file.writelines() takes an iterable of strings and writes each one in sequence, also with no automatic newlines added. The writelines() trap is subtle: if you forget to include '\n' in your strings, all your lines are concatenated into one continuous stream with no separators, and the output looks nothing like what you intended.
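The writelines() trap takes only a few lines to reproduce. In this sketch the item list and file names are arbitrary; the second call shows one common fix, a generator expression that appends '\n' to each item.

```python
import os
import tempfile

def writelines_demo():
    """Return file contents written without and with explicit newlines."""
    items = ["alpha", "beta", "gamma"]
    with tempfile.TemporaryDirectory() as tmp_dir:
        merged_path = os.path.join(tmp_dir, "merged.txt")
        separated_path = os.path.join(tmp_dir, "separated.txt")

        with open(merged_path, "w") as f:
            f.writelines(items)                              # no separators are added

        with open(separated_path, "w") as f:
            f.writelines(f"{item}\n" for item in items)      # newline added per item

        with open(merged_path, "r") as f:
            merged = f.read()
        with open(separated_path, "r") as f:
            separated = f.read()

    return merged, separated

merged, separated = writelines_demo()
print(repr(merged))     # 'alphabetagamma'
print(repr(separated))  # 'alpha\nbeta\ngamma\n'
```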
import os

sales_data_path = "quarterly_sales.csv"

# Create a realistic sample CSV file for the demonstrations
with open(sales_data_path, "w") as sales_file:
    sales_file.write("date,product,units_sold,revenue\n")
    sales_file.write("2024-01-15,Widget Pro,120,2400.00\n")
    sales_file.write("2024-01-22,Widget Pro,95,1900.00\n")
    sales_file.write("2024-02-03,Gadget Plus,200,6000.00\n")
    sales_file.write("2024-02-18,Widget Pro,150,3000.00\n")
    sales_file.write("2024-03-07,Gadget Plus,175,5250.00\n")

# STRATEGY 1: read(), entire file as one string
# Use when: small file (<10MB), you need the full content as a string,
# or you are hashing, templating, or comparing entire file contents.
# Avoid when: file could grow, because the whole file is held in RAM at once.
with open(sales_data_path, "r") as sales_file:
    entire_content = sales_file.read()

print("=== Strategy 1: read() ===")
print(f"Type: {type(entire_content).__name__}, Characters: {len(entire_content)}")
print()

# STRATEGY 2: readlines(), list of line strings, newlines included
# Use when: you need random access to lines by index (e.g., 'give me line 3').
# Avoid when: file is large, because the entire file loads into RAM as a list.
with open(sales_data_path, "r") as sales_file:
    all_lines = sales_file.readlines()

print("=== Strategy 2: readlines() ===")
print(f"Type: {type(all_lines).__name__}, Line count: {len(all_lines)}")
# !r prints the repr, so the trailing newline is visible as \n
print(f"Line at index 2 (raw): {all_lines[2]!r}")
print(f"Line at index 2 (clean): '{all_lines[2].strip()}'")
print()


# STRATEGY 3: Line-by-line iteration, the correct default for file processing
# Use when: processing any file that could grow, filtering rows, aggregating data.
# Python reads in buffered chunks internally, so your memory usage is O(1).
# next(file) skips the header row without loading it into a structure you track.
def calculate_total_revenue(filepath):
    total_revenue = 0.0
    with open(filepath, "r") as sales_file:
        next(sales_file)                        # skip the header row cleanly
        for data_line in sales_file:            # reads one line at a time from the buffer
            columns = data_line.strip().split(",")
            total_revenue += float(columns[3])  # index 3 is the 'revenue' column
    return total_revenue


total = calculate_total_revenue(sales_data_path)
print("=== Strategy 3: Line-by-line iteration ===")
print(f"Total revenue across all sales: ${total:,.2f}")
print()

# STRATEGY 4: readline(), one line at a time, explicit control
# Use when: reading a header separately, then processing the rest differently.
with open(sales_data_path, "r") as sales_file:
    header = sales_file.readline().strip()      # reads exactly the first line
    column_names = header.split(",")
    print("=== Strategy 4: readline() for header ===")
    print(f"Columns: {column_names}")
    first_data_line = sales_file.readline().strip()  # cursor is now past line 2
    print(f"First data row: {first_data_line}")
    print()

# writelines() DEMO: no automatic newlines added
# Every string in the list must include '\n' explicitly.
# Forgetting '\n' merges all lines into one continuous string with no separators.
results_path = "revenue_summary.txt"
summary_lines = [
    "=== Q1 2024 Revenue Summary ===\n",   # \n is required; writelines() adds nothing
    f"Total Revenue: ${total:,.2f}\n",
    "Source: quarterly_sales.csv\n",
    "Generated by: io/thecodeforge/files/reading_strategies_demo.py\n",
]

with open(results_path, "w") as results_file:
    results_file.writelines(summary_lines)

with open(results_path, "r") as results_file:
    print("=== writelines() output ===")
    print(results_file.read())

os.remove(sales_data_path)
os.remove(results_path)
=== Strategy 1: read() ===
Type: str, Characters: 203
=== Strategy 2: readlines() ===
Type: list, Line count: 6
Line at index 2 (raw): '2024-01-22,Widget Pro,95,1900.00\n'
Line at index 2 (clean): '2024-01-22,Widget Pro,95,1900.00'
=== Strategy 3: Line-by-line iteration ===
Total revenue across all sales: $18,550.00
=== Strategy 4: readline() for header ===
Columns: ['date', 'product', 'units_sold', 'revenue']
First data row: 2024-01-15,Widget Pro,120,2400.00
=== writelines() output ===
=== Q1 2024 Revenue Summary ===
Total Revenue: $18,550.00
Source: quarterly_sales.csv
Generated by: io/thecodeforge/files/reading_strategies_demo.py
The usual culprits are file.read() and file.readlines(), which load the entire file into RAM before you can process any of it. This answer demonstrates that you understand the distinction between loading data and streaming data, which is fundamental to building production data pipelines.
file.read() and file.readlines() are O(n) in memory where n is file size: a 5GB file needs 5GB of RAM before you process a single record. Line-by-line iteration is O(1) in memory regardless of file size.
If a file might grow, do not use read() or readlines(). Use line-by-line iteration. This is not a premature optimization; it is the difference between a script that works on your laptop and one that works in a container with 512MB of memory.
file.read() and file.readlines() are convenience methods for small, bounded files only; they load the entire file into RAM before you process any of it.
- file.read(): simple, fast, appropriate when the file size is bounded and known
- file.readlines(): returns a list you can index into, but loads the entire file into RAM
- file.readline() for the header, then switch to 'for line in file' for the body
- file.writelines(): include '\n' in each string explicitly; writelines() adds nothing between items
Real-World Pattern: Building a Persistent Task Manager with File I/O
Reading and writing individual lines is one thing. Putting it together into a coherent application that correctly handles all the edge cases is what separates tutorial knowledge from practical production skill. Let's build a minimal persistent task manager — one that saves tasks to a file, loads them correctly on startup, marks them complete, and never loses data between runs or between failures.
This exact pattern appears throughout production codebases: shopping cart persistence, user preference files, application state caches, CI pipeline checkpoint files, and configuration management tools. The core loop is always the same — load state from disk at startup, modify in memory, write back to disk when state changes.
Two deliberate design decisions in this implementation are worth understanding. First, we use 'a' mode for adding tasks — it is non-destructive and safe to call concurrently or repeatedly. Second, we use 'w' mode when marking a task complete, because there is no efficient way to delete or modify a line in the middle of a file without rewriting it. The read-modify-write pattern — load all records into memory, change what needs changing, write everything back — is the standard approach for file-based persistence with small-to-medium datasets.
For production deployments where the file could be large or where a crash mid-write would be unacceptable, the safe extension of this pattern is to write to a temporary file first, verify the write succeeded, and then use os.replace() to atomically swap the temporary file into place. This guarantees you never end up with a half-written, corrupted file — the swap is atomic at the OS level.
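A slightly more defensive version of the temp-file-then-swap idea is sketched below. It is one possible shape, not a definitive implementation: the helper name atomic_write_text is hypothetical, the temporary file is created in the target's own directory so the rename stays on one filesystem, and os.fsync() is added on the assumption that you want the bytes on disk before the swap.

```python
import os
import tempfile

def atomic_write_text(target_path, text):
    """Write `text` to a temp file, then swap it into place with os.replace()."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # mkstemp creates the temp file next to the target and returns an open fd
    fd, temp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as temp_file:
            temp_file.write(text)
            temp_file.flush()
            os.fsync(temp_file.fileno())    # push the data to disk before the swap
        os.replace(temp_path, target_path)  # old file or new file, never a mix
    except BaseException:
        # Any failure before the swap: remove the temp file, leave the target untouched
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "tasks.txt")   # hypothetical target file
    atomic_write_text(target, "v1\n")
    atomic_write_text(target, "v2\n")
    with open(target, "r") as f:
        final_text = f.read()
    leftovers = [name for name in os.listdir(tmp_dir) if name.endswith(".tmp")]

print(repr(final_text), leftovers)  # 'v2\n' []
```

One trade-off to be aware of: mkstemp creates the file with owner-only permissions, so the replaced file may end up with different permissions than the original unless you reset them.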
import os
from datetime import datetime

TASKS_FILE = "my_tasks.txt"
COMPLETED_MARKER = "[DONE]"
PENDING_MARKER = "[TODO]"


def load_tasks(filepath):
    """
    Read all tasks from disk. Returns an empty list if the file does not exist.
    First run behavior: no file means no tasks, a perfectly valid state.
    Using os.path.exists() rather than try/except here because we need to
    distinguish 'file does not exist' from 'file exists but cannot be read'.
    """
    if not os.path.exists(filepath):
        return []

    tasks = []
    with open(filepath, "r") as task_file:
        for raw_line in task_file:      # line-by-line: memory stays flat
            stripped = raw_line.strip()
            if stripped:                # skip any blank lines
                tasks.append(stripped)
    return tasks


def add_task(filepath, task_description):
    """
    Append a new task to the file.
    Uses 'a' mode: no existing content is touched regardless of what happens.
    Safe to call concurrently from multiple processes (though not transactionally safe).
    """
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    task_entry = f"{PENDING_MARKER} [{timestamp}] {task_description}\n"
    with open(filepath, "a") as task_file:   # 'a' is the only correct mode here
        task_file.write(task_entry)
    print(f" Added: '{task_description}'")


def complete_task(filepath, task_index):
    """
    Mark a task as complete. Requires rewriting the entire file because there
    is no efficient mechanism to modify a single line in place.
    Production-safe variant: write to a .tmp file first, then os.replace() to
    atomically swap it in. This prevents partial writes from corrupting the
    task file if the process is killed mid-write.
    """
    all_tasks = load_tasks(filepath)

    if task_index < 0 or task_index >= len(all_tasks):
        print(f" No task at index {task_index}. Valid range: 0 to {len(all_tasks) - 1}")
        return

    target_task = all_tasks[task_index]
    if COMPLETED_MARKER in target_task:
        print(f" Task {task_index} is already marked complete.")
        return

    # Update the marker in memory
    all_tasks[task_index] = target_task.replace(PENDING_MARKER, COMPLETED_MARKER, 1)

    # Atomic write pattern: safe against crashes mid-write
    temp_path = filepath + ".tmp"
    with open(temp_path, "w") as temp_file:   # write to temp first
        for updated_task in all_tasks:
            temp_file.write(updated_task + "\n")

    # os.replace() is atomic at the OS level: the tasks file is either the
    # old version or the new version; it is never half-written
    os.replace(temp_path, filepath)
    print(f" Marked task {task_index} as complete.")


def display_tasks(filepath):
    """
    Print all tasks with their index so the user knows what to pass to complete_task.
    Includes summary statistics to make the display more useful.
    """
    all_tasks = load_tasks(filepath)
    if not all_tasks:
        print(" No tasks yet. Add some with add_task().")
        return

    done = sum(1 for t in all_tasks if COMPLETED_MARKER in t)
    pending = len(all_tasks) - done
    print(f" Tasks ({done} complete, {pending} pending):")
    for index, task_line in enumerate(all_tasks):
        print(f" [{index}] {task_line}")


# --- Simulated session ---
print("--- Adding tasks ---")
add_task(TASKS_FILE, "Write unit tests for the payment module")
add_task(TASKS_FILE, "Review PR #47 from the backend team")
add_task(TASKS_FILE, "Update README with new API endpoints")
add_task(TASKS_FILE, "Deploy hotfix to staging environment")

print("\n--- Current task list ---")
display_tasks(TASKS_FILE)

print("\n--- Completing tasks 1 and 3 ---")
complete_task(TASKS_FILE, 1)
complete_task(TASKS_FILE, 3)

print("\n--- Updated task list ---")
display_tasks(TASKS_FILE)

print("\n--- Testing edge cases ---")
complete_task(TASKS_FILE, 1)    # already done
complete_task(TASKS_FILE, 99)   # invalid index

# Clean up demo files
for path in [TASKS_FILE, TASKS_FILE + ".tmp"]:
    if os.path.exists(path):
        os.remove(path)

print("\n--- Demo complete. All demo files removed. ---")
Added: 'Write unit tests for the payment module'
Added: 'Review PR #47 from the backend team'
Added: 'Update README with new API endpoints'
Added: 'Deploy hotfix to staging environment'
--- Current task list ---
Tasks (0 complete, 4 pending):
[0] [TODO] [2024-06-10 14:23] Write unit tests for the payment module
[1] [TODO] [2024-06-10 14:23] Review PR #47 from the backend team
[2] [TODO] [2024-06-10 14:23] Update README with new API endpoints
[3] [TODO] [2024-06-10 14:23] Deploy hotfix to staging environment
--- Completing tasks 1 and 3 ---
Marked task 1 as complete.
Marked task 3 as complete.
--- Updated task list ---
Tasks (2 complete, 2 pending):
[0] [TODO] [2024-06-10 14:23] Write unit tests for the payment module
[1] [DONE] [2024-06-10 14:23] Review PR #47 from the backend team
[2] [TODO] [2024-06-10 14:23] Update README with new API endpoints
[3] [DONE] [2024-06-10 14:23] Deploy hotfix to staging environment
--- Testing edge cases ---
Task 1 is already marked complete.
No task at index 99. Valid range: 0 to 3
--- Demo complete. All demo files removed. ---
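
The write-then-swap step inside complete_task() can be factored into a reusable helper. What follows is a minimal sketch rather than part of the original script: the name `atomic_write_lines` is hypothetical, and it assumes the temporary file should be created in the same directory as the target so the rename never crosses a filesystem boundary. It also flushes and fsyncs before the swap, a step the demo above does not perform.

```python
import os
import tempfile


def atomic_write_lines(filepath, lines):
    """Write lines to filepath via a temp file plus os.replace() (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as temp_file:
            for line in lines:
                temp_file.write(line + "\n")
            temp_file.flush()
            os.fsync(temp_file.fileno())  # push the data to disk before the swap
        os.replace(temp_path, filepath)
    except BaseException:
        # The target is untouched at this point; remove the partial temp file.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


# Demo in a throwaway directory so no real files are affected.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "tasks.txt")
    atomic_write_lines(demo_path, ["[TODO] first", "[DONE] second"])
    with open(demo_path, "r") as f:
        demo_contents = f.read()
    leftover_files = os.listdir(demo_dir)

print(repr(demo_contents))
print(leftover_files)
```

One design note: tempfile.mkstemp() creates the file readable and writable only by the current user, so the swapped-in file inherits those permissions. If the target needs different permissions, set them on the temp file before calling os.replace().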
os.replace() is atomic: the target file is guaranteed to be either the old version or the new version, never a partial write. This is not over-engineering; it is the standard practice for any file that contains data you cannot afford to lose.

The os.replace() pattern adds crash safety with three lines of additional code. Write to a .tmp file. If that succeeds, call os.replace(). If the process dies before os.replace() runs, the target file is untouched and you still have the old data. If it dies after os.replace() runs, the target has the new data. There is no window where the file is partially written. The guarantee depends on the temporary file living on the same filesystem as the target, which is why the demo creates it alongside the original. For any file you care about in production, this is the correct implementation.

Write to a temporary file, then call os.replace() for an atomic swap. This eliminates the risk of half-written corrupt files from process crashes during the write step.

| Method | Returns | Memory Profile | Best For | Newlines Handled? |
|---|---|---|---|---|
| file.read() | Single string containing entire file | Entire file in RAM — O(n) where n is file size | Small bounded files, hashing content, template rendering, comparing full file contents | Yes — '\n' characters are included in the string as-is |
| file.readlines() | List of strings, one per line | Entire file in RAM — same cost as read() | When you need random access to lines by index: all_lines[47] | Yes — each string in the list ends with '\n'; call .strip() to remove |
| for line in file | One string per iteration | Constant — OS buffer size regardless of file size | Any file that could grow: logs, CSVs, exports, data pipelines — use this by default | Yes — trailing '\n' included; call .strip() or .rstrip('\n') per line |
| file.readline() | One line as a string | Single line in RAM | Reading headers separately, state machines, mixed-read patterns | Yes — trailing '\n' included; call .strip() to remove |
| file.write(s) | Integer — number of characters written | Only the string you pass — O(1) relative to file size | Writing individual strings with explicit control over content and newlines | No — you must add '\n' explicitly when you want a new line |
| file.writelines(iterable) | None | Only what you pass — iterable consumed lazily | Writing a list or generator of strings; efficient for batch writes | No — you must include '\n' in each string; writelines() adds nothing between items |
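
The two write rows of the table are easy to verify directly. The snippet below is a small illustrative sketch (the file name and sample strings are made up for the demo) showing that writelines() adds no separators, that a generator can supply the newlines, and that write() returns a character count.

```python
import os
import tempfile

lines = ["alpha", "beta", "gamma"]

with tempfile.TemporaryDirectory() as demo_dir:
    path = os.path.join(demo_dir, "demo.txt")

    # writelines() writes each string as-is: nothing is added between items.
    with open(path, "w") as f:
        f.writelines(lines)
    with open(path, "r") as f:
        merged = f.read()

    # Supplying the newline yourself, here via a generator, gives one item per line.
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)
    with open(path, "r") as f:
        separated = f.read()

    # write() returns the number of characters written.
    with open(path, "w") as f:
        chars_written = f.write("hello\n")

print(repr(merged))       # 'alphabetagamma'
print(repr(separated))    # 'alpha\nbeta\ngamma\n'
print(chars_written)      # 6
```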
🎯 Key Takeaways
- Always use the 'with' statement: it guarantees the file closes on every exit path, including exceptions and early returns. Every bare open() call in production code is a potential file descriptor leak that accumulates silently until the OS hard limit is hit and everything fails simultaneously.
- Mode 'w' destroys existing content the instant you call open(): no confirmation, no warning, no recovery. Use 'a' for appending and 'w' only when you explicitly and intentionally need a clean file. When in doubt, use 'a' and verify the behavior is correct before switching to 'w'.
- Iterating over a file object line-by-line keeps memory flat regardless of file size (usage is bounded by the longest line, not by the file). It is the correct default for any file that might grow in production. Only use read() or readlines() when the file size is bounded, known, and genuinely small.
- writelines() does not add newlines between items, so you must include '\n' in each string yourself. Forgetting this merges all your output into one continuous line with no separators. The problem is easy to spot if you inspect the output during testing, but it is often discovered only in production when downstream parsing fails.
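
The difference between 'a' and 'w' from the takeaways above can be observed in a few lines. This is an illustrative sketch using a throwaway file (the name `app.log` and its contents are made up); it checks the file size while the 'w' handle is still open to show that truncation happens at open(), before any write.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    log_path = os.path.join(demo_dir, "app.log")

    with open(log_path, "w") as f:
        f.write("first entry\n")

    # 'a' keeps existing content and writes at the end.
    with open(log_path, "a") as f:
        f.write("second entry\n")
    with open(log_path, "r") as f:
        after_append = f.read()

    # 'w' truncates as soon as open() returns, before any write() call.
    with open(log_path, "w") as f:
        size_inside_block = os.path.getsize(log_path)
    with open(log_path, "r") as f:
        after_truncate = f.read()

print(repr(after_append))     # 'first entry\nsecond entry\n'
print(size_inside_block)      # 0
print(repr(after_truncate))   # ''
```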
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- Q: What is the difference between opening a file in 'r+' mode and 'w+' mode, and when would you choose one over the other? (Mid-level)
- Q: You have a 50GB log file and need to find all lines containing the string 'ERROR'. Walk me through how you would write that in Python and explain why your approach is memory-efficient. (Mid-level)
- Q: If an exception is raised inside a 'with open(...)' block, is the file guaranteed to be closed? How does the context manager protocol work under the hood? (Junior)
- Q: You are building a data pipeline that processes a 10GB CSV, transforms each row, and writes results to a new file. How do you handle the case where the pipeline crashes halfway through, and you need to resume without reprocessing rows you already wrote? (Senior)
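
For the large-log question above, one possible answer is to stream the file and yield matches as they are found. The sketch below is only one way to approach it: `find_matching_lines` is a hypothetical name, the UTF-8 encoding with `errors="replace"` is an assumption about the log format, and a four-line temporary file stands in for the 50GB log.

```python
import os
import tempfile


def find_matching_lines(filepath, needle="ERROR"):
    """Yield (line_number, line) for lines containing needle, one line at a time."""
    # Assumed encoding; errors="replace" keeps a stray bad byte from aborting the scan.
    with open(filepath, "r", encoding="utf-8", errors="replace") as log_file:
        for line_number, line in enumerate(log_file, start=1):
            if needle in line:
                yield line_number, line.rstrip("\n")


with tempfile.TemporaryDirectory() as demo_dir:
    log_path = os.path.join(demo_dir, "app.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    # list() is used only because the demo file is tiny; a real scan would
    # consume the generator lazily so matches never pile up in memory.
    matches = list(find_matching_lines(log_path))

print(matches)   # [(2, 'ERROR disk full'), (4, 'ERROR timeout')]
```

Because the function is a generator, only the current line is held in memory at any moment, which is the property the question is probing for.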
Frequently Asked Questions
What is the difference between read(), readline(), and readlines() in Python?
file.read() returns the entire file as a single string — one call, everything in memory. file.readline() returns exactly one line each time it is called, advancing the cursor forward; call it repeatedly to step through the file. file.readlines() returns a list where each element is one line string including its trailing newline character — the entire file loaded into a list at once.
For large files, none of these three is the right default. Iterating directly over the file object — 'for line in file' — reads data in OS-level chunks and yields one decoded line at a time, keeping memory usage flat regardless of file size. Use read() for small bounded files, readline() for state-machine-style reading, readlines() when you need line-index access, and direct iteration for everything else.
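
The four reading styles described above can be compared side by side on a small file. This is an illustrative sketch with made-up sample content; the last block shows the mixed pattern of reading a header with readline() and then iterating over the remaining lines.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    path = os.path.join(demo_dir, "sample.txt")
    with open(path, "w") as f:
        f.write("header\nrow 1\nrow 2\n")

    with open(path, "r") as f:
        whole_text = f.read()              # entire file as one string

    with open(path, "r") as f:
        first_line = f.readline()          # one line; the cursor advances
        second_line = f.readline()

    with open(path, "r") as f:
        all_lines = f.readlines()          # list of lines, newlines kept

    with open(path, "r") as f:
        header = f.readline().rstrip("\n")          # read the header first,
        rows = [line.rstrip("\n") for line in f]    # then iterate over the rest

print(repr(whole_text))    # 'header\nrow 1\nrow 2\n'
print(repr(first_line))    # 'header\n'
print(all_lines)           # ['header\n', 'row 1\n', 'row 2\n']
print(header, rows)        # header ['row 1', 'row 2']
```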
Do I need to close a file in Python if I use the 'with' statement?
No — that is the entire point of the 'with' statement. The context manager protocol guarantees that the file's __exit__ method is called when the block ends, which closes the file handle automatically. You do not need an explicit close() call and should not add one, as it would be redundant.
The file handle is closed the moment execution leaves the 'with' block — whether it exits normally, returns early, or raises an exception. Attempting to use the file handle after the 'with' block ends will raise ValueError: I/O operation on closed file, which is the correct and safe behavior.
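
The closing behavior described above can be checked with a deliberately failing block. In this small sketch the RuntimeError is simulated purely for the demonstration, and the file is a throwaway created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    path = os.path.join(demo_dir, "demo.txt")
    with open(path, "w") as f:
        f.write("data\n")

    # Raise inside the block: __exit__ still runs and closes the file.
    try:
        with open(path, "r") as handle:
            raise RuntimeError("simulated failure inside the block")
    except RuntimeError:
        pass
    closed_after_exception = handle.closed

    # Using the handle after the block fails with ValueError.
    try:
        handle.read()
        error_message = None
    except ValueError as exc:
        error_message = str(exc)

print(closed_after_exception)   # True
print(error_message)
```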
How do I read a file that might not exist yet without getting an error?
Two approaches, each with a distinct advantage. First, check existence before opening: if os.path.exists(filepath): with open(filepath, 'r') as f: ... This is clear and readable but has a theoretical race condition — the file could be deleted between the check and the open().
Second, use try/except and catch FileNotFoundError specifically:
```python
try:
    with open(filepath, 'r') as f:
        content = f.read()
except FileNotFoundError:
    content = ''  # or return a default value
```
This is the more Pythonic approach (EAFP — Easier to Ask Forgiveness than Permission) and eliminates the race condition. It also makes the 'file does not exist' case explicitly handled rather than silently bypassed. For production code that reads optional configuration files or state files, the try/except pattern is preferred.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.