Python File Handling Explained — Read, Write, Append and Real-World Patterns
Every meaningful program eventually needs to talk to the outside world — not just the screen, but actual storage. Log files, configuration files, CSV exports, user-uploaded text, cached API responses — all of it lives on disk as files. If your Python script can't read or write files, it's essentially a calculator that resets every time you switch it off. File handling is the bridge between your running program and data that survives after the program exits.
The problem Python file handling solves is deceptively simple: how do you safely open a resource, use it, and guarantee it gets released — even when something goes wrong? Languages that don't enforce this leave files locked open, causing data corruption, permission errors, and crashes that only appear under load. Python's answer is the context manager (the with statement), which makes safe file handling the path of least resistance rather than an afterthought.
By the end of this article you'll know how to open files for reading, writing, and appending; understand exactly what happens behind the scenes when you do; handle errors like a professional; work with both text and binary files; and recognise the two or three patterns that cover 95% of real-world file work. You'll also know the mistakes that trip up even experienced developers — so you can skip straight past them.
Opening and Reading Files — and Why the 'with' Statement Exists
Before you can do anything with a file, Python needs a file object — a live connection to that file on disk. You create one with the built-in open() function. The two most important arguments are the file path and the mode: 'r' for read, 'w' for write, 'a' for append, and 'b' tacked on for binary (e.g., 'rb').
Here's the thing most tutorials gloss over: open() acquires an operating-system resource. The OS gives your process a file descriptor — a numbered slot in a limited table. If you open files without closing them, you eventually exhaust that table and get a Too many open files error. Worse, unflushed writes may never reach disk.
The with statement solves this by acting as a guaranteed cleanup mechanism. It calls file.close() the instant the indented block exits — whether that exit is normal, via return, or via a crashing exception. Use with open(...) by default, every time. The manual open() / close() pattern survives mostly for the rare case where a file object must outlive a single block (a long-lived log handle stored on an object, say) — and even then you should pair it with try/finally or a context-manager helper rather than a bare close() call.
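To see what that guarantee actually buys you, here is a simplified sketch of what a with block is roughly equivalent to (the real protocol calls the file object's __enter__ and __exit__ methods; the file name here is illustrative):

```python
# Roughly what 'with open(...)' guarantees, spelled out by hand:
f = open("example.txt", "w", encoding="utf-8")
try:
    f.write("hello\n")   # the body of the would-be with-block
finally:
    f.close()            # runs even if write() raised an exception

# The equivalent context-manager form — same guarantee, less code:
with open("example.txt", "a", encoding="utf-8") as f:
    f.write("world\n")

# Both writes reached disk because close() flushed the buffers
with open("example.txt", "r", encoding="utf-8") as f:
    print(f.read())
```

The try/finally version works, but it puts the burden of remembering cleanup on every caller; the with form makes forgetting impossible.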
Reading has three flavours: read() loads the entire file into one string (fine for small files, dangerous for large ones), readline() fetches one line at a time, and readlines() returns a list of all lines. For most real work, iterating the file object directly is the cleanest and most memory-efficient approach — Python streams one line at a time without loading everything.
```python
# Scenario: parse a server log file and count how many lines are ERROR level

log_file_path = "server.log"

# Create a sample log file to work with so this script is self-contained
with open(log_file_path, "w", encoding="utf-8") as log_file:
    log_file.write("INFO 2024-06-01 08:00:01 Server started\n")
    log_file.write("INFO 2024-06-01 08:01:15 Request received from 192.168.1.10\n")
    log_file.write("ERROR 2024-06-01 08:01:16 Database connection timeout\n")
    log_file.write("INFO 2024-06-01 08:02:00 Retrying connection\n")
    log_file.write("ERROR 2024-06-01 08:02:05 Max retries exceeded\n")
    log_file.write("INFO 2024-06-01 08:02:06 Falling back to cache\n")

error_count = 0
error_lines = []

# The 'with' block guarantees the file is closed when we're done,
# even if an exception is raised inside the block.
with open(log_file_path, "r", encoding="utf-8") as log_file:
    # Iterating the file object directly reads ONE line at a time —
    # this works correctly even for a 10 GB log file because Python
    # never loads the whole thing into memory at once.
    for line in log_file:
        stripped_line = line.strip()  # remove trailing newline characters
        if stripped_line.startswith("ERROR"):
            error_count += 1
            error_lines.append(stripped_line)

print(f"Total ERROR lines found: {error_count}")
print("\nError details:")
for error in error_lines:
    print(f"  -> {error}")
```
```text
Total ERROR lines found: 2

Error details:
  -> ERROR 2024-06-01 08:01:16 Database connection timeout
  -> ERROR 2024-06-01 08:02:05 Max retries exceeded
```
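For completeness, here are the three reading flavours side by side against direct iteration. This is a minimal sketch using a small throwaway file of our own invention:

```python
sample_path = "flavours_demo.txt"
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")

# read(): the whole file as one string
with open(sample_path, encoding="utf-8") as f:
    whole = f.read()

# readline(): one line per call, cursor advances each time
with open(sample_path, encoding="utf-8") as f:
    first = f.readline()   # 'alpha\n'
    second = f.readline()  # 'beta\n'

# readlines(): a list of every line, trailing newlines kept
with open(sample_path, encoding="utf-8") as f:
    all_lines = f.readlines()

# Direct iteration: streams lazily — preferred for large files
with open(sample_path, encoding="utf-8") as f:
    stripped = [line.strip() for line in f]

print(repr(whole))
print(all_lines)
print(stripped)
```

Note that readline() and readlines() keep the trailing '\n' on each line, which is why .strip() shows up so often in file-reading loops.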
Writing and Appending — Understanding Why 'w' Mode Is a Trap
Writing to a file sounds simple — and it is, once you understand the critical difference between 'w' (write) and 'a' (append) mode, because confusing them is one of the most common ways to destroy your own data.
Opening a file in 'w' mode does two things: it creates the file if it doesn't exist, and it truncates the file to zero bytes if it does exist — immediately, before you write a single character. That means opening an existing file with 'w' and then crashing before writing anything leaves you with an empty file. The original content is gone.
Append mode ('a') is safer for ongoing data: the file is created if absent, but if it exists, the write cursor starts at the very end. Existing content is untouched. This is exactly what you want for log files, audit trails, or any data you're accumulating over time.
For writing structured data — think generating a report or saving configuration — 'w' is correct because you intentionally want a fresh file each run. For recording events as they happen over time, 'a' is correct. Choosing wrong corrupts data silently, which is why understanding the WHY matters more than memorising the letters.
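The truncation behaviour is easy to demonstrate. A short sketch (file name is illustrative) showing that 'w' destroys content at open() time, before any write, while 'a' preserves it:

```python
path = "truncation_demo.txt"

with open(path, "w", encoding="utf-8") as f:
    f.write("precious data\n")

# Opening in 'w' truncates IMMEDIATELY — even though we write nothing here,
# the old content is already gone the moment open() returns.
with open(path, "w", encoding="utf-8") as f:
    pass

with open(path, encoding="utf-8") as f:
    print(repr(f.read()))   # the file is now empty

# 'a' mode preserves existing content and positions the cursor at the end
with open(path, "a", encoding="utf-8") as f:
    f.write("entry 1\n")
with open(path, "a", encoding="utf-8") as f:
    f.write("entry 2\n")

with open(path, encoding="utf-8") as f:
    print(f.read())   # both entries survive across separate opens
```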
write() takes a single string. Unlike print(), it does not add a newline automatically — you must include '\n' yourself. writelines() accepts an iterable of strings and also skips automatic newlines, so each string you pass must already end with '\n' if you want line breaks.
```python
import datetime

activity_log_path = "user_activity.log"


def log_activity(username: str, action: str) -> None:
    """Append a timestamped activity record to the log file.

    We use 'a' mode so that every call adds to the existing log rather
    than overwriting it. This function is safe to call from multiple
    places in a long-running application.
    """
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Build the full log entry as a single string with its own newline
    log_entry = f"[{timestamp}] USER={username} ACTION={action}\n"
    with open(activity_log_path, "a", encoding="utf-8") as activity_log:
        activity_log.write(log_entry)  # 'a' mode: cursor is always at end of file


def generate_daily_report(report_date: str, summary_lines: list[str]) -> None:
    """Write a fresh daily report, replacing any previous report for the same name.

    We intentionally use 'w' mode here because the report is always
    regenerated from scratch — old content should not carry over.
    """
    report_path = f"report_{report_date}.txt"
    header = f"=== Daily Report for {report_date} ===\n\n"
    with open(report_path, "w", encoding="utf-8") as report_file:
        report_file.write(header)
        # writelines() writes each string as-is — no automatic newlines added,
        # so we ensure each summary line already ends with '\n'
        report_file.writelines(line + "\n" for line in summary_lines)
    print(f"Report saved to {report_path}")


# Simulate three user events arriving over time
log_activity("alice", "LOGIN")
log_activity("bob", "VIEWED_DASHBOARD")
log_activity("alice", "EXPORTED_CSV")

# Read back the log to confirm all three entries were preserved
print("--- Current activity log ---")
with open(activity_log_path, "r", encoding="utf-8") as activity_log:
    print(activity_log.read())

# Generate a report (uses 'w' mode — intentionally fresh each time)
generate_daily_report(
    report_date="2024-06-01",
    summary_lines=[
        "Total logins: 1",
        "Total page views: 1",
        "Total exports: 1",
    ],
)
```
```text
--- Current activity log ---
[2024-06-01 09:15:01] USER=alice ACTION=LOGIN
[2024-06-01 09:15:02] USER=bob ACTION=VIEWED_DASHBOARD
[2024-06-01 09:15:03] USER=alice ACTION=EXPORTED_CSV

Report saved to report_2024-06-01.txt
```
Error Handling and File Existence — Writing Code That Doesn't Embarrass You in Production
A file operation in production will eventually fail. The path doesn't exist, the disk is full, the process doesn't have permission, the file is locked by another process. Ignoring this is fine in a throwaway script and a fireable offence in production code.
Python raises specific exceptions for file errors. FileNotFoundError fires when you try to read a file that doesn't exist. PermissionError fires when the OS blocks access. IsADirectoryError fires when you accidentally pass a directory path where a file path was expected. All three are subclasses of OSError, so catching OSError covers all of them — but catching the specific type gives you better error messages.
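That hierarchy can be sketched in a few lines. The helper name describe_open_failure and the demo paths below are ours, purely illustrative — the point is the ordering: specific subclasses first, OSError as the catch-all parent:

```python
import os


def describe_open_failure(path):
    """Try to open a path for reading and report which failure occurred.

    Illustrative helper — the specific except clauses must come BEFORE
    the OSError clause, because OSError would match all of them first.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f"ok: {len(f.read())} chars"
    except FileNotFoundError:
        return "missing: no such file"
    except IsADirectoryError:
        return "directory: expected a file path"
    except PermissionError:
        return "forbidden: OS denied access"
    except OSError as err:  # parent class catches anything else (disk full, ...)
        return f"os error: {err}"


os.makedirs("demo_dir", exist_ok=True)
print(describe_open_failure("no_such_file.txt"))  # 'missing: no such file'
# Opening a directory raises IsADirectoryError on POSIX systems;
# on Windows this surfaces as PermissionError instead.
print(describe_open_failure("demo_dir"))
```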
A common anti-pattern is checking os.path.exists() before opening a file. This looks safe but it's a race condition: between your check and your open(), another process can delete or create that file. The correct pattern is EAFP (Easier to Ask Forgiveness than Permission) — just try to open it and handle the exception. Python was designed around this idiom.
For reading a file that might not exist yet (like a config file on first run), a clean pattern is to catch FileNotFoundError and return a sensible default rather than crashing. For writing, you often want the opposite: verify the destination directory exists and create it if not, using pathlib.Path.mkdir(parents=True, exist_ok=True) before writing.
```python
import json
from pathlib import Path

DEFAULT_CONFIG = {
    "theme": "dark",
    "language": "en",
    "auto_save_interval_seconds": 30,
}

CONFIG_PATH = Path("app_data") / "config.json"


def load_config() -> dict:
    """Load config from disk. Returns defaults if the file doesn't exist yet.

    Uses EAFP style: try the operation, handle the specific failure.
    This avoids the race condition that os.path.exists() creates.
    """
    try:
        with open(CONFIG_PATH, "r", encoding="utf-8") as config_file:
            loaded_config = json.load(config_file)
        print(f"Config loaded from {CONFIG_PATH}")
        return loaded_config
    except FileNotFoundError:
        # This is expected on first run — not an error, just a first boot
        print(f"No config file found at {CONFIG_PATH}. Using defaults.")
        return DEFAULT_CONFIG.copy()
    except json.JSONDecodeError as parse_error:
        # The file exists but is malformed — this IS an error worth reporting
        print(f"WARNING: Config file is corrupted ({parse_error}). Using defaults.")
        return DEFAULT_CONFIG.copy()
    except PermissionError:
        # We can't read the file — fail loudly, don't silently use defaults
        raise RuntimeError(
            f"Cannot read config at {CONFIG_PATH}. Check file permissions."
        )


def save_config(config: dict) -> None:
    """Save config to disk, creating the directory structure if it doesn't exist."""
    # mkdir with parents=True creates 'app_data/' if it's missing;
    # exist_ok=True means no error if the directory is already there
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with open(CONFIG_PATH, "w", encoding="utf-8") as config_file:
        # indent=2 makes the JSON human-readable — important for config files
        json.dump(config, config_file, indent=2)
    print(f"Config saved to {CONFIG_PATH}")


# First load — file doesn't exist yet
current_config = load_config()
print(f"Theme: {current_config['theme']}")

# User changes a setting
current_config["theme"] = "light"
save_config(current_config)

# Second load — now reads from disk
current_config = load_config()
print(f"Theme after reload: {current_config['theme']}")
```
```text
No config file found at app_data/config.json. Using defaults.
Theme: dark
Config saved to app_data/config.json
Config loaded from app_data/config.json
Theme after reload: light
```
Binary Files and pathlib — Handling Images, PDFs and Modern Path Management
Not all files are text. Images, PDFs, audio, compiled code, and serialised data are binary — they contain bytes that aren't valid UTF-8 text. Open them in text mode and you'll get a UnicodeDecodeError at best, or silently corrupted data at worst. Binary mode ('rb', 'wb') tells Python to skip all encoding/decoding and work with raw bytes.
A very common real-world task is copying or processing binary files — resizing images, attaching files to emails, or storing uploaded files from a web form. The pattern is identical to text mode but you read bytes objects instead of strings.
For path manipulation, the modern way in Python 3.4+ is pathlib.Path. Forget manual string concatenation and os.path.join() — pathlib lets you build paths with the / operator, check existence with .exists(), get the file extension with .suffix, list directory contents with .iterdir(), and open files directly via path.open(). It's more readable, cross-platform by default, and object-oriented in a way that makes your intent clear.
When you're processing a directory full of files — a common automation task — pathlib with a generator expression is significantly cleaner than os.listdir() combined with string filtering. The pattern Path('data').glob('*.csv') gives you an iterator of all CSV files in the directory, ready to open.
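As a quick tour of the pathlib operations mentioned above — directory and file names here are hypothetical, chosen just for the sketch:

```python
from pathlib import Path

data_dir = Path("pathlib_demo")          # illustrative directory for this sketch
data_dir.mkdir(exist_ok=True)

report = data_dir / "q1" / "sales.csv"   # '/' joins path segments, cross-platform
report.parent.mkdir(parents=True, exist_ok=True)
report.write_text("region,total\nnorth,42\n", encoding="utf-8")

print(report.name)      # 'sales.csv'  — final path component
print(report.suffix)    # '.csv'       — extension including the dot
print(report.stem)      # 'sales'      — name without the extension
print(report.exists())  # True

# glob() matches files in one directory; rglob() recurses into subdirectories
csv_files = sorted(p.name for p in data_dir.rglob("*.csv"))
print(csv_files)        # ['sales.csv']

# Path.open() takes the same mode/encoding arguments as the built-in open()
with report.open("r", encoding="utf-8") as f:
    print(f.readline().strip())   # 'region,total'
```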
```python
from pathlib import Path
import shutil

# Scenario: scan a 'downloads' folder and copy image files into an 'images' archive.
# This pattern works identically on Windows, macOS and Linux because pathlib
# handles the slash vs backslash difference for you automatically.

DOWNLOADS_DIR = Path("sample_downloads")
IMAGES_ARCHIVE_DIR = Path("organised") / "images"

# Create sample directory with mixed file types so the script is self-contained
DOWNLOADS_DIR.mkdir(exist_ok=True)
(DOWNLOADS_DIR / "holiday_photo.jpg").write_bytes(b"\xff\xd8\xff" + b"\x00" * 10)
(DOWNLOADS_DIR / "budget.xlsx").write_bytes(b"PK\x03\x04" + b"\x00" * 10)
(DOWNLOADS_DIR / "profile_pic.png").write_bytes(b"\x89PNG" + b"\x00" * 10)
(DOWNLOADS_DIR / "notes.txt").write_text("Some text notes", encoding="utf-8")
(DOWNLOADS_DIR / "logo.jpg").write_bytes(b"\xff\xd8\xff" + b"\x00" * 10)

IMAGES_ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)

image_extensions = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
copy_count = 0

# Path.iterdir() yields Path objects for every item in the directory —
# no string path manipulation needed, no os.path.join() required.
for file_path in DOWNLOADS_DIR.iterdir():
    # .is_file() filters out subdirectories;
    # .suffix gives the file extension including the dot, e.g. '.jpg'
    if file_path.is_file() and file_path.suffix.lower() in image_extensions:
        destination = IMAGES_ARCHIVE_DIR / file_path.name
        # shutil.copy2 copies file content AND metadata (timestamps etc.)
        shutil.copy2(file_path, destination)
        copy_count += 1
        print(f"  Copied: {file_path.name} -> {destination}")

print(f"\nDone. {copy_count} image file(s) archived to {IMAGES_ARCHIVE_DIR}")

# Demonstrate reading binary content back
for image_path in IMAGES_ARCHIVE_DIR.glob("*.jpg"):
    # 'rb' mode returns raw bytes — no encoding involved
    with image_path.open("rb") as image_file:
        first_bytes = image_file.read(3)  # JPEG magic bytes are FF D8 FF
        print(f"  {image_path.name} magic bytes: {first_bytes.hex().upper()}")
```
```text
  Copied: holiday_photo.jpg -> organised/images/holiday_photo.jpg
  Copied: profile_pic.png -> organised/images/profile_pic.png
  Copied: logo.jpg -> organised/images/logo.jpg

Done. 3 image file(s) archived to organised/images
  holiday_photo.jpg magic bytes: FFD8FF
  logo.jpg magic bytes: FFD8FF
```
| Aspect | Text Mode ('r', 'w', 'a') | Binary Mode ('rb', 'wb', 'ab') |
|---|---|---|
| Data type returned | str | bytes |
| Encoding applied | Yes — uses specified or platform encoding | No — raw bytes only |
| Newline translation | Yes — '\r\n' on Windows becomes '\n' | No — bytes are untouched |
| Use case | Logs, config, CSV, JSON, source code | Images, PDFs, audio, executables, pickled data |
| Error on bad bytes | UnicodeDecodeError if file isn't valid text | Never — all byte sequences are valid |
| File size consideration | Decoded text in memory can differ in size from the on-disk bytes | Exact byte-for-byte copy in memory |
🎯 Key Takeaways
- Always use 'with open(...) as f:' — it's not just style, it's a resource safety guarantee that prevents file descriptor leaks and ensures buffers are flushed to disk.
- 'w' mode truncates the file to zero bytes the instant open() is called, before any write() happens — if you need existing content to survive, you want 'a' mode instead.
- Iterate the file object directly ('for line in file:') rather than calling read() — this keeps memory usage constant for files of any size, which is the difference between a script that scales and one that doesn't.
- Use pathlib.Path instead of os.path string operations — it's cross-platform, reads like English, and integrates cleanly with open(), glob(), mkdir(), and the rest of the standard library.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Opening a file without 'with' and forgetting to call close() — Symptom: intermittent 'Too many open files' OSError in long-running apps, or writes that never appear on disk because the buffer was never flushed — Fix: always use 'with open(...) as f:'. If you're in a class and must store the file object, implement __enter__ and __exit__ properly, or use contextlib.closing().
- ✕ Mistake 2: Using 'w' mode on a file you meant to append to — Symptom: the file exists after your script runs but contains only the most recent run's data; all previous data is silently gone — Fix: ask yourself 'should old content survive this write?' If yes, use 'a'. If no (you're regenerating the file intentionally), use 'w'. Never default to 'w' without thinking about it.
- ✕ Mistake 3: Reading an entire large file into memory with read() — Symptom: your script works fine on a 10 KB test file, then crashes with MemoryError or causes the server to swap when pointed at a 2 GB production log — Fix: iterate the file object line by line ('for line in file_object:') for text files, or use 'file.read(chunk_size)' in a loop for binary files. This keeps memory usage flat regardless of file size.
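The chunked-read fix from Mistake 3 looks like this in practice. A minimal sketch — the file name, sample data, and chunk size are ours, chosen for illustration:

```python
# Chunked reading pattern for binary files: constant memory use at any size.
blob_path = "big_blob.bin"
with open(blob_path, "wb") as f:
    f.write(bytes(range(256)) * 40)   # 10,240 bytes of sample binary data

CHUNK_SIZE = 4096
total_bytes = 0
with open(blob_path, "rb") as blob:
    while True:
        chunk = blob.read(CHUNK_SIZE)   # returns at most CHUNK_SIZE bytes
        if not chunk:                   # empty bytes object b'' signals EOF
            break
        total_bytes += len(chunk)       # process each chunk here (hash, copy, ...)

print(f"Read {total_bytes} bytes in chunks of {CHUNK_SIZE}")
```

Only one chunk is ever held in memory at a time, so the loop behaves identically whether the file is 10 KB or 10 GB.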
Interview Questions on This Topic
- Q: What is the difference between opening a file in 'w' and 'a' mode, and what happens to an existing file's content the moment you call open() with 'w'?
- Q: Why should you use a 'with' statement when working with files in Python, and what specifically happens under the hood when the 'with' block exits — even if an exception is raised?
- Q: A production script reads a config file that sometimes doesn't exist on first boot. A junior dev wrote 'if os.path.exists(path): open(path)' to handle this. What is wrong with that approach, and how would you rewrite it correctly?
Frequently Asked Questions
What is the difference between read(), readline() and readlines() in Python?
read() loads the entire file into a single string — convenient for small files but dangerous for large ones. readline() fetches exactly one line and moves the cursor forward, useful when you need manual control. readlines() returns a list of all lines as strings. In practice, iterating the file object directly ('for line in file:') is better than all three for large files because it reads one line at a time without loading everything into memory.
How do I check if a file exists before opening it in Python?
The Pythonic way is not to check first — just try to open it and catch FileNotFoundError. This avoids a race condition where the file could be deleted between your check and your open(). Use 'try: open(path) except FileNotFoundError: handle_it()'. If you only need to check without opening, Path('yourfile.txt').exists() from pathlib is the cleanest syntax.
Why do I get a UnicodeDecodeError when opening a file that looks like a text file?
This happens when the file's actual encoding doesn't match what Python is using to decode it. Python defaults to the platform encoding (often Windows-1252 on Windows, UTF-8 elsewhere). The fix is to specify the encoding explicitly: open('file.txt', 'r', encoding='utf-8'). If you don't know the file's encoding, install the 'chardet' library and use it to detect the encoding before opening. For files that might contain arbitrary bytes, open in binary mode ('rb') and handle decoding yourself.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.