
Modern File System Operations in Python: os vs. pathlib

📍 Part of: File Handling → Topic 5 of 6
Master Python file system operations using os and pathlib.
⚙️ Intermediate — basic Python knowledge assumed
Quick Answer

For modern Python code, prefer pathlib.Path over os.path. pathlib was added in Python 3.4, and since Python 3.6 built-ins such as open() and the os functions accept Path objects wherever they expect a path string. pathlib offers an object-oriented API where the / operator builds paths naturally, and methods like .exists(), .read_text(), and .rglob() replace clunky string manipulation. Use os for specific lower-level tasks such as environment variables or process management.
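To make the comparison concrete, here is a minimal sketch that builds the same relative path with os.path and with pathlib, then asks each for the file name, extension, and parent directory. The path segments are made up for illustration.

```python
import os.path
from pathlib import Path

# Illustrative path segments (hypothetical project layout)
parts = ("project", "config", "settings.json")

legacy = os.path.join(*parts)   # returns a plain str
modern = Path(*parts)           # returns a Path; same as Path("project") / "config" / "settings.json"

# The same questions asked both ways
print(os.path.basename(legacy), "|", modern.name)        # settings.json | settings.json
print(os.path.splitext(legacy)[1], "|", modern.suffix)   # .json | .json
print(os.path.dirname(legacy), "|", modern.parent)       # parent directory, platform separators

# str() turns a Path back into the string os.path would have produced
print(str(modern) == legacy)                             # True
```

Both versions use the native separator of the platform they run on; the difference is that the Path object carries its own methods instead of needing a separate function for every query.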

pathlib — The Modern Object-Oriented Approach

Example · PYTHON
from pathlib import Path

# Clean, readable path building
# The / operator is overloaded to handle os.path.join logic automatically
base = Path('/tmp/thecodeforge_app')
config = base / 'config' / 'settings.json'
log = base / 'logs' / 'runtime.log'

print(f"Full Config Path: {config}")
print(f"File Name: {config.name}")      # settings.json
print(f"File Extension: {config.suffix}") # .json
print(f"Parent Directory: {config.parent}") # /tmp/thecodeforge_app/config

# Production pattern: idempotent directory creation (safe to run repeatedly)
# parents=True creates the full tree; exist_ok=True prevents 'FileExistsError'
(base / 'data').mkdir(parents=True, exist_ok=True)

# Modern I/O: No more 'with open(...) as f' for simple tasks
output_file = base / 'data' / 'build_report.txt'
output_file.write_text('Build Status: SUCCESS', encoding='utf-8')

if output_file.exists():
    print(f"Content: {output_file.read_text(encoding='utf-8')}")
    print(f"Is real file? {output_file.is_file()}")
▶ Output
Full Config Path: /tmp/thecodeforge_app/config/settings.json
File Name: settings.json
File Extension: .json
Parent Directory: /tmp/thecodeforge_app/config
Content: Build Status: SUCCESS
Is real file? True
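The write_text()/read_text() pair replaces the usual with open(...) block for simple cases. One thing to keep in mind: write_text() always overwrites, so appending still calls for an explicit context manager via Path.open(). A minimal sketch, assuming a throwaway temp directory and an illustrative file name:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    report = Path(tmpdir) / "build_report.txt"   # illustrative file name

    # write_text(): open in text mode, write, close. Existing content is replaced.
    report.write_text("Build Status: SUCCESS\n", encoding="utf-8")

    # Appending needs an explicit handle; Path.open() takes the same arguments as open()
    with report.open("a", encoding="utf-8") as f:
        f.write("Warnings: 0\n")

    # read_text(): open, read the whole file, close
    content = report.read_text(encoding="utf-8")

print(content, end="")
# Build Status: SUCCESS
# Warnings: 0
```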

Advanced Globbing and Directory Traversal

Example · PYTHON
from pathlib import Path
import tempfile

# Senior Dev Tip: Use rglob for deep recursive searches
with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    
    # Setup dummy structure
    (root / "src").mkdir()
    (root / "src" / "main.py").touch()
    (root / "tests").mkdir()
    (root / "tests" / "test_api.py").touch()
    (root / "README.md").touch()

    print("--- Immediate Children (iterdir) ---")
    # iterdir() yields entries in arbitrary order; sort for stable output
    for item in sorted(root.iterdir()):
        print(f"[{'DIR' if item.is_dir() else 'FILE'}] {item.name}")

    print("\n--- Recursive Python Files (rglob) ---")
    # rglob('*.py') is equivalent to root.glob('**/*.py'); its order is arbitrary too
    for py_file in sorted(root.rglob('*.py')):
        print(f"Found source: {py_file.relative_to(root)}")
▶ Output
--- Immediate Children (iterdir) ---
[FILE] README.md
[DIR] src
[DIR] tests

--- Recursive Python Files (rglob) ---
Found source: src/main.py
Found source: tests/test_api.py
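A detail worth knowing: rglob('*') yields directories as well as files, so a broad pattern usually needs an is_file() filter. A small sketch over a throwaway directory (the layout is made up) that counts files by extension:

```python
import tempfile
from collections import Counter
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "src").mkdir()
    (root / "src" / "main.py").touch()
    (root / "src" / "utils.py").touch()
    (root / "README.md").touch()

    everything = list(root.rglob("*"))                  # includes the src/ directory itself
    files_only = [p for p in everything if p.is_file()]
    by_suffix = Counter(p.suffix for p in files_only)   # tally by extension

print(len(everything), len(files_only))  # 4 3
print(by_suffix.most_common())           # [('.py', 2), ('.md', 1)]
```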

The os Module — Low-Level System Control

While pathlib is superior for path manipulation, the os module remains the authority for interacting with the operating system environment and process-level metadata.

Example · PYTHON
import os
from pathlib import Path

# 1. Environment Variables: Still an 'os' domain
api_key = os.environ.get('THE_CODE_FORGE_API_KEY', 'default_dev_key')
print(f"Environment Key: {api_key}")

# 2. File Permissions
# os functions accept Path objects (Python 3.6+); Path.chmod() would work here too
script_path = Path('/tmp/secure_script.sh')
script_path.write_text('#!/bin/bash\necho "Running..."\n', encoding='utf-8')

# Change permissions to 755 (rwxr-xr-x)
os.chmod(script_path, 0o755)

# 3. Getting the Current Process ID (PID)
print(f"Current Process ID: {os.getpid()}")

# 4. os.walk: yields (dirpath, dirnames, filenames) for every directory in the tree.
# With the default topdown=True, editing dirnames in place prunes subtrees before descent.
for root, dirs, files in os.walk('/tmp'):
    dirs[:] = [d for d in dirs if not d.startswith('.')]
    # Process only top level for this demo
    print(f"Root Walk: {root}")
    break
▶ Output
Environment Key: default_dev_key
Current Process ID: 12345
Root Walk: /tmp
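The os.walk demo above stops after the first directory, so the pruning line never visibly takes effect. Here is a sketch over a throwaway tree showing pruning in action; the directory names and the SKIP set are hypothetical. Path.rglob() has no equivalent hook: it descends into every subtree and you can only filter its results afterwards. (Python 3.12 added Path.walk(), which mirrors os.walk and supports the same in-place pruning.)

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    # Hypothetical project layout with directories we want to skip
    for rel in ("src/app.py", ".git/config", "node_modules/lib/index.js"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    SKIP = {"node_modules"}  # hypothetical ignore list
    visited = []
    for dirpath, dirs, files in os.walk(root):  # os.walk accepts a Path (3.6+)
        # Slice assignment mutates the list os.walk holds, so skipped dirs are never entered
        dirs[:] = [d for d in dirs if not d.startswith(".") and d not in SKIP]
        for name in files:
            visited.append(Path(dirpath, name).relative_to(root).as_posix())

print(sorted(visited))  # ['src/app.py']
```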

🎯 Key Takeaways

  • Default to pathlib.Path for all path logic. It handles platform-specific path separators (/ vs \) automatically.
  • The / operator is the modern standard for joining paths: Path('A') / 'B' is cleaner than os.path.join('A', 'B').
  • Use .read_text() and .write_text() for lightweight file operations. They open and close the file for you; pass encoding='utf-8' explicitly, because the default encoding depends on the platform.
  • Recursive searching is simplified with .rglob('*'), which removes the need for hand-written os.walk loops in most everyday searches.
  • Keep the os module for os.environ, os.getpid(), and lower-level traversal with os.walk(). For file modes, os.chmod() accepts Path objects, and Path.chmod() does the same job.
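os.chmod() accepts Path objects directly, and pathlib also offers Path.chmod(), so changing file modes does not force you back to strings. A minimal sketch, assuming a POSIX system (on Windows, chmod only toggles the read-only flag, so the printed modes will differ); the script name is hypothetical:

```python
import os
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    script = Path(tmpdir) / "deploy.sh"   # hypothetical script name
    script.write_text("#!/bin/sh\necho deploying\n", encoding="utf-8")

    # os.chmod() with a Path argument (no str() conversion needed)
    os.chmod(script, 0o755)
    mode_via_os = stat.S_IMODE(os.stat(script).st_mode)

    # The pathlib method does the same job
    script.chmod(0o700)
    mode_via_path = stat.S_IMODE(script.stat().st_mode)

print(oct(mode_via_os), oct(mode_via_path))  # 0o755 0o700 on POSIX
```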

Interview Questions on This Topic

  • What are the advantages of using pathlib's object-oriented approach over the traditional os.path string-based approach?
  • Explain the 'Liskov Substitution' reasoning behind why pathlib defines different classes for WindowsPath and PosixPath.
  • How would you recursively find all .log files modified within the last 24 hours using pathlib and os.stat?
  • How does the / operator work in pathlib? (Hint: it relies on the __truediv__ and __rtruediv__ special methods; __div__ was the Python 2 name.)
  • Why is it dangerous to use string concatenation for file paths, and how does pathlib mitigate this?
  • Compare and contrast os.walk() vs pathlib.Path.rglob() in terms of memory efficiency and control.
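For the question about recent .log files, one possible answer is sketched below. The helper name recent_logs and the sample file layout are hypothetical; the sketch backdates one file with os.utime() so the filter has something to exclude.

```python
import os
import tempfile
import time
from pathlib import Path

def recent_logs(root, max_age_seconds=24 * 60 * 60):
    """Hypothetical helper: .log files under root modified within max_age_seconds, newest first."""
    cutoff = time.time() - max_age_seconds
    hits = [
        p for p in Path(root).rglob("*.log")
        if p.is_file() and os.stat(p).st_mtime >= cutoff
    ]
    return sorted(hits, key=lambda p: os.stat(p).st_mtime, reverse=True)

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "app").mkdir()
    fresh = root / "app" / "today.log"
    stale = root / "old.log"
    fresh.write_text("ok\n", encoding="utf-8")
    stale.write_text("old\n", encoding="utf-8")

    # Backdate old.log by two days: os.utime takes (atime, mtime)
    two_days_ago = time.time() - 2 * 24 * 60 * 60
    os.utime(stale, (two_days_ago, two_days_ago))

    found = [p.relative_to(root).as_posix() for p in recent_logs(root)]

print(found)  # ['app/today.log']
```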

Frequently Asked Questions

How do I get the directory of the currently running Python script?

Use Path(__file__).resolve().parent. This provides an absolute path to the directory containing the script, allowing you to reference bundled resources (such as a data/ folder next to the script) reliably, regardless of which directory the script was launched from.
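A small helper can wrap that pattern. The function name resource_path and the data/settings.json layout are hypothetical; the helper takes the script path as an argument so it can also be exercised outside a script file.

```python
from pathlib import Path

def resource_path(script_file, *parts):
    """Hypothetical helper: resolve parts relative to the directory containing script_file."""
    # resolve() makes the path absolute (following symlinks) before .parent drops the file name
    return Path(script_file).resolve().parent.joinpath(*parts)

# Typical call from inside a script (hypothetical layout):
#   config = resource_path(__file__, "data", "settings.json")
```

Resolving before taking .parent matters: Path("main.py").parent is just Path("."), which would be interpreted relative to whatever the current working directory happens to be.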

What is the difference between Path.glob() and Path.rglob()?

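.glob('pattern') matches the pattern relative to the directory the Path object points to; unless the pattern contains a ** component, it only looks at that directory's immediate children. .rglob('pattern') is shorthand for 'recursive glob': it searches the directory and every nested subdirectory, and is equivalent to .glob('**/pattern').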
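A quick way to see the difference is to run all three forms against the same small tree. The file layout here is made up for the sketch; results are sorted and converted to POSIX-style strings so the output is stable across platforms.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "pkg" / "sub").mkdir(parents=True)
    for rel in ("top.py", "pkg/mod.py", "pkg/sub/deep.py"):
        (root / rel).touch()

    def names(paths):
        # Relative, POSIX-style, sorted: stable output regardless of OS or traversal order
        return sorted(p.relative_to(root).as_posix() for p in paths)

    shallow = names(root.glob("*.py"))        # immediate children only
    deep_glob = names(root.glob("**/*.py"))   # ** recurses, including root itself
    deep_rglob = names(root.rglob("*.py"))    # same result as glob('**/*.py')

print(shallow)     # ['top.py']
print(deep_rglob)  # ['pkg/mod.py', 'pkg/sub/deep.py', 'top.py']
```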

Is pathlib slower than os.path?

Technically, pathlib has a slight overhead because it creates objects instead of just manipulating strings. For nearly all applications, though, that per-call cost is negligible next to the actual disk or network I/O, and it is far outweighed by fewer path-handling bugs and more maintainable code.
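If performance matters for a specific hot loop, measuring beats guessing. A rough timeit sketch is below; absolute numbers depend entirely on your machine and Python version, so no results are shown. The path segments are illustrative.

```python
import os.path
import timeit
from pathlib import Path

N = 50_000  # iterations per measurement

t_os = timeit.timeit(lambda: os.path.join("data", "logs", "app.log"), number=N)
t_path = timeit.timeit(lambda: Path("data") / "logs" / "app.log", number=N)

# Report per-call cost in microseconds; expect Path to be somewhat slower
print(f"os.path.join: {t_os / N * 1e6:.2f} µs per call")
print(f"Path /      : {t_path / N * 1e6:.2f} µs per call")
```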

How do I handle a 'FileExistsError' when creating a directory?

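Use .mkdir(parents=True, exist_ok=True). parents=True creates any missing intermediate directories, and exist_ok=True stops the call from raising FileExistsError when the directory already exists, which makes it safe to repeat in production-grade automation. Note that FileExistsError is still raised if the path exists but is a file rather than a directory.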
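A short sketch of both behaviours, using hypothetical folder and file names inside a throwaway temp directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "reports" / "2024"   # hypothetical nested folder

    target.mkdir(parents=True, exist_ok=True)    # creates reports/ and reports/2024/
    target.mkdir(parents=True, exist_ok=True)    # second call is a no-op, no error
    created = target.is_dir()

    # exist_ok only tolerates an existing *directory*; a file at that path still raises
    blocker = Path(tmpdir) / "not_a_dir"
    blocker.touch()
    try:
        blocker.mkdir(exist_ok=True)
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)  # True True
```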

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged