
Python os and pathlib Modules — File Paths, Directory Operations and Real-World Patterns

In Plain English 🔥
Think of your computer's file system like a giant office building. Every file is a document sitting in a specific room, on a specific floor, in a specific wing. The 'address' of that document is its path — like 'Floor 3 / Wing B / Room 12 / report.pdf'. Python's os and pathlib modules are your building directory and key-card system combined: they let you read those addresses, walk the hallways, create new rooms, and move documents around — without you needing to memorise every corridor yourself.
⚡ Quick Answer
Use os.path.join() or pathlib's / operator to build paths so Python inserts the correct separator for whatever OS runs your code. For new code, default to pathlib: a Path object bundles joining, existence checks, directory creation, globbing and file I/O into readable method calls. Keep os for environment variables and process-level utilities.

Every real Python application eventually touches the file system. Whether you're building a data pipeline that reads CSVs from a folder, a web scraper that saves results to disk, a CLI tool that organises photos, or a test suite that creates temporary directories — you need to navigate paths, check if files exist, create folders, and do it all in a way that doesn't break when a colleague runs your code on Windows instead of Mac. This is not optional knowledge; it's the plumbing behind almost every non-trivial Python project.

os Module — The Veteran Swiss Army Knife for File System Operations

The os module has been part of Python since version 1. Its job is to give you a portable interface to whatever operating system your code runs on. 'Portable' is the key word — os.path.join('reports', 'july', 'sales.csv') produces reports/july/sales.csv on Linux and Mac, but reports\july\sales.csv on Windows. Without this, hard-coded slashes are a silent bug waiting to ambush you the moment someone else runs your script.

The module splits into two concerns. First, os itself handles process-level stuff: environment variables, the current working directory, creating and removing directories, listing folder contents. Second, os.path handles the string manipulation of path addresses — joining segments, checking existence, splitting filenames from their extensions.

You'll reach for os most often in scripts that need to inspect or modify the environment they're running in — reading a config path from an environment variable, making sure a required output directory exists before writing to it, or recursively walking a directory tree. It's procedural, explicit, and it works everywhere.

os_file_operations.py · PYTHON
import os

# --- 1. Current working directory ---
current_dir = os.getcwd()
print(f"Script is running from: {current_dir}")

# --- 2. Build a cross-platform path safely ---
# NEVER hard-code slashes. os.path.join handles the separator for you.
reports_path = os.path.join(current_dir, 'data', 'reports', 'july_sales.csv')
print(f"Target file path: {reports_path}")

# --- 3. Check existence before acting ---
# Trying to open a non-existent file raises FileNotFoundError.
# Always guard against this in production code.
if os.path.exists(reports_path):
    print("File exists — safe to open.")
else:
    print("File not found — creating parent directories now.")
    # exist_ok=True means no error if the folder already exists
    os.makedirs(os.path.dirname(reports_path), exist_ok=True)

# --- 4. Split a path into useful parts ---
file_directory = os.path.dirname(reports_path)   # everything except the filename
file_name      = os.path.basename(reports_path)  # just 'july_sales.csv'
name_only, ext = os.path.splitext(file_name)      # ('july_sales', '.csv')

print(f"Directory : {file_directory}")
print(f"Filename  : {file_name}")
print(f"Name only : {name_only}")
print(f"Extension : {ext}")

# --- 5. List directory contents (non-recursive) ---
script_dir = os.path.dirname(os.path.abspath(__file__))
print("\nFiles and folders in script directory:")
for entry in os.listdir(script_dir):
    full_entry_path = os.path.join(script_dir, entry)
    kind = 'DIR ' if os.path.isdir(full_entry_path) else 'FILE'
    print(f"  [{kind}] {entry}")

# --- 6. Read an environment variable with a safe fallback ---
log_level = os.environ.get('LOG_LEVEL', 'INFO')  # returns 'INFO' if not set
print(f"\nLog level from environment: {log_level}")
▶ Output
Script is running from: /home/user/projects/myapp
Target file path: /home/user/projects/myapp/data/reports/july_sales.csv
File not found — creating parent directories now.
Directory : /home/user/projects/myapp/data/reports
Filename  : july_sales.csv
Name only : july_sales
Extension : .csv

Files and folders in script directory:
[FILE] os_file_operations.py
[DIR ] data
[DIR ] tests

Log level from environment: INFO
⚠️
Watch Out: __file__ vs os.getcwd()
os.getcwd() returns wherever the shell is when you run the script — not where the script lives. If you run 'python scripts/process.py' from your project root, getcwd() gives you the project root, but __file__ gives you the script's actual location. For paths relative to your script file, always use os.path.dirname(os.path.abspath(__file__)) as your anchor.
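A minimal sketch of the difference (config/settings.toml is a hypothetical file, shown only to illustrate the anchoring pattern):

```python
import os
from pathlib import Path

# Where the user's shell was when the script was launched — varies per run
launch_dir = Path(os.getcwd())

# Where this script file actually lives — fixed, no matter the shell's CWD
script_dir = Path(__file__).resolve().parent

# Anchor resources to the script so they are found from any launch location
config_path = script_dir / 'config' / 'settings.toml'  # hypothetical file

print(f"Launched from : {launch_dir}")
print(f"Script lives  : {script_dir}")
print(f"Config anchor : {config_path}")
```

Run the script from two different directories and launch_dir changes while script_dir stays put — that stability is why you anchor to __file__.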

pathlib Module — Object-Oriented Paths That Actually Make Sense

Introduced in Python 3.4, pathlib was born from a simple frustration: path manipulation using os.path is a collection of disconnected functions that you have to import and chain together in awkward ways. pathlib flips this around — a path becomes an object, and every operation is a method or property on that object. The result is code that reads like English.

The core class you'll use is Path. On Windows it automatically becomes a WindowsPath; on Unix it becomes a PosixPath. You don't pick — Python does. This means your code is genuinely cross-platform without you doing anything extra.
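If you ever need to build or inspect paths for the other operating system, say generating Windows paths from a Linux server, the PurePosixPath and PureWindowsPath classes do the same manipulation without touching the file system. A small sketch:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Path() silently picks the right concrete class for the running OS
p = Path('data') / 'reports'
print(type(p).__name__)   # 'PosixPath' on Linux/Mac, 'WindowsPath' on Windows

# The Pure* classes manipulate foreign-OS paths without touching the disk
win   = PureWindowsPath('C:/Users/alice') / 'reports' / 'july.csv'
posix = PurePosixPath('/home/alice') / 'reports' / 'july.csv'

print(win)     # C:\Users\alice\reports\july.csv
print(posix)   # /home/alice/reports/july.csv
```

Pure paths support all the joining and anatomy operations (.name, .suffix, .parts) but none of the file-system calls like .exists(), which is exactly what you want when the path belongs to another machine.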

The slash operator (/) is overloaded on Path objects to join path segments. That means Path('data') / 'reports' / 'july_sales.csv' is valid Python and produces the correct path for the current OS. This single feature makes pathlib code dramatically more readable than the equivalent os.path.join chains. For any new code you write today, pathlib is the right default choice. Use os when you need environment variables, process utilities, or you're maintaining legacy code.

pathlib_operations.py · PYTHON
from pathlib import Path

# --- 1. Create a Path object — the anchor for everything else ---
# Path(__file__) gives us this script's path as an object, not a raw string
script_path   = Path(__file__).resolve()  # resolve() makes it absolute, no '..' segments
project_root  = script_path.parent        # go up one level to the containing folder

print(f"Script   : {script_path}")
print(f"Project  : {project_root}")

# --- 2. Build paths with / operator — readable and safe ---
data_dir      = project_root / 'data'
output_file   = data_dir / 'processed' / 'summary.csv'

print(f"Output file will go to: {output_file}")

# --- 3. Create directories — mkdir with parents + exist_ok ---
# parents=True creates every missing folder in the chain
# exist_ok=True suppresses the error if the folder already exists
output_file.parent.mkdir(parents=True, exist_ok=True)
print(f"Ensured directory exists: {output_file.parent}")

# --- 4. Inspect path properties — no functions, just attributes ---
print(f"\nPath anatomy for: {output_file}")
print(f"  .name     : {output_file.name}")      # 'summary.csv'
print(f"  .stem     : {output_file.stem}")      # 'summary'
print(f"  .suffix   : {output_file.suffix}")    # '.csv'
print(f"  .parent   : {output_file.parent}")    # parent directory
print(f"  .parts    : {output_file.parts}")     # tuple of every segment

# --- 5. Write and read files directly from the Path object ---
output_file.write_text("date,revenue\n2024-07-01,15200\n2024-07-02,18400\n")
content = output_file.read_text()
print(f"\nFile content written and read back:\n{content}")

# --- 6. Glob — find files matching a pattern recursively ---
print("All .csv files anywhere under data/:")
for csv_file in data_dir.rglob('*.csv'):  # rglob = recursive glob
    # relative_to() makes the output cleaner — no giant absolute paths
    print(f"  {csv_file.relative_to(project_root)}")

# --- 7. Check existence and file type ---
print(f"\nDoes output file exist? {output_file.exists()}")
print(f"Is it a file?           {output_file.is_file()}")
print(f"Is it a directory?      {output_file.is_dir()}")

# --- 8. Rename / move a file ---
archive_file = output_file.with_name('summary_archived.csv')  # same dir, new name
output_file.rename(archive_file)
print(f"\nFile renamed to: {archive_file.name}")
▶ Output
Script   : /home/user/projects/myapp/pathlib_operations.py
Project  : /home/user/projects/myapp
Output file will go to: /home/user/projects/myapp/data/processed/summary.csv
Ensured directory exists: /home/user/projects/myapp/data/processed

Path anatomy for: /home/user/projects/myapp/data/processed/summary.csv
  .name     : summary.csv
  .stem     : summary
  .suffix   : .csv
  .parent   : /home/user/projects/myapp/data/processed
  .parts    : ('/', 'home', 'user', 'projects', 'myapp', 'data', 'processed', 'summary.csv')

File content written and read back:
date,revenue
2024-07-01,15200
2024-07-02,18400

All .csv files anywhere under data/:
data/processed/summary.csv

Does output file exist? True
Is it a file? True
Is it a directory? False

File renamed to: summary_archived.csv
⚠️
Pro Tip: with_suffix() for Dynamic File Naming
Path objects have a with_suffix() method that's incredibly handy for format conversion workflows. If you have input_path = Path('report.csv'), you can get the JSON version's path with input_path.with_suffix('.json') — it returns a new Path with the extension swapped. No string slicing, no regex. Combine it with with_name() for full filename replacements and your file-processing pipelines become much cleaner.
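A quick sketch of both methods (the file names are illustrative):

```python
from pathlib import Path

input_path = Path('report.csv')

# Swap just the extension; returns a NEW Path, input_path is unchanged
json_path = input_path.with_suffix('.json')
print(json_path.name)        # report.json

# Swap the whole filename while keeping the directory
backup_path = Path('data/report.csv').with_name('report_backup.csv')
print(backup_path.name)      # report_backup.csv

# Typical conversion-pipeline move: same folder, same stem, new format
out = Path('exports/q3/sales.xlsx').with_suffix('.csv')
print(out.name)              # sales.csv
```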

Real-World Pattern — Building a Cross-Platform File Organiser

Theory only sticks when you see it solve an actual problem. Here's a pattern you'll encounter constantly: you have an input folder with mixed files, and you need to sort them into subfolders by type, create a manifest of what was moved, and handle edge cases cleanly. This is the kind of task that separates someone who's read the docs from someone who's used the tools in production.

This example uses pathlib as the primary tool (for its readability) and dips into os for the environment variable — which is the natural split. Notice how Path objects flow through the entire function without a single string concatenation. Notice how exist_ok=True means you can run the script multiple times without it crashing on the second run. And notice how iterdir() gives you proper Path objects back, so you can call .suffix and .rename directly without any conversion.

This pattern is also the foundation for more complex tasks: swap iterdir() for rglob('*') to go recursive, add a dry_run flag that prints moves without executing them, or plug in a logging call instead of print. Real codebases are just these small patterns stacked on top of each other.

file_organiser.py · PYTHON
import os
from pathlib import Path
from datetime import datetime

# --- Configuration via environment variable with sensible default ---
# In production you'd set INBOX_DIR in your .env or CI environment
inbox_dir   = Path(os.environ.get('INBOX_DIR', './inbox'))
output_dir  = Path(os.environ.get('OUTPUT_DIR', './organised'))

# Map file extensions to human-friendly category folder names
EXTENSION_MAP = {
    '.jpg':  'images',
    '.jpeg': 'images',
    '.png':  'images',
    '.gif':  'images',
    '.mp4':  'videos',
    '.mov':  'videos',
    '.pdf':  'documents',
    '.docx': 'documents',
    '.txt':  'documents',
    '.csv':  'data',
    '.json': 'data',
    '.xlsx': 'data',
}

def organise_inbox(source_dir: Path, dest_dir: Path) -> list[dict]:
    """
    Move every file in source_dir into a category subfolder under dest_dir.
    Returns a list of move records for logging or auditing.
    """
    move_log = []

    if not source_dir.exists():
        print(f"Inbox directory not found: {source_dir}")
        return move_log

    # iterdir() yields every item in the directory as a Path object
    for item in source_dir.iterdir():
        if item.is_dir():
            # Skip subdirectories — only handle flat files in this version
            continue

        # .suffix returns the extension including the dot, e.g. '.pdf'
        # .lower() handles cases like '.JPG' vs '.jpg'
        extension    = item.suffix.lower()
        category     = EXTENSION_MAP.get(extension, 'misc')  # unknown types go to 'misc'
        category_dir = dest_dir / category

        # Create the category folder if it doesn't exist yet
        category_dir.mkdir(parents=True, exist_ok=True)

        destination = category_dir / item.name

        # Handle name collisions — append a timestamp rather than silently overwriting
        if destination.exists():
            timestamp   = datetime.now().strftime('%Y%m%d_%H%M%S')
            new_name    = f"{item.stem}_{timestamp}{item.suffix}"
            destination = category_dir / new_name
            print(f"  Collision detected — renaming to: {new_name}")

        # .rename() moves the file; on some OS/filesystem combos use .replace() instead
        item.rename(destination)

        move_log.append({
            'original' : str(item),
            'moved_to' : str(destination),
            'category' : category,
        })
        print(f"  Moved [{category:10s}] {item.name} -> {destination.relative_to(dest_dir)}")

    return move_log


def write_manifest(move_log: list[dict], dest_dir: Path) -> None:
    """Write a human-readable manifest of all file moves."""
    manifest_path = dest_dir / 'manifest.txt'
    lines = [f"File Organisation Manifest — {datetime.now().isoformat()}\n"]
    lines += [f"{r['category']:10s} | {r['original']} -> {r['moved_to']}" for r in move_log]
    manifest_path.write_text('\n'.join(lines))
    print(f"\nManifest written to: {manifest_path}")


# --- Entry point ---
if __name__ == '__main__':
    # For demo purposes, create some fake files in the inbox
    inbox_dir.mkdir(parents=True, exist_ok=True)
    for fake_file in ['photo.jpg', 'report.pdf', 'data_export.csv', 'notes.txt', 'video.mp4']:
        (inbox_dir / fake_file).write_text(f"Demo content for {fake_file}")

    print(f"Organising files from: {inbox_dir}")
    print(f"Destination root    : {output_dir}\n")

    results = organise_inbox(inbox_dir, output_dir)
    write_manifest(results, output_dir)

    print(f"\nDone. {len(results)} file(s) organised.")
▶ Output
Organising files from: inbox
Destination root    : organised

  Moved [images    ] photo.jpg -> images/photo.jpg
  Moved [documents ] report.pdf -> documents/report.pdf
  Moved [data      ] data_export.csv -> data/data_export.csv
  Moved [documents ] notes.txt -> documents/notes.txt
  Moved [videos    ] video.mp4 -> videos/video.mp4

Manifest written to: organised/manifest.txt

Done. 5 file(s) organised.
🔥
Interview Gold: rename() vs replace()
Path.rename() raises an error if the destination already exists on some platforms (especially Windows). Path.replace() always overwrites the destination atomically. For collision-safe moves, either check destination.exists() first (as shown above) or use replace() when overwriting is intentional. Interviewers love this distinction because it's a real cross-platform bug that bites teams in production.
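A minimal demonstration of the overwrite behaviour, using a throwaway temp directory:

```python
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
src = tmp / 'new_report.csv'
dst = tmp / 'report.csv'
src.write_text('new data')
dst.write_text('old data')

# rename() can raise FileExistsError here on Windows because dst exists;
# replace() overwrites the destination atomically on every platform.
src.replace(dst)

print(dst.read_text())   # new data
print(src.exists())      # False — the source was moved
```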

os.walk vs pathlib.rglob — Choosing the Right Recursive Tool

When you need to traverse a directory tree recursively, you have two solid options: the classic os.walk() and pathlib's rglob(). They solve the same problem differently, and knowing when to pick each one marks you as someone who actually thinks about the tools they use.

os.walk() is a generator that yields a three-tuple for every directory it visits: (dirpath, list_of_subdirs, list_of_files). This gives you fine-grained control — you can modify the subdirectory list in-place to prune branches you don't want to descend into. That's powerful when you need to skip hidden directories, node_modules, or .git folders without visiting them at all.

pathlib.rglob('*.csv') is simpler: it returns a flat generator of Path objects matching the pattern, anywhere in the tree. You don't get the tree structure, just the matches. For the common case of 'find me all files of type X', rglob is less code and more readable. Use os.walk when you need to control traversal behaviour; use rglob when you just need the results.

recursive_traversal.py · PYTHON
import os
from pathlib import Path

project_root = Path('./sample_project')

# --- Set up a sample directory tree for demonstration ---
for folder in ['src', 'src/utils', 'tests', 'docs', '.git', 'node_modules']:
    (project_root / folder).mkdir(parents=True, exist_ok=True)
for filepath in ['src/main.py', 'src/utils/helpers.py', 'tests/test_main.py',
                 'docs/readme.md', '.git/config', 'node_modules/package.json']:
    (project_root / filepath).write_text(f"# {filepath}")

print("=" * 55)
print("METHOD 1: pathlib rglob — simple pattern matching")
print("=" * 55)

# rglob('*.py') matches any .py file in any subdirectory
for python_file in sorted(project_root.rglob('*.py')):
    print(f"  {python_file.relative_to(project_root)}")
# OUTPUT INCLUDES node_modules and .git if they had .py files — no pruning

print()
print("=" * 55)
print("METHOD 2: os.walk — pruning hidden/vendor directories")
print("=" * 55)

# Directories we never want to descend into
SKIP_DIRS = {'.git', 'node_modules', '__pycache__', '.venv'}

for dirpath, subdirs, filenames in os.walk(project_root):
    # Modify subdirs IN-PLACE — this tells os.walk not to visit pruned folders
    # This is the KEY advantage of os.walk over rglob
    subdirs[:] = [
        directory for directory in subdirs
        if directory not in SKIP_DIRS
    ]

    for filename in filenames:
        if filename.endswith('.py'):
            full_path     = os.path.join(dirpath, filename)
            relative_path = os.path.relpath(full_path, project_root)
            print(f"  {relative_path}")

print()
print("Notice: .git and node_modules were skipped entirely by os.walk,")
print("but rglob would have descended into them if they contained .py files.")
▶ Output
=======================================================
METHOD 1: pathlib rglob — simple pattern matching
=======================================================
src/main.py
src/utils/helpers.py
tests/test_main.py

=======================================================
METHOD 2: os.walk — pruning hidden/vendor directories
=======================================================
src/main.py
src/utils/helpers.py
tests/test_main.py

Notice: .git and node_modules were skipped entirely by os.walk,
but rglob would have descended into them if they contained .py files.
⚠️
Pro Tip: Filter rglob Results with a Generator Expression
If you want rglob simplicity but need to skip certain folders, filter the results: python_files = (f for f in project_root.rglob('*.py') if '.git' not in f.parts and 'node_modules' not in f.parts). Checking f.parts (a tuple of path segments) is cleaner than string 'in' checks because it won't accidentally match a folder named 'not_git' when you check for 'git'.
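Wrapped into a small helper, that filtering idea looks like this (iter_python_files and the demo tree are illustrative, not part of the examples above):

```python
import tempfile
from pathlib import Path

SKIP = {'.git', 'node_modules', '__pycache__'}

def iter_python_files(root: Path):
    """Yield .py files under root, skipping unwanted directory segments."""
    for f in root.rglob('*.py'):
        # Compare whole path segments, so 'not_git' never matches '.git'
        if SKIP.isdisjoint(f.relative_to(root).parts):
            yield f

# Demo: build a tiny tree in a temp directory
root = Path(tempfile.mkdtemp())
for rel in ['src/main.py', 'node_modules/vendor.py', '.git/hooks.py']:
    (root / rel).parent.mkdir(parents=True, exist_ok=True)
    (root / rel).write_text('# demo')

found = sorted(f.relative_to(root).as_posix() for f in iter_python_files(root))
print(found)  # ['src/main.py']
```

Note the trade-off stands: rglob still descends into the skipped folders and discards the matches afterwards, whereas os.walk with a pruned subdirs list never visits them at all.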
Feature / Aspect          | os / os.path                          | pathlib.Path
--------------------------|---------------------------------------|------------------------------------------
Python version introduced | Python 1 (ancient, stable)            | Python 3.4+
Style                     | Procedural — functions on strings     | Object-oriented — methods on objects
Path joining              | os.path.join('a', 'b', 'c')           | Path('a') / 'b' / 'c'
Get filename              | os.path.basename(path)                | path.name
Get extension             | os.path.splitext(f)[1]                | path.suffix
Check existence           | os.path.exists(path)                  | path.exists()
Create directories        | os.makedirs(path, exist_ok=True)      | path.mkdir(parents=True, exist_ok=True)
Read file contents        | open(path).read()                     | path.read_text()
Recursive file search     | os.walk() with manual filtering       | path.rglob('*.ext')
Prune traversal branches  | Yes — modify subdirs[:] in os.walk    | No — must filter results after the fact
Environment variables     | os.environ.get('KEY', 'default')      | Not supported — use os for this
Return type of listing    | Strings                               | Path objects (immediately useful)
Readability for beginners | Moderate — many functions to remember | High — reads like plain English
Best used for             | Env vars, process info, legacy code   | All new file path manipulation code

🎯 Key Takeaways

  • Never hard-code path separators — use pathlib's / operator or os.path.join() and let Python handle the OS difference between / and \\.
  • Anchor file paths to __file__, not os.getcwd() — your script's location is fixed, but the working directory depends on where the user runs it from.
  • pathlib is the right default for all new code — it returns objects you can keep working with, not raw strings that need re-parsing with more function calls.
  • Use os.walk when you need to prune the traversal tree (skip vendor folders, hidden dirs) — modifying subdirs[:] in-place is a feature rglob simply doesn't have.

⚠ Common Mistakes to Avoid

  • Mistake 1: Hard-coding path separators like 'data/reports/file.csv' or 'data\\reports\\file.csv' — Your script crashes or produces wrong paths on a different OS — Use os.path.join() or the pathlib / operator so the separator is always correct for the platform running the code.
  • Mistake 2: Using os.getcwd() as the base for relative paths — If someone runs your script from a different working directory (e.g., python scripts/process.py from the project root), all your relative paths point to the wrong place — Always anchor relative paths to __file__ using Path(__file__).resolve().parent or os.path.dirname(os.path.abspath(__file__)) so paths are relative to the script, not the shell.
  • Mistake 3: Calling path.mkdir() without exist_ok=True in a script that might run more than once — Raises FileExistsError on the second run, crashing your pipeline — Always pass exist_ok=True (and parents=True if you're creating nested directories) unless you specifically need to error when the folder already exists.

Interview Questions on This Topic

  • Q: What's the difference between os.path.abspath() and Path.resolve() — and is there a case where they'd return different results?
  • Q: If you need to recursively find all Python files in a project but skip node_modules and .git directories, would you use os.walk or pathlib.rglob, and why?
  • Q: Path.rename() and Path.replace() both move files — what's the critical difference between them, and when has choosing the wrong one caused a real bug?
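For the first question, here's a sketch of where the two diverge; note that creating symlinks may require extra privileges on Windows:

```python
import os
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
real_dir = tmp / 'real_dir'
real_dir.mkdir()
shortcut = tmp / 'shortcut'
shortcut.symlink_to(real_dir)   # shortcut -> real_dir

target = shortcut / 'file.txt'

# abspath(): pure string work; prepends the CWD, collapses '..',
# but keeps the 'shortcut' symlink segment in the result
print(os.path.abspath(target))

# resolve(): also follows symlinks, so 'shortcut' becomes 'real_dir'
print(Path(target).resolve())
```

So the short answer: both make a path absolute, but resolve() additionally follows symlinks, which is why the two can return different results on the same input.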

Frequently Asked Questions

Should I use os.path or pathlib for new Python projects?

Use pathlib for all new code. It's been the recommended approach since Python 3.6 and produces cleaner, more readable code. The only time you still reach for os directly is for environment variables (os.environ), process utilities, or when maintaining older codebases that already use os.path throughout.

How do I convert between a pathlib Path object and a plain string?

Wrap the Path in str(): str(Path('data/file.csv')) gives you a plain string. Going the other way, just pass the string to Path(): Path('/home/user/data/file.csv'). Most modern Python libraries like open(), pandas.read_csv(), and json.load() accept Path objects directly, so you rarely need to convert at all.
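The same round trip in code:

```python
import os
from pathlib import Path

p = Path('data') / 'reports' / 'file.csv'

as_string = str(p)           # Path -> plain string
back      = Path(as_string)  # plain string -> Path

# os.fspath() accepts both str and Path; useful when writing APIs
generic = os.fspath(p)

print(back == p)             # True — the round trip is lossless
print(generic == as_string)  # True
```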

What's the difference between Path.glob() and Path.rglob()?

glob() matches its pattern against the immediate directory only — Path('data').glob('*.csv') finds CSVs directly inside data/ but not inside data/subfolders/. rglob() is recursive — Path('data').rglob('*.csv') finds every CSV file anywhere in the entire tree under data/ (it's shorthand for glob('**/*.csv')). The 'r' stands for recursive. When in doubt about which to use, rglob is the safer choice if you're not sure how deep your files are nested.
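A sketch of the difference using a throwaway directory tree:

```python
import tempfile
from pathlib import Path

data = Path(tempfile.mkdtemp())
(data / 'top.csv').write_text('a')
(data / 'sub').mkdir()
(data / 'sub' / 'nested.csv').write_text('b')

shallow = sorted(p.name for p in data.glob('*.csv'))   # immediate dir only
deep    = sorted(p.name for p in data.rglob('*.csv'))  # whole tree

print(shallow)   # ['top.csv']
print(deep)      # ['nested.csv', 'top.csv']
```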

TheCodeForge Editorial Team

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
