
How to Set Up Your Machine Learning Environment in 2026 (Beginner Guide)

📍 Part of: ML Basics → Topic 16 of 25
Complete beginner setup for Jupyter, Anaconda, VS Code, Python, scikit-learn, TensorFlow and PyTorch.
🧑‍💻 Beginner-friendly — no prior ML / AI experience needed
In this tutorial, you'll learn
  • Install Python 3.12 and create a dedicated virtual environment for every project — this single habit prevents 90% of ML environment issues
  • Install libraries in order: NumPy first, then pandas, then scikit-learn, then MLflow and LLM SDKs, then deep learning last
  • Never mix TensorFlow and PyTorch in the same environment without explicit version pinning for every shared dependency
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • ML environment setup requires Python 3.11 or 3.12, a package manager, an IDE, and core libraries installed in the correct order
  • Anaconda or pip manages dependencies — never mix both in the same project without explicit isolation
  • VS Code with the Jupyter extension replaces standalone Jupyter Notebook for most workflows in 2026
  • Performance insight: virtual environments add zero runtime overhead but prevent 90% of dependency conflicts
  • Production insight: environment mismatches between local and deployed code cause silent model failures — version pinning is mandatory
  • Biggest mistake: installing TensorFlow and PyTorch in the same environment without version pinning
  • 2026 addition: add the openai or anthropic SDK to your environment from day one — LLM API calls are a baseline expectation in most ML roles
🚨 START HERE
ML Environment Setup Quick Reference
Immediate commands for environment setup, verification, and debugging in 2026
🟡 Need to verify Python and library versions across the full stack
Immediate Action: Run version check commands for all core libraries in one pass
Commands
python --version && pip list | grep -E 'numpy|pandas|scikit-learn|torch|tensorflow|openai|anthropic|mlflow'
python -c "import sys; print(f'Python {sys.version}'); import numpy; print(f'NumPy {numpy.__version__}'); import sklearn; print(f'scikit-learn {sklearn.__version__}'); import torch; print(f'PyTorch {torch.__version__}')"
Fix Now: If versions do not match expected, do not patch in place — recreate the virtual environment from requirements.txt to avoid compounding the mismatch
🟡 Need to check if GPU is available and performing correctly
Immediate Action: Test CUDA availability, device name, and compute performance for PyTorch
Commands
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}'); print(f'PyTorch: {torch.__version__}')"
python -c "import torch, time; size=4096; a=torch.randn(size,size); b=torch.randn(size,size); s=time.time(); torch.matmul(a,b); print(f'CPU: {time.time()-s:.2f}s'); a,b=a.cuda(),b.cuda(); torch.cuda.synchronize(); s=time.time(); torch.matmul(a,b); torch.cuda.synchronize(); print(f'GPU: {time.time()-s:.2f}s')" 2>/dev/null || echo 'GPU not available — CPU only'
Fix Now: If CUDA is False despite having an NVIDIA GPU, run nvidia-smi to confirm the driver is installed, then reinstall PyTorch with the matching CUDA build from pytorch.org
🟡 Need to freeze current environment for reproducibility
Immediate Action: Generate pinned requirements.txt and verify it captures all dependencies
Commands
pip freeze > requirements.txt && wc -l requirements.txt
grep -E 'numpy|pandas|scikit-learn|torch|openai|mlflow' requirements.txt
Fix Now: Commit requirements.txt to version control immediately — an unpinned environment has a half-life measured in weeks
🟡 Need to recreate environment from requirements.txt on a new machine
Immediate Action: Create a fresh virtual environment and install pinned dependencies
Commands
python3.12 -m venv ml_env && source ml_env/bin/activate && pip install --upgrade pip
pip install -r requirements.txt
Fix Now: If any package fails to install, check for OS-specific wheels or CUDA build mismatches — the error message will name the conflicting package
Production Incident: Model Gives Different Results on Developer Laptop vs Production Server
A fraud detection model scored 94% accuracy locally but dropped to 61% in production. The root cause was a scikit-learn version mismatch that silently changed default hyperparameters without raising a single warning.
Symptom: Model accuracy dropped from 94% to 61% immediately after deployment. No code changes between local and production. Same dataset, same algorithm, same random seed. The model had passed all local tests.
Assumption: The team assumed environment differences only affected installation speed and error messages — not model behavior. They tracked code versions in Git but never verified library versions across environments.
Root cause: Local environment used scikit-learn 1.3.0 while production used 1.1.0. The RandomForestClassifier default for max_features changed between these versions from 'auto' to 'sqrt'. This silently altered the model's feature sampling strategy on every tree, producing a fundamentally different model — no import errors, no warnings, no indication anything was wrong until predictions landed in production.
Fix:
  1. Added requirements.txt with pinned versions using == for every dependency
  2. Added an environment verification script that checks library versions at application startup and fails loudly if versions do not match expected
  3. Implemented a CI pipeline step that runs model evaluation tests against a Docker image matching the production environment exactly
  4. Replaced ad-hoc deployment with a Docker container that carries the environment definition with it — local and production are now guaranteed identical
Key Lesson
  • Always pin dependency versions in requirements.txt using == — not >= or ~=
  • Library version mismatches change model behavior, not just installation behavior — this is the dangerous case
  • Verify library versions match between local and production before every deployment, not after something breaks
  • Docker is the only reliable way to guarantee environment consistency across machines and teammates
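The startup verification script from the fix list can be sketched with the standard library alone. Everything here is illustrative: the EXPECTED dict is a placeholder you would populate from your own requirements.txt, and the function name is an assumption, not a published API.

```python
# verify_env.py — fail loudly at startup if installed versions drift from the pins.
# Illustrative sketch: populate EXPECTED from your own requirements.txt.
import sys
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "pip": None,  # None means: only require the package to be present
    # "scikit-learn": "1.5.0",
    # "numpy": "1.26.4",
}

def verify_environment(expected: dict) -> list[str]:
    """Return human-readable mismatches; an empty list means the env is clean."""
    problems = []
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if want is not None and have != want:
            problems.append(f"{name}: installed {have}, expected {want}")
    return problems

if __name__ == "__main__":
    issues = verify_environment(EXPECTED)
    if issues:
        # Fail loudly with a nonzero exit code so CI and app startup both stop here.
        sys.exit("Environment mismatch:\n" + "\n".join(issues))
    print("Environment verified")
```

Calling this at application startup (or as the first step of a CI job) turns a silent behavioral drift into an immediate, named failure.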
Production Debug Guide: symptom-to-action mapping for common setup issues
Symptom: ImportError: DLL load failed when importing tensorflow on Windows
Action: Install Microsoft Visual C++ Redistributable 2019 or later from the Microsoft download page. Verify your Python version matches the TensorFlow wheel — TensorFlow 2.16+ requires Python 3.10 to 3.12 on Windows. Mismatched Python versions produce this error silently even when the install appears to succeed.
Symptom: ModuleNotFoundError for a library you just installed
Action: You installed into a different environment than the one you are running. Run 'which python' on macOS/Linux or 'where python' on Windows, then 'pip list' to confirm the active environment contains the library. The fix is always: activate the correct environment first, then install.
Symptom: Jupyter kernel crashes when importing torch or tensorflow
Action: Jupyter is using a different Python than your virtual environment. Register the virtual environment as a Jupyter kernel: python -m ipykernel install --user --name ml_env --display-name 'ML 2026'. Then restart Jupyter and select the new kernel from the kernel menu.
Symptom: CUDA not available error when running PyTorch on an NVIDIA GPU
Action: Run nvidia-smi and note the CUDA version shown in the top-right corner — this is the maximum CUDA version your driver supports. Install the matching PyTorch build from pytorch.org using the official install selector. Installing PyTorch without specifying the CUDA build installs the CPU-only version by default and produces exactly this error.
Symptom: pip install takes forever or fails with connection timeout
Action: Increase the timeout threshold: pip install --timeout 300 package_name. If behind a corporate proxy, set HTTP_PROXY and HTTPS_PROXY environment variables. As a last resort, use an alternative mirror: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple package_name.
Symptom: openai or anthropic SDK imports fail after installation
Action: Confirm you are in the correct virtual environment with 'pip list | grep openai'. If present but still failing, check that your OPENAI_API_KEY environment variable is set — the SDK validates the key at import time in some versions. Use python-dotenv to load a .env file rather than hardcoding keys in source files.
Symptom: numpy version conflict error after installing torch or tensorflow
Action: Deep learning libraries frequently require a specific NumPy range. Check the error message for the required version range, then pin NumPy explicitly: pip install 'numpy>=1.24,<2.0'. If the conflict persists, recreate the environment from scratch and install in the order specified in Step 3 of this guide.
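The openai/anthropic entry above comes down to environment variables. The lookup pattern it recommends can be sketched with the standard library alone; python-dotenv, when installed, simply populates os.environ from a local .env file first, and the lookup below is identical either way. The function name is illustrative, not a library API.

```python
# config.py — read API keys from the environment, never from source code.
# With python-dotenv you would call load_dotenv() before this; the lookup
# itself works the same whether the variable came from .env or the shell.
import os

def require_env(name: str) -> str:
    """Return the named environment variable or fail with an actionable message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Copy .env.example to .env and fill it in, "
            f"or export {name} in your shell."
        )
    return value

# Usage (assumes the key is exported in your shell or loaded from .env):
# api_key = require_env("OPENAI_API_KEY")
```

Failing with a message that names the missing variable turns a confusing downstream SDK error into a one-line fix.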

ML environment setup is the first barrier that stops most beginners — and it is entirely avoidable with the right sequence. Dependency conflicts between TensorFlow, PyTorch, and scikit-learn create cryptic errors that derail learning momentum at the worst possible moment. The core problem is not complexity — it is sequencing and isolation. Installing tools in the wrong order or mixing package managers creates conflicts that take hours to diagnose and are nearly impossible to trace without experience. This guide provides a tested installation sequence for 2026 that avoids the common pitfalls. Every step produces a verifiable output so you know exactly where something breaks before it becomes a three-hour debugging session. The environment you build here will support classical ML, deep learning, and LLM API integration — the three layers of a complete 2026 ML workflow.

Step 1: Install Python

Python is the foundation of every ML environment. In 2026, Python 3.11 and 3.12 are the stable targets for ML work. Python 3.11 improved interpreter performance by 10 to 60 percent over 3.10 and has broad library compatibility. Python 3.12 is the current release with full support from NumPy, pandas, PyTorch 2.x, and the OpenAI SDK. Avoid Python 3.13 for ML work in 2026 — some compiled ML libraries lag by one to two minor versions. Never use the system Python that ships with macOS or Linux for ML development — it exists for the operating system, not you.

install_python.sh · BASH
# macOS — install via Homebrew (recommended)
brew install python@3.12
# Verify
python3.12 --version
# Expected: Python 3.12.x

# macOS/Linux — alternatively install via pyenv for multi-version management
curl https://pyenv.run | bash
# Add to shell profile (~/.zshrc or ~/.bashrc):
# export PATH="$HOME/.pyenv/bin:$PATH"
# eval "$(pyenv init -)"
pyenv install 3.12.3
pyenv global 3.12.3
python --version
# Expected: Python 3.12.3

# Windows — download from python.org/downloads
# Check 'Add Python to PATH' during installation
# Verify in PowerShell:
python --version
# Expected: Python 3.12.x

# Verify pip is installed and up to date
python3.12 -m pip install --upgrade pip
pip --version
# Expected: pip 24.x from .../python3.12/site-packages/pip
▶ Output
Python 3.12.3
pip 24.0 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)
⚠ Which Python Version to Install in 2026
📊 Production Insight
Python version mismatches cause the same class of silent failures as library version mismatches.
Document the exact Python version in README.md and your setup script — not just 'Python 3'.
pyenv makes switching between Python versions per-project trivial and is worth installing from day one.
🎯 Key Takeaway
Python 3.12 is the right target for ML work in 2026.
Never use system Python for ML development — it exists for the OS, not you.
pyenv is the cleanest way to manage multiple Python versions across projects.
Python Installation Method Selection
If: macOS, single Python version needed
Use: Install via Homebrew: brew install python@3.12 — simplest path, keeps system Python untouched
If: macOS or Linux, need multiple Python versions across projects
Use: Install pyenv first, then install Python versions through pyenv — version switching becomes one command
If: Windows
Use: Download from python.org, check 'Add to PATH', verify in PowerShell — straightforward if you follow the checklist
If: Team environment with strict version requirements
Use: Docker with FROM python:3.12-slim — guarantees every team member runs identical Python without manual installation

Step 2: Create a Virtual Environment

Virtual environments isolate project dependencies so different projects can use different library versions without conflicts. This is not optional — it is the single step that prevents 90% of the dependency errors that derail beginners. Every ML project gets its own environment. The two standard tools are venv (built into Python, no install required) and conda (from Anaconda or Miniconda, better for managing compiled dependencies like CUDA). For most beginners, venv with pip is the right starting point. For teams managing GPU drivers, CUDA versions, and complex compiled dependencies across operating systems, conda provides better control. In 2026, a third option has become practical for teams: container-first development using Docker, where the environment definition lives in a Dockerfile and every developer runs the same container.

create_virtual_env.sh · BASH
# Option A: venv (built into Python — recommended for most users)
# Create a named environment directory outside the project so it is never committed
python3.12 -m venv ~/ml_envs/ml_2026

# Activate on macOS/Linux
source ~/ml_envs/ml_2026/bin/activate

# Activate on Windows (PowerShell)
# ~/ml_envs/ml_2026/Scripts/Activate.ps1

# Verify you are in the virtual environment — path must point to ml_2026
which python
# Expected: ~/ml_envs/ml_2026/bin/python

which pip
# Expected: ~/ml_envs/ml_2026/bin/pip

# Upgrade pip, setuptools, and wheel inside the environment before installing anything else
pip install --upgrade pip setuptools wheel

# When you are done working
deactivate

# ---

# Option B: conda (from Anaconda or Miniconda)
# Create conda environment with Python version pinned
conda create -n ml_2026 python=3.12 -y

# Activate
conda activate ml_2026

# Verify — path must point to the conda environment
which python
conda list | head -20

# Deactivate
conda deactivate

# ---

# Add these entries to .gitignore so environments, caches, and secrets are never committed
echo 'ml_env/' >> .gitignore
echo '__pycache__/' >> .gitignore
echo '*.pyc' >> .gitignore
echo '.env' >> .gitignore
▶ Output
(ml_2026) $ which python
/Users/username/ml_envs/ml_2026/bin/python
(ml_2026) $ pip --version
pip 24.0 from /Users/username/ml_envs/ml_2026/lib/python3.12/site-packages/pip (python 3.12)
(ml_2026) $ python --version
Python 3.12.3
⚠ Virtual Environment Rules — Non-Negotiable
📊 Production Insight
Dependency conflicts are the single largest source of ML environment issues for beginners and experienced engineers alike.
Virtual environments eliminate the conflict by design — isolation is cheaper than debugging.
Without isolation, installing TensorFlow frequently silently downgrades NumPy in a way that breaks scikit-learn imports with no obvious error message.
🎯 Key Takeaway
Virtual environments are mandatory for ML development — no exceptions.
One project equals one environment equals no dependency conflicts.
venv for simplicity, conda for compiled dependencies, Docker for team reproducibility.
Virtual Environment Tool Selection
If: Beginner, single Python version, pip packages only
Use: venv — it is built into Python 3.12, requires no installation, and covers 95% of ML use cases
If: Need multiple Python versions simultaneously or non-Python compiled dependencies
Use: conda — it manages Python versions and compiled C/Fortran dependencies that pip cannot handle cleanly
If: Team project with strict reproducibility requirements
Use: Docker with a pinned requirements.txt — this is the only approach that guarantees byte-for-byte environment consistency
If: Working with Jupyter notebooks across multiple projects
Use: venv + ipykernel to register each environment as a separate Jupyter kernel — one kernel per environment

Step 3: Install Core ML Libraries

Core ML libraries form the foundation of every project. Install them in a specific order to avoid dependency conflicts — this sequence has been tested against the 2026 library release landscape. NumPy must be installed first because every other scientific Python library links against it at compile time. Then pandas for data manipulation, matplotlib and seaborn for visualization, scikit-learn for classical ML algorithms, and Jupyter support. Deep learning libraries come last and ideally live in their own environment. In 2026, add the openai SDK or anthropic SDK to your baseline environment — LLM API calls are now a standard component of production ML pipelines, not an advanced specialty skill. Add MLflow for experiment tracking from the start rather than retrofitting it later.

install_core_libraries.sh · BASH
# Always activate your virtual environment first — verify with 'which python'
source ~/ml_envs/ml_2026/bin/activate

# Step 1: Upgrade pip before installing anything
pip install --upgrade pip setuptools wheel

# Step 2: Install core data science stack — order matters
pip install numpy==1.26.4
pip install pandas==2.2.2
pip install matplotlib==3.9.0
pip install seaborn==0.13.2

# Step 3: Install scikit-learn and gradient boosting libraries
pip install scikit-learn==1.5.0
pip install xgboost==2.0.3
pip install lightgbm==4.3.0

# Step 4: Install Jupyter support
pip install jupyter==1.0.0 ipykernel==6.29.4
# Register this environment as a Jupyter kernel
python -m ipykernel install --user --name ml_2026 --display-name "ML 2026 (Python 3.12)"

# Step 5: Install experiment tracking
pip install mlflow==2.13.0

# Step 6: Install LLM API SDKs — baseline in 2026
pip install openai==1.30.1
pip install anthropic==0.28.0
pip install python-dotenv==1.0.1

# Step 7: Install ONE deep learning library
# Option A: PyTorch — recommended for beginners and researchers in 2026
# CPU-only version (fast to install, no GPU required for learning)
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu
# GPU version — get the exact command from pytorch.org/get-started/locally
# pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121

# Option B: TensorFlow, if your team uses it
# pip install tensorflow==2.16.1

# Do not install both in the same environment without explicit version pinning and testing

# Step 8: Verify every import to catch silent failures before they surface in a notebook
python -c "
import numpy as np; print(f'NumPy {np.__version__}')
import pandas as pd; print(f'pandas {pd.__version__}')
import sklearn; print(f'scikit-learn {sklearn.__version__}')
import matplotlib; print(f'matplotlib {matplotlib.__version__}')
import xgboost; print(f'XGBoost {xgboost.__version__}')
import mlflow; print(f'MLflow {mlflow.__version__}')
import openai; print(f'openai {openai.__version__}')
import torch; print(f'PyTorch {torch.__version__}')
print('All imports successful')
"

# Step 9: Freeze requirements with exact version pins
pip freeze > requirements.txt
echo "requirements.txt generated with $(wc -l < requirements.txt) packages"
▶ Output
NumPy 1.26.4
pandas 2.2.2
scikit-learn 1.5.0
matplotlib 3.9.0
XGBoost 2.0.3
MLflow 2.13.0
openai 1.30.1
PyTorch 2.3.0+cpu
All imports successful
requirements.txt generated with 87 packages
⚠ Installation Order and Version Pinning
📊 Production Insight
Installing deep learning libraries before NumPy frequently silently downgrades NumPy to an older version that conflicts with scikit-learn.
TensorFlow and PyTorch have historically required conflicting NumPy version ranges — check the current compatibility matrix at pytorch.org and tensorflow.org before installing both.
Always verify imports after installation. Silent failures during install complete without error but raise ImportError at runtime, often minutes into a training run.
MLflow takes two minutes to install and saves hours when you need to compare model versions. Install it on day one.
🎯 Key Takeaway
Install in order: NumPy, pandas, visualization, scikit-learn, XGBoost, Jupyter, MLflow, LLM SDKs, then deep learning last.
Pin every version with == in requirements.txt — this single habit prevents the most common class of production environment failures.
Add openai and mlflow to your baseline environment — they are part of the 2026 ML stack, not advanced add-ons.

Step 4: Configure VS Code for ML Development

VS Code with the Jupyter extension has replaced standalone Jupyter Notebook as the standard ML development environment in 2026. It gives you IntelliSense, inline type checking, debugging with breakpoints inside notebook cells, Git integration, and notebook support in a single editor — with none of the browser tab management overhead of classic Jupyter. The critical configuration is selecting the correct Python interpreter from your virtual environment. Get this wrong and every import will fail with ModuleNotFoundError while the library is sitting correctly installed in a different environment. Configure settings.json per project rather than globally so team members get consistent behavior automatically.

vscode_settings.json · JSON
{
  "python.defaultInterpreterPath": "~/ml_envs/ml_2026/bin/python",

  "jupyter.askForKernelRestart": false,
  "jupyter.alwaysTrustNotebooks": true,
  "notebook.cellToolbarLocation": {
    "default": "right",
    "jupyter-notebook": "left"
  },
  "notebook.output.scrolling": true,
  "notebook.cellExecutionTimeout": 600000,

  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": "explicit"
    }
  },

  "python.analysis.typeCheckingMode": "basic",
  "python.analysis.autoImportCompletions": true,

  "files.exclude": {
    "**/__pycache__": true,
    "**/*.pyc": true,
    "**/.ipynb_checkpoints": true,
    "**/ml_env": true
  },
  "search.exclude": {
    "**/data/**/*.csv": true,
    "**/data/**/*.parquet": true,
    "**/*.pkl": true,
    "**/*.pt": true
  },

  "git.ignoreLimitWarning": true,
  "editor.rulers": [88],
  "editor.tabSize": 4
}
💡 Essential VS Code Extensions for ML in 2026
  • Python (ms-python.python) — core language support, interpreter selection, and test runner integration
  • Jupyter (ms-toolsai.jupyter) — notebook support with variable explorer and cell-level debugging
  • Black Formatter (ms-python.black-formatter) — automatic formatting on save, consistent style across teams
  • Pylance (ms-python.vscode-pylance) — fast IntelliSense, import resolution, and type checking powered by Pyright
  • GitLens — commit history and blame annotations per line, essential for tracking when a model change was introduced
  • Thunder Client — lightweight REST client for testing your FastAPI prediction endpoints without leaving VS Code
📊 Production Insight
Selecting the wrong Python interpreter is the single most common cause of ModuleNotFoundError in VS Code — the library is installed correctly, but VS Code is using system Python.
Always set python.defaultInterpreterPath in the project-level .vscode/settings.json, not just through the status bar selector — the status bar selection does not persist for teammates who clone the repo.
FormatOnSave with Black takes zero effort and eliminates style debates in code review.
🎯 Key Takeaway
VS Code with the Jupyter extension is the 2026 standard — faster, more debuggable, and better integrated than standalone Jupyter Notebook.
Set the Python interpreter path in settings.json per project — relying on the status bar selector breaks when teammates clone the repo.
Install all six extensions before starting any project — they pay for themselves within the first hour.

Step 5: GPU Setup for Deep Learning

GPU acceleration reduces deep learning training time from hours to minutes for medium-sized models and from days to hours for large ones. NVIDIA GPUs with CUDA support are required for both PyTorch and TensorFlow. The setup requires three components installed in a specific order: NVIDIA driver, CUDA toolkit, and cuDNN library. Version compatibility between all three is critical — mismatched versions produce cryptic CUDA errors or, worse, silent CPU fallback where training appears to work but runs 40 times slower without any warning. If you do not have an NVIDIA GPU, skip local GPU setup entirely and use Google Colab or Kaggle Notebooks — both provide free GPU access sufficient for learning and small projects.

gpu_setup_verify.sh · BASH
# Step 1: Verify NVIDIA GPU is detected by the system
lspci | grep -i nvidia
# On Windows: Device Manager > Display Adapters

# Step 2: Check installed NVIDIA driver and supported CUDA version
nvidia-smi
# Top-right corner shows maximum supported CUDA version
# Example: CUDA Version: 12.4 means your driver supports CUDA up to 12.4

# Step 3: Match PyTorch CUDA build to your driver's supported CUDA version
# Get the exact install command from: https://pytorch.org/get-started/locally/
# Example for CUDA 12.1:
pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121

# Step 4: Verify CUDA is detected by PyTorch
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'CUDA version: {torch.version.cuda}')
    print(f'GPU count: {torch.cuda.device_count()}')
    print(f'GPU name: {torch.cuda.get_device_name(0)}')
    print(f'GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')
else:
    print('No CUDA GPU detected — running on CPU')
    print('Tip: install the CPU-only PyTorch build if you do not have a GPU')
"

# Step 5: Benchmark CPU vs GPU to confirm GPU is being used
python -c "
import torch
import time

size = 4096
a_cpu = torch.randn(size, size)
b_cpu = torch.randn(size, size)

start = time.time()
_ = torch.matmul(a_cpu, b_cpu)
cpu_time = time.time() - start
print(f'CPU matrix multiply ({size}x{size}): {cpu_time:.3f}s')

if torch.cuda.is_available():
    a_gpu = a_cpu.cuda()
    b_gpu = b_cpu.cuda()
    # Warm up the GPU
    torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()
    start = time.time()
    _ = torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()
    gpu_time = time.time() - start
    print(f'GPU matrix multiply ({size}x{size}): {gpu_time:.3f}s')
    print(f'Speedup: {cpu_time / gpu_time:.1f}x')
else:
    print('GPU not available — using CPU only')
    print('For learning, Google Colab provides free GPU: colab.research.google.com')
"
▶ Output
PyTorch version: 2.3.0+cu121
CUDA available: True
CUDA version: 12.1
GPU count: 1
GPU name: NVIDIA RTX 4090
GPU memory: 24.6 GB
CPU matrix multiply (4096x4096): 4.823s
GPU matrix multiply (4096x4096): 0.119s
Speedup: 40.5x
⚠ GPU Version Compatibility Matrix — Check Before Installing
📊 Production Insight
GPU setup is the most frustrating part of ML environment setup because the error messages are cryptic and version mismatches look identical to missing drivers.
The correct debugging sequence is always: driver first (nvidia-smi), then CUDA version, then PyTorch wheel — never backwards.
For learning and small projects, Google Colab and Kaggle Notebooks eliminate local GPU setup entirely and provide free access to T4 and A100 GPUs.
For production training on large models, cloud GPU instances on Lambda Labs, AWS, or GCP are more cost-effective than local RTX cards when you factor in electricity and downtime.
🎯 Key Takeaway
GPU setup requires driver, CUDA, and PyTorch wheel version alignment — check nvidia-smi before installing anything.
For learning, skip local GPU setup and use Google Colab or Kaggle — the friction is not worth it until you need it.
The 40x speedup from a GPU only matters for deep learning — classical ML on CPU is fast enough for most projects.
GPU Setup Strategy by Situation
If: No NVIDIA GPU on your machine and you are learning
Use: Google Colab (free T4 GPU) or Kaggle Notebooks (free 30 hours per week) — zero local setup, start training in under 5 minutes
If: Have NVIDIA GPU but want the simplest possible CUDA setup
Use: conda to install PyTorch — conda resolves CUDA dependencies automatically based on your driver version
If: Have NVIDIA GPU and need full control over CUDA version
Use: Install CUDA toolkit manually matching your driver, then install the matching PyTorch wheel from pytorch.org
If: Training models larger than 10B parameters or need multi-GPU
Use: Cloud GPU instances (AWS p4d, GCP A100, Lambda Labs) — local hardware is impractical at this scale

Step 6: Project Structure and Reproducibility

A well-structured ML project prevents confusion as it grows from one notebook to ten files to a deployed API. Every project needs a standard directory layout, a pinned requirements.txt, a README with setup instructions, and version control. Reproducibility means another developer — or future you six months from now — can clone the repo, run one setup command, and get identical results. This requires four things working together: pinned dependencies, documented Python version, deterministic random seeds, and a setup script that does not require tribal knowledge. In 2026, add a .env.example file to show collaborators what environment variables the project needs without committing actual API keys, and add a pre-commit configuration to enforce formatting and prevent secrets from being committed accidentally.

project_structure.txt · TEXT
my_ml_project/
├── README.md                    # Problem statement, setup instructions, results
├── requirements.txt             # Pinned dependencies — pip freeze output
├── setup.sh                     # One-command environment setup script
├── .env.example                 # Template for required environment variables (no real keys)
├── .gitignore                   # Excludes: data/, models/, .env, __pycache__, *.pkl, *.pt
├── .pre-commit-config.yaml      # Black formatting + detect-secrets hook
├── .vscode/
│   └── settings.json            # Project-level VS Code configuration
├── data/
│   ├── raw/                     # Original unmodified source data — never edit these
│   ├── processed/               # Cleaned and transformed data ready for modeling
│   └── .gitkeep                 # Preserves directory structure in Git without committing data
├── notebooks/
│   ├── 01_eda.ipynb             # Exploratory data analysis
│   ├── 02_feature_engineering.ipynb
│   └── 03_modeling.ipynb        # Training, evaluation, model selection
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── load.py              # Data loading — reads from data/raw/
│   │   └── preprocess.py        # Cleaning and transformation pipeline
│   ├── features/
│   │   ├── __init__.py
│   │   └── build_features.py    # Feature engineering functions
│   ├── models/
│   │   ├── __init__.py
│   │   ├── train.py             # Training script with MLflow logging
│   │   └── predict.py           # Inference logic — used by API and tests
│   └── visualization/
│       ├── __init__.py
│       └── visualize.py
├── models/                      # Saved model artifacts — tracked with DVC, not Git
│   └── .gitkeep
├── tests/
│   ├── test_data.py             # Validate data loading and preprocessing
│   ├── test_features.py
│   └── test_models.py           # Smoke tests for prediction output shape and type
├── api/
│   ├── app.py                   # FastAPI prediction endpoint
│   ├── Dockerfile               # Container definition for deployment
│   └── docker-compose.yml       # Local API + MLflow server orchestration
└── mlruns/                      # MLflow experiment tracking (add to .gitignore for large teams)
💡 Reproducibility Checklist for 2026
  • Pin all dependency versions with == in requirements.txt — not >= or ~=
  • Document Python version in README.md and in setup.sh — 'Python 3' is not specific enough
  • Set random seeds for numpy, Python random module, and PyTorch at the top of every training script
  • Add .env.example to show collaborators required environment variables — never commit .env
  • Include a setup.sh that recreates the environment in one command — test it on a clean machine
  • Track large model artifacts with DVC, not Git — repositories with pickle files in version control are painful to work with
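The seed-setting item in the checklist can be sketched as one small helper that seeds every RNG the project touches and quietly skips libraries that are not installed. A minimal sketch; the function name `set_seeds` is illustrative, not from the article:

```python
import random

def set_seeds(seed: int = 42) -> None:
    """Seed Python's random module, NumPy, and PyTorch (when installed)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed in this environment
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # harmless no-op on CPU-only machines
    except ImportError:
        pass  # PyTorch not installed in this environment

# Re-seeding yields identical draws, which is the whole point:
set_seeds(42)
first = random.random()
set_seeds(42)
assert random.random() == first
```

Call `set_seeds()` at the top of every training script, before any data shuffling or weight initialization happens.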
📊 Production Insight
Unpinned dependencies break reproducibility within weeks on active projects — a teammate's pip install updates a transitive dependency and suddenly your model outputs differ.
A tested setup.sh saves hours of environment troubleshooting for every new team member and every CI run.
Do not commit .env files — use .env.example to document what variables are needed, and load them with python-dotenv in local development.
Model artifacts belong in DVC or cloud object storage, not Git. A 200MB pickle file in Git history makes every clone painful forever.
🎯 Key Takeaway
Standard project structure prevents the 'works on my machine' problem as projects grow.
Pin dependencies, document Python version, include a tested setup script, and use .env.example for API keys.
Reproducibility is an engineering requirement in 2026 — not a nice-to-have that you add later.
🗂 ML Environment Tools Comparison
Key differences between environment management approaches in 2026
| Tool | Package Manager | Complexity | Best For | GPU Support |
|---|---|---|---|---|
| venv + pip | pip (PyPI) | Low | Individual projects and beginners — simplest path from zero to working | Manual CUDA install required |
| Anaconda | conda (defaults + conda-forge) | Medium | Data science teams managing compiled dependencies and multiple Python versions | conda resolves CUDA automatically |
| Miniconda | conda (minimal install) | Medium | Experienced users who want conda's dependency resolution without the 3GB Anaconda base install | conda resolves CUDA automatically |
| Docker | pip inside container | High | Team reproducibility and production deployment — the only approach that guarantees identical environments | NVIDIA Container Toolkit required |
| Google Colab | pip (pre-installed stack) | Very Low | Learning, quick experiments, free GPU access without any local setup | Free T4 and A100 GPU |
| Poetry | poetry (PyPI with lock file) | Medium | Production Python projects that need dependency lock files and clean package publishing | Manual CUDA install required |

🎯 Key Takeaways

  • Install Python 3.12 and create a dedicated virtual environment for every project — this single habit prevents 90% of ML environment issues
  • Install libraries in order: NumPy first, then pandas, then scikit-learn, then MLflow and LLM SDKs, then deep learning last
  • Never mix TensorFlow and PyTorch in the same environment without explicit version pinning for every shared dependency
  • Pin all dependency versions with == in requirements.txt and commit it to version control — treat it as part of the model artifact
  • VS Code with the Jupyter extension is the 2026 standard development environment — configure the Python interpreter path per project in settings.json
  • For GPU setup, run nvidia-smi first, note the maximum CUDA version, then install the matching PyTorch wheel — always in that order

⚠ Common Mistakes to Avoid

    Installing TensorFlow and PyTorch in the same environment
    Symptom

    Cryptic import errors, NumPy version conflicts, or one library silently downgrading the other's dependencies. Models trained with one library may produce different numerical results after the other library modifies shared C extensions. This class of error is nearly impossible to diagnose without knowing it is a version conflict.

    Fix

    Create separate virtual environments for TensorFlow and PyTorch projects — one project, one framework, one environment. If a project genuinely requires both, pin every shared dependency version explicitly in requirements.txt, verify the combination on a clean environment, and document the constraint prominently in README.md.
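As a quick sanity check before debugging a suspected conflict, a few lines of standard-library Python can report whether both frameworks have ended up in the active environment. A minimal sketch; the variable names are illustrative:

```python
from importlib.util import find_spec

# The two deep learning frameworks the article warns against mixing.
frameworks = [name for name in ("tensorflow", "torch") if find_spec(name) is not None]

if len(frameworks) > 1:
    print(f"WARNING: {frameworks} share this environment -- pin every shared dependency")
elif frameworks:
    print(f"OK: single framework installed: {frameworks[0]}")
else:
    print("No deep learning framework installed in this environment")
```

`find_spec` only inspects import metadata, so the check is instant and does not trigger either framework's heavyweight import.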

    Using system Python for ML development
    Symptom

    Installing ML packages breaks macOS or Linux system tools. Permission errors require sudo. Upgrading Python for an ML project breaks system utilities that depend on the original version. pip install modifies shared packages that other system processes depend on.

    Fix

    Always use a virtual environment. Never run pip install without an active virtual environment. On macOS, the pre-installed Python is owned by the OS — treat it as read-only and install your own Python via Homebrew or pyenv.

    Not pinning dependency versions in requirements.txt
    Symptom

    Model works today and breaks next month when a library releases an update that changes a default parameter or removes a function. Different team members with different installation dates get different results. Production deployment diverges from local development silently.

    Fix

    Use pip freeze > requirements.txt with exact version pins (==) after every environment setup. Commit requirements.txt to version control. Run pip install -r requirements.txt on every new machine and in every CI pipeline run. Treat requirements.txt as part of the model artifact, not just the codebase.

    Installing Jupyter globally instead of inside the virtual environment
    Symptom

    Jupyter cannot find libraries installed in your virtual environment. Import errors appear in notebooks that work fine when you run the same code from the terminal. The wrong Python version is used by the Jupyter kernel even after activating the virtual environment.

    Fix

    Install jupyter and ipykernel inside the virtual environment. Register the environment as a named Jupyter kernel: python -m ipykernel install --user --name ml_2026 --display-name 'ML 2026'. Select this kernel explicitly in Jupyter or VS Code — never rely on the default kernel being correct.
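To confirm which interpreter a notebook kernel is actually running, a short check pasted into a cell settles it immediately. This is a diagnostic sketch, not one of the article's setup commands:

```python
import sys
from importlib.util import find_spec

# Which Python binary is this kernel actually running?
print("Interpreter:", sys.executable)

# Is the package visible to THIS interpreter? None means it is not on this kernel's path.
spec = find_spec("sklearn")
print("sklearn found at:", spec.origin if spec else "NOT FOUND in this environment")
```

If the interpreter path does not point inside your virtual environment, the kernel is wrong, no matter what your terminal says.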

    Skipping GPU driver and CUDA version verification before installing deep learning libraries
    Symptom

    PyTorch reports CUDA is not available despite having a GPU. Training silently falls back to CPU at 40 times slower speed with no warning. Cryptic CUDA runtime errors appear mid-training after hours of computation.

    Fix

    Always run nvidia-smi before installing any deep learning library. Note the maximum CUDA version shown, then install the matching PyTorch wheel. After installation, run torch.cuda.is_available() and confirm it returns True before starting any training run.
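The post-install verification can be wrapped in a small script run right after installation. A sketch that degrades gracefully when PyTorch is absent rather than crashing:

```python
# Run immediately after installing PyTorch, before starting any training job.
try:
    import torch
    HAS_TORCH = True
except ImportError:
    HAS_TORCH = False
    print("PyTorch is not installed in this environment")

if HAS_TORCH:
    print("PyTorch version:", torch.__version__)
    cuda_ok = torch.cuda.is_available()
    print("CUDA available:", cuda_ok)
    if cuda_ok:
        print("CUDA version:", torch.version.cuda)
        print("GPU:", torch.cuda.get_device_name(0))
    else:
        print("CUDA not visible -- training would silently fall back to CPU")
```

Making this a committed script (e.g. a check run in CI) catches the silent-CPU-fallback failure mode before it costs hours of training time.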

    Committing API keys or environment variables to version control
    Symptom

    OpenAI API keys, database credentials, or Anthropic API keys appear in .env files committed to Git. Security scanners flag the repository. Keys are exposed in public repositories, incurring unexpected API charges.

    Fix

    Add .env to .gitignore immediately when creating the project. Create a .env.example with placeholder values to document what variables are needed. Load environment variables with python-dotenv in local development. Use repository secrets or a secrets manager for CI/CD and production.

Interview Questions on This Topic

  • Q (Mid-level): How would you set up a reproducible ML environment for a team of five developers?
    Start with a requirements.txt using exact version pins for every dependency — pip freeze output, not manually curated. Include a setup.sh script that creates a virtual environment, installs from requirements.txt, and registers the Jupyter kernel in one command. Add a .env.example documenting every environment variable the project needs. Use Docker for production parity — a Dockerfile that starts from a pinned Python base image and installs the same requirements.txt. Add a GitHub Actions workflow that builds the Docker image and runs the test suite on every pull request so environment drift gets caught before it merges. Document the Python version, CUDA version if applicable, and any OS-specific requirements in README.md. The goal is that any developer can clone the repo and have a working environment in under ten minutes without asking anyone for help.
  • Q (Mid-level): Your model gives different results on your laptop versus your colleague's laptop. How do you debug this?
    The systematic approach: first, compare pip freeze output from both machines and diff the results — this immediately identifies version mismatches. Focus on scikit-learn, NumPy, and PyTorch, since these are the libraries most likely to change model behavior across versions. Second, confirm random seeds are set identically for numpy, Python's random module, and the ML library — unset seeds produce different results across runs. Third, check data loading — different OS line endings, encoding assumptions, or pandas parsing behavior can alter the input data before the model sees it. Fourth, if using a GPU, verify both machines are using the same compute path — a CPU result and a GPU result for the same model can differ due to floating-point operation order. The long-term fix is Docker — if the environment definition travels with the code, this class of problem cannot occur.
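The first step of that answer, diffing two `pip freeze` outputs, is easy to automate. A small sketch with hypothetical freeze outputs; the function name `diff_freeze` is illustrative:

```python
def diff_freeze(freeze_a: str, freeze_b: str) -> dict:
    """Compare two `pip freeze` outputs; return {package: (version_a, version_b)}
    for every package that differs or is missing on one side."""
    def parse(text: str) -> dict:
        pkgs = {}
        for line in text.splitlines():
            if "==" in line:
                name, version = line.strip().split("==", 1)
                pkgs[name.lower()] = version
        return pkgs

    a, b = parse(freeze_a), parse(freeze_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }

# Hypothetical freeze outputs captured from the two laptops:
laptop_1 = "numpy==1.26.4\nscikit-learn==1.4.2\n"
laptop_2 = "numpy==1.26.4\nscikit-learn==1.3.0\npandas==2.2.2\n"
print(diff_freeze(laptop_1, laptop_2))
# {'pandas': (None, '2.2.2'), 'scikit-learn': ('1.4.2', '1.3.0')}
```

A `None` on one side means the package is missing entirely from that machine, which is itself a common cause of divergent results.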
  • Q (Junior): Explain the difference between pip and conda. When would you choose one over the other?
    pip installs Python packages from PyPI and handles pure-Python dependencies well. conda is a cross-language package manager that can install Python packages, compiled C/Fortran libraries, CUDA toolkits, and system-level dependencies — things pip cannot manage. For most ML work involving pure Python libraries, pip with venv is simpler, faster, and sufficient. Conda becomes necessary when you need to manage CUDA versions alongside PyTorch or TensorFlow, when working with geospatial libraries that have complex compiled dependencies, or when you need to switch between multiple Python versions on the same machine. The critical rule: never mix pip and conda package installations in the same environment without understanding the implications — conda can overwrite pip-installed packages silently, causing confusing behavior.
  • Q (Senior): Why would a model trained locally produce different predictions in production without any code changes?
    This is one of the most dangerous failure modes in ML deployment, and it is almost always an environment issue. The most common cause: library version mismatch where a default parameter changed between versions. The sklearn RandomForestClassifier max_features default changed from 'auto' to 'sqrt' between version 1.1 and 1.3 — a model trained locally on 1.3 and deployed on 1.1 uses fundamentally different feature sampling without raising any error. Other causes include NumPy floating-point behavior differences across versions, different random state handling, or a preprocessing step that behaves differently due to a pandas API change. The prevention: pin all dependency versions with ==, verify versions match between environments, and run prediction tests against production-matched environments in CI before every deployment.

Frequently Asked Questions

Should I use Anaconda or pip for ML development?

For most beginners in 2026, pip with venv is the right starting point. It is simpler, faster to install, and sufficient for pure Python ML work with scikit-learn, PyTorch, and the OpenAI SDK. Anaconda provides better handling of compiled dependencies — CUDA, MKL, HDF5 — and manages multiple Python versions, but it is heavier and its solver is slower. Use pip with venv if your stack is pure Python packages from PyPI. Switch to conda if you need CUDA management, complex compiled dependencies, or multiple Python versions across projects. The rule that overrides everything else: never install packages with both pip and conda in the same environment without understanding exactly what each one is managing — the two package managers can silently conflict in ways that are very difficult to diagnose.

Do I need a GPU to learn machine learning?

No. All classical ML — scikit-learn, XGBoost, random forests, gradient boosting — runs efficiently on CPU. You only need a GPU when training deep learning models on large datasets. For learning deep learning, Google Colab provides free GPU access with T4 and A100 options, and Kaggle Notebooks provides 30 free GPU hours per week. Both require zero local setup. For production deep learning at scale, cloud GPU instances on Lambda Labs, AWS, or GCP are more practical than buying local hardware when you factor in cost per compute hour, maintenance, and the ability to scale to multi-GPU training.

How do I fix the 'No module named sklearn' error after installing scikit-learn?

This error almost always means you installed scikit-learn into a different Python environment than the one currently running. Debug it in this order: run 'which python' to see which Python is active, then run 'pip list | grep scikit-learn' to check if scikit-learn is visible. If it is not listed, you are in the wrong environment — activate the correct virtual environment first, then reinstall. If using Jupyter, the kernel may be using a different Python than your terminal. Fix it by registering the correct environment as a kernel: python -m ipykernel install --user --name ml_2026, then select that kernel in Jupyter or VS Code.

What is the difference between Jupyter Notebook and JupyterLab?

Jupyter Notebook is the original single-document browser interface for running code cells interactively. JupyterLab is the successor — it adds multiple document tabs, an integrated file browser, a terminal, and extension support in a single browser window. In 2026, VS Code with the Jupyter extension has largely superseded both for daily development. It provides notebook support plus a full IDE — IntelliSense, debugging, Git integration, and extensions — without a browser. Use VS Code for all development work. Keep JupyterLab available for situations where you need to share or present a live notebook in a browser environment without VS Code installed.

How do I make my ML project reproducible on another machine?

Four things working together: pinned dependencies in requirements.txt using pip freeze with == pins; Python version documented explicitly in README.md and in setup.sh — '3.12.3', not 'Python 3'; a setup.sh script that creates the virtual environment, installs requirements.txt, and registers the Jupyter kernel in one command; and random seeds set in every training script for numpy, Python's random module, and PyTorch. Test reproducibility by cloning the repo on a fresh machine and running only setup.sh — if you need to run anything else, your documentation is incomplete. For production-grade reproducibility, add a Dockerfile so the environment definition is version-controlled alongside the code.

Should I add LLM SDK libraries to my ML environment?

Yes, from the start. In 2026, LLM API calls — OpenAI, Anthropic, or local models via Ollama — are a standard component of ML projects, not an advanced specialty. Adding 'pip install openai anthropic python-dotenv' to your baseline environment costs nothing and makes LLM integration available when you need it. Store API keys in a .env file loaded with python-dotenv, and add .env to .gitignore immediately. The .env.example pattern — a committed file with placeholder values — documents what keys collaborators need without exposing real credentials.

Naren · Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
