Beginner 5 min · April 14, 2026

How to Set Up Your Machine Learning Environment in 2026 (Beginner Guide)

scikit-learn Version Mismatch - Model Accuracy Dropped 33%

Q: Should I use Anaconda or pip for ML development?

For most beginners in 2026, pip with venv is the right starting point. It is simpler, faster to install, and sufficient for pure Python ML work with scikit-learn, PyTorch, and the OpenAI SDK. Anaconda provides better handling of compiled dependencies — CUDA, MKL, HDF5 — and manages multiple Python versions, but it is heavier and its solver is slower. Use pip with venv if your stack is pure Python packages from PyPI. Switch to conda if you need CUDA management, complex compiled dependencies, or multiple Python versions across projects. The rule that overrides everything else: never install packages with both pip and conda in the same environment without understanding exactly what each one is managing — the two package managers can silently conflict in ways that are very difficult to diagnose.

Q: Do I need a GPU to learn machine learning?

No. All classical ML — scikit-learn, XGBoost, random forests, gradient boosting — runs efficiently on CPU. You only need a GPU when training deep learning models on large datasets. For learning deep learning, Google Colab provides free GPU access with T4 and A100 options, and Kaggle Notebooks provides 30 free GPU hours per week. Both require zero local setup. For production deep learning at scale, cloud GPU instances on Lambda Labs, AWS, or GCP are more practical than buying local hardware when you factor in cost per compute hour, maintenance, and the ability to scale to multi-GPU training.

Q: How do I fix the 'No module named sklearn' error after installing scikit-learn?

This error almost always means you installed scikit-learn into a different Python environment than the one currently running. Debug it in this order: run 'which python' to see which Python is active, then run 'pip list | grep scikit-learn' to check if scikit-learn is visible. If it is not listed, you are in the wrong environment — activate the correct virtual environment first, then reinstall. If using Jupyter, the kernel may be using a different Python than your terminal. Fix it by registering the correct environment as a kernel: python -m ipykernel install --user --name ml_2026, then select that kernel in Jupyter or VS Code.

Q: What is the difference between Jupyter Notebook and JupyterLab?

Jupyter Notebook is the original single-document browser interface for running code cells interactively. JupyterLab is the successor — it adds multiple document tabs, an integrated file browser, a terminal, and extension support in a single browser window. In 2026, VS Code with the Jupyter extension has largely superseded both for daily development. It provides notebook support plus a full IDE — IntelliSense, debugging, Git integration, and extensions — without a browser. Use VS Code for all development work. Keep JupyterLab available for situations where you need to share or present a live notebook in a browser environment without VS Code installed.

Q: How do I make my ML project reproducible on another machine?

Four things working together: pinned dependencies in requirements.txt using pip freeze with == pins; Python version documented explicitly in README.md and in setup.sh — '3.12.3', not 'Python 3'; a setup.sh script that creates the virtual environment, installs requirements.txt, and registers the Jupyter kernel in one command; and random seeds set in every training script for numpy, Python's random module, and PyTorch. Test reproducibility by cloning the repo on a fresh machine and running only setup.sh — if you need to run anything else, your documentation is incomplete. For production-grade reproducibility, add a Dockerfile so the environment definition is version-controlled alongside the code.

Q: Should I add LLM SDK libraries to my ML environment?

Yes, from the start. In 2026, LLM API calls — OpenAI, Anthropic, or local models via Ollama — are a standard component of ML projects, not an advanced specialty. Adding 'pip install openai anthropic python-dotenv' to your baseline environment costs nothing and makes LLM integration available when you need it. Store API keys in a .env file loaded with python-dotenv, and add .env to .gitignore immediately. The .env.example pattern — a committed file with placeholder values — documents what keys collaborators need without exposing real credentials.

scikit-learn 1.3.0 changed RandomForest's max_features from 'auto' to 'sqrt', dropping accuracy 33%.

Naren Founder & Principal Engineer

20+ years shipping production ML systems and the infrastructure behind them. Drawn from code that ran under real load.

✓ Production

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 20 min

✓Basic programming fundamentals
✓A computer with internet access
✓Willingness to follow along with examples

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

ML environment setup requires Python 3.11 or 3.12, a package manager, an IDE, and core libraries installed in the correct order
Anaconda or pip manages dependencies — never mix both in the same project without explicit isolation
VS Code with the Jupyter extension replaces standalone Jupyter Notebook for most workflows in 2026
Performance insight: virtual environments add zero runtime overhead but prevent 90% of dependency conflicts
Production insight: environment mismatches between local and deployed code cause silent model failures — version pinning is mandatory
Biggest mistake: installing TensorFlow and PyTorch in the same environment without version pinning
2026 addition: add the openai or anthropic SDK to your environment from day one — LLM API calls are a baseline expectation in most ML roles

✦ Definition~90s read

What is How to Set Up Your Machine Learning Environment in 2026 (Beginner Guide)?

This article addresses a specific, painful failure mode in machine learning: a 33% accuracy drop caused by a scikit-learn version mismatch. The core problem is that scikit-learn, like many ML libraries, does not guarantee serialization compatibility across minor or patch versions.

★

Setting up an ML environment is like setting up a professional kitchen before cooking.

A model trained and pickled with scikit-learn 0.24 will silently produce different predictions when loaded with 0.23 or 0.25, often without any error. This is not a bug—it's a documented behavior that catches teams who treat ML dependencies like generic Python packages.

The fix is not just 'install the right version' but a systematic approach to environment management that prevents this class of failure from the start.

The article walks through a complete, reproducible ML development environment setup that eliminates version drift. It covers Python installation, virtual environments (venv or conda), pinning exact library versions in requirements.txt, and configuring VS Code to use that environment.

For deep learning, it includes GPU setup with CUDA and cuDNN version matching. The goal is to make your ML environment deterministic: the same code, same data, same libraries produce the same results on any machine. This is table stakes for production ML, but many tutorials skip it, leading to the kind of accuracy regression that wastes days of debugging.

Plain-English First

Setting up an ML environment is like setting up a professional kitchen before cooking. You need the right tools (Python, libraries), the right workspace (IDE), and everything organized so ingredients from one dish do not contaminate another (virtual environments). Skip the organization step and you will spend more time fighting installation errors than building models. This guide walks through every step in tested sequence — from zero to a working, reproducible ML environment that matches what professional teams use in 2026.

ML environment setup is the first barrier that stops most beginners — and it is entirely avoidable with the right sequence. Dependency conflicts between TensorFlow, PyTorch, and scikit-learn create cryptic errors that derail learning momentum at the worst possible moment. The core problem is not complexity — it is sequencing and isolation. Installing tools in the wrong order or mixing package managers creates conflicts that take hours to diagnose and are nearly impossible to trace without experience. This guide provides a tested installation sequence for 2026 that avoids the common pitfalls. Every step produces a verifiable output so you know exactly where something breaks before it becomes a three-hour debugging session. The environment you build here will support classical ML, deep learning, and LLM API integration — the three layers of a complete 2026 ML workflow.

Why scikit-learn Version Pinning Is Not Optional

Setting up a machine learning environment means creating a reproducible, isolated runtime where model training and inference behave deterministically. The core mechanic is dependency locking: every library — scikit-learn, numpy, pandas — must be pinned to exact versions, not ranges. A minor version bump in scikit-learn can change default parameters, alter random seed behavior, or deprecate preprocessing methods silently.

In practice, this works through virtual environments (conda, venv) and lock files (requirements.txt, environment.yml). The critical property is that model serialization (pickle, joblib) embeds the library version. Loading a model trained with scikit-learn 0.24 into 1.0 can silently reinterpret internal data structures, producing different predictions. The accuracy drop you see is often not model decay — it's a version mismatch corrupting the decision path.

Use strict version pinning in any system where models are trained once and deployed elsewhere — CI/CD pipelines, production inference services, or team collaborations. Without it, a 'pip install --upgrade' on deployment day can regress accuracy by 33% overnight, and you'll waste days debugging data drift when the real culprit is a changed default in train_test_split or LogisticRegression.

⚠ Pickle Is Not Portable

A model serialized with scikit-learn 0.24 may load silently in 1.0 but produce different predictions — always train and serve with identical library versions.

📊 Production Insight

A team deployed a fraud model trained in scikit-learn 0.22 into a container with 0.24. Accuracy dropped 33% because the default solver for LogisticRegression changed from 'liblinear' to 'lbfgs', altering coefficient paths.

Symptom: model accuracy degrades immediately after deployment with no data drift, and retraining on the same data does not recover the original accuracy.

Rule of thumb: pin every dependency to the exact minor version in both training and serving environments, and validate that model predictions match byte-for-byte on a fixed test input.

🎯 Key Takeaway

A model is only as reproducible as its dependency versions — pin everything, including transitive dependencies.

Never assume backward compatibility in scikit-learn; minor version bumps can silently change default parameters.

Always validate model predictions after any environment change using a fixed test input and a known-good baseline.

thecodeforge.io

Setup Machine Learning Environment

Step 1: Install Python

Python is the foundation of every ML environment. In 2026, Python 3.11 and 3.12 are the stable targets for ML work. Python 3.11 improved interpreter performance by 10 to 60 percent over 3.10 and has broad library compatibility. Python 3.12 is the current release with full support from NumPy, pandas, PyTorch 2.x, and the OpenAI SDK. Avoid Python 3.13 for ML work in 2026 — some compiled ML libraries lag by one to two minor versions. Never use the system Python that ships with macOS or Linux for ML development — it exists for the operating system, not you.

install_python.shBASH

# macOS — install via Homebrew (recommended)
brew install python@3.12
# Verify
python3.12 --version
# Expected: Python 3.12.x

# macOS/Linux — alternatively install via pyenv for multi-version management
curl https://pyenv.run | bash
# Add to shell profile (~/.zshrc or ~/.bashrc):
# export PATH="$HOME/.pyenv/bin:$PATH"
# eval "$(pyenv init -)"
pyenv install 3.12.3
pyenv global 3.12.3
python --version
# Expected: Python 3.12.3

# Windows — download from python.org/downloads
# Check 'Add Python to PATH' during installation
# Verify in PowerShell:
python --version
# Expected: Python 3.12.x

# Verify pip is installed and up to date
python3.12 -m pip install --upgrade pip
pip --version
# Expected: pip 24.x from .../python3.12/site-packages/pip

Output

Python 3.12.3

pip 24.0 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)

⚠ Which Python Version to Install in 2026

📊 Production Insight

Python version mismatches cause the same class of silent failures as library version mismatches.

Document the exact Python version in README.md and your setup script — not just 'Python 3'.

pyenv makes switching between Python versions per-project trivial and is worth installing from day one.

🎯 Key Takeaway

Python 3.12 is the right target for ML work in 2026.

Never use system Python for ML development — it exists for the OS, not you.

pyenv is the cleanest way to manage multiple Python versions across projects.

Python Installation Method Selection

IfmacOS, single Python version needed

→

UseInstall via Homebrew: brew install python@3.12 — simplest path, keeps system Python untouched

IfmacOS or Linux, need multiple Python versions across projects

→

UseInstall pyenv first, then install Python versions through pyenv — version switching becomes one command

IfWindows

→

UseDownload from python.org, check 'Add to PATH', verify in PowerShell — straightforward if you follow the checklist

IfTeam environment with strict version requirements

→

UseUse Docker with FROM python:3.12-slim — guarantees every team member runs identical Python without manual installation

Step 2: Create a Virtual Environment

Virtual environments isolate project dependencies so different projects can use different library versions without conflicts. This is not optional — it is the single step that prevents 90% of the dependency errors that derail beginners. Every ML project gets its own environment. The two standard tools are venv (built into Python, no install required) and conda (from Anaconda or Miniconda, better for managing compiled dependencies like CUDA). For most beginners, venv with pip is the right starting point. For teams managing GPU drivers, CUDA versions, and complex compiled dependencies across operating systems, conda provides better control. In 2026, a third option has become practical for teams: container-first development using Docker, where the environment definition lives in a Dockerfile and every developer runs the same container.

create_virtual_env.shBASH

# Option A: venv (built into Python — recommended for most users)
# Create a named environment directory outside the project so it is never committed
python3.12 -m venv ~/ml_envs/ml_2026

# Activate on macOS/Linux
source ~/ml_envs/ml_2026/bin/activate

# Activate on Windows (PowerShell)
# ~/ml_envs/ml_2026/Scripts/Activate.ps1

# Verify you are in the virtual environment — path must point to ml_2026
which python
# Expected: ~/ml_envs/ml_2026/bin/python

which pip
# Expected: ~/ml_envs/ml_2026/bin/pip

# Upgrade pip, setuptools, and wheel inside the environment before installing anything else
pip install --upgrade pip setuptools wheel

# When you are done working
deactivate

# ---

# Option B: conda (from Anaconda or Miniconda)
# Create conda environment with Python version pinned
conda create -n ml_2026 python=3.12 -y

# Activate
conda activate ml_2026

# Verify — path must point to the conda environment
which python
conda list | head -20

# Deactivate
conda deactivate

# ---

# Add this to .gitignore to ensure the environment is never committed
echo 'ml_env/' >> .gitignore
echo '__pycache__/' >> .gitignore
echo '*.pyc' >> .gitignore
echo '.env' >> .gitignore

Output

(ml_2026) $ which python

/Users/username/ml_envs/ml_2026/bin/python

(ml_2026) $ pip --version

pip 24.0 from /Users/username/ml_envs/ml_2026/lib/python3.12/site-packages/pip (python 3.12)

(ml_2026) $ python --version

Python 3.12.3

⚠ Virtual Environment Rules — Non-Negotiable

📊 Production Insight

Dependency conflicts are the single largest source of ML environment issues for beginners and experienced engineers alike.

Virtual environments eliminate the conflict by design — isolation is cheaper than debugging.

Without isolation, installing TensorFlow frequently silently downgrades NumPy in a way that breaks scikit-learn imports with no obvious error message.

🎯 Key Takeaway

Virtual environments are mandatory for ML development — no exceptions.

One project equals one environment equals no dependency conflicts.

venv for simplicity, conda for compiled dependencies, Docker for team reproducibility.

Virtual Environment Tool Selection

IfBeginner, single Python version, pip packages only

→

UseUse venv — it is built into Python 3.12, requires no installation, and covers 95% of ML use cases

IfNeed multiple Python versions simultaneously or non-Python compiled dependencies

→

UseUse conda — it manages Python versions and compiled C/Fortran dependencies that pip cannot handle cleanly

IfTeam project with strict reproducibility requirements

→

UseUse Docker with a pinned requirements.txt — this is the only approach that guarantees byte-for-byte environment consistency

IfWorking with Jupyter notebooks across multiple projects

→

UseUse venv + ipykernel to register each environment as a separate Jupyter kernel — one kernel per environment

thecodeforge.io

Setup Machine Learning Environment

Step 3: Install Core ML Libraries

Core ML libraries form the foundation of every project. Install them in a specific order to avoid dependency conflicts — this sequence has been tested against the 2026 library release landscape. NumPy must be installed first because every other scientific Python library links against it at compile time. Then pandas for data manipulation, matplotlib and seaborn for visualization, scikit-learn for classical ML algorithms, and Jupyter support. Deep learning libraries come last and ideally live in their own environment. In 2026, add the openai SDK or anthropic SDK to your baseline environment — LLM API calls are now a standard component of production ML pipelines, not an advanced specialty skill. Add MLflow for experiment tracking from the start rather than retrofitting it later.

install_core_libraries.shBASH

# Always activate your virtual environment first — verify with 'which python'
source ~/ml_envs/ml_2026/bin/activate

# Step 1: Upgrade pip before installing anything
pip install --upgrade pip setuptools wheel

# Step 2: Install core data science stack — order matters
pip install numpy==1.26.4
pip install pandas==2.2.2
pip install matplotlib==3.9.0
pip install seaborn==0.13.2

# Step 3: Install scikit-learn and gradient boosting libraries
pip install scikit-learn==1.5.0
pip install xgboost==2.0.3
pip install lightgbm==4.3.0

# Step 4: Install Jupyter support
pip install jupyter==1.0.0 ipykernel==6.29.4
# Register this environment as a Jupyter kernel
python -m ipykernel install --user --name ml_2026 --display-name "ML 2026 (Python 3.12)"

# Step 5: Install experiment tracking
pip install mlflow==2.13.0

# Step 6: Install LLM API SDKs — baseline in 2026
pip install openai==1.30.1
pip install anthropic==0.28.0
pip install python-dotenv==1.0.1

# Step 7: Install ONE deep learning library
# Option A: PyTorch — recommended for beginners and researchers in 2026
# CPU-only version (fast to install, no GPU required for learning)
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu
# GPU version — get the exact command from pytorch.org/get-started/locally
# pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121

# Option B: TensorFlow — if your team uses it
# pip install tensorflow==2.16.1

# Do not install both in the same environment without explicit version pinning and testing

# Step 8: Verify every import — catch silent failures before they surface in a notebook
python -c "
import numpy as np; print(f'NumPy {np.__version__}')
import pandas as pd; print(f'pandas {pd.__version__}')
import sklearn; print(f'scikit-learn {sklearn.__version__}')
import matplotlib; print(f'matplotlib {matplotlib.__version__}')
import xgboost; print(f'XGBoost {xgboost.__version__}')
import mlflow; print(f'MLflow {mlflow.__version__}')
import openai; print(f'openai {openai.__version__}')
import torch; print(f'PyTorch {torch.__version__}')
print('All imports successful')
"

# Step 9: Freeze requirements with exact version pins
pip freeze > requirements.txt
echo "requirements.txt generated with $(wc -l < requirements.txt) packages"

Output

NumPy 1.26.4

pandas 2.2.2

scikit-learn 1.5.0

matplotlib 3.9.0

XGBoost 2.0.3

MLflow 2.13.0

openai 1.30.1

PyTorch 2.3.0+cpu

All imports successful

requirements.txt generated with 87 packages

⚠ Installation Order and Version Pinning

📊 Production Insight

Installing deep learning libraries before NumPy frequently silently downgrades NumPy to an older version that conflicts with scikit-learn.

TensorFlow and PyTorch have historically required conflicting NumPy version ranges — check the current compatibility matrix at pytorch.org and tensorflow.org before installing both.

Always verify imports after installation. Silent failures during install complete without error but raise ImportError at runtime, often minutes into a training run.

MLflow takes two minutes to install and saves hours when you need to compare model versions. Install it on day one.

🎯 Key Takeaway

Install in order: NumPy, pandas, visualization, scikit-learn, XGBoost, Jupyter, MLflow, LLM SDKs, then deep learning last.

Pin every version with == in requirements.txt — this single habit prevents the most common class of production environment failures.

Add openai and mlflow to your baseline environment — they are part of the 2026 ML stack, not advanced add-ons.

Step 4: Configure VS Code for ML Development

VS Code with the Jupyter extension has replaced standalone Jupyter Notebook as the standard ML development environment in 2026. It gives you IntelliSense, inline type checking, debugging with breakpoints inside notebook cells, Git integration, and notebook support in a single editor — with none of the browser tab management overhead of classic Jupyter. The critical configuration is selecting the correct Python interpreter from your virtual environment. Get this wrong and every import will fail with ModuleNotFoundError while the library is sitting correctly installed in a different environment. Configure settings.json per project rather than globally so team members get consistent behavior automatically.

vscode_settings.jsonJSON

{
  "python.defaultInterpreterPath": "~/ml_envs/ml_2026/bin/python",

  "jupyter.askForKernelRestart": false,
  "jupyter.alwaysTrustNotebooks": true,
  "notebook.cellToolbarLocation": {
    "default": "right",
    "jupyter-notebook": "left"
  },
  "notebook.output.scrolling": true,
  "notebook.cellExecutionTimeout": 600000,

  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": "explicit"
    }
  },

  "python.analysis.typeCheckingMode": "basic",
  "python.analysis.autoImportCompletions": true,

  "files.exclude": {
    "**/__pycache__": true,
    "**/*.pyc": true,
    "**/.ipynb_checkpoints": true,
    "**/ml_env": true
  },
  "search.exclude": {
    "**/data/**/*.csv": true,
    "**/data/**/*.parquet": true,
    "**/*.pkl": true,
    "**/*.pt": true
  },

  "git.ignoreLimitWarning": true,
  "editor.rulers": [88],
  "editor.tabSize": 4
}

💡Essential VS Code Extensions for ML in 2026

Python (ms-python.python) — core language support, interpreter selection, and test runner integration
Jupyter (ms-toolsai.jupyter) — notebook support with variable explorer and cell-level debugging
Black Formatter (ms-python.black-formatter) — automatic formatting on save, consistent style across teams
Pylance (ms-python.vscode-pylance) — fast IntelliSense, import resolution, and type checking powered by Pyright
GitLens — commit history and blame annotations per line, essential for tracking when a model change was introduced
Thunder Client — lightweight REST client for testing your FastAPI prediction endpoints without leaving VS Code

📊 Production Insight

Selecting the wrong Python interpreter is the single most common cause of ModuleNotFoundError in VS Code — the library is installed correctly, but VS Code is using system Python.

Always set python.defaultInterpreterPath in the project-level .vscode/settings.json, not just through the status bar selector — the status bar selection does not persist for teammates who clone the repo.

FormatOnSave with Black takes zero effort and eliminates style debates in code review.

🎯 Key Takeaway

VS Code with the Jupyter extension is the 2026 standard — faster, more debuggable, and better integrated than standalone Jupyter Notebook.

Set the Python interpreter path in settings.json per project — relying on the status bar selector breaks when teammates clone the repo.

Install all six extensions before starting any project — they pay for themselves within the first hour.

Step 5: GPU Setup for Deep Learning

GPU acceleration reduces deep learning training time from hours to minutes for medium-sized models and from days to hours for large ones. NVIDIA GPUs with CUDA support are required for both PyTorch and TensorFlow. The setup requires three components installed in a specific order: NVIDIA driver, CUDA toolkit, and cuDNN library. Version compatibility between all three is critical — mismatched versions produce cryptic CUDA errors or, worse, silent CPU fallback where training appears to work but runs 40 times slower without any warning. If you do not have an NVIDIA GPU, skip local GPU setup entirely and use Google Colab or Kaggle Notebooks — both provide free GPU access sufficient for learning and small projects.

gpu_setup_verify.shBASH

# Step 1: Verify NVIDIA GPU is detected by the system
lspci | grep -i nvidia
# On Windows: Device Manager > Display Adapters

# Step 2: Check installed NVIDIA driver and supported CUDA version
nvidia-smi
# Top-right corner shows maximum supported CUDA version
# Example: CUDA Version: 12.4 means your driver supports CUDA up to 12.4

# Step 3: Match PyTorch CUDA build to your driver's supported CUDA version
# Get the exact install command from: https://pytorch.org/get-started/locally/
# Example for CUDA 12.1:
pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121

# Step 4: Verify CUDA is detected by PyTorch
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'CUDA version: {torch.version.cuda}')
    print(f'GPU count: {torch.cuda.device_count()}')
    print(f'GPU name: {torch.cuda.get_device_name(0)}')
    print(f'GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')
else:
    print('No CUDA GPU detected — running on CPU')
    print('Tip: install the CPU-only PyTorch build if you do not have a GPU')
"

# Step 5: Benchmark CPU vs GPU to confirm GPU is being used
python -c "
import torch
import time

size = 4096
a_cpu = torch.randn(size, size)
b_cpu = torch.randn(size, size)

start = time.time()
_ = torch.matmul(a_cpu, b_cpu)
cpu_time = time.time() - start
print(f'CPU matrix multiply ({size}x{size}): {cpu_time:.3f}s')

if torch.cuda.is_available():
    a_gpu = a_cpu.cuda()
    b_gpu = b_cpu.cuda()
    # Warm up the GPU
    torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()
    start = time.time()
    _ = torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()
    gpu_time = time.time() - start
    print(f'GPU matrix multiply ({size}x{size}): {gpu_time:.3f}s')
    print(f'Speedup: {cpu_time / gpu_time:.1f}x')
else:
    print('GPU not available — using CPU only')
    print('For learning, Google Colab provides free GPU: colab.research.google.com')
"

Output

PyTorch version: 2.3.0+cu121

CUDA available: True

CUDA version: 12.1

GPU count: 1

GPU name: NVIDIA RTX 4090

GPU memory: 24.6 GB

CPU matrix multiply (4096x4096): 4.823s

GPU matrix multiply (4096x4096): 0.119s

Speedup: 40.5x

⚠ GPU Version Compatibility Matrix — Check Before Installing

📊 Production Insight

GPU setup is the most frustrating part of ML environment setup because the error messages are cryptic and version mismatches look identical to missing drivers.

The correct debugging sequence is always: driver first (nvidia-smi), then CUDA version, then PyTorch wheel — never backwards.

For learning and small projects, Google Colab and Kaggle Notebooks eliminate local GPU setup entirely and provide free access to T4 and A100 GPUs.

For production training on large models, cloud GPU instances on Lambda Labs, AWS, or GCP are more cost-effective than local RTX cards when you factor in electricity and downtime.

🎯 Key Takeaway

GPU setup requires driver, CUDA, and PyTorch wheel version alignment — check nvidia-smi before installing anything.

For learning, skip local GPU setup and use Google Colab or Kaggle — the friction is not worth it until you need it.

The 40x speedup from a GPU only matters for deep learning — classical ML on CPU is fast enough for most projects.

GPU Setup Strategy by Situation

IfNo NVIDIA GPU on your machine and you are learning

→

UseUse Google Colab (free T4 GPU) or Kaggle Notebooks (free 30 hours per week) — zero local setup, start training in under 5 minutes

IfHave NVIDIA GPU but want the simplest possible CUDA setup

→

UseUse conda to install PyTorch — conda resolves CUDA dependencies automatically based on your driver version

IfHave NVIDIA GPU and need full control over CUDA version

→

UseInstall CUDA toolkit manually matching your driver, then install the matching PyTorch wheel from pytorch.org

IfTraining models larger than 10B parameters or need multi-GPU

→

UseUse cloud GPU instances (AWS p4d, GCP A100, Lambda Labs) — local hardware is impractical at this scale

Step 6: Project Structure and Reproducibility

A well-structured ML project prevents confusion as it grows from one notebook to ten files to a deployed API. Every project needs a standard directory layout, a pinned requirements.txt, a README with setup instructions, and version control. Reproducibility means another developer — or future you six months from now — can clone the repo, run one setup command, and get identical results. This requires four things working together: pinned dependencies, documented Python version, deterministic random seeds, and a setup script that does not require tribal knowledge. In 2026, add a .env.example file to show collaborators what environment variables the project needs without committing actual API keys, and add a pre-commit configuration to enforce formatting and prevent secrets from being committed accidentally.

project_structure.txtTEXT

my_ml_project/
├── README.md                    # Problem statement, setup instructions, results
├── requirements.txt             # Pinned dependencies — pip freeze output
├── setup.sh                     # One-command environment setup script
├── .env.example                 # Template for required environment variables (no real keys)
├── .gitignore                   # Excludes: data/, models/, .env, __pycache__, *.pkl, *.pt
├── .pre-commit-config.yaml      # Black formatting + detect-secrets hook
├── .vscode/
│   └── settings.json            # Project-level VS Code configuration
├── data/
│   ├── raw/                     # Original unmodified source data — never edit these
│   ├── processed/               # Cleaned and transformed data ready for modeling
│   └── .gitkeep                 # Preserves directory structure in Git without committing data
├── notebooks/
│   ├── 01_eda.ipynb             # Exploratory data analysis
│   ├── 02_feature_engineering.ipynb
│   └── 03_modeling.ipynb        # Training, evaluation, model selection
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── load.py              # Data loading — reads from data/raw/
│   │   └── preprocess.py        # Cleaning and transformation pipeline
│   ├── features/
│   │   ├── __init__.py
│   │   └── build_features.py    # Feature engineering functions
│   ├── models/
│   │   ├── __init__.py
│   │   ├── train.py             # Training script with MLflow logging
│   │   └── predict.py           # Inference logic — used by API and tests
│   └── visualization/
│       ├── __init__.py
│       └── visualize.py
├── models/                      # Saved model artifacts — tracked with DVC, not Git
│   └── .gitkeep
├── tests/
│   ├── test_data.py             # Validate data loading and preprocessing
│   ├── test_features.py
│   └── test_models.py           # Smoke tests for prediction output shape and type
├── api/
│   ├── app.py                   # FastAPI prediction endpoint
│   ├── Dockerfile               # Container definition for deployment
│   └── docker-compose.yml       # Local API + MLflow server orchestration
└── mlruns/                      # MLflow experiment tracking (add to .gitignore for large teams)

💡Reproducibility Checklist for 2026

Pin all dependency versions with == in requirements.txt — not >= or ~=
Document Python version in README.md and in setup.sh — 'Python 3' is not specific enough
Set random seeds for numpy, Python random module, and PyTorch at the top of every training script
Add .env.example to show collaborators required environment variables — never commit .env
Include a setup.sh that recreates the environment in one command — test it on a clean machine
Track large model artifacts with DVC, not Git — repositories with pickle files in version control are painful to work with

📊 Production Insight

Unpinned dependencies break reproducibility within weeks on active projects — a teammate's pip install updates a transitive dependency and suddenly your model outputs differ.

A tested setup.sh saves hours of environment troubleshooting for every new team member and every CI run.

Do not commit .env files — use .env.example to document what variables are needed, and load them with python-dotenv in local development.

Model artifacts belong in DVC or cloud object storage, not Git. A 200MB pickle file in Git history makes every clone painful forever.

🎯 Key Takeaway

Standard project structure prevents the 'works on my machine' problem as projects grow.

Pin dependencies, document Python version, include a tested setup script, and use .env.example for API keys.

Reproducibility is an engineering requirement in 2026 — not a nice-to-have that you add later.

Stop Using pip Install Blindly. Start With `requirements.txt`

You just watched a teammate spend three hours debugging a model that worked yesterday. The cause? A transitive dependency got auto-upgraded. That is a production incident waiting to happen. Every ML project is a dependency minefield — NumPy, SciPy, scikit-learn, TensorFlow, PyTorch, each with its own C extension hell. Pip does not track these. It will happily install the latest compatible version of a sub-dependency, and suddenly your XGBoost segfaults. The fix is brutal but simple: pin your entire environment, not just the top-level packages. Use pip freeze > requirements.txt the moment your model trains successfully. Then commit that file. Better yet, use pip-tools to compile a requirements.in into a locked requirements.txt with all transitive hashes. Never trust an unpinned environment in production — it's not a question of if it breaks, but when.

freeze_env.shBASH

#!/usr/bin/env bash
# io.thecodeforge
# Freeze the active environment into a reproducible lock file
pip freeze > requirements.txt
echo "Frozen $(wc -l < requirements.txt) packages to requirements.txt"

# Verify no uninstalled packages exist
pip check || echo "WARNING: Dependency conflict detected!"

Output

Frozen 47 packages to requirements.txt

All dependencies are compatible.

⚠ Production Trap:

Don't just pin scikit-learn. Run 'pip freeze' immediately after a successful training run. Transitive deps like joblib or threadpoolctl can silently break your model serialization.

🎯 Key Takeaway

Freeze your entire environment with pip freeze > requirements.txt after every successful model run — your future self will thank you.

Your Data Pipeline Needs Validation Before Training

Most ML setups focus on model code. They ignore the data pipeline. That's how you train a model on corrupted data, deploy it, and only find out three weeks later when performance tanks. The data loading step is where the silent killers live: missing values, type mismatches, out-of-range values, label leakage. Before you ever call model.fit(), add validation gates. Pandera or Great Expectations can enforce a schema on your DataFrames. Validate column types, null proportions, and value ranges. If you are reading from Parquet or CSV, verify the file hash matches a known good version. This is not academic — I have seen a production model drift by 15% because a CSV column header got accidentally renamed. Your training pipeline should crash hard on bad data, not silently absorb it. That is the mark of a mature setup: the computer says no until you prove your data is clean.

validate_data.pyPYTHON

import pandera as pa
import pandas as pd

# io.thecodeforge - Data schema enforcement for ML pipelines
schema = pa.DataFrameSchema(
    columns={
        "feature_a": pa.Column(int, pa.Check.in_range(0, 100)),
        "feature_b": pa.Column(float, pa.Check.greater_than(0.0)),
        "label": pa.Column(int, pa.Check.isin([0, 1]))
    },
    coerce=True  # cast types; fail on invalid
)

df = pd.read_parquet("training_data.parquet")
try:
    validated_df = schema.validate(df)
    print(f"Validated {len(validated_df)} rows — ready to train.")
except pa.errors.SchemaError as e:
    raise RuntimeError(f"Data validation failed: {e}") from e

Output

Validated 10000 rows — ready to train.

💡Senior Dev Move:

Validate data before the training script. A 100ms schema check now saves hours of post-mortems later. I keep a standalone 'validate_dataset.py' that I run in CI.

🎯 Key Takeaway

Validate your data schema and value ranges before training — never trust raw input to your ML pipeline.

● Production incidentPOST-MORTEMseverity: high

Model Gives Different Results on Developer Laptop vs Production Server

Symptom

Model accuracy dropped from 94% to 61% immediately after deployment. No code changes between local and production. Same dataset, same algorithm, same random seed. The model had passed all local tests.

Assumption

The team assumed environment differences only affected installation speed and error messages — not model behavior. They tracked code versions in Git but never verified library versions across environments.

Root cause

Local environment used scikit-learn 1.3.0 while production used 1.1.0. The RandomForestClassifier default for max_features changed between these versions from 'auto' to 'sqrt'. This silently altered the model's feature sampling strategy on every tree, producing a fundamentally different model — no import errors, no warnings, no indication anything was wrong until predictions landed in production.

Fix

1. Added requirements.txt with pinned versions using == for every dependency 2. Added an environment verification script that checks library versions at application startup and fails loudly if versions do not match expected 3. Implemented a CI pipeline step that runs model evaluation tests against a Docker image matching the production environment exactly 4. Replaced ad-hoc deployment with a Docker container that carries the environment definition with it — local and production are now guaranteed identical

Key lesson

Always pin dependency versions in requirements.txt using == — not >= or ~=
Library version mismatches change model behavior, not just installation behavior — this is the dangerous case
Verify library versions match between local and production before every deployment, not after something breaks
Docker is the only reliable way to guarantee environment consistency across machines and teammates

Production debug guideSymptom to action mapping for common setup issues7 entries

Symptom · 01

ImportError: DLL load failed when importing tensorflow on Windows

→

Fix

Install Microsoft Visual C++ Redistributable 2019 or later from the Microsoft download page. Verify your Python version matches the TensorFlow wheel — TensorFlow 2.16+ requires Python 3.10 to 3.12 on Windows. Mismatched Python versions produce this error silently even when the install appears to succeed.

Symptom · 02

ModuleNotFoundError for a library you just installed

→

Fix

You installed into a different environment than the one you are running. Run 'which python' on macOS/Linux or 'where python' on Windows, then 'pip list' to confirm the active environment contains the library. The fix is always: activate the correct environment first, then install.

Symptom · 03

Jupyter kernel crashes when importing torch or tensorflow

→

Fix

Jupyter is using a different Python than your virtual environment. Register the virtual environment as a Jupyter kernel: python -m ipykernel install --user --name ml_env --display-name 'ML 2026'. Then restart Jupyter and select the new kernel from the kernel menu.

Symptom · 04

CUDA not available error when running PyTorch on an NVIDIA GPU

→

Fix

Run nvidia-smi and note the CUDA version shown in the top-right corner — this is the maximum CUDA version your driver supports. Install the matching PyTorch build from pytorch.org using the official install selector. Installing PyTorch without specifying the CUDA build installs the CPU-only version by default and produces exactly this error.

Symptom · 05

pip install takes forever or fails with connection timeout

→

Fix

Increase the timeout threshold: pip install --timeout 300 package_name. If behind a corporate proxy, set HTTP_PROXY and HTTPS_PROXY environment variables. As a last resort, use an alternative mirror: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple package_name.

Symptom · 06

openai or anthropic SDK imports fail after installation

→

Fix

Confirm you are in the correct virtual environment with 'pip list | grep openai'. If present but still failing, check that your OPENAI_API_KEY environment variable is set — the SDK validates the key at import time in some versions. Use python-dotenv to load a .env file rather than hardcoding keys in source files.

Symptom · 07

numpy version conflict error after installing torch or tensorflow

→

Fix

Deep learning libraries frequently require a specific NumPy range. Check the error message for the required version range, then pin NumPy explicitly: pip install 'numpy>=1.24,<2.0'. If the conflict persists, recreate the environment from scratch and install in the order specified in Step 3 of this guide.

★ ML Environment Setup Quick ReferenceImmediate commands for environment setup, verification, and debugging in 2026

Need to verify Python and library versions across the full stack−

Immediate action

Run version check commands for all core libraries in one pass

Commands

python -c "import sys; print(f'Python {sys.version}'); import numpy; print(f'NumPy {numpy.__version__}'); import sklearn; print(f'scikit-learn {sklearn.__version__}'); import torch; print(f'PyTorch {torch.__version__}')"

Fix now

If versions do not match expected, do not patch in place — recreate the virtual environment from requirements.txt to avoid compounding the mismatch

Need to check if GPU is available and performing correctly+

Need to freeze current environment for reproducibility+

Need to recreate environment from requirements.txt on a new machine+

ML Environment Tools Comparison

Tool	Package Manager	Complexity	Best For	GPU Support
venv + pip	pip (PyPI)	Low	Individual projects and beginners — simplest path from zero to working	Manual CUDA install required
Anaconda	conda (defaults + conda-forge)	Medium	Data science teams managing compiled dependencies and multiple Python versions	conda resolves CUDA automatically
Miniconda	conda (minimal install)	Medium	Experienced users who want conda's dependency resolution without the 3GB Anaconda base install	conda resolves CUDA automatically
Docker	pip inside container	High	Team reproducibility and production deployment — the only approach that guarantees identical environments	NVIDIA Container Toolkit required
Google Colab	pip (pre-installed stack)	Very Low	Learning, quick experiments, free GPU access without any local setup	Free T4 and A100 GPU
Poetry	poetry (PyPI with lock file)	Medium	Production Python projects that need dependency lock files and clean package publishing	Manual CUDA install required

⚙ Quick Reference

8 commands from this guide

File	Command / Code	Purpose
install_python.sh	brew install python@3.12	Step 1
create_virtual_env.sh	python3.12 -m venv ~/ml_envs/ml_2026	Step 2
install_core_libraries.sh	source ~/ml_envs/ml_2026/bin/activate	Step 3
vscode_settings.json	{	Step 4
gpu_setup_verify.sh	lspci \| grep -i nvidia	Step 5
project_structure.txt	my_ml_project/	Step 6
freeze_env.sh	pip freeze > requirements.txt	Stop Using pip Install Blindly. Start With `requirements.txt
validate_data.py	schema = pa.DataFrameSchema(	Your Data Pipeline Needs Validation Before Training

Key takeaways

Install Python 3.12 and create a dedicated virtual environment for every project

this single habit prevents 90% of ML environment issues

Install libraries in order

NumPy first, then pandas, then scikit-learn, then MLflow and LLM SDKs, then deep learning last

Never mix TensorFlow and PyTorch in the same environment without explicit version pinning for every shared dependency

Pin all dependency versions with == in requirements.txt and commit it to version control

treat it as part of the model artifact

VS Code with the Jupyter extension is the 2026 standard development environment

configure the Python interpreter path per project in settings.json

For GPU setup, run nvidia-smi first, note the maximum CUDA version, then install the matching PyTorch wheel

always in that order

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How would you set up a reproducible ML environment for a team of five de...

Q02SENIOR

Your model gives different results on your laptop versus your colleague'...

Q03JUNIOR

Explain the difference between pip and conda. When would you choose one ...

Q04SENIOR

Why would a model trained locally produce different predictions in produ...

Q01 of 04SENIOR

How would you set up a reproducible ML environment for a team of five developers?

ANSWER

Start with a requirements.txt using exact version pins for every dependency — pip freeze output, not manually curated. Include a setup.sh script that creates a virtual environment, installs from requirements.txt, and registers the Jupyter kernel in one command. Add a .env.example documenting every environment variable the project needs. Use Docker for production parity — a Dockerfile that starts from a pinned Python base image and installs the same requirements.txt. Add a GitHub Actions workflow that builds the Docker image and runs the test suite on every pull request so environment drift gets caught before it merges. Document the Python version, CUDA version if applicable, and any OS-specific requirements in README.md. The goal is that any developer can clone the repo and have a working environment in under ten minutes without asking anyone for help.

FAQ · 6 QUESTIONS

Frequently Asked Questions

Should I use Anaconda or pip for ML development?

Do I need a GPU to learn machine learning?

How do I fix the 'No module named sklearn' error after installing scikit-learn?

What is the difference between Jupyter Notebook and JupyterLab?

How do I make my ML project reproducible on another machine?

Should I add LLM SDK libraries to my ML environment?

Naren Founder & Principal Engineer

20+ years shipping production ML systems and the infrastructure behind them. Drawn from code that ran under real load.

✓ Verified

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

🔥

That's ML Basics. Mark it forged?

5 min read · try the examples if you haven't