Senior 4 min · April 14, 2026

scikit-learn Version Mismatch - Model Accuracy Dropped 33 Percentage Points


Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • ML environment setup requires Python 3.11 or 3.12, a package manager, an IDE, and core libraries installed in the correct order
  • conda (via Anaconda or Miniconda) or pip manages dependencies; never mix both in the same project without explicit isolation
  • VS Code with the Jupyter extension replaces standalone Jupyter Notebook for most workflows in 2026
  • Performance insight: virtual environments add no meaningful runtime overhead but prevent most dependency conflicts
  • Production insight: environment mismatches between local and deployed code cause silent model failures — version pinning is mandatory
  • Biggest mistake: installing TensorFlow and PyTorch in the same environment without version pinning
  • 2026 addition: add the openai or anthropic SDK to your environment from day one — LLM API calls are a baseline expectation in most ML roles
Plain-English First

Setting up an ML environment is like setting up a professional kitchen before cooking. You need the right tools (Python, libraries), the right workspace (IDE), and everything organized so ingredients from one dish do not contaminate another (virtual environments). Skip the organization step and you will spend more time fighting installation errors than building models. This guide walks through every step in a tested sequence, from zero to a working, reproducible ML environment that matches what professional teams use in 2026.

ML environment setup is the first barrier that stops most beginners — and it is entirely avoidable with the right sequence. Dependency conflicts between TensorFlow, PyTorch, and scikit-learn create cryptic errors that derail learning momentum at the worst possible moment. The core problem is not complexity — it is sequencing and isolation. Installing tools in the wrong order or mixing package managers creates conflicts that take hours to diagnose and are nearly impossible to trace without experience. This guide provides a tested installation sequence for 2026 that avoids the common pitfalls. Every step produces a verifiable output so you know exactly where something breaks before it becomes a three-hour debugging session. The environment you build here will support classical ML, deep learning, and LLM API integration — the three layers of a complete 2026 ML workflow.

Step 1: Install Python

Python is the foundation of every ML environment. In 2026, Python 3.11 and 3.12 are the stable targets for ML work. Python 3.11 improved interpreter performance by 10 to 60 percent over 3.10 and has broad library compatibility. Python 3.12 is a mature release with full support from NumPy, pandas, PyTorch 2.x, and the OpenAI SDK. Avoid Python 3.13 and newer for ML work in 2026 unless you have confirmed that every compiled library you need ships wheels for it, because compiled ML libraries can lag new interpreter releases by months. Never use the system Python that ships with macOS or Linux for ML development. It exists for the operating system, not for you.

install_python.sh (BASH)
# macOS — install via Homebrew (recommended)
brew install python@3.12
# Verify
python3.12 --version
# Expected: Python 3.12.x

# macOS/Linux — alternatively install via pyenv for multi-version management
curl https://pyenv.run | bash
# Add to shell profile (~/.zshrc or ~/.bashrc), then restart the shell before running pyenv:
# export PATH="$HOME/.pyenv/bin:$PATH"
# eval "$(pyenv init -)"
pyenv install 3.12.3
pyenv global 3.12.3
python --version
# Expected: Python 3.12.3

# Windows — download from python.org/downloads
# Check 'Add Python to PATH' during installation
# Verify in PowerShell:
python --version
# Expected: Python 3.12.x

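# Verify pip is available for this interpreter (on Windows: python -m pip --version)
# Homebrew and many Linux distribution Pythons block global pip installs (PEP 668),
# so upgrade pip inside the virtual environment in Step 2 rather than here
python3.12 -m pip --version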
# Expected: pip 24.x from .../python3.12/site-packages/pip
Output
Python 3.12.3
pip 24.0 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)
Which Python Version to Install in 2026
  • Python 3.12 is the recommended target — broadest library support, stable, production-ready
  • Python 3.11 is a safe fallback if a specific library does not yet support 3.12
  • Avoid Python 3.13 for ML work — library support lags the interpreter release by months
  • Never use the system Python on macOS or Linux — use Homebrew, pyenv, or the official installer
  • On Windows, always check 'Add Python to PATH' during installation, or the python command will not be found from the terminal
Production Insight
Python version mismatches cause the same class of silent failures as library version mismatches.
Document the exact Python version in README.md and your setup script — not just 'Python 3'.
pyenv makes switching between Python versions per-project trivial and is worth installing from day one.
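You can make the documented version enforceable with a startup check that fails loudly when the interpreter is outside the supported range. This is a minimal sketch. The SUPPORTED set and the check_python_version helper are hypothetical names chosen for illustration, and the set should mirror whatever README.md and setup.sh state.

```python
import sys

# Hypothetical supported set for this project; keep it in sync with README.md and setup.sh.
SUPPORTED = {(3, 11), (3, 12)}


def check_python_version(supported=SUPPORTED, version_info=None):
    """Return (major, minor) if supported, otherwise raise RuntimeError."""
    info = sys.version_info if version_info is None else version_info
    current = (info[0], info[1])
    if current not in supported:
        allowed = ", ".join(f"{major}.{minor}" for major, minor in sorted(supported))
        raise RuntimeError(
            f"Python {current[0]}.{current[1]} is not supported; use one of: {allowed}"
        )
    return current
```

Call check_python_version() at the top of a training script or other entry point. A silent interpreter mismatch then becomes an immediate, readable error.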
Key Takeaway
Python 3.12 is the right target for ML work in 2026.
Never use system Python for ML development — it exists for the OS, not you.
pyenv is the cleanest way to manage multiple Python versions across projects.
Python Installation Method Selection
If: macOS, single Python version needed
Use: Homebrew (brew install python@3.12). This is the simplest path and keeps system Python untouched
If: macOS or Linux, need multiple Python versions across projects
Use: Install pyenv first, then install Python versions through pyenv. Switching versions becomes one command
If: Windows
Use: Download from python.org, check 'Add to PATH', and verify in PowerShell. Straightforward if you follow the checklist
If: Team environment with strict version requirements
Use: Docker with FROM python:3.12-slim. Every team member runs identical Python without manual installation

Step 2: Create a Virtual Environment

Virtual environments isolate project dependencies so different projects can use different library versions without conflicts. This step is not optional. It is the single step that prevents most of the dependency errors that derail beginners. Every ML project gets its own environment. The two standard tools are venv (built into Python, no install required) and conda (from Anaconda or Miniconda, better for managing compiled dependencies like CUDA). For most beginners, venv with pip is the right starting point. For teams managing GPU drivers, CUDA versions, and complex compiled dependencies across operating systems, conda provides better control. In 2026, a third option has become practical for teams: container-first development using Docker, where the environment definition lives in a Dockerfile and every developer runs the same container.

create_virtual_env.sh (BASH)
# Option A: venv (built into Python — recommended for most users)
# Create a named environment directory outside the project so it is never committed
python3.12 -m venv ~/ml_envs/ml_2026

# Activate on macOS/Linux
source ~/ml_envs/ml_2026/bin/activate

# Activate on Windows (PowerShell); on Windows, create the environment with 'python -m venv' instead of python3.12
# & "$HOME\ml_envs\ml_2026\Scripts\Activate.ps1"

# Verify you are in the virtual environment — path must point to ml_2026
which python
# Expected: ~/ml_envs/ml_2026/bin/python

which pip
# Expected: ~/ml_envs/ml_2026/bin/pip

# Upgrade pip, setuptools, and wheel inside the environment before installing anything else
pip install --upgrade pip setuptools wheel

# When you are done working
deactivate

# ---

# Option B: conda (from Anaconda or Miniconda)
# Create conda environment with Python version pinned
conda create -n ml_2026 python=3.12 -y

# Activate
conda activate ml_2026

# Verify — path must point to the conda environment
which python
conda list | head -20

# Deactivate
conda deactivate

# ---

# Safeguard: ignore in-project environment directories in case someone creates one locally
echo 'ml_env/' >> .gitignore
echo '.venv/' >> .gitignore
echo '__pycache__/' >> .gitignore
echo '*.pyc' >> .gitignore
echo '.env' >> .gitignore
Output
(ml_2026) $ which python
/Users/username/ml_envs/ml_2026/bin/python
(ml_2026) $ pip --version
pip 24.0 from /Users/username/ml_envs/ml_2026/lib/python3.12/site-packages/pip (python 3.12)
(ml_2026) $ python --version
Python 3.12.3
Virtual Environment Rules — Non-Negotiable
  • Every project gets its own environment — never share environments across projects
  • Always activate the environment before running pip install — installing into the wrong environment is the most common beginner error
  • Store environments outside the project directory so they are never accidentally committed to Git
  • Add your environment directory name to .gitignore immediately
  • Upgrade pip inside the environment before installing any packages. pip versions older than 20.3 use a legacy resolver that can install mutually incompatible versions
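Installing into the wrong environment is the most common error, so it helps to check from Python itself which environment is active before running pip. The sketch below relies on one fact: inside a venv, sys.prefix points at the environment while sys.base_prefix points at the base installation. The conda branch assumes the CONDA_DEFAULT_ENV variable that conda activate sets. The helper names are illustrative.

```python
import os
import sys


def in_venv() -> bool:
    """True when running inside a venv/virtualenv: sys.prefix differs from the base install."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def describe_environment(environ=os.environ) -> str:
    """Return a one-line description of the active environment."""
    if in_venv():
        return f"venv at {sys.prefix}"
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return f"conda environment '{conda_env}'"
    return "no virtual environment active: activate one before running pip install"


print(describe_environment())
```

A conda environment is a full Python installation, so sys.prefix equals sys.base_prefix there. That is why the sketch falls back to the conda variable.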
Production Insight
Dependency conflicts are the single largest source of ML environment issues for beginners and experienced engineers alike.
Virtual environments eliminate the conflict by design — isolation is cheaper than debugging.
Without isolation, installing TensorFlow can silently replace NumPy with a version that breaks scikit-learn imports, and the resulting error message rarely points at the real cause.
Key Takeaway
Virtual environments are mandatory for ML development — no exceptions.
One project equals one environment equals no dependency conflicts.
venv for simplicity, conda for compiled dependencies, Docker for team reproducibility.
Virtual Environment Tool Selection
If: Beginner, single Python version, pip packages only
Use: venv. It is built into Python 3.12, requires no installation, and covers most ML use cases
If: Need multiple Python versions simultaneously or non-Python compiled dependencies
Use: conda. It manages Python versions and compiled C/Fortran dependencies that pip cannot handle cleanly
If: Team project with strict reproducibility requirements
Use: Docker with a pinned requirements.txt and a pinned base image. This is the most reliable way to get the same environment on every machine
If: Working with Jupyter notebooks across multiple projects
Use: venv + ipykernel to register each environment as a separate Jupyter kernel, one kernel per environment

Step 3: Install Core ML Libraries

Core ML libraries form the foundation of every project. Install them in a specific order to avoid dependency conflicts; this sequence has been tested against the 2026 library release landscape. NumPy goes first because pandas, scikit-learn, and most other scientific Python libraries depend on it, and many of them ship compiled extensions built against its C API. pip does not remember earlier pins, though. A later install can still replace NumPy to satisfy a new package, which is why the verification step and a single pinned requirements.txt matter. After NumPy, install pandas for data manipulation, matplotlib and seaborn for visualization, scikit-learn for classical ML algorithms, and Jupyter support. Deep learning libraries come last and ideally live in their own environment. In 2026, add the openai SDK or anthropic SDK to your baseline environment: LLM API calls are now a standard component of production ML pipelines, not an advanced specialty skill. Add MLflow for experiment tracking from the start rather than retrofitting it later.

install_core_libraries.sh (BASH)
# Always activate your virtual environment first — verify with 'which python'
source ~/ml_envs/ml_2026/bin/activate

# Step 1: Upgrade pip before installing anything
pip install --upgrade pip setuptools wheel

# Step 2: Install core data science stack — order matters
pip install numpy==1.26.4
pip install pandas==2.2.2
pip install matplotlib==3.9.0
pip install seaborn==0.13.2

# Step 3: Install scikit-learn and gradient boosting libraries
pip install scikit-learn==1.5.0
pip install xgboost==2.0.3
pip install lightgbm==4.3.0

# Step 4: Install Jupyter support
pip install jupyter==1.0.0 ipykernel==6.29.4
# Register this environment as a Jupyter kernel
python -m ipykernel install --user --name ml_2026 --display-name "ML 2026 (Python 3.12)"

# Step 5: Install experiment tracking
pip install mlflow==2.13.0

# Step 6: Install LLM API SDKs — baseline in 2026
pip install openai==1.30.1
pip install anthropic==0.28.0
pip install python-dotenv==1.0.1

# Step 7: Install ONE deep learning library
# Option A: PyTorch — recommended for beginners and researchers in 2026
# CPU-only version (fast to install, no GPU required for learning)
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu
# GPU version — get the exact command from pytorch.org/get-started/locally
# pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121

# Option B: TensorFlow, if your team uses it
# pip install tensorflow==2.16.1

# Do not install both in the same environment without explicit version pinning and testing

# Step 8: Verify the key imports to catch silent failures before they surface in a notebook
python -c "
import numpy as np; print(f'NumPy {np.__version__}')
import pandas as pd; print(f'pandas {pd.__version__}')
import sklearn; print(f'scikit-learn {sklearn.__version__}')
import matplotlib; print(f'matplotlib {matplotlib.__version__}')
import xgboost; print(f'XGBoost {xgboost.__version__}')
import mlflow; print(f'MLflow {mlflow.__version__}')
import openai; print(f'openai {openai.__version__}')
import torch; print(f'PyTorch {torch.__version__}')
print('All imports successful')
"

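# Step 9: Freeze requirements with exact version pins
# CPU wheels freeze as torch==2.3.0+cpu, so reinstalling from this file needs:
# pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu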
pip freeze > requirements.txt
echo "requirements.txt generated with $(wc -l < requirements.txt) packages"
Output
NumPy 1.26.4
pandas 2.2.2
scikit-learn 1.5.0
matplotlib 3.9.0
XGBoost 2.0.3
MLflow 2.13.0
openai 1.30.1
PyTorch 2.3.0+cpu
All imports successful
requirements.txt generated with 87 packages
Installation Order and Version Pinning
  • Install NumPy first and recheck its version after installing deep learning libraries, which can replace it without an error to satisfy their own requirements
  • Install scikit-learn before deep learning libraries — this avoids NumPy version conflicts that are difficult to trace
  • Never install TensorFlow and PyTorch in the same environment without explicit version pinning for every shared dependency
  • Always run the verification imports block after installation — silent failures appear at runtime, not at install time
  • Pin versions with == in requirements.txt — using >= allows silent upgrades that break model reproducibility
  • Add openai and mlflow to every environment from the start, because retrofitting experiment tracking later is painful
Production Insight
Installing deep learning libraries before pinning NumPy can pull in a NumPy version that conflicts with scikit-learn, with no error at install time.
TensorFlow and PyTorch have historically required conflicting NumPy version ranges — check the current compatibility matrix at pytorch.org and tensorflow.org before installing both.
Always verify imports after installation. Some broken installs complete without error and only raise ImportError at runtime, often minutes into a training run.
MLflow takes two minutes to install and saves hours when you need to compare model versions. Install it on day one.
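The bash verification block in this step stops at the first failing import. A small Python variant can instead try every module and report all failures at once, which is handy in CI. This is a sketch. The MODULES list is an assumption based on the libraries installed above, and report is a hypothetical helper. Note that import names such as sklearn and dotenv differ from pip package names.

```python
import importlib

# Import names (not pip package names) for the libraries installed in this step.
MODULES = [
    "numpy", "pandas", "matplotlib", "seaborn", "sklearn", "xgboost",
    "lightgbm", "mlflow", "openai", "anthropic", "dotenv", "torch",
]


def verify_imports(modules):
    """Try every import; return ({name: version}, {name: error}) instead of stopping early."""
    ok, failed = {}, {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, or e.g. a binary-compatibility error raised at import
            failed[name] = f"{type(exc).__name__}: {exc}"
        else:
            ok[name] = getattr(module, "__version__", "unknown")
    return ok, failed


def report(modules=MODULES) -> int:
    """Print one line per module and return a process exit code (0 = all imports succeeded)."""
    ok, failed = verify_imports(modules)
    for name, version in ok.items():
        print(f"OK    {name} {version}")
    for name, error in failed.items():
        print(f"FAIL  {name}: {error}")
    return 1 if failed else 0
```

Save it as a script (for example, a hypothetical verify_imports.py) and end it with `raise SystemExit(report())`. The script then exits non-zero whenever any import fails, so a CI step fails instead of the first notebook cell.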
Key Takeaway
Install in order: NumPy, pandas, visualization, scikit-learn, XGBoost, Jupyter, MLflow, LLM SDKs, and deep learning last.
Pin every version with == in requirements.txt — this single habit prevents the most common class of production environment failures.
Add openai and mlflow to your baseline environment — they are part of the 2026 ML stack, not advanced add-ons.

Step 4: Configure VS Code for ML Development

VS Code with the Jupyter extension has replaced standalone Jupyter Notebook as the standard ML development environment in 2026. It gives you IntelliSense, inline type checking, debugging with breakpoints inside notebook cells, Git integration, and notebook support in a single editor — with none of the browser tab management overhead of classic Jupyter. The critical configuration is selecting the correct Python interpreter from your virtual environment. Get this wrong and every import will fail with ModuleNotFoundError while the library is sitting correctly installed in a different environment. Configure settings.json per project rather than globally so team members get consistent behavior automatically.

vscode_settings.json (JSON)
{
  "python.defaultInterpreterPath": "~/ml_envs/ml_2026/bin/python",

  "jupyter.askForKernelRestart": false,
  "jupyter.alwaysTrustNotebooks": true,
  "notebook.cellToolbarLocation": {
    "default": "right",
    "jupyter-notebook": "left"
  },
  "notebook.output.scrolling": true,
  "notebook.cellExecutionTimeout": 600000,

  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": "explicit"
    }
  },

  "python.analysis.typeCheckingMode": "basic",
  "python.analysis.autoImportCompletions": true,

  "files.exclude": {
    "**/__pycache__": true,
    "**/*.pyc": true,
    "**/.ipynb_checkpoints": true,
    "**/ml_env": true
  },
  "search.exclude": {
    "**/data/**/*.csv": true,
    "**/data/**/*.parquet": true,
    "**/*.pkl": true,
    "**/*.pt": true
  },

  "git.ignoreLimitWarning": true,
  "editor.rulers": [88],
  "editor.tabSize": 4
}
Essential VS Code Extensions for ML in 2026
  • Python (ms-python.python) — core language support, interpreter selection, and test runner integration
  • Jupyter (ms-toolsai.jupyter) — notebook support with variable explorer and cell-level debugging
  • Black Formatter (ms-python.black-formatter) — automatic formatting on save, consistent style across teams
  • Pylance (ms-python.vscode-pylance) — fast IntelliSense, import resolution, and type checking powered by Pyright
  • GitLens — commit history and blame annotations per line, essential for tracking when a model change was introduced
  • Thunder Client — lightweight REST client for testing your FastAPI prediction endpoints without leaving VS Code
Production Insight
Selecting the wrong Python interpreter is the single most common cause of ModuleNotFoundError in VS Code. The library is installed correctly, but VS Code is running a different interpreter, often the system Python.
Always set python.defaultInterpreterPath in the project-level .vscode/settings.json, not just through the status bar selector. The status bar selection is stored locally and does not travel with the repo to teammates. The setting acts as a default: once someone picks an interpreter in the selector, that choice takes precedence in their workspace.
Format on save with Black takes zero effort and eliminates style debates in code review.
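To confirm which interpreter a notebook or the VS Code terminal is actually using, compare sys.prefix with the environment directory you expect. This sketch assumes the ~/ml_envs/ml_2026 location from Step 2 and uses a hypothetical helper name. It compares sys.prefix rather than resolving sys.executable, because a venv's bin/python is often a symlink to the base interpreter and resolving it would hide the difference.

```python
import os
import sys

# Assumed environment location from Step 2; change it if yours lives elsewhere.
EXPECTED_ENV = "~/ml_envs/ml_2026"


def running_in(expected_env: str, prefix: str = sys.prefix) -> bool:
    """True if the interpreter's sys.prefix is the expected environment directory."""
    want = os.path.normcase(os.path.abspath(os.path.expanduser(expected_env)))
    have = os.path.normcase(os.path.abspath(prefix))
    return want == have


print(f"Interpreter: {sys.executable}")
if not running_in(EXPECTED_ENV):
    print(f"Warning: expected {EXPECTED_ENV}, running in {sys.prefix}; "
          "select the right interpreter or kernel in VS Code")
```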
Key Takeaway
VS Code with the Jupyter extension is the 2026 standard — faster, more debuggable, and better integrated than standalone Jupyter Notebook.
Set the Python interpreter path in settings.json per project — relying on the status bar selector breaks when teammates clone the repo.
Install all six extensions before starting any project — they pay for themselves within the first hour.

Step 5: GPU Setup for Deep Learning

GPU acceleration reduces deep learning training time from hours to minutes for medium-sized models and from days to hours for large ones. This guide covers NVIDIA GPUs with CUDA, the primary GPU target for both PyTorch and TensorFlow. PyTorch also supports Apple Silicon through its MPS backend and AMD GPUs through ROCm builds. The full CUDA stack has three layers: the NVIDIA driver, the CUDA toolkit, and the cuDNN library. Current PyTorch pip wheels bundle the CUDA runtime and cuDNN, so for PyTorch the driver is usually the only layer you install yourself. Version compatibility across the stack is critical. Mismatched versions produce cryptic CUDA errors or, worse, a silent CPU fallback where training appears to work but runs many times slower without any warning. If you do not have an NVIDIA GPU, skip local GPU setup entirely and use Google Colab or Kaggle Notebooks. Both provide free GPU access sufficient for learning and small projects.

gpu_setup_verify.sh (BASH)
# Step 1: Verify NVIDIA GPU is detected by the system
lspci | grep -i nvidia
# On Windows: Device Manager > Display Adapters

# Step 2: Check installed NVIDIA driver and supported CUDA version
nvidia-smi
# Top-right corner shows maximum supported CUDA version
# Example: CUDA Version: 12.4 means your driver supports CUDA up to 12.4

# Step 3: Match PyTorch CUDA build to your driver's supported CUDA version
# Get the exact install command from: https://pytorch.org/get-started/locally/
# Example for CUDA 12.1:
pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121

# Step 4: Verify CUDA is detected by PyTorch
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'CUDA version: {torch.version.cuda}')
    print(f'GPU count: {torch.cuda.device_count()}')
    print(f'GPU name: {torch.cuda.get_device_name(0)}')
    print(f'GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')
else:
    print('No CUDA GPU detected — running on CPU')
    print('Tip: install the CPU-only PyTorch build if you do not have a GPU')
"

# Step 5: Benchmark CPU vs GPU to confirm GPU is being used
python -c "
import torch
import time

size = 4096
a_cpu = torch.randn(size, size)
b_cpu = torch.randn(size, size)

start = time.time()
_ = torch.matmul(a_cpu, b_cpu)
cpu_time = time.time() - start
print(f'CPU matrix multiply ({size}x{size}): {cpu_time:.3f}s')

if torch.cuda.is_available():
    a_gpu = a_cpu.cuda()
    b_gpu = b_cpu.cuda()
    # Warm up the GPU
    torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()
    start = time.time()
    _ = torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()
    gpu_time = time.time() - start
    print(f'GPU matrix multiply ({size}x{size}): {gpu_time:.3f}s')
    print(f'Speedup: {cpu_time / gpu_time:.1f}x')
else:
    print('GPU not available — using CPU only')
    print('For learning, Google Colab provides free GPU: colab.research.google.com')
"
Output
PyTorch version: 2.3.0+cu121
CUDA available: True
CUDA version: 12.1
GPU count: 1
GPU name: NVIDIA RTX 4090
GPU memory: 24.6 GB
CPU matrix multiply (4096x4096): 4.823s
GPU matrix multiply (4096x4096): 0.119s
Speedup: 40.5x
GPU Version Compatibility Matrix — Check Before Installing
  • Run nvidia-smi first and note the maximum supported CUDA version shown in the top-right corner
  • Install the PyTorch wheel that matches or is below your driver's maximum CUDA version — not above it
  • cuDNN is bundled with modern PyTorch wheels — you do not need to install it separately for PyTorch
  • TensorFlow still requires manual cuDNN installation in some configurations — check the TF GPU installation guide
  • If nvidia-smi is not found, your NVIDIA driver is not installed — install the driver before anything else
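The rule in the first two bullets is easy to script: the wheel's CUDA version must not exceed the driver's maximum. This sketch parses the "CUDA Version: X.Y" field that nvidia-smi prints and compares it with a PyTorch index tag such as cu121. It assumes the tag format is "cu" followed by the major version and a single minor digit, which holds for tags like cu118 and cu121. It applies the conservative rule from this guide rather than CUDA's more permissive minor-version compatibility. The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def driver_max_cuda(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field in nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_cuda(tag: str) -> Tuple[int, int]:
    """Convert a PyTorch index tag like 'cu121' to (12, 1); assumes a one-digit minor version."""
    digits = tag.removeprefix("cu")
    return int(digits[:-1]), int(digits[-1])


def wheel_fits_driver(tag: str, nvidia_smi_output: str) -> bool:
    """True if the wheel's CUDA version is at or below the driver's maximum."""
    limit = driver_max_cuda(nvidia_smi_output)
    return limit is not None and wheel_cuda(tag) <= limit


def read_nvidia_smi() -> str:
    """Return nvidia-smi output, or an empty string if the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return ""
    return subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=False).stdout
```

For example, `wheel_fits_driver("cu121", read_nvidia_smi())` returns False on a machine without nvidia-smi, which matches the last bullet: install the driver first.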
Production Insight
GPU setup is the most frustrating part of ML environment setup because the error messages are cryptic and version mismatches look identical to missing drivers.
The correct debugging sequence is always: driver first (nvidia-smi), then CUDA version, then PyTorch wheel — never backwards.
For learning and small projects, Google Colab and Kaggle Notebooks eliminate local GPU setup entirely and provide free access to GPUs such as the T4. Faster GPUs like the A100 on Colab require a paid plan.
For production training on large models, cloud GPU instances on Lambda Labs, AWS, or GCP are more cost-effective than local RTX cards when you factor in electricity and downtime.
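To guard against the silent CPU fallback described at the start of this step, training code can refuse to run on CPU when a GPU is expected. This is a minimal sketch. The REQUIRE_GPU environment variable and both function names are hypothetical conventions. torch is imported lazily so the helper also runs where PyTorch is not installed.

```python
import os
from typing import Optional


def choose_device(cuda_available: bool, require_gpu: bool) -> str:
    """Return 'cuda' or 'cpu'; raise instead of silently falling back when a GPU is required."""
    if cuda_available:
        return "cuda"
    if require_gpu:
        raise RuntimeError(
            "GPU required but CUDA is not available: check nvidia-smi and the PyTorch CUDA build"
        )
    return "cpu"


def detect_device(require_gpu: Optional[bool] = None) -> str:
    """Pick a torch device string; REQUIRE_GPU=1 (hypothetical convention) makes CPU fallback an error."""
    if require_gpu is None:
        require_gpu = os.environ.get("REQUIRE_GPU", "0") == "1"
    try:
        import torch
    except ImportError:
        return choose_device(False, require_gpu)
    return choose_device(torch.cuda.is_available(), require_gpu)
```

A training script would call `device = detect_device()` and move tensors with `.to(device)`. On a training machine with REQUIRE_GPU=1, a broken CUDA setup then stops the run immediately instead of letting it train many times slower.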
Key Takeaway
GPU setup requires driver, CUDA, and PyTorch wheel version alignment — check nvidia-smi before installing anything.
For learning, skip local GPU setup and use Google Colab or Kaggle — the friction is not worth it until you need it.
A GPU speedup like the roughly 40x in the benchmark above matters mostly for deep learning. Classical ML on CPU is fast enough for most projects.
GPU Setup Strategy by Situation
If: No NVIDIA GPU on your machine and you are learning
Use: Google Colab (free T4 GPU) or Kaggle Notebooks (free 30 hours per week). Zero local setup; start training in under 5 minutes
If: Have an NVIDIA GPU but want the simplest possible CUDA setup
Use: The pip command from pytorch.org. The wheels bundle the CUDA runtime and cuDNN, so only the NVIDIA driver must be installed separately
If: Have an NVIDIA GPU and need full control over CUDA version
Use: Install the CUDA toolkit manually to match your driver, then install the matching PyTorch wheel from pytorch.org
If: Training models larger than 10B parameters or need multi-GPU
Use: Cloud GPU instances (AWS p4d, GCP A100, Lambda Labs). Local hardware is impractical at this scale

Step 6: Project Structure and Reproducibility

A well-structured ML project prevents confusion as it grows from one notebook to ten files to a deployed API. Every project needs a standard directory layout, a pinned requirements.txt, a README with setup instructions, and version control. Reproducibility means another developer — or future you six months from now — can clone the repo, run one setup command, and get identical results. This requires four things working together: pinned dependencies, documented Python version, deterministic random seeds, and a setup script that does not require tribal knowledge. In 2026, add a .env.example file to show collaborators what environment variables the project needs without committing actual API keys, and add a pre-commit configuration to enforce formatting and prevent secrets from being committed accidentally.

project_structure.txt (TEXT)
my_ml_project/
├── README.md                    # Problem statement, setup instructions, results
├── requirements.txt             # Pinned dependencies — pip freeze output
├── setup.sh                     # One-command environment setup script
├── .env.example                 # Template for required environment variables (no real keys)
├── .gitignore                   # Excludes: data files, model files, .env, __pycache__, *.pkl, *.pt (keeps .gitkeep)
├── .pre-commit-config.yaml      # Black formatting + detect-secrets hook
├── .vscode/
│   └── settings.json            # Project-level VS Code configuration
├── data/
│   ├── raw/                     # Original unmodified source data — never edit these
│   ├── processed/               # Cleaned and transformed data ready for modeling
│   └── .gitkeep                 # Preserves directory structure in Git without committing data
├── notebooks/
│   ├── 01_eda.ipynb             # Exploratory data analysis
│   ├── 02_feature_engineering.ipynb
│   └── 03_modeling.ipynb        # Training, evaluation, model selection
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── load.py              # Data loading — reads from data/raw/
│   │   └── preprocess.py        # Cleaning and transformation pipeline
│   ├── features/
│   │   ├── __init__.py
│   │   └── build_features.py    # Feature engineering functions
│   ├── models/
│   │   ├── __init__.py
│   │   ├── train.py             # Training script with MLflow logging
│   │   └── predict.py           # Inference logic — used by API and tests
│   └── visualization/
│       ├── __init__.py
│       └── visualize.py
├── models/                      # Saved model artifacts — tracked with DVC, not Git
│   └── .gitkeep
├── tests/
│   ├── test_data.py             # Validate data loading and preprocessing
│   ├── test_features.py
│   └── test_models.py           # Smoke tests for prediction output shape and type
├── api/
│   ├── app.py                   # FastAPI prediction endpoint
│   ├── Dockerfile               # Container definition for deployment
│   └── docker-compose.yml       # Local API + MLflow server orchestration
└── mlruns/                      # MLflow experiment tracking (add to .gitignore for large teams)
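Creating this layout by hand is tedious and easy to get slightly wrong, so a small script can generate the skeleton. This sketch makes a few assumptions. The scaffold function name is hypothetical. It creates only directories, package __init__.py files, and .gitkeep placeholders, not README.md, requirements.txt, or the other files. It also places a .gitkeep in each data subdirectory so Git preserves them even though their contents are ignored.

```python
from pathlib import Path

# Directories from the layout above.
DIRS = [
    ".vscode", "data/raw", "data/processed", "notebooks",
    "src/data", "src/features", "src/models", "src/visualization",
    "models", "tests", "api",
]
# Directories that must be importable Python packages.
PACKAGES = ["src", "src/data", "src/features", "src/models", "src/visualization"]
# Directories whose contents Git ignores but whose structure should be kept.
KEEP = ["data/raw", "data/processed", "models"]


def scaffold(root) -> Path:
    """Create the project skeleton under root; safe to run more than once."""
    root = Path(root)
    for rel in DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in PACKAGES:
        (root / rel / "__init__.py").touch()
    for rel in KEEP:
        (root / rel / ".gitkeep").touch()
    return root
```

Running `scaffold("my_ml_project")` from the parent directory produces the empty skeleton. The remaining files come from setup.sh and your editor.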
Reproducibility Checklist for 2026
  • Pin all dependency versions with == in requirements.txt — not >= or ~=
  • Document Python version in README.md and in setup.sh — 'Python 3' is not specific enough
  • Set random seeds for numpy, Python random module, and PyTorch at the top of every training script
  • Add .env.example to show collaborators required environment variables — never commit .env
  • Include a setup.sh that recreates the environment in one command — test it on a clean machine
  • Track large model artifacts with DVC, not Git — repositories with pickle files in version control are painful to work with
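One helper called at the top of every training script can cover the seeding item. This is a minimal sketch with a hypothetical set_seeds name. NumPy and PyTorch are imported inside try blocks, so the helper still works in environments without them. It seeds NumPy's legacy global generator; code that creates its own `np.random.default_rng(seed)` is seeded separately.

```python
import os
import random


def set_seeds(seed: int = 42) -> list:
    """Seed NumPy and PyTorch (if installed) and Python's random module; return what was seeded."""
    seeded = []
    try:
        import numpy as np
    except ImportError:
        pass
    else:
        np.random.seed(seed)  # NumPy's legacy global generator (np.random.rand, np.random.shuffle, ...)
        seeded.append("numpy")
    try:
        import torch
    except ImportError:
        pass
    else:
        torch.manual_seed(seed)  # seeds the CPU generator and all CUDA devices
        seeded.append("torch")
    # Only affects subprocesses started after this point, not hashing in this process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)  # seeded last so the imports above cannot disturb its state
    seeded.append("random")
    return seeded
```

Seeding alone does not make every GPU operation deterministic. PyTorch documents additional switches, such as `torch.use_deterministic_algorithms(True)`, for that.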
Production Insight
Unpinned dependencies break reproducibility within weeks on active projects — a teammate's pip install updates a transitive dependency and suddenly your model outputs differ.
A tested setup.sh saves hours of environment troubleshooting for every new team member and every CI run.
Do not commit .env files — use .env.example to document what variables are needed, and load them with python-dotenv in local development.
Model artifacts belong in DVC or cloud object storage, not Git. A 200MB pickle file in Git history makes every clone painful forever.
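A startup check can make .env.example do more than document, by comparing it with the variables actually set. The sketch below uses only the standard library, and its helper names are hypothetical. It assumes simple KEY=value lines, optionally prefixed with export, and would run after python-dotenv's `load_dotenv()` has populated os.environ in local development.

```python
import os


def required_keys(example_lines):
    """Return variable names from .env.example-style lines, skipping blanks and comments."""
    keys = []
    for raw in example_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key = line.split("=", 1)[0].strip()
        if key:
            keys.append(key)
    return keys


def missing_keys(example_lines, environ=os.environ):
    """Return required variables that are unset or empty in environ."""
    return [key for key in required_keys(example_lines) if not environ.get(key)]
```

For example, after `load_dotenv()`, `with open(".env.example") as fh: missing = missing_keys(fh)` lists every variable a collaborator still needs to set, without ever printing a secret value.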
Key Takeaway
Standard project structure prevents the 'works on my machine' problem as projects grow.
Pin dependencies, document Python version, include a tested setup script, and use .env.example for API keys.
Reproducibility is an engineering requirement in 2026 — not a nice-to-have that you add later.
● Production incident · POST-MORTEM · severity: high

Model Gives Different Results on Developer Laptop vs Production Server

Symptom
Model accuracy dropped from 94% to 61% immediately after deployment. No code changes between local and production. Same dataset, same algorithm, same random seed. The model had passed all local tests.
Assumption
The team assumed environment differences only affected installation speed and error messages — not model behavior. They tracked code versions in Git but never verified library versions across environments.
Root cause
Local environment used scikit-learn 1.3.0 while production used 1.1.0. Behavior that differed between the two releases silently changed the model production was serving. There were no import errors, no warnings, and no indication anything was wrong until predictions landed in production.
Fix
1. Added requirements.txt with pinned versions using == for every dependency
2. Added an environment verification script that checks library versions at application startup and fails loudly if versions do not match expected
3. Implemented a CI pipeline step that runs model evaluation tests against a Docker image matching the production environment exactly
4. Replaced ad-hoc deployment with a Docker container that carries the environment definition with it, so local and production are now guaranteed identical
Key lesson
  • Always pin dependency versions in requirements.txt using == — not >= or ~=
  • Library version mismatches change model behavior, not just installation behavior — this is the dangerous case
  • Verify library versions match between local and production before every deployment, not after something breaks
  • Docker is the most reliable way to guarantee environment consistency across machines and teammates
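The environment verification script from fix 2 can be sketched in a few lines with importlib.metadata from the standard library. This is an illustration, not the team's actual script, and the function names are hypothetical. It assumes a requirements.txt produced by pip freeze and checks only name==version lines. Extras are stripped, and option lines such as --extra-index-url are skipped.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Map package name -> pinned version for 'name==version' lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if "==" not in line or line.startswith("-"):
            continue
        name, _, pinned = line.partition("==")
        name = name.split("[", 1)[0].strip()  # drop extras such as package[extra]
        pins[name] = pinned.split(";", 1)[0].strip()  # drop environment markers
    return pins


def find_mismatches(pins, lookup=version):
    """Return {name: (expected, installed)} for every pin the environment does not satisfy."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


def assert_environment(requirements_path="requirements.txt"):
    """Raise at startup if installed versions drift from requirements.txt."""
    with open(requirements_path, encoding="utf-8") as fh:
        mismatches = find_mismatches(parse_pins(fh))
    if mismatches:
        details = "; ".join(
            f"{name}: expected {exp}, found {inst}" for name, (exp, inst) in mismatches.items()
        )
        raise RuntimeError(f"Environment does not match requirements.txt: {details}")
```

Calling `assert_environment()` before loading the model turns a version drift like the one in this incident into a startup failure with a readable message.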
Production debug guide: symptom-to-action mapping for common setup issues (7 entries)
Symptom · 01
ImportError: DLL load failed when importing tensorflow on Windows
Fix
Install Microsoft Visual C++ Redistributable 2019 or later from the Microsoft download page. Also confirm that the Python you are running is a version your TensorFlow release supports (see the TensorFlow pip installation page) and is the same interpreter you installed TensorFlow into.
Symptom · 02
ModuleNotFoundError for a library you just installed
Fix
You installed into a different environment than the one you are running. Run 'which python' on macOS/Linux or 'where python' on Windows, then 'pip list' to confirm the active environment contains the library. The fix is always: activate the correct environment first, then install.
Symptom · 03
Jupyter kernel crashes when importing torch or tensorflow
Fix
A common cause is that Jupyter is running a different Python than your virtual environment. Register the virtual environment as a Jupyter kernel: python -m ipykernel install --user --name ml_2026 --display-name 'ML 2026 (Python 3.12)'. Then restart Jupyter and select the new kernel from the kernel menu.
Symptom · 04
CUDA not available error when running PyTorch on an NVIDIA GPU
Fix
Run nvidia-smi and note the CUDA version shown in the top-right corner. This is the highest CUDA version your driver supports. Install the matching PyTorch build from pytorch.org using the official install selector. On Windows, a plain pip install torch from PyPI installs the CPU-only build, which produces exactly this error. On Linux, the default PyPI wheel bundles a specific CUDA version that may be newer than your driver supports.
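After reinstalling, a short report like the one below shows what the installed PyTorch build can see. This is a sketch, and `cuda_report` is a hypothetical helper; it still runs if PyTorch is missing. The key field is `torch.version.cuda`. It is None for CPU-only builds and otherwise names the CUDA version the wheel was built against, which should not be higher than the version nvidia-smi reports.

```python
def cuda_report() -> dict:
    """Summarize what the installed PyTorch build can see on this machine."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None means a CPU-only wheel, the usual cause of "CUDA not available".
        "built_with_cuda": torch.version.cuda,
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```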
Symptom · 05
pip install takes forever or fails with connection timeout
Fix
Increase the timeout threshold: pip install --timeout 300 package_name. If behind a corporate proxy, set HTTP_PROXY and HTTPS_PROXY environment variables. As a last resort, use an alternative mirror: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple package_name.
Symptom · 06
openai or anthropic SDK imports fail after installation
Fix
Confirm you are in the correct virtual environment with 'pip list | grep openai' (or 'pip show openai'). If the package is present but your code still fails, check that the OPENAI_API_KEY environment variable is set. The OpenAI v1 client raises an error when OpenAI() is constructed without a key, and this is easy to mistake for an import failure. Use python-dotenv to load a .env file rather than hardcoding keys in source files.
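A sketch of checking for API keys before constructing an SDK client. It assumes the variable names OPENAI_API_KEY and ANTHROPIC_API_KEY, which are the defaults the two SDKs read, and treats python-dotenv as optional. `missing_api_keys` is a hypothetical helper. `load_dotenv()` is the real python-dotenv function: it searches for a .env file and, by default, does not override variables that are already set.

```python
import os


def missing_api_keys(required: list[str]) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    try:
        # Optional: if python-dotenv is installed, pull values from a .env file.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass
    return [name for name in required if not os.environ.get(name)]


missing = missing_api_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])
if missing:
    print("Set these before creating an SDK client:", ", ".join(missing))
```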
Symptom · 07
numpy version conflict error after installing torch or tensorflow
Fix
Deep learning libraries frequently require a specific NumPy range. Read the required range from the error message, then pin NumPy to it explicitly, for example: pip install 'numpy>=1.24,<2.0'. If the conflict persists, recreate the environment from scratch and install in the order specified in Step 3 of this guide.
ML Environment Setup Quick Reference
Immediate commands for environment setup, verification, and debugging in 2026
Need to verify Python and library versions across the full stack
Immediate action
Run version check commands for all core libraries in one pass
Commands
python --version && pip list | grep -E 'numpy|pandas|scikit-learn|torch|tensorflow|openai|anthropic|mlflow'
python -c "import sys; print(f'Python {sys.version}'); import numpy; print(f'NumPy {numpy.__version__}'); import sklearn; print(f'scikit-learn {sklearn.__version__}'); import torch; print(f'PyTorch {torch.__version__}')"
Fix now
If versions do not match expected, do not patch in place — recreate the virtual environment from requirements.txt to avoid compounding the mismatch
Need to check if GPU is available and performing correctly
Immediate action
Test CUDA availability, device name, and compute performance for PyTorch
Commands
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}'); print(f'PyTorch: {torch.__version__}')"
python -c "import torch, time; size=4096; a=torch.randn(size,size); b=torch.randn(size,size); s=time.time(); torch.matmul(a,b); print(f'CPU: {time.time()-s:.2f}s'); a,b=a.cuda(),b.cuda(); torch.matmul(a,b); torch.cuda.synchronize(); s=time.time(); torch.matmul(a,b); torch.cuda.synchronize(); print(f'GPU: {time.time()-s:.2f}s')" 2>/dev/null || echo 'GPU not available, CPU only'
Fix now
If CUDA is False despite having an NVIDIA GPU, run nvidia-smi to confirm driver is installed, then reinstall PyTorch with the matching CUDA build from pytorch.org
Need to freeze current environment for reproducibility
Immediate action
Generate pinned requirements.txt and verify it captures all dependencies
Commands
pip freeze > requirements.txt && wc -l requirements.txt
cat requirements.txt | grep -E 'numpy|pandas|scikit-learn|torch|openai|mlflow'
Fix now
Commit requirements.txt to version control immediately — an unpinned environment has a half-life measured in weeks
Need to recreate environment from requirements.txt on a new machine
Immediate action
Create a fresh virtual environment and install pinned dependencies
Commands
python3.12 -m venv ml_env && source ml_env/bin/activate && pip install --upgrade pip
pip install -r requirements.txt
Fix now
If any package fails to install, check for OS-specific wheels or CUDA build mismatches — the error message will name the conflicting package
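The same recreation can be scripted from Python with the standard-library venv module, which helps when setup must behave the same on Windows and Unix. This is a minimal sketch. `create_env` and `env_python` are hypothetical helper names, and installing requirements is left to a subprocess call against the new interpreter.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a fresh virtual environment, roughly like `python -m venv <env_dir>`."""
    path = Path(env_dir)
    # clear=True wipes any existing contents first: recreate, don't patch.
    venv.create(path, with_pip=with_pip, clear=True)
    return path


def env_python(env_dir: Path) -> Path:
    """Path to the environment's interpreter (Scripts/ on Windows, bin/ elsewhere)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

For example, `subprocess.run([str(env_python(path)), "-m", "pip", "install", "-r", "requirements.txt"], check=True)` installs the pinned dependencies into the new environment rather than into whichever Python launched the script.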
ML Environment Tools Comparison
Tool | Package manager | Complexity | Best for | GPU support
venv + pip | pip (PyPI) | Low | Individual projects and beginners; simplest path from zero to working | Manual: install the NVIDIA driver and choose the matching CUDA build
Anaconda | conda (defaults + conda-forge) | Medium | Data science teams managing compiled dependencies and multiple Python versions | conda can install the CUDA runtime as a package; NVIDIA driver still required
Miniconda | conda (minimal install) | Medium | Experienced users who want conda's dependency resolution without the multi-gigabyte Anaconda base install | Same as Anaconda
Docker | pip inside container | High | Team reproducibility and production deployment; the only approach that guarantees identical environments | NVIDIA Container Toolkit required
Google Colab | pip (pre-installed stack) | Very low | Learning, quick experiments, free GPU access without any local setup | Free T4 GPU on the free tier; faster GPUs such as A100 on paid plans
Poetry | poetry (PyPI with lock file) | Medium | Production Python projects that need dependency lock files and clean package publishing | Same as venv + pip

Key takeaways

1. Install Python 3.12 and create a dedicated virtual environment for every project; this single habit prevents 90% of ML environment issues.
2. Install libraries in order: NumPy first, then pandas, then scikit-learn, then MLflow and LLM SDKs, then deep learning libraries last.
3. Never mix TensorFlow and PyTorch in the same environment without explicit version pinning for every shared dependency.
4. Pin all dependency versions with == in requirements.txt and commit it to version control; treat it as part of the model artifact.
5. VS Code with the Jupyter extension is the 2026 standard development environment; configure the Python interpreter path per project in settings.json.
6. For GPU setup, run nvidia-smi first, note the maximum CUDA version, then install the matching PyTorch wheel, always in that order.

Common mistakes to avoid


Installing TensorFlow and PyTorch in the same environment

Symptom
Cryptic import errors, NumPy version conflicts, or one library silently downgrading the other's dependencies. Models trained with one library may produce different numerical results after installing the other changes a shared dependency such as NumPy. This class of error is nearly impossible to diagnose unless you already suspect a version conflict.
Fix
Create separate virtual environments for TensorFlow and PyTorch projects — one project, one framework, one environment. If a project genuinely requires both, pin every shared dependency version explicitly in requirements.txt, verify the combination on a clean environment, and document the constraint prominently in README.md.
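To catch an accidental second framework early, a check like the following can run at the start of a project script or CI job. It is a sketch using `importlib.util.find_spec`, so neither framework is actually imported or initialized. `installed_frameworks` is a hypothetical helper name.

```python
import importlib.util


def installed_frameworks() -> list[str]:
    """List which deep learning frameworks are importable, without importing them."""
    candidates = {"PyTorch": "torch", "TensorFlow": "tensorflow"}
    return [name for name, module in candidates.items()
            if importlib.util.find_spec(module) is not None]


found = installed_frameworks()
if len(found) > 1:
    print(f"Warning: {' and '.join(found)} share this environment; pin shared dependencies.")
else:
    print(f"Frameworks in this environment: {found or 'none'}")
```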

Using system Python for ML development

Symptom
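Installing ML packages breaks macOS or Linux system tools. Permission errors require sudo. Upgrading Python for an ML project breaks system utilities that depend on the original version. pip install modifies shared packages that other system processes depend on. On recent Debian, Ubuntu, and Homebrew Python builds, pip refuses outright with an externally-managed-environment error (PEP 668).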
Fix
Always use a virtual environment. Never run pip install without an active virtual environment. On macOS, the pre-installed Python is owned by the OS — treat it as read-only and install your own Python via Homebrew or pyenv.

Not pinning dependency versions in requirements.txt

Symptom
Model works today and breaks next month when a library releases an update that changes a default parameter or removes a function. Different team members with different installation dates get different results. Production deployment diverges from local development silently.
Fix
Use pip freeze > requirements.txt with exact version pins (==) after every environment setup. Commit requirements.txt to version control. Run pip install -r requirements.txt on every new machine and in every CI pipeline run. Treat requirements.txt as part of the model artifact, not just the codebase.
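A CI step can enforce the == rule mechanically. The sketch below is a deliberately simple line-based check rather than a full requirement parser; the packaging library offers a proper one. `unpinned_requirements` is a hypothetical helper name.

```python
def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version with ==.

    Simple line-based check: environment markers containing < or > are
    flagged too, which errs on the side of caution.
    """
    offenders = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # blank line, comment, or pip option such as -r / -e
        exact = (
            "==" in line
            and "*" not in line  # wildcards like ==1.4.* are ranges, not pins
            and not any(op in line for op in (">", "<", "~=", "!="))
        )
        if not exact:
            offenders.append(line)
    return offenders
```

In CI, read requirements.txt, pass its text to the function, and fail the job if the returned list is not empty.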

Installing Jupyter globally instead of inside the virtual environment

Symptom
Jupyter cannot find libraries installed in your virtual environment. Import errors appear in notebooks that work fine when you run the same code from the terminal. The wrong Python version is used by the Jupyter kernel even after activating the virtual environment.
Fix
Install jupyter and ipykernel inside the virtual environment. Register the environment as a named Jupyter kernel: python -m ipykernel install --user --name ml_2026 --display-name 'ML 2026'. Select this kernel explicitly in Jupyter or VS Code — never rely on the default kernel being correct.

Skipping GPU driver and CUDA version verification before installing deep learning libraries

Symptom
PyTorch reports CUDA is not available despite having a GPU. Training code that picks its device with torch.cuda.is_available() silently falls back to CPU, often an order of magnitude or more slower, with no warning. Cryptic CUDA runtime errors appear mid-training after hours of computation.
Fix
Always run nvidia-smi before installing any deep learning library. Note the maximum CUDA version shown, then install the matching PyTorch wheel. After installation, run torch.cuda.is_available() and confirm it returns True before starting any training run.

Committing API keys or environment variables to version control

Symptom
OpenAI API keys, database credentials, or Anthropic API keys appear in .env files committed to Git. Security scanners flag the repository. Keys are exposed in public repositories, incurring unexpected API charges.
Fix
Add .env to .gitignore immediately when creating the project. Create a .env.example with placeholder values to document what variables are needed. Load environment variables with python-dotenv in local development. Use repository secrets or a secrets manager for CI/CD and production.
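As a rough local guard, a script can confirm that .env is listed in .gitignore before the first commit. This sketch matches only common literal patterns. `git check-ignore .env` is the authoritative test because it applies git's full matching rules. `env_file_ignored` is a hypothetical helper name, and .env.example is deliberately not matched because that file is meant to be committed.

```python
from pathlib import Path


def env_file_ignored(gitignore_text: str) -> bool:
    """Rough check that .env itself appears in .gitignore (no full glob semantics)."""
    patterns = {line.strip() for line in gitignore_text.splitlines()}
    return bool(patterns & {".env", "/.env", "*.env"})


gitignore = Path(".gitignore")
if not gitignore.exists() or not env_file_ignored(gitignore.read_text()):
    print("Add .env to .gitignore before your first commit")
```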

Interview Questions on This Topic

Q01 · SENIOR: How would you set up a reproducible ML environment for a team of five de...
Q02 · SENIOR: Your model gives different results on your laptop versus your colleague'...
Q03 · JUNIOR: Explain the difference between pip and conda. When would you choose one ...
Q04 · SENIOR: Why would a model trained locally produce different predictions in produ...
Q01 of 04 · SENIOR

How would you set up a reproducible ML environment for a team of five developers?

ANSWER
Start with a requirements.txt using exact version pins for every dependency — pip freeze output, not manually curated. Include a setup.sh script that creates a virtual environment, installs from requirements.txt, and registers the Jupyter kernel in one command. Add a .env.example documenting every environment variable the project needs. Use Docker for production parity — a Dockerfile that starts from a pinned Python base image and installs the same requirements.txt. Add a GitHub Actions workflow that builds the Docker image and runs the test suite on every pull request so environment drift gets caught before it merges. Document the Python version, CUDA version if applicable, and any OS-specific requirements in README.md. The goal is that any developer can clone the repo and have a working environment in under ten minutes without asking anyone for help.
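The one-command setup script described in this answer could look roughly like the following. It is written in Python here so it runs the same way on Windows and Unix, and it is a sketch under stated assumptions: requirements.txt sits in the repository root and lists ipykernel, the Python running the script is the version the team standardized on, and the environment and kernel names (ml_env, ml_2026) are placeholders. `bootstrap_commands` is a hypothetical helper. As written, it only prints the plan; running each command with `subprocess.run(cmd, check=True)` turns the dry run into the real setup.

```python
import sys
from pathlib import Path


def bootstrap_commands(env_dir: str = "ml_env", kernel_name: str = "ml_2026") -> list[list[str]]:
    """Return, in order, the commands a one-step team setup script would run."""
    if sys.platform == "win32":
        py = str(Path(env_dir) / "Scripts" / "python.exe")
    else:
        py = str(Path(env_dir) / "bin" / "python")
    return [
        # 1. Create the environment with whichever Python runs this script.
        [sys.executable, "-m", "venv", env_dir],
        # 2. Upgrade pip inside the new environment, not the global one.
        [py, "-m", "pip", "install", "--upgrade", "pip"],
        # 3. Install the exact pins from requirements.txt.
        [py, "-m", "pip", "install", "-r", "requirements.txt"],
        # 4. Register the environment as a named Jupyter kernel (needs ipykernel).
        [py, "-m", "ipykernel", "install", "--user",
         "--name", kernel_name, "--display-name", "ML 2026"],
    ]


# Dry run: print the plan instead of executing it.
for cmd in bootstrap_commands():
    print(" ".join(cmd))
```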

Frequently Asked Questions

01. Should I use Anaconda or pip for ML development?
02. Do I need a GPU to learn machine learning?
03. How do I fix the 'No module named sklearn' error after installing scikit-learn?
04. What is the difference between Jupyter Notebook and JupyterLab?
05. How do I make my ML project reproducible on another machine?
06. Should I add LLM SDK libraries to my ML environment?