ML environment setup requires Python 3.11 or 3.12, a package manager, an IDE, and core libraries installed in the correct order
Anaconda or pip manages dependencies — never mix both in the same project without explicit isolation
VS Code with the Jupyter extension replaces standalone Jupyter Notebook for most workflows in 2026
Performance insight: virtual environments add zero runtime overhead but prevent 90% of dependency conflicts
Production insight: environment mismatches between local and deployed code cause silent model failures — version pinning is mandatory
Biggest mistake: installing TensorFlow and PyTorch in the same environment without version pinning
2026 addition: add the openai or anthropic SDK to your environment from day one — LLM API calls are a baseline expectation in most ML roles
Plain-English First
Setting up an ML environment is like setting up a professional kitchen before cooking. You need the right tools (Python, libraries), the right workspace (IDE), and everything organized so ingredients from one dish do not contaminate another (virtual environments). Skip the organization step and you will spend more time fighting installation errors than building models. This guide walks through every step in tested sequence — from zero to a working, reproducible ML environment that matches what professional teams use in 2026.
ML environment setup is the first barrier that stops most beginners — and it is entirely avoidable with the right sequence. Dependency conflicts between TensorFlow, PyTorch, and scikit-learn create cryptic errors that derail learning momentum at the worst possible moment. The core problem is not complexity — it is sequencing and isolation. Installing tools in the wrong order or mixing package managers creates conflicts that take hours to diagnose and are nearly impossible to trace without experience. This guide provides a tested installation sequence for 2026 that avoids the common pitfalls. Every step produces a verifiable output so you know exactly where something breaks before it becomes a three-hour debugging session. The environment you build here will support classical ML, deep learning, and LLM API integration — the three layers of a complete 2026 ML workflow.
Step 1: Install Python
Python is the foundation of every ML environment. In 2026, Python 3.11 and 3.12 are the stable targets for ML work. Python 3.11 improved interpreter performance by 10 to 60 percent over 3.10 and has broad library compatibility. Python 3.12 is a mature release with full support from NumPy, pandas, PyTorch 2.x, and the OpenAI SDK. Avoid Python 3.13 for ML work in 2026, because some compiled ML libraries lag by one to two minor versions. Never use the system Python that ships with macOS or Linux for ML development: it exists for the operating system, not for you.
install_python.sh (Bash)
# macOS — install via Homebrew (recommended)
brew install python@3.12
# Verify
python3.12 --version
# Expected: Python 3.12.x
# macOS/Linux — alternatively install via pyenv for multi-version management
curl https://pyenv.run | bash
# Add to shell profile (~/.zshrc or ~/.bashrc):
# export PATH="$HOME/.pyenv/bin:$PATH"
# eval "$(pyenv init -)"
pyenv install 3.12.3
pyenv global 3.12.3
python --version
# Expected: Python 3.12.3
# Windows: download from python.org/downloads
# Check 'Add Python to PATH' during installation
# Verify in PowerShell:
python --version
# Expected: Python 3.12.x
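# Verify pip is installed and up to date
# Call pip through the interpreter so the reported version belongs to Python 3.12
python3.12 -m pip install --upgrade pip
python3.12 -m pip --version
# Expected: pip 24.x from .../python3.12/site-packages/pip (python 3.12)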
Output
Python 3.12.3
pip 24.0 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)
Which Python Version to Install in 2026
Python 3.12 is the recommended target — broadest library support, stable, production-ready
Python 3.11 is a safe fallback if a specific library does not yet support 3.12
Avoid Python 3.13 for ML work — library support lags the interpreter release by months
Never use the system Python on macOS or Linux — use Homebrew, pyenv, or the official installer
On Windows, always check 'Add Python to PATH' during installation or nothing will work from the terminal
Production Insight
Python version mismatches cause the same class of silent failures as library version mismatches.
Document the exact Python version in README.md and your setup script — not just 'Python 3'.
pyenv makes switching between Python versions per-project trivial and is worth installing from day one.
Key Takeaway
Python 3.12 is the right target for ML work in 2026.
Never use system Python for ML development — it exists for the OS, not you.
pyenv is the cleanest way to manage multiple Python versions across projects.
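The advice to document the exact Python version can also be enforced in code. The following is a minimal sketch, assuming the project accepts 3.12 with 3.11 as a fallback; the `SUPPORTED` set and the function name `check_python_version` are illustrative choices, not part of any library.

```python
import sys

# Assumption for illustration: the project targets Python 3.12, with 3.11 as a fallback.
SUPPORTED = {(3, 11), (3, 12)}


def check_python_version(version_info=None, supported=SUPPORTED):
    """Return the (major, minor) pair, or raise if it is outside the supported set."""
    info = sys.version_info if version_info is None else version_info
    major_minor = (info[0], info[1])
    if major_minor not in supported:
        wanted = ", ".join(f"{a}.{b}" for a, b in sorted(supported))
        raise RuntimeError(
            f"Python {major_minor[0]}.{major_minor[1]} is not supported; "
            f"expected one of: {wanted}"
        )
    return major_minor


if __name__ == "__main__":
    try:
        print("Python version OK:", check_python_version())
    except RuntimeError as err:
        print(err)
```

Calling a check like this at the top of a setup or training script turns a version mismatch into an immediate, readable error instead of a failure deep inside a library import.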
Python Installation Method Selection
If: macOS, single Python version needed
→ Use: Install via Homebrew (brew install python@3.12). Simplest path, keeps system Python untouched
If: macOS or Linux, need multiple Python versions across projects
→ Use: Install pyenv first, then install Python versions through pyenv. Version switching becomes one command
If: Windows
→ Use: Download from python.org, check 'Add to PATH', verify in PowerShell. Straightforward if you follow the checklist
If: Team environment with strict version requirements
→ Use: Docker with FROM python:3.12-slim. Guarantees every team member runs identical Python without manual installation
Step 2: Create a Virtual Environment
Virtual environments isolate project dependencies so different projects can use different library versions without conflicts. This is not optional — it is the single step that prevents 90% of the dependency errors that derail beginners. Every ML project gets its own environment. The two standard tools are venv (built into Python, no install required) and conda (from Anaconda or Miniconda, better for managing compiled dependencies like CUDA). For most beginners, venv with pip is the right starting point. For teams managing GPU drivers, CUDA versions, and complex compiled dependencies across operating systems, conda provides better control. In 2026, a third option has become practical for teams: container-first development using Docker, where the environment definition lives in a Dockerfile and every developer runs the same container.
create_virtual_env.sh (Bash)
# Option A: venv (built into Python — recommended for most users)
# Create a named environment directory outside the project so it is never committed
python3.12 -m venv ~/ml_envs/ml_2026
# Activate on macOS/Linux
source ~/ml_envs/ml_2026/bin/activate
# Activate on Windows (PowerShell)
# & "$HOME\ml_envs\ml_2026\Scripts\Activate.ps1"
# Verify you are in the virtual environment — path must point to ml_2026
which python
# Expected: ~/ml_envs/ml_2026/bin/python
which pip
# Expected: ~/ml_envs/ml_2026/bin/pip
# Upgrade pip, setuptools, and wheel inside the environment before installing anything else
pip install --upgrade pip setuptools wheel
# When you are done working
deactivate
# ---
# Option B: conda (from Anaconda or Miniconda)
# Create conda environment with Python version pinned
conda create -n ml_2026 python=3.12 -y
# Activate
conda activate ml_2026
# Verify — path must point to the conda environment
which python
conda list | head -20
# Deactivate
conda deactivate
# ---
# Add this to .gitignore so an environment is never committed, even if one is created inside the project
echo 'ml_2026/' >> .gitignore
echo '__pycache__/' >> .gitignore
echo '*.pyc' >> .gitignore
echo '.env' >> .gitignore
Output
(ml_2026) $ which python
/Users/username/ml_envs/ml_2026/bin/python
(ml_2026) $ pip --version
pip 24.0 from /Users/username/ml_envs/ml_2026/lib/python3.12/site-packages/pip (python 3.12)
(ml_2026) $ python --version
Python 3.12.3
Virtual Environment Rules — Non-Negotiable
Every project gets its own environment — never share environments across projects
Always activate the environment before running pip install — installing into the wrong environment is the most common beginner error
Store environments outside the project directory so they are never accidentally committed to Git
Add your environment directory name to .gitignore immediately
Upgrade pip inside the environment before installing any packages — old pip versions misresolve dependencies
Production Insight
Dependency conflicts are the single largest source of ML environment issues for beginners and experienced engineers alike.
Virtual environments eliminate the conflict by design — isolation is cheaper than debugging.
Without isolation, installing TensorFlow can silently downgrade NumPy and break scikit-learn imports with no obvious error message.
Key Takeaway
Virtual environments are mandatory for ML development — no exceptions.
One project equals one environment equals no dependency conflicts.
venv for simplicity, conda for compiled dependencies, Docker for team reproducibility.
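Because installing into the wrong environment is such a common mistake, a script can check for isolation before it does anything else. The sketch below assumes two signals: `sys.prefix` differing from `sys.base_prefix` for venv, and the `CONDA_DEFAULT_ENV` variable for conda. The function name `active_environment` and the decision to treat conda's `base` as "no isolation" are illustrative assumptions.

```python
import os
import sys


def active_environment(prefix=None, base_prefix=None, environ=None):
    """Describe the environment the interpreter is running in.

    Returns "venv", "conda:<name>", or "none".
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ

    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    if prefix != base_prefix:
        return "venv"
    # conda exports CONDA_DEFAULT_ENV on activation; "base" is treated here
    # as no project-level isolation (an assumption of this sketch).
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return f"conda:{conda_env}"
    return "none"


if __name__ == "__main__":
    print("Active environment:", active_environment())
```

A result of `none` before a `pip install` is a signal to stop and activate the project environment first.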
Virtual Environment Tool Selection
If: Beginner, single Python version, pip packages only
→ Use: venv. It is built into Python 3.12, requires no installation, and covers 95% of ML use cases
If: Need multiple Python versions simultaneously or non-Python compiled dependencies
→ Use: conda. It manages Python versions and compiled C/Fortran dependencies that pip cannot handle cleanly
If: Team project with strict reproducibility requirements
→ Use: Docker with a pinned requirements.txt. This is the most reliable way to get the same environment on every machine
If: Working with Jupyter notebooks across multiple projects
→ Use: venv + ipykernel to register each environment as a separate Jupyter kernel, one kernel per environment
Step 3: Install Core ML Libraries
Core ML libraries form the foundation of every project. Install them in a specific order to avoid dependency conflicts; this sequence has been tested against the 2026 library release landscape. Install and pin NumPy first: pandas, scikit-learn, and the deep learning libraries all depend on it, and their prebuilt wheels are compiled against a specific NumPy ABI, so later installs should resolve against the version you chose rather than pick their own. Then install pandas for data manipulation, matplotlib and seaborn for visualization, scikit-learn for classical ML algorithms, and Jupyter support. Deep learning libraries come last and ideally live in their own environment. In 2026, add the openai SDK or anthropic SDK to your baseline environment, because LLM API calls are now a standard component of production ML pipelines, not an advanced specialty skill. Add MLflow for experiment tracking from the start rather than retrofitting it later.
install_core_libraries.sh (Bash)
# Always activate your virtual environment first; verify with 'which python'
source ~/ml_envs/ml_2026/bin/activate
# Step 1: Upgrade pip before installing anything
pip install --upgrade pip setuptools wheel
# Step 2: Install core data science stack (order matters)
pip install numpy==1.26.4
pip install pandas==2.2.2
pip install matplotlib==3.9.0
pip install seaborn==0.13.2
# Step 3: Install scikit-learn and gradient boosting libraries
pip install scikit-learn==1.5.0
pip install xgboost==2.0.3
pip install lightgbm==4.3.0
# Step 4: Install Jupyter support
pip install jupyter==1.0.0 ipykernel==6.29.4
# Register this environment as a Jupyter kernel
python -m ipykernel install --user --name ml_2026 --display-name "ML 2026 (Python 3.12)"
# Step 5: Install experiment tracking
pip install mlflow==2.13.0
# Step 6: Install LLM API SDKs (baseline in 2026)
pip install openai==1.30.1
pip install anthropic==0.28.0
pip install python-dotenv==1.0.1
# Step 7: Install ONE deep learning library
# Option A: PyTorch, recommended for beginners and researchers in 2026
# CPU-only version (fast to install, no GPU required for learning)
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu
# GPU version: get the exact command from pytorch.org/get-started/locally
# pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121
# Option B: TensorFlow, if your team uses it
# pip install tensorflow==2.16.1
# Do not install both in the same environment without explicit version pinning and testing
# Step 8: Verify every import to catch silent failures before they surface in a notebook
python -c "
import numpy as np; print(f'NumPy {np.__version__}')
import pandas as pd; print(f'pandas {pd.__version__}')
import sklearn; print(f'scikit-learn {sklearn.__version__}')
import matplotlib; print(f'matplotlib {matplotlib.__version__}')
import xgboost; print(f'XGBoost {xgboost.__version__}')
import mlflow; print(f'MLflow {mlflow.__version__}')
import openai; print(f'openai {openai.__version__}')
import torch; print(f'PyTorch {torch.__version__}')
print('All imports successful')
"
# Step 9: Freeze requirements with exact version pins
pip freeze > requirements.txt
echo "requirements.txt generated with $(wc -l < requirements.txt) packages"
Output
NumPy 1.26.4
pandas 2.2.2
scikit-learn 1.5.0
matplotlib 3.9.0
XGBoost 2.0.3
MLflow 2.13.0
openai 1.30.1
PyTorch 2.3.0+cpu
All imports successful
requirements.txt generated with 87 packages
Installation Order and Version Pinning
Install NumPy first — deep learning libraries frequently downgrade it silently if it is installed later
Install scikit-learn before deep learning libraries — this avoids NumPy version conflicts that are difficult to trace
Never install TensorFlow and PyTorch in the same environment without explicit version pinning for every shared dependency
Always run the verification imports block after installation — silent failures appear at runtime, not at install time
Pin versions with == in requirements.txt — using >= allows silent upgrades that break model reproducibility
Add openai and mlflow to every environment from the start — retrofitting experiment tracking after the fact is painful
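The rule about pinning with == can be checked mechanically. The sketch below is one possible linter for requirements lines; the regular expression is a simplification that covers plain `name==version` entries with optional extras, not the full requirement syntax pip accepts, and the name `find_unpinned` is illustrative.

```python
import re

# Matches "name==version", optionally with extras such as "package[extra]==1.2.3".
# Simplified on purpose: URL requirements and environment markers are not handled.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[^=\s]+$")


def find_unpinned(lines):
    """Return requirement lines that are not pinned with an exact == version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = ["numpy==1.26.4", "pandas>=2.2", "scikit-learn~=1.5.0", "requests"]
    print(find_unpinned(sample))
```

Run against the lines of a requirements.txt, an empty result means every entry uses an exact pin; anything returned is a candidate for silent upgrades.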
Production Insight
Installing deep learning libraries without first pinning NumPy can silently leave you with a NumPy version that conflicts with scikit-learn.
TensorFlow and PyTorch have historically required conflicting NumPy version ranges — check the current compatibility matrix at pytorch.org and tensorflow.org before installing both.
Always verify imports after installation. Silent failures during install complete without error but raise ImportError at runtime, often minutes into a training run.
MLflow takes two minutes to install and saves hours when you need to compare model versions. Install it on day one.
Key Takeaway
Install in order: NumPy, pandas, visualization, scikit-learn, XGBoost, Jupyter, MLflow, LLM SDKs, then deep learning last.
Pin every version with == in requirements.txt — this single habit prevents the most common class of production environment failures.
Add openai and mlflow to your baseline environment — they are part of the 2026 ML stack, not advanced add-ons.
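Once requirements.txt exists, the installed packages can be compared against it rather than trusted. This is a rough sketch using the standard library's `importlib.metadata`; the function names and the injectable `get_version` parameter are assumptions made so the logic can be exercised without any packages installed. It only understands plain `name==version` lines.

```python
from importlib import metadata
from pathlib import Path


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(requirement_lines, get_version=installed_version):
    """Compare 'name==version' lines against what is installed.

    Returns a list of (name, expected, found) tuples, one per mismatch.
    """
    mismatches = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()
        if "==" not in line or line.startswith("-"):
            continue  # this sketch only checks exact pins
        name, expected = (part.strip() for part in line.split("==", 1))
        found = get_version(name)
        if found != expected:
            mismatches.append((name, expected, found))
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        for name, expected, found in check_pins(req.read_text().splitlines()):
            print(f"{name}: expected {expected}, found {found}")
```

A `found` value of `None` means the package is missing entirely, which usually points to the wrong environment being active.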
Step 4: Configure VS Code for ML Development
VS Code with the Jupyter extension has replaced standalone Jupyter Notebook as the standard ML development environment in 2026. It gives you IntelliSense, inline type checking, debugging with breakpoints inside notebook cells, Git integration, and notebook support in a single editor — with none of the browser tab management overhead of classic Jupyter. The critical configuration is selecting the correct Python interpreter from your virtual environment. Get this wrong and every import will fail with ModuleNotFoundError while the library is sitting correctly installed in a different environment. Configure settings.json per project rather than globally so team members get consistent behavior automatically.
Python (ms-python.python) — core language support, interpreter selection, and test runner integration
Jupyter (ms-toolsai.jupyter) — notebook support with variable explorer and cell-level debugging
Black Formatter (ms-python.black-formatter) — automatic formatting on save, consistent style across teams
Pylance (ms-python.vscode-pylance) — fast IntelliSense, import resolution, and type checking powered by Pyright
GitLens — commit history and blame annotations per line, essential for tracking when a model change was introduced
Thunder Client — lightweight REST client for testing your FastAPI prediction endpoints without leaving VS Code
Production Insight
Selecting the wrong Python interpreter is the single most common cause of ModuleNotFoundError in VS Code — the library is installed correctly, but VS Code is using system Python.
Always set python.defaultInterpreterPath in the project-level .vscode/settings.json, not just through the status bar selector — the status bar selection does not persist for teammates who clone the repo.
Format on save (editor.formatOnSave) with Black takes zero effort and eliminates style debates in code review.
Key Takeaway
VS Code with the Jupyter extension is the 2026 standard — faster, more debuggable, and better integrated than standalone Jupyter Notebook.
Set the Python interpreter path in settings.json per project — relying on the status bar selector breaks when teammates clone the repo.
Install all six extensions before starting any project — they pay for themselves within the first hour.
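The project-level settings described above can be generated by a small script so that every clone starts with the same file. The sketch below is an assumption about what a minimal `.vscode/settings.json` might contain: it uses the `python.defaultInterpreterPath` setting named in this step plus the editor's format-on-save option with the Black Formatter extension. The interpreter path is a placeholder to adjust per machine, and the function names are illustrative.

```python
import json
from pathlib import Path


def build_settings(interpreter_path):
    """Return a project-level VS Code settings dict for a Python ML project."""
    return {
        # Interpreter VS Code should use by default for this workspace
        "python.defaultInterpreterPath": interpreter_path,
        # Format Python files with Black every time they are saved
        "[python]": {
            "editor.defaultFormatter": "ms-python.black-formatter",
            "editor.formatOnSave": True,
        },
    }


def write_settings(project_dir, interpreter_path):
    """Write .vscode/settings.json under project_dir and return its path."""
    target = Path(project_dir) / ".vscode" / "settings.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(build_settings(interpreter_path), indent=2) + "\n")
    return target


if __name__ == "__main__":
    # Placeholder path: replace with the interpreter inside your own environment.
    print(json.dumps(build_settings("/Users/username/ml_envs/ml_2026/bin/python"), indent=2))
```

Committing the generated file is what makes the interpreter choice visible to teammates, which the status bar selection alone does not do.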
Step 5: GPU Setup for Deep Learning
GPU acceleration reduces deep learning training time from hours to minutes for medium-sized models and from days to hours for large ones. This step covers NVIDIA GPUs with CUDA, the best-supported path for both PyTorch and TensorFlow. The stack has three layers that must line up in order: NVIDIA driver, CUDA toolkit, and cuDNN library. Modern PyTorch wheels bundle the CUDA runtime and cuDNN, so for PyTorch only the driver needs a system-level install. Version compatibility between all three layers is critical: mismatched versions produce cryptic CUDA errors or, worse, a silent CPU fallback where training appears to work but can run around 40 times slower without any warning. If you do not have an NVIDIA GPU, skip local GPU setup entirely and use Google Colab or Kaggle Notebooks. Both provide free GPU access sufficient for learning and small projects.
gpu_setup_verify.sh (Bash)
# Step 1: Verify the NVIDIA GPU is detected by the system
lspci | grep -i nvidia
# On Windows: Device Manager > Display Adapters
# Step 2: Check installed NVIDIA driver and supported CUDA version
nvidia-smi
# Top-right corner shows maximum supported CUDA version
# Example: CUDA Version: 12.4 means your driver supports CUDA up to 12.4
# Step 3: Match the PyTorch CUDA build to your driver's supported CUDA version
# Get the exact install command from: https://pytorch.org/get-started/locally/
# Example for CUDA 12.1:
pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121
# Step 4: Verify CUDA is detected by PyTorch
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'CUDA version: {torch.version.cuda}')
    print(f'GPU count: {torch.cuda.device_count()}')
    print(f'GPU name: {torch.cuda.get_device_name(0)}')
    print(f'GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')
else:
    print('No CUDA GPU detected, running on CPU')
    print('Tip: install the CPU-only PyTorch build if you do not have a GPU')
"
# Step 5: Benchmark CPU vs GPU to confirm the GPU is being used
python -c "
import torch
import time
size = 4096
a_cpu = torch.randn(size, size)
b_cpu = torch.randn(size, size)
start = time.perf_counter()
_ = torch.matmul(a_cpu, b_cpu)
cpu_time = time.perf_counter() - start
print(f'CPU matrix multiply ({size}x{size}): {cpu_time:.3f}s')
if torch.cuda.is_available():
    a_gpu = a_cpu.cuda()
    b_gpu = b_cpu.cuda()
    # Warm up the GPU so one-time initialization is not included in the timing
    torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()
    start = time.perf_counter()
    _ = torch.matmul(a_gpu, b_gpu)
    # Wait for the asynchronous GPU kernel to finish before stopping the clock
    torch.cuda.synchronize()
    gpu_time = time.perf_counter() - start
    print(f'GPU matrix multiply ({size}x{size}): {gpu_time:.3f}s')
    print(f'Speedup: {cpu_time / gpu_time:.1f}x')
else:
    print('GPU not available, using CPU only')
    print('For learning, Google Colab provides free GPU: colab.research.google.com')
"
Output
PyTorch version: 2.3.0+cu121
CUDA available: True
CUDA version: 12.1
GPU count: 1
GPU name: NVIDIA RTX 4090
GPU memory: 24.6 GB
CPU matrix multiply (4096x4096): 4.823s
GPU matrix multiply (4096x4096): 0.119s
Speedup: 40.5x
GPU Version Compatibility Matrix — Check Before Installing
Run nvidia-smi first and note the maximum supported CUDA version shown in the top-right corner
Install the PyTorch wheel that matches or is below your driver's maximum CUDA version — not above it
cuDNN is bundled with modern PyTorch wheels — you do not need to install it separately for PyTorch
TensorFlow still requires manual cuDNN installation in some configurations — check the TF GPU installation guide
If nvidia-smi is not found, your NVIDIA driver is not installed — install the driver before anything else
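The "match or stay below the driver's maximum CUDA version" rule can be expressed as a small comparison. The sketch below parses the `CUDA Version: X.Y` field from nvidia-smi output and compares it with a PyTorch wheel tag such as `cu121`. The function names are illustrative, the sample header line is made up for demonstration, and the tag parser assumes the last digit of the tag is the minor version.

```python
import re


def max_cuda_from_smi(smi_output):
    """Extract the 'CUDA Version: X.Y' value from nvidia-smi output as (major, minor)."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def wheel_cuda_version(tag):
    """Turn a wheel tag such as 'cu121' into (12, 1).

    Assumption: the final digit is the minor version, everything before it the major.
    """
    match = re.fullmatch(r"cu(\d+)(\d)", tag)
    if match is None:
        raise ValueError(f"not a CUDA wheel tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def wheel_is_compatible(tag, smi_output):
    """True when the wheel's CUDA version is at or below the driver's maximum."""
    driver_max = max_cuda_from_smi(smi_output)
    return driver_max is not None and wheel_cuda_version(tag) <= driver_max


if __name__ == "__main__":
    # Sample header line for illustration only; real output comes from running nvidia-smi.
    sample = "| NVIDIA-SMI 550.54.15    Driver Version: 550.54.15    CUDA Version: 12.4     |"
    print("cu121 compatible:", wheel_is_compatible("cu121", sample))
```

In practice the text passed in would come from capturing the output of `nvidia-smi`; if the command is missing, the parser returns `None` and every wheel is reported as incompatible, which matches the "install the driver first" rule.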
Production Insight
GPU setup is the most frustrating part of ML environment setup because the error messages are cryptic and version mismatches look identical to missing drivers.
The correct debugging sequence is always: driver first (nvidia-smi), then CUDA version, then PyTorch wheel — never backwards.
For learning and small projects, Google Colab and Kaggle Notebooks eliminate local GPU setup entirely and provide free access to T4-class GPUs; A100s are available on Colab's paid tiers.
For production training on large models, cloud GPU instances on Lambda Labs, AWS, or GCP are more cost-effective than local RTX cards when you factor in electricity and downtime.
Key Takeaway
GPU setup requires driver, CUDA, and PyTorch wheel version alignment — check nvidia-smi before installing anything.
For learning, skip local GPU setup and use Google Colab or Kaggle — the friction is not worth it until you need it.
The 40x speedup from a GPU only matters for deep learning — classical ML on CPU is fast enough for most projects.
GPU Setup Strategy by Situation
If: No NVIDIA GPU on your machine and you are learning
→ Use: Google Colab (free T4 GPU) or Kaggle Notebooks (free 30 hours per week). Zero local setup, start training in under 5 minutes
If: Have NVIDIA GPU but want the simplest possible CUDA setup
→ Use: conda to install PyTorch. conda resolves CUDA dependencies automatically based on your driver version
If: Have NVIDIA GPU and need full control over CUDA version
→ Use: Install CUDA toolkit manually matching your driver, then install the matching PyTorch wheel from pytorch.org
If: Training models larger than 10B parameters or need multi-GPU
→ Use: cloud GPU instances (AWS p4d, GCP A100, Lambda Labs). Local hardware is impractical at this scale
Step 6: Project Structure and Reproducibility
A well-structured ML project prevents confusion as it grows from one notebook to ten files to a deployed API. Every project needs a standard directory layout, a pinned requirements.txt, a README with setup instructions, and version control. Reproducibility means another developer — or future you six months from now — can clone the repo, run one setup command, and get identical results. This requires four things working together: pinned dependencies, documented Python version, deterministic random seeds, and a setup script that does not require tribal knowledge. In 2026, add a .env.example file to show collaborators what environment variables the project needs without committing actual API keys, and add a pre-commit configuration to enforce formatting and prevent secrets from being committed accidentally.
project_structure.txt (Text)
my_ml_project/
├── README.md                      # Problem statement, setup instructions, results
├── requirements.txt               # Pinned dependencies (pip freeze output)
├── setup.sh                       # One-command environment setup script
├── .env.example                   # Template for required environment variables (no real keys)
├── .gitignore                     # Excludes: data/, models/, .env, __pycache__, *.pkl, *.pt
├── .pre-commit-config.yaml        # Black formatting + detect-secrets hook
├── .vscode/
│   └── settings.json              # Project-level VS Code configuration
├── data/
│   ├── raw/                       # Original unmodified source data; never edit these
│   ├── processed/                 # Cleaned and transformed data ready for modeling
│   └── .gitkeep                   # Preserves directory structure in Git without committing data
├── notebooks/
│   ├── 01_eda.ipynb               # Exploratory data analysis
│   ├── 02_feature_engineering.ipynb
│   └── 03_modeling.ipynb          # Training, evaluation, model selection
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── load.py                # Data loading; reads from data/raw/
│   │   └── preprocess.py          # Cleaning and transformation pipeline
│   ├── features/
│   │   ├── __init__.py
│   │   └── build_features.py      # Feature engineering functions
│   ├── models/
│   │   ├── __init__.py
│   │   ├── train.py               # Training script with MLflow logging
│   │   └── predict.py             # Inference logic used by API and tests
│   └── visualization/
│       ├── __init__.py
│       └── visualize.py
├── models/                        # Saved model artifacts, tracked with DVC, not Git
│   └── .gitkeep
├── tests/
│   ├── test_data.py               # Validate data loading and preprocessing
│   ├── test_features.py
│   └── test_models.py             # Smoke tests for prediction output shape and type
├── api/
│   ├── app.py                     # FastAPI prediction endpoint
│   ├── Dockerfile                 # Container definition for deployment
│   └── docker-compose.yml         # Local API + MLflow server orchestration
└── mlruns/                        # MLflow experiment tracking (add to .gitignore for large teams)
Reproducibility Checklist for 2026
Pin all dependency versions with == in requirements.txt — not >= or ~=
Document Python version in README.md and in setup.sh — 'Python 3' is not specific enough
Set random seeds for numpy, Python random module, and PyTorch at the top of every training script
Add .env.example to show collaborators required environment variables — never commit .env
Include a setup.sh that recreates the environment in one command — test it on a clean machine
Track large model artifacts with DVC, not Git — repositories with pickle files in version control are painful to work with
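The seed item in the checklist is easy to get partly right, so it helps to put all three seeds in one helper. This is a minimal sketch, assuming a helper named `set_seed` (an illustrative name); NumPy and PyTorch are seeded only if they are installed, so the function also works in a bare environment.

```python
import random


def set_seed(seed=42):
    """Seed Python, NumPy, and PyTorch RNGs; libraries that are not installed are skipped.

    Returns the list of generators that were seeded.
    """
    random.seed(seed)
    seeded = ["random"]
    try:
        import numpy as np

        np.random.seed(seed)  # seeds NumPy's legacy global generator
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)  # seeds the default generators PyTorch uses
        seeded.append("torch")
    except ImportError:
        pass
    return seeded


if __name__ == "__main__":
    print("Seeded:", set_seed(42))
```

Seeding makes random draws repeatable, but it does not by itself guarantee identical training results: some GPU kernels are nondeterministic, so treat this as one part of the checklist rather than a complete solution.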
Production Insight
Unpinned dependencies break reproducibility within weeks on active projects — a teammate's pip install updates a transitive dependency and suddenly your model outputs differ.
A tested setup.sh saves hours of environment troubleshooting for every new team member and every CI run.
Do not commit .env files — use .env.example to document what variables are needed, and load them with python-dotenv in local development.
Model artifacts belong in DVC or cloud object storage, not Git. A 200MB pickle file in Git history makes every clone painful forever.
Key Takeaway
Standard project structure prevents the 'works on my machine' problem as projects grow.
Pin dependencies, document Python version, include a tested setup script, and use .env.example for API keys.
Reproducibility is an engineering requirement in 2026 — not a nice-to-have that you add later.
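A `.env.example` file is more useful when something checks the real environment against it. The sketch below reads the variable names declared in a template and reports which are unset or empty. The function names and the sample variable names in the usage block are illustrative assumptions, and only simple `KEY=value` lines are understood.

```python
import os


def required_keys(example_text):
    """Return variable names declared in .env.example content (KEY=value lines)."""
    keys = []
    for raw in example_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.append(key)
    return keys


def missing_keys(example_text, environ=None):
    """Return the declared variables that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [key for key in required_keys(example_text) if not environ.get(key)]


if __name__ == "__main__":
    # Hypothetical template content for illustration.
    template = "OPENAI_API_KEY=\nANTHROPIC_API_KEY=\n"
    print("Missing:", missing_keys(template))
```

Running a check like this at startup, after loading the local `.env` with python-dotenv, gives collaborators a clear list of what to set instead of a failure on the first API call.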
Production incident · Post-mortem · Severity: high
Model Gives Different Results on Developer Laptop vs Production Server
Symptom
Model accuracy dropped from 94% to 61% immediately after deployment. No code changes between local and production. Same dataset, same algorithm, same random seed. The model had passed all local tests.
Assumption
The team assumed environment differences only affected installation speed and error messages — not model behavior. They tracked code versions in Git but never verified library versions across environments.
Root cause
Local environment used scikit-learn 1.3.0 while production used 1.1.0. The RandomForestClassifier default for max_features changed between these versions from 'auto' to 'sqrt'. This silently altered the model's feature sampling strategy on every tree, producing a fundamentally different model — no import errors, no warnings, no indication anything was wrong until predictions landed in production.
Fix
1. Added requirements.txt with pinned versions using == for every dependency
2. Added an environment verification script that checks library versions at application startup and fails loudly if versions do not match expected
3. Implemented a CI pipeline step that runs model evaluation tests against a Docker image matching the production environment exactly
4. Replaced ad-hoc deployment with a Docker container that carries the environment definition with it — local and production are now guaranteed identical
Key lesson
Always pin dependency versions in requirements.txt using == — not >= or ~=
Library version mismatches change model behavior, not just installation behavior — this is the dangerous case
Verify library versions match between local and production before every deployment, not after something breaks
Docker is the only reliable way to guarantee environment consistency across machines and teammates
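One way to verify that local and production environments match before a deployment is to compare a fingerprint of the installed packages on both sides. The sketch below is an illustration, not the team's actual verification script: it hashes the sorted `name==version` list from `importlib.metadata`, and the function names are assumptions.

```python
import hashlib
from importlib import metadata


def installed_pins():
    """Return sorted 'name==version' strings for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            pins.add(f"{name.lower()}=={dist.version}")
    return sorted(pins)


def fingerprint(pins):
    """Hash a collection of pins; the result does not depend on input order."""
    joined = "\n".join(sorted(pins))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print("Environment fingerprint:", fingerprint(installed_pins()))
```

Logging this value at application startup in both environments makes a mismatch visible as two different hashes, at which point the full pin lists can be diffed to find the package that differs.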
Production debug guide: symptom-to-action mapping for common setup issues (7 entries)
Symptom · 01
ImportError: DLL load failed when importing tensorflow on Windows
→
Fix
Install Microsoft Visual C++ Redistributable 2019 or later from the Microsoft download page. Verify your Python version matches the TensorFlow wheel: TensorFlow 2.16 supports Python 3.9 to 3.12. A mismatched Python version can produce this error at import time even when the install appears to succeed.
Symptom · 02
ModuleNotFoundError for a library you just installed
→
Fix
You installed into a different environment than the one you are running. Run 'which python' on macOS/Linux or 'where python' on Windows, then 'pip list' to confirm the active environment contains the library. The fix is always: activate the correct environment first, then install.
Symptom · 03
Jupyter kernel crashes when importing torch or tensorflow
→
Fix
Jupyter is using a different Python than your virtual environment. Register the virtual environment as a Jupyter kernel: python -m ipykernel install --user --name ml_2026 --display-name "ML 2026 (Python 3.12)". Then restart Jupyter and select the new kernel from the kernel menu.
Symptom · 04
CUDA not available error when running PyTorch on an NVIDIA GPU
→
Fix
Run nvidia-smi and note the CUDA version shown in the top-right corner: this is the maximum CUDA version your driver supports. Install the matching PyTorch build from pytorch.org using the official install selector. A plain pip install torch does not always give you a CUDA build: on Windows the default PyPI wheel is CPU-only, and the CPU index URL used in Step 3 is CPU-only on every platform. Either case produces exactly this error.
Symptom · 05
pip install takes forever or fails with connection timeout
→
Fix
Increase the timeout threshold: pip install --timeout 300 package_name. If behind a corporate proxy, set HTTP_PROXY and HTTPS_PROXY environment variables. As a last resort, use an alternative mirror: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple package_name.
Symptom · 06
openai or anthropic SDK imports fail after installation
→
Fix
Confirm you are in the correct virtual environment with 'pip list | grep openai'. A missing OPENAI_API_KEY does not break the import itself; in openai 1.x the error is raised when the client is constructed, so a failure at that point means the key is not set. Use python-dotenv to load a .env file rather than hardcoding keys in source files.
Symptom · 07
numpy version conflict error after installing torch or tensorflow
→
Fix
Deep learning libraries frequently require a specific NumPy range. Check the error message for the required version range, then pin NumPy explicitly: pip install 'numpy>=1.24,<2.0'. If the conflict persists, recreate the environment from scratch and install in the order specified in Step 3 of this guide.
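Several of the symptoms above come down to the same question: which interpreter is actually running, and can it see the package? The sketch below gathers that information in one call. The function name `diagnose` is illustrative, and it is meant for top-level module names such as `torch` or `openai`.

```python
import importlib.util
import sys


def diagnose(module_name):
    """Report which interpreter is running and whether it can see module_name."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "in_virtual_env": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in diagnose("json").items():
        print(f"{key}: {value}")
```

Running it from the terminal and again from a notebook cell shows quickly whether the two are using different interpreters: if `interpreter` differs between them, the kernel registration from Step 3 is the fix.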
ML Environment Setup Quick Reference
Immediate commands for environment setup, verification, and debugging in 2026
Need to verify Python and library versions across the full stack
Immediate action
Run version check commands for all core libraries in one pass
Commands
python --version && pip list | grep -E 'numpy|pandas|scikit-learn|torch|tensorflow|openai|anthropic|mlflow'
python -c "import torch, time; size=4096; a=torch.randn(size,size); b=torch.randn(size,size); s=time.time(); torch.matmul(a,b); print(f'CPU: {time.time()-s:.2f}s'); a,b=a.cuda(),b.cuda(); torch.cuda.synchronize(); s=time.time(); torch.matmul(a,b); torch.cuda.synchronize(); print(f'GPU: {time.time()-s:.2f}s')" 2>/dev/null || echo 'GPU not available — CPU only'
Fix now
If the second command prints 'GPU not available' despite an NVIDIA GPU being present (torch.cuda.is_available() returns False), run nvidia-smi to confirm the driver is installed, then reinstall PyTorch with the matching CUDA build from pytorch.org
Need to freeze current environment for reproducibility
Immediate action
Generate pinned requirements.txt and verify it captures all dependencies
Commands
pip freeze > requirements.txt
pip install -r requirements.txt
Fix now
If any package fails to install, check for OS-specific wheels or CUDA build mismatches; the error message will name the conflicting package
ML Environment Tools Comparison
| Tool | Package Manager | Complexity | Best For | GPU Support |
|---|---|---|---|---|
| venv + pip | pip (PyPI) | Low | Individual projects and beginners; simplest path from zero to working | Manual CUDA install required |
| Anaconda | conda (defaults + conda-forge) | Medium | Data science teams managing compiled dependencies and multiple Python versions | conda resolves CUDA automatically |
| Miniconda | conda (minimal install) | Medium | Experienced users who want conda's dependency resolution without the 3GB Anaconda base install | conda resolves CUDA automatically |
| Docker | pip inside container | High | Team reproducibility and production deployment; the only approach that guarantees identical environments | NVIDIA Container Toolkit required |
| Google Colab | pip (pre-installed stack) | Very Low | Learning, quick experiments, free GPU access without any local setup | Free T4 GPU (subject to availability); A100 on paid tiers |
| Poetry | poetry (PyPI with lock file) | Medium | Production Python projects that need dependency lock files and clean package publishing | Manual CUDA install required |
Key takeaways
1. Install Python 3.12 and create a dedicated virtual environment for every project: this single habit prevents 90% of ML environment issues
2. Install libraries in order: NumPy first, then pandas, then scikit-learn, then MLflow and LLM SDKs, then deep learning last
3. Never mix TensorFlow and PyTorch in the same environment without explicit version pinning for every shared dependency
4. Pin all dependency versions with == in requirements.txt and commit it to version control: treat it as part of the model artifact
5. VS Code with the Jupyter extension is the 2026 standard development environment: configure the Python interpreter path per project in settings.json
6. For GPU setup, run nvidia-smi first, note the maximum CUDA version, then install the matching PyTorch wheel, always in that order
Common mistakes to avoid
6 patterns
×
Installing TensorFlow and PyTorch in the same environment
Symptom
Cryptic import errors, NumPy version conflicts, or one library's installation silently downgrading the other's dependencies. Results can shift after the second install changes a shared dependency such as NumPy. Unless you already suspect a version conflict, this class of error is very hard to diagnose.
Fix
Create separate virtual environments for TensorFlow and PyTorch projects — one project, one framework, one environment. If a project genuinely requires both, pin every shared dependency version explicitly in requirements.txt, verify the combination on a clean environment, and document the constraint prominently in README.md.
×
Using system Python for ML development
Symptom
Installing ML packages breaks macOS or Linux system tools. Permission errors require sudo. Upgrading Python for an ML project breaks system utilities that depend on the original version. pip install modifies shared packages that other system processes depend on.
Fix
Always use a virtual environment. Never run pip install without an active virtual environment. On macOS, the pre-installed Python is owned by the OS — treat it as read-only and install your own Python via Homebrew or pyenv.
×
Not pinning dependency versions in requirements.txt
Symptom
Model works today and breaks next month when a library releases an update that changes a default parameter or removes a function. Different team members with different installation dates get different results. Production deployment diverges from local development silently.
Fix
Use pip freeze > requirements.txt with exact version pins (==) after every environment setup. Commit requirements.txt to version control. Run pip install -r requirements.txt on every new machine and in every CI pipeline run. Treat requirements.txt as part of the model artifact, not just the codebase.
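A quick way to enforce the exact-pin rule in CI is to scan requirements.txt for lines that lack ==. The sketch below is an assumption-laden illustration: unpinned_lines is a hypothetical helper, it does simple string checks rather than parsing the full requirements syntax, and the package versions in the example are illustrative only.

```python
def unpinned_lines(requirements_text):
    """Return requirement lines that are not pinned with an exact '==' version."""
    loose = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as -r or -e.
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            loose.append(line)
    return loose


if __name__ == "__main__":
    example = "numpy==1.26.4\npandas>=2.0\nscikit-learn\n"
    for line in unpinned_lines(example):
        print(f"not pinned: {line}")
```

A CI step could read the real file and fail the build when the returned list is non-empty.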
×
Installing Jupyter globally instead of inside the virtual environment
Symptom
Jupyter cannot find libraries installed in your virtual environment. Import errors appear in notebooks that work fine when you run the same code from the terminal. The wrong Python version is used by the Jupyter kernel even after activating the virtual environment.
Fix
Install jupyter and ipykernel inside the virtual environment. Register the environment as a named Jupyter kernel: python -m ipykernel install --user --name ml_2026 --display-name 'ML 2026'. Select this kernel explicitly in Jupyter or VS Code — never rely on the default kernel being correct.
×
Skipping GPU driver and CUDA version verification before installing deep learning libraries
Symptom
PyTorch reports CUDA is not available despite having a GPU. Training code that selects its device automatically falls back to CPU with no warning and runs far slower, often by an order of magnitude or more. Cryptic CUDA runtime errors appear mid-training after hours of computation.
Fix
Always run nvidia-smi before installing any deep learning library. Note the maximum CUDA version shown, then install the matching PyTorch wheel. After installation, run torch.cuda.is_available() and confirm it returns True before starting any training run.
×
Committing API keys or environment variables to version control
Symptom
OpenAI API keys, database credentials, or Anthropic API keys appear in .env files committed to Git. Security scanners flag the repository. Keys are exposed in public repositories, incurring unexpected API charges.
Fix
Add .env to .gitignore immediately when creating the project. Create a .env.example with placeholder values to document what variables are needed. Load environment variables with python-dotenv in local development. Use repository secrets or a secrets manager for CI/CD and production.
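The .env.example file can be derived from the real .env so the two never drift apart. This is a minimal sketch assuming simple KEY=value lines; make_env_example and the placeholder text are hypothetical, and lines that are neither comments nor KEY=value pairs are dropped rather than copied, to avoid leaking anything unexpected.

```python
def make_env_example(env_text, placeholder="changeme"):
    """Build .env.example content: same keys as .env, values replaced."""
    out = []
    for raw in env_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            out.append(raw)  # keep blank lines and comments as documentation
            continue
        if "=" not in stripped:
            continue  # not a KEY=value line; drop it rather than copy it
        key = stripped.split("=", 1)[0].strip()
        out.append(f"{key}={placeholder}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# API keys\nOPENAI_API_KEY=sk-not-a-real-key\n"
    print(make_env_example(sample), end="")
```

Review the generated file before committing it, since comments are copied through unchanged.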
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01 of 04 · SENIOR
How would you set up a reproducible ML environment for a team of five developers?
ANSWER
Start with a requirements.txt using exact version pins for every dependency — pip freeze output, not manually curated. Include a setup.sh script that creates a virtual environment, installs from requirements.txt, and registers the Jupyter kernel in one command. Add a .env.example documenting every environment variable the project needs. Use Docker for production parity — a Dockerfile that starts from a pinned Python base image and installs the same requirements.txt. Add a GitHub Actions workflow that builds the Docker image and runs the test suite on every pull request so environment drift gets caught before it merges. Document the Python version, CUDA version if applicable, and any OS-specific requirements in README.md. The goal is that any developer can clone the repo and have a working environment in under ten minutes without asking anyone for help.
Q02 of 04 · SENIOR
Your model gives different results on your laptop versus your colleague's laptop. How do you debug this?
ANSWER
The systematic approach: first, compare pip freeze output from both machines and diff the results, which immediately identifies version mismatches. Focus on scikit-learn, NumPy, and PyTorch, since these are the libraries most likely to change model behavior across versions. Second, confirm random seeds are set identically for numpy, Python's random module, and the ML library; unset seeds produce different results across runs. Third, check data loading: different OS line endings, encoding assumptions, or pandas parsing behavior can alter the input data before the model sees it. Fourth, if using a GPU, verify both machines are using the same compute path, because a CPU result and a GPU result for the same model can differ due to floating-point operation order. The long-term fix is Docker: when the environment definition travels with the code, version mismatches of this kind are ruled out, leaving only hardware differences to check.
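The first step, diffing two pip freeze outputs, can be sketched in a few lines. The helper names parse_freeze and diff_freeze are illustrative, the sketch only understands name==version lines, and it does not normalise hyphens versus underscores in package names.

```python
def parse_freeze(text):
    """Parse 'name==version' lines from pip freeze output into a dict."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def diff_freeze(text_a, text_b):
    """Return {package: (version_a, version_b)} for every package that differs."""
    a, b = parse_freeze(text_a), parse_freeze(text_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Illustrative freeze outputs from two machines.
    laptop = "numpy==1.26.4\nscikit-learn==1.4.2\n"
    colleague = "numpy==1.26.4\nscikit-learn==1.5.0\n"
    for name, (mine, theirs) in diff_freeze(laptop, colleague).items():
        print(f"{name}: {mine} vs {theirs}")
```

A value of None on one side means the package is installed on only one of the two machines.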
Q03 of 04 · JUNIOR
Explain the difference between pip and conda. When would you choose one over the other?
ANSWER
pip installs Python packages from PyPI, including prebuilt binary wheels, but it manages only Python packages. conda is a cross-language package manager that can install Python packages, compiled C/Fortran libraries, CUDA toolkits, and other non-Python dependencies that pip cannot manage. For most ML work with libraries that ship wheels on PyPI, pip with venv is simpler, faster, and sufficient. Conda becomes the better choice when you need to manage CUDA toolkit versions alongside PyTorch or TensorFlow, when working with geospatial libraries that have complex compiled dependencies, or when you need to switch between multiple Python versions on the same machine. The critical rule: never mix pip and conda package installations in the same environment without understanding the implications. Each manager can overwrite files installed by the other without warning, which causes confusing behavior.
Q04 of 04 · SENIOR
Why would a model trained locally produce different predictions in production without any code changes?
ANSWER
This is one of the most dangerous failure modes in ML deployment, and it is almost always an environment issue. The most common cause is a library version mismatch in which a default parameter changed between versions. For example, scikit-learn 0.22 raised the RandomForestClassifier n_estimators default from 10 to 100, so the same training script run on either side of that release produces a materially different model without raising any error. Loading a pickled model under a different scikit-learn version than the one that trained it is likewise unsupported and can change predictions. Other causes include NumPy floating-point behavior differences across versions, different random state handling, or a preprocessing step that behaves differently due to a pandas API change. The prevention: pin all dependency versions with ==, verify versions match between environments, and run prediction tests against production-matched environments in CI before every deployment.
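One way to verify that versions match is to compare the pinned versions against what is actually installed when the service starts. The sketch below uses importlib.metadata from the standard library; version_mismatches, the lookup parameter, and the example pins are hypothetical and shown only to illustrate the idea.

```python
from importlib import metadata


def _installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pins, lookup=None):
    """Compare {package: expected_version} against the running environment.

    Returns {package: (expected, installed)} for every package that differs.
    `lookup` can be replaced in tests; by default it queries installed metadata.
    """
    lookup = _installed_version if lookup is None else lookup
    mismatches = {}
    for name, expected in pins.items():
        installed = lookup(name)
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Illustrative pins; a real service would read them from requirements.txt.
    problems = version_mismatches({"numpy": "1.26.4", "scikit-learn": "1.4.2"})
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
```

Failing fast at startup when the result is non-empty turns a silent prediction drift into a visible deployment error.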
FAQ · 6 QUESTIONS
Frequently Asked Questions
01
Should I use Anaconda or pip for ML development?
For most beginners in 2026, pip with venv is the right starting point. It is simpler, faster to install, and sufficient for ML work with packages that ship prebuilt wheels on PyPI, such as scikit-learn, PyTorch, and the OpenAI SDK. Anaconda provides better handling of compiled dependencies (CUDA, MKL, HDF5) and manages multiple Python versions, but it is heavier and its solver is slower. Use pip with venv if everything in your stack installs cleanly from PyPI. Switch to conda if you need CUDA management, complex compiled dependencies, or multiple Python versions across projects. The rule that overrides everything else: never install packages with both pip and conda in the same environment without understanding exactly what each one is managing, because the two package managers can silently conflict in ways that are very difficult to diagnose.
02
Do I need a GPU to learn machine learning?
No. All classical ML, including scikit-learn, XGBoost, random forests, and gradient boosting, runs efficiently on CPU. You only need a GPU when training deep learning models on large datasets. For learning deep learning, Google Colab provides free GPU access (typically a T4, with A100 access on paid tiers), and Kaggle Notebooks provides 30 free GPU hours per week. Both require zero local setup. For production deep learning at scale, cloud GPU instances on Lambda Labs, AWS, or GCP are more practical than buying local hardware when you factor in cost per compute hour, maintenance, and the ability to scale to multi-GPU training.
03
How do I fix the 'No module named sklearn' error after installing scikit-learn?
This error almost always means you installed scikit-learn into a different Python environment than the one currently running. Debug it in this order: run 'which python' to see which Python is active, then run 'pip list | grep scikit-learn' to check if scikit-learn is visible. If it is not listed, you are in the wrong environment — activate the correct virtual environment first, then reinstall. If using Jupyter, the kernel may be using a different Python than your terminal. Fix it by registering the correct environment as a kernel: python -m ipykernel install --user --name ml_2026, then select that kernel in Jupyter or VS Code.
04
What is the difference between Jupyter Notebook and JupyterLab?
Jupyter Notebook is the original single-document browser interface for running code cells interactively. JupyterLab is the successor — it adds multiple document tabs, an integrated file browser, a terminal, and extension support in a single browser window. In 2026, VS Code with the Jupyter extension has largely superseded both for daily development. It provides notebook support plus a full IDE — IntelliSense, debugging, Git integration, and extensions — without a browser. Use VS Code for all development work. Keep JupyterLab available for situations where you need to share or present a live notebook in a browser environment without VS Code installed.
05
How do I make my ML project reproducible on another machine?
Four things working together: pinned dependencies in requirements.txt using pip freeze with == pins; Python version documented explicitly in README.md and in setup.sh — '3.12.3', not 'Python 3'; a setup.sh script that creates the virtual environment, installs requirements.txt, and registers the Jupyter kernel in one command; and random seeds set in every training script for numpy, Python's random module, and PyTorch. Test reproducibility by cloning the repo on a fresh machine and running only setup.sh — if you need to run anything else, your documentation is incomplete. For production-grade reproducibility, add a Dockerfile so the environment definition is version-controlled alongside the code.
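The seeding step can be collected into one helper that every training script calls first. This is a sketch, not a guarantee of bit-for-bit reproducibility: set_seeds is a hypothetical name, NumPy and PyTorch are seeded only if they are installed, and GPU kernels can still introduce nondeterminism that seeding alone does not remove.

```python
import importlib.util
import random


def set_seeds(seed=42):
    """Seed Python's random module plus NumPy and PyTorch when available.

    Returns the list of libraries that were seeded.
    """
    seeded = ["random"]
    random.seed(seed)

    if importlib.util.find_spec("numpy") is not None:
        import numpy as np

        # Seeds NumPy's legacy global generator; code that creates its own
        # generator with np.random.default_rng needs the seed passed directly.
        np.random.seed(seed)
        seeded.append("numpy")

    if importlib.util.find_spec("torch") is not None:
        import torch

        torch.manual_seed(seed)
        seeded.append("torch")

    return seeded


if __name__ == "__main__":
    print("seeded:", ", ".join(set_seeds(42)))
```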
06
Should I add LLM SDK libraries to my ML environment?
Yes, from the start. In 2026, LLM API calls — OpenAI, Anthropic, or local models via Ollama — are a standard component of ML projects, not an advanced specialty. Adding 'pip install openai anthropic python-dotenv' to your baseline environment costs nothing and makes LLM integration available when you need it. Store API keys in a .env file loaded with python-dotenv, and add .env to .gitignore immediately. The .env.example pattern — a committed file with placeholder values — documents what keys collaborators need without exposing real credentials.