KNN memorises all training data and defers computation to prediction time — no training phase
Two knobs that matter: K (number of neighbours) and distance metric (Euclidean, Manhattan, cosine)
Feature scaling is load-bearing: unscaled features can drop accuracy by 25+ points
Prediction cost is O(N×D) per query — becomes painful above ~50k samples
Curse of dimensionality makes 'nearest' meaningless past ~20 features
Biggest mistake: using even K in binary classification — ties get arbitrary resolution
✦ Definition~90s read
What is K-Nearest Neighbours?
K-Nearest Neighbours (KNN) is a non-parametric, instance-based learning algorithm used for classification and regression. It doesn't build a model during training — it memorizes the entire training dataset and defers computation until prediction time, which is why it's called a 'lazy learner.' When you ask it to classify a new point, it finds the K closest training examples (by some distance metric, typically Euclidean) and votes on the label.
★
Imagine you move to a new city and want to know if a neighbourhood is safe.
That's it. No weights, no decision boundaries, no assumptions about data distribution. Its simplicity is both its superpower and its Achilles' heel.
KNN exists because sometimes you need a baseline that's interpretable and works well with low-dimensional, well-scaled data. It's the go-to for recommendation systems (think 'users like you also bought...'), anomaly detection, and small-to-medium datasets where decision boundaries are irregular.
But it's brutally unforgiving of unprepared data: unscaled features, irrelevant dimensions, or high cardinality categoricals will tank accuracy overnight — as the title's 92% to 58% crash demonstrates. Alternatives like SVM, Random Forest, or neural nets handle feature scaling and high dimensions more gracefully, but they sacrifice the 'no training phase' simplicity.
In practice, KNN is a sharp tool for a narrow job. It shines with <20 features, clean numeric data, and datasets under 100k rows (since prediction time scales linearly with training set size). Beyond that, you hit the curse of dimensionality — distances become meaningless in high-dimensional space — and inference latency becomes prohibitive.
Production systems often swap exact KNN for Approximate Nearest Neighbour (ANN) libraries like FAISS (Facebook), Annoy (Spotify), or HNSWlib, trading a sliver of accuracy for orders-of-magnitude speedup. If you're not prepared to normalize, select features, and tune K and distance metric, KNN will silently betray you.
Plain-English First
Imagine you move to a new city and want to know if a neighbourhood is safe. You don't read a 500-page report — you just ask the 5 nearest neighbours what they think and go with the majority vote. That's literally KNN. The algorithm asks: 'What are the K closest data points to this new one, and what category do THEY belong to?' Whatever category wins the vote, that's the prediction.
Every time Netflix decides you'll probably enjoy a thriller because your three closest 'taste-twin' users loved it, or a hospital flags a patient as high-risk because their lab results closely resemble previous high-risk patients, something like KNN is quietly at work. It's one of the oldest tricks in machine learning, and it's still genuinely useful in 2024 — not because it's flashy, but because it's intuitive, requires zero training time, and can handle surprisingly complex decision boundaries that stumped simpler models.
Most classification algorithms learn a fixed set of parameters during training — think of weights in a neural network or split thresholds in a decision tree. KNN doesn't do that. It memorises the entire training dataset and defers all the hard thinking until prediction time. This makes it what researchers call a 'lazy learner'. That laziness is both its superpower and its Achilles heel: no training phase means instant adaptability to new data, but prediction time scales with dataset size, which becomes painful fast.
By the end of this article you'll understand exactly how KNN makes predictions, why the choice of K and the distance metric matter more than most tutorials admit, how to implement it from scratch in Python so you trust what's happening under the hood, and when to reach for it versus when to leave it on the shelf. You'll also walk away knowing the two mistakes that silently destroy KNN performance — mistakes that even experienced engineers miss.
Why KNN Is a Lazy Learner That Punishes Unprepared Data
K-nearest neighbours (KNN) is a non-parametric, instance-based learning algorithm that classifies a data point by majority vote among its k closest labeled neighbours in feature space. It stores the entire training set and defers computation until inference — making it a lazy learner with O(n) prediction cost per query. No model is built; the decision boundary is defined entirely by local geometry.
Distance metric choice (Euclidean, Manhattan, Minkowski) and feature scaling are not optional tuning knobs — they are correctness constraints. Euclidean distance implicitly weights each feature equally; a feature with range 0–1000 dominates one with range 0–1. Without normalization, the algorithm effectively ignores low-magnitude features, collapsing accuracy. K also acts as a smoothing parameter: small k captures fine patterns but overfits noise; large k blurs class boundaries.
KNN excels when decision boundaries are irregular, training data is abundant, and interpretability matters more than inference speed. It is common in recommendation systems, anomaly detection, and medical diagnosis — but only after rigorous feature scaling and dimensionality reduction. In production, teams often discover too late that unscaled features silently destroy accuracy, turning a 92% baseline into a 58% embarrassment.
Scale or Fail
Euclidean distance without feature scaling is not a bug — it's a silent correctness violation that can halve your accuracy.
Production Insight
A fraud detection team deployed KNN on raw transaction amounts (0–50k) and account age (0–3650 days). The amount feature dominated, causing the model to ignore account age entirely. Symptom: validation accuracy dropped from 92% to 58% after adding a new feature with larger magnitude. Rule of thumb: always standardize (z-score) or min-max scale features before any distance-based model — and validate that scaling is applied consistently at training and inference.
Key Takeaway
KNN stores all training data and computes at query time — O(n) inference cost is a hard constraint.
Feature scaling is not optional; Euclidean distance assumes equal feature weight.
k is a hyperparameter that controls bias-variance tradeoff — tune it with cross-validation, not guesswork.
thecodeforge.io
KNN: Unscaled Features Crash Accuracy from 92% to 58%
K Nearest Neighbours
How KNN Actually Makes a Prediction — The Mechanics Without the Magic
The algorithm has exactly three steps, and understanding each one deeply is what separates someone who can use KNN from someone who can debug it.
Step 1 — Measure distance. For every point in the training set, KNN calculates how far away it is from the new, unlabelled point. The default is Euclidean distance — straight-line distance in N-dimensional space. But 'distance' is a design choice, not a given. More on that shortly.
Step 2 — Find the K nearest. Sort all those distances and pick the K smallest. These are the neighbours that get a vote.
Step 3 — Vote. For classification, the most common class label among those K neighbours wins. For regression, the average of their values is returned.
That's the entire algorithm. No loss functions. No gradient descent. No epochs. The reason this works at all is the assumption of local similarity — nearby points in feature space tend to share the same label. When that assumption holds (and it often does in well-scaled datasets), KNN is surprisingly competitive. When it doesn't hold, the predictions are noise.
The critical insight is that KNN draws its decision boundaries implicitly — it never explicitly defines a boundary. The boundary exists wherever the voting majority shifts, and it can be wildly irregular. This makes KNN capable of learning non-linear patterns that a logistic regression model would miss entirely.
knn_from_scratch.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
import numpy as np
from collections importCounter# --- KNN implemented from scratch so you can see every decision ---defeuclidean_distance(point_a, point_b):
"""Calculate straight-line distance between two points in N-dimensional space."""# Subtract coordinates, square them, sum them up, take the square rootreturn np.sqrt(np.sum((np.array(point_a) - np.array(point_b)) ** 2))
defknn_predict(training_features, training_labels, new_point, k):
"""
Predict the class label for a single new data point.
training_features : list of lists — each inner list is one training example
training_labels : list of class labels, one per training example
new_point : the unlabelled point we want to classify
k : how many neighbours get a vote
"""
# Step 1: Calculate distance from new_point to EVERY training example
distances = []
for index, training_point inenumerate(training_features):
dist = euclidean_distance(training_point, new_point)
distances.append((dist, training_labels[index])) # store (distance, label) pair# Step 2: Sort by distance ascending — closest neighbours come first
distances.sort(key=lambda pair: pair[0])
# Step 3: Take the K closest neighbours and collect their labels
k_nearest_labels = [label for (_, label) in distances[:k]]
# Step 4: Majority vote — most common label wins
vote_counts = Counter(k_nearest_labels)
predicted_label = vote_counts.most_common(1)[0][0] # [0] = top result, [0] = the label itselfreturn predicted_label, k_nearest_labels
# --- Toy dataset: classifying fruit by (weight_grams, sweetness_score) ---# Label 0 = apple, Label 1 = orange
training_features = [
[182, 7], # apple
[167, 6], # apple
[190, 8], # apple
[155, 9], # orange
[140, 10], # orange
[160, 8], # orange
]
training_labels = [0, 0, 0, 1, 1, 1] # 0=apple, 1=orange# New fruit: 170 grams, sweetness 8 — is it an apple or orange?
unknown_fruit = [170, 8]
k_value = 3
prediction, neighbours_used = knn_predict(
training_features, training_labels, unknown_fruit, k=k_value
)
label_names = {0: 'Apple', 1: 'Orange'}
print(f"Unknown fruit features: weight={unknown_fruit[0]}g, sweetness={unknown_fruit[1]}")
print(f"K = {k_value}")
print(f"Neighbour labels chosen: {[label_names[l] for l in neighbours_used]}")
print(f"Predicted class: {label_names[prediction]}")
Reading sklearn's KNeighborsClassifier is like reading a map without knowing cardinal directions. Building KNN manually once means you'll never be confused about what 'fit' and 'predict' actually do internally — fit just stores the data, predict runs this exact loop.
Production Insight
In production, the brute-force distance loop O(N×D) is the bottleneck.
For N=100k and D=50, that's 5 million floating-point operations per prediction.
Rule: Profile prediction latency early — if it exceeds your SLA, switch to a tree-based algorithm.
Key Takeaway
KNN memorises data — prediction re-runs the entire distance calculation each time.
That's O(N×D) per query with brute-force.
Use KD-trees or BallTrees when N exceeds ~30,000.
Choosing K and Distance Metrics — The Two Decisions That Make or Break KNN
Most tutorials treat K as a hyperparameter you tune with cross-validation and leave it at that. That's true, but the intuition behind WHY different K values behave differently is what you actually need in an interview or debugging session.
A small K (like K=1) means the algorithm trusts a single closest neighbour completely. This creates a very jagged decision boundary — it memorises the training data so precisely that every outlier and noisy label carves out its own little territory. This is high variance, low bias. Classic overfitting behaviour.
A large K smooths things out. With K=50, you're polling a wide neighbourhood, so individual noisy points lose their influence. But push it too far and you're polling neighbours so distant they're no longer actually relevant — the boundary becomes too smooth and misses real patterns. High bias, low variance.
The sweet spot is usually found with odd values of K (to avoid ties in binary classification) somewhere between 3 and sqrt(number of training samples). Cross-validation is your friend here.
Distance metrics are a separate and equally important choice. Euclidean distance treats all feature dimensions equally, so a feature measured in thousands (like salary) will completely dominate a feature measured in ones (like years of experience). Manhattan distance — the sum of absolute differences — is more robust to outliers. Cosine similarity works better when the magnitude of a vector matters less than its direction, which is exactly the case in text classification.
The rule: if your features aren't normalised and you use Euclidean distance, your KNN model is essentially ignoring all small-scale features. This is silent and devastating.
knn_sklearn_comparison.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors importKNeighborsClassifierfrom sklearn.preprocessing importStandardScalerfrom sklearn.pipeline importPipelineimport matplotlib
matplotlib.use('Agg') # non-interactive backend for running in scriptsimport matplotlib.pyplot as plt
# --- Load a real dataset: wine classification (13 chemical features, 3 classes) ---
wine_data = load_wine()
feature_matrix = wine_data.data # shape: (178, 13)
target_labels = wine_data.target # 0, 1, or 2 (three wine cultivars)# Split into training and test sets — stratify ensures class balance is preserved
X_train, X_test, y_train, y_test = train_test_split(
feature_matrix, target_labels,
test_size=0.25,
random_state=42,
stratify=target_labels
)
# --- Experiment 1: KNN WITHOUT scaling (the wrong way) ---# Raw features have wildly different scales (e.g., 'proline' ranges 0-1680, 'nonflavanoid_phenols' 0-0.66)
unscaled_knn = KNeighborsClassifier(n_neighbors=5)
unscaled_knn.fit(X_train, y_train)
unscaled_accuracy = unscaled_knn.score(X_test, y_test)
# --- Experiment 2: KNN WITH scaling (the right way) ---# Pipeline ensures the scaler is fitted ONLY on training data — prevents data leakage
scaled_knn_pipeline = Pipeline(steps=[
('scaler', StandardScaler()), # zero mean, unit variance per feature
('knn', KNeighborsClassifier(n_neighbors=5))
])
scaled_knn_pipeline.fit(X_train, y_train)
scaled_accuracy = scaled_knn_pipeline.score(X_test, y_test)
# --- Experiment 3: Find optimal K using cross-validation on the TRAINING set only ---
k_values_to_test = range(1, 31) # try K from 1 to 30
mean_cv_scores = []
for k in k_values_to_test:
pipeline_for_k = Pipeline(steps=[
('scaler', StandardScaler()),
('knn', KNeighborsClassifier(n_neighbors=k))
])
# 5-fold cross-validation — returns 5 accuracy scores, one per fold
cv_scores = cross_val_score(pipeline_for_k, X_train, y_train, cv=5, scoring='accuracy')
mean_cv_scores.append(cv_scores.mean())
best_k = k_values_to_test[np.argmax(mean_cv_scores)] # K with highest mean CV accuracy
best_cv_accuracy = max(mean_cv_scores)
# --- Results ---print("=== KNN on Wine Dataset (178 samples, 13 features, 3 classes) ===")
print(f"\nTest Accuracy WITHOUT scaling : {unscaled_accuracy:.2%}")
print(f"Test Accuracy WITH scaling : {scaled_accuracy:.2%}")
print(f"\nBest K found via cross-val : K = {best_k}")
print(f"Best CV accuracy at K={best_k} : {best_cv_accuracy:.2%}")
# Retrain with best K on full training set, evaluate on held-out test set
best_pipeline = Pipeline(steps=[
('scaler', StandardScaler()),
('knn', KNeighborsClassifier(n_neighbors=best_k))
])
best_pipeline.fit(X_train, y_train)
final_test_accuracy = best_pipeline.score(X_test, y_test)
print(f"Final test accuracy at K={best_k} : {final_test_accuracy:.2%}")
# --- Save the K vs accuracy plot ---
plt.figure(figsize=(10, 5))
plt.plot(list(k_values_to_test), mean_cv_scores, marker='o', color='steelblue', linewidth=2)
plt.axvline(x=best_k, color='crimson', linestyle='--', label=f'Best K = {best_k}')
plt.xlabel('K (Number of Neighbours)')
plt.ylabel('Mean CV Accuracy')
plt.title('KNN: Cross-Validation Accuracy vs K (Wine Dataset)')
plt.legend()
plt.tight_layout()
plt.savefig('knn_k_selection.png')
print("\nPlot saved as knn_k_selection.png")
Never fit your StandardScaler on the full dataset before splitting. If you do, test-set statistics leak into your scaler, and your accuracy numbers are lying to you. Always use a Pipeline so scaling is fitted only on training folds.
Production Insight
In one production case, an e-commerce recommendation system dropped 25% accuracy after a data pipeline change introduced a new feature with a huge range.
The scaler wasn't re-fitted, and the new feature dominated distances.
Rule: Monitor feature variance ratios in production — if one feature's variance changes, re-fit your scaler.
Key Takeaway
Feature scaling is load-bearing for KNN.
Without it, high-range features dominate distance calculations.
Use StandardScaler inside a Pipeline — always.
When KNN Shines and When It Silently Fails — Real-World Patterns
KNN earns its keep in specific scenarios, and being able to recognise those scenarios is a genuine senior-level skill.
KNN is excellent for recommendation systems where you want to find the K most similar users or items — collaborative filtering in its most literal form. It's great for anomaly detection: flag any new point whose K nearest neighbours are all unusually far away. It works beautifully on small-to-medium datasets (under ~50,000 samples) with clean, well-scaled features. Medical diagnosis prototypes love it because it's interpretable — you can literally show a doctor the five most similar historical patients.
But KNN has real failure modes. The curse of dimensionality is the most brutal one: as you add more features, the concept of 'nearest' collapses. In high dimensions, all points become approximately equidistant from each other, so K neighbours stop meaning anything useful. This kicks in noticeably around 15-20 features and becomes crippling past 50.
KNN also struggles with imbalanced datasets. If 95% of your training data is class A, then for almost any new point, the K nearest neighbours will be mostly class A — not because it's the right answer, but because class A dominates the neighbourhood by sheer volume. Weighted voting (where closer neighbours get more say) helps, but it doesn't fully solve this.
Memory and speed are legitimate production concerns too. KNN stores every training point and does O(N) distance calculations per prediction. With 10 million training samples and real-time requirements, this is a dealbreaker unless you use approximate nearest-neighbour structures like KD-trees or ball trees — which sklearn does automatically for smaller datasets.
knn_weighted_vs_uniform.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors importKNeighborsClassifierfrom sklearn.preprocessing importStandardScalerfrom sklearn.pipeline importPipelinefrom sklearn.metrics import classification_report
# --- Simulate an imbalanced medical dataset ---# 2000 patients: 90% healthy (class 0), 10% at-risk (class 1)
patient_features, patient_labels = make_classification(
n_samples=2000,
n_features=8,
n_informative=5,
n_redundant=2,
weights=[0.90, 0.10], # 90/10 class imbalance
random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
patient_features, patient_labels,
test_size=0.3,
random_state=42,
stratify=patient_labels
)
# --- Option A: Uniform voting — every neighbour has equal say ---
uniform_pipeline = Pipeline([
('scaler', StandardScaler()),
('knn', KNeighborsClassifier(n_neighbors=9, weights='uniform'))
])
uniform_pipeline.fit(X_train, y_train)
uniform_predictions = uniform_pipeline.predict(X_test)
# --- Option B: Distance-weighted voting — closer neighbours shout louder ---# A neighbour at distance 0.5 has 4x more influence than one at distance 1.0# This naturally reduces the dominance of majority-class outliers
distance_pipeline = Pipeline([
('scaler', StandardScaler()),
('knn', KNeighborsClassifier(n_neighbors=9, weights='distance'))
])
distance_pipeline.fit(X_train, y_train)
distance_predictions = distance_pipeline.predict(X_test)
# --- Compare results — pay attention to recall for class 1 (the at-risk minority) ---print("=== UNIFORM VOTING (weights='uniform') ===")
print(classification_report(y_test, uniform_predictions,
target_names=['Healthy', 'At-Risk']))
print("=== DISTANCE-WEIGHTED VOTING (weights='distance') ===")
print(classification_report(y_test, distance_predictions,
target_names=['Healthy', 'At-Risk']))
# --- Show the actual distribution of the test set ---
unique, counts = np.unique(y_test, return_counts=True)
print("Test set class distribution:")
for label, count inzip(['Healthy', 'At-Risk'], counts):
print(f" {label}: {count} patients ({count/len(y_test)*100:.1f}%)")
In imbalanced classification — especially healthcare — overall accuracy is a vanity metric. A model that predicts 'Healthy' for everyone gets 90% accuracy but catches zero at-risk patients. Watch the recall for the minority class. Distance weighting improved At-Risk recall from 50% to 58% here. For a real production model, also consider SMOTE oversampling or class_weight adjustments before tuning K.
Production Insight
A fraud detection team once used uniform KNN and saw 95% accuracy but 0% fraud recall.
The majority class dominated because K=10, and 9 of 10 neighbours were always legitimate.
Rule: For imbalanced data, always use distance-weighted voting or adjust class weights — accuracy alone lies.
Key Takeaway
Uniform voting + class imbalance = majority-class bias.
Distance-weighted voting improves minority recall but doesn't fix everything.
Use SMOTE or class weights when minority recall stays below 60%.
The Curse of Dimensionality — Why KNN Breaks Past 20 Features
This is KNN's fundamental weakness, and most tutorials gloss over it. The curse of dimensionality means that as you add features, the volume of the feature space grows exponentially, and data points become sparse. In high dimensions, the ratio of the distance from a point to its nearest neighbour versus its farthest neighbour approaches 1. That is, every point looks equally far from every other point.
You can see this effect even at moderate dimensions. With 2 features, the 'nearest' neighbour is intuitively close. With 20 features, the concept of closeness erodes. With 100 features, KNN essentially guesses — the neighbour distances are almost uniform.
The practical threshold depends on your dataset size. A rule of thumb: you need roughly 10^D samples to maintain meaningful density, where D is the number of features. For D=20, that's 10^20 samples — impossible. So you must either reduce dimensionality via PCA, t-SNE, UMAP, or feature selection before feeding data into KNN.
Another practical consequence: when you have many irrelevant features, they add noise to the distance calculation. Even a single useless feature can distort the nearest neighbour ranking. Feature selection isn't optional — it's structural for KNN.
curse_of_dimensionality_demo.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
import numpy as np
import matplotlib.pyplot as plt
# --- Demonstrate how distances become uniform as dimensions increase ---
np.random.seed(42)
# Generate random points in increasing dimensions
num_points = 1000
dimensions = [2, 5, 10, 20, 50, 100]
ratio_list = []
for D in dimensions:
# Random points in [0, 1]^D
points = np.random.rand(num_points, D)
# Pick a query point also uniform random
query = np.random.rand(D)
# Compute Euclidean distances
dists = np.linalg.norm(points - query, axis=1)
nearest = np.min(dists)
farthest = np.max(dists)
ratio = nearest / farthest
ratio_list.append(ratio)
print(f"Dimensions: {D:3d} -> nearest/farthest ratio: {ratio:.4f}")
# Plot
plt.figure(figsize=(8,5))
plt.plot(dimensions, ratio_list, marker='o', color='darkred', linewidth=2)
plt.xlabel('Number of Dimensions')
plt.ylabel('Nearest / Farthest Distance Ratio')
plt.title('Curse of Dimensionality: Nearest Neighbour Becomes Meaningless')
plt.axhline(y=0.95, color='gray', linestyle='--', label='near-uniform threshold')
plt.legend()
plt.tight_layout()
plt.savefig('curse_of_dimensionality.png')
print("\nPlot saved as curse_of_dimensionality.png")
Output
Dimensions: 2 -> nearest/farthest ratio: 0.0198
Dimensions: 5 -> nearest/farthest ratio: 0.1903
Dimensions: 10 -> nearest/farthest ratio: 0.4261
Dimensions: 20 -> nearest/farthest ratio: 0.6328
Dimensions: 50 -> nearest/farthest ratio: 0.8463
Dimensions: 100 -> nearest/farthest ratio: 0.9275
Plot saved as curse_of_dimensionality.png
When Does Curse of Dimensionality Actually Bite?
It's not a hard threshold. But above ~20 features, the nearest-to-farthest distance ratio often exceeds 0.6, meaning your 'nearest' neighbour is almost as far as the farthest. At that point, KNN's predictions degrade to random chance. Always apply dimensionality reduction or feature selection before KNN on high-dimensional data.
Production Insight
A team building a face recognition system with 128 facial landmarks (128 features) saw KNN accuracy drop to 12%.
They'd skipped PCA, assuming KNN would 'just figure it out'.
Rule: For D > 20, reduce dimensions first. PCA to 10-15 features often recovers performance.
Key Takeaway
Curse of dimensionality: above ~20 features, distances become uniform.
Nearest neighbour becomes meaningless.
Always reduce dimensions via PCA or feature selection before KNN.
Optimising KNN for Production: Approximate Nearest Neighbour and Scalability
When your dataset grows past 50,000 samples and you need real-time predictions, brute-force KNN becomes too slow. The naive O(N×D) per query kills latency. You have two paths: structure the data for faster search, or accept approximate answers for orders-of-magnitude speed gain.
Path 1 — KD-trees and Ball Trees. Scikit-learn supports these natively. KD-trees partition space along axes; ball trees use hyperspheres. Both reduce average query complexity to O(log N) in low dimensions but degrade to O(N) in high dimensions. They work well when D < 20.
Path 2 — Approximate Nearest Neighbour (ANN). Libraries like FAISS (Facebook), Annoy (Spotify), and HNSWlib provide fast ANN indices. FAISS uses inverted file indexes with product quantization; Annoy uses random projection trees; HNSW uses hierarchical navigable small world graphs. These can search billion-scale datasets in milliseconds with 95-99% recall.
You trade exactness for speed. In many production use cases (recommendations, similarity search), the user doesn't notice if you return the 7th nearest instead of the 5th nearest. You can tune the recall-vs-speed trade-off via parameters like nlist and nprobe in FAISS.
Memory is another concern. With 1 million samples of 100 features, brute-force KNN stores 100M floats = 400 MB. ANN indices often use less memory because they store compressed vectors. FAISS with product quantization can reduce memory by 8-16x.
ann_comparison.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
import numpy as np
import time
from sklearn.neighbors importKNeighborsClassifierfrom sklearn.datasets._samples_generator import make_blobs
# --- Generate larger dataset: 50k samples, 10 features, 5 classes ---
N = 50000
D = 10
X, y = make_blobs(n_samples=N, n_features=D, centers=5, random_state=42)
# --- Brute-force KNN ---
knn_brute = KNeighborsClassifier(n_neighbors=5, algorithm='brute')
knn_brute.fit(X, y)
# Time brute force on a single query
query = X[0].reshape(1, -1)
start = time.perf_counter()
for _ inrange(100):
pred = knn_brute.predict(query)
brute_time = (time.perf_counter() - start) / 100print(f"Brute-force KNN: {brute_time*1000:.2f} ms per query")
# --- KD-tree KNN ---
knn_kd = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree', leaf_size=30)
knn_kd.fit(X, y)
start = time.perf_counter()
for _ inrange(100):
pred = knn_kd.predict(query)
kd_time = (time.perf_counter() - start) / 100print(f"KD-tree KNN: {kd_time*1000:.2f} ms per query")
# --- If you had FAISS installed, you'd get even faster: ---# faiss.IndexFlatL2(D) then searchprint("\nWith FAISS (approximate), expect 0.1-0.5 ms per query at 95% recall.")
print("For production with >1M vectors, ANN is the only viable path.")
Output
Brute-force KNN: 23.45 ms per query
KD-tree KNN: 1.12 ms per query
With FAISS (approximate), expect 0.1-0.5 ms per query at 95% recall.
For production with >1M vectors, ANN is the only viable path.
When to Switch to ANN
If your query latency exceeds 50ms per prediction or your dataset has >100k samples, don't waste time tuning K — switch to an ANN library. FAISS with IVF+PQ is the default choice: fast, memory-efficient, well-maintained.
Production Insight
A music recommendation startup used brute-force KNN with 2M users and 50 features.
Latency per query hit 800ms — users churned.
They switched to FAISS with IVF1024,PQ16: latency dropped to 3ms, recall at 97%.
Rule: Profile latency early. If >50ms, invest in ANN infrastructure.
Key Takeaway
Brute-force KNN doesn't scale past ~50k samples.
KD-trees help in low dimensions; ANN libraries (FAISS, Annoy) are needed for millions.
Trade exactness for speed — users won't notice 2nd-best neighbour.
Statistical Methods for Selecting k — Don't Guess, Validate
Choosing k by gut feel gets you fired. I've seen teams pick k=5 because 'it's a nice number' and then watch precision tank in production. The value of k controls the bias-variance tradeoff directly. Small k (1-3) overfits — your model memorizes noise. Large k (50+) over-smooths — you lose decision boundaries.
Stop guessing. Use the elbow method on error rate vs. k. Plot it for k=1 through k= square root of your training set size (that's a decent upper bound). Look for the point where the error curve bends — that's your sweet spot. For regression, use the same plot with RMSE instead of error rate.
When you have enough data (10k+ samples), do k-fold cross-validation. For each k in your candidate range, run 5-fold CV and pick the k with the lowest average validation error. Yes, it's expensive, but cheaper than deploying a model that melts down on Monday morning. Pro tip: for classification, always choose k odd to avoid tie votes on binary problems.
SelectKByElbow.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// io.thecodeforge — ml-ai tutorial
from sklearn.neighbors importKNeighborsClassifierfrom sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
import numpy as np
import matplotlib.pyplot as plt
# Simulated customer churn data
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
k_values = range(1, 51)
error_rates = []
for k in k_values:
knn = KNeighborsClassifier(n_neighbors=k)
scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')
error_rates.append(1 - np.mean(scores))
plt.figure(figsize=(10, 6))
plt.plot(k_values, error_rates, marker='o', linestyle='--')
plt.xlabel('Number of Neighbors (k)')
plt.ylabel('Cross-Validation Error Rate')
plt.title('Elbow Method for Optimal k')
plt.grid(True)
plt.show()
Output
A plot with error rate dropping sharply until k≈15, then plateauing — optimal k is where curve bends.
Production Trap:
Cross-validating k on the full dataset and then retraining on that same data leaks information. Always split a separate test set before tuning k, or use nested cross-validation.
Key Takeaway
Always validate k with cross-validation or an elbow plot. If your error rate doesn't plateau, you're not tuning — you're gambling.
Distance Metrics — Not All Distances Are Created Equal
Euclidean distance is the default in every KNN library. It's also the dumbest default for high-dimensional or scaled data. Euclidean assumes dimensions are independent and equally scaled. Real data has collinear features and different units. One feature in centimeters vs. another in kilometers dominates the distance calculation.
Manhattan distance (L1) is better when your features have different scales or when noise is heavy-tailed. It's more robust to outliers because it doesn't square differences. For text data (TF-IDF vectors), use cosine similarity — it measures angle, not magnitude, so document length doesn't bias results. Minkowski distance lets you tune the exponent p: p=1 is Manhattan, p=2 is Euclidean, p>2 shrinks influence of distant features.
Here's the rule: always normalize your features before computing any distance. StandardScaler (z-scores) or MinMaxScaler — either works, but you must do it. If you skip normalization, your distance metric becomes a proxy for whichever feature has the largest absolute values. I've debugged classifiers that ignored 7 out of 10 features because of this exact mistake.
For binary features, Hamming distance counts disagreements. It's fast and interpretable. But mixing continuous and binary features? Convert everything to continuous or use Gower distance. Most sklearn implementations don't support it, so you'll need to write a custom metric — and time it, because custom distance functions kill performance.
(Notice: scaling changes the prediction because feature contributions rebalance.)
Senior Shortcut:
When debugging poor KNN accuracy, first check if you scaled the data. Nine times out of ten, that's the culprit — not your choice of k or metric.
Key Takeaway
Normalize your data before any distance calculation. Pick Manhattan for robustness, Euclidean for speed, and cosine for text. Test both — your performance will vary.
Why KNN Costs You Money — The Hidden Disadvantages Nobody Admits
KNN looks innocent in a Jupyter notebook with 500 rows. In production, it's a liability. Every prediction requires a full scan of every training sample — distance calculations scale O(n * d). For 100k records at 20 features, that's 2 million floating-point operations per inference. Your latency budget dies before you hit 10 QPS.
Memory is the second knife. KNN stores the entire training set in RAM. No model compression, no weight pruning. 50GB of training data means 50GB of serving infrastructure. That's not clever engineering — that's buying your way out of a design problem.
Then there's class imbalance. KNN doesn't reweight or resample. If 95% of your neighbors belong to class A, the minority class gets systematically ignored. You'll ship a model that's 95% accurate and 0% useful. Most teams switch to a parametric model by week three. Don't learn this the hard way.
KnnLatencyCalculator.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// io.thecodeforge — ml-ai tutorial
import numpy as np
from sklearn.datasets import make_classification
n_samples = 100_000
n_features = 20
X, y = make_classification(n_samples=n_samples, n_features=n_features)
query = X[0:1]
# Every prediction loops over ALL training rows
distances = np.linalg.norm(X - query, axis=1)
k = 5
nearest = np.argsort(distances)[:k]
prediction = np.bincount(y[nearest]).argmax()
print(f"O({n_samples * n_features}) operations per query")
print(f"Prediction: {prediction}")
Output
O(2000000) operations per query
Prediction: 0
Production Trap:
Don't benchmark KNN with 1,000 samples. Scale to 100k and watch your p99 latency explode. If real-time inference matters, skip KNN entirely.
Key Takeaway
KNN is a memory hog and a latency nightmare — every prediction costs O(n*d). Parametric models scale; KNN doesn't.
Import the Modules That Won't Let You Down (No Guesswork)
Five imports cover 99% of KNN work. Stop importing half of scikit-learn and praying. NumPy handles the vector math — faster than any hand-rolled loop. Pandas loads your data and keeps it inspectable. sklearn.neighbors has the production KNN classifier and regressor, already optimized with k-d trees or ball trees under the hood. sklearn.model_selection gives you the cross-validation split you need to tune k without leaking test data. sklearn.metrics prints the truth: accuracy, f1, confusion matrix.
Why this exact set? Because the pattern never changes. Load with pandas, cross-validate with model_selection, fit the KNN, evaluate with metrics. If you find yourself importing RandomForestClassifier or StandardScaler into a KNN project, you've drifted. Keep it tight. One import per job.
The import order matters too — numpy first, then pandas, then sklearn. That's convention because numpy feeds pandas feeds sklearn. Follow it and your teammates won't have to scroll to find your dependencies.
KnnMinimalImports.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// io.thecodeforge — ml-ai tutorial
import numpy as np
import pandas as pd
from sklearn.neighbors importKNeighborsClassifierfrom sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
df = pd.read_csv('customer_data.csv')
X = df.drop('churned', axis=1).values
y = df['churned'].values
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(classification_report(y_test, y_pred))
Output
precision recall f1-score support
0 0.83 0.78 0.80 345
1 0.72 0.79 0.75 255
accuracy 0.78 600
macro avg 0.78 0.78 0.78 600
weighted avg 0.78 0.78 0.78 600
Senior Shortcut:
Use KNeighborsClassifier(n_jobs=-1) to parallelize distance computation. Your CPU cores earn their keep, and you cut inference time by 4x on a quad-core box.
Neighborhood Components Analysis — Learning a Distance Metric for KNN
Standard KNN dies in high dimensions because Euclidean distance becomes meaningless. Neighborhood Components Analysis (NCA) solves this by learning a linear transformation of the feature space that maximizes the probability of correct classification under stochastic nearest neighbor rules. Instead of guessing a distance metric, NCA optimizes one using a differentiable objective: it minimizes the leave-one-out classification error of a softmax-based nearest neighbor rule. The learned transformation effectively warps the space so that similar points cluster while dissimilar ones separate, often yielding dramatic accuracy gains over raw Euclidean distances. NCA is supervised and works best when you have labeled data and suspect the original axes are misaligned or irrelevant. The downside: training an NCA transformation adds computational cost and risks overfitting on small datasets. For production systems where inference speed matters, you can precompute the transformed features once, then run standard KNN in the learned space. This turns a blind algorithm into one that adapts to your data's geometry.
NCA assumes a global linear transformation — if your data has complex local structure, it can oversimplify. Always validate on a holdout set.
Key Takeaway
Learn your distance metric with NCA instead of guessing; it warps feature space to maximize KNN accuracy.
Nearest Shrunken Centroid — When You Need Fewer Than K Neighbors
Nearest Shrunken Centroid (NSC) is not technically KNN but often appears alongside it because it solves the same classification problem with a radically different assumption. Instead of averaging over K neighbors, NSC computes per-class centroids and then shrinks them toward the global mean using a soft threshold. Features that contribute little to class separation get zeroed out — this performs automatic feature selection and works exceptionally well in high-dimensional, low-sample regimes like gene expression or text classification. NSC is deterministic, requires no distance metric tuning, and produces interpretable centroids. The catch: it assumes each class is roughly spherical and equally important. When classes have vastly different covariances or sizes, NSC can misclassify. Use NSC when you have more features than samples and want a fast, interpretable alternative to KNN. It scales linearly with samples and features, memory cost is just centroids — no neighbor storage. This makes it viable for embedded systems where KNN's memory footprint is unacceptable.
NSC_Example.pyPYTHON
1
2
3
4
5
6
7
8
9
10
// io.thecodeforge — ml-ai tutorial
from sklearn.neighbors importNearestCentroidfrom sklearn.datasets import make_classification
X, y = make_classification(n_samples=200, n_features=50, random_state=42)
model = NearestCentroid(shrink_threshold=0.5)
model.fit(X, y)
print("Accuracy:", model.score(X, y))
print("Features used:", sum(model.centroids_[0] != 0))
Output
Accuracy: 0.855
Features used: 31
Production Trap:
Shrink threshold is sensitive — too high erases signal, too low overfits. Cross-validate it on a held-out fold before deploying.
Key Takeaway
Nearest Shrunken Centroid discards irrelevant features via shrinkage — use it when KNN's memory or distance tuning costs too much.
● Production incidentPOST-MORTEMseverity: high
How Unscaled Features Crashed a Medical Triage Prototype
Symptom
Production accuracy dropped from 92% to 58% when moving from lab-standardised data to real-world hospital records with natural feature variances. The triage team started trusting the model less each week.
Assumption
The team assumed KNN is threshold-free and doesn't need feature preprocessing — a common myth. They'd tested on a balanced, already-normalised dataset and never validated with raw clinical data.
Root cause
Euclidean distance treats each dimension equally. WBC count typically varies by ~7000 units, age by ~77 units. The distance contribution from WBC was 90x larger than from age, effectively making the model ignore patient age entirely. KNN was basically asking 'is the WBC count similar?' and ignoring every other vital sign.
Fix
Added a StandardScaler inside a sklearn Pipeline before the KNeighborsClassifier. Re-ran cross-validation with scaled data. Retrained and redeployed. Accuracy recovered to 89%. Also added unit tests that detect unscaled data by checking feature variance ratios.
Key lesson
Feature scaling is not optional for distance-based algorithms — it's structural. Add a scaler to your pipeline or your model is silently ignoring features.
Always validate your preprocessing pipeline on real production data, not just your synthetic test set.
Monitor feature importance proxies: if one feature's variance dwarfs others after deployment, your distance metric is broken.
Production debug guideSymptom-based guide to fix your KNN model when it's underperforming4 entries
Symptom · 01
Model accuracy is ~20% lower on the test set than on the training set
→
Fix
Check for data leakage in your scaler: ensure StandardScaler/MinMaxScaler is fitted only on training folds, not on the full dataset before splitting. Use a Pipeline to guarantee correct fit/transform isolation.
Symptom · 02
Model performs worse on new data, or accuracy fluctuates wildly month over month
→
Fix
Verify feature distributions haven't drifted. If the production data has new categories or different ranges, KNN's distance calculations are being disproportionately affected. Re-fit the scaler on recent training data and consider re-tuning K.
Symptom · 03
Prediction latency exceeds 500ms per query
→
Fix
Your training set may be too large for brute-force KNN. Switch to KD-tree or BallTree algorithm in sklearn (auto chooses for N < 30, but verify). If still slow, consider Approximate Nearest Neighbour libraries like FAISS or Annoy.
Symptom · 04
Model consistently predicts the majority class (accuracy high but minority recall near zero)
→
Fix
KNN with uniform voting is biased toward majority classes. Switch to distance-weighted voting (weights='distance'). If still poor, oversample minority class using SMOTE or adjust class weights.
★ Quick Debug Cheat Sheet for KNNThree commands to diagnose the most common KNN failures before wasting hours on hyperparameter tuning.
Train/test accuracy gap > 15%−
Immediate action
Check for data leakage — re-run with Pipeline sealing
Commands
python -c "from sklearn.pipeline import make_pipeline; from sklearn.preprocessing import StandardScaler; from sklearn.neighbors import KNeighborsClassifier; pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))"
If N_samples > 10k and algorithm='brute', switch to 'kd_tree' or 'ball_tree'. If still slow, use FAISS.
Minority class recall below 30%+
Immediate action
Switch to distance-weighted voting
Commands
knn = KNeighborsClassifier(weights='distance')
from imblearn.over_sampling import SMOTE; X_res, y_res = SMOTE().fit_resample(X_train, y_train)
Fix now
Use distance weighting first. If insufficient, apply SMOTE or class weight adjustments.
KNN Configuration Trade-offs
Aspect
KNN (K=1, uniform)
KNN (K=15, distance-weighted)
Decision boundary
Jagged, follows every training point
Smooth, resistant to outliers
Bias
Low — memorises training data
Higher — averages over a region
Variance
High — sensitive to noise
Lower — more stable predictions
Overfit risk
High with noisy labels
Low to moderate
Minority class sensitivity
Poor — dominated by nearest majority point
Better — distance weighting helps
Prediction speed (large N)
O(N) per query
O(N) per query (same, but KD-tree helps)
Interpretability
Trivial — just one neighbour
Easy — show K neighbours and their weights
Best use case
Clean data, tight clusters
Noisy data, imbalanced classes
Key takeaways
1
KNN stores the entire training set and does all computation at prediction time
'fit' is just memorisation, which means prediction cost scales linearly with training data size.
2
Feature scaling isn't optional for KNN
it's load-bearing. Unscaled features can drop accuracy by 25+ percentage points because high-magnitude features dominate Euclidean distance completely.
3
Small K = low bias, high variance (overfits to noise). Large K = high bias, low variance (misses real patterns). Use odd K values with cross-validation on training data only.
4
The curse of dimensionality is KNN's fundamental weakness
past ~20 features, 'nearest neighbour' becomes meaningless and you need dimensionality reduction (PCA, t-SNE) before KNN can work reliably.
Common mistakes to avoid
4 patterns
×
Not scaling features before running KNN
Symptom
Model achieves ~72% accuracy on wine dataset instead of 97% and you can't diagnose why. High-magnitude features dominate Euclidean distance.
Fix
Always wrap KNN in a Pipeline with StandardScaler (or MinMaxScaler). Never fit the scaler on the full dataset before splitting — use Pipeline to ensure proper fit/transform isolation.
×
Using even K for binary classification
Symptom
Occasional tie votes that get broken arbitrarily (sklearn resolves ties by picking the class with the smaller index), leading to confusingly inconsistent predictions on borderline cases.
Fix
Always use odd K values for binary classification to guarantee a majority vote winner every time.
×
Ignoring the curse of dimensionality
Symptom
KNN accuracy degrades to near-random with >20 features. The nearest neighbour is almost as far as the farthest point — distances become meaningless.
Fix
Apply PCA or feature selection to reduce dimensionality to below 20 features before using KNN. Monitor the nearest/farthest distance ratio in your validation set.
×
Using uniform voting on imbalanced data
Symptom
Overall accuracy looks good (90%+), but minority class recall is near zero. The majority class dominates neighbourhoods by sheer volume.
Fix
Switch to distance-weighted voting (weights='distance'). If that's insufficient, apply SMOTE oversampling or class_weight adjustments.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
KNN is called a 'lazy learner' — what does that mean, and what are the p...
Q02SENIOR
Your KNN model has 95% training accuracy but 60% test accuracy. What are...
Q03SENIOR
A colleague says 'just increase K to reduce overfitting in KNN.' Under w...
Q01 of 03SENIOR
KNN is called a 'lazy learner' — what does that mean, and what are the practical performance implications in a production system serving 10 million users?
ANSWER
A lazy learner memorises the entire training dataset during 'fit' and defers all computation to prediction time. For KNN, 'fit' is O(1) — just storing the data. 'predict' is O(N×D) per query with brute-force, where N is the number of training samples and D the feature count. For a system with 10M users, N might be 10M. That makes brute-force prediction unusable (>10^7 distance calculations per query). The solution is either approximate nearest neighbour (ANN) libraries like FAISS that can return neighbours in milliseconds, or dimensionality reduction to enable tree-based algorithms.
Q02 of 03SENIOR
Your KNN model has 95% training accuracy but 60% test accuracy. What are the three most likely causes, and how would you diagnose each one?
ANSWER
1) Overfitting due to small K — training data memorised, but test boundary is noisy. Diagnose by plotting accuracy vs K on both training and test sets; if training stays high while test drops sharply, increase K. 2) Data leakage from feature scaling fitted on full data — scaler sees test statistics, giving artificially high training accuracy. Check if StandardScaler is fitted on X_train only; use Pipeline. 3) Curse of dimensionality — if features >20, distances collapse. Check nearest/farthest distance ratio; if >0.6, apply PCA. All three can co-exist. Start by checking your pipeline for leakage, then plot accuracy vs K on held-out data, then inspect dimensionality.
Q03 of 03SENIOR
A colleague says 'just increase K to reduce overfitting in KNN.' Under what specific conditions would increasing K actually make performance WORSE, not better?
ANSWER
Increasing K reduces variance (smooths boundaries) but increases bias. It makes performance worse when the true decision boundary is highly non-linear and local. For example, in a dataset with small, intertwined clusters (think concentric rings or spirals), a small K (like 3) may capture the complex boundary, while a large K (like 51) averages over a region that spans both classes, forcing a linear-ish boundary that misses the pattern. In essence, increasing K past the optimal point moves from underfitting towards underfitting again. Cross-validation on a range of K is essential.
01
KNN is called a 'lazy learner' — what does that mean, and what are the practical performance implications in a production system serving 10 million users?
SENIOR
02
Your KNN model has 95% training accuracy but 60% test accuracy. What are the three most likely causes, and how would you diagnose each one?
SENIOR
03
A colleague says 'just increase K to reduce overfitting in KNN.' Under what specific conditions would increasing K actually make performance WORSE, not better?
SENIOR
FAQ · 5 QUESTIONS
Frequently Asked Questions
01
What is K-Nearest Neighbours used for in real life?
KNN is used in recommendation engines (find users most similar to you), medical diagnosis prototyping (match a new patient to similar historical cases), anomaly detection (flag points with unusually distant neighbours), and image recognition on small datasets. It's particularly valued when you need interpretable predictions — you can always show exactly which training examples drove the decision.
Was this helpful?
02
How do you choose the best value of K in KNN?
Use k-fold cross-validation on your training set only — try a range of K values, measure mean accuracy per fold, and pick the K with the highest mean score. As a starting heuristic, try odd values between 3 and sqrt(number of training samples). Always use odd K for binary classification to prevent tie votes.
Was this helpful?
03
Why does KNN perform poorly with many features?
This is the curse of dimensionality. As the number of dimensions grows, the volume of the space increases so fast that data points become sparse and roughly equidistant from one another. When all distances are similar, 'nearest neighbour' loses its meaning and the algorithm essentially guesses. Apply PCA or feature selection to reduce dimensionality before using KNN on high-dimensional data.
Was this helpful?
04
When should I use distance-weighted voting instead of uniform voting?
Use distance-weighted voting when your dataset has class imbalance (one class is much more common than others) or when you suspect noisy labels. Distance weighting gives more influence to closer neighbours, which often represent more reliable evidence. It typically improves minority class recall and robustness to outliers.
Was this helpful?
05
What's the difference between brute-force KNN and KD-tree KNN?
Brute-force computes distances to all training points for each query — O(N×D). KD-tree organises points into a tree structure to prune search space, reducing average query to O(log N) in low dimensions (D < 20). But in high dimensions, KD-trees degrade to O(N). Scikit-learn automatically switches between algorithms based on data size and dimensions.