Senior 4 min · March 06, 2026
Keras for Deep Learning

Keras OOM Error Kills Training — Fix GPU Memory Growth

Keras training stops mid-epoch with CUDA_ERROR_OUT_OF_MEMORY when TensorFlow claims all GPU memory.

Naren Founder & Principal Engineer

20+ years shipping production ML systems and the infrastructure behind them. Written from production experience, not tutorials.

Production tested · Last updated May 24, 2026 · 1,554 articles, all by Naren
Quick Answer
  • Keras is a high-level API for building neural networks on top of TensorFlow
  • Sequential API for linear stacks, Functional API for complex graphs with multiple inputs/outputs
  • Keras models run on GPU automatically via TensorFlow backend — no manual device placement needed
  • Using callbacks like EarlyStopping and ModelCheckpoint can reduce training time by 30-50%
  • Biggest mistake: not normalizing input data — neural networks fail to converge without scaled inputs
✦ Definition · ~90s read
What is Keras for Deep Learning?

Keras is a high-level neural network API designed for fast experimentation, originally developed by François Chollet as a user-friendly frontend to lower-level deep learning frameworks. It abstracts away the complexity of tensor operations, gradient computation, and graph construction, letting you define and train models in a few lines of Python.

The core problem Keras solves is the steep learning curve of frameworks like TensorFlow or PyTorch — you get a clean, composable interface for layers, optimizers, and loss functions without needing to manage computational graphs manually. In practice, Keras sits on top of TensorFlow (since TensorFlow 2.0, it's the official high-level API), meaning every Keras call ultimately translates to TensorFlow ops.

This is why OOM errors often feel like a Keras issue but are actually a TensorFlow memory management problem: TensorFlow by default grabs nearly all available GPU memory, and Keras doesn't override that behavior unless you explicitly configure GPU memory growth. For production deployments, you'd typically use Keras models via TensorFlow Serving or convert them to TensorFlow Lite for edge devices, but the memory growth setting must be applied before TensorFlow initializes the GPU, which in practice means before any model construction.

Alternatives include PyTorch's dynamic graphs (which give finer control over memory) or JAX for functional transformations, but Keras remains the go-to for rapid prototyping and standard architectures like CNNs, RNNs, and transformers. When not to use Keras: if you need custom gradient flows, distributed training with exotic topologies, or memory-constrained environments where every MB counts — you'll fight the abstraction layer more than it helps.

Plain-English First

Think of building a neural network like constructing a skyscraper. You could mix your own concrete, forge your own steel, and hand-wire every elevator — or you could hire a construction company that handles all that and lets you focus on the building's design. Keras is that construction company. It sits on top of TensorFlow and handles the low-level math so you can focus on designing your model architecture. You describe the floors (layers), the blueprint (loss function), and the safety inspections (metrics) — Keras builds it.

Every few years, a tool comes along that lowers the barrier to an entire field without lowering the ceiling. Keras did that for deep learning. Before Keras, building a neural network meant wrestling with raw TensorFlow graphs, manually wiring forward passes, and debugging tensor shape mismatches at 2am. Keras changed the economics of that work — research teams at Google, Netflix, and Airbnb adopted it because it meant fewer lines of code and faster iteration, not because it was a toy.

The real problem Keras solves isn't syntax — it's cognitive load. Deep learning has enough hard problems: choosing the right architecture, fighting overfitting, tuning hyperparameters. When your framework forces you to also manage computational graphs and session lifecycles, you spend your mental budget on plumbing instead of thinking. Keras abstracts the plumbing without hiding it from you when you need it. You can go shallow (Sequential API) for straightforward models or go deep (Functional API, custom layers) when your problem demands it.

By the end of this article you'll understand exactly when to use the Sequential API versus the Functional API, how to build a real image classifier with proper training loops, how to use callbacks to stop wasting GPU time, and what the four mistakes nearly every beginner makes in Keras are, along with how to sidestep them completely.

Why Keras OOM Errors Are a Memory Management Problem, Not a Model Size Problem

Keras is a high-level neural network API that runs on top of TensorFlow, designed for fast prototyping and production deployment. Its core mechanic is the sequential or functional composition of layers into a computational graph, which TensorFlow then executes on CPU, GPU, or TPU. The OOM error occurs when the GPU memory allocated by TensorFlow's default behavior exceeds the available VRAM, killing the training process entirely.

By default, TensorFlow pre-allocates nearly all available GPU memory the first time it initializes the device, regardless of actual model size. This means even a small model can trigger an OOM if another process already holds part of the VRAM. The fix is to enable memory growth, which allocates memory on demand, starting small and growing only as needed. This is a small configuration change (one call per GPU) that keeps concurrent GPU usage from crashing the entire training job.

In practice, you use Keras with memory growth enabled when running multiple experiments on a shared GPU, serving models alongside training, or debugging memory leaks. Without it, a single OOM kills the entire training run, wasting hours of compute. This is not a model size issue — it's a resource management issue that every production ML engineer must handle explicitly.

Default Behavior Is Dangerous
TensorFlow's default GPU memory allocation grabs all VRAM upfront — a single OOM kills training even if your model fits, because another process holds a tiny fraction.
Production Insight
Teams running multiple training jobs on a shared GPU cluster see random OOM kills when jobs overlap, even with small models.
The exact failure: 'Resource exhausted: OOM when allocating tensor with shape[...]' — no partial recovery, entire process terminates.
Rule of thumb: always call tf.config.experimental.set_memory_growth(gpu, True) before any Keras model build or fit call.
Key Takeaway
OOM in Keras is almost never about model size — it's about TensorFlow's greedy GPU allocation.
Enable memory growth once, at the start of every script, to allocate VRAM lazily and avoid crashes.
Always check for concurrent GPU processes with nvidia-smi before training — memory growth alone won't fix a fully occupied GPU.
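The rule of thumb above can be sketched as a small startup helper. This is a minimal sketch, assuming it runs before any other TensorFlow operation touches the GPU; the helper name `enable_memory_growth` is hypothetical, and on a machine with no visible GPU it simply returns an empty list.

```python
import tensorflow as tf


def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of upfront.

    Hypothetical helper: call it once, before building any model.
    Returns the names of the GPUs that were configured.
    """
    configured = []
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured.append(gpu.name)
        except (RuntimeError, ValueError) as err:
            # Raised when the GPU has already been initialized; the setting
            # cannot be changed after that point.
            print(f"Could not enable memory growth on {gpu.name}: {err}")
    return configured


configured_gpus = enable_memory_growth()
print("Memory growth enabled on:", configured_gpus)
```

If editing the script is not an option, TensorFlow also honors the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true, which has the same effect when set before the process starts.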
[Figure: Keras OOM Error Fix: GPU Memory Growth. Architecture and best practices to avoid memory exhaustion. Panels: Keras on TensorFlow backend (high-level API with eager execution); GPU memory allocation (default: pre-allocates all GPU memory); OOM error on large models (training fails due to memory exhaustion); enable memory growth (tf.config.experimental.set_memory_growth); custom validation callback (avoid blind model.evaluate() usage); stable training at scale (memory-efficient model deployment). Caption: always enable memory growth before model construction. Source: thecodeforge.io]

The Keras Architecture: Why It Sits on Top of TensorFlow

Keras is not a standalone library; it's a high-level API specification. Think of it as the UI for the powerful TensorFlow engine. In the early days, you had to choose between the 'user-friendliness' of Keras and the 'power' of TensorFlow. Today, they are one and the same. By using Keras, you are writing TensorFlow code, but through a lens that prioritizes developer experience and modularity.

At the Forge, we emphasize the two primary ways to build models: the Sequential API (for stacks of layers where each layer has exactly one input and one output) and the Functional API (for complex models with multiple inputs, shared layers, or non-linear topology). Mastery of both is what separates a script-kiddie from a production engineer.

model_definition.py · PYTHON
import tensorflow as tf
from tensorflow.keras import layers, models

# io.thecodeforge: Standard Sequential model for binary classification
def build_forge_model(input_shape):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.2), # Standard Forge practice to prevent overfitting
        layers.Dense(32, activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])
    
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model

# Initialize and summarize
my_model = build_forge_model((10,))
my_model.summary()
Output
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 64) 704
... (output truncated)
Total params: 2,817 (11.00 KB)
Forge Tip: Use the Functional API for Production
While the Sequential API is great for learning, the Functional API is more robust for production. It allows you to create models that handle multiple data streams simultaneously—like a model that takes both an image and metadata as inputs.
Production Insight
The Sequential API cannot express layers that require multiple inputs: merge layers such as Concatenate or Add have nothing to merge in a single-path stack.
This is why the Functional API dominates production code.
Rule: start with Sequential for prototyping, switch to Functional before shipping.
Key Takeaway
The API choice determines your ceiling.
Sequential for simple stacks, Functional for real-world graphs.
Rule: if your model has branches, you need the Functional API.
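To make the "branches need the Functional API" rule concrete, here is a minimal sketch of a model with a second path that a Sequential stack cannot express. The function name `build_branching_model` and the layer sizes are illustrative assumptions, not part of the original Forge model.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_branching_model(input_dim=10):
    """Two paths from the same input, merged with Add (hypothetical example)."""
    inputs = tf.keras.Input(shape=(input_dim,), name="features")

    # Main path: the same Dense + Dropout pattern as the Sequential example
    hidden = layers.Dense(64, activation="relu")(inputs)
    hidden = layers.Dropout(0.2)(hidden)

    # Shortcut path: a linear projection of the raw input to the same width
    shortcut = layers.Dense(64)(inputs)

    # Add takes two tensors, which a single-path Sequential model cannot supply
    merged = layers.Add()([hidden, shortcut])
    outputs = layers.Dense(1, activation="sigmoid")(merged)
    return tf.keras.Model(inputs=inputs, outputs=outputs, name="branching_demo")


branching_model = build_branching_model()
print(branching_model.output_shape)
print(branching_model.count_params())
```

The two Dense(64) layers contribute 704 parameters each and the output layer 65, so the model is tiny; the point is only the topology.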

Deploying Keras Models at Scale

Building the model is only half the battle. To make it work in a production environment, you need to ensure the environment is reproducible. This is where Docker comes in. We wrap our Keras training and inference scripts in a container so that local CUDA issues don't break our production pipeline.

Dockerfile
# io.thecodeforge: Production Deep Learning Environment
# Pin an explicit release tag (2.15.0 is an example) so the TensorFlow/CUDA pairing stays fixed
FROM tensorflow/tensorflow:2.15.0-gpu

WORKDIR /app

# Install standard ML stack dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Use environment variables for model paths
ENV MODEL_NAME=forge_classifier_v1

EXPOSE 8501
CMD ["python", "inference_service.py"]
Output
Successfully built image io.thecodeforge/keras-app:latest
Deployment Strategy:
Always use the GPU-tagged base images if your production environment supports it. Training on CPUs is a common beginner mistake that leads to 10x slower iteration cycles.
Production Insight
GPU image without proper CUDA drivers fails during model.fit().
Always pin TensorFlow version to specific CUDA version.
Rule: use Docker images with matching TF/CUDA versions.
Key Takeaway
Containerization is not optional for ML.
Reproducible environments save weeks of debugging.
Rule: never train or serve without a Dockerfile.
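One way to catch a TF/CUDA mismatch early is to log what the container actually contains at startup. The sketch below is an assumption about how such a check might look; the function name `describe_tf_environment` is hypothetical, and on CPU-only builds the CUDA fields are simply None.

```python
import tensorflow as tf


def describe_tf_environment():
    """Collect the version facts that matter for reproducibility."""
    build = tf.sysconfig.get_build_info()
    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # These keys are only present on GPU builds, hence .get()
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "visible_gpus": len(tf.config.list_physical_devices("GPU")),
    }


env_report = describe_tf_environment()
for key, value in env_report.items():
    print(f"{key}: {value}")
```

Running this as the first step of a training or inference entrypoint makes it obvious from the logs when a GPU image has started without a usable GPU.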

Persistence and Integration: Saving Model State

A model is useless if it disappears when the Python process ends. In an enterprise setting, you often need to save model architecture and weights separately or log training metadata to a database. Here is how we track model versioning in a SQL-compliant environment.

io/thecodeforge/models/schema.sql · SQL
-- Production-grade schema for model version tracking
CREATE TABLE IF NOT EXISTS io.thecodeforge.trained_models (
    model_id UUID PRIMARY KEY,
    model_name VARCHAR(100) NOT NULL,
    accuracy DECIMAL(5,4),
    loss DECIMAL(5,4),
    weights_path TEXT NOT NULL,
    trained_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Log a successful training run
INSERT INTO io.thecodeforge.trained_models (model_id, model_name, accuracy, loss, weights_path)
VALUES (gen_random_uuid(), 'ResNet50_V2', 0.9421, 0.1023, '/mnt/models/weights/v1.h5');
Output
1 row inserted successfully.
Data Governance:
Never save raw weights directly in your database. Store them in a blob storage (like S3) and keep the file reference (URI) in your SQL table to ensure high-performance querying.
Production Insight
Saving models with .h5 format can lead to compatibility issues across TensorFlow versions.
Use SavedModel format for production to ensure forward compatibility.
Rule: always save in SavedModel format for deployment.
Key Takeaway
Model persistence is not optional; version your trained models in a registry.
Store weights in blob storage, not in database blobs.
Rule: automate model versioning from day one.
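The schema above can be exercised from Python at the end of a training script. The sketch below swaps in SQLite (from the standard library) for the PostgreSQL-style database implied by gen_random_uuid(), drops the io.thecodeforge prefix from the table name, and uses a hypothetical helper `log_training_run`; the metric values are the ones from the INSERT statement above.

```python
import sqlite3
import uuid

# SQLite stand-in for the schema shown above (types simplified)
SCHEMA = """
CREATE TABLE IF NOT EXISTS trained_models (
    model_id TEXT PRIMARY KEY,
    model_name TEXT NOT NULL,
    accuracy REAL,
    loss REAL,
    weights_path TEXT NOT NULL,
    trained_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def log_training_run(conn, model_name, accuracy, loss, weights_path):
    """Insert one row per finished training run and return its id."""
    model_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        conn.execute(
            "INSERT INTO trained_models "
            "(model_id, model_name, accuracy, loss, weights_path) "
            "VALUES (?, ?, ?, ?, ?)",
            (model_id, model_name, accuracy, loss, weights_path),
        )
    return model_id


conn = sqlite3.connect(":memory:")
run_id = log_training_run(
    conn, "ResNet50_V2", 0.9421, 0.1023, "/mnt/models/weights/v1.h5"
)
row = conn.execute(
    "SELECT model_name, accuracy, loss FROM trained_models WHERE model_id = ?",
    (run_id,),
).fetchone()
print(row)
```

In a real pipeline the accuracy and loss would come from the training history rather than literals, and the connection would point at the shared registry database.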

Data Preprocessing and Normalization for Neural Networks

Neural networks are sensitive to the scale of input data. Features with large numerical ranges (e.g., pixel values 0–255, income in thousands) can dominate the gradient updates, causing training to diverge or converge slowly. Keras provides the layers.Rescaling and layers.Normalization layers to handle this inside the model.

A common production pattern is to integrate preprocessing directly into the model using the Functional API. This ensures the same normalization logic is applied during inference without additional code. For image data, typical ranges are [0,1] or [-1,1]. For tabular data, standard scaling (zero mean, unit variance) is preferred.

preprocessing.py · PYTHON
import tensorflow as tf

# Normalize pixel values inline with a Rescaling layer
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Rescaling(1./255)(inputs)  # [0,255] -> [0,1]

# Standardize each channel with statistics stored from training
mean = [0.485, 0.456, 0.406]  # example ImageNet channel means
std = [0.229, 0.224, 0.225]   # example ImageNet channel standard deviations
variance = [s ** 2 for s in std]  # Normalization expects variance, not std
normalizer = tf.keras.layers.Normalization(axis=-1, mean=mean, variance=variance)
x = normalizer(x)
# Continue with model...
Output
No output — layer integration ensures portability.
Production Insight
Missing normalization is the #1 cause of model convergence failure in production.
A team spent 2 weeks debugging a classifier that only worked in Jupyter — the inference pipeline lacked the Rescaling layer.
Rule: bake normalization into the model graph, never rely on external preprocessing scripts.
Key Takeaway
Normalization must be part of the model, not the data pipeline.
Use Normalization or Rescaling layers to ship one artifact.
Rule: if your model expects [0,1] input, enforce it inside the model.
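For tabular data, the mean and variance passed to a Normalization layer have to come from the training split only. The NumPy sketch below shows that computation on a toy array; the helper names and the data are illustrative assumptions, and the epsilon guard is a simplification rather than a copy of what Keras does internally.

```python
import numpy as np


def fit_standard_scaler(train_features):
    """Compute per-feature mean and variance on the training split only."""
    return train_features.mean(axis=0), train_features.var(axis=0)


def apply_standard_scaler(features, mean, variance, epsilon=1e-7):
    """Scale to zero mean and unit variance using the stored statistics."""
    return (features - mean) / np.maximum(np.sqrt(variance), epsilon)


# Toy data: two features on very different scales
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
val = np.array([[2.0, 200.0]])

mean, variance = fit_standard_scaler(train)
scaled_train = apply_standard_scaler(train, mean, variance)
scaled_val = apply_standard_scaler(val, mean, variance)

print("mean:", mean, "variance:", variance)
print("scaled train std:", scaled_train.std(axis=0))
```

The same `mean` and `variance` arrays can then be handed to a Normalization layer so the statistics travel with the saved model instead of living in a separate script.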

Callbacks: Training Intelligence Without Reinventing Wheels

Keras callbacks are objects that hook into the training loop at specific points (epoch start, batch end, etc.). They let you implement early stopping, model checkpointing, learning rate scheduling, and custom logging without writing complex boilerplate. In production, callbacks are the difference between babysitting training and letting it run unattended.

Three essential callbacks: EarlyStopping (stop when validation loss stops improving), ModelCheckpoint (save the best model during training), ReduceLROnPlateau (lower learning rate when improvement stalls). Together they form a robust training regimen that adapts to convergence behavior automatically.

training_with_callbacks.py · PYTHON
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    ),
    tf.keras.callbacks.ModelCheckpoint(
        filepath='best_model.keras',
        monitor='val_accuracy',
        save_best_only=True
    ),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3,
        min_lr=1e-6
    )
]

# During training
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=100,
    callbacks=callbacks
)
Output
Epoch 15/100: val_loss did not improve from 0.345 — early stopping triggered.
Best model saved to best_model.keras
Production Insight
Without callbacks, GPU time on failed training runs is wasted.
A team trained 200 epochs on a ResNet only to find the best checkpoint at epoch 23.
Rule: always use EarlyStopping + ModelCheckpoint in production training pipelines.
Key Takeaway
Callbacks are training automation.
Early stopping saves compute, checkpointing saves best results.
Rule: never run training without at least these two callbacks.
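The patience mechanism behind EarlyStopping is easy to reason about once it is written out. The following is a plain-Python illustration of the idea, not Keras's actual implementation; the function name and the loss values are made up for the example.

```python
def simulate_early_stopping(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses.

    Epochs are 1-indexed. Training stops once `patience` consecutive epochs
    fail to improve on the best loss seen so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch

    # Ran out of epochs before patience was exhausted
    return best_epoch, len(val_losses)


val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.56, 0.60]
best_epoch, stopped_epoch = simulate_early_stopping(val_losses, patience=3)
print(f"best epoch: {best_epoch}, stopped at epoch: {stopped_epoch}")
```

With these losses the best value appears at epoch 4 and training stops at epoch 7, which is why pairing the callback with restore_best_weights=True matters: without it you keep the epoch-7 weights.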

Stop Using Model.Evaluate Blindly: Build Custom Validation Loops for Production

model.evaluate() is a leaky abstraction. It hides per-class metrics, masks confidence calibration, and ignores edge-case failures. In production, you need surgical insight, not a single loss number. Build a custom validation loop that calls the model directly with training=False so you control every step (no tf.GradientTape is needed, since validation computes no gradients). Track precision-recall curves, log false positives per batch, and compute confidence histograms. The overhead is minimal; the debugging power is enormous. You'll catch data drift before it hits users and know exactly which class is failing. Your model might score 98% accuracy, but that 2% could be your most valuable customers. Don't trust a black-box number. Instrument your validation like you instrument your logs.

custom_validation.py · PYTHON
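# io.thecodeforge: custom validation loop
import tensorflow as tf
import numpy as np

def custom_validate(model, val_dataset, loss_fn):
    """Report mean loss, per-class accuracy, and confidence statistics.

    Assumes the model outputs raw logits and the labels are one-hot encoded.
    """
    total_loss = 0.0
    num_batches = 0
    all_preds, all_labels = [], []

    for x_batch, y_batch in val_dataset:
        # training=False disables dropout and uses BatchNorm moving averages
        logits = model(x_batch, training=False)
        total_loss += float(loss_fn(y_batch, logits))
        num_batches += 1
        # Drop this softmax if the final layer already applies one
        all_preds.append(tf.nn.softmax(logits).numpy())
        all_labels.append(y_batch.numpy())

    all_preds = np.concatenate(all_preds)
    all_labels = np.concatenate(all_labels)

    # Per-class accuracy: share of each class's samples predicted correctly
    pred_classes = np.argmax(all_preds, axis=1)
    for c in range(all_labels.shape[1]):
        mask = all_labels[:, c] == 1
        if not mask.any():
            print(f"Class {c}: no samples in validation set")
            continue
        acc = np.mean(pred_classes[mask] == c)
        print(f"Class {c} accuracy: {acc:.3f}")

    # Confidence distribution of the top prediction
    conf = np.max(all_preds, axis=1)
    print(f"Mean confidence: {np.mean(conf):.3f}, Std: {np.std(conf):.3f}")

    # Mean of per-batch losses; counting batches avoids len() on datasets of unknown size
    return total_loss / max(num_batches, 1)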
Output
Class 0 accuracy: 0.992
Class 1 accuracy: 0.874
Class 2 accuracy: 0.985
Mean confidence: 0.967, Std: 0.033
Production Trap:
A 98% overall accuracy hides a class with 40% failure rate. Custom validation loops caught a production model that was confidently wrong on high-value transactions. You can't fix what you can't see.
Key Takeaway
model.evaluate() is a black box. Build custom loops to debug the 2% that matters most.
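Per-class accuracy is only half of the picture; precision tells you how often a predicted class is right. As a minimal sketch, assuming integer class labels have already been collected from a validation loop, a confusion matrix gives both. The function name `per_class_report` and the toy labels are hypothetical.

```python
import numpy as np


def per_class_report(true_classes, pred_classes, num_classes):
    """Build a confusion matrix and derive per-class precision and recall."""
    true_classes = np.asarray(true_classes)
    pred_classes = np.asarray(pred_classes)

    # Rows are true classes, columns are predicted classes
    confusion = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(confusion, (true_classes, pred_classes), 1)

    true_positives = np.diag(confusion).astype(np.float64)
    predicted_totals = confusion.sum(axis=0)
    actual_totals = confusion.sum(axis=1)

    # Classes that never occur (or are never predicted) get 0.0, not NaN
    precision = np.divide(true_positives, predicted_totals,
                          out=np.zeros(num_classes), where=predicted_totals > 0)
    recall = np.divide(true_positives, actual_totals,
                       out=np.zeros(num_classes), where=actual_totals > 0)
    return confusion, precision, recall


y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0, 2, 2, 2]
confusion, precision, recall = per_class_report(y_true, y_pred, num_classes=3)

print(confusion)
for c in range(3):
    print(f"Class {c}: precision={precision[c]:.3f}, recall={recall[c]:.3f}")
```

On this toy data overall accuracy is 80%, yet class 1 has a recall of only about 0.667, which is exactly the kind of gap a single evaluate() number hides.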

Keras' Sequential API is fine for toy demos, but production models often combine multiple inputs, such as image, text, and numerical features, into one prediction. The Functional API handles this natively: define input tensors, route each through its own branch, concatenate the learned representations, and emit a single prediction (a sigmoid probability in the example below). No custom layers, no messy hacks, and TensorFlow can optimize the entire DAG. The key insight is that each branch can carry its own regularization: your image branch might need heavy dropout while your numerical branch needs very little. Per-branch learning rates are also possible, but they require a custom training step or separate optimizers rather than a single compile() call. Stop forcing everything into a linear stack.

multimodal_model.py · PYTHON
# io.thecodeforge: functional API multi-modal model
import tensorflow as tf
from tensorflow.keras import layers, Model

# Image input branch
img_input = layers.Input(shape=(224, 224, 3), name='image')
x = layers.Conv2D(32, 3, activation='relu')(img_input)
x = layers.GlobalAveragePooling2D()(x)
img_branch = layers.Dropout(0.5)(x)

# Numerical input branch
num_input = layers.Input(shape=(10,), name='numerical')
y = layers.Dense(64, activation='relu')(num_input)
y = layers.Dropout(0.1)(y)
num_branch = layers.Dense(32, activation='relu')(y)

# Concatenate and classify
combined = layers.concatenate([img_branch, num_branch])
z = layers.Dense(64, activation='relu')(combined)
output = layers.Dense(1, activation='sigmoid')(z)

model = Model(inputs=[img_input, num_input], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
Output
Model: "model"
____________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================
image (InputLayer) [(None, 224, 224, 3)] 0 []
numerical (InputLayer) [(None, 10)] 0 []
conv2d (Conv2D) (None, 222, 222, 32) 896 ['image[0][0]']
global_average_pooling2d (Glob (None, 32) 0 ['conv2d[0][0]']
dropout (Dropout) (None, 32) 0 ['global_average_pooling2d[0][0]']
dense (Dense) (None, 64) 704 ['numerical[0][0]']
dropout_1 (Dropout) (None, 64) 0 ['dense[0][0]']
dense_1 (Dense) (None, 32) 2080 ['dropout_1[0][0]']
concatenate (Concatenate) (None, 64) 0 ['dropout[0][0]', 'dense_1[0][0]']
dense_2 (Dense) (None, 64) 4160 ['concatenate[0][0]']
dense_3 (Dense) (None, 1) 65 ['dense_2[0][0]']
====================================================================================
Total params: 7,905
Real-World Insight:
We used this architecture for a medical diagnosis system: one branch processed MRI scans, another processed patient vitals. The Functional API let us freeze the image branch during fine-tuning while retraining the numerical branch for each hospital's demographics.
Key Takeaway
When your model has multiple data modalities, the Functional API is not optional—it's the only clean solution.
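The fine-tuning pattern mentioned in the insight above, freezing one branch while retraining another, can be sketched as follows. This is an assumption about how it might be done, not the code of the system described: the `img_` layer-name prefix is a hypothetical naming convention, and the model is shrunk so it builds quickly.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Image branch: every layer name starts with "img_" (hypothetical convention)
img_input = tf.keras.Input(shape=(32, 32, 3), name="image")
x = layers.Conv2D(8, 3, activation="relu", name="img_conv")(img_input)
x = layers.GlobalAveragePooling2D(name="img_pool")(x)

# Numerical branch
num_input = tf.keras.Input(shape=(10,), name="numerical")
y = layers.Dense(16, activation="relu", name="num_dense")(num_input)

combined = layers.Concatenate(name="merge")([x, y])
output = layers.Dense(1, activation="sigmoid", name="head")(combined)
two_branch_model = tf.keras.Model(inputs=[img_input, num_input], outputs=output)

# Freeze the image branch so only the numerical branch and the head train
for layer in two_branch_model.layers:
    if layer.name.startswith("img_"):
        layer.trainable = False

# Compile after changing trainable flags so the change takes effect in fit()
two_branch_model.compile(optimizer="adam", loss="binary_crossentropy")

trainable_names = [
    layer.name for layer in two_branch_model.layers
    if layer.trainable and layer.trainable_weights
]
print("Layers still training:", sorted(trainable_names))
```

Selecting layers by name prefix keeps the freeze logic independent of layer order, which tends to shift as a model evolves.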
● Production incident · POST-MORTEM · severity: high

OOM Error Silently Kills Training at 3 AM

Symptom
Training stops mid-epoch with CUDA_ERROR_OUT_OF_MEMORY. No Python traceback, only GPU-side error.
Assumption
Batch size is too high. Reducing it from 64 to 16 didn't help.
Root cause
TensorFlow's default memory configuration claims all GPU memory upfront (memory growth disabled). If another process uses even a small amount of GPU memory, the training process fails to allocate its full slice.
Fix
Enable memory growth early in the Python script, before TensorFlow initializes any GPU. The call tf.config.experimental.set_memory_growth(gpu, True) configures a single device, so apply it to every GPU: iterate over tf.config.experimental.list_physical_devices('GPU') and call it once per device.
Key lesson
  • Always enable memory growth when running on shared GPUs or with multiple processes.
  • Use nvidia-smi to monitor GPU memory before starting training.
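  • If memory growth alone is not enough, set a hard per-process memory limit with tf.config.experimental.set_virtual_device_configuration (exposed in newer releases as tf.config.set_logical_device_configuration).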
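The nvidia-smi check from the key lessons can be automated as a pre-flight step. The sketch below uses only the standard library; the function names, the sample output string, and the 50% "busy" threshold are illustrative assumptions. It relies on nvidia-smi's CSV query mode and assumes every field is numeric.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines (values in MiB) into dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used_mib, total_mib = (field.strip() for field in line.split(","))
        rows.append({
            "index": int(index),
            "used_mib": int(used_mib),
            "total_mib": int(total_mib),
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi if it is installed; return [] when it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


# Illustrative output for a two-GPU machine (not captured from a real system)
sample_output = "0, 1024, 16384\n1, 15800, 16384\n"
parsed = parse_gpu_memory(sample_output)
busy = [gpu["index"] for gpu in parsed if gpu["used_mib"] / gpu["total_mib"] > 0.5]
print("GPUs already more than half full:", busy)
```

A training launcher could call `query_gpu_memory()` and refuse to start, or pick a different device, when the target GPU is already heavily used.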
Production debug guide
Symptom → Root cause → Action for production ML pipelines · 4 entries
Symptom · 01
Model accuracy does not improve after several epochs
Fix
Check loss function for classification vs regression. If using sigmoid output, use binary_crossentropy. For softmax, categorical_crossentropy. Verify learning rate is not too high (try 1e-3) or too low (1e-5). Try adding batch normalization layers.
Symptom · 02
Training loss decreases but validation loss increases
Fix
Classic overfitting. Add dropout layers (rate 0.3–0.5), L2 regularization on dense layers, or reduce model size. Use EarlyStopping callback with restore_best_weights=True.
Symptom · 03
Loading a saved model gives different predictions
Fix
Check if you saved only weights (model.save_weights) without architecture. Use model.save('model.keras') for complete model serialization. Also ensure training and inference have same preprocessing pipeline.
Symptom · 04
GPU memory grows unboundedly during training
Fix
Check for memory leaks in custom layers or data pipeline. Use tf.data.Dataset.prefetch to avoid tensors staying in memory. Enable memory growth as described in the incident above.
★ Keras Training Troubleshooter
Three top production scenarios and the exact commands to diagnose and fix them.
Model never converges (loss stays flat)
Immediate action
Verify input data ranges (e.g., images divided by 255). Check activation function output ranges.
Commands
print("Data min:", np.min(x_train), "max:", np.max(x_train))
model.optimizer.learning_rate.assign(0.001)  # adjust the compiled optimizer in place instead of swapping it out
Fix now
Add a single Dense(1, activation='linear') layer and confirm loss drops.
Overfitting evident from epoch 2+
Immediate action
Add Dropout and reduce network capacity. Increase batch size.
Commands
model.add(tf.keras.layers.Dropout(rate=0.5))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Fix now
Set early_stopping = EarlyStopping(patience=3, restore_best_weights=True)
Model.train_on_batch() returns NaN loss
Immediate action
Check for NaN in input data. Reduce learning rate and add gradient clipping.
Commands
print("Check data stats:", np.isnan(x_train).any(), np.isinf(x_train).any())
optimizer = tf.keras.optimizers.Adam(clipnorm=1.0)
Fix now
Insert BatchNormalization layers after each Dense layer.
Keras APIs Compared

| Feature | Sequential API | Functional API | Subclassing API |
|---|---|---|---|
| Complexity | Low | Medium | High |
| Flexibility | Rigid (single path) | Flexible (DAGs) | Maximum (imperative) |
| Best use case | Simple classifiers | Multi-input/output | Research/custom ops |
| Serializability | Very easy | Easy | Difficult (harder to save) |

Key takeaways

1. Keras is the 'UX of Deep Learning': it abstracts the heavy math of TensorFlow without sacrificing its power.
2. Choose your API wisely: Sequential for stacks, Functional for multi-input systems, and Subclassing only for advanced research.
3. Standardize your environment with Docker to prevent 'version hell' between local development and production servers.
4. Think beyond the Python script: use SQL to track model versions and metrics as part of a proper MLOps pipeline.
5. Neural networks are only as good as the data you feed them. Pre-processing and normalization are 80% of the real work.
6. Callbacks automate training best practices: early stopping saves GPU time, checkpointing saves the best model.

Common mistakes to avoid

Not scaling input data
Symptom
Neural network fails to converge; loss oscillates or stays flat.
Fix
Normalize pixel values to [0,1] by dividing by 255, or use StandardScaler for tabular data. Include a Rescaling or Normalization layer in the model.

Forgetting to set a random seed
Symptom
Low validation accuracy that can't be reproduced: each run gives different results.
Fix
Set tf.random.set_seed(42) at the start of your script. Also set numpy and Python random seeds for full reproducibility.

Overfitting the validation set by tuning too long
Symptom
High validation accuracy but poor test set performance; the model memorized validation set patterns.
Fix
Hold out a separate test set that is never used for hyperparameter tuning. Use cross-validation or a fixed validation split.

Using .h5 format for production models
Symptom
Loading a saved model fails on a different TensorFlow version (e.g., 2.12 vs 2.15).
Fix
Prefer the native .keras format for complete-model saves: model.save('my_model.keras'). For serving, export a SavedModel directory. In TensorFlow releases that bundle Keras 2, model.save('my_model') writes one; in Keras 3, use model.export('my_model') instead, because model.save() there expects a .keras or .h5 extension.
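The seeding advice above can be collected into one helper so no generator is forgotten. This is a minimal sketch with a hypothetical function name; it imports TensorFlow lazily so the same helper also works in data-preparation scripts where TensorFlow is not installed.

```python
import random

import numpy as np


def seed_everything(seed=42):
    """Seed Python, NumPy and (when available) TensorFlow random generators.

    Returns True if TensorFlow was seeded as well.
    """
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return False
    tf.random.set_seed(seed)
    return True


seed_everything(123)
first_draw = (random.random(), float(np.random.rand()))

seed_everything(123)
second_draw = (random.random(), float(np.random.rand()))

print("Draws match after reseeding:", first_draw == second_draw)
```

Recent TensorFlow releases also ship tf.keras.utils.set_random_seed, which seeds all three generators in one call. Note that seeding alone does not make GPU kernels deterministic.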
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): What is the difference between Sequential and Functional API in Keras?
Q02 (Senior): How do you prevent overfitting in a Keras model?
Q03 (Junior): How do you save and load a complete Keras model (architecture + weights ...
Q04 (Senior): Explain the concept of 'transfer learning' in Keras. How would you imple...

Answer to Q01
The Sequential API creates models by stacking layers linearly: each layer has exactly one input and one output. The Functional API supports complex architectures like multi-input, multi-output, shared layers, and residual connections. Functional API also makes it easier to save and inspect intermediate layer outputs. For 95% of production models, you want the Functional API.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01. Is Keras still relevant in 2026 with the rise of PyTorch?
02. What is the difference between Keras and TensorFlow? (LeetCode AI Standard)
03. How do I handle vanishing gradients in Keras?
04. Why does my model have high training accuracy but low validation accuracy?
05. What is the best way to keep training experiments organized?