Senior 3 min · March 10, 2026

TF Model Save/Load — Avoid Wrong Version in Serving

A flat H5 save path caused TF Serving to serve the wrong model for six months.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • SavedModel (directory format) is the TF 2.x production standard — saves architecture, weights, optimizer state, and serving signature
  • H5 (single file) is a legacy format — convenient for sharing but lacks TF Serving compatibility and cross-language loading
  • Checkpoints save weights only — use ModelCheckpoint callback with save_best_only=True to guard against overfitting regressions
  • JSON saves architecture only (model.to_json()), which is useful for version controlling model design separately from weights; YAML export (model.to_yaml()) was removed in TF 2.6
  • TFLite (.tflite) is the mobile/edge format — converted from SavedModel, not saved directly
  • Biggest mistake: treating H5 and SavedModel paths as interchangeable. H5 is a single file and SavedModel is a directory, so a loader pointed at the wrong kind of path fails
Plain-English First

Imagine you're playing a massive video game that takes 100 hours to beat. You wouldn't want to leave your console on for weeks; you use a 'Save Point.' Saving a model is exactly like that. It freezes the model's 'brain' (its weights) and its 'skeleton' (its architecture) into a file on your hard drive, so you can pick up exactly where you left off or send that file to someone else to run on their computer.

Training a deep learning model can take hours, days, or even weeks. Without a robust saving strategy, a simple power outage or a crashed script could wipe out thousands of dollars in compute time.

TensorFlow provides two primary ways to save: saving the entire model (architecture + weights) or saving just the weights (checkpoints). Understanding when to use the standard TensorFlow 'SavedModel' format versus the older 'H5' format is critical for moving models from research into production environments like TensorFlow Serving or TFLite. At TheCodeForge, we treat model serialization as a core DevOps task, ensuring that every training run is reproducible and every artifact is versioned.

1. Saving the Entire Model (SavedModel vs. H5)

The 'SavedModel' is the recommended format for TensorFlow 2.x. It saves the model architecture, weights, and compilation settings (optimizer, loss, metrics) in a directory. Alternatively, the H5 format (Legacy Keras) stores everything in a single file, which is convenient for simple sharing but lacks the metadata required for advanced serving features.

save_and_load.py · PYTHON
import subprocess

import tensorflow as tf
from tensorflow.keras import models, layers

# io.thecodeforge: Standard Model Serialization
# Create a simple model
model = models.Sequential([layers.Dense(10, input_shape=(5,))])
model.compile(optimizer='adam', loss='mse')

# 1. Save as a directory (SavedModel format - Recommended for Production)
# Version directory is mandatory for TF Serving compatibility
# Note: this call works with Keras 2 (TF 2.15 and earlier). With Keras 3 (TF 2.16+),
# model.save() expects a .keras or .h5 path; use model.export(...) to write a SavedModel.
model.save('forge_production_v1/2')

# 2. Save as a single file (H5 format - For simple sharing only)
model.save('forge_legacy_model.h5')

# Loading back
new_model = models.load_model('forge_production_v1/2')
print("Model loaded successfully!")

# Inspect serving signature
result = subprocess.run(
    ['saved_model_cli', 'show', '--dir', 'forge_production_v1/2', '--all'],
    capture_output=True, text=True)
print(result.stdout)
Output
Model loaded successfully!
The given SavedModel SignatureDef contains the following input(s):
inputs['dense_input'] tensor_info: dtype: DT_FLOAT, shape: (-1, 5)
SavedModel Directory Structure is Non-Negotiable for TF Serving
TF Serving requires a versioned directory structure: /models/classifier/1/saved_model.pb. Saving to /models/classifier.h5 or /models/classifier/saved_model.pb (without the version subdirectory) causes TF Serving to silently continue serving the previous version. Always use versioned directories and verify the loaded version via the health endpoint.
Production Insight
SavedModel format stores the computation graph, not just weight matrices — this is why it loads in Java, C++, and TF Serving without a Python runtime.
H5 requires a Python Keras runtime to deserialize (plus the code for any custom layers), so it is a Python artifact, not a portable one.
For the serving and mobile deployment workflow, see tensorflow-lite-mobile for TFLite conversion.
Key Takeaway
SavedModel = portable, cross-language, TF-Serving-compatible, version-directory-required.
H5 = single file, Python-only deserialization, not suitable for TF Serving or Java loading.
Always save to a versioned path: model.save('models/classifier/2/').
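
To make the versioned-path rule harder to get wrong, the next version directory can be computed instead of typed by hand. The sketch below uses only the standard library; the helper names `existing_versions` and `next_version_dir` are hypothetical, and it assumes versions are plain integer directory names as described above.

```python
import tempfile
from pathlib import Path
from typing import List


def existing_versions(model_root: Path) -> List[int]:
    """Return the integer-named subdirectories of model_root, sorted ascending."""
    if not model_root.is_dir():
        return []
    return sorted(
        int(child.name)
        for child in model_root.iterdir()
        if child.is_dir() and child.name.isdecimal()
    )


def next_version_dir(model_root: Path) -> Path:
    """Return model_root/<highest existing version + 1>, or model_root/1 if none exist."""
    versions = existing_versions(model_root)
    return model_root / str((versions[-1] if versions else 0) + 1)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "models" / "classifier"
        (root / "1").mkdir(parents=True)
        (root / "2").mkdir()
        (root / "notes").mkdir()  # ignored: not an integer name
        print(existing_versions(root))      # [1, 2]
        print(next_version_dir(root).name)  # 3
```

The returned path could then be passed to model.save(...), so a new export never lands in the flat parent directory by accident.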

2. Using Checkpoints during Training

A 'ModelCheckpoint' callback allows you to save your model automatically at the end of every epoch. This is a lifesaver for long training runs. It ensures that if the process is interrupted, you only lose a single epoch of work rather than the entire session.

checkpointing.py · PYTHON
import tensorflow as tf

# io.thecodeforge: Automated Checkpoint Strategy
checkpoint_path = "training_checkpoints/forge_model_{epoch:02d}"

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=False, # Save full model for easy resume
    save_best_only=True,     # Keeps only the version with the lowest validation loss
    monitor='val_loss',
    verbose=1
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

# With save_best_only=True, the model is saved only after epochs where val_loss improves
# model.fit(train_data, train_labels, epochs=50,
#           validation_data=(val_data, val_labels),
#           callbacks=[cp_callback, early_stop])
Output
Epoch 00012: val_loss improved from 0.3241 to 0.2987, saving model to training_checkpoints/forge_model_12
Pro Tip: save_weights_only vs. Full Model
Set 'save_best_only=True' in your checkpoint callback. This ensures that if your model starts 'overfitting' (getting worse) later in training, you keep the version that performed the best on your validation data. Use save_weights_only=False to save the full model; this allows resuming training or switching serving environments without rebuilding the model in code.
Production Insight
save_weights_only=True is compact but requires you to perfectly reconstruct the model architecture in code to resume.
save_weights_only=False is larger on disk (the optimizer state is saved alongside the weights) but self-contained, which makes it the safer choice for production pipelines where training resumes across machines.
EarlyStopping + ModelCheckpoint together is the minimum viable callback stack for any training run longer than 20 epochs.
Key Takeaway
save_best_only=True on ModelCheckpoint is non-negotiable — you want epoch 18's weights, not epoch 50's.
Combine with EarlyStopping(restore_best_weights=True) for defense-in-depth against overfitting.
save_weights_only=False is the safer default for distributed or cloud training.
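
To see how save_best_only and EarlyStopping interact, it can help to trace the logic on a made-up loss curve. The following is a simplified pure-Python model of the two rules (save on strict improvement; stop after `patience` epochs without improvement), not the Keras implementation; the function names and the loss values are hypothetical.

```python
from typing import List, Optional


def epochs_that_save(val_losses: List[float]) -> List[int]:
    """1-based epochs where a save_best_only-style rule would write a checkpoint.

    A save happens only when val_loss is strictly lower than the best seen so far.
    """
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved


def early_stop_epoch(val_losses: List[float], patience: int) -> Optional[int]:
    """1-based epoch at which training would stop, or None if it runs to the end."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


if __name__ == "__main__":
    losses = [0.50, 0.40, 0.35, 0.36, 0.34, 0.37, 0.38, 0.39]  # illustrative values
    print(epochs_that_save(losses))             # [1, 2, 3, 5]
    print(early_stop_epoch(losses, patience=3)) # 8
```

In this trace the last checkpoint is written at epoch 5, and training stops three non-improving epochs later, so the artifact on disk is the best-scoring one rather than the final one.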

3. Implementation: Java Model Loader

In many enterprise environments, models are trained in Python but executed in Java-based backend services. TensorFlow's SavedModel format is specifically designed to be cross-language compatible.

io/thecodeforge/ml/ModelLoader.java · JAVA
package io.thecodeforge.ml;

import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;

public class ModelLoader {
    /**
     * io.thecodeforge: Loading a Python-trained SavedModel in Java
     */
    public static void loadAndPredict(String modelPath) {
        try (SavedModelBundle model = SavedModelBundle.load(modelPath, "serve")) {
            Session session = model.session();
            // Logic for wrapping inputs into Tensors and running session.runner()
            System.out.println("Forge Model successfully loaded in Java runtime.");
        }
    }
}
Output
Forge Model successfully loaded in Java runtime.
Production Insight
The modelPath in Java must point to the SavedModel directory containing saved_model.pb — not the parent directory and not the .pb file itself.
Always verify the serving signature before writing Java binding code: run saved_model_cli show --dir path --all from the training environment.
For the full transfer learning model serving pattern in Java, see the tensorflow-transfer-learning guide.
Key Takeaway
Java can only load SavedModel format — not H5.
The serving tag 'serve' is the default for Keras-exported models — verify it with saved_model_cli.
Close the SavedModelBundle in a try-with-resources block to prevent native memory leaks.

4. Audit Persistence: Logging Artifact Metadata

We don't just save files; we track them. This SQL pattern allows us to link a specific saved model file to the exact training metrics it produced.

io/thecodeforge/db/model_audit.sql · SQL
-- io.thecodeforge: Registering Model Artifacts
INSERT INTO io.thecodeforge.model_artifacts (
    version_tag,
    format_type,
    storage_path,
    final_accuracy,
    created_at
) VALUES (
    'v1.2.0-prod',
    'SavedModel',
    '/mnt/storage/models/forge_production_v1/2/',
    0.9421,
    CURRENT_TIMESTAMP
);
Production Insight
The storage_path must include the version subdirectory — /forge_production_v1/2/ not /forge_production_v1/.
Add a model_hash column (SHA-256 of the saved_model.pb) to detect silent file corruption or unauthorized replacement.
For automated artifact tracking at scale, see experiment-tracking-mlflow.
Key Takeaway
Track the versioned path, not just the model name.
A SHA-256 hash of the artifact guards against corruption and unauthorized replacement.
This SQL table is the minimum viable approach; MLflow automates complete lineage tracking.
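
The model_hash value mentioned above could be computed along these lines. This is a minimal standard-library sketch; `sha256_of_file` and `sha256_of_tree` are hypothetical helper names. Note one assumption worth checking for your setup: hashing saved_model.pb alone does not cover the weights stored under variables/, so the second helper hashes every file in the version directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a single file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of_tree(root: Path) -> str:
    """Hex SHA-256 over every file under root (relative path + content hash).

    Files are visited in sorted order so the result is stable across runs.
    """
    digest = hashlib.sha256()
    for file_path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(file_path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(bytes.fromhex(sha256_of_file(file_path)))
    return digest.hexdigest()
```

Storing the tree hash at registration time and recomputing it before deployment gives a cheap check that the directory being served is the one that was audited.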

5. Packaging for Deployment

To serve the model, we use a Docker container that includes TensorFlow Serving. This allows the model to be accessed via a REST or gRPC API.

Dockerfile · DOCKERFILE
# io.thecodeforge: Production Model Serving
# :latest is used here for illustration only; pin an exact version tag in production
FROM tensorflow/serving:latest

# Set the model name
ENV MODEL_NAME=forge_model

# Copy the SavedModel directory into the container
# The path /models/model_name/version_number/ is mandatory for TF Serving
COPY /forge_production_v1/2 /models/forge_model/2

# Expose the gRPC and REST ports
EXPOSE 8500
EXPOSE 8501
Output
Successfully built image thecodeforge/model-server:latest
Production Insight
TF Serving scans /models/model_name/ for integer-named subdirectories and serves the highest version number by default.
Do not use tensorflow/serving:latest in production — pin to an exact version tag to prevent unexpected behavior changes.
After deploying, verify the loaded version: curl localhost:8501/v1/models/forge_model — check model_version_status.
Key Takeaway
TF Serving requires /models/name/version/ directory structure — flat H5 files will not be served.
Pin the TF Serving image version — :latest is a production liability.
Always verify the served model version via the REST health endpoint post-deployment.
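
The version check can be automated by parsing the status response rather than eyeballing the curl output. The sketch below assumes the response carries a model_version_status list whose entries have version and state fields, which is the shape TF Serving's REST status endpoint returns as far as I know; the sample payload and the helper names are illustrative, not captured from a real server.

```python
import json
from typing import List

# Illustrative response body for GET /v1/models/forge_model (not real server output).
SAMPLE_STATUS = """
{
  "model_version_status": [
    {"version": "2", "state": "AVAILABLE",
     "status": {"error_code": "OK", "error_message": ""}},
    {"version": "1", "state": "END",
     "status": {"error_code": "OK", "error_message": ""}}
  ]
}
"""


def available_versions(status_json: str) -> List[int]:
    """Return the versions whose state is AVAILABLE, sorted ascending."""
    payload = json.loads(status_json)
    return sorted(
        int(entry["version"])
        for entry in payload.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    )


def assert_serving_version(status_json: str, expected: int) -> None:
    """Raise RuntimeError unless the expected version is being served."""
    versions = available_versions(status_json)
    if expected not in versions:
        raise RuntimeError(
            f"expected version {expected} to be AVAILABLE, found {versions}"
        )


if __name__ == "__main__":
    print(available_versions(SAMPLE_STATUS))  # [2]
    assert_serving_version(SAMPLE_STATUS, expected=2)
```

In a deployment script, the JSON string would come from an HTTP GET against the status URL shown above; failing the pipeline on the RuntimeError turns a silent version mismatch into a loud one.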
Production incident · POST-MORTEM · severity: high

Six Months of Inference on the Wrong Model Version

Symptom
After a model update deployment, A/B testing showed no improvement despite the new model having higher offline val_accuracy. After six months, a sudden accuracy cliff triggered investigation.
Assumption
The team assumed model.save('model.h5') was overwriting the previous production model. They verified the file modification timestamp — it was updated — but never verified the loaded model's version identifier.
Root cause
TF Serving was configured to load from a versioned directory: /models/classifier/1/. The new H5 model was saved to the root /models/classifier/ directory, not /models/classifier/2/. TF Serving continued serving version 1 because no new versioned subdirectory was created.
Fix
Always use SavedModel format with explicit version directories: model.save('/models/classifier/2/'). Add a model version identifier as a TF serving metadata field. Verify the loaded model version via the TF Serving health endpoint: GET /v1/models/classifier — check model_version_status.
Key lesson
  • Never save models to flat file paths for production — always use versioned SavedModel directories
  • Verify the serving model version via the TF Serving metadata endpoint after every deployment
  • Write a post-deployment test that sends a known input and verifies the output matches the expected model version
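
A post-deployment test of the kind described in the last lesson might look like the sketch below. It assumes TF Serving's row-format REST predict API (a JSON body with an "instances" key, a response with a "predictions" key); the function names, the golden input, and the expected values are hypothetical and would be recorded offline from the new model before deploying.

```python
import json
import math
from typing import List


def build_predict_request(instances: List[List[float]]) -> bytes:
    """Encode a row-format predict request body."""
    return json.dumps({"instances": instances}).encode("utf-8")


def predictions_match(
    response_json: str, expected: List[List[float]], tol: float = 1e-5
) -> bool:
    """True if every predicted value is within tol of the recorded expectation."""
    predictions = json.loads(response_json)["predictions"]
    if len(predictions) != len(expected):
        return False
    for got_row, want_row in zip(predictions, expected):
        if len(got_row) != len(want_row):
            return False
        for got, want in zip(got_row, want_row):
            if not math.isclose(got, want, abs_tol=tol):
                return False
    return True


if __name__ == "__main__":
    golden_input = [[0.1, 0.2, 0.3, 0.4, 0.5]]   # hypothetical known input
    body = build_predict_request(golden_input)    # POST this to .../v1/models/<name>:predict
    fake_response = '{"predictions": [[0.25, 0.75]]}'  # stand-in for the server reply
    print(predictions_match(fake_response, [[0.25, 0.75]]))  # True
```

Because two model versions almost never produce identical outputs for the same input, a mismatch here is a strong signal that the old version is still being served.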
Production debug guide: Diagnosing SavedModel, checkpoint, and serving failures
Symptom · 01
OSError when loading a SavedModel — 'SavedModel file does not exist'
Fix
Verify the path points to the SavedModel directory (contains saved_model.pb), not a parent directory. Use: ls -la model_dir/ and check for saved_model.pb + variables/ subdirectory. H5 paths and SavedModel paths are not interchangeable.
Symptom · 02
Model loads in Python but fails in TF Serving with 'Input shape mismatch'
Fix
Inspect the serving signature: saved_model_cli show --dir model_dir --tag_set serve --signature_def serving_default. The input key and shape must match exactly what Serving receives. Rebuild the model with tf.keras.Input to ensure the signature is explicit.
Symptom · 03
Checkpoint restores but training metrics restart from epoch 0
Fix
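Restoring a checkpoint does not restore the epoch counter. Pass initial_epoch to model.fit() so numbering continues where it left off. If the loss also jumps after resuming, you restored weights but not the optimizer state: load the full model (not just weights) with model = tf.keras.models.load_model(checkpoint_dir). With weight-only checkpoints, the optimizer starts from a fresh state.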
Symptom · 04
model.load_weights() raises 'You are loading a weight file containing 5 layers into a model with 4 layers'
Fix
Architecture mismatch between checkpoint and current model. The number of layers has changed since the checkpoint was saved. Ensure the model architecture is identical before calling load_weights. Consider saving the full model (architecture + weights) instead of weights-only checkpoints.
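
The directory check from the first entry (saved_model.pb plus a variables/ subdirectory) can be scripted so it runs before every load. This is a minimal sketch with a hypothetical helper name; it assumes the layout Keras typically exports and does not handle the text-format saved_model.pbtxt variant.

```python
from pathlib import Path
from typing import List


def saved_model_problems(model_dir: Path) -> List[str]:
    """Return a list of layout problems; an empty list means the layout looks right."""
    if model_dir.is_file():
        return [f"{model_dir} is a file; a SavedModel path must be a directory"]
    if not model_dir.is_dir():
        return [f"{model_dir} does not exist"]
    problems = []
    if not (model_dir / "saved_model.pb").is_file():
        problems.append("missing saved_model.pb (is this the parent directory?)")
    if not (model_dir / "variables").is_dir():
        problems.append("missing variables/ subdirectory")
    return problems
```

Calling this on the exact path handed to the loader, and logging the returned messages, usually distinguishes "wrong directory level" from "H5 file passed as a SavedModel" at a glance.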
SavedModel Quick Inspection Commands: Fast commands for verifying model artifacts before serving
Need to verify what a SavedModel expects as input
Immediate action
Inspect serving signatures
Commands
saved_model_cli show --dir /path/to/model --all
saved_model_cli run --dir /path/to/model --tag_set serve --signature_def serving_default --input_exprs 'input_1=np.ones((1,224,224,3))'
Fix now
Match the input key and dtype exactly in your serving client code
TF Serving not picking up a new model version
Immediate action
Check model version directory structure
Commands
curl http://localhost:8501/v1/models/model_name
ls -la /models/model_name/
Fix now
SavedModel must be in a numbered subdirectory: /models/model_name/1/, /models/model_name/2/, etc.
TensorFlow Model Saving Formats

Method | What is saved? | Best Use Case
SavedModel | Architecture, Weights, Optimizer state, Assets | Production, TF Serving, Java/C++ Loading
H5 File | Architecture, Weights, Optimizer state | Simple sharing as a single portable file
Checkpoints | Weights only | Saving progress during long training sessions
JSON/YAML | Architecture only | Sharing the structure without any weights
TensorFlow Lite | Optimized Graph, Quantized Weights | Mobile (Android/iOS) and Edge Deployment

Key takeaways

1. SavedModel is the default, multi-file format for TensorFlow 2.x production and is language-agnostic.
2. H5 is a legacy single-file format that is popular for quick research sharing but lacks deployment flexibility.
3. Callbacks allow for automatic 'save points' during the training loop, protecting against hardware failure.
4. Loading a model restores not just the math (weights), but the entire state, including the optimizer and loss function.
5. Always log your model metadata in a central database to ensure long-term model governance.

Common mistakes to avoid


Trying to load an H5 file using a SavedModel directory path (or vice versa)

Symptom
OSError: SavedModel file does not exist at the specified path — or — ValueError: Unknown layer — the deserialization fails entirely
Fix
SavedModel: model = tf.keras.models.load_model('path/to/directory/') — the path must point to a directory containing saved_model.pb. H5: model = tf.keras.models.load_model('model.h5') — the path must point to the .h5 file directly.

Not saving the optimizer state before resuming training

Symptom
Resumed training loss is higher than the last checkpoint's training loss — the optimizer restarts with default learning rate and zero momentum, losing its adaptive state
Fix
Use model.save() (not model.save_weights()) to include the optimizer state. When resuming, load the full model: model = tf.keras.models.load_model(checkpoint_path). The optimizer will continue from its saved state.

Using hardcoded absolute paths in save/load calls

Symptom
Code works on the training machine but crashes on the serving cluster with FileNotFoundError — the absolute path does not exist on the target machine
Fix
Use relative paths or environment variables: MODEL_PATH = os.environ.get('MODEL_PATH', './models/classifier/1/'). Never hardcode user-specific or machine-specific paths in model loading code.

Mismatching Keras versions between training and loading environments

Symptom
ValueError: Unknown layer: 'CustomLayer' or AttributeError on model load in a Docker container or different machine
Fix
Pin the exact TensorFlow version in requirements.txt. Use the same TF image tag for training and serving containers. If using custom layers, ensure they are registered with @tf.keras.utils.register_keras_serializable() before saving.
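
The environment-variable approach to model paths from the third mistake can be wrapped in a small resolver that fails fast with a clear message. This is a sketch only: the MODEL_PATH variable name and default come from the fix above, while `resolve_model_path` and `require_model_dir` are hypothetical helper names.

```python
import os
from pathlib import Path


def resolve_model_path(default: str = "./models/classifier/1/") -> Path:
    """Read the model path from MODEL_PATH, falling back to a relative default."""
    return Path(os.environ.get("MODEL_PATH", default))


def require_model_dir(path: Path) -> Path:
    """Return path if it is an existing directory, otherwise raise with context."""
    if not path.is_dir():
        raise FileNotFoundError(
            f"model directory {path} not found; set MODEL_PATH for this machine"
        )
    return path
```

Calling require_model_dir(resolve_model_path()) at service start-up surfaces a misconfigured path immediately, instead of at the first prediction request.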
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR · Explain the difference between model.save() and model.save_weights() in ...
Q02 · SENIOR · Why is the SavedModel format preferred over H5 for cross-platform deploy...
Q03 · SENIOR · How do you implement a custom callback to save a model only when a speci...
Q04 · SENIOR · What is the role of the 'saved_model.pb' file inside a SavedModel direct...
Q05 · SENIOR · How does the 'CheckpointManager' class differ from the 'ModelCheckpoint'...
Q01 of 05 · SENIOR

Explain the difference between model.save() and model.save_weights() in terms of memory and future utility.

ANSWER
model.save() serializes the complete model: architecture (layer types and configurations), trained weights, optimizer state (learning rate, momentum, running averages), and compilation settings (loss function, metrics). The output is self-contained: you can load it and immediately run predictions or resume training without any additional code.

model.save_weights() serializes only the weights, as a TensorFlow checkpoint. To use them, you must first reconstruct the identical model architecture in code, then call model.load_weights(). The result is smaller on disk (how much depends on the optimizer; Adam, for example, keeps two extra values per weight in a full save) but requires the original model-building code to be available and identical.

Use save() for production artifacts; use save_weights() for experiment checkpointing where you version the architecture code in git.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01. Can I save a model and resume training on a different machine?
02. Is it possible to save only the architecture without the weights?
03. How do I load a model if I only have the .ckpt files?
04. What is the 'assets' folder in a SavedModel directory?
05. How do I convert a SavedModel to TensorFlow Lite for mobile deployment?