
Saving and Loading Models in TensorFlow — Serialization and Persistence

📍 Part of: TensorFlow & Keras → Topic 9 of 10
Master model persistence in TensorFlow.
⚙️ Intermediate — basic ML / AI knowledge assumed
In this tutorial, you'll learn
  • SavedModel is the default, multi-file format for TensorFlow 2.x production and is language-agnostic.
  • H5 is a legacy single-file format that is popular for quick research sharing but lacks deployment flexibility.
  • Callbacks allow for automatic 'save points' during the training loop, protecting against hardware failure.
Quick Answer
  • SavedModel (directory format) is the TF 2.x production standard — saves architecture, weights, optimizer state, and serving signature
  • H5 (single file) is a legacy format — convenient for sharing but lacks TF Serving compatibility and cross-language loading
  • Checkpoints save weights only — use ModelCheckpoint callback with save_best_only=True to guard against overfitting regressions
  • JSON/YAML saves architecture only — useful for version controlling model design separately from weights
  • TFLite (.tflite) is the mobile/edge format — converted from SavedModel, not saved directly
  • Biggest mistake: trying to load an H5 model from a SavedModel directory path — an H5 model is a single file, while a SavedModel is a directory; the two paths are not interchangeable
🚨 START HERE
SavedModel Quick Inspection Commands
Fast commands for verifying model artifacts before serving
🟡 Need to verify what a SavedModel expects as input
Immediate Action: Inspect serving signatures
Commands
saved_model_cli show --dir /path/to/model --all
saved_model_cli run --dir /path/to/model --tag_set serve --signature_def serving_default --input_exprs 'input_1=np.ones((1,224,224,3))'
Fix Now: Match the input key and dtype exactly in your serving client code
🟡 TF Serving not picking up a new model version
Immediate Action: Check model version directory structure
Commands
curl http://localhost:8501/v1/models/model_name
ls -la /models/model_name/
Fix Now: SavedModel must be in a numbered subdirectory: /models/model_name/1/, /models/model_name/2/, etc.
Production Incident: Six Months of Inference on the Wrong Model Version
A production ML service was updated to a 'better' model, but due to an H5 path naming collision, TF Serving loaded the previous model version for six months. Accuracy metrics showed no regression because both models were similar — until a new data distribution exposed the difference.
Symptom: After a model update deployment, A/B testing showed no improvement despite the new model having higher offline val_accuracy. After six months, a sudden accuracy cliff triggered investigation.
Assumption: The team assumed model.save('model.h5') was overwriting the previous production model. They verified the file modification timestamp — it was updated — but never verified the loaded model's version identifier.
Root cause: TF Serving was configured to load from a versioned directory: /models/classifier/1/. The new H5 model was saved to the root /models/classifier/ directory, not /models/classifier/2/. TF Serving continued serving version 1 because no new versioned subdirectory was created.
Fix: Always use SavedModel format with explicit version directories: model.save('/models/classifier/2/'). Add a model version identifier as a TF Serving metadata field. Verify the loaded model version via the TF Serving health endpoint: GET /v1/models/classifier — check model_version_status.
Key Lesson
  • Never save models to flat file paths for production — always use versioned SavedModel directories
  • Verify the serving model version via the TF Serving metadata endpoint after every deployment
  • Write a post-deployment test that sends a known input and verifies the output matches the expected model version
Production Debug Guide: Diagnosing SavedModel, checkpoint, and serving failures

OSError when loading a SavedModel — 'SavedModel file does not exist'
Verify the path points to the SavedModel directory (the one containing saved_model.pb), not a parent directory. Use ls -la model_dir/ and check for saved_model.pb plus a variables/ subdirectory. H5 paths and SavedModel paths are not interchangeable.

Model loads in Python but fails in TF Serving with 'Input shape mismatch'
Inspect the serving signature: saved_model_cli show --dir model_dir --tag_set serve --signature_def serving_default. The input key and shape must match exactly what Serving receives. Rebuild the model with tf.keras.Input to ensure the signature is explicit.

Checkpoint restores but training metrics restart from epoch 0
You restored weights but not the optimizer state. Load the full model (not just weights): model = tf.keras.models.load_model(checkpoint_dir). For weight-only checkpoints, the optimizer state is lost and any learning-rate warm-up must be replayed manually.

model.load_weights() raises 'You are loading a weight file containing 5 layers into a model with 4 layers'
Architecture mismatch between checkpoint and current model: the number of layers has changed since the checkpoint was saved. Ensure the architecture is identical before calling load_weights, or save the full model (architecture + weights) instead of weights-only checkpoints.
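The "verify the loaded version" step is easy to automate in a post-deployment test. A minimal sketch, assuming TF Serving's documented GET /v1/models/&lt;name&gt; response shape (the served_versions helper name is ours, not a TensorFlow API):

```python
import json

def served_versions(status_json: str) -> dict:
    """Map each served model version to its state (e.g. 'AVAILABLE'),
    parsed from TF Serving's GET /v1/models/<name> response body."""
    payload = json.loads(status_json)
    return {
        entry["version"]: entry["state"]
        for entry in payload.get("model_version_status", [])
    }

# Example body following TF Serving's REST API shape:
body = '{"model_version_status": [{"version": "2", "state": "AVAILABLE"}]}'
assert served_versions(body).get("2") == "AVAILABLE"
```

In practice you would fetch the body with curl or urllib.request against localhost:8501 and fail the deployment if the expected version is not AVAILABLE.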

Training a deep learning model can take hours, days, or even weeks. Without a robust saving strategy, a simple power outage or a crashed script could wipe out thousands of dollars in compute time.

TensorFlow provides two primary ways to save: saving the entire model (architecture + weights) or saving just the weights (checkpoints). Understanding when to use the standard TensorFlow 'SavedModel' format versus the older 'H5' format is critical for moving models from research into production environments like TensorFlow Serving or TFLite. At TheCodeForge, we treat model serialization as a core DevOps task, ensuring that every training run is reproducible and every artifact is versioned.

1. Saving the Entire Model (SavedModel vs. H5)

The SavedModel format is the recommended format for TensorFlow 2.x. It saves the model architecture, the weights, and the compile configuration (optimizer, loss, and metrics) into a directory. The H5 format (legacy Keras) stores everything in a single file, which is convenient for simple sharing but lacks the metadata required for advanced serving features.

save_and_load.py · PYTHON
import tensorflow as tf
from tensorflow.keras import models, layers

# io.thecodeforge: Standard Model Serialization
# Create a simple model
model = models.Sequential([layers.Dense(10, input_shape=(5,))])
model.compile(optimizer='adam', loss='mse')

# 1. Save as a directory (SavedModel format - Recommended for Production)
# Version directory is mandatory for TF Serving compatibility
# Note: under Keras 3 (TF 2.16+), model.save() expects a .keras/.h5 filename;
# use model.export('forge_production_v1/2') there to write a SavedModel
model.save('forge_production_v1/2')

# 2. Save as a single file (H5 format - For simple sharing only)
model.save('forge_legacy_model.h5')

# Loading back
new_model = models.load_model('forge_production_v1/2')
print("Model loaded successfully!")

# Inspect serving signature
import subprocess
result = subprocess.run(['saved_model_cli', 'show', '--dir', 'forge_production_v1/2', '--all'], capture_output=True, text=True)
print(result.stdout)
▶ Output
Model loaded successfully!
The given SavedModel SignatureDef contains the following input(s):
inputs['dense_input'] tensor_info: dtype: DT_FLOAT, shape: (-1, 5)
⚠ SavedModel Directory Structure is Non-Negotiable for TF Serving
TF Serving requires a versioned directory structure: /models/classifier/1/saved_model.pb. Saving to /models/classifier.h5 or /models/classifier/saved_model.pb (without the version subdirectory) causes TF Serving to silently continue serving the previous version. Always use versioned directories and verify the loaded version via the health endpoint.
📊 Production Insight
SavedModel format stores the computation graph, not just weight matrices — this is why it loads in Java, C++, and TF Serving without a Python runtime.
H5 requires the original Python Keras code to deserialize — it is a Python artifact, not a portable artifact.
For the serving and mobile deployment workflow, see tensorflow-lite-mobile for TFLite conversion.
🎯 Key Takeaway
SavedModel = portable, cross-language, TF-Serving-compatible, version-directory-required.
H5 = portable file, Python-only deserialization, not suitable for TF Serving or Java loading.
Always save to a versioned path: model.save('models/classifier/2/').

2. Using Checkpoints during Training

A 'ModelCheckpoint' callback allows you to save your model automatically at the end of every epoch. This is a lifesaver for long training runs. It ensures that if the process is interrupted, you only lose a single epoch of work rather than the entire session.

checkpointing.py · PYTHON
# io.thecodeforge: Automated Checkpoint Strategy
import tensorflow as tf

checkpoint_path = "training_checkpoints/forge_model_{epoch:02d}"

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=False, # Save full model for easy resume
    save_best_only=True,     # Keeps only the version with the lowest validation loss
    monitor='val_loss',
    verbose=1
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

# The model will now save its 'progress' after every epoch
# model.fit(train_data, train_labels, epochs=50,
#           validation_data=(val_data, val_labels),
#           callbacks=[cp_callback, early_stop])
▶ Output
Epoch 00012: val_loss improved from 0.3241 to 0.2987, saving model to training_checkpoints/forge_model_12
💡 Pro Tip: save_weights_only vs. Full Model
Set save_best_only=True in your checkpoint callback. This ensures that if your model starts overfitting (getting worse) later in training, you keep the version that performed best on your validation data. Use save_weights_only=False to save the full model — this allows resuming training or switching serving environments without rebuilding the model in code.
📊 Production Insight
save_weights_only=True is compact but requires you to perfectly reconstruct the model architecture in code to resume.
save_weights_only=False is 3–5x larger but self-contained — the safer choice for production pipelines where training resumes across machines.
EarlyStopping + ModelCheckpoint together is the minimum viable callback stack for any training run longer than 20 epochs.
🎯 Key Takeaway
save_best_only=True on ModelCheckpoint is non-negotiable — you want epoch 18's weights, not epoch 50's.
Combine with EarlyStopping(restore_best_weights=True) for defense-in-depth against overfitting.
save_weights_only=False is the safer default for distributed or cloud training.

3. Implementation: Java Model Loader

In many enterprise environments, models are trained in Python but executed in Java-based backend services. TensorFlow's SavedModel format is specifically designed to be cross-language compatible.

io/thecodeforge/ml/ModelLoader.java · JAVA
package io.thecodeforge.ml;

import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;

public class ModelLoader {
    /**
     * io.thecodeforge: Loading a Python-trained SavedModel in Java
     */
    public static void loadAndPredict(String modelPath) {
        try (SavedModelBundle model = SavedModelBundle.load(modelPath, "serve")) {
            Session session = model.session();
            // Logic for wrapping inputs into Tensors and running session.runner()
            System.out.println("Forge Model successfully loaded in Java runtime.");
        }
    }
}
▶ Output
Forge Model successfully loaded in Java runtime.
📊 Production Insight
The modelPath in Java must point to the SavedModel directory containing saved_model.pb — not the parent directory and not the .pb file itself.
Always verify the serving signature before writing Java binding code: run saved_model_cli show --dir path --all from the training environment.
For the full transfer learning model serving pattern in Java, see the tensorflow-transfer-learning guide.
🎯 Key Takeaway
Java can only load SavedModel format — not H5.
The serving tag 'serve' is the default for Keras-exported models — verify it with saved_model_cli.
Close the SavedModelBundle in a try-with-resources block to prevent native memory leaks.

4. Audit Persistence: Logging Artifact Metadata

We don't just save files; we track them. This SQL pattern allows us to link a specific saved model file to the exact training metrics it produced.

io/thecodeforge/db/model_audit.sql · SQL
-- io.thecodeforge: Registering Model Artifacts
INSERT INTO io.thecodeforge.model_artifacts (
    version_tag,
    format_type,
    storage_path,
    final_accuracy,
    created_at
) VALUES (
    'v1.2.0-prod',
    'SavedModel',
    '/mnt/storage/models/forge_production_v1/2/',
    0.9421,
    CURRENT_TIMESTAMP
);
📊 Production Insight
The storage_path must include the version subdirectory — /forge_production_v1/2/ not /forge_production_v1/.
Add a model_hash column (SHA-256 of the saved_model.pb) to detect silent file corruption or unauthorized replacement.
For automated artifact tracking at scale, see experiment-tracking-mlflow.
🎯 Key Takeaway
Track the versioned path, not just the model name.
A SHA-256 hash of the artifact guards against corruption and unauthorized replacement.
This is the SQL floor — MLflow automates complete lineage tracking.

5. Packaging for Deployment

To serve the model, we use a Docker container that includes TensorFlow Serving. This allows the model to be accessed via a REST or gRPC API.

Dockerfile · DOCKERFILE
# io.thecodeforge: Production Model Serving
FROM tensorflow/serving:latest

# Set the model name
ENV MODEL_NAME=forge_model

# Copy the SavedModel directory into the container
# The path /models/model_name/version_number/ is mandatory for TF Serving
COPY forge_production_v1/2 /models/forge_model/2

# Expose the gRPC and REST ports
EXPOSE 8500
EXPOSE 8501
▶ Output
Successfully built image thecodeforge/model-server:latest
📊 Production Insight
TF Serving scans /models/model_name/ for integer-named subdirectories and serves the highest version number by default.
Do not use tensorflow/serving:latest in production — pin to an exact version tag to prevent unexpected behavior changes.
After deploying, verify the loaded version: curl localhost:8501/v1/models/forge_model — check model_version_status.
🎯 Key Takeaway
TF Serving requires /models/name/version/ directory structure — flat H5 files will not be served.
Pin the TF Serving image version — :latest is a production liability.
Always verify the served model version via the REST health endpoint post-deployment.
🗂 TensorFlow Model Saving Formats
When to use each format
  • SavedModel — saves architecture, weights, optimizer state, and assets — best for production, TF Serving, and Java/C++ loading
  • H5 file — saves architecture, weights, and optimizer state — best for simple sharing as a single portable file
  • Checkpoints — save weights only — best for saving progress during long training sessions
  • JSON/YAML — saves architecture only — best for sharing the structure without any weights
  • TensorFlow Lite — saves an optimized graph with quantized weights — best for mobile (Android/iOS) and edge deployment

🎯 Key Takeaways

  • SavedModel is the default, multi-file format for TensorFlow 2.x production and is language-agnostic.
  • H5 is a legacy single-file format that is popular for quick research sharing but lacks deployment flexibility.
  • Callbacks allow for automatic 'save points' during the training loop, protecting against hardware failure.
  • Loading a model restores not just the math (weights), but the entire state, including the optimizer and loss function.
  • Always log your model metadata in a central database to ensure long-term model governance.

⚠ Common Mistakes to Avoid

    Trying to load an H5 file using a SavedModel directory path (or vice versa)
    Symptom

    OSError: SavedModel file does not exist at the specified path — or — ValueError: Unknown layer — the deserialization fails entirely

    Fix

    SavedModel: model = tf.keras.models.load_model('path/to/directory/') — the path must point to a directory containing saved_model.pb. H5: model = tf.keras.models.load_model('model.h5') — the path must point to the .h5 file directly.

    Not saving the optimizer state before resuming training
    Symptom

    Resumed training loss is higher than the last checkpoint's training loss — the optimizer restarts with default learning rate and zero momentum, losing its adaptive state

    Fix

    Use model.save() (not model.save_weights()) to include the optimizer state. When resuming, load the full model: model = tf.keras.models.load_model(checkpoint_path). The optimizer will continue from its saved state.

    Using hardcoded absolute paths in save/load calls
    Symptom

    Code works on the training machine but crashes on the serving cluster with FileNotFoundError — the absolute path does not exist on the target machine

    Fix

    Use relative paths or environment variables: MODEL_PATH = os.environ.get('MODEL_PATH', './models/classifier/1/'). Never hardcode user-specific or machine-specific paths in model loading code.

    Mismatching Keras versions between training and loading environments
    Symptom

    ValueError: Unknown layer: 'CustomLayer' or AttributeError on model load in a Docker container or different machine

    Fix

    Pin the exact TensorFlow version in requirements.txt. Use the same TF image tag for training and serving containers. If using custom layers, ensure they are registered with @tf.keras.utils.register_keras_serializable() before saving.

Interview Questions on This Topic

  • Q (Mid-level): Explain the difference between model.save() and model.save_weights() in terms of memory and future utility.
    model.save() serializes the complete model: architecture (layer types and configurations), trained weights, optimizer state (learning rate, momentum, running averages), and compilation settings (loss function, metrics). The output is self-contained — you can load and immediately run predictions or resume training without any additional code. model.save_weights() serializes only the weight matrices as a TensorFlow checkpoint. To use these weights, you must first reconstruct the identical model architecture in code, then call model.load_weights(). This is 5–10x smaller on disk but requires the original model-building code to be available and identical. Use save() for production artifacts; use save_weights() for experiment checkpointing where you version the architecture code in git.
  • Q (Senior): Why is the SavedModel format preferred over H5 for cross-platform deployment?
    H5 serialization relies on Python's Keras deserialization machinery — it stores the model as a JSON config + NumPy arrays. Loading an H5 model requires the exact same Python environment and Keras class registry. SavedModel, in contrast, serializes the computation graph as a protocol buffer (saved_model.pb) independent of any Python class structure. This allows TF Serving (C++ runtime), the Java TF API, the TF.js converter, and TFLite to load and execute the model without a Python process. SavedModel also includes TF functions with their graph signatures, enabling TF Serving to serve the model via REST and gRPC without any wrapper code.
  • Q (Senior): How do you implement a custom callback to save a model only when a specific custom metric improves?
    Subclass tf.keras.callbacks.Callback and override on_epoch_end. Inside, access self.model and logs (a dict of the epoch's metric values). Compare the custom metric against a stored best value and call self.model.save(filepath) conditionally: in __init__, store the filepath, the metric name, and best = float('inf'); in on_epoch_end, read current = logs.get(self.metric), and if current < self.best, update self.best, call self.model.save(self.filepath), and log the improvement. Register the callback with callbacks=[ForgeCustomCheckpoint('best_model', 'val_custom_metric')] in model.fit().
  • Q (Senior): What is the role of the saved_model.pb file inside a SavedModel directory?
    saved_model.pb is the serialized TensorFlow graph stored as a Protocol Buffer binary. It contains the computation graph (all TF ops, their connections, and data flow), the serving signatures (which define the named input/output tensors for the REST API), and the MetaGraph which includes the model's functions exported with @tf.function. The actual weight values are not stored here — they live in the variables/ subdirectory as TF checkpoint files. When TF Serving loads a model, it reads saved_model.pb to understand the graph structure and then populates the variables from the checkpoint files.
  • Q (Senior): How does the CheckpointManager class differ from the ModelCheckpoint callback for handling multiple model versions?
    ModelCheckpoint is a Keras training callback — it integrates with model.fit() and saves based on epoch events and monitored metrics. It is the right tool for standard supervised training. CheckpointManager is a lower-level TF utility designed for custom training loops with tf.GradientTape. It manages a tf.train.Checkpoint object which can track any combination of tf.Variables, models, and optimizers — not just Keras models. CheckpointManager supports keeping the last N checkpoints automatically (max_to_keep parameter) and restoring the latest checkpoint with manager.restore_or_initialize(). Use ModelCheckpoint for model.fit() workflows; use CheckpointManager for custom training loops where you need fine-grained control.

Frequently Asked Questions

Can I save a model and resume training on a different machine?

Yes. As long as you use model.save() (which includes the optimizer state), you can load it on any machine with a compatible TensorFlow version and pick up training exactly where you left off.

Is it possible to save only the architecture without the weights?

Absolutely. You can use model.to_json() (note that model.to_yaml() was removed in TensorFlow 2.6). This creates a lightweight text representation of the layers, which is useful for version controlling the design itself separately from the trained weights.

How do I load a model if I only have the .ckpt files?

If you only have checkpoints (weights), you must first recreate the identical model architecture in code, then call model.load_weights('path/to/checkpoint'). This is why model.save() (full model) is preferred over save_weights() for production artifacts.

What is the 'assets' folder in a SavedModel directory?

The assets folder is used to store auxiliary files that your model might need during inference, such as vocabulary files for text processing or lookup tables for feature engineering.

How do I convert a SavedModel to TensorFlow Lite for mobile deployment?

Use the TFLiteConverter: converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model') then converter.convert() to produce the .tflite binary. For quantization and the full mobile deployment workflow, see the dedicated tensorflow-lite-mobile guide.

Naren, Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
