Intermediate 6 min · March 10, 2026

Saving and Loading Models in TensorFlow

TF Model Save/Load — Avoid Wrong Version in Serving

Q: Can I save a model and resume training on a different machine?

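Yes. model.save() stores the architecture, weights, and optimizer state, so you can load it on any machine with a compatible TensorFlow version and continue training from the same weights and optimizer state. The epoch counter and your input pipeline's position are not part of the model, so pass initial_epoch to fit() when you resume.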

Q: Is it possible to save only the architecture without the weights?

Yes. Use model.to_json(), which creates a lightweight text representation of the layers. This is useful for version controlling the design itself separately from the trained weights. model.to_yaml() was removed in TensorFlow 2.6 and now raises an error, so JSON is the option to rely on.

Q: How do I load a model if I only have the .ckpt files?

If you only have checkpoints (weights), you must first recreate the identical model architecture in code, then call model.load_weights('path/to/checkpoint'). This is why model.save() (full model) is preferred over save_weights() for production artifacts.
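Both answers above can be sketched in a few lines. This is a minimal example assuming tf.keras on TF 2.x; the file names and the two-layer model are placeholders. It writes the architecture to JSON and the weights to a separate .weights.h5 file (a suffix both Keras 2 and Keras 3 accept), then rebuilds the model from JSON before calling load_weights(). With a TF-format .ckpt checkpoint on Keras 2, you would pass the checkpoint prefix to load_weights() instead.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    # Placeholder architecture; any Keras model follows the same pattern.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(5,)),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(1),
    ])


workdir = tempfile.mkdtemp()
arch_path = os.path.join(workdir, "architecture.json")
weights_path = os.path.join(workdir, "model.weights.h5")

original = build_model()

# Architecture only: a small text file you can commit to version control.
with open(arch_path, "w") as f:
    f.write(original.to_json())

# Weights only: no layer definitions inside.
original.save_weights(weights_path)

# Restore: recreate the architecture first, then load the weights into it.
with open(arch_path) as f:
    restored = tf.keras.models.model_from_json(f.read())
restored.load_weights(weights_path)

x = np.random.rand(4, 5).astype("float32")
# True when the weights round-tripped into the rebuilt architecture
print(np.allclose(original.predict(x, verbose=0), restored.predict(x, verbose=0)))
```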

Q: What is the 'assets' folder in a SavedModel directory?

The assets folder is used to store auxiliary files that your model might need during inference, such as vocabulary files for text processing or lookup tables for feature engineering.

Q: How do I convert a SavedModel to TensorFlow Lite for mobile deployment?

Use the TFLiteConverter: converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model') then converter.convert() to produce the .tflite binary. For quantization and the full mobile deployment workflow, see the dedicated tensorflow-lite-mobile guide.
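A minimal end-to-end sketch of that conversion, assuming TF 2.x. To stay self-contained it exports a hypothetical tf.Module (Doubler, computing y = 2x) instead of a real trained model, converts the SavedModel, and runs one prediction through tf.lite.Interpreter as a smoke test. Recent TF releases may log a deprecation warning for tf.lite.Interpreter in favor of the separate LiteRT package; the conversion step itself is unaffected.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


class Doubler(tf.Module):
    """Hypothetical stand-in model: y = x * w with a single scalar variable."""
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32, name="x")])
    def __call__(self, x):
        return {"y": x * self.w}


workdir = tempfile.mkdtemp()
export_dir = os.path.join(workdir, "saved_model")
module = Doubler()
tf.saved_model.save(module, export_dir, signatures=module.__call__)

# Convert the SavedModel directory into a .tflite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_bytes = converter.convert()
with open(os.path.join(workdir, "model.tflite"), "wb") as f:
    f.write(tflite_bytes)

# Smoke-test the converted model with the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
sample = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)
interpreter.set_tensor(input_index, sample)
interpreter.invoke()
result = interpreter.get_tensor(output_index)
print(result)
```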

A flat H5 save path caused TF Serving to serve the wrong model for six months.

Naren Founder & Principal Engineer

20+ years shipping production ML systems and the infrastructure behind them. Lessons pulled from things that broke in production.

Last updated: July 19, 2026

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts

⚡Quick Answer

SavedModel (directory format) is the TF 2.x production standard — saves architecture, weights, optimizer state, and serving signature
H5 (single file) is a legacy format — convenient for sharing but lacks TF Serving compatibility and cross-language loading
Checkpoints capture model state during training (weights only when save_weights_only=True); use the ModelCheckpoint callback with save_best_only=True to guard against overfitting regressions
JSON/YAML saves architecture only — useful for version controlling model design separately from weights
TFLite (.tflite) is the mobile/edge format — converted from SavedModel, not saved directly
Biggest mistake: treating H5 and SavedModel paths as interchangeable. An H5 model is a single file, while a SavedModel is a directory (saved_model.pb plus variables/), and tools that expect one will not load the other

✦ Definition (~90s read)

What is Saving and Loading Models in TensorFlow?

TensorFlow model save/load is the mechanism for serializing trained model architectures, weights, and training state to disk and reconstructing them later for inference or resumption of training. The core problem it solves is that a model exists only in memory during training—without persistence, you'd lose hours or days of compute on a crash, and you couldn't deploy the model to production.

TF offers two primary formats: the legacy H5 (Keras HDF5), which bundles everything into a single file but cannot be served by TF Serving, and the modern SavedModel directory format, which contains a protocol buffer with signature definitions, variable checkpoints, and asset files. SavedModel is the format TF Serving requires and the standard input for the TensorFlow Lite and TensorFlow.js converters. The critical nuance most engineers miss is that SavedModel serializes the graph's concrete functions, not just the Python-layer Keras model. Version mismatches between training and serving environments (most importantly the TensorFlow version, e.g. TF 2.4 vs. TF 2.9; the Python version matters for anything that re-imports Python code) can therefore surface as load failures when the serving runtime lacks an op the graph uses, or as changed default behaviors that quietly degrade accuracy.

For production pipelines, pair model saving with explicit artifact metadata logging: record the TF version, Python version, a graph hash, and input/output tensor dtypes, so you can audit which model version your Java or Go serving binary actually loaded. That prevents the all-too-common scenario where a retrained model is deployed against an incompatible runtime and serves wrong predictions for days before anyone notices.
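A minimal sketch of that metadata record, assuming TF 2.x and a SavedModel directory on local disk. The graph hash here is a SHA-256 of saved_model.pb (a file-level hash, not an op-by-op fingerprint), and the hypothetical Doubler module exists only so the example runs on its own; point collect_artifact_metadata() at your real export instead. Where the record ends up (a registry table, a sidecar file) is up to your pipeline.

```python
import hashlib
import json
import os
import platform
import tempfile
from datetime import datetime, timezone

import tensorflow as tf


def describe_specs(specs):
    # Map each signature tensor name to its dtype and shape as plain strings.
    return {name: {"dtype": spec.dtype.name, "shape": str(spec.shape)}
            for name, spec in specs.items()}


def collect_artifact_metadata(export_dir, signature_key="serving_default"):
    """Runtime versions, a file-level graph hash, and signature dtypes for one SavedModel."""
    with open(os.path.join(export_dir, "saved_model.pb"), "rb") as f:
        graph_sha256 = hashlib.sha256(f.read()).hexdigest()

    signature = tf.saved_model.load(export_dir).signatures[signature_key]
    _, input_specs = signature.structured_input_signature  # (args, kwargs)
    return {
        "tf_version": tf.__version__,
        "python_version": platform.python_version(),
        "graph_sha256": graph_sha256,
        "signature": signature_key,
        "inputs": describe_specs(input_specs),
        "outputs": describe_specs(signature.structured_outputs),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


class Doubler(tf.Module):
    """Hypothetical stand-in model so the example runs on its own."""
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32, name="x")])
    def __call__(self, x):
        return {"y": x * self.w}


workdir = tempfile.mkdtemp()
export_dir = os.path.join(workdir, "doubler", "1")
module = Doubler()
tf.saved_model.save(module, export_dir, signatures=module.__call__)

meta = collect_artifact_metadata(export_dir)
# Sidecar file kept outside the version directory, so TF Serving never sees it.
with open(os.path.join(workdir, "doubler_v1.metadata.json"), "w") as f:
    json.dump(meta, f, indent=2)
print(meta["inputs"])
```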

Plain-English First

Imagine you're playing a massive video game that takes 100 hours to beat. You wouldn't want to leave your console on for weeks; you use a 'Save Point.' Saving a model is exactly like that. It freezes the model's 'brain' (its weights) and its 'skeleton' (its architecture) into a file on your hard drive, so you can pick up exactly where you left off or send that file to someone else to run on their computer.

Training a deep learning model can take hours, days, or even weeks. Without a robust saving strategy, a simple power outage or a crashed script could wipe out thousands of dollars in compute time.

TensorFlow provides two primary ways to save: saving the entire model (architecture + weights) or saving just the weights (checkpoints). Understanding when to use the standard TensorFlow 'SavedModel' format versus the older 'H5' format is critical for moving models from research into production environments like TensorFlow Serving or TFLite. At TheCodeForge, we treat model serialization as a core DevOps task, ensuring that every training run is reproducible and every artifact is versioned.

What TF Model Save/Load Actually Does

TensorFlow model save/load is the mechanism for serializing a trained model's graph structure and trained weights to disk, then restoring them for inference or further training. The core mechanic uses the SavedModel format, which bundles a TensorFlow MetaGraphDef (the computation graph) with checkpoint variables and asset files into a single directory. The result is self-contained; versioning comes from the directory you export into (for example models/classifier/2/).

In practice, tf.saved_model.save() exports the model with one or more signatures, typically a serving_default signature that defines the input/output tensors. When loading via tf.saved_model.load(), TensorFlow reconstructs the saved functions and restores the variable values. A critical property: the loaded object is a restored Python object exposing the saved tf.functions and signatures, not a frozen graph, and what runs at serving time is whichever signature you call. If that signature was traced with training behavior (for example, dropout active), serving it can cause silent failures or memory bloat, which is why you export an explicit inference signature.

You use save/load when you need to decouple training from serving — for example, training on a GPU cluster and serving on CPU-only instances. It's also essential for model versioning in production pipelines, where you must roll back to a previous version without retraining. Without proper save/load, you cannot reliably deploy models across environments.

⚠ Signature Mismatch

The saved model's signature must match the serving input format exactly; a mismatch in tensor names or dtypes causes cryptic runtime errors.

📊 Production Insight

A fraud detection pipeline served a model saved with training ops enabled, causing OOM on inference instances within 2 hours.

Symptom: memory grew linearly with each request until the pod was killed by the OOM killer; no error in model logs.

Rule: always export with tf.saved_model.save(model, export_path, signatures=model.call.get_concrete_function(...)) to strip training-only ops.

🎯 Key Takeaway

SavedModel is the only format that guarantees graph + weights + assets in one directory.

Always specify a concrete function signature at save time to control input/output shapes.

Never load a model for serving without verifying the signature_def_map matches your serving contract.
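A minimal sketch of that verification step, assuming TF 2.x. It loads an export with tf.saved_model.load(), compares the serving_default input names, dtypes, and shapes against an expected contract, and only then calls the signature. Scorer and the contract dictionary are hypothetical stand-ins for your model and for the input your serving API expects.

```python
import os
import tempfile

import tensorflow as tf


class Scorer(tf.Module):
    """Hypothetical exported model: one float32 input 'features' of width 4."""
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.ones([4, 1]))

    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32, name="features")])
    def serve(self, features):
        return {"score": tf.matmul(features, self.w)}


def check_contract(signature, expected_inputs):
    """Raise if the signature's input names, dtypes, or shapes differ from the contract."""
    _, specs = signature.structured_input_signature
    actual = {name: (spec.dtype, spec.shape.as_list()) for name, spec in specs.items()}
    if actual != expected_inputs:
        raise ValueError(f"Serving contract mismatch: expected {expected_inputs}, got {actual}")


export_dir = os.path.join(tempfile.mkdtemp(), "scorer", "1")
module = Scorer()
tf.saved_model.save(module, export_dir, signatures={"serving_default": module.serve})

loaded = tf.saved_model.load(export_dir)
serving_fn = loaded.signatures["serving_default"]
check_contract(serving_fn, {"features": (tf.float32, [None, 4])})

# Signature functions take keyword arguments and return a dict of tensors.
outputs = serving_fn(features=tf.constant([[1.0, 2.0, 3.0, 4.0]]))
print(outputs["score"].numpy())  # [[10.]]
```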

thecodeforge.io

Tensorflow Save Load Model

1. Saving the Entire Model (SavedModel vs. H5)

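The 'SavedModel' is the recommended format for TensorFlow 2.x. It saves the model architecture, weights, compile configuration (optimizer, loss, metrics), and optimizer state in a directory. Alternatively, the H5 format (legacy Keras) stores everything in a single file, which is convenient for simple sharing but lacks the serving signatures TF Serving needs. Note that directory-style model.save() is TF 2.15-and-earlier behavior: on TF 2.16+ (Keras 3), model.save() accepts only .keras and .h5 paths, and model.export() writes the SavedModel instead.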

save_and_load.pyPYTHON

import subprocess

import tensorflow as tf
from tensorflow.keras import models, layers

# io.thecodeforge: Standard Model Serialization
# Note: directory-style model.save() is TF <= 2.15 (Keras 2) behavior.
# On TF 2.16+ (Keras 3), model.save() only accepts .keras/.h5 paths; use model.export() for SavedModel.

# Create a simple model
model = models.Sequential([layers.Dense(10, input_shape=(5,))])
model.compile(optimizer='adam', loss='mse')

# 1. Save as a directory (SavedModel format - Recommended for Production)
# The numeric version subdirectory ("2") is what TF Serving looks for
model.save('forge_production_v1/2')

# 2. Save as a single file (H5 format - For simple sharing only)
model.save('forge_legacy_model.h5')

# Loading back
new_model = models.load_model('forge_production_v1/2')
print("Model loaded successfully!")

# Inspect serving signature (saved_model_cli ships with the TensorFlow pip package)
result = subprocess.run(
    ['saved_model_cli', 'show', '--dir', 'forge_production_v1/2', '--all'],
    capture_output=True, text=True,
)
print(result.stdout)

Output

Model loaded successfully!

The given SavedModel SignatureDef contains the following input(s):

inputs['dense_input'] tensor_info: dtype: DT_FLOAT, shape: (-1, 5)

⚠ SavedModel Directory Structure is Non-Negotiable for TF Serving

TF Serving requires a versioned directory structure: /models/classifier/1/saved_model.pb. Saving to /models/classifier.h5 or /models/classifier/saved_model.pb (without the version subdirectory) causes TF Serving to silently continue serving the previous version. Always use versioned directories and verify the loaded version via the health endpoint.
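A small helper for that versioned layout, sketched with the standard library only. It assumes one base directory per model (for example /models/classifier) and picks the next integer version. Comparing versions as integers matters: TF Serving treats 10 as newer than 9, while a plain string sort would not.

```python
import os
import tempfile


def next_version_dir(model_base):
    """Return model_base/<N+1>, where N is the highest integer-named subdirectory (0 if none)."""
    os.makedirs(model_base, exist_ok=True)
    versions = [
        int(name)
        for name in os.listdir(model_base)
        if name.isdigit() and os.path.isdir(os.path.join(model_base, name))
    ]
    return os.path.join(model_base, str(max(versions, default=0) + 1))


base = tempfile.mkdtemp()
for name in ("1", "2", "10"):
    os.makedirs(os.path.join(base, name))
print(os.path.basename(next_version_dir(base)))  # 11
```

In a training job you would pass the result straight to the export call, e.g. model.save(next_version_dir('/models/classifier')) on TF 2.15 and earlier, or model.export(...) on Keras 3. Exporting under a temporary name and renaming into place once the write finishes can help keep TF Serving from picking up a half-written version.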

📊 Production Insight

SavedModel format stores the computation graph, not just weight matrices — this is why it loads in Java, C++, and TF Serving without a Python runtime.

H5 requires the original Python Keras code to deserialize — it is a Python artifact, not a portable artifact.

For the serving and mobile deployment workflow, see tensorflow-lite-mobile for TFLite conversion.

🎯 Key Takeaway

SavedModel = portable, cross-language, TF-Serving-compatible, version-directory-required.

H5 = portable file, Python-only deserialization, not suitable for TF Serving or Java loading.

Always save to a versioned path: model.save('models/classifier/2/').

2. Using Checkpoints during Training

A 'ModelCheckpoint' callback allows you to save your model automatically at the end of every epoch. This is a lifesaver for long training runs. It ensures that if the process is interrupted, you only lose a single epoch of work rather than the entire session.

checkpointing.pyPYTHON

import tensorflow as tf

# io.thecodeforge: Automated Checkpoint Strategy
# On TF 2.16+ (Keras 3), a full-model checkpoint path must end in .keras,
# e.g. "training_checkpoints/forge_model_{epoch:02d}.keras"
checkpoint_path = "training_checkpoints/forge_model_{epoch:02d}"

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=False, # Save full model for easy resume
    save_best_only=True,     # Write only when val_loss improves; {epoch} in the path keeps each improvement as its own file
    monitor='val_loss',
    verbose=1
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

# The model now saves a checkpoint at the end of every epoch in which val_loss improves
# model.fit(train_data, train_labels, epochs=50,
#           validation_data=(val_data, val_labels),
#           callbacks=[cp_callback, early_stop])

Output

Epoch 00012: val_loss improved from 0.3241 to 0.2987, saving model to training_checkpoints/forge_model_12

💡Pro Tip: save_weights_only vs. Full Model

Set 'save_best_only=True' in your checkpoint callback. This ensures that if your model starts overfitting (getting worse on validation data) later in training, you keep the version that performed best on your validation data. Use save_weights_only=False to save the full model; this allows resuming training or switching serving environments without rebuilding the model in code.

📊 Production Insight

save_weights_only=True is compact but requires you to perfectly reconstruct the model architecture in code to resume.

save_weights_only=False is several times larger (with Adam, the optimizer alone stores two extra tensors per weight) but self-contained, which makes it the safer choice for production pipelines where training resumes across machines.

EarlyStopping + ModelCheckpoint together is the minimum viable callback stack for any training run longer than 20 epochs.

🎯 Key Takeaway

save_best_only=True on ModelCheckpoint is non-negotiable — you want epoch 18's weights, not epoch 50's.

Combine with EarlyStopping(restore_best_weights=True) for defense-in-depth against overfitting.

save_weights_only=False is the safer default for distributed or cloud training.

3. Implementation: Java Model Loader

In many enterprise environments, models are trained in Python but executed in Java-based backend services. TensorFlow's SavedModel format is specifically designed to be cross-language compatible.

io/thecodeforge/ml/ModelLoader.javaJAVA

package io.thecodeforge.ml;

import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;

public class ModelLoader {
    /**
     * io.thecodeforge: Loading a Python-trained SavedModel in Java
     */
    public static void loadAndPredict(String modelPath) {
        try (SavedModelBundle model = SavedModelBundle.load(modelPath, "serve")) {
            Session session = model.session();
            // Logic for wrapping inputs into Tensors and running session.runner()
            System.out.println("Forge Model successfully loaded in Java runtime.");
        }
    }
}

Output

Forge Model successfully loaded in Java runtime.

📊 Production Insight

The modelPath in Java must point to the SavedModel directory containing saved_model.pb — not the parent directory and not the .pb file itself.

Always verify the serving signature before writing Java binding code: run saved_model_cli show --dir path --all from the training environment.

For the full transfer learning model serving pattern in Java, see the tensorflow-transfer-learning guide.

🎯 Key Takeaway

Java can only load SavedModel format — not H5.

The serving tag 'serve' is the default for Keras-exported models — verify it with saved_model_cli.

Close the SavedModelBundle in a try-with-resources block to prevent native memory leaks.

4. Audit Persistence: Logging Artifact Metadata

We don't just save files; we track them. This SQL pattern allows us to link a specific saved model file to the exact training metrics it produced.

io/thecodeforge/db/model_audit.sqlSQL

-- io.thecodeforge: Registering Model Artifacts
INSERT INTO io.thecodeforge.model_artifacts (
    version_tag,
    format_type,
    storage_path,
    final_accuracy,
    created_at
) VALUES (
    'v1.2.0-prod',
    'SavedModel',
    '/mnt/storage/models/forge_production_v1/2/',
    0.9421,
    CURRENT_TIMESTAMP
);

📊 Production Insight

The storage_path must include the version subdirectory — /forge_production_v1/2/ not /forge_production_v1/.

Add a model_hash column (SHA-256 of the saved_model.pb) to detect silent file corruption or unauthorized replacement.

For automated artifact tracking at scale, see experiment-tracking-mlflow.
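A minimal sketch of that hash, standard library only. One assumption worth flagging: it hashes every file in the SavedModel directory rather than just saved_model.pb, because the trained weights live under variables/, and a retrain with the same architecture can leave saved_model.pb nearly identical while the weights change. Each file's relative path and size are mixed into the digest, so a moved, renamed, or truncated file also changes it.

```python
import hashlib
import os


def savedmodel_sha256(export_dir):
    """SHA-256 over every file in a SavedModel directory, visited in a stable order."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(export_dir):
        dirs.sort()  # os.walk honors in-place sorting, which fixes the traversal order
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, export_dir).replace(os.sep, "/")
            # Path and size act as a header, so file boundaries are unambiguous.
            digest.update(f"{rel}\0{os.path.getsize(path)}\0".encode("utf-8"))
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string is what would go into the model_hash column at registration time, and recomputing it at load time detects a swapped or corrupted artifact.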

🎯 Key Takeaway

Track the versioned path, not just the model name.

A SHA-256 hash of the artifact guards against corruption and unauthorized replacement.

This is the SQL floor — MLflow automates complete lineage tracking.

5. Packaging for Deployment

To serve the model, we use a Docker container that includes TensorFlow Serving. This allows the model to be accessed via a REST or gRPC API.

DockerfileDOCKERFILE

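# io.thecodeforge: Production Model Serving
# Pin the TF Serving image: build with --build-arg TF_SERVING_VERSION=<exact tag>.
# An unset argument leaves an invalid image reference, so the build fails instead of silently using :latest.
ARG TF_SERVING_VERSION
FROM tensorflow/serving:${TF_SERVING_VERSION}

# Set the model name
ENV MODEL_NAME=forge_model

# Copy the SavedModel directory into the container (source path is relative to the build context)
# The path /models/model_name/version_number/ is mandatory for TF Serving
COPY forge_production_v1/2 /models/forge_model/2

# Expose the gRPC and REST ports
EXPOSE 8500
EXPOSE 8501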

Output

Successfully built image thecodeforge/model-server:latest

📊 Production Insight

TF Serving scans /models/model_name/ for integer-named subdirectories and serves the highest version number by default.

Do not use tensorflow/serving:latest in production — pin to an exact version tag to prevent unexpected behavior changes.

After deploying, verify the loaded version: curl localhost:8501/v1/models/forge_model — check model_version_status.
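A minimal sketch of that post-deployment check, standard library only. It assumes TF Serving's REST model-status response, a JSON object whose model_version_status list holds entries with version and state fields, and treats the deployment as good only if the expected version is the highest AVAILABLE one, which is what the default "latest" version policy serves. The host, model name, and version in the usage line are placeholders.

```python
import json
import urllib.request


def available_versions(status_json):
    """Version numbers that TF Serving reports in the AVAILABLE state, sorted ascending."""
    return sorted(
        int(entry["version"])  # versions arrive as strings in the JSON
        for entry in status_json.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    )


def assert_serving_version(host, model_name, expected_version, timeout=5):
    """Fail loudly unless expected_version is the highest AVAILABLE version."""
    url = f"http://{host}/v1/models/{model_name}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        status = json.load(resp)
    versions = available_versions(status)
    if not versions or versions[-1] != expected_version:
        raise RuntimeError(
            f"{model_name}: expected version {expected_version} to be served, AVAILABLE: {versions}"
        )
    return versions


# Usage after a deploy (placeholders): assert_serving_version("localhost:8501", "forge_model", 2)
```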

🎯 Key Takeaway

TF Serving requires /models/name/version/ directory structure — flat H5 files will not be served.

Pin the TF Serving image version — :latest is a production liability.

Always verify the served model version via the REST health endpoint post-deployment.

Save Your Sanity: Why save_weights() Beats Full Model Dumps

Most tutorials treat save() like a magic wand. Wave it, you get a model back. Fine for demos. In production, you'll typically write weights-only checkpoints far more often than full model saves. Here's why.

model.save_weights() writes only the model's variables (trainable and non-trainable, such as BatchNorm moving statistics): no architecture, no optimizer state, no custom layer registration. That means your deployment pipeline owns the architecture definition. You can swap activations or fix bugs in custom layers without invalidating the weights, as long as the set and shapes of the variables stay the same. Try that with a SavedModel directory, where the traced graph is baked in.

Weight files are a fraction of a full model save: with Adam, a full save also stores two moment tensors per weight, so the weights alone are roughly a third of the size or less. ResNet-50's roughly 25.6M parameters come to about 100 MB of float32 weights. Multiply by nightly checkpoints across 50 experiments and you're looking at real infrastructure savings. The trade-off? You must keep the exact model code that produced those weights. Version your model definitions like you version your database schemas, because they are your schema.

checkpoint_backup.pyPYTHON

# io.thecodeforge — ml-ai tutorial

import os

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Model must exist before weights can land on it
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])

# Train a bit on placeholder data to get real weights
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
x = np.random.rand(256, 784).astype('float32')
y = np.random.randint(0, 10, size=(256,))
model.fit(x, y, epochs=1, verbose=0)

# Save ONLY the model's variables (no architecture, no optimizer state)
weights_dir = 'production_weights/run_2024_11_21'
os.makedirs(weights_dir, exist_ok=True)
model.save_weights(os.path.join(weights_dir, 'epoch_12.weights.h5'))

# Load back into an identical architecture
model.load_weights(os.path.join(weights_dir, 'epoch_12.weights.h5'))
some_batch = x[:32]
predictions = model.predict(some_batch, verbose=0)
print(f"Predictions shape: {predictions.shape}")

Output

Predictions shape: (32, 10)

⚠ Production Trap:

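If you load with by_name=True (Keras 2 H5 loading), weights are matched by layer name, so renaming Dense(..., name='fc1') to 'dense_1' means that layer is silently skipped and keeps its random initialization. Default loading matches by model structure instead, so reordering or resizing layers is what breaks it there. Either way, model.count_params() cannot tell you whether weights loaded: it counts the architecture's parameters, which are identical before and after loading. Compare predictions on a fixed input against a known reference instead.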
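A minimal sketch of that check, assuming tf.keras on TF 2.x: save a reference model's weights, load them into a fresh copy of the architecture, and compare outputs on a fixed probe batch. It also shows why count_params() is no substitute, since the untouched fresh model already reports the same count. The tolerance and probe size are arbitrary choices.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(2),
    ])


def assert_weights_loaded(reference, restored, probe, atol=1e-6):
    """Fail unless both models produce the same outputs on a fixed probe batch."""
    expected = reference.predict(probe, verbose=0)
    actual = restored.predict(probe, verbose=0)
    if not np.allclose(expected, actual, atol=atol):
        raise AssertionError("Restored model disagrees with reference: weights did not load as expected")


reference = build()
path = os.path.join(tempfile.mkdtemp(), "reference.weights.h5")
reference.save_weights(path)

probe = np.random.default_rng(0).random((16, 8)).astype("float32")

fresh = build()  # same architecture, random initialization, nothing loaded yet
print(fresh.count_params() == reference.count_params())  # True, so count_params() cannot detect this

fresh.load_weights(path)
assert_weights_loaded(reference, fresh, probe)
print("weights verified")
```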

🎯 Key Takeaway

Save weights, not entire models. Version your architecture code. Never guess if weights actually loaded.

Checkpoint Every 10 Minutes — Or Regret It

You've seen the Colab timeout message at hour 11. The Kaggle kernel that dies at epoch 47. The Spot instance pre-emption notice. If you aren't checkpointing every 10-15 minutes, you're burning money.

tf.keras.callbacks.ModelCheckpoint writes model state at epoch boundaries by default (save_freq='epoch'), or every N batches if you pass an integer save_freq. The smart play is to save both a best-so-far copy and a periodic backup. Set save_best_only=True for the golden copy, the one with the lowest validation loss. Add a second callback that saves every fifth epoch; the old period=5 argument is deprecated (and gone in Keras 3), so set save_freq to 5 * steps_per_epoch. Now if training blows up at epoch 43, you lose at most 5 epochs of work, not 42.

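Use the save_weights_only=True flag for these snapshots when disk space and IO bandwidth matter: a weights-only file is much smaller than a full model save. The catch is that it drops optimizer state (Adam moments, the step counter that drives learning-rate schedules), so resuming from it restarts the optimizer cold. For inference checkpoints, weights alone are plenty; to resume training exactly, keep at least one full-model save or a tf.train.Checkpoint that includes the optimizer.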

Store checkpoints on a distributed filesystem or cloud bucket with versioned paths. Example: gs://ml-experiments/project_alpha/checkpoint_e{epoch:04d}_loss{val_loss:.4f}. This turns your checkpoint directory into a searchable artifact store.

checkpoint_discipline.pyPYTHON

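# io.thecodeforge — ml-ai tutorial

import math
import os

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Placeholder data standing in for a real training set
x_train = np.random.rand(1000, 784).astype('float32')
y_train = np.random.randint(0, 10, size=(1000,))
batch_size = 32
validation_split = 0.2

model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

os.makedirs('checkpoints/best', exist_ok=True)

# Best model by validation loss
best_cp = keras.callbacks.ModelCheckpoint(
    filepath='checkpoints/best/best_model.weights.h5',
    monitor='val_loss',
    save_best_only=True,
    save_weights_only=True
)

# Periodic snapshot every 5 epochs. The old `period` argument is deprecated
# (removed in Keras 3); save_freq counts batches, so convert epochs to steps.
train_samples = int(len(x_train) * (1 - validation_split))
steps_per_epoch = math.ceil(train_samples / batch_size)
periodic_cp = keras.callbacks.ModelCheckpoint(
    filepath='checkpoints/epoch_{epoch:04d}_loss_{loss:.4f}.weights.h5',
    save_weights_only=True,
    save_freq=5 * steps_per_epoch
)

model.fit(x_train, y_train, epochs=50, batch_size=batch_size,
          validation_split=validation_split, callbacks=[best_cp, periodic_cp])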

Output

Epoch 1/50

-> checkpoint saved: val_loss improved from inf to 0.3421

...

Epoch 5/50

-> periodic checkpoint saved

Epoch 50/50

-> checkpoint saved: val_loss improved from 0.0874 to 0.0671

💡Senior Shortcut:

Hook your checkpoint callback into the experiment tracker. Pass checkpoint_path as an artefact to MLflow or Weights & Biases. Now every run's golden checkpoint is one click away from the dashboard.

🎯 Key Takeaway

Two checkpoints: one for the best model, one for disaster recovery. Never trust a single save point.

Why save_weights() Beats Full Model Dumps for Iteration Speed

When you're iterating on training logic, saving the full model every time is a waste of disk and nerves. The model topology rarely changes between runs of the same experiment; only the weights do. That's where save_weights() comes in. It serializes only the model's variables, not the layer definitions, optimizer state, or custom objects, which makes each save noticeably smaller and faster than a full model save.

Here's the workflow: define your model once, compile it with the same optimizer settings, and dump only the weights after each epoch. When you need to restore, rebuild the exact same architecture from code and call load_weights(). No bulky checkpoint files that repeat the same architecture every time. This is a common pattern in distributed training and CI/CD pipelines where full model reloads are too slow. It forces you to keep your model definition version-controlled and reproducible, which you should already be doing. If you need to restore optimizer state (for example, momentum when resuming or fine-tuning), use a full save or a tf.train.Checkpoint that includes the optimizer. For everything else, save_weights() is the right call.

checkpoint_weights.pyPYTHON

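# io.thecodeforge — ml-ai tutorial

import numpy as np
import tensorflow as tf

# Placeholder MNIST-shaped data (784 flattened pixels, 10 classes)
x_train = np.random.rand(512, 784).astype('float32')
y_train = np.random.randint(0, 10, size=(512,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Save only weights after training (Keras 3 requires the .weights.h5 suffix; Keras 2 accepts it too)
model.fit(x_train, y_train, epochs=5)
model.save_weights('mnist.weights.h5')

# Restore on identical architecture: clone_model copies the layer structure with fresh weights
new_model = tf.keras.models.clone_model(model)
new_model.build((None, 784))
new_model.load_weights('mnist.weights.h5')

print("Weights restored to fresh model.")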

Output

Epoch 5/5

[...training logs...]

Weights restored to fresh model.

⚠ Production Trap:

save_weights() does NOT save optimizer state. If your training is interrupted, you lose the optimizer's momentum/moment estimates and its step counter, which learning-rate schedules read. For long-running training jobs, combine it with a tf.train.Checkpoint that tracks both model and optimizer, and snapshot every N minutes.
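A minimal sketch of the tf.train.Checkpoint half of that advice, assuming tf.keras on TF 2.x. The checkpoint tracks both the model and its optimizer, so Adam's moment estimates and iteration count are saved alongside the weights, and tf.train.CheckpointManager handles file naming and retention. Timing (every N minutes) is left to your training loop; the model, data, and max_to_keep=3 are placeholders.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build():
    model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-2), loss="mse")
    return model


ckpt_dir = os.path.join(tempfile.mkdtemp(), "train_state")
x = np.random.rand(64, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)

model = build()
model.fit(x, y, epochs=2, verbose=0)

# Track the model AND its optimizer so optimizer state is saved with the weights.
ckpt = tf.train.Checkpoint(model=model, optimizer=model.optimizer)
manager = tf.train.CheckpointManager(ckpt, ckpt_dir, max_to_keep=3)
save_path = manager.save()
print("saved", save_path)

# Later, possibly in a new process: rebuild the same objects, then restore the latest checkpoint.
resumed = build()
resumed_ckpt = tf.train.Checkpoint(model=resumed, optimizer=resumed.optimizer)
resumed_ckpt.restore(manager.latest_checkpoint)
```

Optimizer variables that do not exist yet in the rebuilt optimizer are restored lazily by TensorFlow once they are created, which is why the restore can run before the first resumed training step.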

🎯 Key Takeaway

save_weights() for fast iteration; full SavedModel only when you need optimizer state or deployment artifacts.

Exporting Models with tf.saved_model.save() for Production

Saving a full model is not the same as exporting it. Exporting traces the model's inference functions, serializes the resulting computation graph into a platform-agnostic format (SavedModel), and creates a signature map that tells inference servers exactly how to feed inputs and read outputs. When you call tf.saved_model.save(model, export_dir), TensorFlow writes the graph, variables, and assets into a single directory containing a saved_model.pb file and a variables/ subfolder. This export is what TensorFlow Serving loads directly and what the TensorFlow Lite and TensorFlow.js converters take as input.

Why choose export over save_weights? Because export bundles the full forward pass, so no Python model code is required on the serving side. The loaded model becomes a self-contained set of callables that accept tensors (numpy arrays are converted automatically). Always specify signatures explicitly via tf.function with an input_signature to control input/output shapes and names. Without explicit signatures, TF infers them from how the model was built or called, and inferred names and shapes are a common source of production breakage. Export once, serve everywhere.

export_model.pyPYTHON

# io.thecodeforge — ml-ai tutorial

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.build((None, 32))

@tf.function(input_signature=[
    tf.TensorSpec(shape=[None, 32], dtype=tf.float32, name='input')
])
def predict_fn(x):
    # training=False keeps any dropout/batch-norm layers in inference mode
    return {'output': model(x, training=False)}

# Clients must request signature_name='predict'; add a 'serving_default' key too
# if they rely on TF Serving's default signature
tf.saved_model.save(model, 'exported_model', signatures={'predict': predict_fn})
print('Exported to exported_model/')

Output

Exported to exported_model/

⚠ Production Trap:

Don't rely on the default signature that model.save() infers. It is derived from the model's recorded input spec, so its input names and shapes may not match what your serving clients send. Define explicit signatures with tf.function to avoid silent shape mismatches at scale.

🎯 Key Takeaway

Use tf.saved_model.save() with explicit signatures for production-ready, platform-independent model serving.

Simple Exporting with model.export() for One-Step Deployment

Keras now offers model.export(filepath) as a one-line replacement for the verbose tf.saved_model.save() workflow. Introduced in TensorFlow 2.12+, this method traces the model's call function using the model's known input shape (from keras.Input, or the shape it was built with) and writes a complete SavedModel directory, with no tf.function decorators needed. Why does this matter? Because traditional exporting forces you to write signature-building boilerplate, which breaks fast when model layers change. model.export() infers everything from the model's existing call graph. The downside: it exports a single inference endpoint (named serve, also registered as the serving_default signature). If you need multiple signatures (e.g., predict, classify, encode), stick with tf.saved_model.save(). Use model.export() for rapid prototyping-to-production pipelines where the input shape is stable. The exported folder is immediately consumable by TensorFlow Serving or any SavedModel loader, and no Python interpreter is required on the inference side.

simple_export.pyPYTHON

# io.thecodeforge — ml-ai tutorial

import numpy as np
import tensorflow as tf

# include_top=False is required for pooling='avg' to apply; the output is a 1280-d feature vector
model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet', pooling='avg'
)

# Export in one line: no signatures, no tf.function
model.export('mobilenet_export')

# Verify: the exported artifact exposes a `serve` endpoint
loaded = tf.saved_model.load('mobilenet_export')
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)
result = loaded.serve(dummy)
print('Output shape:', result.shape)

Output

Output shape: (1, 1280)

⚠ Production Trap:

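model.export() takes its input signature from the model's known input shape. If the model was never built (for example, a subclassed model that has not been called yet), there is nothing to infer from; if it was built with a fixed batch size, that shape can end up in the signature. Build the model from keras.Input (or build it with a None batch dimension) before exporting, then confirm the exported shapes with saved_model_cli show --dir <export_dir> --all.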

🎯 Key Takeaway

model.export() cuts boilerplate for single-signature deployments, provided the model's input shape is defined before you export.
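For the multi-signature case mentioned above, here is a minimal tf.saved_model.save() sketch assuming TF 2.x. TextModel is a hypothetical tf.Module with two tf.function entry points, encode and predict; each is registered under its own signature key, with predict registered as serving_default so clients that omit signature_name still get predictions.

```python
import os
import tempfile

import tensorflow as tf


class TextModel(tf.Module):
    """Hypothetical model exposing two entry points: 'encode' and 'predict'."""
    def __init__(self):
        super().__init__()
        self.proj = tf.Variable(tf.random.normal([8, 4]), name="proj")
        self.head = tf.Variable(tf.random.normal([4, 2]), name="head")

    @tf.function(input_signature=[tf.TensorSpec([None, 8], tf.float32, name="features")])
    def encode(self, features):
        return {"embedding": tf.nn.relu(tf.matmul(features, self.proj))}

    @tf.function(input_signature=[tf.TensorSpec([None, 8], tf.float32, name="features")])
    def predict(self, features):
        embedding = self.encode(features)["embedding"]
        return {"probabilities": tf.nn.softmax(tf.matmul(embedding, self.head))}


export_dir = os.path.join(tempfile.mkdtemp(), "text_model", "1")
module = TextModel()
tf.saved_model.save(
    module,
    export_dir,
    signatures={
        "serving_default": module.predict,  # used when clients omit signature_name
        "encode": module.encode,
    },
)

loaded = tf.saved_model.load(export_dir)
print(sorted(loaded.signatures.keys()))  # ['encode', 'serving_default']
```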

Options for Saving and Loading

TensorFlow provides several save/load options that affect performance, portability, and debugging. In TF 2.15 and earlier, the save_format parameter of model.save() chooses between the Keras H5 format (a single file, read and written through h5py) and the SavedModel directory (more flexible, required for TensorFlow Serving); on TF 2.16+ (Keras 3), save_format is gone, the file extension decides (.keras or .h5), and model.export() writes a SavedModel. For checkpoint-based saving, tf.train.Checkpoint with CheckpointManager offers configurable retention via max_to_keep and keep_checkpoint_every_n_hours. You can also pass save_weights_only=True to ModelCheckpoint to store only weight tensors, reducing disk usage and speeding up iteration.

When loading, the custom_objects dictionary maps custom layers or loss functions by name during deserialization, which is critical when reusing model architectures across projects. For production, tf.saved_model.save() accepts signatures to expose specific serving endpoints. Choosing wrongly can silently break your pipeline: SavedModel is the default for model.save() in TF 2.x Keras 2, while H5 is simpler for research. Know the trade-offs: SavedModel is what serving tools consume, while H5 files can be inspected with generic HDF5 tools such as h5py.

save_load_options.pyPYTHON

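# io.thecodeforge — ml-ai tutorial
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])

# 1) H5 format for portability (save_format exists only in TF 2.15 and earlier;
#    on Keras 3 the .h5 extension alone selects H5)
model.save('model.h5', save_format='h5')

# 2) SavedModel for production serving (TF 2.15 and earlier; use model.export() on Keras 3)
model.save('saved_model_dir/')

# 3) Checkpoints: keep the 10 most recent, plus one preserved every 2 hours of training
checkpoint = tf.train.Checkpoint(model=model)
manager = tf.train.CheckpointManager(
    checkpoint, directory='./ckpts', max_to_keep=10,
    keep_checkpoint_every_n_hours=2)
save_path = manager.save()
print(f"Checkpoint saved to {save_path}")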

Output

Checkpoint saved to ./ckpts/ckpt-1

⚠ Production Trap:

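If you use custom layers (e.g., Attention) and save in H5, loading requires explicit custom_objects. A SavedModel stores the traced graph, so tf.saved_model.load() can run it without your Python classes, though reviving it as a Keras model with load_model() still needs them to get your original layer classes back.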
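A minimal sketch of the H5 + custom_objects round trip, assuming tf.keras on TF 2.x (Keras 3 still reads and writes .h5 as a legacy format). Scale is a hypothetical custom layer; the parts that matter are get_config(), which lets the saved config recreate the layer's constructor arguments, and the custom_objects mapping from the stored class name to the Python class at load time.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


class Scale(tf.keras.layers.Layer):
    """Hypothetical custom layer: multiplies its input by a fixed factor."""
    def __init__(self, factor=2.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        # Without this, the saved config cannot recreate the constructor arguments.
        config = super().get_config()
        config.update({"factor": self.factor})
        return config


model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(3),
    Scale(factor=3.0),
])
path = os.path.join(tempfile.mkdtemp(), "custom.h5")
model.save(path)

# The class name stored in the file ("Scale") is resolved through custom_objects.
restored = tf.keras.models.load_model(path, custom_objects={"Scale": Scale})
x = np.ones((2, 3), dtype="float32")
print(np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0)))  # True
```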

🎯 Key Takeaway

Match save format to deployment target: H5 for lightweight exchange, SavedModel for serving pipelines.

Installs and Imports

Before any save/load operation, ensure your environment includes the correct TensorFlow version and associated libraries. Install TensorFlow with pip install tensorflow==2.15.0 (or your target version) to avoid breaking changes in serialization APIs; 2.15 is the last release where tf.keras is Keras 2 by default. For model loading in Java (covered in the Java section above), you need the TensorFlow Java bindings: add org.tensorflow:tensorflow-core-platform to your Maven or Gradle dependencies. Python imports are minimal: import tensorflow as tf gives you tf.saved_model, tf.train.Checkpoint, and tf.keras.Model.save. If you work with HDF5, an explicit import h5py is optional but helps debug file corruption. For audit logging, import json and datetime to timestamp metadata. Avoid importing private modules like tensorflow.python.keras; use tf.keras directly. Also import os for path handling and shutil if you need to copy SavedModel directories. Pro tip: always pin your TensorFlow version to avoid silent format shifts between minor releases.

imports_setup.pyPYTHON

# io.thecodeforge — ml-ai tutorial
import tensorflow as tf
import h5py
import os
import json
from datetime import datetime

# Verify version (>=2.12 for best save compatibility)
print(f"TensorFlow version: {tf.__version__}")

# Optional: check if HDF5 is available
with h5py.File('test.h5', 'w') as f:
    f.create_dataset('x', data=[1, 2, 3])
os.remove('test.h5')

Output

TensorFlow version: 2.15.0

⚠ Production Trap:

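Mixing standalone keras imports with tf.keras in one project can mean mixing two different Keras implementations (from TF 2.16 on, tf.keras itself resolves to Keras 3), which causes model graph and deserialization mismatches when loading across machines. Use tf.keras consistently and pin the TensorFlow version behind it.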

🎯 Key Takeaway

Pin TF version and import tf.keras explicitly to guarantee cross-environment model compatibility.
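A small sketch of enforcing that pin at load time, standard library only. It assumes you recorded the training-time tf.__version__ alongside the artifact, and it applies a deliberately strict policy of requiring the same major.minor release; relax it if your own compatibility testing allows.

```python
def major_minor(version):
    """'2.15.1' -> (2, 15); suffixes such as 'rc0' only affect the patch component."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def require_matching_tf(trained_with, running):
    """Raise unless the serving process runs the same TF major.minor as training."""
    if major_minor(trained_with) != major_minor(running):
        raise RuntimeError(
            f"Artifact was trained with TF {trained_with}, but this process runs TF {running}. "
            "Pin the serving environment to the training release."
        )


require_matching_tf("2.15.0", "2.15.1")  # same minor release: passes silently
```

In a loader you would call require_matching_tf(recorded_version, tf.__version__) before tf.saved_model.load(), so a mismatched runtime fails at startup instead of serving.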

● Production incident · POST-MORTEM · severity: high

Six Months of Inference on the Wrong Model Version

Symptom

After a model update deployment, A/B testing showed no improvement despite the new model having higher offline val_accuracy. After six months, a sudden accuracy cliff triggered investigation.

Assumption

The team assumed model.save('model.h5') was overwriting the previous production model. They verified the file modification timestamp — it was updated — but never verified the loaded model's version identifier.

Root cause

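TF Serving was configured with the base path /models/classifier/ and was serving version 1 from /models/classifier/1/. The new model was saved as an H5 file directly into /models/classifier/ instead of as a SavedModel in a new numbered subdirectory such as /models/classifier/2/. TF Serving only discovers versions as numbered SavedModel subdirectories of the base path (it cannot load H5 files at all), so it kept serving version 1.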

Fix

Always export in SavedModel format to an explicit, numbered version directory: model.save('/models/classifier/2/') with tf.keras up to TF 2.15, or model.export('/models/classifier/2/') with Keras 3 (TF 2.16+). Record a model version identifier with every artifact, and after each deployment confirm which version is loaded via the TF Serving model status endpoint: GET /v1/models/classifier, then check model_version_status.
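The versioned-directory rule can be automated. The sketch below uses hypothetical helpers (next_version_dir is not a TensorFlow API) that mirror how TF Serving's default policy treats numeric subdirectories of the base path, serving the highest one, and returns the next free version path:

```python
import os


def existing_versions(base_path: str) -> list:
    """Numeric version subdirectories under base_path, sorted ascending."""
    if not os.path.isdir(base_path):
        return []
    return sorted(
        int(name)
        for name in os.listdir(base_path)
        if name.isdigit() and os.path.isdir(os.path.join(base_path, name))
    )


def next_version_dir(base_path: str) -> str:
    """Path for the next version: one above the highest existing number (starts at 1)."""
    return os.path.join(base_path, str(max(existing_versions(base_path), default=0) + 1))


# Usage (sketch):
# export_dir = next_version_dir("/models/classifier")
# model.save(export_dir)    # tf.keras up to TF 2.15: no file extension -> SavedModel
# model.export(export_dir)  # Keras 3 (TF 2.16+): writes a SavedModel for serving
```

Stray files such as a flat model.h5 in the base path are ignored, which is exactly why the incident's H5 save never became a served version.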

Key lesson

Never save models to flat file paths for production — always use versioned SavedModel directories
Verify the served model version via the TF Serving model status endpoint (GET /v1/models/classifier) after every deployment
Write a post-deployment test that sends a known input and verifies the output matches the expected model version
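For the version check, a small client for the model status endpoint can gate a rollout. This is a sketch assuming the REST API on TF Serving's default port 8501 and a model named classifier; the helper names are hypothetical, and the parsing follows the model_version_status list that the endpoint returns:

```python
import json
import urllib.request


def fetch_model_status(host: str, model_name: str) -> dict:
    """GET the TF Serving model status endpoint, e.g. host='localhost:8501'."""
    url = f"http://{host}/v1/models/{model_name}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def available_versions(status: dict) -> list:
    """Versions whose state is AVAILABLE (TF Serving reports versions as strings)."""
    return [
        entry["version"]
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


def assert_serving_version(status: dict, expected: str) -> None:
    """Raise unless the expected version is among the AVAILABLE ones."""
    versions = available_versions(status)
    if expected not in versions:
        raise RuntimeError(f"expected version {expected} AVAILABLE, serving {versions}")


# Usage (sketch):
# assert_serving_version(fetch_model_status("localhost:8501", "classifier"), "2")
```

In the incident above, this check would have failed on the first deployment, because only version "1" was ever AVAILABLE.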

Production debug guide · Diagnosing SavedModel, checkpoint, and serving failures · 4 entries

Symptom · 01

OSError when loading a SavedModel — 'SavedModel file does not exist'


Fix

Verify the path points to the SavedModel directory itself (the one containing saved_model.pb), not a parent directory. Run ls -la model_dir/ and check for saved_model.pb and a variables/ subdirectory. H5 paths and SavedModel paths are not interchangeable.
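The same check can run in Python before a load or deployment. TensorFlow's own tf.saved_model.contains_saved_model(path) answers the yes/no question; the hypothetical helpers below go a step further and flag the common wrong-path cases (an H5 file, or a parent directory holding numbered versions):

```python
import os


def looks_like_saved_model(path: str) -> bool:
    """True if path holds saved_model.pb (or .pbtxt) plus a variables/ subdirectory."""
    has_graph = any(
        os.path.isfile(os.path.join(path, name))
        for name in ("saved_model.pb", "saved_model.pbtxt")
    )
    return has_graph and os.path.isdir(os.path.join(path, "variables"))


def describe_model_path(path: str) -> str:
    """Classify a path before handing it to tf.saved_model.load or TF Serving."""
    if os.path.isfile(path) and path.endswith((".h5", ".keras")):
        return "single-file Keras model, not a SavedModel directory"
    if looks_like_saved_model(path):
        return "SavedModel directory"
    if os.path.isdir(path):
        versions = sorted(
            name for name in os.listdir(path)
            if looks_like_saved_model(os.path.join(path, name))
        )
        if versions:
            return f"parent directory; SavedModels found in {versions}"
    return "no SavedModel found"
```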

Symptom · 02

Model loads in Python but fails in TF Serving with 'Input shape mismatch'


Fix

Inspect the serving signature: saved_model_cli show --dir model_dir --tag_set serve --signature_def serving_default. The input key and shape must match exactly what Serving receives. Rebuild the model with tf.keras.Input to ensure the signature is explicit.
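To make the serving signature explicit regardless of how the model was built, you can export with a tf.function that declares its input_signature. This sketch swaps the Keras model for a minimal tf.Module (the Scorer class, the features key, and the 4-feature shape are assumptions) so it behaves the same on Keras 2 and Keras 3:

```python
import tempfile

import tensorflow as tf


class Scorer(tf.Module):
    """Stand-in for a trained model: score = features @ w + b."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.ones([4, 1]), name="w")
        self.b = tf.Variable(tf.zeros([1]), name="b")

    # The input_signature fixes the serving input key ("features"), shape, and dtype.
    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32, name="features")])
    def serve(self, features):
        return {"score": tf.matmul(features, self.w) + self.b}


scorer = Scorer()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(scorer, export_dir, signatures={"serving_default": scorer.serve})

# Reload and inspect what a serving client must send and will receive.
loaded = tf.saved_model.load(export_dir)
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)
print(sig.structured_outputs)
```

Running saved_model_cli show on the exported directory should then list features as the serving_default input key, with shape (-1, 4) and dtype DT_FLOAT.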

Symptom · 03

Checkpoint restores but training metrics restart from epoch 0


Fix

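You restored the weights but not the optimizer state, and Keras does not store the epoch counter in a saved model: model.fit() always counts from epoch 0 unless you pass initial_epoch. Load a full-model save rather than a weights-only checkpoint (model = tf.keras.models.load_model(path)), then resume with model.fit(..., initial_epoch=last_completed_epoch, epochs=total_epochs). With weights-only checkpoints, the optimizer's internal state (for example Adam's moment estimates) starts from scratch, so expect a short re-warm-up before the loss settles.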
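A minimal resume sketch, assuming a compiled model with a built-in optimizer and the native .keras format (assumed available, TF 2.12+): train, save the full model, reload it, and continue with initial_epoch so the epoch count carries on.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")
ckpt_path = os.path.join(tempfile.mkdtemp(), "resume.keras")

# First session: 2 epochs, then a full save (architecture + weights + optimizer state).
model = build_model()
model.fit(x, y, epochs=2, verbose=0)
model.save(ckpt_path)

# Later session: reload and continue from epoch 2 up to epoch 4.
restored = tf.keras.models.load_model(ckpt_path)
history = restored.fit(x, y, initial_epoch=2, epochs=4, verbose=0)
print(history.epoch)  # [2, 3]
```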

Symptom · 04

model.load_weights() raises 'You are loading a weight file containing 5 layers into a model with 4 layers'


Fix

Architecture mismatch between checkpoint and current model. The number of layers has changed since the checkpoint was saved. Ensure the model architecture is identical before calling load_weights. Consider saving the full model (architecture + weights) instead of weights-only checkpoints.
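One way to catch the mismatch before load_weights() fails is to store the expected weight shapes next to the checkpoint and compare them first. A sketch, assuming the .weights.h5 format (accepted by tf.keras in TF 2.15 and required by Keras 3); the .shapes.json sidecar file and the helper functions are hypothetical:

```python
import json
import os
import tempfile

import tensorflow as tf


def build_model(hidden_units=8):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(hidden_units, activation="relu"),
        tf.keras.layers.Dense(1),
    ])


def weight_shapes(model):
    """Shapes of every weight tensor, layer by layer."""
    return [list(w.shape) for w in model.get_weights()]


def save_weights_with_shapes(model, weights_path):
    model.save_weights(weights_path)
    with open(weights_path + ".shapes.json", "w") as f:  # hypothetical sidecar file
        json.dump(weight_shapes(model), f)


def load_weights_checked(model, weights_path):
    with open(weights_path + ".shapes.json") as f:
        expected = json.load(f)
    actual = weight_shapes(model)
    if actual != expected:
        raise ValueError(f"architecture mismatch: checkpoint {expected}, model {actual}")
    model.load_weights(weights_path)


weights_path = os.path.join(tempfile.mkdtemp(), "model.weights.h5")
save_weights_with_shapes(build_model(), weights_path)
load_weights_checked(build_model(), weights_path)        # same architecture: loads
# load_weights_checked(build_model(16), weights_path)    # raises ValueError before loading
```

Keeping a single build_model() function under version control is what makes the weights-only path workable at all.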

★ SavedModel Quick Inspection Commands · Fast commands for verifying model artifacts before serving

Need to verify what a SavedModel expects as input

Immediate action

Inspect serving signatures

Commands

saved_model_cli show --dir /path/to/model --all

saved_model_cli run --dir /path/to/model --tag_set serve --signature_def serving_default --input_exprs 'input_1=np.ones((1,224,224,3))'

Fix now

Match the input key and dtype exactly in your serving client code
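On the client side, the input key reported by saved_model_cli goes straight into the request body. A sketch of a TF Serving REST predict request in the columnar inputs format, assuming a model named classifier, the input_1 key from the command above, and the default REST port 8501; build_predict_request is a hypothetical helper:

```python
import json
import urllib.request


def build_predict_request(host, model_name, input_key, batch, signature="serving_default"):
    """TF Serving REST predict request using the columnar "inputs" format."""
    body = {"signature_name": signature, "inputs": {input_key: batch}}
    return urllib.request.Request(
        url=f"http://{host}/v1/models/{model_name}:predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One 224x224 RGB image of ones, matching the saved_model_cli example above.
image = [[[[1.0] * 3] * 224] * 224]
req = build_predict_request("localhost:8501", "classifier", "input_1", image)
# with urllib.request.urlopen(req, timeout=10) as resp:
#     predictions = json.load(resp)["outputs"]
```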

TF Serving not picking up a new model version

TensorFlow Model Saving Formats

Method	What is saved?	Best Use Case
SavedModel	Architecture (as a TF graph), Weights, Optimizer state, Assets	Production, TF Serving, Java/C++ Loading
H5 File	Architecture, Weights, Optimizer state	Simple sharing as a single portable file
Checkpoints	Weights (plus optimizer state when tracked via tf.train.Checkpoint)	Saving progress during long training sessions
JSON (to_json)	Architecture only	Sharing the structure without any weights (to_yaml was removed in TF 2.6)
TensorFlow Lite	Optimized graph, weights (optionally quantized)	Mobile (Android/iOS) and Edge Deployment
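The JSON row is the easiest to see in code: to_json() captures only the layer graph, so a model rebuilt from it starts with fresh weights until you copy them in. A minimal sketch with an assumed two-layer model:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

arch_json = model.to_json()                         # layer graph only, as a JSON string
clone = tf.keras.models.model_from_json(arch_json)  # same layers, freshly initialized weights

# Architecture matches, but the weights are independent until copied over.
clone.set_weights(model.get_weights())
x = np.ones((2, 4), dtype="float32")
print(np.allclose(model.predict(x, verbose=0), clone.predict(x, verbose=0)))  # True
```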

⚙ Quick Reference

12 commands from this guide

File	Command / Code	Purpose
save_and_load.py	from tensorflow.keras import models, layers	1. Saving the Entire Model (SavedModel vs. H5)
checkpointing.py	checkpoint_path = "training_checkpoints/forge_model_{epoch:02d}"	2. Using Checkpoints during Training
iothecodeforgemlModelLoader.java	public class ModelLoader {	3. Implementation
iothecodeforgedbmodel_audit.sql	INSERT INTO io.thecodeforge.model_artifacts (	4. Audit Persistence
Dockerfile	FROM tensorflow/serving:latest	5. Packaging for Deployment
checkpoint_backup.py	from tensorflow import keras	Save Your Sanity
checkpoint_discipline.py	from tensorflow import keras	Checkpoint Every 10 Minutes
checkpoint_weights.py	model = tf.keras.Sequential([	Why save_weights() Beats Full Model Dumps for Iteration Speed
export_model.py	model = tf.keras.Sequential([	Exporting Models with tf.saved_model.save() for Production
simple_export.py	model = tf.keras.applications.MobileNetV2(	Simple Exporting with model.export() for One-Step Deployment
save_load_options.py	model = tf.keras.Sequential([tf.keras.layers.Dense(1)])	Options for Saving and Loading
imports_setup.py	from datetime import datetime	Installs and Imports

Key takeaways

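SavedModel is TensorFlow's multi-file, language-agnostic production format and the default model.save() format in TF 2.x up to 2.15 (with Keras 3 in TF 2.16+, model.save() writes the native .keras format and SavedModel export moves to model.export()).

H5 is a legacy single-file format that is popular for quick research sharing but cannot be served by TF Serving or loaded through the Java/C++ SavedModel APIs.

Callbacks such as tf.keras.callbacks.ModelCheckpoint create automatic save points during the training loop, protecting against hardware failure.

Loading a full-model save restores not just the weights but the entire training state, including the optimizer and loss function; weights-only checkpoints restore the weights alone.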

Always log your model metadata in a central database to ensure long-term model governance.
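A minimal sketch of the metadata record the last takeaway calls for, built with json and datetime as in the setup section. The field names and the write_model_metadata helper are assumptions; the same dict could become the row you insert into your audit table:

```python
import json
import os
from datetime import datetime, timezone


def write_model_metadata(metadata_path, model_name, version, export_dir, tf_version, metrics):
    """Write a JSON audit record for one exported model version and return it."""
    record = {
        "model_name": model_name,
        "version": version,
        "export_dir": os.path.abspath(export_dir),
        "tf_version": tf_version,
        "metrics": metrics,
        "saved_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
    }
    with open(metadata_path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return record


# Usage (sketch, hypothetical paths):
# write_model_metadata("/audit/classifier-2.json", "classifier", 2,
#                      "/models/classifier/2", tf.__version__, {"val_accuracy": ...})
```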

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain the difference between model.save() and model.save_weights() in ...

Q02 · SENIOR

Why is the SavedModel format preferred over H5 for cross-platform deploy...

Q03 · SENIOR

How do you implement a custom callback to save a model only when a speci...

Q04 · SENIOR

What is the role of the 'saved_model.pb' file inside a SavedModel direct...

Q05 · SENIOR

How does the 'CheckpointManager' class differ from the 'ModelCheckpoint'...

Q01 of 05 · SENIOR

Explain the difference between model.save() and model.save_weights() in terms of memory and future utility.

ANSWER

model.save() serializes the complete model: architecture (layer types and configurations), trained weights, optimizer state (learning rate, momentum, running averages), and compilation settings (loss function, metrics). The output is self-contained: you can load it and immediately run predictions or resume training without any additional code. model.save_weights() serializes only the weight tensors (as a TensorFlow checkpoint by default in tf.keras up to TF 2.15; Keras 3 requires a .weights.h5 file). To use these weights, you must first reconstruct the identical model architecture in code, then call model.load_weights(). The file is smaller on disk because optimizer state is not stored (Adam alone keeps two extra tensors per weight), but it requires the original model-building code to be available and identical. Use save() for production artifacts; use save_weights() for experiment checkpointing where you version the architecture code in git.
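The trade-off can be checked directly. This sketch, assuming a small compiled Adam model and the .keras / .weights.h5 formats, saves both ways, reloads each, and compares file sizes and predictions:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(16,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_model()
x = np.random.rand(128, 16).astype("float32")
model.fit(x, np.random.rand(128, 1).astype("float32"), epochs=1, verbose=0)

out_dir = tempfile.mkdtemp()
full_path = os.path.join(out_dir, "full.keras")
weights_path = os.path.join(out_dir, "only.weights.h5")
model.save(full_path)             # architecture + weights + optimizer state + compile config
model.save_weights(weights_path)  # weight tensors only

reloaded = tf.keras.models.load_model(full_path)  # no model code needed

rebuilt = build_model()           # weights-only: the identical architecture must exist first
rebuilt.load_weights(weights_path)

print(os.path.getsize(full_path), os.path.getsize(weights_path))
```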


