TF Model Save/Load — Avoid Wrong Version in Serving
A flat H5 save path caused TF Serving to serve the wrong model for six months.
- SavedModel (directory format) is the TF 2.x production standard — saves architecture, weights, optimizer state, and serving signature
- H5 (single file) is a legacy format — convenient for sharing but lacks TF Serving compatibility and cross-language loading
- Checkpoints save weights only — use ModelCheckpoint callback with save_best_only=True to guard against overfitting regressions
- JSON saves architecture only (YAML export was removed in TF 2.6): useful for version controlling model design separately from weights
- TFLite (.tflite) is the mobile/edge format — converted from SavedModel, not saved directly
- Biggest mistake: passing an H5 file where a SavedModel directory is expected, or the reverse. H5 is a single file, while a SavedModel is a directory holding saved_model.pb plus variables/ and assets/ subfolders
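One way to guard against the H5/SavedModel mix-up is to inspect the path before handing it to a loader. The sketch below is a minimal illustration using only the standard library; the function name detect_model_format is hypothetical and not part of TensorFlow. It assumes a SavedModel directory has saved_model.pb (or saved_model.pbtxt) at its top level and that an H5 file begins with the standard 8-byte HDF5 signature.

```python
from pathlib import Path

# HDF5 files normally begin with this 8-byte format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_model_format(path):
    """Return 'savedmodel', 'h5', or 'unknown' for a path on disk.

    Hypothetical helper for illustration; not a TensorFlow API.
    """
    p = Path(path)

    # A SavedModel is a directory with saved_model.pb (or .pbtxt) inside it.
    if p.is_dir():
        has_graph = (p / "saved_model.pb").is_file() or (
            p / "saved_model.pbtxt"
        ).is_file()
        return "savedmodel" if has_graph else "unknown"

    # An H5 model is a single file. Check the signature bytes instead of
    # trusting the file extension.
    if p.is_file():
        with p.open("rb") as fh:
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "h5"

    return "unknown"
```

Calling this before loading lets a deployment script fail fast with a clear message instead of surfacing a confusing loader error later.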
Imagine you're playing a massive video game that takes 100 hours to beat. You wouldn't want to leave your console on for weeks; you use a 'Save Point.' Saving a model is exactly like that. It freezes the model's 'brain' (its weights) and its 'skeleton' (its architecture) into a file on your hard drive, so you can pick up exactly where you left off or send that file to someone else to run on their computer.
Training a deep learning model can take hours, days, or even weeks. Without a robust saving strategy, a simple power outage or a crashed script could wipe out thousands of dollars in compute time.
TensorFlow provides two primary ways to save: saving the entire model (architecture + weights) or saving just the weights (checkpoints). Understanding when to use the standard TensorFlow 'SavedModel' format versus the older 'H5' format is critical for moving models from research into production environments like TensorFlow Serving or TFLite. At TheCodeForge, we treat model serialization as a core DevOps task, ensuring that every training run is reproducible and every artifact is versioned.
1. Saving the Entire Model (SavedModel vs. H5)
The 'SavedModel' is the recommended format for TensorFlow 2.x. It saves the model architecture, weights, and the compile configuration (optimizer, loss, and metrics) in a directory. Alternatively, the H5 format (legacy Keras) stores everything in a single file, which is convenient for simple sharing but lacks the serving signatures that TensorFlow Serving and other non-Python runtimes rely on.
2. Using Checkpoints during Training
A 'ModelCheckpoint' callback allows you to save your model automatically at the end of every epoch. This is a lifesaver for long training runs. It ensures that if the process is interrupted, you only lose a single epoch of work rather than the entire session.
3. Implementation: Java Model Loader
In many enterprise environments, models are trained in Python but executed in Java-based backend services. TensorFlow's SavedModel format is specifically designed to be cross-language compatible.
4. Audit Persistence: Logging Artifact Metadata
We don't just save files; we track them. This SQL pattern allows us to link a specific saved model file to the exact training metrics it produced.
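As a rough sketch of what such an audit table could look like, the example below uses Python's built-in sqlite3 module. The table name model_artifacts, its columns, and the helper functions are all assumptions made for illustration; the original schema is not shown here, and a production setup would likely target a shared database rather than SQLite.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per (model_name, version) artifact.
SCHEMA = """
CREATE TABLE IF NOT EXISTS model_artifacts (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    model_name    TEXT NOT NULL,
    version       INTEGER NOT NULL,
    artifact_path TEXT NOT NULL,
    sha256        TEXT NOT NULL,
    val_loss      REAL,
    val_accuracy  REAL,
    created_at    TEXT NOT NULL,
    UNIQUE (model_name, version)
)
"""


def file_sha256(path):
    """Hash one file (for example saved_model.pb) in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_artifact(conn, model_name, version, artifact_path, sha256,
                 val_loss, val_accuracy):
    """Record which artifact a training run produced, with its metrics."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO model_artifacts "
            "(model_name, version, artifact_path, sha256, "
            " val_loss, val_accuracy, created_at) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (model_name, version, artifact_path, sha256, val_loss,
             val_accuracy, datetime.now(timezone.utc).isoformat()),
        )


def lookup_artifact(conn, model_name, version):
    """Return (artifact_path, sha256, val_loss, val_accuracy) or None."""
    return conn.execute(
        "SELECT artifact_path, sha256, val_loss, val_accuracy "
        "FROM model_artifacts WHERE model_name = ? AND version = ?",
        (model_name, version),
    ).fetchone()
```

The UNIQUE constraint means a second attempt to register the same model name and version raises sqlite3.IntegrityError, which makes silent overwrites of an existing version visible.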
5. Packaging for Deployment
To serve the model, we use a Docker container that includes TensorFlow Serving. This allows the model to be accessed via a REST or gRPC API.
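TensorFlow Serving expects the model base path to contain one numeric subdirectory per version (for example my_model/1/, my_model/2/) and, by default, serves the highest-numbered one. The sketch below shows one way to pick the next version directory before exporting. The helper names are hypothetical and use only the standard library.

```python
import re
from pathlib import Path


def existing_versions(base_dir):
    """Return the sorted numeric version directories under base_dir."""
    base = Path(base_dir)
    if not base.is_dir():
        return []
    return sorted(
        int(child.name)
        for child in base.iterdir()
        # Only directories with purely numeric names count as versions.
        if child.is_dir() and re.fullmatch(r"[0-9]+", child.name)
    )


def next_version_dir(base_dir):
    """Return base_dir/<N+1>, where N is the highest existing version.

    The directory is not created here; the export call is expected to
    create it when it writes the SavedModel.
    """
    versions = existing_versions(base_dir)
    next_version = versions[-1] + 1 if versions else 1
    return Path(base_dir) / str(next_version)
```

The returned path is what you would pass to your SavedModel export call, so every training run lands in a fresh version directory instead of overwriting a flat path.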
Six Months of Inference on the Wrong Model Version
- Never save models to flat file paths for production — always use versioned SavedModel directories
- Verify the serving model version via the TF Serving metadata endpoint after every deployment
- Write a post-deployment test that sends a known input and verifies the output matches the expected model version
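The checks above can be sketched as a small post-deployment script. This example makes several assumptions: it queries the TF Serving REST model status endpoint (GET /v1/models/<name>) on the default REST port 8501 rather than the /metadata endpoint, it assumes the response lists versions under model_version_status with a state field, and it assumes predictions come back as rows of floats under a predictions key. All function names are hypothetical.

```python
import json
import math
import urllib.request


def status_url(host, model_name, port=8501):
    """Build the model status URL (8501 is the default REST port)."""
    return f"http://{host}:{port}/v1/models/{model_name}"


def fetch_status(host, model_name, port=8501, timeout=5.0):
    """Fetch and decode the model status JSON from a running server."""
    url = status_url(host, model_name, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def available_versions(status_payload):
    """Return the versions the server reports as AVAILABLE."""
    statuses = status_payload.get("model_version_status", [])
    return sorted(
        int(s["version"]) for s in statuses if s.get("state") == "AVAILABLE"
    )


def check_serving_version(status_payload, expected_version):
    """Raise if the expected version is not currently being served."""
    versions = available_versions(status_payload)
    if expected_version not in versions:
        raise RuntimeError(
            f"expected version {expected_version}, server has {versions}"
        )
    return True


def predictions_match(response_payload, expected, abs_tol=1e-5):
    """Compare a predict response against golden rows of floats."""
    got = response_payload.get("predictions", [])
    if len(got) != len(expected):
        return False
    for got_row, want_row in zip(got, expected):
        if len(got_row) != len(want_row):
            return False
        if not all(
            math.isclose(g, w, abs_tol=abs_tol)
            for g, w in zip(got_row, want_row)
        ):
            return False
    return True
```

In a deployment pipeline, you would call fetch_status right after rollout, pass the result to check_serving_version, then send a known input to the predict endpoint and compare the response with predictions_match against outputs recorded at training time.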
Key takeaways
Common mistakes to avoid
Trying to load an H5 file using a SavedModel directory path (or vice versa)
Not saving the optimizer state before resuming training
Use model.save() (not model.save_weights()) to include the optimizer state. When resuming, load the full model: model = tf.keras.models.load_model(checkpoint_path). The optimizer will continue from its saved state.
Using hardcoded absolute paths in save/load calls
Mismatching Keras versions between training and loading environments
Interview Questions on This Topic
Explain the difference between model.save() and model.save_weights() in terms of memory and future utility.
model.save() serializes the full model: architecture, weights, and optimizer state, so it can be reloaded without the original code. model.save_weights() serializes only the weight matrices as a TensorFlow checkpoint. To use these weights, you must first reconstruct the identical model architecture in code, then call model.load_weights(). The result is smaller on disk because it omits the optimizer state and graph definition, but it requires the original model-building code to be available and identical. Use save() for production artifacts; use save_weights() for experiment checkpointing where you version the architecture code in git.
Frequently Asked Questions