Beginner 3 min · March 09, 2026

Introduction to Keras

Keras — Unpinned Version Causes 3-5% Accuracy Variation

Q: What is Introduction to Keras in simple terms?

Keras is a high-level API for building and training neural networks. It packages layers, optimizers, and training loops behind a small, consistent interface, so you describe the model and let the backend handle the math.

Q: Is Keras just a wrapper for TensorFlow?

Since TensorFlow 2.0, Keras is the official high-level API of TensorFlow. It isn't just a separate wrapper anymore; it's the primary way developers interact with the TensorFlow engine.

Q: Does Keras support GPU acceleration automatically?

Yes. If TensorFlow is installed with GPU support and your hardware is configured correctly, Keras will automatically offload heavy tensor computations to your GPU without requiring any changes to your code.

Q: What is the difference between model.fit() and model.predict()?

This is a frequent entry-level interview question. 'fit()' is the training phase where the model learns from historical data. 'predict()' is the inference phase where the trained model provides output for new, unseen data.

Q: How do I save and load a Keras model?

Use model.save('model.keras') to save the entire model (architecture, weights, optimizer state). Load with tf.keras.models.load_model('model.keras'). For inference only, you can save weights only via model.save_weights('model.weights.h5').

Q: What should I do if training is very slow?

First, verify that TensorFlow sees the GPU via tf.config.list_physical_devices('GPU'). Then profile the data pipeline: make sure it uses cache, batch, and prefetch, for example dataset = dataset.prefetch(tf.data.AUTOTUNE). If GPU memory is saturated, reduce the batch size.


Naren Founder & Principal Engineer

20+ years shipping production ML systems and the infrastructure behind them. Lessons pulled from things that broke in production.

✓ Production

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 20 min

✓Basic programming fundamentals
✓A computer with internet access
✓Willingness to follow along with examples


⚡Quick Answer

Keras is a high-level deep learning API built on TensorFlow, designed for fast experimentation
Provides Sequential, Functional, and Model subclassing APIs for different model architectures
Pre-built layers, optimizers, loss functions, and metrics remove most of the boilerplate of a hand-written network
Training loop is abstracted via model.fit() but can be customized with tf.GradientTape
Production pitfall: pin the Keras/TensorFlow version, because even small version changes can break reproducibility of model results


Plain-English First

Think of Keras as a powerful tool in your developer toolkit. Once you understand what it does and when to reach for it, everything clicks into place. Imagine you're building a custom house. Without Keras, you'd have to personally forge every nail, saw every plank, and mix the concrete from raw chemicals (that's low-level TensorFlow or NumPy). Keras is like having high-quality, pre-fabricated walls and smart-home modules. You still design the architecture and layout, but you spend your time on the vision rather than the tedious manual labor.

Keras is a foundational tool in ML / AI development. Originally developed by François Chollet, it has become the official high-level API for TensorFlow, designed to enable fast experimentation by being user-friendly, modular, and extensible.

In this guide we'll break down exactly what Keras is, why it was designed this way, and how to use it correctly in real projects. We will explore how Keras abstracts complex mathematical operations into manageable 'Layers' and 'Models'.

By the end you'll have both the conceptual understanding and practical code examples to use Keras with confidence.

What Is Keras and Why Does It Exist?

Keras is the high-level API of TensorFlow. It was designed to solve a specific problem that developers encounter frequently: the friction between an idea and a working model. In the early days of Deep Learning, implementing a simple neural network required hundreds of lines of code to manage tensors and gradients. Keras exists to provide a consistent, simple interface that reduces cognitive load, allowing developers to define a model in just a few lines of code while maintaining the full power of the TensorFlow backend for execution.

keras_basics.py · PYTHON

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# io.thecodeforge: Defining a simple Sequential model
def build_forge_model():
    model = keras.Sequential([
        # A dense layer with 64 units and ReLU activation
        layers.Dense(64, activation='relu', input_shape=(32,)),
        # Output layer for binary classification
        layers.Dense(1, activation='sigmoid')
    ])

    # Compilation configures the learning process
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model

if __name__ == '__main__':
    forge_model = build_forge_model()
    forge_model.summary()

Output

Model: "sequential"

_________________________________________________________________

Layer (type) Output Shape Param #

=================================================================

dense (Dense) (None, 64) 2112

...

💡Key Insight:

The most important thing to understand about Keras is the problem it was designed to solve. Always ask 'why does this exist?' before asking 'how do I use it?' Keras is built for humans, not machines: it prioritizes developer experience without sacrificing performance.

📊 Production Insight

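model.compile() attaches the optimizer, loss, and metrics to the model; it does not lock the layer architecture.

If you swap the optimizer or loss after compile, the change won't take effect until you call compile() again.

Also recompile after toggling layer.trainable, and remember that recompiling with a new optimizer starts from fresh optimizer state.

To catch shape mismatches early, check model.summary() and run one small batch through the model before training.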

🎯 Key Takeaway

Keras reduces neural network implementation from hundreds of lines to just a few.

The mental model: Keras is a high-level language for neural nets, where you compose layers instead of writing the math by hand.

Always recompile after changing layer definitions.


Production-Grade Deployment: Containerizing Keras

In a professional environment at TheCodeForge, we don't just run scripts; we deploy reproducible environments. Deep learning dependencies (like specific CUDA versions for GPUs) are notoriously fragile. We use Docker to ensure that the version of Keras you use in development is identical to the one in production.

Dockerfile · DOCKERFILE

# io.thecodeforge: Production DL Environment
FROM tensorflow/tensorflow:2.15.0-gpu

WORKDIR /app

# Standardized library requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Run the Keras model training service
CMD ["python", "keras_basics.py"]

Output

Successfully built image: thecodeforge/keras-base:latest

🔥Forge Best Practice:

Always pin your TensorFlow/Keras version in your Dockerfile. A minor version jump can sometimes change default weight initializations, leading to non-reproducible model results.

📊 Production Insight

GPU driver and CUDA version mismatches between the host and the container are among the most common deployment failures.

Run nvidia-smi inside the container to verify that the driver is visible.

When TensorFlow finds no usable GPU it logs a warning and falls back to CPU without raising, so a job can run for hours on CPU unnoticed.

Always add a GPU detection check at startup: print(tf.config.list_physical_devices('GPU')).

🎯 Key Takeaway

Containerize everything, pin every version.

Check GPU availability programmatically before training.

pip freeze pins Python packages but not the base image, so also use an explicit image version in the Dockerfile.
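One way to enforce the pinning rule is to compare installed package versions against the expected pins before any training starts. The sketch below uses only importlib.metadata from the standard library. The function name find_version_mismatches and the pins shown are illustrative assumptions, not part of any TensorFlow tooling.

```python
from importlib import metadata


def find_version_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not met.

    `pins` maps package names to exact version strings. `get_version` is
    injectable so the check can be exercised without the packages installed.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # the package is missing entirely
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; use the versions your own image was validated with.
    expected_pins = {"tensorflow": "2.15.0", "keras": "2.15.0"}
    problems = find_version_mismatches(expected_pins)
    for name, (want, got) in problems.items():
        print(f"PIN MISMATCH: {name} expected {want}, found {got}")
    if not problems:
        print("All pinned versions match.")
```

Running a check like this as the first step of the container entrypoint turns a silent version drift into an explicit failure message.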

The Data Layer: Tracking Model Metadata in SQL

Modern MLOps requires more than just code; it requires data governance. When using Keras, we often log our model architectures and performance metrics to a centralized database. This allows us to track 'Model Drift' over time.

io/thecodeforge/models/schema.sql · SQL

-- io.thecodeforge: Schema for model performance tracking
-- Assumes PostgreSQL, connected to a database named io that already
-- contains a schema named thecodeforge (database.schema.table naming).
CREATE TABLE IF NOT EXISTS io.thecodeforge.model_logs (
    log_id SERIAL PRIMARY KEY,
    model_name VARCHAR(255) NOT NULL,
    optimizer_type VARCHAR(50),
    final_accuracy DECIMAL(5,4),   -- accuracy always lies in [0, 1]
    final_loss DECIMAL(10,4),      -- loss is unbounded, so allow values of 10 or more
    deployment_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Example: Logging a Keras training run
INSERT INTO io.thecodeforge.model_logs (model_name, optimizer_type, final_accuracy, final_loss)
VALUES ('Sequential_V1', 'adam', 0.9821, 0.0412);

Output

1 row inserted successfully.

💡SQL Tip:

Storing your training hyperparameters in a SQL table alongside your accuracy results makes it significantly easier to perform 'Hyperparameter Search' analysis later using BI tools.

📊 Production Insight

Model drift detection becomes impossible without storing baseline metrics.

Always log not just final metrics but also validation metrics per epoch.

Use a timestamped log to correlate model version with data version (data hash).

Without this, you won't know if accuracy drop is due to model or data shift.

🎯 Key Takeaway

Log every training run to a database — model name, hyperparameters, final metrics.

Include data version hash to detect distribution shifts.

Without metadata, debugging model degradation is guesswork.
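The advice above (log per-epoch validation metrics and a data version hash) can be sketched with the standard library alone. This example uses an in-memory SQLite database as a stand-in for the production database, and the table names training_runs and epoch_metrics are hypothetical.

```python
import hashlib
import sqlite3


def data_fingerprint(raw_bytes):
    """Hash the training data so each run records exactly what it saw."""
    return hashlib.sha256(raw_bytes).hexdigest()


def log_run(conn, model_name, optimizer, data_hash, val_losses):
    """Store one run plus its per-epoch validation loss; return the run id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS training_runs ("
        "run_id INTEGER PRIMARY KEY, model_name TEXT NOT NULL, "
        "optimizer TEXT, data_hash TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS epoch_metrics ("
        "run_id INTEGER, epoch INTEGER, val_loss REAL)"
    )
    cursor = conn.execute(
        "INSERT INTO training_runs (model_name, optimizer, data_hash) VALUES (?, ?, ?)",
        (model_name, optimizer, data_hash),
    )
    run_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO epoch_metrics (run_id, epoch, val_loss) VALUES (?, ?, ?)",
        [(run_id, epoch, loss) for epoch, loss in enumerate(val_losses, start=1)],
    )
    conn.commit()
    return run_id


conn = sqlite3.connect(":memory:")
run_id = log_run(
    conn, "Sequential_V1", "adam", data_fingerprint(b"toy,data\n1,0\n"), [0.41, 0.33, 0.35]
)
best = conn.execute(
    "SELECT epoch, val_loss FROM epoch_metrics WHERE run_id = ? ORDER BY val_loss LIMIT 1",
    (run_id,),
).fetchone()
print(f"run {run_id}: best epoch {best[0]} with val_loss {best[1]}")
```

In a Keras project the val_losses list would typically come from history.history['val_loss'], and two runs with different data_hash values tell you the data changed, not the model.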


Common Mistakes and How to Avoid Them

When learning Keras, most developers hit the same set of gotchas. Knowing these in advance saves hours of debugging. A common mistake is forcing a branching or multi-input architecture into the 'Sequential' API when it needs the 'Functional' API. Another is failing to match the loss function to the output layer's activation, for instance using 'mean_squared_error' for a categorical classification task. Understanding the 'Keras way' of data preprocessing is also vital to prevent shape mismatch errors during training.

CommonMistakes.py · PYTHON

# io.thecodeforge: Common Pitfall - Activation/Loss Mismatch
from tensorflow import keras
from tensorflow.keras import layers

# WRONG: Using sigmoid with categorical_crossentropy for multi-class
# model.add(layers.Dense(10, activation='sigmoid'))
# model.compile(loss='categorical_crossentropy')

# CORRECT: Use softmax for multi-class classification
model = keras.Sequential([
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Ensure labels are one-hot encoded for categorical_crossentropy
# (use sparse_categorical_crossentropy for integer class labels)
model.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

Output

(no output: compile() returns silently when the configuration is valid)

⚠ Watch Out:

The most common mistake with Keras is using it when a simpler alternative would work better. Always consider whether the added complexity of a Deep Learning model is justified. If a simple Scikit-Learn Logistic Regression achieves 95% accuracy, you likely don't need a 50-layer Keras model.

📊 Production Insight

The sigmoid + categorical_crossentropy combination is a silent killer — it trains but never converges well.

Similarly, using input_shape=(32,) when data is (32,1) causes cryptic errors.

Always print model.summary() before training and verify output shape matches label shape.

Run a single forward pass on a toy batch before full training: model(tf.ones((1, 32))).

🎯 Key Takeaway

Match activation to loss: sigmoid → binary_crossentropy, softmax → categorical_crossentropy.

Print shapes early and often.

Test a single forward pass before training.
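To see why the activation has to match the loss, it helps to compute both activations by hand on the same logits. The sketch below uses the textbook definition of categorical cross-entropy in plain Python; Keras's own implementation adds clipping and other numerical details, so treat this as an illustration of the idea only.

```python
import math


def softmax(logits):
    """Normalize logits into a probability distribution that sums to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Squash each logit independently into (0, 1); outputs need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def categorical_crossentropy(one_hot, probs):
    """Textbook categorical cross-entropy: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


logits = [2.0, 1.0, 0.1]
label = [1, 0, 0]

soft = softmax(logits)
sig = sigmoid(logits)
print(f"softmax sum = {sum(soft):.3f}, loss = {categorical_crossentropy(label, soft):.3f}")
print(f"sigmoid sum = {sum(sig):.3f}, loss = {categorical_crossentropy(label, sig):.3f}")
```

With softmax, raising the true class's probability necessarily lowers the others. With sigmoid, each output moves independently, so the loss can fall while the wrong classes stay confidently high, which is why that pairing trains without converging well.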

Custom Training Loops and Gradient Tape

For advanced use cases such as GANs, multi-task learning, or custom regularisation, the default model.fit() loop is often too rigid. TensorFlow provides the tf.GradientTape API for fine-grained control over each step. This section shows a minimal custom training loop that computes the loss, applies gradients, and tracks accuracy per epoch.

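custom_training.py · PYTHON

# io.thecodeforge: Custom Training Loop with GradientTape
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, losses, optimizers, metrics

def train_step(model, x_batch, y_batch, optimizer, loss_fn, train_acc):
    # Record the forward pass so the tape can differentiate the loss
    with tf.GradientTape() as tape:
        preds = model(x_batch, training=True)  # sigmoid probabilities, not logits
        loss = loss_fn(y_batch, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    train_acc.update_state(y_batch, preds)
    return loss

# Usage
# Placeholder random data so the loop runs; swap in your own tf.data.Dataset.
# With random labels, the printed metrics will differ from the sample output.
features = tf.random.normal((256, 32))
labels = tf.cast(tf.random.uniform((256, 1)) > 0.5, tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(32)

model = keras.Sequential([layers.Dense(10, activation='relu'), layers.Dense(1, activation='sigmoid')])
optimizer = optimizers.Adam(0.001)
loss_fn = losses.BinaryCrossentropy()
train_acc = metrics.BinaryAccuracy()

for epoch in range(10):
    for batch in dataset:
        loss = train_step(model, batch[0], batch[1], optimizer, loss_fn, train_acc)
    print(f'Epoch {epoch}: Loss {loss.numpy():.4f}, Acc {train_acc.result().numpy():.4f}')
    train_acc.reset_state()  # reset_states() is the deprecated spelling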

Output

Epoch 0: Loss 0.6912, Acc 0.5200

Epoch 1: Loss 0.6723, Acc 0.5600

...

Mental Model

GradientTape Mental Model

Think of GradientTape as a recording device that watches every tensor operation and remembers the graph for backward pass.

Everything inside the with tape: block is recorded.
The gradient() call uses the recorded operations to compute derivatives.
Pass training=True when calling the model inside the tape so layers like Dropout and BatchNorm run in training mode.
Always call apply_gradients() — forgetting it means weights never update.

📊 Production Insight

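A custom loop that runs eagerly can be several times slower than model.fit(), which compiles its training step into a graph; wrapping train_step in @tf.function closes most of that gap.

In exchange, custom loops give you access to per-layer gradients for debugging.

Common production bug: getting the training flag wrong. Without training=True during training, Dropout is skipped and BatchNorm statistics never update; with training=True at inference, Dropout corrupts predictions.

Always log gradient norms early; exploding gradients are a frequent cause of NaN loss.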

🎯 Key Takeaway

Use model.fit() unless you need per-batch control.

GradientTape enables GANs, meta-learning, and gradient clipping.

Always set training=True explicitly in custom loops.
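A graph-compiled training step that also reports the gradient norm might look like the sketch below. It assumes a TensorFlow 2.x installation; the synthetic batch and the three-step loop are placeholders, and tf.linalg.global_norm is used here as one convenient way to summarize all gradients in a single number.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32,)),
    layers.Dense(10, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
optimizer = keras.optimizers.Adam(0.001)
loss_fn = keras.losses.BinaryCrossentropy()


@tf.function  # trace the step into a graph instead of running op by op
def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        preds = model(x_batch, training=True)
        loss = loss_fn(y_batch, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    grad_norm = tf.linalg.global_norm(grads)  # one scalar for all gradients
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss, grad_norm


# Synthetic batch, purely for illustration
x = tf.random.normal((64, 32))
y = tf.cast(tf.random.uniform((64, 1)) > 0.5, tf.float32)

for step in range(3):
    loss, grad_norm = train_step(x, y)
    print(f"step {step}: loss={float(loss):.4f} grad_norm={float(grad_norm):.4f}")
```

A gradient norm that climbs by orders of magnitude between steps is an early warning worth acting on before the loss turns into NaN.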

Installing Without the Pain — Why Pip Isn't Always Your Friend

Everyone runs pip install keras and thinks they're done. Then their training job crashes because the CUDA runtime doesn't match TensorFlow's expectations, or they silently get CPU inference on a GPU machine. The 'WHY' here is simple: Keras is a high-level API that wraps TensorFlow — and TensorFlow is hyper-sensitive to your system's driver stack.

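Start with a clean virtual environment: run python -m venv keras_deploy and activate it. Then install TensorFlow for the hardware you actually have. The standalone tensorflow-gpu package is obsolete; pip install tensorflow is the standard package, and on Linux with TensorFlow 2.14 or later pip install "tensorflow[and-cuda]" also pulls in matching CUDA libraries (check nvidia-smi first). Keras comes bundled as tf.keras. The standalone keras package on PyPI is a separate distribution with its own release line (Keras 3 is multi-backend), not an alias, so installing it independently can leave you on a different Keras than the one TensorFlow was tested with. Use the bundled version, since it's exactly what the model will see in production.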

Don't skip the smoke test: import tensorflow as tf; print(tf.config.list_physical_devices('GPU')). If you see an empty list, your GPU is dead to you. Fix your drivers before you write a single layer.

ProductionInstallCheck.py · PYTHON

# io.thecodeforge: ml-ai tutorial

# Always validate your hardware before trusting pip
import tensorflow as tf
import sys

def check_gpu():
    gpus = tf.config.list_physical_devices('GPU')
    if not gpus:
        print("FAIL: No GPU detected. Training will be CPU-bound.")
        sys.exit(1)
    print(f"OK: Found {len(gpus)} GPU(s): {[g.name for g in gpus]}")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

if __name__ == "__main__":
    print(f"TensorFlow version: {tf.__version__}")
    check_gpu()

Output

TensorFlow version: 2.15.0

OK: Found 1 GPU(s): ['/physical_device:GPU:0']

⚠ Production Trap:

Docker images from Docker Hub often ship old CUDA runtimes. Pin your TensorFlow version in requirements.txt and rebuild on a regular schedule (weekly works well). A mismatched driver can silently push work onto the CPU: much slower runs and zero errors.

🎯 Key Takeaway

Your install is incomplete until you’ve confirmed the GPU is visible to TensorFlow. Run the check script in CI.

Building Models That Don't Embarrass You — Sequential vs Functional

Keras gives you two main ways to wire up layers: the Sequential API and the Functional API (model subclassing is the third, for fully custom behavior). If you've only ever used Sequential, you're missing most of what Keras can do. Sequential is a linear stack: one layer feeds into the next, with no branching, no shared layers, no multi-input hybrids. It's fine for toy nets. But real production models need flexibility.

Functional API builds a DAG (Directed Acyclic Graph) of layers. You define tensors, pass them through layers, and connect the outputs. Want a model that takes both an image and a text embedding as input? Functional. Need to share a dense layer across two parallel branches? Functional. Need a skip connection like ResNet? You guessed it.

The WHY: Neural networks in the real world aren't straight lines. Multi-task models, attention mechanisms, residual connections — all require non-sequential wiring. The Functional API costs you zero performance and adds infinite flexibility. Write your first model with it, even if you only need a straight line. You'll thank me when requirements change.

FunctionalBranchModel.py · PYTHON

# io.thecodeforge: ml-ai tutorial

import tensorflow as tf
from tensorflow.keras import layers, Model

# Input tensor
inputs = tf.keras.Input(shape=(784,), name="pixel_input")

# Shared layer
shared_dense = layers.Dense(256, activation='relu')(inputs)

# Branch 1: classification
branch_clf = layers.Dense(128, activation='relu')(shared_dense)
output_clf = layers.Dense(10, activation='softmax', name="classifier")(branch_clf)

# Branch 2: reconstruction
branch_recon = layers.Dense(128, activation='relu')(shared_dense)
output_recon = layers.Dense(784, activation='sigmoid', name="reconstruction")(branch_recon)

# Build model
model = Model(inputs=inputs, outputs=[output_clf, output_recon], name="multi_task_autoencoder")
model.summary()

Output

Model: "multi_task_autoencoder"

_________________________________________________________________

Layer (type) Output Shape Param #

=================================================================

pixel_input (InputLayer) [(None, 784)] 0

dense (Dense) (None, 256) 200960

dense_1 (Dense) (None, 128) 32896

dense_2 (Dense) (None, 128) 32896

classifier (Dense) (None, 10) 1290

reconstruction (Dense) (None, 784) 101136

=================================================================

Total params: 369,178

Trainable params: 369,178

💡Senior Shortcut:

Use the Functional API from day one. It makes debugging easier because every tensor has a name, and you can plot the model graph with tf.keras.utils.plot_model(model, show_shapes=True).

🎯 Key Takeaway

Sequential is for tutorials. Functional is for production. If you can't draw a DAG in your head, you can't debug a real model.
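The skip connection mentioned in this section is a case Sequential cannot express at all, because one tensor has to feed two places. Here is a minimal sketch of a single residual block with the Functional API, assuming TensorFlow 2.x; the layer widths and names are arbitrary choices for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64,), name="features")

# Main path: two Dense layers that keep the width at 64
x = layers.Dense(64, activation="relu")(inputs)
x = layers.Dense(64)(x)

# Skip connection: add the block input back onto the block output
x = layers.Add(name="residual_add")([x, inputs])
x = layers.Activation("relu")(x)

outputs = layers.Dense(10, activation="softmax", name="classifier")(x)
residual_model = keras.Model(inputs=inputs, outputs=outputs, name="tiny_residual")
residual_model.summary()
```

The Add layer requires both inputs to have the same shape, which is why the main path keeps the width at 64. When the widths differ, a common workaround is a Dense projection on the skip path.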

Training the Model — Why Epochs Are a Vanity Metric

Everyone asks 'how many epochs should I train?' Wrong question. The real question is 'when has my model stopped learning?' Training indefinitely overfits your model to noise. Using a fixed epoch count is cargo-cult engineering.

Keras provides callbacks to stop training when validation performance plateaus. EarlyStopping monitors a metric (say, val_loss) and stops if it doesn't improve for patience epochs. ReduceLROnPlateau drops the learning rate when progress stalls — a smarter way to escape local minima than guessing a schedule.

Here's the pattern every senior engineer uses: set a high epochs ceiling (like 500), attach EarlyStopping with patience 10, and let the model train until it stops improving. You'll never waste compute again. And always save the best weights with ModelCheckpoint. That checkpoint is what you deploy, not the last epoch's weights.

The WHY: Training is an optimization problem, not a calendar event. The metric that matters is generalization, not epoch count. Stop when your validation loss stops dropping. Forget epoch numbers.

SmartTrainingLoop.py · PYTHON

# io.thecodeforge: ml-ai tutorial

import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

# Assume model, x_train, y_train, x_val, y_val exist
callbacks = [
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6),
    ModelCheckpoint(filepath='best_model.keras', monitor='val_loss', save_best_only=True)
]

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=500,  # high ceiling
    batch_size=64,
    callbacks=callbacks,
    verbose=1
)

print(f"Training stopped at epoch {len(history.history['loss'])} due to EarlyStopping.")
print(f"Best validation loss: {min(history.history['val_loss']):.4f}")

Output

Epoch 1/500

782/782 [==============================] - 3s 4ms/step - loss: 0.3412 - val_loss: 0.2876

Epoch 2/500

...

Epoch 12/500

[...]

Training stopped at epoch 12 due to EarlyStopping.

Best validation loss: 0.2511

🔥Production Insight:

Set restore_best_weights=True in EarlyStopping. By default the model keeps the weights from the last epoch, which could be worse than the best checkpoint. Don't deploy a model that forgot how to generalize.

🎯 Key Takeaway

Epochs are a ceiling, not a target. Let the validation metric decide when training is done. Use callbacks to automate it.
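The patience rule behind early stopping is simple enough to write out in plain Python. The function below is a simplified illustration of the idea, not Keras's EarlyStopping implementation; the name early_stopping_epoch and the sample loss history are made up for the example.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation loss by more than `min_delta`. If that never happens, the stop
    epoch is simply the last epoch available.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


history = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.34]
stop, best = early_stopping_epoch(history, patience=3)
print(f"stopped after epoch {stop}; best weights came from epoch {best}")
```

In this history the run stops after epoch 6 and never sees the better loss at epoch 7. That is the trade-off patience controls: a larger value rides out noisy plateaus at the cost of extra compute.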

● Production incident · POST-MORTEM · severity: high

Non-Reproducible Training Results Due to Unpinned Keras Version

Symptom

Model accuracy varied by 3-5% when training the same code on different machines or after rebuilding the Docker image.

Assumption

The team assumed Keras/TensorFlow versions were identical because they were using the same Docker base image. They had not specified the exact patch version.

Root cause

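The Dockerfile referenced tensorflow/tensorflow:2.15.0-gpu by tag only, and the exact patch version of Keras was never locked. A tag can be re-pushed by the maintainer, so rebuilds on different days or machines did not resolve to the same Keras build. The post-mortem attributes the variance to Dense layers being initialized with 'he_normal' instead of 'glorot_uniform' in the newer environment (e.g., a 2.15.1 build).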
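To get a feel for why a different initializer can move accuracy, compare the scale of the initial weights each scheme produces. The sketch below uses the standard published formulas for Glorot uniform and He normal in plain Python. The fan-in and fan-out match the Dense(64) layer fed by 32 features from the first example; the helper names are illustrative.

```python
import math


def glorot_uniform_std(fan_in, fan_out):
    """Std of U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return limit / math.sqrt(3.0)  # std of a uniform distribution on [-limit, limit]


def he_normal_std(fan_in):
    """Nominal std of He normal initialization: sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


fan_in, fan_out = 32, 64  # the Dense(64) layer fed by 32 features
glorot = glorot_uniform_std(fan_in, fan_out)
he = he_normal_std(fan_in)
print(f"glorot_uniform std ~ {glorot:.4f}, he_normal std ~ {he:.4f}, ratio {he / glorot:.2f}x")
```

For this layer the two schemes differ in initial weight scale by a factor of about 1.7, which is enough to send training down a different path even when the code and data are identical.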

Fix

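Pin the image by digest: FROM tensorflow/tensorflow:2.15.0-gpu@sha256:... so a rebuild cannot pull different contents. Also seed every random source at the start of training with tf.keras.utils.set_random_seed(42); if you need bit-for-bit reproducibility, additionally call tf.config.experimental.enable_op_determinism(), which can slow training.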

Key lesson

Always pin the full version (including patch) of Keras/TensorFlow in production environments.
Set random seeds explicitly when reproducibility is required.
Use a lockfile for Python packages (pip freeze > requirements.txt) and rebuild Docker images from scratch before deployment.
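A quick way to confirm that seeding works in your environment is to build the same model twice with the same seed and compare the initial weights. This sketch assumes TensorFlow 2.7 or later, where keras.utils.set_random_seed is available; the helper build_seeded_model is a name chosen for this example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_seeded_model(seed):
    """Seed Python, NumPy, and TensorFlow, then build a small model."""
    keras.utils.set_random_seed(seed)
    return keras.Sequential([
        keras.Input(shape=(32,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])


first = build_seeded_model(42)
second = build_seeded_model(42)
same = all(
    np.array_equal(a, b)
    for a, b in zip(first.get_weights(), second.get_weights())
)
print(f"Identical initial weights with the same seed: {same}")
```

Matching initial weights only cover initialization. Data shuffling order and non-deterministic GPU kernels can still introduce run-to-run differences, which is where op determinism comes in.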

Production debug guide

Symptom → Action guide for the most frequent Keras failures in production pipelines · 4 entries

Symptom · 01

Model training crashes with 'Input 0 of layer xxx is incompatible: expected ndim=2, found ndim=3'

→

Fix

Shape mismatch between input data and first layer's input_shape. Verify data shape with print(x_train.shape) and ensure it matches the input_shape tuple (excluding batch dimension).

Symptom · 02

Training converges to a constant loss value within a few epochs

→

Fix

Common when using wrong loss function or activation mismatch. Check if output activation matches loss: sigmoid + binary_crossentropy for binary, softmax + categorical_crossentropy for multi-class.

Symptom · 03

GPU memory usage grows unbounded during training across epochs

→

Fix

In long-lived sessions such as Jupyter notebooks, every model you build adds to Keras's global state, and old models stay alive as long as a variable references them. Call tf.keras.backend.clear_session() between experiments and wrap training in a function so model references are released.

Symptom · 04

Model runs fine in training but produces NaN predictions during inference

→

Fix

Check for division by zero in custom layers or loss functions. Add clipvalue or clipnorm in optimizer configuration. Also verify input data normalization matches training-time statistics.

★ Quick Debug Cheat Sheet for Keras

When things go wrong during Keras training, use these commands to diagnose the problem fast.

Shape mismatch error when calling fit()

Immediate action

Print the model summary and compare with the expected input shape.

Commands

model.summary()

print(f'Input shape expected: {model.input_shape[1:]}, Provided: {x_train.shape[1:]}')

Fix now

Update the first layer's input_shape to match the training data feature dimension.


Keras vs Low-Level TensorFlow Comparison

Aspect	Low-Level TensorFlow/NumPy	Keras API
Code Verbosity	High (Manual math/gradients)	Low (Concise layer definitions)
Flexibility	Maximum (Total control)	High (Modular & Extensible)
Speed of Prototyping	Slow	Very Fast
Learning Curve	Steep	Gentle
Standardization	Low (Varies by developer)	High (Industry standard patterns)

⚙ Quick Reference

8 commands from this guide

File	Command / Code	Purpose
keras_basics.py	from tensorflow import keras	What Is Keras and Why Does It Exist?
Dockerfile	FROM tensorflow/tensorflow:2.15.0-gpu	Production-Grade Deployment
io/thecodeforge/models/schema.sql	CREATE TABLE IF NOT EXISTS io.thecodeforge.model_logs (	The Data Layer
CommonMistakes.py	model = keras.Sequential([	Common Mistakes and How to Avoid Them
custom_training.py	from tensorflow.keras import layers, losses, optimizers, metrics	Custom Training Loops and Gradient Tape
ProductionInstallCheck.py	def check_gpu():	Installing Without the Pain
FunctionalBranchModel.py	from tensorflow.keras import layers, Model	Building Models That Don't Embarrass You
SmartTrainingLoop.py	from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCh...	Training the Model

Key takeaways

Keras is the official high-level API for TensorFlow: use it unless you need absolute low-level control.

Always match output activation to loss function: sigmoid + binary_crossentropy, softmax + categorical_crossentropy.

Pin Keras/TensorFlow version in Docker and lockfiles to ensure reproducibility.

Use model.fit() for standard training; GradientTape for advanced customization like GANs or meta-learning.

Track every training run in a database with hyperparameters and data version to detect model drift.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain the internal mechanism of 'backpropagation' and how Keras automa...

Q02 · SENIOR

What is the difference between the 'Sequential' API and the 'Functional'...

Q03 · SENIOR

How does Keras handle the 'Vanishing Gradient' problem through its built...

Q04 · SENIOR

In a production environment, why might you use TFLite or ONNX instead of...

Q05 · SENIOR

Describe the role of 'Callbacks' in Keras. How would you implement a cus...

Q01 of 05 · SENIOR

Explain the internal mechanism of 'backpropagation' and how Keras automates this during the training loop.

ANSWER

Backpropagation computes gradients of the loss with respect to all weights using the chain rule. With the TensorFlow backend, Keras automates this through tf.GradientTape: every forward operation is recorded, then tape.gradient() applies the chain rule backward. The optimizer then updates weights using apply_gradients(). This happens behind the scenes when calling model.fit(), but can be exposed via a custom training loop.
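The chain rule in that answer can be checked by hand on the smallest possible network: one sigmoid neuron with binary cross-entropy loss. The sketch below is plain Python with made-up values for w, b, x, and y. It derives the gradient analytically, then confirms it with a finite difference, which is the same quantity a gradient tape would return for this model.

```python
import math


def forward(w, b, x):
    """Single sigmoid neuron: p = sigmoid(w * x + b)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def bce(y, p):
    """Binary cross-entropy for one example."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def analytic_grads(w, b, x, y):
    """Chain rule: dL/dz = p - y, so dL/dw = (p - y) * x and dL/db = p - y."""
    p = forward(w, b, x)
    return (p - y) * x, (p - y)


def numeric_grad_w(w, b, x, y, eps=1e-6):
    """Central finite difference, used only to check the analytic result."""
    return (bce(y, forward(w + eps, b, x)) - bce(y, forward(w - eps, b, x))) / (2 * eps)


w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = analytic_grads(w, b, x, y)
print(f"analytic dL/dw = {grad_w:.6f}, numeric dL/dw = {numeric_grad_w(w, b, x, y):.6f}")

# One gradient-descent update, the same kind of step an optimizer applies
learning_rate = 0.1
w_new = w - learning_rate * grad_w
b_new = b - learning_rate * grad_b
```

After the update, the loss on the same example is lower than before, which is all a training step promises: a small move in the direction that reduces the loss.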

