Keras Sequential vs Functional API — Which to Use and When
- Sequential API builds linear stacks — one input, one output, no exceptions. Functional API builds any directed acyclic graph. Both produce the same underlying Keras Model with identical computation graphs and zero runtime performance difference.
- Use Sequential for simple baselines and genuinely linear architectures. Switch to Functional the moment you need multi-input, multi-output, skip connections, or weight sharing — and start with Functional if there is any chance of needing these later.
- Weight sharing in Functional API: create one layer object and call it on multiple tensors. Both calls use and update the same weights. Creating new layer objects instead is the most common weight sharing mistake and shows immediately as a doubled parameter count.
- Sequential API builds models as a linear stack — one input, one output, no branches, no exceptions
- Functional API builds any directed acyclic graph — multi-input, multi-output, skip connections, shared layers, intermediate sub-models
- Both produce identical computation graphs — there is zero runtime performance difference between them
- Sequential cannot express residual connections, weight sharing, or multiple output heads — the moment you need any of these, it is the wrong tool
- Any Sequential model can be rewritten as Functional with the same layers, same weights, and identical outputs
- In Keras 3, both APIs work identically across TensorFlow, JAX, and PyTorch backends — the API choice is purely about architecture expressiveness
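That last point can be checked mechanically. A minimal sketch, with illustrative layer sizes (not from the article): clone each layer of a Sequential model into a Functional graph, copy the weights across, and compare outputs on the same batch.

```python
import numpy as np
import keras
from keras import layers

# A small Sequential model (sizes are illustrative)
seq = keras.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(16, activation='relu'),
    layers.Dense(4, activation='softmax'),
])

# Rebuild the same stack as a Functional graph, cloning each layer's config
inp = keras.Input(shape=(8,))
x = inp
for layer in seq.layers:
    x = layer.__class__.from_config(layer.get_config())(x)
func = keras.Model(inp, x)

# Same architecture, so the weight tensors line up shape-for-shape
func.set_weights(seq.get_weights())

# Identical outputs on the same batch
batch = np.random.rand(3, 8).astype('float32')
same = np.allclose(seq.predict(batch, verbose=0), func.predict(batch, verbose=0))
print(same)
```

The cloning loop is only for illustration; in practice you would simply write the Functional version by hand with the same layer arguments.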
Graph disconnected error in Functional API:

```python
print([t.name for t in model.inputs])  # see which inputs the model knows about
keras.utils.plot_model(model, 'debug.png', show_shapes=True)  # visual of full graph
```

Unexpected None dimensions in model summary — shapes not propagating:

```python
model.summary()  # look for (None, None) in the Output Shape column
model.build(input_shape=(None, 784))  # force shape inference if Input() is missing
```

Weight sharing producing double the expected parameter count:

```python
print(len(model.layers))  # too many layers = new objects instead of reuse
print(model.count_params())  # double expected params = two separate Dense layers
```

Production Incident

The layers.Add()([x, shortcut]) call that the team was attempting needs two input tensors from different points in the network — Sequential provides no mechanism to hold a reference to an earlier tensor and pass it to a later layer. The fix was to rebuild the model with the Functional API and use layers.Add()([x, shortcut]) to merge the residual path, which is trivial once you're working with named tensor variables rather than implicit sequential connections. We kept the Sequential version of the non-residual portion for comparison — the weights were identical for the linear sections. We also added a team guideline: if the architecture diagram has any node with more than one incoming edge, start with Functional API from day one.

Production Debug Guide

When your model fails to build or produces unexpected shapes:

- Every Input() tensor that appears anywhere in your graph must be explicitly listed in the keras.Model(inputs=[...]) constructor. If you have two input branches, both Input() tensors must be in that list. Tensors from one Model() call cannot connect to layers defined in the context of a different Model() call — they live in separate graphs.
- Put an Input() layer as the first layer in Sequential — layers.Input(shape=(784,)) — or call model.build(input_shape=(None, 784)) before calling summary(). Without a concrete input shape, Keras cannot propagate dimensions through the graph and shows None everywhere.
- layers.Dense(64) called twice creates two separate layer objects with separate weights — that is two independent Dense layers, not one shared layer. Assign the layer to a variable first: shared = layers.Dense(64), then call shared(input_a) and shared(input_b). Both calls will use and update the same underlying weight matrix.
- For shape mismatches during model.fit() or model building, check the output shape of the upstream layer with layer.output_shape and confirm it matches what the downstream layer expects. Use keras.utils.plot_model(model, show_shapes=True) to get a visual of every tensor shape flowing through the graph — this catches mismatches immediately and is far faster than reading through layer by layer in the summary.

Keras provides two primary ways to build neural networks: the Sequential API and the Functional API. Both create the same underlying computation graphs — executed by TensorFlow, JAX, or PyTorch depending on your Keras 3 backend — but they differ fundamentally in what architectures they can express. Sequential handles linear stacks of layers and nothing else. The Functional API handles any directed acyclic graph of layers: multi-input models, multi-output models, shared layers, and residual connections.
The choice matters more at design time than at runtime. Both APIs produce identical computation graphs. There is no speed difference, no memory difference, no training difference. The difference is entirely in what architectures you can express and how clearly the code communicates the intended structure to the next engineer who reads it.
In 2026 with Keras 3 supporting multiple backends, the choice of API is completely independent of whether you're running on TensorFlow, JAX, or PyTorch. I've used both in production — from simple image classifiers to multi-task systems with shared encoders and task-specific heads, to ResNet-style backbones with residual connections. Here is the practical decision framework I actually use, grounded in what goes wrong when teams make the wrong choice.
What is the Keras Sequential API?
The Sequential API builds models as a linear stack of layers, where each layer has exactly one input tensor and one output tensor. Data flows in one direction: from the first layer to the last, with no branching, no merging, and no skipping. The model is defined either by passing a list of layers to the constructor or by calling model.add() in sequence.
The Sequential API is deliberately simple — and that simplicity is its actual value. When your architecture is genuinely a straight line, Sequential communicates that intent clearly. You do not need to manage tensor variables, there are no wiring mistakes possible, and the code reads in the same order that data flows through the network. For standard feedforward networks, simple CNNs, vanilla RNNs, and baseline experiments, it is the right tool.
The limitations are structural, not a list of features that might be added later. A Sequential model cannot have multiple input branches, multiple output heads, layers that share weights with other layers, or residual connections where a later layer receives input from an earlier one. If your architecture needs any of these — and most production architectures eventually do — Sequential cannot express it and there is no workaround within the API itself.
One practical note: always include an explicit layers.Input(shape=(...)) as the first element. Without it, Keras cannot infer shapes until the first call to fit() or predict(), which means model.summary() shows None everywhere and shape errors are harder to catch before training starts.
```python
# io.thecodeforge.keras.sequential_vs_functional.sequential_example
import keras
from keras import layers

# Sequential API — pass a list of layers to the constructor
# This is equivalent to calling model.add() for each layer in order
model = keras.Sequential([
    layers.Input(shape=(784,)),  # Always include Input() — enables shape propagation from the start
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax'),
], name='mnist_classifier')

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

# Equivalent using model.add() — same result, different style
# Some teams prefer this for conditional layer addition during setup
model_v2 = keras.Sequential(name='mnist_classifier_v2')
model_v2.add(layers.Input(shape=(784,)))
model_v2.add(layers.Dense(256, activation='relu'))
model_v2.add(layers.Dropout(0.3))
model_v2.add(layers.Dense(128, activation='relu'))
model_v2.add(layers.Dropout(0.3))
model_v2.add(layers.Dense(10, activation='softmax'))

# Both models have identical architectures and would produce identical weights
# after training on the same data with the same random seed
print(f"model params: {model.count_params():,}")
print(f"model_v2 params: {model_v2.count_params():,}")
print(f"Architectures identical: {model.count_params() == model_v2.count_params()}")
```
```
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 256)               200,960
 dropout (Dropout)           (None, 256)               0
 dense_1 (Dense)             (None, 128)               32,896
 dropout_1 (Dropout)         (None, 128)               0
 dense_2 (Dense)             (None, 10)                1,290
=================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0

model params: 235,146
model_v2 params: 235,146
Architectures identical: True
```
Always include an explicit Input() layer as the first element — without it, shape propagation is deferred and model.summary() is uninformative.

What is the Keras Functional API?
The Functional API builds models by defining the computation graph explicitly. You create Input() tensors, pass them through layer objects by calling those objects, and Keras tracks the connections. The model is then defined by passing the input and output tensors to keras.Model().
This explicit tensor-passing style requires more code than Sequential for simple architectures, but it removes every architectural constraint that Sequential imposes. You can split a tensor into multiple branches by passing the same tensor to multiple layer calls. You can merge tensors from different branches using Add(), Concatenate(), or Multiply(). You can reuse the same layer object on different inputs — weight sharing — by calling it multiple times. And you can create multiple output tensors from a single backbone and return all of them from the model.
The Functional API is the standard for any non-trivial production architecture. ResNet uses residual connections. Inception uses parallel convolution branches with different filter sizes. Siamese networks use shared layers called on two separate inputs. Multi-task learning models use a shared encoder with independent task-specific heads. None of these are expressible with Sequential. All of them are straightforward with Functional.
One mental model that helps: think of the Functional API as plumbing. Input() is the water source. Each layer call is a pipe fitting. Add() and Concatenate() are junction pieces. keras.Model() defines which pipes are the output taps. The layer objects are reusable fittings — you can connect the same fitting into multiple places in the plumbing system, and water flows through the same physical component in each path.
```python
# io.thecodeforge.keras.sequential_vs_functional.functional_example
import keras
from keras import layers

# ── SIMPLE FUNCTIONAL MODEL — same architecture as Sequential example ───────
# This demonstrates that Functional can express everything Sequential can
# while also being able to express things Sequential cannot
inputs = keras.Input(shape=(784,), name='image_flat')
x = layers.Dense(256, activation='relu', name='dense_1')(inputs)
x = layers.Dropout(0.3, name='dropout_1')(x)
x = layers.Dense(128, activation='relu', name='dense_2')(x)
x = layers.Dropout(0.3, name='dropout_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_functional')
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

# ── MULTI-INPUT FUNCTIONAL MODEL — impossible with Sequential ───────────────
# Example: fusing image features with metadata for a richer prediction
image_input = keras.Input(shape=(224, 224, 3), name='image')
metadata_input = keras.Input(shape=(12,), name='metadata')  # e.g. timestamp, location encoding

# Image branch — a simple CNN backbone for illustration
image_features = layers.Conv2D(32, 3, activation='relu')(image_input)
image_features = layers.GlobalAveragePooling2D()(image_features)
image_features = layers.Dense(64, activation='relu')(image_features)

# Metadata branch — simpler processing
meta_features = layers.Dense(16, activation='relu')(metadata_input)

# Merge both branches
combined = layers.Concatenate()([image_features, meta_features])
combined = layers.Dense(64, activation='relu')(combined)
predictions = layers.Dense(5, activation='softmax', name='class_output')(combined)

multi_input_model = keras.Model(
    inputs=[image_input, metadata_input],  # both inputs declared here
    outputs=predictions,
    name='image_plus_metadata_model'
)

print(f"\nMulti-input model inputs: {[i.name for i in multi_input_model.inputs]}")
print(f"Parameters: {multi_input_model.count_params():,}")
```
```
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 image_flat (InputLayer)     (None, 784)               0
 dense_1 (Dense)             (None, 256)               200,960
 dropout_1 (Dropout)         (None, 256)               0
 dense_2 (Dense)             (None, 128)               32,896
 dropout_2 (Dropout)         (None, 128)               0
 predictions (Dense)         (None, 10)                1,290
=================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0

Multi-input model inputs: ['image', 'metadata']
Parameters: 237,637
```
- Input() creates the water source — the entry point for data into the graph
- Each layer call is a pipe fitting — the return value is the output tensor flowing out of it
- Add() and Concatenate() are junction pieces — they merge multiple pipes into one output pipe
- Calling the same layer object on two different tensors is a shared fitting — the same internal weights are used and updated from both paths during backpropagation
- keras.Model(inputs, outputs) defines which sources and which output taps constitute the model — everything in between is inferred from the tensor graph
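To make the shared-fitting bullet concrete, here is a minimal sketch (layer sizes and names are illustrative): one Dense object called on two inputs contributes a single set of weights to the model, no matter how many call sites it has.

```python
import keras
from keras import layers

left = keras.Input(shape=(32,), name='left')
right = keras.Input(shape=(32,), name='right')

shared = layers.Dense(16, name='shared_fitting')  # one layer object = one set of weights

# The same fitting is connected into two paths of the graph
merged = layers.Concatenate()([shared(left), shared(right)])
model = keras.Model([left, right], merged)

# Dense(16) on a 32-dim input: 32*16 + 16 = 528 parameters, counted once
print(model.count_params())  # 528, not 1056
```

If the count came out at 1,056 here, it would mean two separate Dense objects were created instead of one shared object, which is exactly the bug discussed later in the debugging section.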
Model Subclassing API — The Third Option
Keras also offers a third approach: Model Subclassing. You inherit from keras.Model, define your layers in __init__, and implement the actual forward pass in the call() method. This gives you full imperative control flow inside the forward pass — if statements that change which layers execute, for loops that iterate over a dynamic number of steps, conditional branching based on the values of tensors rather than just their shapes.
I reach for Subclassing only in specific situations. Research prototypes where the computation graph changes during training. Reinforcement learning agents where the action space or episode structure affects the forward pass. Recursive architectures where the number of steps is input-dependent. Tree-structured models. Anything where the graph topology is not fixed at definition time.
For everything else — including quite complex static architectures — I use Functional. The reason is tooling. Functional models produce complete, accurate model.summary() output with correct shapes at every layer. keras.utils.plot_model() generates a full visual graph. Serialisation with model.save() works completely and portably across backend switches. Subclassing models have more limited tooling support in all three areas, and the dynamic graph means that shape errors can surface at runtime during training rather than at graph construction time.
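The serialisation point is easy to verify with a round trip. A minimal sketch (the model and file path are illustrative): save a small Functional model in the .keras format and check that the reloaded copy predicts identically.

```python
import os
import tempfile

import numpy as np
import keras
from keras import layers

# A small Functional model to round-trip
inputs = keras.Input(shape=(16,))
x = layers.Dense(8, activation='relu')(inputs)
outputs = layers.Dense(2, activation='softmax')(x)
model = keras.Model(inputs, outputs)

path = os.path.join(tempfile.mkdtemp(), 'model.keras')
model.save(path)  # full save: architecture, weights, and compile state
reloaded = keras.models.load_model(path)

# The reloaded model is a faithful copy of the graph and weights
batch = np.random.rand(4, 16).astype('float32')
print(np.allclose(model.predict(batch, verbose=0),
                  reloaded.predict(batch, verbose=0)))
```

The same round trip on a Subclassed model requires the class definition to be importable at load time, which is one of the tooling gaps described above.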
The practical rule: if you can draw the architecture as a fixed DAG on a whiteboard and have it not change during training, use Functional. If the graph topology is genuinely dynamic — if what you're drawing on the whiteboard would need to include conditional branches based on tensor values — use Subclassing.
```python
# io.thecodeforge.keras.sequential_vs_functional.subclassing_example
import keras
from keras import layers


class ResidualBlock(keras.Model):
    """A single residual block — a natural Subclassing use case because the
    block encapsulates reusable internal logic with its own layer state.

    Note: for a full static model, you'd wire these blocks together with the
    Functional API. Subclassing the block itself is the right boundary.
    """

    def __init__(self, filters, use_projection=False):
        super().__init__()
        self.conv1 = layers.Conv2D(filters, 3, padding='same', activation='relu')
        self.conv2 = layers.Conv2D(filters, 3, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        # Optional projection shortcut when input/output channels differ
        self.use_projection = use_projection
        if use_projection:
            self.projection = layers.Conv2D(filters, 1, padding='same')

    def call(self, inputs, training=False):
        shortcut = inputs
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.conv2(x)
        x = self.bn2(x, training=training)
        if self.use_projection:
            shortcut = self.projection(inputs)
        # Conditional shortcut projection — the topology depends on a
        # constructor argument, which Subclassing expresses naturally
        return self.relu(x + shortcut)


class SimpleClassifier(keras.Model):
    """A classifier composed from the residual block above. The forward pass
    here is static, so Functional wiring of the blocks would work equally
    well — Subclassing is used to keep the example in one style.
    """

    def __init__(self, num_classes, dropout_rate=0.3):
        super().__init__()
        self.conv_stem = layers.Conv2D(32, 3, activation='relu', padding='same')
        self.residual_1 = ResidualBlock(32)
        self.residual_2 = ResidualBlock(64, use_projection=True)  # conditional projection
        self.pool = layers.GlobalAveragePooling2D()
        self.dropout = layers.Dropout(dropout_rate)
        self.classifier = layers.Dense(num_classes, activation='softmax')

    def call(self, inputs, training=False):
        x = self.conv_stem(inputs)
        x = self.residual_1(x, training=training)
        x = self.residual_2(x, training=training)
        x = self.pool(x)
        x = self.dropout(x, training=training)  # training flag passed explicitly
        return self.classifier(x)


model = SimpleClassifier(num_classes=10, dropout_rate=0.4)
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Build with concrete shapes so summary() shows layer dimensions
model.build(input_shape=(None, 32, 32, 3))
print(f"Parameters: {model.count_params():,}")
```
The tradeoff is tooling: model.summary() shows less shape information, plot_model() produces less useful graphs, and serialisation edge cases surface more often than with Functional models.

Transfer Learning and Fine-Tuning — The Most Common Production Use Case
Transfer learning is one of the most common reasons teams encounter the Functional API in production, even when they started with Sequential for their own layers. Almost all pretrained models in keras.applications are built with the Functional API — ResNet50, EfficientNet, MobileNetV3, VGG16. When you load one of these and add custom layers on top, you are working with Functional models whether you explicitly chose the API or not.
The standard two-phase fine-tuning pattern I use in production is worth understanding in detail, because the ordering matters and getting it wrong in either direction has concrete consequences.
Phase 1 — train the new head on frozen backbone: set base_model.trainable = False before compiling. This ensures the randomly initialised head layers do not immediately destroy the pretrained features in the backbone through large gradient updates. The learning rate can be normal during this phase since only the head weights are updating. Run for enough epochs that the head has learned a reasonable mapping from backbone features to your task.
Phase 2 — fine-tune the top layers of the backbone: set base_model.trainable = True, then selectively freeze the bottom layers. Use a learning rate that is one to two orders of magnitude lower than in Phase 1 — typically 1e-5 or lower. The lower rate is essential: the backbone features are already good, and you want to nudge them toward your domain without destroying the general representations. Recompile the model after changing trainable flags — this is not optional: the optimiser state needs to reflect the new trainable parameter set.
```python
# io.thecodeforge.keras.sequential_vs_functional.transfer_learning_example
import keras
from keras import layers
import numpy as np

# Simulate training data — replace with your actual data pipeline
X_train = np.random.rand(200, 224, 224, 3).astype(np.float32)
y_train = np.random.randint(0, 10, size=(200,))

# Load pretrained backbone — include_top=False removes the original classifier head
# The returned model is built with the Functional API
base_model = keras.applications.ResNet50(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)
print(f"Backbone layers: {len(base_model.layers)}")
print(f"Backbone params: {base_model.count_params():,}")

# ── PHASE 1: Freeze backbone, train only the new head ──────────────────────
# This must come BEFORE compiling — the trainable flag is read at compile time
base_model.trainable = False

# Add custom classification head using Functional API
# base_model.input and base_model.output are standard Functional API tensors
x = base_model.output
x = layers.GlobalAveragePooling2D(name='gap')(x)
x = layers.Dense(256, activation='relu', name='head_dense')(x)
x = layers.Dropout(0.4, name='head_dropout')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)

model = keras.Model(
    inputs=base_model.input,
    outputs=outputs,
    name='resnet_transfer'
)

trainable_in_phase1 = sum(1 for l in model.layers if l.trainable)
print(f"\nPhase 1 — trainable layers: {trainable_in_phase1} of {len(model.layers)}")

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.fit(X_train, y_train, epochs=3, batch_size=16, verbose=1)

# ── PHASE 2: Unfreeze top layers, fine-tune with very low LR ───────────────
# Enable backbone training
base_model.trainable = True

# Freeze all layers except the top 20 — preserve low-level features
# that transfer well (edges, textures) while adapting high-level representations
for layer in base_model.layers[:-20]:
    layer.trainable = False

trainable_in_phase2 = sum(1 for l in model.layers if l.trainable)
print(f"\nPhase 2 — trainable layers: {trainable_in_phase2} of {len(model.layers)}")

# CRITICAL: always recompile after changing trainable flags
# The optimiser needs to know which parameters to track
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),  # 100x lower than Phase 1
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.fit(X_train, y_train, epochs=5, batch_size=16, verbose=1)

print(f"\nFinal model params: {model.count_params():,}")
print(f"Final trainable params: {sum(np.prod(w.shape) for w in model.trainable_weights):,}")
```
```
Backbone params: 23,587,712

Phase 1 — trainable layers: 4 of 181
Epoch 1/3
13/13 ━━━━━━━━━━━━━━━━━━━━ 12s 928ms/step - accuracy: 0.0950 - loss: 2.3781
Epoch 2/3
13/13 ━━━━━━━━━━━━━━━━━━━━ 8s 621ms/step - accuracy: 0.1400 - loss: 2.2901
Epoch 3/3
13/13 ━━━━━━━━━━━━━━━━━━━━ 8s 619ms/step - accuracy: 0.1750 - loss: 2.2034

Phase 2 — trainable layers: 24 of 181
Epoch 1/5
13/13 ━━━━━━━━━━━━━━━━━━━━ 15s 1s/step - accuracy: 0.1950 - loss: 2.1456
...

Final model params: 23,862,282
Final trainable params: 1,458,176
```
Decision Framework — Which API Should You Choose?
Here is the practical decision tree I actually use in production when starting a new model.
- Is the architecture a strict linear chain with one input and one output? Use Sequential.
- Is the architecture anything other than a strict linear chain — multiple inputs, multiple outputs, residual connections, parallel branches, shared layers, intermediate sub-model extraction? Use Functional.
- Does the forward pass require imperative control flow — if statements or for loops over a dynamic number of steps that depend on tensor values, not just shapes? Use Subclassing, or subclass individual blocks and wire them with Functional at the model level.
The decision is purely about architectural expressiveness. There is no runtime performance difference between Sequential and Functional — both produce the same type of Keras Model object with the same computation graph. The weights are identical, training is identical, inference is identical. You are choosing between two syntaxes for describing the same underlying graph.
One rule of thumb that has saved multiple teams I've worked with: if you are not certain the architecture will remain a linear stack for the entire project lifetime, start with Functional. Migrating from Functional to Sequential is pointless since Sequential is strictly less expressive. Migrating from Sequential to Functional when you hit the first skip connection at week six of a project is a frustrating and avoidable interruption.
```python
# io.thecodeforge.keras.sequential_vs_functional.api_decision_example
# Demonstrates that Sequential and Functional produce identical models
# for the same linear architecture — confirming zero performance difference
import keras
from keras import layers
import numpy as np

# ── SAME ARCHITECTURE expressed two ways ────────────────────────────────────
# Sequential version
seq_model = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
], name='sequential_version')

# Functional version — identical architecture
inputs = keras.Input(shape=(20,))
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(32, activation='relu')(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
func_model = keras.Model(inputs, outputs, name='functional_version')

# Verify identical parameter counts
print(f"Sequential params : {seq_model.count_params():,}")
print(f"Functional params : {func_model.count_params():,}")
print(f"Identical         : {seq_model.count_params() == func_model.count_params()}")

# ── CAPABILITY BOUNDARY — what Sequential cannot do ─────────────────────────
# This is the architecture Sequential cannot express:
# A model with a residual connection (shortcut from input to output)
skip_input = keras.Input(shape=(64,), name='residual_demo_input')
processed = layers.Dense(64, activation='relu')(skip_input)  # transform
processed = layers.Dense(64)(processed)                      # second transform
shortcut = skip_input                                        # hold original
merged = layers.Add()([processed, shortcut])                 # residual add
merged = layers.Activation('relu')(merged)
resid_output = layers.Dense(10, activation='softmax')(merged)

residual_model = keras.Model(skip_input, resid_output, name='residual_model')
print(f"\nResidual model params: {residual_model.count_params():,}")
print("Residual model is impossible to express with Sequential API")

# ── VISUAL CONFIRMATION ─────────────────────────────────────────────────────
keras.utils.plot_model(
    residual_model,
    to_file='residual_model.png',
    show_shapes=True,
    show_layer_names=True
)
print("Graph saved to residual_model.png — the Add() merge is visible")
```
```
Functional params : 3,393
Identical         : True

Residual model params: 9,098
Residual model is impossible to express with Sequential API
Graph saved to residual_model.png — the Add() merge is visible
```
Debugging Common Architecture Errors
The Functional API is more powerful than Sequential, but it surfaces errors in ways that can be cryptic until you understand the pattern behind them. Almost every Functional API error I've seen in production falls into one of four categories, and each has a clear diagnostic approach.
The graph disconnected error is the most common. It means you're trying to include a tensor in your model's computation graph that traces back to an Input() layer not listed in the keras.Model(inputs=[...]) constructor. The fix is always the same: check which Input() layers your tensors come from and make sure all of them are listed.
The None dimensions error typically means a Sequential model is missing an explicit Input() layer, or you are calling model.summary() before the model has processed any data. Adding Input() as the first layer is almost always the fix.
Weight sharing bugs are usually discovered through the parameter count: if your Siamese network has double the expected parameters, you created two separate layer objects instead of calling one shared object twice.
Shape errors during training are best diagnosed visually. plot_model() with show_shapes=True renders the tensor shape at every layer in a saved graph image. Reading model.summary() works but is slower for complex graphs — the visual is much faster for identifying where a dimension mismatch occurs.
```python
# io.thecodeforge.keras.sequential_vs_functional.debugging_example
import keras
from keras import layers

# ── ERROR 1: Graph disconnected — missing input in keras.Model() ────────────
print("=== Demonstrating Graph Disconnected Error ===")

input_a = keras.Input(shape=(128,), name='branch_a')
input_b = keras.Input(shape=(64,), name='branch_b')

branch_a_out = layers.Dense(64, activation='relu')(input_a)
branch_b_out = layers.Dense(64, activation='relu')(input_b)

merged = layers.Concatenate()([branch_a_out, branch_b_out])
output = layers.Dense(10, activation='softmax')(merged)

# WRONG: input_b is missing from the inputs list — will raise graph disconnected error
try:
    bad_model = keras.Model(inputs=input_a, outputs=output)  # input_b not listed!
except Exception as e:
    print(f"Expected error: {type(e).__name__}: {str(e)[:100]}...")

# CORRECT: both input tensors in the list
good_model = keras.Model(inputs=[input_a, input_b], outputs=output)
print(f"Correct model inputs: {[i.name for i in good_model.inputs]}")

# ── ERROR 2: None dimensions in Sequential — missing Input() layer ──────────
print("\n=== Sequential Without Input() — None Dimensions ===")

bad_sequential = keras.Sequential([
    layers.Dense(64, activation='relu'),  # No Input() — shapes are unknown
    layers.Dense(10, activation='softmax')
])
print("Without Input():")
bad_sequential.summary()  # Output Shape column shows (None, None)

good_sequential = keras.Sequential([
    layers.Input(shape=(784,)),  # Input() added — shapes now propagate
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
print("\nWith Input():")
good_sequential.summary()  # Output Shape column shows concrete dimensions

# ── ERROR 3: Weight sharing — wrong approach vs correct approach ────────────
print("\n=== Weight Sharing: Wrong vs Correct ===")

siamese_input_a = keras.Input(shape=(100,), name='left')
siamese_input_b = keras.Input(shape=(100,), name='right')

# WRONG: two separate Dense layers — double the parameters, not weight sharing
out_a_wrong = layers.Dense(64)(siamese_input_a)  # creates Dense layer #1
out_b_wrong = layers.Dense(64)(siamese_input_b)  # creates Dense layer #2 — DIFFERENT WEIGHTS
bad_siamese = keras.Model([siamese_input_a, siamese_input_b],
                          layers.Concatenate()([out_a_wrong, out_b_wrong]))
print(f"Wrong Siamese params: {bad_siamese.count_params():,} (two separate Dense layers)")

# CORRECT: one layer object, called twice — same weights used in both branches
shared_encoder = layers.Dense(64, name='shared_encoder')  # ONE layer object
out_a_correct = shared_encoder(siamese_input_a)  # call #1 — uses shared weights
out_b_correct = shared_encoder(siamese_input_b)  # call #2 — same weights, accumulated gradients
good_siamese = keras.Model([siamese_input_a, siamese_input_b],
                           layers.Concatenate()([out_a_correct, out_b_correct]))
print(f"Correct Siamese params: {good_siamese.count_params():,} (one shared Dense layer called twice)")

# Verify visually
keras.utils.plot_model(good_siamese, 'siamese.png', show_shapes=True, show_layer_names=True)
print("\nSiamese graph saved — shared_encoder node should appear once with two connections")
```
Expected error: ValueError: Graph disconnected: cannot obtain value for tensor 'branch_b' at layer...
Correct model inputs: ['branch_a', 'branch_b']
=== Sequential Without Input() — None Dimensions ===
Without Input():
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 64) 0
dense_1 (Dense) (None, 10) 0
=================================================================
Total params: 0
Trainable params: 0
With Input():
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 64) 50,240
dense_3 (Dense) (None, 10) 650
=================================================================
Total params: 50,890
Trainable params: 50,890
=== Weight Sharing: Wrong vs Correct ===
Wrong Siamese params: 12,928 (two separate Dense layers)
Correct Siamese params: 6,464 (one shared Dense layer called twice)
Siamese graph saved — shared_encoder node should appear once with two connections
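Beyond eyeballing parameter counts, weight sharing can be verified directly in code. A minimal sketch (the layer and variable names here are illustrative, not from the example above): one Dense layer object called on two branches exposes exactly one kernel and one bias, no matter how many times it is called.

```python
import keras
from keras import layers

left = keras.Input(shape=(100,))
right = keras.Input(shape=(100,))

shared = layers.Dense(64, name='shared_encoder')  # one layer object
out = layers.Concatenate()([shared(left), shared(right)])
model = keras.Model([left, right], out)

# One layer object means one kernel and one bias, regardless of call count
print(len(shared.weights))   # 2 — one kernel, one bias
print(model.count_params())  # 6464 — 100*64 + 64, counted once
```

If the two branches had been built with two separate `Dense(64)` objects, `model.count_params()` would report 12,928 instead — the doubled count is the tell.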
- Graph disconnected: an Input() tensor that appears in the graph is not listed in the keras.Model(inputs=[...]) constructor. Trace the error tensor back to its Input() layer, then add that Input() to the list. Every Input() in the graph must be in that list — no exceptions.
- keras.utils.plot_model() is easier to read than model.summary() output for complex graphs and catches tensor dimension mismatches visually.
- Check model.summary() parameter counts carefully for any model with shared layers — the count should reflect sharing, not duplication.
- (None, None) output shapes in a Sequential model mean it lacks an Input() layer as the first element — add layers.Input(shape=(...)) to resolve it.

Autoencoders — A Natural Functional API Pattern
Autoencoders are worth covering explicitly because they demonstrate two Functional API capabilities that Sequential fundamentally cannot support, and they're a common architecture for dimensionality reduction, anomaly detection, generative modelling, and representation learning.
The first capability: sub-model extraction. With the Functional API, you can create multiple Keras Model objects from the same computation graph. The encoder model and the autoencoder model share the same layer objects and the same weights — training the autoencoder updates the encoder's weights, and the encoder model immediately reflects those updated weights. No copying, no re-training, no synchronisation code.
The second capability: conditional graph reuse. You can attach different decoders to the same encoder for experiments — one decoder for image reconstruction, another for masked patch prediction, another for contrastive learning objectives — and all of them share the encoder's weights while each has its own loss function and training data.
This pattern extends directly to any architecture with reusable intermediate representations: vision-language models where the image and text encoders feed different downstream heads, multi-task models where a shared feature extractor drives separate classification and regression heads, and distillation setups where a student encoder is trained to match a teacher encoder's representations.
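The multi-task variant of this pattern can be sketched in a few lines. This is a minimal illustration with made-up layer sizes and head names — one shared feature extractor feeding a classification head and a regression head, each with its own loss:

```python
import keras
from keras import layers

# Shared feature extractor — one set of weights feeds both heads
inputs = keras.Input(shape=(64,), name='features')
shared = layers.Dense(32, activation='relu', name='shared_extractor')(inputs)

# Two task-specific heads branching off the same intermediate tensor
class_out = layers.Dense(3, activation='softmax', name='class_head')(shared)
reg_out = layers.Dense(1, name='reg_head')(shared)

# One model, two outputs — losses assigned per head by output name
multi_task = keras.Model(inputs=inputs, outputs=[class_out, reg_out])
multi_task.compile(
    optimizer='adam',
    loss={'class_head': 'sparse_categorical_crossentropy', 'reg_head': 'mse'},
)
print(len(multi_task.outputs))  # 2
```

Training this model backpropagates both losses through `shared_extractor`, so the shared representation is shaped by both tasks at once.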
```python
# io.thecodeforge.keras.sequential_vs_functional.autoencoder_example
import keras
from keras import layers
import numpy as np

# ── BUILD THE GRAPH — define layers and connect tensors ─────────────────────
encoder_input = keras.Input(shape=(784,), name='original_image')

# Encoder path — progressively compress the representation
encoded = layers.Dense(256, activation='relu', name='enc_256')(encoder_input)
encoded = layers.Dense(128, activation='relu', name='enc_128')(encoded)
latent = layers.Dense(32, activation='relu', name='latent')(encoded)  # 32-dim bottleneck

# Decoder path — reconstruct from the latent representation
decoded = layers.Dense(128, activation='relu', name='dec_128')(latent)
decoded = layers.Dense(256, activation='relu', name='dec_256')(decoded)
reconstructed = layers.Dense(784, activation='sigmoid', name='reconstructed')(decoded)

# ── CREATE MULTIPLE MODELS FROM THE SAME GRAPH ──────────────────────────────
# The autoencoder: input → encoder → decoder → reconstruction
autoencoder = keras.Model(inputs=encoder_input, outputs=reconstructed, name='autoencoder')

# The encoder only: input → latent representation
# This reuses the SAME layer objects — no weight copying, no new parameters
encoder = keras.Model(inputs=encoder_input, outputs=latent, name='encoder')

print("=== Model Parameter Counts ===")
print(f"Autoencoder params: {autoencoder.count_params():,}")
print(f"Encoder params: {encoder.count_params():,}")
print(f"Encoder is subset: {encoder.count_params() < autoencoder.count_params()}")
print()

# Compile and train the autoencoder — unsupervised, input = target
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Generate synthetic data for demonstration
X_demo = np.random.rand(500, 784).astype(np.float32)
autoencoder.fit(X_demo, X_demo, epochs=3, batch_size=32, verbose=1)

print("\n=== Using the Encoder After Training ===")
# Training the autoencoder updated the encoder's weights automatically,
# because they share the same layer objects — no sync needed
X_sample = X_demo[:5]
latent_vectors = encoder.predict(X_sample, verbose=0)
print(f"Input shape: {X_sample.shape}")
print(f"Latent shape: {latent_vectors.shape}")
print(f"Compression ratio: {X_sample.shape[1] / latent_vectors.shape[1]:.0f}x")

# Verify shared weights: the encoder's output is the same as
# the intermediate tensor pulled from the autoencoder
auto_latent = keras.Model(
    inputs=encoder_input,
    outputs=autoencoder.get_layer('latent').output
).predict(X_sample, verbose=0)
print(f"\nEncoder vs autoencoder intermediate: outputs match = {np.allclose(latent_vectors, auto_latent)}")
print("This confirms shared weights — same layer objects, same computation")
```
=== Model Parameter Counts ===
Autoencoder params: 476,720
Encoder params: 237,984
Encoder is subset: True
Epoch 1/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.6932
Epoch 2/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.6927
Epoch 3/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.6921
=== Using the Encoder After Training ===
Input shape: (5, 784)
Latent shape: (5, 32)
Compression ratio: 24x
Encoder vs autoencoder intermediate: outputs match = True
This confirms shared weights — same layer objects, same computation
| Feature | Sequential API | Functional API | Model Subclassing |
|---|---|---|---|
| Architecture type | Linear stack only — one input, one output, no exceptions | Any directed acyclic graph — branching, merging, skipping | Any graph plus dynamic control flow in the forward pass |
| Multi-input models | No — impossible by definition | Yes — pass a list of Input() tensors to keras.Model() | Yes — handled in call() with multiple arguments |
| Multi-output models | No — impossible by definition | Yes — return a list of output tensors from keras.Model() | Yes — return a tuple or dict from call() |
| Shared layers (weight reuse) | No — each layer position is called exactly once | Yes — assign layer to variable, call it on multiple tensors | Yes — call self.layer on multiple inputs in call() |
| Residual / skip connections | No — a layer can only receive the immediately preceding layer's output | Yes — Add()([current_output, earlier_tensor]) is straightforward | Yes — handled imperatively in call() |
| Intermediate sub-model extraction | Awkward — requires layer indexing hacks | Natural — create keras.Model(input, intermediate_tensor) from any tensor | Not supported — graph is not static |
| model.summary() quality | Good for linear stacks — shows concrete shapes when Input() is present | Excellent — shows full graph with shapes at every layer | Partial — shapes not always resolvable without running data through |
| plot_model() readability | Low value for complex linear stacks | High — shows the full DAG visually with tensor shapes | Limited — dynamic graph may not render completely |
| Debugging difficulty | Easiest — errors surface immediately at add() or compile() | Medium — graph disconnected and shape errors are common but diagnosable | Hardest — errors often surface at runtime during training |
| Transfer learning | Awkward — requires accessing pretrained model layers by index | Natural — use base_model.input and base_model.output directly | Natural — call base_model in call() as a layer |
| Keras 3 backend support | Yes — TensorFlow, JAX, PyTorch | Yes — TensorFlow, JAX, PyTorch | Yes — TensorFlow, JAX, PyTorch |
| Best used for | Simple baselines, genuinely linear architectures, teaching examples | Most real production architectures — any non-trivial model | Research prototypes, RL agents, genuinely dynamic architectures |
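The "any Sequential model can be rewritten as Functional" claim in the table is easy to verify directly. A minimal sketch (layer sizes are illustrative): build the same two-layer stack both ways, copy the weights across, and check the outputs agree.

```python
import numpy as np
import keras
from keras import layers

# Same architecture, built two ways
seq = keras.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(4, activation='relu'),
    layers.Dense(2, activation='softmax'),
])

inp = keras.Input(shape=(8,))
x = layers.Dense(4, activation='relu')(inp)
out = layers.Dense(2, activation='softmax')(x)
func = keras.Model(inp, out)

# Copy weights so both models compute the same function
func.set_weights(seq.get_weights())

x_demo = np.random.rand(3, 8).astype(np.float32)
same = np.allclose(seq.predict(x_demo, verbose=0), func.predict(x_demo, verbose=0))
print(same)  # True
```

The reverse conversion fails as soon as the Functional graph branches — there is no linear layer order for Sequential to replay.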
🎯 Key Takeaways
- Sequential API builds linear stacks — one input, one output, no exceptions. Functional API builds any directed acyclic graph. Both produce the same underlying Keras Model with identical computation graphs and zero runtime performance difference.
- Use Sequential for simple baselines and genuinely linear architectures. Switch to Functional the moment you need multi-input, multi-output, skip connections, or weight sharing — and start with Functional if there is any chance of needing these later.
- Weight sharing in Functional API: create one layer object and call it on multiple tensors. Both calls use and update the same weights. Creating new layer objects instead is the most common weight sharing mistake and shows immediately as a doubled parameter count.
- Multi-input and multi-output models require the Functional API — pass a list of Input() tensors to keras.Model(inputs=[...]) and a list of output tensors to keras.Model(outputs=[...]).
- Any Sequential model can be rewritten as a Functional model with the same layers, same weights, and identical outputs. The reverse is not always possible — Functional models with branching cannot be converted to Sequential.
- Transfer learning requires the Functional API — all pretrained models in keras.applications are Functional. Freeze the backbone before Phase 1, recompile after changing trainable flags, and use a learning rate of 1e-5 or lower for Phase 2 fine-tuning.
- In Keras 3, both APIs work identically across TensorFlow, JAX, and PyTorch backends. The API choice is purely about architecture expressiveness, not backend or performance.
- Model Subclassing is for genuinely dynamic architectures — use it at the block level for components with internal logic, and wire those blocks together with the Functional API at the model level.
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- Q: What is the difference between the Keras Sequential and Functional API? (Junior)
- Q: When would you use the Functional API over Sequential, and can you give a concrete example? (Junior)
- Q: How does weight sharing work in the Keras Functional API, and how would you verify it is working correctly? (Mid-level)
- Q: Can you convert a Sequential model to a Functional model? Is the reverse always possible? (Mid-level)
- Q: What is a multi-output model in Keras and when would you build one? (Mid-level)
- Q: When would you choose Model Subclassing over the Functional API for a production model? (Senior)
Frequently Asked Questions
What is the Keras Sequential API?
The Keras Sequential API builds neural networks as a linear stack of layers. You add layers in order with model.add() or pass them as a list to keras.Sequential(). Data flows from the first layer to the last in one straight path with no branching and no skip connections. It is the simplest Keras API, appropriate for feedforward networks, simple CNNs, and linear RNNs where the architecture has no branches. Always include an explicit Input() layer as the first element to enable shape propagation from the start.
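The two construction styles described above produce identical models. A minimal sketch (layer sizes are illustrative):

```python
import keras
from keras import layers

# Style 1: pass the layers as a list to keras.Sequential()
m1 = keras.Sequential([
    layers.Input(shape=(16,)),
    layers.Dense(8, activation='relu'),
    layers.Dense(2, activation='softmax'),
])

# Style 2: add layers one at a time with model.add()
m2 = keras.Sequential()
m2.add(layers.Input(shape=(16,)))
m2.add(layers.Dense(8, activation='relu'))
m2.add(layers.Dense(2, activation='softmax'))

# Same architecture either way — identical parameter counts
print(m1.count_params() == m2.count_params())  # True
```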
What is the Keras Functional API?
The Keras Functional API builds models by explicitly calling layer objects on tensors and connecting them into a computation graph. You create Input() tensors, call layers on them to produce output tensors, and define the model with keras.Model(inputs, outputs). It supports any architecture expressible as a directed acyclic graph: multi-input, multi-output, residual connections, parallel branches, shared layers, and intermediate sub-model extraction. It is the standard API for non-trivial production architectures and all models in keras.applications.
Which Keras API should I use for most projects?
Default to the Functional API for production work. It supports everything Sequential supports and everything Sequential cannot express. The code is slightly more verbose for simple architectures but that verbosity is constant — it does not grow with complexity the way Sequential workarounds do. Use Sequential only when you are certain the architecture is and will remain a strict linear stack with no branching. Use Model Subclassing only for architectures with genuinely dynamic computation graphs.
What is a residual connection and how do you build one with Keras?
A residual connection adds a layer's input directly to its output: output = F(x) + x. Introduced in ResNet to enable training of very deep networks by giving gradients a shortcut path during backpropagation. In the Keras Functional API: shortcut = x; x = layers.Dense(64, activation='relu')(x); x = layers.Dense(64)(x); x = layers.Add()([x, shortcut]); x = layers.Activation('relu')(x). The Add() layer receives two input tensors — one from the current path, one from the shortcut — which is why residual connections require the Functional API and cannot be expressed with Sequential.
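Written out as a runnable sketch (the 64-unit widths follow the snippet above; the input shape is illustrative — note the shortcut and the transformed path must have matching shapes for Add() to work):

```python
import keras
from keras import layers

inputs = keras.Input(shape=(64,))
shortcut = inputs                                # save the block's input
x = layers.Dense(64, activation='relu')(inputs)  # F(x): first transform
x = layers.Dense(64)(x)                          # second transform, no activation yet
x = layers.Add()([x, shortcut])                  # residual add: F(x) + x
x = layers.Activation('relu')(x)                 # activation applied after the merge
model = keras.Model(inputs, x)
model.summary()
```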
Can Keras Functional models be saved and loaded?
Yes. Both Sequential and Functional models use the same serialisation API. Save with model.save('model.keras') — the Keras native format and the default in Keras 3 — and load with keras.models.load_model('model.keras'). To produce a TensorFlow SavedModel for serving, Keras 3 uses model.export('model_dir') rather than a save_format argument. The save format is independent of which Keras API you used to build the model, and the .keras format is portable across backends.
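A quick round-trip sketch (the filename is illustrative): save to the .keras format, reload, and confirm the restored model reproduces the original's predictions.

```python
import numpy as np
import keras
from keras import layers

model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(2, activation='softmax'),
])
model.save('roundtrip_demo.keras')  # Keras 3 native format

restored = keras.models.load_model('roundtrip_demo.keras')
x = np.random.rand(2, 4).astype(np.float32)
match = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(match)  # True
```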
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.