IP Subnetting - The /25 Mask That Broke Internet Access

📍 Part of: Computer Networks → Topic 8 of 22
A /25 subnet mask instead of /24 made EC2 instances unreachable from the internet gateway.
⚙️ Intermediate — basic CS Fundamentals knowledge assumed
In this tutorial, you'll learn
  • What is IP Addressing and Subnetting?
  • CIDR Notation: How to Read and Calculate Hosts
  • Subnet Masks: Binary and Decimal
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • IP addressing uniquely identifies devices; subnetting divides address space into smaller, routable blocks.
  • CIDR notation (e.g., /24) replaces classful addressing and defines how many host bits you get.
  • Hosts = 2^(32 - prefix) - 2; the -2 is for the network and broadcast addresses you cannot assign.
  • Production failure: a /25 instead of /24 changes the network boundary and can make your default gateway unreachable.
  • Performance insight: each wrong bit in a subnet mask can route traffic to the wrong VLAN or blackhole it entirely.
  • Biggest mistake: thinking subnetting is only about saving IPs. It's about routing — wrong mask, wrong network.
🚨 START HERE

Subnet Calculation Quick Cheat Sheet

Use these commands to validate CIDR, mask, and host counts when debugging network designs.
🟡

Need to know how many usable IPs a CIDR provides

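Immediate Action: Run `ipcalc 10.0.1.0/24` (Linux/macOS) or use an online calculator.
Commands
ipcalc 10.0.1.0/24
python3 -c "from ipaddress import ip_address, ip_network; print(ip_address('10.0.1.15') in ip_network('10.0.1.0/24'))"
Fix Now: Use a /23 if you need more than 254 hosts, and avoid /28 for production workloads unless you are sure the service will never need more than 14 addresses.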
🟡

Wondering if two subnets overlap

Immediate Action: Convert both to binary and compare network bits. Tools like subnetcalc.net can do this.
Commands
ipcalc 10.0.1.0/24 10.0.2.0/24
python3 -c "from ipaddress import ip_network; print('overlap' if ip_network('10.0.1.0/24').overlaps(ip_network('10.0.2.0/24')) else 'no')"
Fix Now: Redesign the overlapping blocks. The same address space cannot exist in two parts of your network unless NAT sits between them, and VPC peering requires non-overlapping CIDRs.
🟡

Need the subnet mask from a CIDR prefix length

Immediate Action: Memorise the common ones: /24 = 255.255.255.0, /16 = 255.255.0.0, /8 = 255.0.0.0.
Commands
printf '/24 = %s\n' $(python3 -c "import ipaddress; print(ipaddress.IPv4Network('0.0.0.0/24').netmask)")
Fix Now: For any prefix, compute mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF and write the four bytes as dotted decimal. The final AND keeps the result to 32 bits in languages whose integers are wider than that.
🟡

Need to check if a CIDR is in the RFC 1918 private range

Immediate Action: Use Python's ipaddress module to test against known private ranges.
Commands
python3 -c "from ipaddress import ip_network; net = ip_network('10.0.0.0/8'); print('private' if net.is_private else 'public')"
Fix Now: If a CIDR is public but used internally, ensure it's not leaked via routing. Use private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) to avoid conflicts.
🟡

Need to calculate how many /24 subnets fit into a /20 VPC

Immediate Action: Divide the total addresses: a /20 has 4096 total addresses and a /24 has 256, so 16 /24 subnets fit. Use ipcalc or Python.
Commands
python3 -c "print(2**(24-20))" # gives 16 for /20 -> /24
python3 -c "from ipaddress import ip_network; net = ip_network('10.0.0.0/20'); print(list(net.subnets(new_prefix=24))[:5])"
Fix Now: When planning subnets, start from the largest block and work down. Ensure the total subnets don't exceed available space.
🟡

Need to verify the network address of a host IP

Immediate Action: Use `ipcalc` or Python: `ip_network('10.0.1.55/24', strict=False).network_address`
Commands
python3 -c "from ipaddress import ip_network; net = ip_network('10.0.1.55/24', strict=False); print(net.network_address)"
Fix Now: If the network address doesn't match the subnet's base, you've got a mask mismatch.
🔴

Need to check if a subnet has enough room for a given number of hosts

Immediate Action: Calculate usable hosts: 2^(32 - prefix) - 2. In an AWS subnet, subtract 5 from the total instead of 2. Use Python.
Commands
python3 -c "from ipaddress import ip_network; net = ip_network('10.0.0.0/24'); print('Usable:', net.num_addresses - 2)"
python3 -c "print('AWS usable:', 2**(32-24) - 5)"
Fix Now: Always add 30% headroom. If you need 100 IPs, plan for 130: a /25 (126 usable) falls just short, so use a /24 (254 usable). A /26 (62 usable) is not even close.
Production Incident

The /25 That Killed Internet Access

A single wrong subnet mask blocked all outbound traffic for a production microservice. Here's what happened and how to avoid it.
Symptom: EC2 instances in a public subnet could not reach the internet (yum update failed, API calls timed out), despite having a correct route table with 0.0.0.0/0 pointing to an internet gateway.
Assumption: The route table must be broken; someone must have deleted the default route.
Root cause: The subnet was created with a /25 mask (10.0.1.0/25) instead of the intended /24. A /25 halves the block: 10.0.1.0/25 covers only 10.0.1.0 to 10.0.1.127, and 10.0.1.128 to 10.0.1.255 becomes a separate network. The gateway address 10.0.1.1 sits in the lower half, so anything addressed or configured on the assumption of a /24 but landing in the upper half was on a different subnet from the gateway and could not reach it. The internet gateway attachment, made against the VPC's /16 range, was never the problem.
Fix: Recreated the subnet with the correct CIDR 10.0.1.0/24 and migrated the instances. No route table change was needed.
Key Lesson
  • Always double-check subnet mask boundaries: a /25 vs /24 moves the network boundary and can break connectivity silently.
  • When designing public subnets, use at least /24 to avoid confusion and leave room for growth.
  • Automate subnet creation with infrastructure-as-code and validate CIDR alignment before applying.
  • Validate the subnet mask before attaching an internet gateway; a mismatch can take down outbound traffic silently.
  • After the fix, run a connectivity test from inside the subnet: ping 8.8.8.8 should succeed immediately.
  • When in doubt, use /24. The cost of a larger subnet is zero; the cost of debugging a wrong mask is hours.
Production Debug Guide

Quick reference for diagnosing common subnet-related production issues.

  • EC2 launch fails with 'Insufficient IP addresses': Check subnet size with aws ec2 describe-subnets --subnet-ids <subnet-id> and look at 'AvailableIpAddressCount'. A subnet's CIDR cannot be changed after creation, so create a larger subnet and launch there.
  • Instance gets an IP but cannot reach internet (public subnet): Verify the subnet is associated with a route table that has 0.0.0.0/0 pointing to an internet gateway. Then confirm the internet gateway is attached to the VPC.
  • Two peers cannot communicate over VPC peering: Check for overlapping CIDR blocks between the two VPCs. If they overlap, the peering connection cannot be established.
  • Ping to a neighbour fails but config looks correct: Verify both ends have the same subnet mask. Mismatched masks make one side treat the neighbour as being on a different network.
  • VPN connection failing between on-premises and cloud: Ensure the on-premises CIDR and cloud VPC CIDR do not overlap. If they do, re-address one side or use NAT. Check VPN tunnel status and BGP prefixes.
  • Two routers with the same public IP range but different prefixes cannot peer: Use ipcalc to calculate the network address for both prefixes. If they differ, one router must be reconfigured with a matching mask or the route must be summarised.
  • Application logs show intermittent connectivity to a specific service: Check whether the service runs in a subnet that overlaps with a subnet of another VPC. Overlap can cause asymmetric routing.
  • EC2 instance gets IP but internal traffic to another subnet fails: Verify that the subnet's route table has routes for the destination subnet. A missing route or a mismatched mask on the subnet itself can cause this.
  • Auto Scaling group fails to launch instances due to IP exhaustion: Check the subnet's current available IP count. If it is below 10% of the total, add a larger subnet or distribute across more subnets. Use a /23 or larger for auto-scaling groups.
  • Route summarisation causes traffic blackhole: Verify that the aggregate route covers exactly the subnets you own and nothing else. Use ipcalc or ipaddress.collapse_addresses to check for gaps.

Every packet that crosses your network — an API call, a database query, a Kubernetes pod talking to another — carries a source and destination IP. Without IP addressing, you're just shouting into the void. Get the subnet mask wrong and traffic doesn't just slow down. It stops.

IPv4 has about 4.3 billion addresses, and we ran out years ago. Subnetting, CIDR, and private ranges are the engineering fixes that made the internet keep working. They're baked into every VPC, every router config, every cloud environment you'll ever touch.

Here's the truth: most engineers won't calculate subnets by hand daily. But the one time you need to, a single wrong mask can silence an entire production fleet. That's why you need to understand it — not just pass a cert exam.

This guide covers CIDR math, binary masks, the /25 that broke internet access, and the Python commands that'll save you from subnet calculators forever.

The real cost of a misconfigured subnet isn't just wasted IPs – it's hours of debugging, missed SLAs, and sometimes a full incident post-mortem. That's why this guide focuses on what actually breaks and how to fix it fast.

What is IP Addressing and Subnetting?

An IP address is a 32-bit binary number, usually written in dotted decimal. The subnet mask separates the address into a network part and a host part. Routers use the network part to forward packets; the host part identifies a specific device on that network. Subnetting splits one large network into smaller ones, which reduces routing table size, improves security, and conserves addresses. Without it, every router on the internet would need to know the location of every single host, which is an impossible task. In production, the key insight is that the boundary between network and host is purely a design decision: you choose the mask. Choose wrong and you either waste addresses or break routing.

Here's the mental model: think of an IP address as a phone number. The area code is the network part, the local number is the host part. Routers only care about the area code to forward your call to the right exchange. Subnetting lets you create new area codes within a city — without it, every router would need to know every local number individually. That doesn't scale.

One more angle: subnetting also creates security boundaries. A router won't forward broadcast traffic across subnets. That means a misconfigured device can't flood your whole network with ARP requests if it's locked to its own subnet. That's a feature you'll appreciate after your first broadcast storm.

Let me tell you something I learned the hard way: when a mask is off by one bit, it's not just a little wrong. It's completely wrong. In production, a /25 instead of a /24 halves the network, and a default gateway that was local under the /24 can end up on the far side of the new boundary. The host ANDs its own address with its mask, concludes the gateway is not directly connected, and has no usable path to it. No error. No log entry. Just silence.

You might think you'll never make that mistake. But I've seen it three times in the last two years alone. Each time the engineer stared at the route table for hours before someone finally checked the subnet mask. Always verify the mask first.
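To make the boundary shift concrete, here is a minimal Python sketch using the standard `ipaddress` module. The host address 10.0.1.200 and the helper name `gateway_on_link` are assumptions chosen for illustration; only the gateway at 10.0.1.1 and the /24 vs /25 comparison come from the text above.

```python
import ipaddress

def gateway_on_link(host_ip: str, prefix: int, gateway_ip: str) -> bool:
    """Return True if the gateway lies inside the network the host derives from its own mask."""
    # strict=False accepts a host address and masks off the host bits for us
    host_net = ipaddress.ip_network(f"{host_ip}/{prefix}", strict=False)
    return ipaddress.ip_address(gateway_ip) in host_net

# Hypothetical host in the upper half of the /24; gateway at .1 as in the incident
host, gateway = "10.0.1.200", "10.0.1.1"
for prefix in (24, 25):
    net = ipaddress.ip_network(f"{host}/{prefix}", strict=False)
    print(f"/{prefix}: host network {net}, gateway on-link: {gateway_on_link(host, prefix, gateway)}")
```

With the /24 the host's network is 10.0.1.0/24 and the gateway is local. With the /25 the same host computes 10.0.1.128/25, and 10.0.1.1 is no longer part of it.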

io/thecodeforge/subnetting/SubnetDemo.java · JAVA
package io.thecodeforge.subnetting;

public class SubnetDemo {
    public static void main(String[] args) {
        String ip = "192.168.1.55";
        String mask = "255.255.255.0";
        String network = ipAndMask(ip, mask);
        System.out.println("Network: " + network);
    }

    static String ipAndMask(String ip, String mask) {
        // split() takes a regex, so the dot must be escaped as "\\." in a Java string literal
        String[] ipParts = ip.split("\\.");
        String[] maskParts = mask.split("\\.");
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            int ipPart = Integer.parseInt(ipParts[i]);
            int maskPart = Integer.parseInt(maskParts[i]);
            result.append(ipPart & maskPart);
            if (i < 3) result.append(".");
        }
        return result.toString();
    }
}
▶ Output
Network: 192.168.1.0
🔥Forge Tip:
Type this code yourself rather than copy-pasting. The muscle memory of writing it will help it stick.
📊 Production Insight
Confusing network bits with host bits is the single most common failure point in production routing.
Misaligned subnet masks cause silent blackholes — traffic is sent to a default gateway that cannot reach the destination.
Rule: always verify mask consistency on both ends of any routed link.
🎯 Key Takeaway
IP addressing uniquely identifies devices; subnetting carves address space into smaller routable blocks.
The network + host bit split is defined by the subnet mask — get it wrong and routing breaks.
Without subnetting, the global routing table would be unmanageable.
When to Use Subnetting
If: Single flat network, < 100 devices
Use: No subnetting needed; use /24 directly.
If: Multiple departments or security zones
Use: Create subnets per department to enforce isolation.
If: Public-facing vs private services
Use: Separate public and private subnets with NAT gateway.
If: Large enterprise
Use: Use hierarchical subnetting with route summarization.

CIDR Notation: How to Read and Calculate Hosts

CIDR (Classless Inter-Domain Routing) notation replaced the rigid classful system (A, B, C) back in the '90s. Instead of assuming network boundaries based on the first octet, you specify the prefix length explicitly: 192.168.1.0/24 means the first 24 bits are the network prefix, and the remaining 8 bits are host bits. That gives you 2^8 = 256 total addresses, but you lose two: the network address (all host bits 0) and the broadcast address (all host bits 1). So usable hosts = 2^(32 - prefix) - 2. For /24, that's 254. For /16, it's 65534. For /28, it's 14 — way too small for most production workloads.

The formula is simple, but the production trap is thinking that 'size' means usable hosts. I've seen teams provision a /28 for an API service that needed 20 IPs per AZ, then scramble to redesign after the launch failed. Always add 20-30% buffer.

CIDR also enabled route aggregation (supernetting), which dramatically shrinks the global routing table. Before CIDR, the internet was running out of routes. Now, a single /8 aggregate can represent millions of addresses.
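Route aggregation can be tried out locally. The sketch below uses `ipaddress.collapse_addresses` from the Python standard library; the specific 10.0.x.0/24 blocks are illustrative values, not taken from a real routing table.

```python
import ipaddress

# Four contiguous /24s that a router could advertise as a single route
subnets = [ipaddress.ip_network(f"10.0.{i}.0/24") for i in range(4)]
aggregate = list(ipaddress.collapse_addresses(subnets))
print(aggregate)

# With 10.0.2.0/24 missing, the blocks no longer merge into one prefix
with_gap = [ipaddress.ip_network(c) for c in ("10.0.0.0/24", "10.0.1.0/24", "10.0.3.0/24")]
collapsed_gap = list(ipaddress.collapse_addresses(with_gap))
print(collapsed_gap)
```

The first list collapses to a single /22. The second stays as two prefixes, which is the signal that a hand-written /22 summary would also claim address space you do not own.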

Here's a quick way to estimate: for any /X, usable hosts ≈ 2^(32-X). For /24, that's ~250. For /23, ~500. For /22, ~1000. The pattern doubles each time you reduce the prefix by 1. So /16 gives ~65000. That's your mental shortcut.

Another common mistake: confusing the CIDR notation with the subnet mask. When someone says "the CIDR is 255.255.255.0", they mean the mask, not the prefix. CIDR notation is /24. Keep that straight in team discussions.

A production-grade tip: always document your CIDR blocks in a central spreadsheet or IPAM tool. I've seen teams waste hours because they didn't know which /24 was already used. Automation is your friend here.

One more thing: in cloud environments, subnet sizes are often limited by the provider's reserved addresses. In AWS, every subnet loses 5 IPs, not 2. So a /28 gives you only 11 usable IPs. Your 14-host formula is wrong for AWS. Always check the cloud provider's documentation.
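The difference between the classic count and the AWS count is a one-parameter change. This is a small sketch; the function name `usable_hosts` and its `reserved` parameter are assumptions made for the example, with the reserved counts (2 and 5) taken from the text above.

```python
def usable_hosts(prefix: int, reserved: int = 2) -> int:
    """Addresses left in an IPv4 subnet after removing the reserved ones.

    reserved=2 is the classic network + broadcast pair;
    reserved=5 models the five addresses AWS keeps in every subnet.
    """
    return max(2 ** (32 - prefix) - reserved, 0)

for prefix in (24, 26, 28):
    print(f"/{prefix}: classic {usable_hosts(prefix)}, AWS {usable_hosts(prefix, reserved=5)}")
```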

And don't forget about overlapping CIDRs: if you accidentally assign the same /24 to two subnets, routing chaos follows. Use a central IPAM tool to prevent that.

io/thecodeforge/subnetting/cidr_calculator.py · PYTHON
import ipaddress

def cidr_info(cidr: str) -> dict:
    net = ipaddress.IPv4Network(cidr, strict=False)
    return {'network_address': str(net.network_address), 'broadcast_address': str(net.broadcast_address), 'netmask': str(net.netmask), 'prefix_length': net.prefixlen, 'total_addresses': net.num_addresses, 'usable_hosts': net.num_addresses - 2}

# Usage:
for cidr in ['192.168.1.0/24', '10.0.0.0/16', '172.16.0.0/28']:
    info = cidr_info(cidr)
    print(f'{cidr} -> {info["usable_hosts"]} usable hosts')
Mental Model
Mental Model: CIDR as a Sliding Window
Think of the IP address as a 32-bit binary string. The CIDR prefix length is a sliding window that controls how many bits belong to the network.
  • The network bits are fixed and define the neighbourhood.
  • The host bits are variable — they define the specific house.
  • Shorter prefix (/16) = more hosts, fewer networks.
  • Longer prefix (/28) = fewer hosts, more networks. At the extreme, /30 or /31 is what you use for point-to-point links.
  • Each reduction in prefix by 1 doubles the number of hosts.
📊 Production Insight
I once saw a team use /28 for a Kubernetes node subnet — they hit IP exhaustion in three days because each node consumes one IP plus pods get IPs from the same block.
Always choose a /24 as the default for any subnet that might grow.
Rule: when in doubt, pick /24 — it fits most workloads and leaves room to breathe.
🎯 Key Takeaway
Hosts = 2^(32 - prefix) - 2.
The -2 is non-negotiable: network and broadcast addresses cannot be assigned.
Plan for 30% growth — running out of IPs in a subnet is a production incident you can avoid.
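The sizing rule can be turned into a quick check. The sketch below is one possible way to do it, not a standard tool: `smallest_prefix` is a hypothetical helper, and the 30% default simply mirrors the growth figure quoted above.

```python
import math

def smallest_prefix(hosts_needed: int, headroom: float = 0.30, reserved: int = 2) -> int:
    """Longest prefix (smallest subnet) whose usable count covers hosts_needed plus headroom."""
    target = math.ceil(hosts_needed * (1 + headroom))
    # Walk from small subnets (/30) towards large ones (/0) and stop at the first fit
    for prefix in range(30, -1, -1):
        if 2 ** (32 - prefix) - reserved >= target:
            return prefix
    raise ValueError("requirement does not fit in the IPv4 address space")

print(smallest_prefix(20))               # small service, classic reservation
print(smallest_prefix(100))              # 100 hosts plus 30% no longer fits a /25
print(smallest_prefix(100, reserved=5))  # same request sized for an AWS subnet
```

Passing `reserved=5` applies the AWS reservation instead of the classic two addresses.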
Choosing the Right CIDR Size
If: Single point-to-point link (two devices)
Use: /30 (2 usable addresses) or /31 (2 addresses, both usable on point-to-point links under RFC 3021, even though the classic formula gives 0)
If: Small production service (< 50 IPs needed)
Use: /26 (62 usable) or /25 (126 usable), leaving 30% headroom
If: Standard tier (up to 200 IPs)
Use: /24 (254 usable), the industry standard for most subnets
If: Large subnet (hundreds of IPs, e.g., private network)
Use: /20 (4094 usable) or a shorter prefix, but avoid /8 unless you're a large enterprise

Subnet Masks: Binary and Decimal

The subnet mask is a 32-bit number that, in binary, has a contiguous block of 1s for the network portion followed by 0s for the host portion. The dotted decimal representation (e.g., 255.255.255.0) is just a human-friendly way to write those 32 bits. Convert each octet to decimal and you get the familiar mask. /24 = 255.255.255.0; /16 = 255.255.0.0; /8 = 255.0.0.0.

But here's the production trap: you can't always trust the dotted decimal. I've seen config files where someone meant to type 255.255.254.0 (/23) but a typo produced 255.255.240.0 (/20). The device accepted it, but routing broke silently because the network boundaries changed. Always cross-verify the binary representation, especially when editing configs manually.

Let's walk through an example: IP 192.168.1.55 with mask 255.255.255.0. Binary: 11000000.10101000.00000001.00110111. AND with mask gives 11000000.10101000.00000001.00000000 = 192.168.1.0 (network). The host part is 00110111 = 55. If mask were 255.255.254.0, the network would be 192.168.0.0, and 192.168.1.55 would be part of that network — completely different routing behaviour.
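The same walkthrough can be reproduced with plain integer arithmetic. This is a minimal sketch with no library help, so the AND is visible; the helper names `to_int`, `to_dotted`, and `network_of` are made up for the example.

```python
def to_int(dotted: str) -> int:
    """Pack a dotted-decimal string into a single 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into dotted decimal."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def network_of(ip: str, mask: str) -> str:
    """Network address = IP AND mask, applied across all 32 bits."""
    return to_dotted(to_int(ip) & to_int(mask))

print(f"{to_int('192.168.1.55'):032b}")              # the 32 bits of the address
print(network_of("192.168.1.55", "255.255.255.0"))   # 192.168.1.0
print(network_of("192.168.1.55", "255.255.254.0"))   # 192.168.0.0
```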

A quick way to read a mask: look at the first octet that is not 255. For /23, the mask is 255.255.254.0; the third octet is 254, which is 11111110 in binary, meaning 7 network bits and 1 host bit in that octet. Add the 8 host bits of the last octet and you have 9 host bits, so a /23 holds 2^9 = 512 addresses. Subtracting that octet from 256 gives the block size: 256 - 254 = 2, so /23 networks start at every second value of the third octet (0, 2, 4, ...). Once this gets fiddly, it's easier to think in prefix length.

Another trap: a non-contiguous subnet mask like 255.128.128.0 is invalid. The binary must be a continuous run of 1s from the left. Always check with ipcalc if you're unsure.

One more nuance: some legacy systems use "wildcard masks" (inverse masks) for OSPF or ACLs. That's the bitwise NOT of the subnet mask. Don't confuse them. A wildcard of 0.0.0.255 matches a /24 network, but it's written as inverted bits.
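Python's `ipaddress` module exposes the wildcard form as `hostmask`, which makes the relationship easy to check. The helper `wildcard_of` below is a hypothetical name used only to show the bitwise NOT explicitly.

```python
import ipaddress

net = ipaddress.ip_network("10.0.1.0/24")
print(net.netmask)   # subnet mask
print(net.hostmask)  # wildcard (inverse) mask for the same prefix

def wildcard_of(mask: str) -> str:
    """Bitwise NOT of a dotted-decimal subnet mask, kept to 32 bits."""
    inverted = ~int(ipaddress.ip_address(mask)) & 0xFFFFFFFF
    return str(ipaddress.ip_address(inverted))

print(wildcard_of("255.255.254.0"))
```

The `& 0xFFFFFFFF` matters because Python integers are unbounded, so a bare `~` would produce a negative number instead of a 32-bit pattern.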

Here's a story from the field: a colleague once typed 255.255.255.255 by accident (a /32) instead of 255.255.255.0 for an interface. The interface came up but no traffic could reach the subnet, because with a /32 the router treated its own address as a one-host network with no directly connected neighbours. It took three hours to find the typo. Always use automation to validate masks.

Another quick sanity check: if you see a mask like 255.255.256.0, that's invalid because 256 is out of range. Catch those before they hit production.

io/thecodeforge/subnetting/mask_utils.py · PYTHON
import ipaddress

def mask_to_cidr(mask: str) -> int:
    """Convert dotted decimal mask to CIDR prefix length."""
    net = ipaddress.IPv4Network(f'0.0.0.0/{mask}', strict=False)
    return net.prefixlen

def cidr_to_mask(prefix: int) -> str:
    """Convert CIDR prefix to dotted decimal mask."""
    return str(ipaddress.IPv4Network(f'0.0.0.0/{prefix}', strict=False).netmask)

# Examples
print(mask_to_cidr('255.255.255.0'))   # 24
print(cidr_to_mask(16))                # 255.255.0.0

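def validate_mask(mask: str) -> bool:
    """Check that mask has four octets in 0-255 and is contiguous 1s followed by 0s."""
    import re
    octets = mask.split('.')
    # Reject wrong octet counts and out-of-range values such as 256
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        return False
    binary = ''.join(f'{int(octet):08b}' for octet in octets)
    # 1* and 0* (rather than 1+ and 0+) so that /0 and /32 are accepted as valid masks
    return re.fullmatch(r'1*0*', binary) is not None

print(validate_mask('255.255.255.0'))  # True
print(validate_mask('255.128.128.0'))  # False
print(validate_mask('255.255.256.0'))  # False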
⚠ Warning: Mask Mismatch Breaks Routing
Two devices on the same wire must agree on the subnet mask. If one uses /24 and the other /25, they will disagree on whether an IP is local or remote, causing packets to be sent to the default gateway even for neighbours that are directly connected. This is a classic silent failure that only shows up as dropped pings.
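The disagreement is easy to demonstrate. In this sketch the two addresses and the helper name `sees_as_local` are assumptions for illustration; host A is configured with /24 and host B with /25 on the same wire.

```python
import ipaddress

def sees_as_local(own_ip: str, own_prefix: int, peer_ip: str) -> bool:
    """True if own_ip, using its own prefix, considers peer_ip directly connected."""
    own_net = ipaddress.ip_interface(f"{own_ip}/{own_prefix}").network
    return ipaddress.ip_address(peer_ip) in own_net

a_ip, b_ip = "10.0.1.10", "10.0.1.200"   # hypothetical hosts on one segment
a_view = sees_as_local(a_ip, 24, b_ip)    # A: /24, network 10.0.1.0/24
b_view = sees_as_local(b_ip, 25, a_ip)    # B: /25, network 10.0.1.128/25
print(f"A thinks B is local: {a_view}")
print(f"B thinks A is local: {b_view}")
```

A resolves B directly and sends frames straight to it, while B hands its replies to the default gateway. One direction works, the other depends on the gateway, which is why the symptom looks like random packet loss rather than a configuration error.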
📊 Production Insight
When debugging an inter-VLAN routing issue, the first thing to check is mask consistency on both ends.
I once spent four hours chasing a routing table problem that turned out to be a /23 mask on one router and /24 on its peer.
Rule: after any interface config change, run show ip interface (Cisco) or ip addr show (Linux) and verify the mask matches the documentation.
🎯 Key Takeaway
Subnet mask defines the network boundary.
Both ends must agree — mismatch causes silent packet drops.
Always validate masks in binary or use automated tools to avoid typos.
Debugging Mask Mismatch
If: Two VMs on same VLAN cannot ping each other
Use: Check subnet mask on both VMs. If different, they see different network boundaries. Set both to the same mask.
If: VM can ping gateway but not other hosts on same subnet
Use: Gateway might have a wrong mask or the routing table might be overriding the directly connected route. Check route -n (Linux).

Private IP Ranges and RFC 1918: Why 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 Are Everywhere

RFC 1918 reserves three blocks of IPv4 addresses for private use: 10.0.0.0/8 (16.7 million addresses), 172.16.0.0/12 (about 1 million), and 192.168.0.0/16 (65,536). These addresses are not routable on the public internet; they're meant for internal networks. That's why most home routers use 192.168.x.x, and most AWS VPCs use a range from 10.x.x.x or 172.16.x.x through 172.31.x.x.

The choice between them is about scale. 10.0.0.0/8 is huge — you can build a sprawling enterprise network without overlapping. 172.16.0.0/12 is good for medium-sized orgs. 192.168.0.0/16 is tiny and often leads to collisions when companies merge or need to peer VPCs. Production lesson: never use 192.168.0.0/16 for a corporate network — you'll hit address collisions the moment you need to connect to a partner or acquire another company.

You can also use public IP ranges internally if you control them (uncommon). But the standard practice is to pick a /16 from the 10.x range for your VPC and subnet from there. This gives you flexibility and sidesteps the collision risk that the heavily reused 192.168 range brings.

One more thing: don't forget about RFC 6598 (Carrier-Grade NAT space: 100.64.0.0/10). This is used by ISPs for CGNAT, but you might encounter it in shared environments. Avoid using it internally unless you're building an ISP network.

Also note: just because you can't route private IPs on the internet doesn't mean they can't be leaked. Misconfigured BGP can advertise private ranges. Always filter outbound routes to your upstream provider.

A real-world story: A startup used 192.168.0.0/16 for their entire infrastructure. When they tried to connect to a customer's VPN that also used 192.168.0.0/16, routing fell apart. They had to re-IP their whole network over a weekend. Don't be that team.

Another lesson: when you acquire a company, the first thing to check is their private IP range. If you both use 10.0.0.0/16, you'll need to re-address one side or use NAT. That's expensive and error-prone. Plan ahead.

Here's a quick tip: if you're designing a multi-cloud environment, use a different /16 for each cloud provider. That way, peering between clouds won't cause conflicts.
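One way to hand out a distinct /16 per provider is to carve them from a single 10.0.0.0/8 plan. The sketch below does that with `ipaddress`; the provider labels and the choice of taking the first blocks in order are assumptions for the example, not a recommended numbering scheme.

```python
import ipaddress

supernet = ipaddress.ip_network("10.0.0.0/8")
providers = ["aws", "gcp", "azure"]  # illustrative labels only

# subnets() is a lazy generator; zip stops once every provider has a block
allocation = dict(zip(providers, supernet.subnets(new_prefix=16)))
for name, block in allocation.items():
    print(f"{name}: {block}")

# Sanity check: no two allocated blocks overlap
blocks = list(allocation.values())
no_overlap = not any(a.overlaps(b) for i, a in enumerate(blocks) for b in blocks[i + 1:])
print("non-overlapping:", no_overlap)
```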

io/thecodeforge/subnetting/rfc1918_check.py · PYTHON
import ipaddress

def is_private(ip_cidr: str) -> bool:
    """Check if a CIDR block is within RFC 1918 private ranges."""
    net = ipaddress.IPv4Network(ip_cidr, strict=False)
    private_ranges = [
        ipaddress.IPv4Network('10.0.0.0/8'),
        ipaddress.IPv4Network('172.16.0.0/12'),
        ipaddress.IPv4Network('192.168.0.0/16'),
    ]
    return any(net.subnet_of(pr) for pr in private_ranges)

def is_cgnat(ip_cidr: str) -> bool:
    net = ipaddress.IPv4Network(ip_cidr, strict=False)
    cgnat = ipaddress.IPv4Network('100.64.0.0/10')
    return net.subnet_of(cgnat)

# Examples
print(is_private('10.0.1.0/24'))       # True
print(is_cgnat('100.64.1.0/24'))       # True
💡AWS VPC Default CIDR
When you create a default VPC in AWS, it uses 172.31.0.0/16 — that's from the 172.16.0.0/12 private range. It's a valid choice, but if you later need to peer with another VPC using 172.31.0.0/16, you're stuck. Always plan your private range allocation centrally.
📊 Production Insight
A common failure: two divisions of the same company both used 192.168.1.0/24 for their internal networks. When they tried to interconnect via VPN, routing broke because the same IPs existed on both sides.
Solution: Use NAT on one side or re-address one network — both are painful.
Rule: reserve a unique /16 from 10.0.0.0/8 for each business unit or environment.
🎯 Key Takeaway
Private IP ranges are free to use internally but must never leak to the internet.
Choose 10.0.0.0/8 for flexibility; avoid 192.168.0.0/16 for corporate networks.
Plan your private address allocation globally to prevent collision-induced routing nightmares.
Choosing a Private Range
If: Small office or home network (< 100 devices)
Use: Use 192.168.x.x /24. It is simple and widely supported.
If: Corporate network with multiple sites
Use: Use 10.x.x.x with a /16 per site. That gives plenty of room and avoids overlap.
If: Cloud VPC for a startup (single region)
Use: Use 10.0.0.0/16. It leaves room for expansion and works with any cloud.
If: Enterprise with multi-cloud / hybrid
Use: Use 10.x.x.x with a global /8 allocation plan, coordinated centrally to avoid overlap.

Designing Subnets in AWS VPC: A Real-World Example

Let's design a VPC for a typical three-tier web application. We'll use the private IPv4 range 10.0.0.0/16 (65,536 addresses in total). We need:
  • Public subnets for load balancers and NAT gateways (at least 2 AZs, small)
  • Private subnets for application servers (more IPs needed for scaling)
  • Database subnets (locked down, no internet access)

Best practice is to allocate contiguous blocks to keep routing simple. Here's a sample design:
  • Public: 10.0.1.0/24 (us-east-1a), 10.0.2.0/24 (us-east-1b), 254 IPs each
  • App: 10.0.10.0/23 (512 IPs), 10.0.12.0/23, enough for auto-scaling groups
  • DB: 10.0.20.0/24, 10.0.21.0/24; RDS takes one IP per instance plus Multi-AZ

Notice we left gaps (10.0.3.0-9.0) for future use. That's the planning rule: never fill a VPC completely. Leave at least 30% address space unallocated. Production lesson: I once saw a VPC with 90% utilisation because someone allocated 10.0.0.0/16 into /24s end-to-end. When a new service needed a new subnet, they had to rebuild the VPC.
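The "leave space unallocated" rule is easy to measure. This is a rough sketch that assumes the subnets do not overlap and all sit inside the VPC; `vpc_utilisation` is a hypothetical helper, and the CIDRs are the sample design from above.

```python
import ipaddress

def vpc_utilisation(vpc_cidr: str, subnet_cidrs: list) -> float:
    """Fraction of the VPC's address space covered by the given subnets.

    Assumes the subnets are non-overlapping and contained in the VPC.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    allocated = sum(ipaddress.ip_network(c).num_addresses for c in subnet_cidrs)
    return allocated / vpc.num_addresses

sample_design = [
    "10.0.1.0/24", "10.0.2.0/24",    # public
    "10.0.10.0/23", "10.0.12.0/23",  # app
    "10.0.20.0/24", "10.0.21.0/24",  # database
]
used = vpc_utilisation("10.0.0.0/16", sample_design)
print(f"allocated: {used:.1%}, free: {1 - used:.1%}")
```

Running a check like this in CI against the subnet list gives an early warning long before a VPC approaches the 90% figure mentioned above.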

A pro tip: use this same design pattern in AWS by creating subnets with explicit CIDR blocks in CloudFormation or Terraform. Validate that no two subnets overlap and that all are within the VPC CIDR.

One more nuance: AWS reserves 5 IPs per subnet, not just the 2 that the classic formula subtracts. For a /24, you lose .0 (network), .1 (router), .2 (DNS), .3 (future), and .255 (broadcast). That's 5 IPs gone, so you really have 251 usable, not 254. Factor that into your capacity planning.
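For any subnet, the five reserved addresses described above (the first four plus the last) can be listed programmatically. This is a sketch of that rule as stated in this guide; `aws_reserved` is a hypothetical helper and does not call any AWS API.

```python
import ipaddress

def aws_reserved(cidr: str) -> list:
    """First four addresses plus the last address of the subnet, as dotted strings."""
    net = ipaddress.ip_network(cidr)
    first_four = [net.network_address + offset for offset in range(4)]
    return [str(addr) for addr in first_four + [net.broadcast_address]]

print(aws_reserved("10.0.1.0/24"))
print(aws_reserved("10.0.10.0/23"))  # the last address falls in the next octet
```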

Also note: when you use a NAT Gateway in a public subnet, it consumes an Elastic IP and one usable IP from that subnet. Make sure your public subnets have enough headroom for both NAT Gateways and future ALB/NLBs.

A lesson from the field: I've seen teams run out of IPs in their app subnet because they didn't account for the fact that each pod in EKS gets its own VPC IP. A /24 tops out at 251 addresses shared between nodes and pods. That is fine for small clusters, but a production cluster can blow through it in days. Use a /20 for pod subnets.

Another trap: when you create a VPC, you must also consider the CIDR for future peering. If you use 10.0.0.0/16 and later peer with another VPC that also uses 10.0.0.0/16, you'll have overlapping CIDRs and peering will be impossible. Allocate from a planned 10.0.0.0/8 scheme, or at least use different /16s for different environments.

And don't forget about the bastion host: if you need to SSH into private instances, you'll need a jump box in a public subnet. That public subnet should be sized to allow at least one EC2 instance plus the NAT Gateway.

io/thecodeforge/subnetting/vpc_designer.py · PYTHON
import ipaddress

def generate_subnets(vpc_cidr: str,
        subnet_cidrs: list) -> list:
    vpc = ipaddress.IPv4Network(vpc_cidr, strict=False)
    subnets = [ipaddress.IPv4Network(c, strict=False) for c in subnet_cidrs]
    # Validate all subnets are within VPC and non-overlapping
    for s in subnets:
        if not vpc.supernet_of(s):
            raise ValueError(f'{s} is not within {vpc_cidr}')
    for i, s1 in enumerate(subnets):
        for s2 in subnets[i+1:]:
            if s1.overlaps(s2):
                raise ValueError(f'{s1} overlaps with {s2}')
    return [str(s) for s in subnets]

# Example design
design = generate_subnets(
    '10.0.0.0/16',
    ['10.0.1.0/24', '10.0.2.0/24', '10.0.10.0/23', '10.0.12.0/23', '10.0.20.0/24', '10.0.21.0/24']
)
print('Valid design:', design)
🔥AWS VPC IP Reservation
AWS reserves the first four IP addresses and the last one in every subnet. For a /24, they occupy 10.0.1.0, .1, .2, .3 and .255. So you really have only 251 usable IPs (256 - 5). Factor this into sizing; a /28 with 16 total minus 5 leaves only 11 usable IPs, barely enough for a single application tier.
📊 Production Insight
The most common subnet design failure in AWS is running out of IPs in a subnet because growth wasn't forecast.
A /28 might seem fine for a Proof-of-Concept, but once it goes live with auto-scaling, you'll exhaust it within a week.
Rule: always allocate subnets with at least 3x your initial estimated need — IPs are free, subnet redesigns are not.
🎯 Key Takeaway
Design subnets with 3x headroom.
AWS reserves 5 IPs per subnet — account for that.
Leave large gaps in the VPC CIDR for future services.
Choosing Subnet Size for Tiers
If: Public subnet with NAT Gateway and a few instances
Use: Use /24, which leaves enough IPs for NAT, ELB, and a small number of EC2s
If: Application subnet with auto-scaling (up to 100 instances)
Use: Use /23 (512 IPs) to leave room for peak scaling. /24 can run out during deployments.
If: Database subnet with RDS Multi-AZ
Use: Use /24. Each DB cluster consumes 1 IP, plus you may need replica instances. Rarely needs more.

Common Subnetting Mistakes and How to Fix Them

After years of debugging network problems, I've seen the same patterns over and over. Here are the top three:

  1. Overlapping subnets: When two subnets in different VPCs (or the same VPC!) overlap, routing becomes unpredictable because the router can't tell which one is the correct destination. AWS refuses to create a VPC peering connection when the two VPCs' CIDRs overlap.
  2. Wrong gateway IP: The default gateway is not always the first usable IP. In AWS, the first IP (.1) is the VPC router, but in on-premises networks the gateway might be .254 or something else. Hardcoding .1 as the gateway is a common mistake when migrating from cloud to on-prem.
  3. Forgetting the broadcast address: Some applications accidentally use the broadcast address as a host IP. When that happens, traffic to that 'host' is delivered to every device on the subnet, causing performance issues and mysterious packet loss.

These are the mistakes that cause 'can't reproduce in dev' incidents. Always validate your subnet plan with automation.

Another mistake: using non-contiguous mask bits (e.g., 255.255.255.128 is fine because it's contiguous, but a mask like 255.128.128.0 is invalid). Always ensure the binary mask is a continuous string of 1s followed by 0s.
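A minimal way to test the "continuous 1s followed by 0s" rule is a bit trick on the inverted mask. The function name below is hypothetical; the check itself only relies on `ipaddress.IPv4Address` for parsing.

```python
import ipaddress

def is_contiguous_mask(mask: str) -> bool:
    """True if the dotted-decimal mask is a run of 1 bits followed only by 0 bits."""
    m = int(ipaddress.IPv4Address(mask))
    inverted = ~m & 0xFFFFFFFF
    # A valid mask inverts to 0b000...0111...1, i.e. a number of the form 2**k - 1.
    # Such numbers share no bits with their successor.
    return (inverted & (inverted + 1)) == 0

print(is_contiguous_mask('255.255.255.128'))  # contiguous
print(is_contiguous_mask('255.128.128.0'))    # has a gap in the 1 bits
```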

One more trap: using a default subnet size without thinking about the service requirements. I've seen teams use /24 for a point-to-point VPN link, wasting 252 IPs. Use /30 or /31 for those links to conserve address space.
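To put numbers on that waste, here is a small sketch comparing prefixes for a two-endpoint link. It assumes the usual rule that /31 links (RFC 3021) use both addresses; the function name is illustrative.

```python
def usable_hosts(prefix: int) -> int:
    """Usable IPv4 host addresses for a prefix length."""
    total = 2 ** (32 - prefix)
    if prefix >= 31:
        # /31 point-to-point links use both addresses; /32 is a single host
        return total
    return total - 2  # subtract network and broadcast addresses

ENDPOINTS = 2  # a point-to-point link needs exactly two addresses
for prefix in (24, 30, 31):
    print(f'/{prefix}: {usable_hosts(prefix)} usable, '
          f'{usable_hosts(prefix) - ENDPOINTS} wasted')
```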

A hidden mistake: forgetting that subnets need to be sized for high availability. In AWS, if you lose an Availability Zone, the remaining AZ must handle all traffic. That means your subnet in the surviving AZ must have enough IPs to accommodate all instances. Plan for AZ failure — not just normal operation.
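The AZ-failure rule can be turned into a quick calculation. This is only a sketch: the function name and the example fleet size of 120 instances are assumptions for illustration.

```python
import math

def ips_needed_per_az(total_instances: int, az_count: int, az_failures: int = 1) -> int:
    """IPs each surviving AZ subnet must provide if az_failures AZs go down."""
    surviving = az_count - az_failures
    if surviving < 1:
        raise ValueError('at least one AZ must survive')
    return math.ceil(total_instances / surviving)

# 120 instances spread over 3 AZs: 40 per AZ normally, 60 per AZ after losing one
print(ips_needed_per_az(120, az_count=3, az_failures=0))
print(ips_needed_per_az(120, az_count=3))
```

Size each subnet for the second number, then add headroom and the reserved addresses on top.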

Here's a real one: a team used overlapping subnets in two different VPCs and then peered them. The overlap went unnoticed at setup time, and traffic was intermittently blackholed because the routing table couldn't decide which /24 to use. The fix involved tearing down the peering and redesigning one VPC's CIDR.

Also worth mentioning: when using Terraform, you can avoid overlap by deriving every subnet from the VPC CIDR with the cidrsubnet function and managing the inputs as variables. Always run a validation step before apply.
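To see what Terraform's cidrsubnet(prefix, newbits, netnum) computes, here is a rough Python equivalent for IPv4. It is a sketch for understanding the arithmetic, not Terraform's implementation.

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Return subnet number `netnum` after extending `prefix` by `newbits` bits."""
    net = ipaddress.IPv4Network(prefix)
    new_prefix = net.prefixlen + newbits
    if new_prefix > 32:
        raise ValueError('not enough host bits left for newbits')
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError('netnum does not fit in newbits')
    # Shift the subnet number into the bits between the old and new prefix
    base = int(net.network_address) + (netnum << (32 - new_prefix))
    return str(ipaddress.IPv4Network((base, new_prefix)))

print(cidrsubnet('10.0.0.0/16', 8, 2))  # a /24 carved from the VPC
print(cidrsubnet('10.0.0.0/16', 7, 5))  # a /23 carved from the VPC
```

Because every subnet is derived from a distinct netnum at the same prefix length, two results with different netnum values cannot overlap.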

io/thecodeforge/subnetting/validation.py · PYTHON
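import ipaddress
from typing import Optional

def validate_subnet_design(subnets: list[str], assigned_ips: Optional[list[str]] = None) -> list[str]:
    """
    Validate a list of CIDR subnets for common mistakes.
    Returns list of issues found.
    """
    issues = []
    nets = []
    for s in subnets:
        try:
            # strict=True rejects CIDRs with host bits set (e.g. 10.0.1.5/24)
            nets.append(ipaddress.IPv4Network(s, strict=True))
        except ValueError as exc:
            issues.append(f'Invalid CIDR {s}: {exc}')
    # Check for overlaps between every pair of subnets
    for i, n1 in enumerate(nets):
        for n2 in nets[i+1:]:
            if n1.overlaps(n2):
                issues.append(f'Overlap: {n1} and {n2}')
    # Check for hosts assigned a subnet's network or broadcast address.
    # Pass real IP assignments as assigned_ips; /31 and /32 have no such addresses.
    for ip in map(ipaddress.IPv4Address, assigned_ips or []):
        for net in nets:
            if net.prefixlen < 31 and ip in (net.network_address, net.broadcast_address):
                issues.append(f'{ip} is the network or broadcast address of {net}')
    return issues if issues else ['Design is valid']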

# Example
design = ['10.0.1.0/24', '10.0.2.0/24', '10.0.2.128/26']  # last one overlaps
print(validate_subnet_design(design))
💡Pro Tip: Automate Validation
Before deploying any subnet configuration, run it through a validation script. Tools like ipcalc, subnetcalc, or Python's ipaddress module can catch overlaps, wrong sizes, or misaligned boundaries before they become production incidents.
📊 Production Insight
A classic silent failure: two microservices deployed in overlapping subnets within the same VPC. Traffic between them worked intermittently because the VPC router used longest prefix match, but the application code made assumptions about specific IPs.
Debugging took two days and involved packet traces.
Rule: never assume subnets are non-overlapping — always validate programmatically.
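For readers unfamiliar with longest prefix match, this sketch shows why overlapping ranges behave inconsistently: the same /24 can send traffic to two different targets depending on the exact destination IP. The route table and target names are invented for illustration.

```python
import ipaddress
from typing import Optional

def longest_prefix_match(dest: str, routes: dict[str, str]) -> Optional[str]:
    """Return the target of the most specific route containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    best = None
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None

# Hypothetical routes: the /26 sits inside the /24
routes = {
    '10.0.2.0/24': 'service-a',
    '10.0.2.128/26': 'service-b',
    '0.0.0.0/0': 'internet-gateway',
}
for dest in ('10.0.2.10', '10.0.2.130', '8.8.8.8'):
    print(dest, '->', longest_prefix_match(dest, routes))
```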
🎯 Key Takeaway
Overlapping subnets cause unpredictable routing.
Never hardcode gateway IPs — use DHCP or cloud metadata.
Automate subnet validation as part of your CI/CD pipeline.
Mistake Severity Decision Tree
If: Subnet overlap detected
Use: Immediate re-address; routing is unpredictable.
If: Mask mismatch between peers
Use: Fix mask on one side, verify both ends.
If: Subnet too small (IP exhaustion)
Use: Create larger subnet, migrate resources, plan headroom.
If: Non-contiguous mask used
Use: Invalid configuration; replace with proper mask.

Subnetting for Kubernetes: Pod CIDR and Service CIDR

Kubernetes adds two more CIDR layers on top of your VPC: the pod CIDR and the service CIDR. Each node gets a slab of the pod CIDR (e.g., /24 per node), and each pod gets an IP from that node's slab. The service CIDR is a separate block used for ClusterIP services. These CIDRs must not overlap with each other or with the VPC CIDR. If they do, traffic routing breaks silently — pods can't reach services, or worse, traffic destined for a service IP goes to an unrelated VPC resource.
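A quick sketch of the per-node slicing described above: given a pod CIDR and a per-node prefix, how many nodes fit, and what do the first few slices look like? The /24-per-node value and the 10.1.0.0/16 pod CIDR are example figures, and the function name is hypothetical.

```python
import ipaddress
import itertools

def node_capacity(pod_cidr: str, node_prefix: int = 24) -> int:
    """Number of per-node slices of size node_prefix that fit in pod_cidr."""
    net = ipaddress.ip_network(pod_cidr)
    if node_prefix < net.prefixlen:
        raise ValueError('per-node slice is larger than the pod CIDR')
    return 2 ** (node_prefix - net.prefixlen)

pod_net = ipaddress.ip_network('10.1.0.0/16')
print('Max nodes:', node_capacity('10.1.0.0/16'))

# subnets() is a generator, so islice avoids materialising all slices
first_slices = list(itertools.islice(pod_net.subnets(new_prefix=24), 3))
print('First node slices:', [str(s) for s in first_slices])
```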

Production lesson: plan your cluster CIDRs before creating the cluster. If your VPC uses 10.0.0.0/16, you might set pod CIDR to 10.1.0.0/16 and service CIDR to 10.2.0.0/16. But watch out: if you have multiple clusters, each needs its own non-overlapping pod and service CIDRs. In AWS EKS, Amazon VPC CNI allows pods to receive VPC IPs, which can exhaust the subnet quickly. Use a dedicated /18 or larger for pods. In self-managed clusters, ensure the pod network plugin (Calico, Flannel) is configured with a CIDR that doesn't conflict with anything else.

Another trap: when using a service mesh like Istio, the mesh may require additional IP ranges. Always document all CIDR allocations upfront.

And don't forget about `kube-proxy` mode: IPVS mode programs service IPs differently from iptables mode. It scales to more services, but it has its own quirks. In either mode, make sure your service CIDR doesn't overlap with your node CIDR.

A useful check: before creating a cluster, run a quick Python script (like the one below) to verify non-overlap of all three ranges.

One more production-grade tip: in EKS with the VPC CNI, the default maximum pods per node comes from the instance type's network interface limits (number of ENIs and IPv4 addresses per ENI), not from the subnet size. A /24 pod subnet offers 251 usable IPs in total, shared by every node in that subnet, and many EC2 instance types cap out far below that per node. Check the AWS docs for your instance type's max-pods before planning.
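AWS documents the default VPC CNI limit as ENIs x (IPv4 addresses per ENI - 1) + 2. The sketch below applies that formula; the m5.large figures (3 ENIs, 10 IPv4 addresses each) are used as an example and should be verified against the current AWS tables for your instance type.

```python
def eks_max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Default max pods per node with the VPC CNI (no prefix delegation)."""
    # One address per ENI is the ENI's own primary IP, hence the -1.
    # The +2 accounts for host-network pods that do not consume a VPC IP.
    return enis * (ipv4_per_eni - 1) + 2

print('m5.large:', eks_max_pods(enis=3, ipv4_per_eni=10))
```

Multiply the per-node figure by your expected node count to estimate how many pod IPs the subnet must supply.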

I once debugged a cluster where the pod CIDR overlapped with the VPC CIDR by just one bit. Pods trying to reach the API server at 10.0.0.1 were routed to a pod instead. It took two weeks to reproduce because the behaviour was intermittent — it only happened when a pod happened to have the same IP as the service.

If you're using Calico with IPIP encapsulation, you can avoid VPC CIDR conflicts by using a separate IP pool. That's a common solution for overlapping issues.

io/thecodeforge/subnetting/k8s_cidr_check.sh · BASH
# Print kubeadm's default networking settings (defaults only, not the live cluster's values)
kubeadm config print init-defaults | grep -E 'podSubnet|serviceSubnet'

# Or check from a running cluster config
kubectl get configmap -n kube-system kube-proxy -o yaml | grep -E 'clusterCIDR|podCIDR'

# Check per-node pod CIDR
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' | tr ' ' '\n'

# Validate no overlap between VPC, pod, service CIDRs
python3 -c "
from ipaddress import ip_network
vpc = ip_network('10.0.0.0/16')
pod = ip_network('10.1.0.0/16')
svc = ip_network('10.2.0.0/16')
assert not vpc.overlaps(pod), 'VPC and pod CIDR overlap'
assert not vpc.overlaps(svc), 'VPC and service CIDR overlap'
assert not pod.overlaps(svc), 'Pod and service CIDR overlap'
print('All good: no overlap')
"
⚠ Warning: Overlapping CIDRs in Kubernetes
If your pod CIDR overlaps with your VPC CIDR, pods cannot communicate with services or external resources that fall within that overlapping range. Always assign non-overlapping blocks for VPC, pods, and services.
📊 Production Insight
I once joined a team where the pod CIDR (10.0.0.0/14) overlapped with the VPC CIDR (10.0.0.0/16). Pods trying to reach the DNS service at 10.0.0.10 couldn't tell if it was a pod or a VPC resource. Traffic was misrouted for weeks before someone noticed.
The fix: recreate the cluster with a non-overlapping pod CIDR. That meant data-loss risk and downtime.
Rule: never let pod, service, and VPC CIDRs overlap — triple-check before cluster creation.
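The overlap in this incident is easy to confirm with Python's ipaddress module. This sketch uses the exact ranges from the story above.

```python
import ipaddress

pod = ipaddress.ip_network('10.0.0.0/14')
vpc = ipaddress.ip_network('10.0.0.0/16')
dns = ipaddress.ip_address('10.0.0.10')

print('Pod range:', pod[0], '-', pod[-1])
print('Overlap:', pod.overlaps(vpc))
print('VPC entirely inside pod CIDR:', vpc.subnet_of(pod))  # subnet_of needs Python 3.7+
print('DNS IP in pod CIDR:', dns in pod, '| in VPC CIDR:', dns in vpc)
```

The /14 spans 10.0.0.0 through 10.3.255.255, so the whole VPC sits inside the pod range and 10.0.0.10 belongs to both.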
🎯 Key Takeaway
Kubernetes adds pod and service CIDRs — they must not overlap with the VPC CIDR.
Plan all CIDRs before cluster creation; fixing later is painful.
Use dedicated, non-overlapping blocks for each cluster to avoid routing chaos.
Choosing K8s CIDR Blocks
If: VPC uses 10.0.0.0/16, single cluster
Use: Pod CIDR 10.1.0.0/16, service CIDR 10.2.0.0/16 (no overlap).
If: Multiple clusters in same VPC or peered VPCs
Use: Unique /16 per cluster, e.g., cluster1: pod=10.1.0.0/16, svc=10.2.0.0/16; cluster2: pod=10.3.0.0/16, svc=10.4.0.0/16.
If: Using AWS VPC CNI (pods get VPC IPs)
Use: Allocate dedicated subnets for pods, e.g., /18 each. Ensure total pod IPs don't exceed subnet size.
If: Service mesh (e.g., Istio) requires extra IPs
Use: Reserve an additional /20 or /16 for mesh traffic, outside the pod and service CIDRs.
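Once there are several clusters, checking three ranges by hand stops scaling. One possible approach is to keep every allocation in a single named registry and compare all pairs. The registry below reuses the CIDRs from the table above; the `mesh` entry and the function name are hypothetical.

```python
import ipaddress
from itertools import combinations

def find_overlaps(allocations: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of allocation names whose CIDRs overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in allocations.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]

allocations = {
    'vpc': '10.0.0.0/16',
    'cluster1-pod': '10.1.0.0/16',
    'cluster1-svc': '10.2.0.0/16',
    'cluster2-pod': '10.3.0.0/16',
    'cluster2-svc': '10.4.0.0/16',
    'mesh': '10.5.0.0/20',  # hypothetical reservation for mesh traffic
}
print(find_overlaps(allocations) or 'No overlaps')
```

Keeping this registry in version control gives you the upfront documentation of CIDR allocations for free.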

🎯 Key Takeaways

    🔥
    Naren Founder & Author

    Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
