IP Subnetting - The /25 Mask That Broke Internet Access
A /25 subnet mask instead of /24 made EC2 instances unreachable from the internet gateway.
20+ years shipping production systems from the metal up. Lessons pulled from things that broke in production.
- IP addressing uniquely identifies devices; subnetting divides address space into smaller, routable blocks.
- CIDR notation (e.g., /24) replaces classful addressing and defines how many host bits you get.
- Hosts = 2^(32 - prefix) - 2. The -2 is for the network and broadcast addresses you cannot assign.
- Production failure: a /25 instead of /24 changes the network boundary and can make your default gateway unreachable.
- Performance insight: each wrong bit in a subnet mask can route traffic to the wrong VLAN or blackhole it entirely.
- Biggest mistake: thinking subnetting is only about saving IPs. It's about routing — wrong mask, wrong network.
Think of the internet like a massive city with billions of houses. Every house needs a unique street address so mail can reach it — that's an IP address. But a city isn't just one giant street; it's divided into neighbourhoods, zip codes, and districts to keep things organised and efficient. Subnetting is exactly that: carving a big block of addresses into smaller neighbourhoods so traffic flows to the right place without chaos. Without it, your router would be like a postman trying to deliver to every house on Earth from a single sorting office.
Every packet that crosses your network — an API call, a database query, a Kubernetes pod talking to another — carries a source and destination IP. Without IP addressing, you're just shouting into the void. Get the subnet mask wrong and traffic doesn't just slow down. It stops.
IPv4 has about 4.3 billion addresses, and we ran out years ago. Subnetting, CIDR, and private ranges are the engineering fixes that made the internet keep working. They're baked into every VPC, every router config, every cloud environment you'll ever touch.
Here's the truth: most engineers won't calculate subnets by hand daily. But the one time you need to, a single wrong mask can silence an entire production fleet. That's why you need to understand it — not just pass a cert exam.
This guide covers CIDR math, binary masks, the /25 that broke internet access, and the Python commands that'll save you from subnet calculators forever.
The real cost of a misconfigured subnet isn't just wasted IPs – it's hours of debugging, missed SLAs, and sometimes a full incident post-mortem. That's why this guide focuses on what actually breaks and how to fix it fast.
What Subnetting Actually Does to Your Network
IP subnetting divides a single IP network into smaller, logically isolated segments. The core mechanic is borrowing host bits from the default subnet mask to create a network prefix that identifies each subnet. For example, a /24 network (255.255.255.0) with 256 addresses can be split into two /25 networks (255.255.255.128), each with 128 addresses. This reduces broadcast domains and improves routing efficiency.
Each subnet has a network address (all host bits zero), a broadcast address (all host bits one), and usable host addresses in between. A /25 subnet yields 126 usable addresses (128 minus 2). The subnet mask determines the boundary: any IP address AND its mask reveals the network address. Misconfiguring the mask by even one bit can silently isolate machines or create overlapping subnets that break routing.
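The split described above can be sketched with Python's standard ipaddress module. The 192.168.1.0/24 block is only an illustrative choice; any /24 behaves the same way.

```python
import ipaddress

# Illustrative block, not taken from a real network.
parent = ipaddress.ip_network("192.168.1.0/24")

# Borrow one host bit: a /24 becomes two /25s.
halves = list(parent.subnets(prefixlen_diff=1))

for net in halves:
    usable = net.num_addresses - 2  # minus network and broadcast addresses
    print(net, net.network_address, net.broadcast_address, usable)
```

Each half reports its own network and broadcast address, which is exactly the boundary shift that a wrong mask introduces.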
Use subnetting when you need to segment traffic for security, performance, or IP conservation. In production, it's essential for VPC design, multi-tenant isolation, and controlling broadcast storms. Without proper subnet planning, you'll exhaust IP space or create routing black holes that are hard to debug.
CIDR Notation: How to Read and Calculate Hosts
CIDR (Classless Inter-Domain Routing) notation replaced the rigid classful system (A, B, C) back in the '90s. Instead of assuming network boundaries based on the first octet, you specify the prefix length explicitly: 192.168.1.0/24 means the first 24 bits are the network prefix, and the remaining 8 bits are host bits. That gives you 2^8 = 256 total addresses, but you lose two: the network address (all host bits 0) and the broadcast address (all host bits 1). So usable hosts = 2^(32 - prefix) - 2. For /24, that's 254. For /16, it's 65534. For /28, it's 14 — way too small for most production workloads.
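As a rough sketch, the host formula translates directly into a small helper. The function name usable_hosts is hypothetical, and the guard for /31 and /32 is an assumption about how you might want to treat those special cases.

```python
def usable_hosts(prefix: int) -> int:
    """Usable IPv4 hosts on a conventional subnet: 2^(32 - prefix) - 2."""
    if not 0 <= prefix <= 30:
        # /31 and /32 are special cases the classic formula does not cover.
        raise ValueError("formula applies to prefixes /0 through /30")
    return 2 ** (32 - prefix) - 2


for prefix in (16, 24, 28):
    print(f"/{prefix}: {usable_hosts(prefix)} usable hosts")
```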
The formula is simple, but the production trap is thinking that 'size' means usable hosts. I've seen teams provision a /28 for an API service that needed 20 IPs per AZ, then scramble to redesign after the launch failed. Always add 20-30% buffer.
CIDR also enabled route aggregation (supernetting), which dramatically shrinks the global routing table. Before CIDR, the internet was running out of routes. Now, a single /8 aggregate can represent millions of addresses.
Here's a quick way to estimate: for any /X, usable hosts ≈ 2^(32-X). For /24, that's ~250. For /23, ~500. For /22, ~1000. The pattern doubles each time you reduce the prefix by 1. So /16 gives ~65000. That's your mental shortcut.
Another common mistake: confusing the CIDR notation with the subnet mask. When someone says "the CIDR is 255.255.255.0", they mean the mask, not the prefix. CIDR notation is /24. Keep that straight in team discussions.
A production-grade tip: always document your CIDR blocks in a central spreadsheet or IPAM tool. I've seen teams waste hours because they didn't know which /24 was already used. Automation is your friend here.
One more thing: in cloud environments, subnet sizes are often limited by the provider's reserved addresses. In AWS, every subnet loses 5 IPs, not 2. So a /28 gives you only 11 usable IPs. Your 14-host formula is wrong for AWS. Always check the cloud provider's documentation.
And don't forget about overlapping CIDRs: if you accidentally assign the same /24 to two subnets, routing chaos follows. Use a central IPAM tool to prevent that.
- The network bits are fixed and define the neighbourhood.
- The host bits are variable — they define the specific house.
- Shorter prefix (/16) = more hosts, fewer networks.
- Longer prefix (/28) = fewer hosts, more networks — useful for point-to-point links.
- Each reduction in prefix by 1 doubles the number of hosts.
- The -2 is non-negotiable: network and broadcast addresses cannot be assigned.
Subnet Masks: Binary and Decimal
The subnet mask is a 32-bit number that, in binary, has a contiguous block of 1s for the network portion followed by 0s for the host portion. The dotted decimal representation (e.g., 255.255.255.0) is just a human-friendly way to write those 32 bits. Convert each octet to decimal and you get the familiar mask. /24 = 255.255.255.0; /16 = 255.255.0.0; /8 = 255.0.0.0.
But here's the production trap: you can't always trust the dotted decimal. I've seen config files where someone meant to type 255.255.254.0 (/23) but a typo produced 255.255.240.0 (/20). The device accepted it, and routing broke silently because the network boundaries changed. Always cross-verify the binary representation, especially when editing configs manually.
Let's walk through an example: IP 192.168.1.55 with mask 255.255.255.0. Binary: 11000000.10101000.00000001.00110111. AND with mask gives 11000000.10101000.00000001.00000000 = 192.168.1.0 (network). The host part is 00110111 = 55. If mask were 255.255.254.0, the network would be 192.168.0.0, and 192.168.1.55 would be part of that network — completely different routing behaviour.
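The walkthrough above can be reproduced in a few lines. This is a minimal sketch; network_of is a hypothetical helper name, and it simply performs the bitwise AND on the 32-bit integer forms of the address and the mask.

```python
import ipaddress


def network_of(ip: str, mask: str) -> str:
    """Return the network address by ANDing the IP with its mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


# Same host address, two different masks, two different networks.
print(network_of("192.168.1.55", "255.255.255.0"))
print(network_of("192.168.1.55", "255.255.254.0"))
```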
A quick way to read a mask: convert the one octet that is neither 255 nor 0 to binary and count the 1s. For /23, the mask is 255.255.254.0; the third octet is 254, which is 11111110 in binary, meaning 7 network bits and 1 host bit in that octet. Add the 8 host bits of the last octet and you have 9 host bits, so a /23 holds 2^9 = 512 addresses in total. In practice it's easier to think in prefix length.
Another trap: a non-contiguous subnet mask like 255.128.128.0 is invalid. The binary must be a continuous run of 1s from the left. Always check with ipcalc if you're unsure.
One more nuance: some legacy systems use "wildcard masks" (inverse masks) for OSPF or ACLs. That's the bitwise NOT of the subnet mask. Don't confuse them. A wildcard of 0.0.0.255 matches a /24 network, but it's written as inverted bits.
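To make the inversion concrete, here is a small sketch that derives a wildcard mask from a subnet mask. The helper name wildcard_for is hypothetical; the ipaddress module exposes the same value as the hostmask attribute of a network object.

```python
import ipaddress


def wildcard_for(mask: str) -> str:
    """Bitwise NOT of a dotted-decimal subnet mask."""
    inverted = int(ipaddress.IPv4Address(mask)) ^ 0xFFFFFFFF
    return str(ipaddress.IPv4Address(inverted))


print(wildcard_for("255.255.255.0"))
# The standard library reports the same value for a network object.
print(ipaddress.ip_network("192.168.1.0/24").hostmask)
```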
Here's a story from the field: a colleague once typed 255.255.255.255 (a /32) by accident instead of 255.255.255.0 for an interface. The interface came up, but no traffic could reach the subnet: with a /32, the router treated its address as a single host with no on-link neighbours. It took three hours to find the typo. Always use automation to validate masks.
Another quick sanity check: if you see a mask like 255.255.256.0, that's invalid because 256 is out of range. Catch those before they hit production.
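Both checks (contiguous bits and octets in range) can be automated. The sketch below is one possible validator; the name mask_to_cidr and its error behaviour are assumptions for illustration, not a standard library function.

```python
import ipaddress


def mask_to_cidr(mask: str) -> int:
    """Convert a dotted-decimal mask to a prefix length, or raise ValueError."""
    # IPv4Address rejects octets outside 0-255 (its error is a ValueError subclass).
    value = int(ipaddress.IPv4Address(mask))
    inverted = value ^ 0xFFFFFFFF
    # A valid mask inverts to 0...01...1, so inverted + 1 shares no bits with it.
    if inverted & (inverted + 1):
        raise ValueError(f"non-contiguous mask: {mask}")
    return 32 - inverted.bit_length()


for candidate in ("255.255.254.0", "255.255.240.0", "255.128.128.0", "255.255.256.0"):
    try:
        print(candidate, "->", f"/{mask_to_cidr(candidate)}")
    except ValueError as exc:
        print(candidate, "-> rejected:", exc)
```

Running a check like this in CI against every hand-edited config is one way to catch the typo scenarios described above before they reach a device.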
- Run show ip interface (Cisco) or ip addr show (Linux) and verify the mask matches the documentation.
- Check the routing table with route -n (Linux).
Private IP Ranges and RFC 1918: Why 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 Are Everywhere
RFC 1918 reserves three blocks of IPv4 addresses for private use: 10.0.0.0/8 (16.7 million addresses), 172.16.0.0/12 (about 1 million), and 192.168.0.0/16 (65,536). These addresses are not routable on the public internet; they're meant for internal networks. That's why most home routers use 192.168.x.x, and most AWS VPCs use 10.x.x.x or a range inside 172.16.0.0/12.
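A short sketch for checking which private block, if any, an address belongs to. The helper rfc1918_block is a hypothetical name. It tests the three RFC 1918 ranges explicitly, because the is_private attribute in ipaddress also covers other reserved ranges such as loopback and link-local.

```python
import ipaddress

RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def rfc1918_block(ip: str):
    """Return the RFC 1918 block containing ip, or None if it is in none of them."""
    addr = ipaddress.ip_address(ip)
    for block in RFC1918_BLOCKS:
        if addr in block:
            return block
    return None


for block in RFC1918_BLOCKS:
    print(block, block.num_addresses)
```

Note that 172.16.0.0/12 ends at 172.31.255.255, so an address such as 172.32.0.1 is public even though it looks similar.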
The choice between them is about scale. 10.0.0.0/8 is huge — you can build a sprawling enterprise network without overlapping. 172.16.0.0/12 is good for medium-sized orgs. 192.168.0.0/16 is tiny and often leads to collisions when companies merge or need to peer VPCs. Production lesson: never use 192.168.0.0/16 for a corporate network — you'll hit address collisions the moment you need to connect to a partner or acquire another company.
You can also use public IP ranges internally if you control them (uncommon). But the standard practice is to pick a /16 from the 10.x range for your VPC and subnet from there. This gives you flexibility and avoids the RFC 1918 collision risk that 192.168 brings.
One more thing: don't forget about RFC 6598 (Carrier-Grade NAT space: 100.64.0.0/10). This is used by ISPs for CGNAT, but you might encounter it in shared environments. Avoid using it internally unless you're building an ISP network.
Also note: just because you can't route private IPs on the internet doesn't mean they can't be leaked. Misconfigured BGP can advertise private ranges. Always filter outbound routes to your upstream provider.
A real-world story: A startup used 192.168.0.0/16 for their entire infrastructure. When they tried to connect to a customer's VPN that also used 192.168.0.0/16, routing fell apart. They had to re-IP their whole network over a weekend. Don't be that team.
Another lesson: when you acquire a company, the first thing to check is their private IP range. If you both use 10.0.0.0/16, you'll need to re-address one side or use NAT. That's expensive and error-prone. Plan ahead.
Here's a quick tip: if you're designing a multi-cloud environment, use a different /16 for each cloud provider. That way, peering between clouds won't cause conflicts.
Designing Subnets in AWS VPC: A Real-World Example
Let's design a VPC for a typical three-tier web application. We'll use the private IPv4 range 10.0.0.0/16 (65534 usable addresses). We need:
- Public subnets for load balancers and NAT gateways (at least 2 AZs, small)
- Private subnets for application servers (more IPs needed for scaling)
- Database subnets (locked down, no internet access)
Best practice is to allocate contiguous blocks to keep routing simple. Here's a sample design:
- Public: 10.0.1.0/24 (us-east-1a), 10.0.2.0/24 (us-east-1b), 254 IPs each
- App: 10.0.10.0/23 (512 IPs), 10.0.12.0/23, enough for auto-scaling groups
- DB: 10.0.20.0/24, 10.0.21.0/24; RDS takes one IP per instance plus Multi-AZ
Notice we left gaps (10.0.3.0 through 10.0.9.0, for example) for future use. That's the planning rule: never fill a VPC completely. Leave at least 30% address space unallocated. Production lesson: I once saw a VPC with 90% utilisation because someone allocated 10.0.0.0/16 into /24s end-to-end. When a new service needed a new subnet, they had to rebuild the VPC.
A pro tip: use this same design pattern in AWS by creating subnets with explicit CIDR blocks in CloudFormation or Terraform. Validate that no two subnets overlap and that all are within the VPC CIDR.
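The validation step might look like the sketch below, which could run before a CloudFormation or Terraform apply. validate_plan is a hypothetical helper, and the CIDRs are the sample design from this section.

```python
import ipaddress
from itertools import combinations


def validate_plan(vpc_cidr: str, subnet_cidrs: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # ip_network is strict by default, so a misaligned CIDR such as
    # 10.0.11.0/23 raises ValueError here instead of passing silently.
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []
    for net in subnets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside {vpc}")
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")
    return problems


plan = ["10.0.1.0/24", "10.0.2.0/24", "10.0.10.0/23",
        "10.0.12.0/23", "10.0.20.0/24", "10.0.21.0/24"]
print(validate_plan("10.0.0.0/16", plan))
```

Returning a list of problems rather than raising on the first one makes it easier to report every conflict in a single pass.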
One more nuance: AWS reserves 5 IPs per subnet, not just the 2 the classic formula assumes. For a /24, you lose .0 (network), .1 (VPC router), .2 (DNS), .3 (reserved for future use), and .255 (broadcast). That's 5 IPs gone, so you really have 251 usable, not 254. Factor that into your capacity planning.
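For capacity planning, the AWS adjustment can be expressed as a small sketch. The helper names aws_usable and aws_reserved are hypothetical, and the reserved set (first four addresses plus the last) follows the description in this section.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def aws_usable(cidr: str) -> int:
    """Addresses left for workloads in an AWS subnet of the given CIDR."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses - AWS_RESERVED_PER_SUBNET


def aws_reserved(cidr: str) -> list[str]:
    """The five addresses AWS keeps for itself in the given subnet."""
    net = ipaddress.ip_network(cidr)
    first_four = [str(net[i]) for i in range(4)]
    return first_four + [str(net.broadcast_address)]


print(aws_usable("10.0.1.0/24"), aws_reserved("10.0.1.0/24"))
print(aws_usable("10.0.1.0/28"))
```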
Also note: when you use a NAT Gateway in a public subnet, it consumes an Elastic IP and one usable IP from that subnet. Make sure your public subnets have enough headroom for both NAT Gateways and future ALB/NLBs.
A lesson from the field: I've seen teams run out of IPs in their app subnet because they didn't account for the fact that each pod in EKS gets its own VPC IP. A /24 supports 251 pods — fine for small clusters, but a production cluster can blow through that in days. Use a /20 for pod subnets.
Another trap: when you create a VPC, you must also consider the CIDR for future peering. If you use 10.0.0.0/16 and later peer with another VPC that also uses 10.0.0.0/16, you'll have overlapping CIDRs and peering will be impossible. Plan the whole 10.0.0.0/8 up front and give each environment its own /16.
And don't forget about the bastion host: if you need to SSH into private instances, you'll need a jump box in a public subnet. That public subnet should be sized to allow at least one EC2 instance plus the NAT Gateway.
Common Subnetting Mistakes and How to Fix Them
After years of debugging network problems, I've seen the same patterns over and over. Here are the top three:
- Overlapping subnets: When two subnets in different VPCs (or the same VPC!) overlap, routing becomes unpredictable. The router doesn't know which is the correct destination. In VPC peering, AWS rejects overlapping CIDRs entirely.
- Wrong gateway IP: The default gateway is not always the first usable IP. In AWS, the first IP (.1) is the VPC router, but in on-premises networks, the gateway might be .254 or something else. Hardcoding .1 as gateway is a common mistake when migrating from cloud to on-prem.
- Forgetting the broadcast address: Some applications accidentally use the broadcast address as a host IP. When that happens, traffic to that 'host' floods the entire subnet, causing performance issues and mysterious packet loss.
These are the mistakes that cause 'can't reproduce in dev' incidents. Always validate your subnet plan with automation.
Another mistake: using non-contiguous mask bits. A mask like 255.255.255.128 is fine because its 1s are contiguous, but 255.128.128.0 is invalid. Always ensure the binary mask is a continuous string of 1s followed by 0s.
One more trap: using a default subnet size without thinking about the service requirements. I've seen teams use /24 for a point-to-point VPN link, wasting 252 IPs. Use /30 or /31 for those links to conserve address space.
A hidden mistake: forgetting that subnets need to be sized for high availability. In AWS, if you lose an Availability Zone, the remaining AZ must handle all traffic. That means your subnet in the surviving AZ must have enough IPs to accommodate all instances. Plan for AZ failure — not just normal operation.
Here's a real one: a team used overlapping subnets in two different VPCs and then peered them. The peering succeeded (because AWS only checks overlap at peering time for certain scenarios), but traffic was intermittently blackholed because the routing table couldn't decide which /24 to use. The fix involved tearing down the peering and redesigning one VPC's CIDR.
Also worth mentioning: when using Terraform, you can avoid overlap with the cidrsubnet function and proper variable management. Always run a validation step before apply.
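To see why deriving subnets from an index avoids overlap, here is a rough Python analogue of Terraform's cidrsubnet(prefix, newbits, netnum). It is a sketch for experimenting with a plan, not Terraform's implementation, and it covers IPv4 and IPv6 only as far as the ipaddress module does.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Rough Python analogue of Terraform's cidrsubnet(prefix, newbits, netnum)."""
    base = ipaddress.ip_network(prefix)
    new_prefix = base.prefixlen + newbits
    if new_prefix > base.max_prefixlen:
        raise ValueError("not enough host bits left")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    # Each child block spans this many addresses.
    size = 2 ** (base.max_prefixlen - new_prefix)
    start = int(base.network_address) + netnum * size
    return str(ipaddress.ip_network((start, new_prefix)))


print(cidrsubnet("10.0.0.0/16", 8, 1))
print(cidrsubnet("10.0.0.0/16", 7, 5))
```

Because every child is computed from the same base and a distinct netnum, two subnets with the same newbits cannot collide.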
ipcalc, subnetcalc, or Python's ipaddress module can catch overlaps, wrong sizes, or misaligned boundaries before they become production incidents.
Subnetting for Kubernetes: Pod CIDR and Service CIDR
Kubernetes adds two more CIDR layers on top of your VPC: the pod CIDR and the service CIDR. Each node gets a slab of the pod CIDR (e.g., /24 per node), and each pod gets an IP from that node's slab. The service CIDR is a separate block used for ClusterIP services. These CIDRs must not overlap with each other or with the VPC CIDR. If they do, traffic routing breaks silently — pods can't reach services, or worse, traffic destined for a service IP goes to an unrelated VPC resource.
Production lesson: plan your cluster CIDRs before creating the cluster. If your VPC uses 10.0.0.0/16, you might set pod CIDR to 10.1.0.0/16 and service CIDR to 10.2.0.0/16. But watch out: if you have multiple clusters, each needs its own non-overlapping pod and service CIDRs. In AWS EKS, Amazon VPC CNI allows pods to receive VPC IPs, which can exhaust the subnet quickly. Use a dedicated /18 or larger for pods. In self-managed clusters, ensure the pod network plugin (Calico, Flannel) is configured with a CIDR that doesn't conflict with anything else.
Another trap: when using a service mesh like Istio, the mesh may require additional IP ranges. Always document all CIDR allocations upfront.
And don't forget about `kube-proxy` mode: if you use IPVS mode instead of iptables, the service CIDR is handled differently. The IPVS mode can handle more services, but it introduces its own quirks. Make sure your service CIDR doesn't overlap with your node CIDR.
A useful check: before creating a cluster, run a quick Python script (like the one below) to verify non-overlap of all three ranges.
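One way such a check might look, using the example ranges from this section. find_overlaps is a hypothetical helper, and the labels are only there to make the report readable.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of labels whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


cluster = {"vpc": "10.0.0.0/16", "pods": "10.1.0.0/16", "services": "10.2.0.0/16"}
print(find_overlaps(cluster))

# A pod range carved out of the VPC block is reported as a conflict.
bad_cluster = {"vpc": "10.0.0.0/16", "pods": "10.0.128.0/17", "services": "10.2.0.0/16"}
print(find_overlaps(bad_cluster))
```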
One more production-grade tip: in EKS, the default maximum pods per node is derived from the instance type's network interface and per-interface IP limits. A /24 pod subnet tops out at around 250 pod IPs across every node that shares it, and individual EC2 instance types allow far fewer pods per node. Check the AWS docs for your instance type's max-pods before planning.
I once debugged a cluster where the pod CIDR overlapped with the VPC CIDR by just one bit. Pods trying to reach the API server at 10.0.0.1 were routed to a pod instead. It took two weeks to reproduce because the behaviour was intermittent — it only happened when a pod happened to have the same IP as the service.
If you're using Calico with IPIP encapsulation, you can avoid VPC CIDR conflicts by using a separate IP pool. That's a common solution for overlapping issues.
Subnetting in Hybrid Cloud: Avoiding Overlap with On-Premises
When you connect an on-premises network to a cloud VPC via VPN or Direct Connect, the biggest risk is CIDR overlap. If both sides use 10.0.0.0/16, traffic to any 10.x.x.x address is ambiguous — does it go on-prem or to the cloud? This causes asymmetric routing, dropped packets, and hours of debugging.
The fix: before any hybrid connection, audit both sides' CIDR allocations. Assign a unique /16 from 10.0.0.0/8 to each environment (e.g., on-prem gets 10.1.0.0/16, cloud-prod gets 10.2.0.0/16, cloud-dev gets 10.3.0.0/16). If overlap is unavoidable, use NAT to translate overlapping addresses at the boundary.
A real-world example: a company with a 10.0.0.0/16 on-prem wanted to migrate to AWS using a Direct Connect. They created an AWS VPC with 10.0.0.0/16. The Direct Connect failed to establish. They had to re-IP the entire on-prem network to 10.1.0.0/16 — a multi-month project.
Another approach: use RFC 6598 (Carrier-Grade NAT) addresses for one side if you control both ends, but that's rare. Most enterprises stick to RFC 1918 and manage with careful planning.
Tooling: maintain a central IP address management (IPAM) system that tracks all CIDR blocks across on-prem and cloud. Tools like phpIPAM, NetBox, or even a spreadsheet with validation scripts can prevent overlaps before they happen.
Final tip: always include a 'last mile' step in your migration plan that validates no overlap between the new cloud CIDR and existing on-prem CIDRs. A simple Python script can save you weeks of rework.
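A sketch of that last-mile check, extended to suggest the first free block when the requested one collides. first_free_block is a hypothetical helper, and the on-prem ranges are illustrative values you would normally pull from your IPAM tool.

```python
import ipaddress


def first_free_block(pool: str, in_use: list[str], new_prefix: int = 16):
    """Return the first new_prefix-sized block in pool that overlaps nothing in use."""
    pool_net = ipaddress.ip_network(pool)
    used = [ipaddress.ip_network(c) for c in in_use]
    for candidate in pool_net.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(u) for u in used):
            return candidate
    return None  # the pool is exhausted at this size


on_prem = ["10.0.0.0/16", "10.1.0.0/16"]  # illustrative existing allocations
print(first_free_block("10.0.0.0/8", on_prem))
```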
Why Subnetting Exists: The Real Reason Your Network Is Not a Flat Parking Lot
Imagine a single network with 200 machines. A printer sends a broadcast. Every single NIC on that wire wakes up, processes the packet, then goes back to sleep. That's a flat network. It works until it doesn't. Subnetting carves your broadcast domain into pieces. Each subnet is its own broadcast island. Traffic stays local unless a router explicitly forwards it. You get three things: containment, isolation, and efficiency. Containment stops broadcast storms from taking down the entire org. Isolation means HR can't accidentally ping the production database. Efficiency means you stop wasting addresses on a classful scheme that doesn't fit your real headcount. The need for subnetting comes down to this: you subnet because a /24 with 254 addresses is wasteful for 10 IoT sensors. You subnet because security requires boundaries. You subnet because routing breaks when two routers both claim the same block. That's the why. The how follows.
Classful Subnetting: The Ancient Art of /8, /16, /24 and Why You Should Forget It
Classful addressing was the original mold: Class A got 16M hosts, Class B got 65K, Class C got 254. It was rigid and wasteful. If you had 300 machines, you took a Class B and flushed roughly 65K addresses down the toilet. Subnetting smashed that mold by borrowing host bits to create smaller networks. But here's the trap: IP classes still infect routing tables and firewall rules. Some networking tools still default to classful boundaries. If you configure a /23 and a device assumes a classful /24, packets vanish. You can walk through Class A, B, and C subnet tables with binary math, and that's fine. But the senior engineer takeaway is: treat your subnet mask as a binary mask, not a class label. Use CIDR everywhere. Never assume 192.168.x.x is a /24; the prefix is whatever the mask says. The key concept is that a subnet mask is just a 32-bit encoding of a prefix length. Memorize the powers of two up to 32. Forget the classes.
The /25 That Killed Internet Access
- Always double-check subnet mask boundaries — a /25 vs /24 shifts the network address and can break connectivity silently.
- When designing public subnets, use at least /24 to avoid confusion and leave room for growth.
- Automate subnet creation with infrastructure-as-code and validate CIDR alignment before applying.
- Always validate subnet mask before attaching internet gateway — a mismatch can take down outbound traffic silently.
- After the fix, run a connectivity test from inside the subnet: ping 8.8.8.8 should succeed immediately.
- Lesson: When in doubt, use /24. The cost of a larger subnet is zero; the cost of debugging a wrong mask is hours.
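The failure mode behind these lessons can be sketched as follows. The addresses are hypothetical (the article does not give the incident's actual values), and gateway_on_link is an illustrative helper: it asks whether a host, given the prefix it was configured with, considers its gateway to be on the same network.

```python
import ipaddress


def gateway_on_link(host_ip: str, prefix: int, gateway_ip: str) -> bool:
    """True if the host, given its configured prefix, sees the gateway as on-link."""
    iface = ipaddress.ip_interface(f"{host_ip}/{prefix}")
    return ipaddress.ip_address(gateway_ip) in iface.network


host, gateway = "10.0.1.200", "10.0.1.1"  # hypothetical instance and gateway
for prefix in (24, 25):
    network = ipaddress.ip_interface(f"{host}/{prefix}").network
    print(network, gateway_on_link(host, prefix, gateway))
```

With /24 the host and gateway share 10.0.1.0/24. With /25 the host lands in 10.0.1.128/25, the gateway is no longer on-link, and outbound traffic has nowhere to go.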
- Run aws ec2 describe-subnets --subnet-ids for the affected subnet and look at 'AvailableIpAddressCount'. Increase subnet size or create a larger one.
- Use ipcalc to calculate the network address for both prefixes. If they differ, one router must be reconfigured with a matching mask or the route must be summarised.
- Use ipcalc or ipaddress.collapse_addresses to check for gaps.
- ipcalc 10.0.1.0/24
- ipcalc 10.0.1.0/24 --ipaddress 10.0.1.15
Key takeaways
Common mistakes to avoid
5 patterns
- Overlapping subnets in different VPCs or environments. Fix: run an ipaddress overlap check before peering. Ensure each environment uses a dedicated /16 from 10.0.0.0/8. If already overlapping, re-address one side or use NAT.
- Hardcoding the default gateway as the first usable IP (e.g., 10.0.1.1) without verification. Fix: confirm the actual gateway with ip route show default.
- Using a non-contiguous subnet mask (e.g., 255.128.128.0). Fix: validate the mask first; mask_to_cidr should not throw. Replace with a valid contiguous mask.
- Underestimating IP consumption in Kubernetes (pod CIDR too small). Fix: check per-node pod capacity with kubectl describe nodes.
- Forgetting AWS reserves 5 IPs per subnet. Fix: a /28 gives only 11 usable, which is too small for production.
Interview Questions on This Topic
Explain the difference between a /24 and a /25 subnet in terms of usable host addresses and network boundaries. Give a production scenario where using /25 instead of /24 would break connectivity.
Frequently Asked Questions