Senior 8 min · March 06, 2026
Introduction to Computer Networks

DNS Outage: A Deleted A Record Took Down an E-Commerce Site

Main site down ('Server Not Found') but staging worked.

Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Written from production experience, not tutorials.

Production
production tested
May 24, 2026
last updated
1,554
articles · all by Naren
Quick Answer
  • Computer networks are interconnected devices sharing data using protocols
  • Data travels in packets through layers (OSI/TCP/IP) with headers and payload
  • DNS translates domain names to IPs; DHCP assigns addresses dynamically
  • Every network hop adds latency (microseconds on a LAN, tens of milliseconds across continents); sustained packet loss above roughly 1% sharply cuts TCP throughput
  • Production network failures often stem from DNS misconfig or subnet overlap
  • Biggest mistake: assuming the network is reliable — it's not, and it drops silently
Plain-English First

Imagine you and your friends live in different houses on the same street. You want to share a pizza recipe, so you pass a note from house to house until it reaches your friend. A computer network works exactly the same way — devices (houses) are connected by wires or wireless signals (the street), and data (the note) travels between them following agreed-upon rules so it arrives at the right place. That's it. Every time you send a message, load a webpage, or stream a video, you're just passing very fast, very organised notes.

Every single time you open Instagram, pay for something online, or video-call a friend on the other side of the world, a computer network is the invisible plumbing making it happen. Networks are not just a niche topic for network engineers — they're the foundation of almost every piece of software ever built. If you don't understand how devices communicate, you'll spend your career confused about why your app is slow, why a request times out, or what an API even is at a physical level. This article breaks down the essentials: how data actually moves from your laptop to a server across the globe, what protocols are, and the real-world failures you'll hit when the network breaks.

What is a Computer Network?

A computer network is a collection of interconnected devices — laptops, servers, routers, switches — that exchange data using agreed-upon protocols. Networks come in different sizes: LAN (Local Area Network) connects devices within a single building, WAN (Wide Area Network) stretches across cities or continents, and the Internet itself is the biggest WAN of all. The core job of a network is to move data from source to destination reliably and efficiently. That means handling addressing (who gets the data), routing (which path it takes), and error recovery (what happens when a packet is lost).

At the simplest level, every device gets a unique identifier — an IP address — and data is split into packets. Each packet carries the destination IP, the source IP, and a payload. Routers along the way inspect the destination and forward the packet toward its target. This is the fundamental mechanism behind everything from loading a webpage to streaming a video.
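The packet mechanism described above can be sketched in a few lines. This is an illustrative model only: the `Packet` class, the `packetize` and `reassemble` helpers, and the 8-byte chunk size are assumptions made for the example, not a real wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    seq: int        # position of this chunk in the original message
    payload: bytes


def packetize(data: bytes, src_ip: str, dst_ip: str, size: int = 8) -> list[Packet]:
    """Split data into fixed-size chunks, each tagged with addresses and a sequence number."""
    return [
        Packet(src_ip, dst_ip, seq, data[offset:offset + size])
        for seq, offset in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message even if packets arrived out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


packets = packetize(b"GET /index.html HTTP/1.1", "192.168.1.5", "142.250.80.46")
shuffled = list(reversed(packets))  # pretend the network reordered them
print(len(packets), reassemble(shuffled))
```

The sequence number is what lets the receiver restore order; real protocols such as TCP carry an equivalent field in their headers.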

network_essentials.sh · BASH
#!/bin/bash
# TheCodeForge - basic network diagnostics
# Check local IP and connectivity
ip addr show eth0
echo "---"
ping -c 2 google.com

# Trace route to a host
traceroute 8.8.8.8
The Postal System Analogy
  • Your device (house) has a return address (IP).
  • DNS is the phone book: it tells you the address of "google.com".
  • TCP is registered mail — it confirms delivery and retries if lost.
  • Routers are sorting offices that decide the next hop.
Production Insight
A misconfigured subnet mask can make two servers on the same physical segment appear unreachable.
Always verify netmask consistency: a /24 vs /16 mismatch silently breaks communication.
Rule: never assume layer 2 connectivity works just because both hosts have IPs.
Key Takeaway
Networks are unreliable by design.
Packets can be dropped, delayed, or duplicated.
Build applications that handle network failures gracefully.
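One common way to handle failures gracefully is to retry with exponential backoff. The sketch below is a minimal illustration, not a production library: `retry_with_backoff`, its default delays, and the simulated `flaky_request` are all hypothetical names and values chosen for the example.

```python
import random
import time


def retry_with_backoff(operation, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call operation(); on OSError wait base_delay * 2**n (plus jitter) and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay / 2))  # jitter avoids synchronized retries


# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionResetError("simulated dropped connection")
    return "ok"


print(retry_with_backoff(flaky_request, sleep=lambda _: None))
```

Passing `sleep` as a parameter keeps the example instant to run; in real code the default `time.sleep` applies the delays. Retries are only safe for operations that are idempotent.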
[Figure: DNS Outage: Deleted A Record Takedown. How a missing A record caused an e-commerce site failure. Flow: User DNS Query (browser requests site IP via DNS resolver) → Authoritative DNS Server (holds zone records for the domain) → Deleted A Record (A record removed, no IP returned) → NXDOMAIN Response (DNS returns non-existent domain error) → Site Unreachable (browser cannot load the e-commerce site). Note: accidental A record deletion is a common DNS outage cause; always verify DNS changes with a staging environment. Source: thecodeforge.io]

How Data Travels: The OSI and TCP/IP Models

Data travels through multiple layers, each adding its own header. The OSI model defines seven layers: Physical, Data Link, Network, Transport, Session, Presentation, Application. In practice, TCP/IP collapses these into four: Link, Internet, Transport, Application.

When you send an HTTP request, the application layer (e.g., browser) creates the payload. The transport layer (TCP) splits the data into segments, adds a header with source and destination ports, and provides reliable, in-order delivery through sequence numbers, acknowledgements, and retransmission. The internet layer (IP) wraps each segment into a packet with source and destination IP addresses. Finally, the link layer adds source and destination MAC addresses and sends the frame over the wire.

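Each intermediate router strips the incoming link-layer header and adds a new one for the next hop, while the IP packet passes through essentially unchanged (the router decrements the TTL and updates the header checksum). The destination host unwraps layers in reverse order, reassembles the segments, and delivers the data to the application.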

tcp_packet_structure.py · PYTHON
# TheCodeForge - simulate packet encapsulation
def encapsulate(data, src_port, dst_port, src_ip, dst_ip):
    # Transport layer: TCP segment
    segment = f"{src_port}:{dst_port}|{data}"
    # Network layer: IP packet
    packet = f"{src_ip}->{dst_ip}|{segment}"
    # Link layer: Ethernet frame (simplified)
    frame = f"[MAC src->MAC dst]{packet}"
    return frame

print(encapsulate("GET /index.html", 54321, 80, "192.168.1.5", "142.250.80.46"))
Forge Tip:
You don't need to memorise every OSI layer. Focus on the TCP/IP stack — it's what maps to real headers you see in a packet capture (Wireshark).
Production Insight
MTU mismatches cause silent packet fragmentation and performance degradation.
Path MTU discovery (PMTUD) often fails when ICMP is blocked by firewalls.
Set TCP MSS clamping at the router to avoid fragmentation over VPNs.
Key Takeaway
Each layer adds overhead.
TCP adds ~20 bytes, IP adds ~20 bytes, Ethernet adds ~14 bytes.
Total ~54 bytes per packet — factor this into bandwidth calculations.
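To see why this overhead matters, here is a small worked calculation. It is a sketch under the same simplifying assumptions as the figures above: minimum header sizes with no TCP or IP options, and no Ethernet FCS, preamble, or VLAN tag. The function name `payload_efficiency` is made up for the example.

```python
TCP_HEADER = 20       # bytes, without options
IP_HEADER = 20        # bytes, IPv4 without options
ETHERNET_HEADER = 14  # bytes, without FCS, preamble, or VLAN tag


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of bytes on the wire that carry application data."""
    overhead = TCP_HEADER + IP_HEADER + ETHERNET_HEADER
    return payload_bytes / (payload_bytes + overhead)


for size in (1, 100, 1460):
    print(f"{size:>5} B payload -> {payload_efficiency(size):.1%} efficient")
```

A 1-byte payload is under 2% efficient, while a full 1460-byte segment is about 96% efficient, which is why chatty protocols that send many tiny messages waste bandwidth.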

IP Addressing and Subnetting

Every device on a network needs a unique IP address. IPv4 addresses are 32-bit numbers, usually written as four octets (e.g., 192.168.1.1). IPv6 uses 128 bits to solve address exhaustion. Subnetting divides a network into smaller logical segments. A subnet mask (e.g., 255.255.255.0 or /24) defines which part of the address is the network prefix and which part identifies the host.

CIDR (Classless Inter-Domain Routing) notation replaces classful addressing. For instance, 10.0.0.0/16 means the first 16 bits are the network, giving 65,534 usable host addresses. Subnetting allows efficient use of IP space and improves security by isolating broadcast domains. In production, misconfiguring subnet masks is a common cause of connectivity issues — two hosts with different subnet masks may think the other is on a different network and send traffic to the default gateway, even though they're on the same physical segment.

subnet_calculator.py · PYTHON
# TheCodeForge - simple subnet calculator
def subnet_info(ip_cidr):
    ip, prefix = ip_cidr.split('/')
    prefix = int(prefix)
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    mask_str = '.'.join(str((mask >> (24 - 8*i)) & 0xFF) for i in range(4))
    return f"{ip}/{prefix} subnet mask: {mask_str}"

print(subnet_info("10.0.0.0/16"))
Output
10.0.0.0/16 subnet mask: 255.255.0.0
Subnet Mask Trap
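A common mistake: setting a subnet mask of /24 on one host and /16 on another in the same physical LAN. When the /16 host's address falls outside the /24 host's range, the /24 host treats it as remote and sends packets to the default gateway, even though the two are directly connected. The /16 host, meanwhile, tries to reach the /24 host directly, so traffic takes a different path in each direction.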
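The local-versus-gateway decision can be reproduced with Python's standard `ipaddress` module. This is a simplified sketch of the routing decision a host makes; the `next_hop` helper and the addresses are illustrative assumptions.

```python
import ipaddress


def next_hop(local_if: str, destination: str, gateway: str) -> str:
    """Return where a host would send a packet: straight to the destination or to its gateway."""
    network = ipaddress.ip_interface(local_if).network
    if ipaddress.ip_address(destination) in network:
        return f"direct (ARP for {destination})"
    return f"via gateway {gateway}"


# Two hosts on the same physical LAN with mismatched masks (addresses are illustrative).
print("A -> B:", next_hop("10.0.1.5/24", "10.0.2.7", "10.0.1.1"))
print("B -> A:", next_hop("10.0.2.7/16", "10.0.1.5", "10.0.0.1"))
```

Host A concludes that B is remote and uses its gateway, while host B concludes that A is local and sends an ARP request for it directly.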
Production Insight
Overlapping subnets in cloud VPCs cause routing black holes.
Always reserve a contiguous CIDR block during initial design.
Use RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) for internal networks.
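Overlaps are easy to catch before they reach production. The sketch below uses the standard `ipaddress` module; the `find_overlaps` helper and the VPC names and ranges are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of network names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical VPC layout; names and ranges are made up for the example.
vpcs = {
    "prod": "10.0.0.0/16",
    "staging": "10.1.0.0/16",
    "analytics": "10.0.128.0/17",  # sits inside prod's range
}
print(find_overlaps(vpcs))
```

A check like this could run in CI against the planned address allocation, before any peering or VPN link is created.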
Key Takeaway
Subnet mask determines whether a destination is local or through a gateway.
Incorrect masks create hard-to-debug connectivity issues.
Always double-check subnet masks during network changes.

Key Network Services: DNS and DHCP

DNS (Domain Name System) translates human-readable domain names (e.g., google.com) into IP addresses. It's a hierarchical, distributed database. When your browser looks up a domain, it queries a resolver (usually your ISP or a public DNS like 8.8.8.8), which walks the chain of root, TLD, and authoritative name servers to find the IP. DNS uses UDP on port 53 for queries, with TCP for zone transfers and large responses.

DHCP (Dynamic Host Configuration Protocol) automatically assigns IP addresses, subnet masks, default gateways, and DNS servers to devices when they join a network. Without DHCP, every device would need manual configuration. In production, DHCP lease times affect address availability; short leases (e.g., 5 minutes) cause churn, long leases (e.g., 24 hours) can exhaust the pool during scale-out events.

dns_query.sh · BASH
# TheCodeForge - resolve a domain and see the query path
dig +trace thecodeforge.com

# Check DHCP lease
ip addr show | grep dynamic
DNS as a Phonebook
  • Root servers (.) know where .com lives.
  • TLD servers (.com) know where authoritative nameservers are.
  • Authoritative servers return the actual IP for example.com.
  • DNS resolvers cache results to speed up subsequent lookups.
Production Insight
DNS caching can mask failures for the duration of the TTL.
During a DNS migration, lower the TTL to 60 seconds at least one full old-TTL period beforehand (a day ahead covers most records) so resolvers pick up the short TTL and rollback is quick.
A stale DNS record after a server migration can send traffic to the old IP for up to the TTL period.
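The effect of TTL-based caching can be shown with a toy resolver cache. This is a simplified sketch, not how any particular resolver is implemented: the `TtlCache` class, the fake clock, and the example record are assumptions made for illustration.

```python
import time


class TtlCache:
    """Toy resolver cache: answers are reused until their TTL expires."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup  # function: name -> (ip, ttl_seconds)
        self._clock = clock
        self._entries = {}     # name -> (ip, expires_at)

    def resolve(self, name):
        entry = self._entries.get(name)
        if entry and entry[1] > self._clock():
            return entry[0]                   # cache hit, possibly stale
        ip, ttl = self._lookup(name)          # cache miss: ask upstream
        self._entries[name] = (ip, self._clock() + ttl)
        return ip


# Hypothetical authoritative data and a fake clock so the example runs instantly.
records = {"shop.example.com": ("203.0.113.10", 300)}
now = [0.0]
cache = TtlCache(lambda name: records[name], clock=lambda: now[0])

print(cache.resolve("shop.example.com"))             # first lookup populates the cache
records["shop.example.com"] = ("203.0.113.20", 300)  # record changes upstream
now[0] = 120
print(cache.resolve("shop.example.com"))             # still the old IP: TTL not expired
now[0] = 301
print(cache.resolve("shop.example.com"))             # TTL expired: new IP is fetched
```

Between the change and the expiry, every client behind this cache keeps hitting the old address, which is the window a short TTL is meant to shrink.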
Key Takeaway
DNS is critical: a single bad record can take your service offline.
Always monitor DNS resolution from multiple locations.
Use short TTLs for critical records during changes.

Common Network Failures and Debugging

Network failures are inevitable in production. The most common: DNS failures (domain not resolving), routing issues (packets taking wrong path), firewall blocks (silent drops), ARP cache poisoning, MTU mismatches, and bandwidth saturation. Debugging requires a systematic approach: start at the application layer and work downward.

Essential tools: ping (basic reachability), traceroute/mtr (path analysis), nslookup/dig (DNS), netstat/ss (listening ports), tcpdump/Wireshark (packet inspection), and curl/wget (HTTP layer). Many silent failures happen because ICMP is blocked — path MTU discovery and traceroute rely on it.

A real story: a team deployed a Kubernetes cluster whose overlay network MTU left no room for encapsulation on a physical network with MTU 1500. Applications experienced intermittent timeouts because oversized packets were fragmented at the IP layer and the fragments were dropped by the AWS network load balancer. The fix was to lower the overlay MTU so that each packet plus the VXLAN overhead (about 50 bytes over IPv4, which gives 1450 on a 1500-byte underlay) fits within the physical MTU, or to make sure PMTUD works end to end by not blocking the ICMP messages it depends on.
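The MTU arithmetic behind this kind of incident is short enough to script. The sketch below assumes VXLAN over an IPv4 underlay with no extra tunnelling (such as IPsec) on top; the helper names `max_overlay_mtu` and `fits` are made up for the example.

```python
# Per-packet bytes added by VXLAN over IPv4, counted against the physical MTU.
VXLAN_OVERHEAD = {
    "inner_ethernet": 14,
    "vxlan_header": 8,
    "outer_udp": 8,
    "outer_ipv4": 20,
}


def max_overlay_mtu(physical_mtu: int, overhead: dict[str, int] = VXLAN_OVERHEAD) -> int:
    """Largest inner-packet MTU that fits in one physical packet without fragmentation."""
    return physical_mtu - sum(overhead.values())


def fits(overlay_mtu: int, physical_mtu: int) -> bool:
    """True if packets of overlay_mtu bytes survive encapsulation unfragmented."""
    return overlay_mtu <= max_overlay_mtu(physical_mtu)


print(max_overlay_mtu(1500))  # standard Ethernet underlay
print(fits(1500, 1500))       # overlay left at the default: packets will not fit
```

If the underlay uses IPv6 or adds another encapsulation layer, the overhead table needs the corresponding entries.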

debug_network.sh · BASH
#!/bin/bash
# TheCodeForge - systematic network debug
echo "1. Check local interface and IP"
ip addr show
echo "2. Check default gateway reachability"
ping -c 2 $(ip route | grep default | awk '{print $3}')
echo "3. DNS resolution"
nslookup google.com
echo "4. Port reachability to remote"
nc -zv db.example.com 5432
echo "5. Full path analysis"
mtr --report github.com
Debug Order
Always start at the application layer and work down:
1. Is the server responding? (curl)
2. Is the port open? (nc, nmap)
3. Is DNS correct? (dig)
4. Is the route working? (traceroute)
5. Is the link up? (ip link show)
Production Insight
Firewall logs are your best friend — but they're often the last place people look.
When a connection times out and the server is healthy, check the firewall first.
Rule: a dropped packet has no error message; only a timeout tells you something is wrong.
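The difference between a refusal, a timeout, and a DNS failure is visible in the exception an application receives. The sketch below maps Python's standard socket exceptions to a likely place to look; the `classify_failure` and `probe` helpers and the wording of the hints are assumptions for illustration, and a real diagnosis still needs the tools listed above.

```python
import socket


def classify_failure(exc: BaseException) -> str:
    """Map a connection exception to the most likely layer to investigate."""
    if isinstance(exc, socket.gaierror):
        return "DNS: name did not resolve"
    if isinstance(exc, ConnectionRefusedError):
        return "Port: host reachable, nothing listening (or an active reject)"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "Path: silent drop, suspect firewall, security group, or routing"
    if isinstance(exc, OSError):
        return f"Other network error: {exc}"
    return "Not a network error"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "Connected"
    except OSError as exc:
        return classify_failure(exc)


print(classify_failure(ConnectionRefusedError()))
print(classify_failure(TimeoutError()))
```

Calling `probe("db.example.com", 5432)` from the affected server would return one of these hints; the hostname here is only a placeholder.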
Key Takeaway
Networks drop silently.
Timeouts are the only symptom of a block or misroute.
Learn to use tcpdump — it sees what applications cannot.

The Physical Layer: Where Your Bits Actually Live

Every network conversation eventually hits the wire. Or the air. The Physical Layer is layer 1 in the OSI model, and it's where all your carefully crafted packets become voltage levels, light pulses, or radio waves. Your TCP handshake doesn't mean anything if the cable is crushed under a server rack.

This layer defines the electrical, mechanical, and procedural interface to the transmission medium. Copper wire? That's Ethernet over twisted pair. Fiber? That's light bouncing through glass. Wi-Fi? That's a specific radio frequency with collision avoidance baked in. The Physical Layer also governs encoding schemes: how a '1' and a '0' actually look on the medium. Manchester encoding, NRZ, 4B/5B. These details, along with limits such as the 100-meter maximum for twisted-pair Ethernet runs, matter when you're debugging why a 90-meter run of Cat5e works but a 110-meter run doesn't.
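To make "how a bit looks on the medium" concrete, here is a minimal Manchester encoding sketch. It follows the IEEE 802.3 convention (a 1 is a low-to-high transition, a 0 is high-to-low) and models signal levels as 0 and 1; the function names are chosen for this example.

```python
def manchester_encode(bits):
    """Encode bits as half-bit signal levels (IEEE 802.3: 1 = low->high, 0 = high->low)."""
    levels = []
    for bit in bits:
        levels.extend((0, 1) if bit else (1, 0))
    return levels


def manchester_decode(levels):
    """Recover bits from pairs of half-bit levels; a pair without a transition is invalid."""
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition: signal is corrupted")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits


signal = manchester_encode([1, 0, 1, 1])
print(signal)
print(manchester_decode(signal))
```

Every bit contains a transition, so the receiver can recover the clock from the signal itself, at the cost of doubling the signalling rate.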

Why this matters in production: you can't fix a network problem you can't see. If your switchport shows 'err-disabled', suspect layer 1 first; link flaps and excessive errors are common triggers. Cable test first. Always cable test first. I've watched engineers burn three hours on ARP cache issues only to find a bent pin on an RJ45 connector.

physical_layer_diagnostics.py · PYTHON
# io.thecodeforge - cs-fundamentals tutorial

def check_physical_health(interface_name, link_status, errors, duplex_mismatch):
    """Quick health check for layer 1 issues"""
    print(f"Interface: {interface_name}")
    print(f"Link status: {'UP' if link_status else 'DOWN'}")
    
    if not link_status:
        print("ACTION: Check cable, verify SFP, reseat interface")
        return False
    
    if errors > 50:
        print(f"WARNING: {errors} CRC errors detected — likely bad cable or interference")
        return False
        
    if duplex_mismatch:
        print("ALERT: Duplex mismatch! Expect collisions. Set both sides to auto-negotiate")
        return False
        
    print("Physical layer looks clean")
    return True

# Simulating a common production scenario
check_physical_health("GigabitEthernet0/1", True, 347, False)
Output
Interface: GigabitEthernet0/1
Link status: UP
WARNING: 347 CRC errors detected — likely bad cable or interference
Production Trap: The Phantom CRC
If you see CRC errors on one side but not the other, it's almost always a physical layer problem — bad cable, wrong cable length, or a failing transceiver. Don't blame the protocol stack. Grab a cable tester first, then start looking at switches.
Key Takeaway
The physical layer is never the problem until it is — and then it's the only problem. Always validate layer 1 before diving into packet analysis.

The Data Link Layer: Where Frames Become Real

Layer 2 is where frames become real. The Data Link Layer takes raw bits from the physical layer and packages them into frames with MAC addresses. This is the domain of switches, not routers. If you've ever wondered why 'arp -a' shows nonsense, this is the layer to understand.

The Data Link Layer is split into two sublayers: LLC (Logical Link Control) and MAC (Media Access Control). LLC handles flow control and error checking at the frame level. MAC is where the 48-bit hardware address lives and where CSMA/CD (Carrier Sense Multiple Access with Collision Detection) ran on shared, half-duplex Ethernet; modern full-duplex switched links no longer need it. VLAN tagging also happens here: that 802.1Q header you see in packet captures is pure layer 2.

Why this bites you on the job: broadcast storms. When a switch learns a MAC address, it updates its CAM table, and it floods broadcast and unknown-unicast frames out of every other port. If the topology contains a loop, those flooded frames circulate and multiply, and you get a broadcast storm. Spanning Tree Protocol (STP) prevents this, but only if it's configured correctly. I've seen a junior bring down an entire office floor by plugging a patch cable into two ports on the same switch. Layer 2 loops don't care about your TCP retransmission timers.
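Whether a planned set of links contains a loop can be checked with a few lines of graph code. The sketch below is not STP itself; it only finds the first link that would close a cycle, using a union-find structure. The `find_loop` helper and the switch names are hypothetical.

```python
def find_loop(links):
    """Return the first link that closes a cycle in an undirected topology, or None."""
    parent = {}

    def root(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        ra, rb = root(a), root(b)
        if ra == rb:
            return (a, b)  # both ends already connected: this link creates a loop
        parent[ra] = rb
    return None


# Hypothetical topology: the last patch cable connects two ports on the same switch.
links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw3")]
print(find_loop(links))
print(find_loop([("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw1")]))
```

STP does the equivalent job live on the network by blocking redundant ports, so the loop-closing link stays physically connected but forwards no frames.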

arp_table_monitor.py · PYTHON
# io.thecodeforge - cs-fundamentals tutorial

def analyze_arp_entries(arp_table):
    """Check ARP table for signs of layer 2 issues"""
    print("Analyzing ARP table...")

    seen = {}  # ip -> first MAC observed for that IP
    found_duplicate = False
    for ip, mac, interface in arp_table:
        if ip in seen and seen[ip] != mac:
            # Same IP claimed by a second MAC address: likely an IP conflict
            found_duplicate = True
            print(f"DUPLICATE IP DETECTED: {ip}")
            print(f"  MAC 1: {seen[ip]}")
            print(f"  MAC 2: {mac}")
            print("  ACTION: Check for IP conflict or misconfigured switch")
        else:
            seen.setdefault(ip, mac)

    if not found_duplicate:
        print("No duplicate IPs found; ARP table looks clean")

    return len(arp_table)

# Production scenario: a misconfigured DHCP server
arp_entries = [
    ("192.168.1.10", "aa:bb:cc:11:22:33", "eth0"),
    ("192.168.1.10", "dd:ee:ff:44:55:66", "eth1"),  # Same IP, different MAC
    ("192.168.1.12", "11:22:33:44:55:66", "eth0")
]

count = analyze_arp_entries(arp_entries)
print(f"Total ARP entries scanned: {count}")
Output
Analyzing ARP table...
DUPLICATE IP DETECTED: 192.168.1.10
  MAC 1: aa:bb:cc:11:22:33
  MAC 2: dd:ee:ff:44:55:66
  ACTION: Check for IP conflict or misconfigured switch
Total ARP entries scanned: 3
Senior Shortcut: ARP Aging
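Neighbor (ARP) cache timers differ between operating systems and versions. On Linux, entries stay reachable for roughly 30 seconds (base_reachable_time_ms) and gc_stale_time defaults to 60 seconds. If you're debugging intermittent connectivity, check whether ARP entries are timing out too quickly; raising the timeout to around 300 seconds can help in stable environments. Use 'sysctl net.ipv4.neigh.default.gc_stale_time' on Linux to inspect the current value.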
Key Takeaway
The data link layer is where switches live and broadcast storms die. If you see intermittent connectivity, check for MAC flapping, duplicate IPs, and STP topology changes before touching any routing tables.

Network Performance: Why Your Packets Are Late and What to Do About It

Latency, bandwidth, jitter, packet loss. These four metrics define whether your app feels snappy or your users throw their laptops out the window.

Bandwidth is the pipe size — how much data you can shove through per second. Latency is the travel time. High bandwidth doesn't fix high latency. You can't outrun the speed of light. Jitter is latency's unpredictable cousin — it kills real-time audio and video. Packet loss forces retransmits, which makes everything worse.

When you're debugging, measure all four. Don't just check ping. Run iperf for throughput. Measure jitter with a UDP test. If you see 1% packet loss on a VoIP call, that's 1% of your conversation gone — and your users will notice. Fix the physical link, upgrade the switch, or throttle traffic before your app dies.
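Once you have raw probe results, latency, jitter, and loss can be summarized together. The sketch below is a simplified illustration: jitter is taken as the mean absolute difference between consecutive replies, which is one common definition among several, and the `summarize_probes` helper and sample values are made up.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results; None marks a probe that got no reply."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    # Jitter here = mean absolute difference between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "avg_ms": round(mean(received), 2) if received else None,
        "jitter_ms": round(mean(deltas), 2) if deltas else 0.0,
        "loss_pct": round(loss_pct, 1),
    }


# Made-up samples: ten probes, one lost.
samples = [20.0, 22.0, 19.0, None, 35.0, 21.0, 20.0, 23.0, 20.0, 21.0]
print(summarize_probes(samples))
```

Note how a single 35 ms outlier barely moves the average but dominates the jitter figure, which is why real-time traffic needs both numbers.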

measure_network_quality.py · PYTHON
# io.thecodeforge - cs-fundamentals tutorial
import subprocess
import re

def ping_latency(host="8.8.8.8", count=10):
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True
    )
    # Extract avg latency from the "min/avg/max/mdev = a/b/c/d" summary line
    match = re.search(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)", result.stdout)
    if match:
        avg = float(match.group(2))  # group 1 = min, group 2 = avg, group 3 = max
        return {"host": host, "avg_ms": avg}
    return None

print(ping_latency())
Output
{'host': '8.8.8.8', 'avg_ms': 12.345}
Production Trap:
Measuring ping to 8.8.8.8 doesn't tell you your internal network quality. Always measure host-to-host inside your own network. Add packet loss thresholds to your monitoring: >0.1% triggers a page.
Key Takeaway
Bandwidth is capacity; latency is the enemy. Always profile all four metrics before blaming the network.

Modern Networking: SDN, Overlays, and Why You Can't Ignore Virtual Networks

Software-Defined Networking (SDN) separates the control plane from the data plane. That means you manage network policies in software, not by SSHing into switches. Overlays like VXLAN, and segmentation technologies like VLANs, let you build virtual networks on top of physical ones. Real talk: your cloud runs on this. AWS VPCs, Azure VNets, and many Kubernetes CNI plugins are all built on virtual networking of this kind.

Why should you care? Because physical topology no longer constrains you. You can spin up isolated networks in seconds. You can migrate workloads across data centers without re-cabling. But that freedom comes with cost: encapsulation overhead, MTU headaches, and debugging complexity.

When your container can't reach a service, check the overlay first. Is the tunnel up? Does the MTU leave room for encapsulation overhead? Is your CNI plugin leaking routes? Modern networking demands you think in layers: physical, virtual, and policy. Master that stack or your microservices will fail silently.

virtual_network_inspect.py · PYTHON
# io.thecodeforge - cs-fundamentals tutorial
import subprocess
import json

def check_tunnel_state(tunnel_name="vxlan0"):
    try:
        result = subprocess.run(
            ["ip", "-d", "link", "show", tunnel_name],
            capture_output=True, text=True, check=True
        )
        lines = result.stdout.split('\n')
        state = "UP" if "UP" in lines[0] else "DOWN"
        return {"tunnel": tunnel_name, "state": state}
    except subprocess.CalledProcessError:
        return {"tunnel": tunnel_name, "state": "NOT_FOUND"}

print(json.dumps(check_tunnel_state()))
Output
{"tunnel": "vxlan0", "state": "UP"}
Senior Shortcut:
When debugging overlay networks, always check the MTU first. AWS, for example, supports a 9001-byte MTU inside a VPC; if your tunnel encapsulation eats 50 bytes, keep the overlay interface MTU at or below 8951 to avoid fragmentation.
Key Takeaway
Overlays decouple network topology from hardware. Debug in layers: physical, virtual, then policy.

The Bridge: Why You Need It and How It Segments Your Network

Bridges operate at Layer 2, the Data Link layer. They connect two or more network segments and split them into separate collision domains by learning which MAC addresses live on each side. Unlike a hub that blindly repeats every frame, a bridge forwards only the traffic that needs to cross the segment boundary. This means devices on segment A don't see unicast traffic meant for segment B, reducing unnecessary load and improving performance. Bridges also buffer frames to handle speed mismatches between segments, e.g., a 100 Mbps Ethernet segment talking to a 1 Gbps one.

Modern switches are essentially multi-port bridges with high-speed backplanes, but the core principle remains: isolate traffic to where it belongs. Production networks use bridges (or switches) to keep unicast traffic off links that don't need it. Broadcasts are a different story: an ARP request still reaches every host in the same broadcast domain, so containing broadcast traffic takes VLANs or routers.

BridgeLearning.py · PYTHON
# io.thecodeforge - cs-fundamentals tutorial

class Bridge:
    def __init__(self):
        self.mac_table = {}  # mac -> port

    def forward(self, frame, src_mac, dst_mac, src_port):
        self.mac_table[src_mac] = src_port  # learn source
        if dst_mac in self.mac_table:
            dest_port = self.mac_table[dst_mac]
            if dest_port != src_port:
                print(f"Forward to port {dest_port}")
        else:
            print(f"Flood to all ports except {src_port}")
Output
>>> b = Bridge()
>>> b.forward('frame', 'AA:BB', 'CC:DD', 1)
Flood to all ports except 1
Production Trap:
Bridges don't block broadcasts. Spanning Tree Protocol (STP) prevents loops but doesn't filter broadcast frames. A single misconfigured loop can saturate all segments.
Key Takeaway
Bridges segment collision domains by learning MAC locations, forwarding only necessary traffic.

The Modem: Why It Exists and What Happens When Your Bits Leave the LAN

ModemSimulate.py · PYTHON
# io.thecodeforge - cs-fundamentals tutorial

def modulate(bits, carrier_freq=1000):
    symbols = []
    for bit in bits:
        if bit == 1:
            symbols.append(f"{carrier_freq}Hz high")
        else:
            symbols.append(f"{carrier_freq}Hz low")
    return symbols

def demodulate(symbols):
    return [1 if 'high' in s else 0 for s in symbols]

bits = [1, 0, 1, 1]
sig = modulate(bits)
print("Transmitted:", ' '.join(map(str, bits)))
print("Received:", demodulate(sig))
Output
Transmitted: 1 0 1 1
Received: [1, 0, 1, 1]
Production Trap:
A modem's maximum speed is theoretical. Real-world throughput drops with line noise, distance from the central office, and splitter quality. Always test at the modem's diagnostic page, not your Wi-Fi speed test.
Key Takeaway
Modems bridge LAN and WAN by converting Ethernet digital signals to analog for the ISP's medium.
Production incident · POST-MORTEM · severity: high

The DNS Misconfiguration That Took Down an E-Commerce Site

Symptom
Users received "Server Not Found" errors when visiting the main website, but internal systems and staging environments worked fine.
Assumption
The team assumed the DNS changes they pushed the night before had propagated correctly — the TTL was set to 300 seconds, so they expected resolution within minutes.
Root cause
The new A record for the primary domain was accidentally deleted by a bulk update script. The DNS query fell through to a stale CNAME pointing to an old load balancer that had been decommissioned.
Fix
Restored the A record from a backup zone file and manually flushed the DNS cache on all authoritative nameservers. Set up a pre-deployment DNS verification script.
Key lesson
  • Always use DNS transaction logs to verify changes immediately after deployment.
  • Don't rely solely on TTL for recovery — have a rollback plan for DNS.
  • Monitor DNS resolution from multiple geographic locations during major events.
Production debug guide: quick symptom-to-action map for the most common network failures
Symptom · 01
Application throws connection timeout on external API
Fix
Check firewall rules and outbound security groups. Run telnet api.example.com 443 from the server.
Symptom · 02
Hostnames resolve to wrong IP or fail intermittently
Fix
Verify DNS records with dig +short example.com and check TTL values. Compare against authoritative NS responses.
Symptom · 03
High latency or packet loss in logs
Fix
Run mtr --report target-ip to identify the hop with loss. Check for bandwidth saturation or misconfigured MTU on that link.
Symptom · 04
One server cannot reach another on the same subnet
Fix
Check ARP table on both hosts (arp -a). Verify subnet mask consistency. Look for VLAN misconfig on the switch.
Network Troubleshooting Cheat Sheet: common symptoms, immediate actions, and exact commands to diagnose network issues fast.
No network connectivity at all
Immediate action
Check physical link and interface status
Commands
ip link show (or ifconfig)
ping -c 4 8.8.8.8
Fix now
Restart the interface: sudo ip link set dev eth0 down; sudo ip link set dev eth0 up
DNS resolution fails
Immediate action
Test direct IP connectivity
Commands
nslookup example.com
dig +trace example.com
Fix now
Add a fallback nameserver in /etc/resolv.conf: nameserver 1.1.1.1
Application-specific timeout (e.g., database)
Immediate action
Check port reachability
Commands
nc -zv db-server 3306
ss -tunap | grep 3306
Fix now
Update security group or iptables rule to allow the port
Network Types
Type | Scope | Typical Speed | Example
LAN | Single building / campus | 1 Gbps – 10 Gbps | Office network, home network
WAN | Cities / continents | 10 Mbps – 10 Gbps | Internet, corporate MPLS
MAN | City-wide | 100 Mbps – 10 Gbps | ISP backbone, municipal Wi-Fi

Key takeaways

1. A computer network is a system of interconnected devices that exchange data using protocols.
2. Data is broken into packets; each packet includes headers for addressing, routing, and error recovery.
3. DNS and DHCP are critical services: misconfigurations cause silent outages.
4. Always design applications to handle network failures; networks are not reliable.
5. Debug network issues systematically: application → transport → internet → link layer.

Common mistakes to avoid

Assuming the network is reliable

Symptom
Applications crash or hang under packet loss; retries not implemented.
Fix
Design with network failures in mind — implement retries with exponential backoff, timeouts, and circuit breakers.

Misconfiguring subnet masks

Symptom
Two hosts on the same physical switch cannot communicate directly; traffic goes through default gateway unnecessarily.
Fix
Ensure all hosts on the same subnet have identical subnet masks. Use a configuration management tool to enforce consistency.

Using DNS with long TTLs during changes

Symptom
After a server migration, users still hit the old IP for hours despite DNS record update.
Fix
Before planned changes, lower TTL to 60 seconds. After the change, verify propagation, then restore normal TTL.

Ignoring MTU mismatches

Symptom
Intermittent connectivity issues, especially with VPN or overlay networks (e.g., Docker, Kubernetes).
Fix
Set the same MTU on all network segments. For overlays, reduce MTU to account for encapsulation overhead (e.g., 1450 for VXLAN).

Interview Questions on This Topic

Q01 · JUNIOR
Explain how a client connects to a server using TCP. What happens during...
Q02 · SENIOR
What happens when you type a URL into a browser and press Enter? Describ...
Q03 · SENIOR
How does a subnet mask affect communication between two hosts? Give an e...
Q04 · SENIOR
Describe a production incident you debugged that was caused by a network...
Q01 of 04 · JUNIOR

Explain how a client connects to a server using TCP. What happens during the three-way handshake?

ANSWER
The client sends a SYN packet with a random sequence number. The server responds with SYN-ACK, acknowledging the client's sequence number and sending its own. The client then sends an ACK. After this, a full-duplex connection is established. The handshake ensures both sides are willing to communicate and synchronizes sequence numbers for reliable data transfer.
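The sequence-number bookkeeping in that answer can be traced with a short simulation. This is a sketch of the numbering only, with no real sockets involved; the `three_way_handshake` function and the fixed initial sequence numbers are assumptions chosen to make the output easy to read.

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as (sender, flags, seq, ack) tuples."""
    x = random.getrandbits(32) if client_isn is None else client_isn
    y = random.getrandbits(32) if server_isn is None else server_isn
    wrap = 2 ** 32  # sequence numbers are 32-bit and wrap around
    return [
        ("client", "SYN", x, None),                         # client proposes its ISN
        ("server", "SYN-ACK", y, (x + 1) % wrap),           # server acks x+1, proposes y
        ("client", "ACK", (x + 1) % wrap, (y + 1) % wrap),  # client acks y+1
    ]


for sender, flags, seq, ack in three_way_handshake(client_isn=1000, server_isn=5000):
    print(f"{sender:>6} {flags:<7} seq={seq} ack={ack}")
```

Each side acknowledges the other's initial sequence number plus one, because the SYN flag itself consumes one sequence number.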
FAQ · 4 QUESTIONS

Frequently Asked Questions

01. What is the difference between a hub, a switch, and a router?
02. Why does my application sometimes get 'Connection refused' vs 'Connection timed out'?
03. What is NAT and why is it needed?
04. What is the difference between TCP and UDP? When would you use each?