Senior 13 min · March 06, 2026
Network Interview Questions

DNS TTL Killed a Migration — Computer Networks Interview

A 24-hour DNS TTL caused 30% traffic failure during a migration.

Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Notes here come from systems that actually shipped.

Last updated: May 23, 2026
Quick Answer
  • OSI model isn't theory — it's a fault-isolation map for debugging network problems.
  • TCP guarantees delivery (at a cost); UDP trades reliability for speed — choose based on data criticality.
  • DNS resolution walks a cached hierarchy: browser → OS → resolver → root → TLD → authoritative.
  • HTTPS = HTTP + TLS; the extra handshake adds 1 RTT with TLS 1.3 (2 RTT with TLS 1.2) but protects data in transit.
  • Subnetting with CIDR is how cloud providers isolate networks and control traffic flow.
Definition
What is Network Interview Questions?

This article is a focused, interview-oriented breakdown of the network fundamentals that separate junior engineers from senior ones. It centers on a real-world horror story—a failed migration caused by overlooked DNS TTL values—to illustrate why theoretical knowledge (like the OSI model) has direct, painful consequences in production.

You'll get the exact mental models and debugging tools (dig, traceroute, tcpdump) that senior engineers use daily, not textbook definitions. The content covers the specific topics that consistently trip up candidates in network interviews: DNS resolution mechanics, TCP vs. UDP tradeoffs, HTTP status code semantics, and subnetting math. It's designed for engineers who already know the basics but need to internalize how these protocols actually behave under load, during migrations, and when things break. If you're preparing for a senior-level interview or just want to stop guessing at network issues, this is the practical, war-story-grounded reference you need.

Plain-English First

Imagine the internet is a global postal system. Your computer is a house with an address (IP address), the postal routes are the network cables and Wi-Fi signals, and the rules about how letters get packed, addressed, and delivered are the protocols. When you visit google.com, you're essentially writing a letter, dropping it in a mailbox, watching it get sorted through multiple post offices (routers), and getting a reply back — all in milliseconds. Computer networking is the science of making that postal system fast, reliable, and secure.

Every backend engineer, DevOps engineer, and full-stack developer eventually sits across from an interviewer who asks 'What happens when you type a URL into a browser?' That question alone can make or break a senior-level interview. Networking isn't just a theoretical subject — it's the invisible infrastructure that your APIs, databases, and microservices live on. Understanding it deeply separates candidates who just write code from engineers who understand systems.

Why DNS TTL Is the Silent Saboteur in Network Migrations

DNS TTL (Time to Live) is the directive that tells resolvers how long to cache a DNS record before discarding it and querying the authoritative server again. It is the main knob you have over how quickly a DNS change takes effect: not a magic switch, but a cache expiration policy measured in seconds. A TTL of 300 means a resolver may serve a stale IP for up to five minutes; a TTL of 86400 means a full day of potential chaos.

TTL is set on the authoritative nameserver per record set (for example, the A, AAAA, or CNAME records for a name). Resolvers, from ISP caches to browser-level caches, are expected to honor this value, but nothing forces them to. Some clamp TTLs to their own minimum or maximum, or cache records for hours beyond the specified value. This asymmetry is where migrations fail: you lower TTL before the cutover, but old records persist in opaque caches you cannot flush.

Use low TTLs (60–300 seconds) during planned migrations, DNS failovers, or any scenario where you need rapid rollback. Keep high TTLs (3600+) for stable, long-lived records to reduce query load and latency. The trade-off is between agility and efficiency — and ignoring it turns a simple A-record change into a multi-day outage.
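The timing behind that advice can be worked out with simple date arithmetic. The sketch below is illustrative only: `earliest_safe_cutover`, its parameters, and the dates are hypothetical, and it assumes every resolver honors the TTL, which is not guaranteed in practice.

```python
from datetime import datetime, timedelta, timezone

def earliest_safe_cutover(ttl_lowered_at: datetime,
                          old_ttl_seconds: int,
                          safety_factor: float = 1.0) -> datetime:
    """Earliest time at which well-behaved caches hold only the low-TTL record.

    A resolver that cached the record just before the TTL was lowered keeps it
    for up to old_ttl_seconds, so the cutover has to wait at least that long.
    """
    if old_ttl_seconds < 0 or safety_factor < 1.0:
        raise ValueError("old_ttl_seconds must be >= 0 and safety_factor >= 1.0")
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds * safety_factor)

# Hypothetical timeline: TTL lowered on 1 March at 09:00 UTC, old TTL was 24 h.
lowered = datetime(2026, 3, 1, 9, 0, tzinfo=timezone.utc)
print(earliest_safe_cutover(lowered, 86400, safety_factor=2.0).isoformat())
# 2026-03-03T09:00:00+00:00
```

A safety factor above 1.0 is a margin for resolvers that stretch TTLs; it reduces the risk but cannot eliminate it.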

TTL Is Not a Propagation Guarantee
Lowering TTL before a cutover does not flush existing caches — it only ensures new lookups expire faster. Old cached records can persist until their original TTL expires.
Production Insight
During a cloud-to-cloud migration, we set TTL to 60 seconds but forgot to pre-warm the new DNS records. The old records had a 24-hour TTL from the previous provider. Result: 40% of traffic hit the old endpoint for 23 hours, causing partial outages and a rollback.
Symptom: A/B traffic split shows inconsistent routing — some users see the new site, others see the old one, and support tickets spike with 'site not loading' reports.
Rule of thumb: Lower the TTL to 60s, then wait at least 2× the original TTL before the cutover, and validate that all resolvers you control (internal DNS, CDN) have flushed before flipping the record.
Key Takeaway
TTL controls cache duration, not propagation speed — lower it well before any change.
Resolvers are not obligated to honor TTL; always assume some caches will lie.
Always test with a canary record and monitor resolver behavior before the full cutover.
[Figure: DNS TTL and Migration Pitfalls. How TTL delays DNS propagation during a network migration: old DNS record cached with a high TTL (e.g., 86400s) → migration trigger (IP changed at the authoritative server) → cached TTL expiry (recursive resolvers wait out the TTL) → stale cache hits (clients still resolve the old IP) → propagation complete (all caches updated to the new IP). A high TTL blocks a fast cutover; lower the TTL before migration to speed propagation.]

The OSI Model — Why 7 Layers Actually Matter in Practice

The OSI (Open Systems Interconnection) model is a framework that breaks network communication into 7 distinct layers. Most people memorize the names ('Please Do Not Throw Sausage Pizza Away') and stop there. That's a mistake. Understanding what each layer is responsible for helps you debug real problems.

[Image of the 7 layers of the OSI model]

When your HTTP request fails, is it a DNS issue (Layer 7), a TCP connection problem (Layer 4), or a routing issue (Layer 3)? Knowing the layers lets you mentally narrow down where the fault is, just like a doctor using anatomy to diagnose illness.

In practice, you rarely work below Layer 3 (Network) unless you're writing embedded systems or kernel code. But you absolutely need to understand Layers 3, 4, and 7 (IP addressing, TCP/UDP, and application protocols) because they appear in every production debugging scenario, from a failing API call to a slow database connection.

Here's the critical insight: layers are about separation of concerns. Each layer only talks to the layer directly above and below it. That's why you can swap out Wi-Fi for Ethernet (Layer 1/2 change) without rewriting your HTTP code (Layer 7). The abstraction is intentional and powerful.

io/thecodeforge/networking/OsiLayerInspector.java (Java)
package io.thecodeforge.networking;

import java.net.InetAddress;
import java.net.Socket;
import java.io.PrintWriter;
import java.io.BufferedReader;
import java.io.InputStreamReader;

/**
 * Demonstration of OSI Layers 3, 4, and 7 in a production Java context.
 */
public class OsiLayerInspector {
    public static void main(String[] args) {
        String host = "example.com";
        int port = 80; // Layer 4 (Transport) Port

        try {
            // DNS resolution (itself a Layer 7 protocol) yields the Layer 3 (Network) address
            InetAddress address = InetAddress.getByName(host);
            System.out.println("[L3 - Network] Resolved " + host + " to " + address.getHostAddress());

            // Layer 4 (Transport): TCP Connection established via Socket
            try (Socket socket = new Socket(address, port);
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
                
                System.out.println("[L4 - Transport] TCP connection established (Handshake complete)");

                // Layer 7 (Application): Raw HTTP Protocol communication
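                // HTTP requires CRLF line endings; println() uses the platform separator
                out.print("GET / HTTP/1.1\r\n");
                out.print("Host: " + host + "\r\n");
                out.print("Connection: close\r\n");
                out.print("\r\n");
                out.flush(); // auto-flush only triggers on println(), so flush explicitly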

                System.out.println("[L7 - Application] HTTP Request Sent");
                String responseLine = in.readLine();
                System.out.println("[L7 - Application] Server Response: " + responseLine);
            }
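        } catch (java.net.UnknownHostException e) {
            // Name resolution failed before any connection was attempted
            System.err.println("[DNS] Name resolution failed: " + e.getMessage());
        } catch (java.net.ConnectException e) {
            // The TCP connection could not be established (refused or timed out)
            System.err.println("[L4 - Transport] TCP connect failed: " + e.getMessage());
        } catch (java.io.IOException e) {
            // Anything else that went wrong while reading or writing the socket
            System.err.println("[I/O] Read or write failed: " + e.getMessage());
        }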
    }
}
Output
[L3 - Network] Resolved example.com to 93.184.216.34
[L4 - Transport] TCP connection established (Handshake complete)
[L7 - Application] HTTP Request Sent
[L7 - Application] Server Response: HTTP/1.1 200 OK
Interview Gold:
When asked about OSI layers, anchor your answer in debugging. Say: 'If ping works but HTTP doesn't, Layer 3 is fine, so I check the port at Layer 4 and then the application at Layer 7.' That shows you understand the model operationally, not just academically.
Production Insight
In cloud environments, ping (ICMP) is often blocked by security groups while HTTP works fine.
Don't assume ICMP reachability equals application reachability — test at the right layer.
Rule: always test from bottom up: L3 (ping), L4 (telnet), L7 (curl), then check firewall logs.
Key Takeaway
OSI model is a fault-isolation framework.
When debugging, start at the lowest layer you can test and work up.
If your app works locally but fails over the network, suspect the layers below your code and any environment-specific configuration first.
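The bottom-up rule can be expressed as a small decision function. This is a minimal sketch, not a real diagnostic tool: the probe names and `first_failing_layer` are hypothetical, and the probe results are supplied by hand rather than measured.

```python
from typing import Optional

# Probes ordered bottom-up; each maps to the layer it exercises.
PROBE_ORDER = [
    ("ping", "L3 (network)"),
    ("tcp_connect", "L4 (transport)"),
    ("http_request", "L7 (application)"),
]

def first_failing_layer(results: dict) -> Optional[str]:
    """Return the lowest layer whose probe failed, or None if none failed.

    A missing or None result means "inconclusive" (for example, ICMP blocked
    by a security group) and is skipped rather than blamed.
    """
    for probe, layer in PROBE_ORDER:
        if results.get(probe) is False:
            return layer
    return None

print(first_failing_layer({"ping": True, "tcp_connect": True, "http_request": False}))
# L7 (application)
print(first_failing_layer({"ping": None, "tcp_connect": False, "http_request": False}))
# L4 (transport)
```

Treating a blocked ping as inconclusive rather than as a failure reflects the point that ICMP reachability and application reachability are separate questions.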

TCP vs UDP — Choosing the Right Delivery Guarantee

TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are the two workhorses of the Transport layer, and choosing between them is one of the most consequential decisions in system design.

TCP is like sending a package with signature confirmation. Before any data moves, there's a 3-way handshake (SYN, SYN-ACK, ACK). Every packet is numbered, acknowledged, and retransmitted if lost. Order is guaranteed. This reliability costs time — that handshake adds latency, and the acknowledgment mechanism adds overhead.

UDP is like dropping a flyer through every door in the neighbourhood. You send it and forget it. No handshake, no acknowledgment, no guarantee of delivery or order. But it's blazingly fast, which is exactly what you need for real-time applications.

In modern systems, QUIC (used by HTTP/3) is effectively UDP with reliability built on top of it — proof that the TCP/UDP choice isn't always binary.

io/thecodeforge/networking/ProtocolComparison.java (Java)
package io.thecodeforge.networking;

import java.net.*;
import java.nio.charset.StandardCharsets;

public class ProtocolComparison {

    // TCP: Reliable delivery for sensitive data
    public void tcpTransmission(String message) throws Exception {
        try (Socket socket = new Socket("localhost", 9001)) {
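            // Encode explicitly so the bytes on the wire don't depend on the platform default charset
            socket.getOutputStream().write(message.getBytes(StandardCharsets.UTF_8));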
        }
    }

    // UDP: Unreliable but fast for high-frequency updates (gaming/telemetry)
    public void udpTransmission(String message) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] buf = message.getBytes(StandardCharsets.UTF_8);
            DatagramPacket packet = new DatagramPacket(
                buf, buf.length, InetAddress.getByName("localhost"), 9002
            );
            socket.send(packet);
        }
    }
}
Output
// TCP: Established connection, verified receipt.
// UDP: Sent packet to network buffer without verification.
Real-World Mapping:
DNS uses UDP for queries (fast, small payloads) but falls back to TCP when the response is truncated (over 512 bytes without EDNS0, or over the advertised buffer size with it). HTTP/1.1 and HTTP/2 use TCP. HTTP/3 uses QUIC (UDP-based). Knowing these specifics in an interview is a strong signal.
Production Insight
TCP's congestion control can cause 'bufferbloat' — intermediate routers buffer too many packets, increasing latency (not loss).
UDP doesn't have congestion control, so a misbehaving UDP application can starve TCP flows sharing the same link.
Rule: for latency-sensitive apps (VoIP, gaming), use UDP with application-level retransmission — don't let TCP's reliability destroy your real-time experience.
Key Takeaway
TCP is for reliability — use it when data integrity matters more than latency.
UDP is for speed — use it when missing a packet is better than being late.
But remember: QUIC proves that you can have both — just not with the raw protocol alone.
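Building reliability on top of UDP starts with the application numbering its own datagrams so the receiver can tell what went missing. The sketch below shows only that detection step, under the assumption that each datagram carries an integer sequence number; `missing_sequences` is a hypothetical helper, and it is not how QUIC itself is implemented.

```python
def missing_sequences(received: list[int]) -> list[int]:
    """Return sequence numbers absent between the lowest and highest seen."""
    if not received:
        return []
    seen = set(received)
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]

# Datagrams 3 and 6 were lost; 5 arrived before 4 (reordering is not loss).
arrived = [1, 2, 5, 4, 7]
print(missing_sequences(arrived))  # [3, 6]
```

What the application does with the gaps is the design decision: a file transfer would request retransmission, while a voice stream would usually skip them because a late packet is worthless.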

DNS Deep Dive — What Actually Happens When You Type a URL

DNS (Domain Name System) is the internet's phonebook. You know the name (google.com), and DNS finds the phone number (IP address). But the process behind that lookup is more fascinating than most people realise — and it's a classic interview question.

When your browser needs to resolve 'api.github.com', it doesn't just ask one server. It walks a hierarchy. First, it checks its local cache. If that's empty, it asks your OS's resolver. If that misses, it queries your ISP's recursive resolver. That resolver then walks the DNS tree: it asks a Root Name Server for the authoritative server for '.com', then asks that server for 'github.com', then finally asks GitHub's authoritative DNS server for 'api.github.com'. The answer comes back and gets cached at every step.

io/thecodeforge/networking/dns_trace.sh (Bash)
# Using 'dig' to trace the iterative resolution process (standard interview tool)
# Trace github.com from the root servers down
dig +trace github.com

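# Inspect the TTL (Time To Live) to understand caching behavior.
# +noall +answer prints only the answer section; dig separates fields with tabs,
# so grepping for "IN A" with a literal space can miss the record.
dig +noall +answer github.com A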
Output
;; Received 759 bytes from 192.5.5.241#53(f.root-servers.net)
github.com. 60 IN A 140.82.121.4
Watch Out:
Never change a DNS record without first lowering the TTL to 60–300 seconds at least one full old-TTL period in advance (24 hours for a TTL of 86400). If you change an IP with a TTL of 86400 (24 hours), old clients can keep hitting the wrong server for an entire day, and you can't force them to flush their cache.
Production Insight
DNS resolution failures are a common cause of 'it works on my machine' in microservices.
If your service depends on another service via DNS name, a resolver timeout can cascade quickly under load (connection pool exhaustion, increased latency).
Rule: always configure a short TTL for service discovery DNS records and implement client-side fallback (e.g., cached IP list).
Key Takeaway
DNS is a distributed cache hierarchy.
TTL is your primary scaling lever — short TTLs for dynamic endpoints, long TTLs for stable ones.
Never assume DNS changes propagate instantly — plan for the old TTL duration.
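The caching behavior described in this section can be simulated in a few lines. This is a minimal sketch of a single cache level, assuming a fake clock so the result is deterministic; `TtlCache` is a hypothetical class and the addresses come from a documentation-only range.

```python
from typing import Callable, Dict, Optional, Tuple

class TtlCache:
    """Minimal DNS-style cache: entries expire ttl seconds after insertion."""

    def __init__(self, clock: Callable[[], float]):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl: int) -> None:
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: caller must re-resolve upstream
            return None
        return address

now = [0.0]                                  # fake clock, in seconds
cache = TtlCache(clock=lambda: now[0])
cache.put("api.example.com", "203.0.113.10", ttl=86400)  # old IP, 24 h TTL

now[0] = 3600                                # one hour after the record changed upstream
print(cache.get("api.example.com"))          # 203.0.113.10 (stale, but still served)
now[0] = 86400                               # original TTL has elapsed
print(cache.get("api.example.com"))          # None (resolver must ask upstream again)
```

Nothing the authoritative server does in the first 24 hours reaches this cache, which is why the TTL has to be lowered before the change rather than at the same time.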

HTTP vs HTTPS, Status Codes, and Subnetting — The Interview Essentials

These three topics appear in virtually every networking interview, so let's cover them with precision.

HTTP vs HTTPS: HTTP sends everything in plaintext. HTTPS wraps HTTP inside TLS (Transport Layer Security). The TLS handshake happens after the TCP handshake. After that, all data is encrypted.

HTTP Status Codes: These are a language. 2xx means success. 3xx means redirect. 4xx means the client made an error. 5xx means the server failed.

Subnetting: An IP address like 192.168.1.100/24 means the first 24 bits identify the network and the last 8 bits identify the host. /24 gives you 256 addresses (254 usable).

io/thecodeforge/networking/SubnetCalculator.java (Java)
package io.thecodeforge.networking;

/**
 * Simulating CIDR mask logic for interview discussions.
 */
public class SubnetCalculator {
    public static void main(String[] args) {
        int prefix = 24;
        // Integer shift avoids floating-point math; valid for prefixes /1 through /32
        int totalHosts = 1 << (32 - prefix);
        int usableHosts = totalHosts - 2; // Subtract Network and Broadcast

        System.out.println("CIDR /" + prefix + " allows for " + usableHosts + " usable hosts.");
    }
}
Output
CIDR /24 allows for 254 usable hosts.
Pro Tip:
When asked 'what's the difference between 401 and 403?', say: '401 means unauthenticated — we don't know who you are. 403 means unauthorised — we know exactly who you are, you just don't have permission.' That distinction shows you think about security design, not just HTTP syntax.
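The 401 versus 403 distinction comes down to the order of two checks. The sketch below assumes a toy in-memory token store; `TOKENS`, `status_for`, and the role names are hypothetical and stand in for whatever authentication layer a real service uses.

```python
from typing import Optional

# Hypothetical token store: token -> set of roles granted to that identity.
TOKENS = {"tok-alice": {"reader"}, "tok-admin": {"reader", "admin"}}

def status_for(token: Optional[str], required_role: str) -> int:
    """Return the HTTP status an endpoint guarded by required_role would send."""
    roles = TOKENS.get(token) if token else None
    if roles is None:
        return 401   # no credentials, or credentials we cannot validate
    if required_role not in roles:
        return 403   # identity is known, permission is missing
    return 200

print(status_for(None, "admin"))         # 401
print(status_for("tok-alice", "admin"))  # 403
print(status_for("tok-admin", "admin"))  # 200
```

Authentication is checked first, so a caller with a bad token never learns whether the resource would have been permitted.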
Production Insight
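In cloud environments, many '403 Forbidden' errors come from misconfigured IAM or resource policies, or from WAF rules. Security groups and network ACLs drop packets instead, so they surface as timeouts rather than 403s.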
The error message rarely tells you what's missing — you need to audit permissions at every layer (network, identity, application).
Rule: when debugging 403s, check not just the web server logs but also the cloud provider's audit log (for example, AWS CloudTrail) for API deny events.
Key Takeaway
HTTPS = HTTP + TLS — the security is in the transport layer, not the application.
Status codes are a quick diagnostic tool: 4xx means fix your request, 5xx means fix the server.
Subnetting is a design tool: choose CIDR prefixes that allow growth without renumbering.
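The /24 arithmetic can be checked with Python's standard `ipaddress` module. This is a short sketch using the same example address as the text; splitting into /26 subnets is an added illustration, not something the article prescribes.

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.1.100/24")
net = iface.network

print(net)                      # 192.168.1.0/24
print(net.network_address)      # 192.168.1.0
print(net.broadcast_address)    # 192.168.1.255
print(net.num_addresses)        # 256
print(net.num_addresses - 2)    # 254 usable (network and broadcast excluded)

# Splitting the /24 into four /26 subnets, for example one per availability zone.
print([str(s) for s in net.subnets(new_prefix=26)])
# ['192.168.1.0/26', '192.168.1.64/26', '192.168.1.128/26', '192.168.1.192/26']
```

Each extra prefix bit halves the subnet, so a /26 holds 64 addresses; leaving unused prefix bits at design time is what allows growth without renumbering.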

Production Network Debugging: Tools Every Engineer Should Know

Knowing theory is one thing. Being able to diagnose a real outage under pressure is what separates senior engineers. Here are the tools that matter in production:

dig — The DNS Swiss Army knife. dig +trace shows you the full resolution path. dig -x does reverse lookup.

curl — Every engineer's first tool for HTTP debugging. Verbose mode (-v) shows the entire handshake. -k bypasses certificate validation (for testing only!).

tcpdump — Raw packet capture. Filter by host, port, or protocol. -A prints ASCII payload. Critical for diagnosing retransmissions and dropped packets.

traceroute/mtr — Shows the path packets take and where latency spikes. mtr combines ping and traceroute in real-time.

netstat/ss — Check open ports, connection states, and socket buffers. ss -tuln lists all listening TCP/UDP ports. ss -s shows overall statistics.

In an interview, being able to describe a real debugging session (e.g., 'I used tcpdump to spot TCP retransmissions, then mtr to find a congested router') is worth more than reciting the OSI layers.

io/thecodeforge/networking/debug_session.sh (Bash)
# Real scenario: slow API response
# Step 1: Check DNS
curl -s -o /dev/null -w "%{time_namelookup}\n" https://api.example.com
# Output is in seconds (0.005 = 5 ms); a value well above your normal baseline suggests slow DNS

# Step 2: Trace the route
mtr --report-wide api.example.com

# Step 3: Capture SYN packets; repeated SYNs for the same connection indicate retransmission
tcpdump -i eth0 -nn 'host api.example.com and tcp[tcpflags] & tcp-syn != 0' -c 100

# Step 4: Check connection state
ss -tn state all dst api.example.com
Output
DNS lookup: 12ms
mtr: first hop 1ms, second hop 45ms (packet loss 2%)
TCP captures show retransmissions on the second hop
=> Congested router: escalate to network team, add retry logic to client
The Debugging Pyramid
  • L1/L2: Physical link up? Check cables, carrier detect, interface stats.
  • L3: IP connectivity? Ping the target (but remember ICMP may be blocked).
  • L4: Port reachable? Telnet or curl against the port.
  • L7: Application responding? Check HTTP status, response body, latency.
  • If all layers pass locally but fail in production, the issue is likely configuration (firewall, DNS, load balancer rules).
Production Insight
I once spent two hours debugging a 'connection refused' error. Turned out the application container had crashed but Kubernetes hadn't restarted it yet.
Network tools told me the port was closed — the real fix was checking pod status, not firewall rules.
Rule: when debugging network problems, always check the health of the target process first.
Key Takeaway
The best network debugger is systematic — verify each layer from bottom up.
Learn one tool per layer: ping for L3, telnet/nc for L4, curl for L7, dig for DNS.
Most 'network' problems are actually application or configuration problems — use the tools to prove where the fault is.
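The difference between a refused connection and a timeout is often the fastest clue about where the fault sits. The sketch below is one way to make that distinction with the standard library; `probe_tcp` is a hypothetical helper, and the demo uses a throwaway local listener so it needs no external host.

```python
import socket

def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # host answered with a reset: nothing is listening
    except socket.timeout:
        return "timeout"   # no answer at all: often a firewall dropping packets
    except OSError:
        return "error"     # DNS failure, no route to host, and similar

# Demo against a temporary local listener (port 0 lets the OS pick a free port).
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(probe_tcp("127.0.0.1", port))   # listener is up
listener.close()
print(probe_tcp("127.0.0.1", port))   # listener is gone
```

With the listener running the probe should report `open`, and after it closes it should report `refused`. A refusal means the network path works and the process is not listening, which points at the target's health rather than at firewall rules.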

The Three Pillars Your Interviewer Actually Cares About (CIA)

Every network interview eventually circles back to Confidentiality, Integrity, and Availability. Not because your interviewer loves theory, but because every production outage or security breach traces back to a failure in one of these three axioms.

Confidentiality means encryption isn't optional. If your packets travel in plaintext, you might as well broadcast secrets on public radio. Integrity ensures data didn't get mutated in transit, which is why TLS authenticates every record (AEAD tags in TLS 1.3, MACs in older cipher suites) rather than only encrypting it. Availability means your infrastructure survives a cable cut, not that it's fast.

The junior mistake: reciting definitions. The senior move: citing a real outage. "We lost availability when our east-west route flapped because BGP timers weren't tuned for the backup link." That's how you prove you live in the trenches, not the textbook.

CIA_check.py (Python)
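# io.thecodeforge: cs-fundamentals tutorial

import hashlib
import hmac

def verify_integrity(payload: bytes, expected_hash: str) -> bool:
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(actual, expected_hash)

# Production: never trust the wire. Always verify.
original_data = b'{"user_id": 8429, "amount": 450}'
# The sender computes the full 64-character SHA-256 digest and shares it
# over a channel the attacker cannot modify
known_good_hash = hashlib.sha256(original_data).hexdigest()

transmitted_data = original_data  # what actually arrived over the wire
if verify_integrity(transmitted_data, known_good_hash):
    print("INTEGRITY: PASS")
else:
    print("INTEGRITY: FAIL (data tampered in transit)")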
Output
INTEGRITY: PASS
Production Trap:
Never assume the hash in your log matches the hash on the wire. Verify end-to-end. I've seen three incidents where the integrity check was done on already-decrypted data — useless.
Key Takeaway
CIA isn't theory. Every network architecture decision you make either upholds or violates confidentiality, integrity, or availability. Know which one you're trading off.
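A bare hash only detects accidental corruption, because an attacker who can alter the payload can recompute the hash as well. Keyed integrity closes that gap. The sketch below uses HMAC-SHA256 from the standard library; the key and payload are hypothetical, and real systems would normally rely on TLS rather than hand-rolled message authentication.

```python
import hashlib
import hmac

def sign(key: bytes, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the payload to the shared key."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(key: bytes, payload: bytes, tag: str) -> bool:
    # compare_digest runs in constant time, so it does not leak match length
    return hmac.compare_digest(sign(key, payload), tag)

key = b"demo-key-do-not-reuse"             # hypothetical shared secret
payload = b'{"user_id": 8429, "amount": 450}'
tag = sign(key, payload)

print(verify(key, payload, tag))                                # True
print(verify(key, b'{"user_id": 8429, "amount": 9999}', tag))   # False
print(verify(b"attacker-key", payload, tag))                    # False
```

The third case is the important one: without the key, a valid tag cannot be produced even for an unmodified payload.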

VPNs Are Not Magic — Know the Three Flavors Before the Interview

Your interviewer will ask about VPNs. They don't want to hear "it's a secure tunnel." Every vendor's slide deck says that. They want to know you understand the real trade-offs between site-to-site, remote access, and clientless VPNs.

Site-to-site VPNs bridge two entire networks over the public internet. You use IPsec with IKEv2 in production, not PPTP from 1999. Remote access means a single laptop connects back to headquarters. Here, TLS-based VPNs such as OpenVPN and the UDP-based WireGuard (which uses the Noise protocol framework rather than TLS) dominate because they traverse NAT without special firewall handling. Clientless VPNs are SSL portals: users get a browser interface, no client installed. Convenient, but you lose endpoint policy enforcement.

The junior flubs this by blurring the lines. The senior nails it: "For our remote workforce, we run WireGuard because its in-kernel implementation is substantially faster than user-space OpenVPN, and we push posture checks via an always-on client." That's the answer that lands the offer.

vpn_selector.py (Python)
# io.thecodeforge: cs-fundamentals tutorial

def recommend_vpn(use_case: str) -> str:
    profiles = {
        "site_to_site": "IPsec/IKEv2 — hardware termination, pre-shared keys",
        "remote_access": "WireGuard — faster handshake, kernel native",
        "clientless": "SSL portal — no client, but no host check"
    }
    return profiles.get(use_case, "Unknown — re-evaluate requirements")

if __name__ == "__main__":
    for scenario in ["site_to_site", "remote_access", "clientless"]:
        print(f"{scenario}: {recommend_vpn(scenario)}")
Output
site_to_site: IPsec/IKEv2 — hardware termination, pre-shared keys
remote_access: WireGuard — faster handshake, kernel native
clientless: SSL portal — no client, but no host check
Senior Shortcut:
Never use pre-shared keys for production site-to-site unless you rotate them every 90 days. Certificates with a CRL are the only sane choice for >5 endpoints.
Key Takeaway
VPNs are a trade-off between security and convenience. Site-to-site for networks, remote access for individuals, clientless for quick access. Pick based on threat model, not vendor hype.

Server Farms and Firewalls — The Zone-Based Model That Saves Your Skin

Zone-based firewalls aren't a buzzword. They're the difference between a clean segmentation strategy and a flat network that gets owned in one pivot. The concept: group interfaces into zones (inside, outside, DMZ). Traffic between zones is explicitly permitted or denied. Traffic within a zone is allowed — unless you want to be paranoid.

The server farm sits in a DMZ zone. Your web servers are in DMZ_EXT facing the internet. Your databases are in DMZ_INT, accessible only from DMZ_EXT. No user workstation ever talks to the database directly. This isn't paranoia; it is how you contain a breach when someone exploits your WordPress plugin.

Your interviewer wants to hear you grok the zone logic, not just parrot "three-tier architecture." Say: "We run three zones. DMZ for public-facing services, internal for users, production for databases. Stateful inspection tracks sessions so we don't need ACLs per flow." That's a production-ready mindset, not a textbook answer.

zone_policy_check.py (Python)
# io.thecodeforge: cs-fundamentals tutorial

from enum import Enum

class Zone(Enum):
    INSIDE = "trusted"
    DMZ = "semi-trusted"
    OUTSIDE = "untrusted"

policy_matrix = {
    (Zone.INSIDE, Zone.DMZ): "allow http, https, ssh",
    (Zone.DMZ, Zone.INSIDE): "deny all",
    (Zone.OUTSIDE, Zone.DMZ): "allow http, https only",
    (Zone.DMZ, Zone.OUTSIDE): "allow established sessions"
}

def check_traffic(src: Zone, dst: Zone) -> str:
    action = policy_matrix.get((src, dst), "deny all")
    return f"{src.name} -> {dst.name}: {action}"

print(check_traffic(Zone.OUTSIDE, Zone.DMZ))
print(check_traffic(Zone.DMZ, Zone.INSIDE))
Output
OUTSIDE -> DMZ: allow http, https only
DMZ -> INSIDE: deny all
Production Reality:
Many breaches spread because someone put the web server and database in the same zone. Zone-based firewalls enforce the principle of least privilege at the network layer. Don't skip it.
Key Takeaway
Zone-based firewalls enforce network segmentation. Web servers in DMZ, databases in a separate zone. No zone can talk to another without an explicit rule. This is how real production networks contain breaches.
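Stateful inspection, mentioned above as the reason per-flow ACLs are unnecessary, can be sketched as a rule set plus a table of sessions already permitted. This is a deliberately simplified model: `StatefulFirewall`, the `DATA` zone name, and the ports are hypothetical, and real connection trackers key on the full 5-tuple and expire idle entries.

```python
from typing import Set, Tuple

Flow = Tuple[str, str, int]   # (source zone, destination zone, destination port)

class StatefulFirewall:
    """Default-deny zone firewall that remembers permitted sessions."""

    def __init__(self, allowed_new: Set[Flow]):
        self._allowed_new = allowed_new
        self._established: Set[Flow] = set()

    def permit(self, src: str, dst: str, port: int, is_reply: bool = False) -> bool:
        if is_reply:
            # A reply is allowed only if the forward flow was opened earlier.
            return (dst, src, port) in self._established
        if (src, dst, port) in self._allowed_new:
            self._established.add((src, dst, port))
            return True
        return False   # anything not explicitly allowed is dropped

fw = StatefulFirewall({("OUTSIDE", "DMZ", 443), ("DMZ", "DATA", 5432)})
print(fw.permit("OUTSIDE", "DMZ", 443))                 # True: explicit rule
print(fw.permit("DMZ", "OUTSIDE", 443, is_reply=True))  # True: reply to a tracked session
print(fw.permit("OUTSIDE", "DATA", 5432))               # False: no rule, default deny
print(fw.permit("DATA", "DMZ", 5432, is_reply=True))    # False: no session was opened
```

Return traffic needs no rule of its own, which is why the policy only has to describe who may start a conversation.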

Symmetric vs Asymmetric Encryption — Which One Actually Protects Your Data in Transit?

Encryption isn't magic. It's math with a key management problem. Symmetric encryption uses one shared key — fast, efficient, but you have to get that key to the other side without someone sniffing it. Think AES-256. Nobody breaks that in your lifetime. The problem is key exchange, not the cipher.

Asymmetric encryption solves key exchange with a public/private pair. You encrypt with my public key, I decrypt with my private key. Slower by orders of magnitude — RSA 4096 chews CPU. That's why real systems use hybrid: asymmetric to swap a session key, then symmetric for the heavy lifting.

Your TLS handshake does this every time. Production trap: if you're encrypting bulk data with asymmetric, you're wasting cycles. Use ECDH for key agreement, AES-GCM for the payload. Interviewers want to hear you understand the tradeoff, not just the definitions.

hybrid_crypto.py (Python)
# io.thecodeforge: cs-fundamentals tutorial
from cryptography.hazmat.primitives import hashes  # needed by HKDF below
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
import os

# Generate ECDH key pair for key agreement
private_key = ec.generate_private_key(ec.SECP256R1())
peer_public_key = private_key.public_key()  # In real life, get peer's key over wire

# Derive shared symmetric key via ECDH
shared_key = private_key.exchange(ec.ECDH(), peer_public_key)
session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b'handshake data').derive(shared_key)

# Encrypt payload with AES-GCM (symmetric)
nonce = os.urandom(12)
cipher = Cipher(algorithms.AES(session_key), modes.GCM(nonce))
encryptor = cipher.encryptor()
ciphertext = encryptor.update(b'payload') + encryptor.finalize()
print(f'Ciphertext: {ciphertext.hex()}')
Output
Ciphertext: <14 hex characters for the 7-byte payload; the value changes on every run>
Production Trap:
Never roll your own hybrid crypto. Use TLS 1.3. The standard library already handles ECDHE + AES-GCM. DIY hybrid crypto is how breaches happen.
Key Takeaway
Symmetric for bulk, asymmetric for key exchange — never mix them up in production systems.

Digital Signatures — The Proof You're Not Getting Played by a MitM

A digital signature is not encryption. It's authentication plus integrity. You sign a hash of the message with your private key. Anyone with your public key can verify you wrote it and nobody changed it. This is how your SSH host key works. This is how code signing works. This is how git commit signing works.

Without digital signatures, you can't trust that the person on the other end of the wire is who they claim. Think about that next time you accept a self-signed cert in your browser. The signature binds identity to data. RSA and ECDSA are the heavy hitters. Ed25519 is gaining ground — smaller keys, faster verification, and resistance to side-channel attacks.

Real-world: If an interviewer asks 'how does HTTPS authentication work?', they're probing your knowledge of the certificate chain. Each CA signs the next. Root CAs are trusted by your OS. Break that chain, and the signature means nothing. Never disable certificate validation in production code.

digital_sig.py (Python)
# io.thecodeforge: cs-fundamentals tutorial
from cryptography.hazmat.primitives.asymmetric import ed25519

# Generate Ed25519 key pair
private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()

message = b'Release v2.1.0 checksum: ab12cd34'
# Sign the message
signature = private_key.sign(message)
print(f'Signature (bytes): {signature.hex()}')

# Verify - returns None if valid, raises exception if tampered
try:
    public_key.verify(signature, message)
    print('Verification: PASSED')
except Exception:
    print('Verification: FAILED')
Output
Signature (bytes): <128 hex characters (64 bytes); the value depends on the generated key>
Verification: PASSED
Senior Shortcut:
Use Ed25519 over RSA for new systems. Smaller keys and signatures, fast signing, and deterministic nonces, so there is no per-signature random number generator to get wrong. Harder to screw up.
Key Takeaway
Digital signatures prove origin and integrity — always verify before trusting any payload.
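Certificate chain validation is what ties those signatures to HTTPS, and Python's standard `ssl` module enables it by default. The sketch below only inspects context settings and makes no network connection; the `insecure` context is shown purely as the anti-pattern to recognize in code review.

```python
import ssl

ctx = ssl.create_default_context()   # loads the system's trusted root CAs

print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True: the chain must validate
print(ctx.check_hostname)                     # True: the cert must match the host name

# The anti-pattern: this turns off both checks.
insecure = ssl.create_default_context()
insecure.check_hostname = False               # must be disabled before verify_mode
insecure.verify_mode = ssl.CERT_NONE
print(insecure.verify_mode == ssl.CERT_NONE)  # True: any certificate is accepted
```

With verification off the connection is still encrypted, but nothing proves who is on the other end, which is exactly the opening a man-in-the-middle needs.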

IP Spoofing — Why the Internet's Address System Has a Built-in Trust Problem

IP spoofing is trivial. The IP header's source address is just a field. I can set it to anything. There's no authentication baked into IPv4. If you accept packets based solely on source IP, you're asking to get owned. This is how DDoS amplification works — attackers spoof your IP, send a small request to an open DNS resolver, and the resolver floods you with 50x the traffic.

Why can't we just fix this? Because the internet was built on trust. BGP on its own doesn't verify the origin AS. Routers forward packets based on destination, period. The fix is ingress filtering (RFC 2827, also known as BCP 38): network operators should drop packets leaving their network whose source IP is not in their own prefixes. But not everyone does, which is why spoofing still works today.

Production takeaway: Never rely on source IP for authentication. That's what tokens, mTLS, and signatures are for. If you must use IP allowlists, put them behind a VPN with mutual auth. Otherwise you're one spoofed packet away from a breach.

spoof_detection.pyPYTHON
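# io.thecodeforge — cs-fundamentals tutorial
# Lab use only: run this on networks you own. Requires root (raw socket).
import socket
import struct

def checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over 16-bit words."""
    if len(data) % 2:
        data += b'\x00'
    total = sum(struct.unpack(f'!{len(data) // 2}H', data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Raw socket: we supply the IP header ourselves
sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)

source_ip = socket.inet_aton('192.168.1.100')  # spoofed
dest_ip = socket.inet_aton('10.0.0.1')

# ICMP echo request (type 8, code 0); the checksum covers header + payload
payload = b'Hello'
icmp = struct.pack('!BBHHH', 8, 0, 0, 0x1a2b, 0x0001) + payload
icmp = struct.pack('!BBHHH', 8, 0, checksum(icmp), 0x1a2b, 0x0001) + payload

# IP header: version=4, ihl=5, tos=0, total_len=20+len(icmp)
# (Linux fills in the IP header checksum for IPPROTO_RAW sockets)
ip_header = struct.pack('!BBHHHBBH4s4s',
    0x45, 0, 20 + len(icmp), 0x1234, 0x4000, 64, socket.IPPROTO_ICMP, 0,
    source_ip, dest_ip)

sock.sendto(ip_header + icmp, ('10.0.0.1', 0))
print('Spoofed ICMP packet sent.')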
Output
Spoofed ICMP packet sent.
(Note: Requires root. Router may drop due to ingress filtering.)
Production Trap:
If your API authenticates by source IP alone, you're vulnerable to spoofing inside your own VPC. Use mTLS or short-lived tokens even for internal services.
Key Takeaway
Source IP is not identity. Trust nothing on the wire without cryptographic verification.
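One way to replace "trust the source IP" with cryptographic verification is an HMAC over the request body. The sketch below uses only the standard library; `SHARED_SECRET`, `sign`, and `is_authentic` are hypothetical names, and a real deployment would also need a timestamp or nonce to stop replays.

```python
import hashlib
import hmac

SHARED_SECRET = b"rotate-me-regularly"  # hypothetical; load from a secret store in practice

def sign(body: bytes, secret: bytes = SHARED_SECRET) -> str:
    """Sender side: compute an HMAC-SHA256 tag over the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def is_authentic(body: bytes, signature: str, secret: bytes = SHARED_SECRET) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(body, secret), signature)

body = b'{"action": "restart", "service": "billing"}'
tag = sign(body)
print(is_authentic(body, tag))                      # True: untouched request
print(is_authentic(b'{"action": "delete"}', tag))   # False: body was altered
```

A spoofed source address does not help an attacker here, because the tag cannot be produced without the secret.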

Why Twisted-Pair Cabling Twists Matter for Signal Integrity

The twist in twisted-pair cable is not for aesthetics; it's electromagnetic combat. Each pair carries a differential signal: one wire carries the signal, the other its inverse, and the receiver subtracts the two. Two parallel wires act as antennas, and the wire nearer a noise source picks up more of it. Twisting swaps the wires' positions every half twist, so over the run both wires pick up nearly the same noise, and the subtraction at the receiver removes it. This is common-mode rejection in action. Each pair in a cable twists at a different rate (measured in twists per inch) so that neighboring pairs never stay aligned long enough to couple into each other (crosstalk). Twist rates are chosen by the manufacturer rather than fixed by the standard: Cat5e is typically around 4–5 twists per inch, while Cat6a uses tighter twists (roughly 6–7 per inch) to support 10GbE at higher frequencies. The 100-meter channel limit comes from attenuation and delay budgets rather than from the twist itself. Untwisting more than 1/2 inch at termination points kills performance. Interviewers ask this to see if you understand signal physics beyond parrot-fashion specs.

TwistRatioAnalyzer.pyPYTHON
# io.thecodeforge — cs-fundamentals tutorial

# Toy model of twist cancellation (illustrative, not a cable-physics simulation)
# The receiver sees wire_a - wire_b, so noise picked up equally on both cancels.

def residual_noise(half_twists, near=1.0, far=0.5):
    """Differential noise left after a run made of `half_twists` segments."""
    wire_a = wire_b = 0.0
    for i in range(half_twists):
        # Each half twist swaps which wire sits closer to the noise source
        a, b = (near, far) if i % 2 == 0 else (far, near)
        wire_a += a / half_twists
        wire_b += b / half_twists
    return abs(wire_a - wire_b)

print(f"Untwisted run:   {residual_noise(1):.3f}")
print(f"10 half twists:  {residual_noise(10):.3f}")
print(f"11 half twists:  {residual_noise(11):.3f}")
Output
Untwisted run:   0.500
10 half twists:  0.000
11 half twists:  0.045
Production Trap:
Never untwist more than 1/2 inch at a punch-down block. Fluke tests fail above 1 inch — expect rework.
Key Takeaway
Twist rate directly controls crosstalk cancellation; tighter twists handle higher frequencies at the cost of more copper and slightly more delay per meter of cable.
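The receiver side of common-mode rejection fits in a few lines. This is a sketch under simplified assumptions (ideal wires, noise modelled as a constant voltage offset); `receive` is a hypothetical function name.

```python
def receive(signal_volts: float, noise_a: float, noise_b: float) -> float:
    """Differential receiver: subtract the two wires of one pair."""
    wire_a = +signal_volts + noise_a   # wire carrying the signal
    wire_b = -signal_volts + noise_b   # wire carrying the inverted signal
    return wire_a - wire_b

# Equal pickup on both wires (well-twisted pair): the noise cancels
print(receive(1.0, noise_a=0.25, noise_b=0.25))   # 2.0
# Unequal pickup (untwisted section): the difference leaks into the signal
print(receive(1.0, noise_a=0.5, noise_b=0.25))    # 2.25
```

The second call is what an over-long untwisted termination does: the two wires stop seeing the same noise, so the subtraction no longer removes it.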

Authorization vs Authentication — The Gatekeeper and the Key

Authentication proves who you are; authorization decides what you can do. Mixing them causes breaches. Consider JWT: the token is authenticated via its signature, but the claims inside (roles, scopes) drive authorization. A common failure: verifying the token signature but not checking whether the user's role allows DELETE on /api/users. That's broken function-level authorization; its sibling, IDOR, is the same missing check applied to a specific object (for example, reading /api/users/43 when you are user 42). In practice, authorization must be enforced at every API endpoint; never trust the client to send only authorized requests. The principle of least privilege says grant the minimum permissions for the minimum time. OAuth2 scopes handle coarse authorization; fine-grained systems use ABAC (Attribute-Based Access Control) with policies evaluated at runtime. Interviewers press on this because Broken Access Control ranks first in the OWASP Top 10 (2021), and the classic case is a missing server-side check behind a button the front-end UI merely hides.

AuthorizationCheck.pyPYTHON
# io.thecodeforge — cs-fundamentals tutorial

# Minimal authorization enforcement

class API:
    def __init__(self):
        self.roles = {"delete_user": ["admin"]}
    
    def delete_user(self, user_id, requester_role):
        if requester_role not in self.roles["delete_user"]:
            return {"error": "forbidden", "code": 403}
        # actual deletion logic
        return {"status": "deleted"}

api = API()
print(api.delete_user(42, "viewer"))  # fails
print(api.delete_user(42, "admin"))   # succeeds
Output
{'error': 'forbidden', 'code': 403}
{'status': 'deleted'}
Production Trap:
Never rely on client-side authorization hiding. A malicious user can modify JavaScript or call APIs directly.
Key Takeaway
Authorization must be enforced server-side on every endpoint, not inferred from authentication alone.
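Role checks cover what a caller may do; an object-level check covers which records they may do it to. The sketch below illustrates the second kind with hypothetical in-memory data (`DOCUMENTS`) and a hypothetical `can_read` helper; a real service would look up ownership in its data store.

```python
# Hypothetical in-memory data: document id -> owner id
DOCUMENTS = {101: "alice", 102: "bob"}

def can_read(document_id: int, requester: str, role: str) -> bool:
    """Allow access only to the owner or an admin; unknown ids are denied."""
    owner = DOCUMENTS.get(document_id)
    if owner is None:
        return False
    return role == "admin" or owner == requester

print(can_read(101, "alice", "viewer"))  # True: alice owns 101
print(can_read(102, "alice", "viewer"))  # False: 102 belongs to bob
print(can_read(102, "carol", "admin"))   # True: admin override
```

Without the ownership comparison, any authenticated user could walk the id space by changing the number in the URL.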

Threat, Vulnerability, and Risk — The Security Triad That Drives Mitigation

A threat is a potential danger (hacker, storm). A vulnerability is a weakness (unpatched SSH, open S3 bucket). Risk is the probability and impact of a threat exploiting a vulnerability: Risk = Threat x Vulnerability x Impact. A common working model in interviews: identify threats, score vulnerabilities with CVSS, then estimate risk as likelihood times impact. In production, you don't eliminate all threats; you reduce risk to an acceptable level. This drives decisions: patching a critical CVE (vulnerability) reduces exposure to a known exploit (threat), lowering risk. The OWASP Top 10 catalogs categories of weakness, not threat actors. Threat modeling (STRIDE) identifies threats; vulnerability scanning finds weaknesses. Risk registers track both. Know that accepting risk is a valid response, but document it. Failing to distinguish these three causes misallocated budget: buying DDoS protection (a threat mitigation) when the real problem is unpatched software (a vulnerability).

RiskCalculator.pyPYTHON
# io.thecodeforge — cs-fundamentals tutorial

# Simple risk scoring

def calculate_risk(threat_score, vuln_score, impact_score):
    # threat 1-5, vuln 1-5, impact 1-5
    risk = threat_score * vuln_score * impact_score
    if risk >= 50:
        return "Critical — patch immediately"
    elif risk >= 20:
        return "High — schedule within 30 days"
    else:
        return "Medium — accept or monitor"

print(calculate_risk(threat_score=4, vuln_score=5, impact_score=4))  # hypothetical: exploited in the wild, unpatched, high impact
Output
Critical — patch immediately
Production Trap:
Don't confuse CVSS score (vulnerability severity) with risk. A critical vuln behind a firewall has lower risk than a medium vuln exposed to the internet.
Key Takeaway
Risk = Threat x Vulnerability x Impact. Know which you're mitigating before spending money.
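The gap between CVSS severity and actual risk can be shown with a small exposure adjustment. This is a sketch with made-up weights, not a standard formula; `EXPOSURE` and `contextual_risk` are hypothetical names.

```python
# Hypothetical exposure weights; real programmes calibrate their own
EXPOSURE = {"internet": 1.0, "internal": 0.5, "isolated": 0.2}

def contextual_risk(cvss_base: float, exposure: str) -> float:
    """Scale a CVSS base score (0-10) by how reachable the asset is."""
    return round(cvss_base * EXPOSURE[exposure], 1)

critical_isolated = contextual_risk(9.8, "isolated")   # critical CVE, unreachable host
medium_internet = contextual_risk(6.5, "internet")     # medium CVE, public endpoint
print(critical_isolated, medium_internet)
print("Fix first:", "medium/internet" if medium_internet > critical_isolated else "critical/isolated")
```

With these assumed weights the internet-facing medium finding outranks the isolated critical one, which is the ordering a severity-only view would miss.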

Gateway — The Protocol Translator That Keeps Networks Talking

A gateway is a network node that acts as an entrance to another network, often translating between different protocols or data formats. Unlike a router, which forwards packets based on IP addresses within the same protocol family, a gateway can convert between entirely different network architectures, such as from IPv4 to IPv6, or from HTTP to a legacy mainframe protocol. This makes it essential for connecting corporate intranets to the internet, or for linking IoT sensor networks using Zigbee to a cloud API running on TCP/IP. A translating gateway can work as high as Layer 7 of the OSI model (Application Layer) because it often rewrites payloads, not just headers. The default gateway is the narrower, everyday sense of the word: it is the Layer 3 router a host sends traffic to when the destination is outside its own subnet. In interview contexts, expect questions about default gateways: every device needs one to reach external hosts, and the gateway itself must have a route to the destination. Why this matters: without a gateway, your internal LAN is an island. Misconfigured default gateways are a top cause of "no internet" tickets. The trade-off is that gateways introduce a single point of failure and latency, so production deployments pair them with redundant failover.

gateway_inspector.pyPYTHON
# io.thecodeforge — cs-fundamentals tutorial
import errno
import socket

def check_default_gateway(host_ip, gateway_ip, port=80):
    """Probe the gateway with a TCP connect; port 80 assumes it serves a web UI."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            result = sock.connect_ex((gateway_ip, port))
        # A refused connection still proves the gateway answered at Layer 3
        if result in (0, errno.ECONNREFUSED):
            return f"Gateway {gateway_ip} reachable for {host_ip}"
        return f"Gateway {gateway_ip} unreachable"
    except OSError as e:
        return f"Error: {e}"

print(check_default_gateway('192.168.1.10', '192.168.1.1'))
Output
Gateway 192.168.1.1 reachable for 192.168.1.10
Production Trap:
Never assume a gateway's IP stays static in cloud environments. Use health checks and DNS resolution for the gateway endpoint—hardcoding an IP can break during failover or network reconfiguration.
Key Takeaway
A gateway translates between dissimilar networks; it's the exit door, not just a router.
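The default-gateway decision a host makes for every outgoing packet can be sketched with the standard `ipaddress` module: deliver directly if the destination is on the local subnet, otherwise hand the packet to the gateway. `next_hop` is a hypothetical helper, and the addresses are examples.

```python
import ipaddress

def next_hop(host_cidr: str, gateway: str, destination: str) -> str:
    """Return the IP address the host resolves (via ARP) to deliver the frame."""
    iface = ipaddress.ip_interface(host_cidr)
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        return str(dest)      # same subnet: deliver directly
    return gateway            # off-subnet: send to the default gateway

print(next_hop("192.168.1.10/24", "192.168.1.1", "192.168.1.77"))  # 192.168.1.77
print(next_hop("192.168.1.10/24", "192.168.1.1", "8.8.8.8"))       # 192.168.1.1
```

A wrong subnet mask breaks this test in both directions, which is why mask typos and gateway typos produce similar "no internet" symptoms.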

Modem — The Signal Shaper That Makes Digital Travel Analog

A modem (modulator-demodulator) converts digital data from a computer into analog signals for transmission over telephone lines, coaxial cables, or fiber optics, and demodulates incoming analog signals back into digital form. This conversion is necessary because physical media like copper phone lines carry continuous waveforms, not discrete bits. Key modulation techniques include QAM (Quadrature Amplitude Modulation) for high throughput, and DMT (Discrete Multi-Tone), used in DSL to split bandwidth into sub-channels. Why this matters in interviews: modems operate at Layer 1 (Physical) of the OSI model, but poor modulation or line noise directly impacts Layer 3 throughput, a classic example of how physical-layer problems masquerade as network-layer issues. Today, cable and DSL modems are often combined with routers into a single "gateway" device, but the modem function remains distinct: it handles the physical handshake (line training) plus error detection and correction (for example, CRC checks to detect errors and forward error correction to repair them). In production, watch for signal-to-noise ratio degradation: as SNR drops, the modem retrains at a lower rate to keep the link stable, silently crippling bandwidth.

snr_monitor.pyPYTHON
# io.thecodeforge — cs-fundamentals tutorial
def link_speed(snr_db):
    # Illustrative DSL rate estimator; thresholds are examples, not from a standard
    if snr_db >= 30:
        return 100  # Mbps
    elif snr_db >= 20:
        return 50
    elif snr_db >= 10:
        return 20
    else:
        return 0  # Link down

snr = 18
print(f"SNR: {snr} dB | Estimated speed: {link_speed(snr)} Mbps")
Output
SNR: 18 dB | Estimated speed: 20 Mbps
Production Trap:
When debugging slow connections, check modem logs for "retrains" or FEC errors. Users often blame the server, but the modem's SNR drop may be the real culprit. Probe SNR directly via the modem's status page or SNMP.
Key Takeaway
Modems bridge digital and analog worlds—signal quality directly dictates network performance.
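The link between SNR and achievable rate has a theoretical ceiling, the Shannon-Hartley capacity C = B · log2(1 + S/N). The sketch below evaluates it for an assumed 1 MHz channel; real modems land below this bound, and `shannon_capacity_mbps` is a hypothetical helper name.

```python
import math

def shannon_capacity_mbps(bandwidth_hz: float, snr_db: float) -> float:
    """Upper bound on error-free bit rate for a channel, in Mbps."""
    snr_linear = 10 ** (snr_db / 10)          # convert dB to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear) / 1e6

# Assumed 1 MHz of usable bandwidth, at three SNR levels
for snr in (30, 20, 10):
    print(f"{snr} dB -> {shannon_capacity_mbps(1e6, snr):.2f} Mbps")
```

Each 10 dB of lost SNR removes roughly 3.3 Mbps of ceiling per MHz in this range, which is why a noisy line forces the modem down to a slower rate.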
● Production incident · POST-MORTEM · severity: high

The DNS TTL That Killed a Migration

Symptom
After changing the IP of api.example.com, roughly 30% of users still hit the old server. The old server was decommissioned, causing requests to fail with connection timeouts.
Assumption
The team assumed DNS changes propagate instantly after updating the A record.
Root cause
The original TTL was set to 86400 seconds (24 hours). DNS resolvers cached the old IP for up to 24 hours, so a large slice of traffic kept routing to the decommissioned server.
Fix
Lowered the TTL to 60 seconds 48 hours before the migration. After the cutover, monitored traffic until the old IP had zero requests. Then decommissioned the old server.
Key lesson
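  • Always lower TTL to 60–300 seconds at least one full old-TTL period (here, 24 hours) before any IP change, so every cache has picked up the short TTL by cutover.
  • Monitor DNS propagation: dig +trace shows what the authoritative servers return, while querying public resolvers directly (for example, dig @8.8.8.8 api.example.com) or whatsmydns.net shows what caches are still serving.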
  • Keep the old server running until traffic drops to zero — not just until you flip the record.
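The timing in these lessons is plain arithmetic, sketched below with the standard `datetime` module. The timestamp and function names are hypothetical; the 86400-second and 60-second TTLs come from the incident above.

```python
from datetime import datetime, timedelta

def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """A cache that fetched the record just before the change keeps it for the old TTL."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)

def earliest_decommission(cutover_at: datetime, new_ttl_seconds: int) -> datetime:
    """After cutover, well-behaved caches drop the old IP within the new TTL."""
    return cutover_at + timedelta(seconds=new_ttl_seconds)

lowered = datetime(2026, 3, 1, 9, 0)              # hypothetical time the TTL was lowered
cutover = earliest_safe_cutover(lowered, 86400)   # old TTL: 24 hours
print("Cut over no earlier than:", cutover)
print("Old IP should go quiet by:", earliest_decommission(cutover, 60))
```

Some resolvers ignore or clamp TTLs, so treat the second timestamp as the earliest point to start watching traffic on the old IP, not as permission to switch it off.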
Production debug guide · Match symptoms to root causes and immediate actions · 5 entries
Symptom · 01
ping fails to a host
Fix
Check if ICMP is blocked by firewall (common in cloud environments). Use tcping or curl against the actual service port to test L4+ connectivity.
Symptom · 02
HTTP request hangs or times out
Fix
Check DNS resolution (dig +short), then test TCP connectivity with telnet or nc. If DNS resolves but TCP fails, check security groups/firewall rules on the target.
Symptom · 03
Slow file transfer or API response
Fix
Use tcpdump to capture packets and look for TCP retransmissions or duplicate ACKs — indicates packet loss. Check network congestion or MTU issues.
Symptom · 04
Client gets 403 Forbidden
Fix
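A 403 means the server understood the request but refuses to authorize it, so check what the token or API key is allowed to do (roles, scopes, resource policies) rather than only whether it is valid; a missing or invalid credential normally returns 401. For cloud instances, check instance metadata (IAM roles) or VPC endpoint policies.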
Symptom · 05
Random connection resets
Fix
Inspect logs for 'Connection reset by peer'. Could be a load balancer idle timeout, a proxy closing the connection, or a client-side socket timeout mismatch.
★ Quick Command Reference for Network Troubleshooting · Run these commands in order when an application can't connect.
Cannot reach a server
Immediate action
Test local network stack
Commands
ping 8.8.8.8 # L3 connectivity test
dig +short google.com # DNS resolution test
Fix now
If ping works but DNS fails, check /etc/resolv.conf or DNS server settings.
Port is not open
Immediate action
Test TCP connectivity to the specific port
Commands
curl -v http://host:port # L7 health check
nc -zv host port # L4 port scan
Fix now
If nc fails, the port is not listening or a firewall blocks it. Use iptables -L or cloud security group console.
High latency
Immediate action
Measure round-trip time
Commands
mtr host # combines traceroute and ping
tcpdump -i eth0 port 80 # capture traffic
Fix now
Look for high latency hops or packet loss in mtr output. Contact ISP or check for network saturation.
HTTPS certificate error
Immediate action
Inspect the certificate chain
Commands
openssl s_client -connect host:443 -showcerts
curl -vI https://host # verbose SSL handshake
Fix now
Check expiration, intermediate certificate inclusion, and SNI configuration on the server.
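When `nc` is not installed on a host, the same Layer 4 check as `nc -zv host port` can be done from Python. This is a minimal sketch using only the standard library; `tcp_port_open` is a hypothetical helper, and the demo runs against a throwaway listener on loopback so it needs no external network.

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:           # refused, timed out, unreachable, DNS failure
        return False

# Demonstrate against a temporary listener on the loopback interface
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

print(tcp_port_open("127.0.0.1", demo_port))   # listener present
listener.close()
print(tcp_port_open("127.0.0.1", demo_port))   # listener gone
```

A `False` here does not distinguish a closed port from a firewall drop; a refusal returns immediately while a silent drop waits for the full timeout, which is a useful hint in itself.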
TCP vs UDP Quick Reference
Aspect | TCP | UDP
Connection | Connection-oriented (3-way handshake) | Connectionless (no handshake)
Reliability | Guaranteed delivery & ordering | No delivery guarantee, no ordering
Speed | Slower (overhead from ACKs) | Faster (fire and forget)
Error Checking | Full (retransmits lost packets) | Checksum only (no retransmission)
Use Cases | HTTP, HTTPS, SSH, FTP, SMTP | DNS, video streaming, VoIP, gaming
Header Size | 20–60 bytes | 8 bytes fixed
Flow Control | Yes (sliding window) | No
Congestion Control | Yes (slow start, AIMD) | No (app must handle it)
HTTP Version | HTTP/1.1, HTTP/2 | HTTP/3 (via QUIC)
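The first rows of the table show up directly in the socket API. The loopback sketch below sends the same four bytes over UDP and TCP using only the standard library; it assumes loopback delivery, which makes UDP loss very unlikely, so it illustrates the API difference rather than the reliability difference.

```python
import socket

# UDP: no handshake, each sendto() is an independent datagram
udp_server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_server.bind(("127.0.0.1", 0))
udp_server.settimeout(2)
udp_client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_client.sendto(b"ping", udp_server.getsockname())
udp_data, _ = udp_server.recvfrom(1024)

# TCP: connecting performs the 3-way handshake before any data moves
tcp_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_server.bind(("127.0.0.1", 0))
tcp_server.listen(1)
tcp_client = socket.create_connection(tcp_server.getsockname(), timeout=2)
conn, _ = tcp_server.accept()
conn.settimeout(2)
tcp_client.sendall(b"ping")
tcp_data = conn.recv(1024)

print(udp_data, tcp_data)

for sock in (udp_server, udp_client, conn, tcp_client, tcp_server):
    sock.close()
```

Note that the UDP side never calls `listen`, `accept`, or `connect`: there is no connection to set up, which is where both the speed and the lack of guarantees come from.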

Key takeaways

1. The OSI model is a debugging framework: use it to isolate faults between physical, network, and application layers.
2. Reliability (TCP) vs Speed (UDP) is the fundamental trade-off of the transport layer.
3. DNS is a distributed, hierarchical database where caching (TTL) is the primary scaling mechanism.
4. HTTPS is TLS-wrapped HTTP; the security happens after the TCP connection is established.
5. Subnetting is the primary tool for network isolation and IP management in modern cloud architectures.
6. Production debugging requires systematic layer-by-layer diagnosis: never skip L3 and L4 before blaming L7.
7. HTTP status codes are a universal language: 4xx means fix the request, 5xx means fix the server.

Common mistakes to avoid

5 patterns
×

Thinking a 'ping' failure always means the server is down

Symptom
You run ping, get no response, assume the server is offline. You start deploying a replacement while the real server is actually running fine.
Fix
Remember that ICMP (ping) can be blocked by firewalls, security groups, or ACLs. Always verify with a higher-layer tool like curl, telnet, or a health check endpoint.
×

Confusing the 3-way handshake (TCP) with the SSL handshake (TLS/HTTPS)

Symptom
During an interview, you say 'the handshake takes 3 packets' but the interviewer follows up with TLS and you realise you conflated the two.
Fix
TCP handshake: SYN, SYN-ACK, ACK. TLS handshake happens after TCP is established: ClientHello, ServerHello, Certificate, KeyExchange, Finished. HTTPS requires both handshakes.
×

Not knowing the difference between a Recursive and Iterative DNS query

Symptom
When asked to explain DNS resolution, you describe a single query to a server. The interviewer expects you to mention that the resolver may do iterative queries starting from root servers.
Fix
Recursive: resolver does all the work for you (typical ISP/cloud resolver). Iterative: the client follows referrals (like dig +trace shows). Know both.
×

Ignoring the 'Ephemeral Port' range when debugging why a server can't make new outgoing connections

Symptom
Your application suddenly can't make any new outbound TCP connections. You check everything — DNS, routing, firewall — but it's actually port exhaustion.
Fix
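Each outbound TCP connection uses a temporary source port from the ephemeral range (32768–60999 by default on Linux). If you exhaust those toward a given destination, new connections fail. Count sockets with ss -s or ss -tan | wc -l (netstat -n | wc -l also counts Unix sockets and header lines), and widen net.ipv4.ip_local_port_range if needed.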
×

Assuming HTTP 503 means the server is overloaded

Symptom
You get a 503 Service Unavailable and immediately start scaling the web servers. But the real issue is that the load balancer's health check is failing because the database is down.
Fix
503 means the server is temporarily unable to handle the request. That could be overload, but also a dependency failure (database, cache, downstream API) or a failing health check. Check the dependencies and the load balancer's health checks before you scale.
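For the ephemeral-port mistake above, the size of the pool is easy to compute. The sketch below reads the Linux setting if it exists and otherwise falls back to the default range quoted in the text; `parse_port_range` and `usable_ports` are hypothetical helper names.

```python
from pathlib import Path

PROC_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")  # Linux only

def parse_port_range(text: str) -> tuple:
    """Parse the 'low high' pair as stored in ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high

def usable_ports(low: int, high: int) -> int:
    """Number of source ports available in an inclusive range."""
    return high - low + 1

# Fall back to the Linux default when the file is absent (macOS, Windows, containers)
raw = PROC_FILE.read_text() if PROC_FILE.exists() else "32768 60999"
low, high = parse_port_range(raw)
print(f"Ephemeral range {low}-{high}: {usable_ports(low, high)} source ports")
```

With the default range that is 28232 ports, so a service opening a few hundred short-lived connections per second to one backend can run dry while sockets sit in TIME_WAIT.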
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR: What is the difference between an IP address and a MAC address, and at w...
Q02 · SENIOR: Explain the 'Head-of-Line Blocking' problem in TCP and how HTTP/3 (QUIC)...
Q03 · SENIOR: Describe the full lifecycle of an HTTP request, starting from the DNS lo...
Q04 · SENIOR: What is MTU (Maximum Transmission Unit), and what happens when a packet ...
Q05 · SENIOR: How does a Load Balancer (Layer 4 vs Layer 7) differ in how it handles i...
Q01 of 05 · JUNIOR

What is the difference between an IP address and a MAC address, and at which OSI layers do they operate?

ANSWER
An IP address is a logical address at Layer 3 (Network) — it identifies a device on a network and can change as the device moves. A MAC address is a physical address at Layer 2 (Data Link) — it's burned into the network interface card and rarely changes. IP addresses are used for end-to-end routing across networks; MAC addresses are used for hop-to-hop delivery within a local network segment (Ethernet). ARP (Address Resolution Protocol) maps IP addresses to MAC addresses.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01. What is 'Anycast' routing in DNS?
02. Why do we say TCP has 'Congestion Control' but UDP doesn't?
03. What is the 'Default Gateway'?
04. What is the difference between a hub, a switch, and a router?
05. How does NAT (Network Address Translation) work?