DNS TTL Killed a Migration — Computer Networks Interview
A 24-hour DNS TTL caused 30% traffic failure during a migration.
20+ years shipping production systems from the metal up. Notes here come from systems that actually shipped.
- OSI model isn't theory — it's a fault-isolation map for debugging network problems.
- TCP guarantees delivery (at a cost); UDP trades reliability for speed — choose based on data criticality.
- DNS resolution walks a cached hierarchy: browser → OS → resolver → root → TLD → authoritative.
- HTTPS = HTTP + TLS; the TLS handshake adds 1 RTT with TLS 1.3 (2 RTT with TLS 1.2) but protects data in transit.
- Subnetting with CIDR is how cloud providers isolate networks and control traffic flow.
Imagine the internet is a global postal system. Your computer is a house with an address (IP address), the postal routes are the network cables and Wi-Fi signals, and the rules about how letters get packed, addressed, and delivered are the protocols. When you visit google.com, you're essentially writing a letter, dropping it in a mailbox, watching it get sorted through multiple post offices (routers), and getting a reply back — all in milliseconds. Computer networking is the science of making that postal system fast, reliable, and secure.
Every backend engineer, DevOps engineer, and full-stack developer eventually sits across from an interviewer who asks 'What happens when you type a URL into a browser?' That question alone can make or break a senior-level interview. Networking isn't just a theoretical subject — it's the invisible infrastructure that your APIs, databases, and microservices live on. Understanding it deeply separates candidates who just write code from engineers who understand systems.
Why DNS TTL Is the Silent Saboteur in Network Migrations
DNS TTL (Time to Live) is the directive that tells resolvers how long to cache a DNS record before discarding it and querying the authoritative server again. It is the single most impactful knob for controlling the speed of DNS propagation — not a magic switch, but a cache expiration policy measured in seconds. A TTL of 300 means a resolver may serve a stale IP for up to five minutes; a TTL of 86400 means a full day of potential chaos.
TTL is set on the authoritative nameserver per record set (for example, the A, AAAA, or CNAME records of a name). Resolvers, from ISP caches to browser-level caches, are expected to honor this value, but nothing forces them to. Some resolvers clamp or ignore TTLs and keep records for hours beyond the specified value. This asymmetry is where migrations fail: you lower the TTL before the cutover, but old records persist in opaque caches you cannot flush.
Use low TTLs (60–300 seconds) during planned migrations, DNS failovers, or any scenario where you need rapid rollback. Keep high TTLs (3600+) for stable, long-lived records to reduce query load and latency. The trade-off is between agility and efficiency — and ignoring it turns a simple A-record change into a multi-day outage.
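The timing rule behind that trade-off is plain arithmetic, and it is worth being able to do it on a whiteboard. A minimal sketch, assuming every resolver honors TTLs (the paragraph above explains why some don't); the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """First moment at which every TTL-honoring cache has dropped
    the record it fetched under the old (long) TTL."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)

def worst_case_stale_window(cutover_at: datetime, ttl_lowered_at: datetime,
                            old_ttl_seconds: int, new_ttl_seconds: int) -> timedelta:
    """Longest time after cutover that a TTL-honoring resolver may still serve the old IP."""
    safe_at = earliest_safe_cutover(ttl_lowered_at, old_ttl_seconds)
    short_window = timedelta(seconds=new_ttl_seconds)
    if cutover_at >= safe_at:
        # Every cache now holds a record fetched with the short TTL.
        return short_window
    # A resolver may have cached the long-TTL record just before the TTL was lowered.
    return max(safe_at - cutover_at, short_window)

lowered = datetime(2024, 1, 1, 0, 0)
# Cutting over 6 hours after lowering an 86400-second TTL: up to 18 hours of stale answers.
print(worst_case_stale_window(datetime(2024, 1, 1, 6, 0), lowered, 86400, 300))
# Waiting the full old TTL: the stale window shrinks to the new TTL.
print(worst_case_stale_window(datetime(2024, 1, 2, 0, 0), lowered, 86400, 300))
```

The takeaway: the clock that matters starts when you lower the TTL, not when you flip the record.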
The OSI Model — Why 7 Layers Actually Matter in Practice
The OSI (Open Systems Interconnection) model is a framework that breaks network communication into 7 distinct layers. Most people memorize the names ('Please Do Not Throw Sausage Pizza Away') and stop there. That's a mistake. Understanding what each layer is responsible for helps you debug real problems.
[Image of the 7 layers of the OSI model]
When your HTTP request fails, is it a DNS issue (Layer 7), a TCP connection problem (Layer 4), or a routing issue (Layer 3)? Knowing the layers lets you mentally narrow down where the fault is, just like a doctor using anatomy to diagnose illness.
In practice, you rarely work below Layer 3 (Network) unless you're writing embedded systems or kernel code. But you absolutely need to understand Layers 3, 4, and 7 (IP addressing, TCP/UDP, and application protocols) because they appear in every production debugging scenario, from a failing API call to a slow database connection.
Here's the critical insight: layers are about separation of concerns. Each layer only talks to the layer directly above and below it. That's why you can swap out Wi-Fi for Ethernet (Layer 1/2 change) without rewriting your HTTP code (Layer 7). The abstraction is intentional and powerful.
TCP vs UDP — Choosing the Right Delivery Guarantee
TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are the two workhorses of the Transport layer, and choosing between them is one of the most consequential decisions in system design.
TCP is like sending a package with signature confirmation. Before any data moves, there's a 3-way handshake (SYN, SYN-ACK, ACK). Every packet is numbered, acknowledged, and retransmitted if lost. Order is guaranteed. This reliability costs time — that handshake adds latency, and the acknowledgment mechanism adds overhead.
UDP is like dropping a flyer through every door in the neighbourhood. You send it and forget it. No handshake, no acknowledgment, no guarantee of delivery or order. But it's blazingly fast, which is exactly what you need for real-time applications.
In modern systems, QUIC (used by HTTP/3) is effectively UDP with reliability built on top of it — proof that the TCP/UDP choice isn't always binary.
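The difference is easiest to see at the socket API. A minimal sketch over loopback only, so nothing leaves the machine; the helper names are hypothetical, and loopback hides the packet loss that makes the distinction matter on a real network.

```python
import socket

def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over loopback: no handshake, no acknowledgment."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, receiver.getsockname())  # fire and forget
        data, _addr = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()

def tcp_roundtrip(payload: bytes) -> bytes:
    """Send bytes over loopback TCP: connect() completes the 3-way handshake first."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.connect(listener.getsockname())  # SYN, SYN-ACK, ACK happen here
        server_side, _addr = listener.accept()
        try:
            server_side.settimeout(2.0)
            client.sendall(payload)
            received = b""
            # TCP is a byte stream: keep reading until the whole payload has arrived.
            while len(received) < len(payload):
                chunk = server_side.recv(4096)
                if not chunk:
                    break
                received += chunk
            return received
        finally:
            server_side.close()
    finally:
        client.close()
        listener.close()

print(udp_roundtrip(b"ping"))
print(tcp_roundtrip(b"hello"))
```

Notice what the UDP path lacks: no connect, no accept, no read loop. That missing machinery is exactly the latency you save and the guarantees you give up.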
DNS Deep Dive — What Actually Happens When You Type a URL
DNS (Domain Name System) is the internet's phonebook. You know the name (google.com), and DNS finds the phone number (IP address). But the process behind that lookup is more fascinating than most people realise — and it's a classic interview question.
When your browser needs to resolve 'api.github.com', it doesn't just ask one server. It walks a hierarchy. First, it checks its local cache. If that's empty, it asks your OS's resolver. If that misses, it queries your ISP's recursive resolver. That resolver then walks the DNS tree: it asks a Root Name Server for the authoritative server for '.com', then asks that server for 'github.com', then finally asks GitHub's authoritative DNS server for 'api.github.com'. The answer comes back and gets cached at every step.
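The cache walk described above can be sketched as a toy simulation. Everything here is an assumption for illustration: the class and function names are hypothetical, the IPs come from the documentation range 192.0.2.0/24, and the root, TLD, and authoritative hops are collapsed into a single lookup callback.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

class TtlCache:
    """One cache layer (browser, OS, or resolver) that expires entries by TTL."""
    def __init__(self, name: str, clock: Callable[[], float] = time.monotonic):
        self.name = name
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, hostname: str) -> Optional[str]:
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[hostname]  # TTL elapsed: forget the record
            return None
        return ip

    def put(self, hostname: str, ip: str, ttl: float) -> None:
        self._entries[hostname] = (ip, self._clock() + ttl)

def resolve(hostname: str, layers: List[TtlCache],
            authoritative_lookup: Callable[[str], Tuple[str, float]]) -> Tuple[str, str]:
    """Check each cache in order; on a full miss, ask upstream and fill every layer.
    Returns (ip, name of whoever answered)."""
    for layer in layers:
        ip = layer.get(hostname)
        if ip is not None:
            return ip, layer.name
    ip, ttl = authoritative_lookup(hostname)
    for layer in layers:
        layer.put(hostname, ip, ttl)
    return ip, "authoritative"

# A fake clock makes the TTL behaviour visible without sleeping.
now = [0.0]
layers = [TtlCache(n, lambda: now[0]) for n in ("browser", "os", "resolver")]
records = {"api.github.com": ("192.0.2.10", 300.0)}
lookup = lambda host: records[host]

print(resolve("api.github.com", layers, lookup))  # full miss: answered upstream
print(resolve("api.github.com", layers, lookup))  # answered by the browser cache
records["api.github.com"] = ("192.0.2.20", 300.0)  # the record changes upstream
now[0] = 299.0
print(resolve("api.github.com", layers, lookup))  # caches still serve the old IP
now[0] = 300.0
print(resolve("api.github.com", layers, lookup))  # TTL expired: new IP appears
```

The third call is the whole migration problem in miniature: the authoritative answer changed, and nobody downstream noticed until the TTL ran out.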
HTTP vs HTTPS, Status Codes, and Subnetting — The Interview Essentials
These three topics appear in virtually every networking interview, so let's cover them with precision.
HTTP vs HTTPS: HTTP sends everything in plaintext. HTTPS wraps HTTP inside TLS (Transport Layer Security). The TLS handshake happens after the TCP handshake. After that, all data is encrypted.
HTTP Status Codes: These are a language. 2xx means success. 3xx means redirect. 4xx means the client made an error. 5xx means the server failed.
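Because the class lives in the first digit, classification is a one-line integer division. A minimal sketch using Python's standard http.HTTPStatus enum; the status_class helper is hypothetical.

```python
from http import HTTPStatus

def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes.get(code // 100, "unknown")

for code in (200, 301, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```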
Subnetting: An IP address like 192.168.1.100/24 means the first 24 bits identify the network and the last 8 bits identify the host. /24 gives you 256 addresses (254 usable).
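You can check the subnet arithmetic above with Python's standard ipaddress module instead of counting bits by hand. A short sketch using the same 192.168.1.100/24 example; the split into /26 blocks is an added illustration.

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.1.100/24")
network = iface.network

print(network)                     # the network this host belongs to
print(network.num_addresses)       # total addresses in a /24
print(network.network_address)     # first address, reserved for the network
print(network.broadcast_address)   # last address, reserved for broadcast

usable = list(network.hosts())     # excludes network and broadcast addresses
print(len(usable), usable[0], usable[-1])

# Carving the /24 into four /26 subnets of 64 addresses each.
for subnet in network.subnets(new_prefix=26):
    print(subnet, subnet.num_addresses)
```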
Production Network Debugging: Tools Every Engineer Should Know
Knowing theory is one thing. Being able to diagnose a real outage under pressure is what separates senior engineers. Here are the tools that matter in production:
dig — The DNS Swiss Army knife. dig +trace shows you the full resolution path. dig -x does reverse lookup.
curl — Every engineer's first tool for HTTP debugging. Verbose mode (-v) shows the entire handshake. -k bypasses certificate validation (for testing only!).
tcpdump — Raw packet capture. Filter by host, port, or protocol. -A prints ASCII payload. Critical for diagnosing retransmissions and dropped packets.
traceroute/mtr — Shows the path packets take and where latency spikes. mtr combines ping and traceroute in real-time.
netstat/ss — Check open ports, connection states, and socket buffers. ss -tuln lists all listening TCP/UDP ports. ss -s shows overall statistics.
In an interview, being able to describe a real debugging session (e.g., 'I used tcpdump to spot TCP retransmissions, then mtr to find a congested router') is worth more than reciting the OSI layers.
- L1/L2: Physical link up? Check cables, carrier detect, interface stats.
- L3: IP connectivity? Ping the target (but remember ICMP may be blocked).
- L4: Port reachable? Telnet or curl against the port.
- L7: Application responding? Check HTTP status, response body, latency.
- If all layers pass locally but fail in production, the issue is likely configuration (firewall, DNS, load balancer rules).
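The L4 step in the checklist above is easy to script when telnet isn't installed on the box. A minimal sketch, assuming a plain TCP connect is an acceptable probe; tcp_port_open is a hypothetical helper name, and the demo probes a throwaway listener on loopback.

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """L4 check: can we complete a TCP handshake with host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False

# Demo against a temporary local listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
print(tcp_port_open("127.0.0.1", demo_port))   # listener is up
listener.close()
print(tcp_port_open("127.0.0.1", demo_port))   # listener is gone
```

A refused connection and a timeout both return False here, but they mean different things in production: refused usually means nothing is listening, while a timeout often points at a firewall silently dropping packets.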
The Three Pillars Your Interviewer Actually Cares About (CIA)
Every network interview eventually circles back to Confidentiality, Integrity, and Availability. Not because your interviewer loves theory, but because every production outage or security breach traces back to a failure in one of these three axioms.
Confidentiality means encryption isn't optional. If your packets travel in plaintext, you might as well broadcast secrets on public radio. Integrity ensures data didn't get mutated in transit — that's why production TLS includes MAC checks, not just encryption. Availability means your infrastructure survives a cable cut, not that it's fast.
The junior mistake: reciting definitions. The senior move: citing a real outage. "We lost availability when our east-west route flapped because BGP timers weren't tuned for the backup link." That's how you prove you live in the trenches, not the textbook.
VPNs Are Not Magic — Know the Three Flavors Before the Interview
Your interviewer will ask about VPNs. They don't want to hear "it's a secure tunnel." Every vendor's slide deck says that. They want to know you understand the real trade-offs between site-to-site, remote access, and clientless VPNs.
Site-to-site VPNs bridge two entire networks over the public internet. You use IPsec with IKEv2 in production, not PPTP from 1999. Remote access means a single laptop connects back to headquarters. Here, OpenVPN (TLS-based) and WireGuard (UDP-based, built on the Noise protocol framework rather than TLS) dominate because they traverse NAT without fighting the firewall. Clientless VPNs are SSL portals: users get a browser interface, no client installed. Convenient, but you lose endpoint policy enforcement.
The junior flubs this by blurring the lines. The senior nails it: "For our remote workforce, we run WireGuard because its in-kernel implementation is typically faster than userspace OpenVPN, and we push posture checks via an always-on client." That's the answer that lands the offer.
Server Farms and Firewalls — The Zone-Based Model That Saves Your Skin
Zone-based firewalls aren't a buzzword. They're the difference between a clean segmentation strategy and a flat network that gets owned in one pivot. The concept: group interfaces into zones (inside, outside, DMZ). Traffic between zones is explicitly permitted or denied. Traffic within a zone is allowed — unless you want to be paranoid.
The server farm sits in a DMZ zone. Your web servers are in DMZ_EXT facing the internet. Your databases are in DMZ_INT, accessible only from DMZ_EXT. No user workstation ever talks to the database directly. This isn't paranoid; this is how you contain a breach when someone exploits your WordPress plugin.
Your interviewer wants to hear you grok the zone logic, not just parrot "three-tier architecture." Say: "We run three zones. DMZ for public-facing services, internal for users, production for databases. Stateful inspection tracks sessions so we don't need ACLs per flow." That's a production-ready mindset, not a textbook answer.
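The zone logic boils down to a default-deny lookup keyed on the zone pair. A toy sketch, not a real firewall: the zone names come from the text above, while the port numbers and the is_allowed helper are assumptions, and stateful tracking of return traffic is not modelled.

```python
from typing import Dict, Set, Tuple

# (source_zone, destination_zone) -> allowed destination ports.
# Any zone pair missing from this table is denied.
POLICY: Dict[Tuple[str, str], Set[int]] = {
    ("outside", "DMZ_EXT"): {80, 443},   # internet to web servers
    ("DMZ_EXT", "DMZ_INT"): {5432},      # web servers to database (port is an assumption)
    ("inside", "outside"): {80, 443},    # user browsing
}

def is_allowed(src_zone: str, dst_zone: str, dst_port: int) -> bool:
    """Inter-zone traffic needs an explicit rule; intra-zone traffic is allowed."""
    if src_zone == dst_zone:
        return True
    return dst_port in POLICY.get((src_zone, dst_zone), set())

print(is_allowed("outside", "DMZ_EXT", 443))    # public web traffic
print(is_allowed("DMZ_EXT", "DMZ_INT", 5432))   # app tier to database
print(is_allowed("inside", "DMZ_INT", 5432))    # workstation straight to database
print(is_allowed("outside", "DMZ_INT", 5432))   # internet straight to database
```

The last two calls are the point: nobody wrote a deny rule for them. They fail because nobody wrote an allow rule.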
Symmetric vs Asymmetric Encryption — Which One Actually Protects Your Data in Transit?
Encryption isn't magic. It's math with a key management problem. Symmetric encryption uses one shared key — fast, efficient, but you have to get that key to the other side without someone sniffing it. Think AES-256. Nobody breaks that in your lifetime. The problem is key exchange, not the cipher.
Asymmetric encryption solves key exchange with a public/private pair. You encrypt with my public key, I decrypt with my private key. Slower by orders of magnitude — RSA 4096 chews CPU. That's why real systems use hybrid: asymmetric to swap a session key, then symmetric for the heavy lifting.
Your TLS handshake does this every time. Production trap: if you're encrypting bulk data with asymmetric, you're wasting cycles. Use ECDH for key agreement, AES-GCM for the payload. Interviewers want to hear you understand the tradeoff, not just the definitions.
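The hybrid pattern can be sketched with nothing but the standard library, with two loud caveats. Python's stdlib has no ECDH or AES-GCM, so this swaps in classic finite-field Diffie-Hellman for the key agreement and HMAC-SHA256 as the symmetric step. The parameters are toy-sized and the function names are hypothetical; treat it as an illustration of the shape, never as usable crypto.

```python
import hashlib
import hmac
import secrets

# TOY PARAMETERS: a 127-bit prime modulus and a small base. Far too weak for real use.
P = 2**127 - 1
G = 3

def keypair():
    """Asymmetric step: a private exponent and the public value derived from it."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public

def session_key(my_private: int, their_public: int) -> bytes:
    """Both sides compute the same shared secret, then hash it into a 32-byte key."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def tag(key: bytes, payload: bytes) -> bytes:
    """Symmetric step: cheap per-message work under the agreed session key."""
    return hmac.new(key, payload, hashlib.sha256).digest()

alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

# Only the public values cross the wire; the session key itself never does.
alice_key = session_key(alice_private, bob_public)
bob_key = session_key(bob_private, alice_public)
print(alice_key == bob_key)

message = b"bulk payload"
print(hmac.compare_digest(tag(alice_key, message), tag(bob_key, message)))
```

The expensive modular exponentiation happens twice per side, once per session. Everything after that runs at symmetric speed, which is the whole argument for hybrid designs.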
Digital Signatures — The Proof You're Not Getting Played by a MitM
A digital signature is not encryption. It's authentication plus integrity. You sign a hash of the message with your private key. Anyone with your public key can verify you wrote it and nobody changed it. This is how your SSH host key works. This is how code signing works. This is how git commit signing works.
Without digital signatures, you can't trust that the person on the other end of the wire is who they claim. Think about that next time you accept a self-signed cert in your browser. The signature binds identity to data. RSA and ECDSA are the heavy hitters. Ed25519 is gaining ground — smaller keys, faster verification, and resistance to side-channel attacks.
Real-world: If an interviewer asks 'how does HTTPS authentication work?', they're probing your knowledge of the certificate chain. Each CA signs the next. Root CAs are trusted by your OS. Break that chain, and the signature means nothing. Never disable certificate validation in production code.
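Sign-the-hash, verify-with-the-public-key fits in a dozen lines if you accept a toy key. This sketch uses textbook RSA with no padding over two small known primes, which is insecure by design; real systems use vetted libraries with RSA-PSS, ECDSA, or Ed25519. All names are hypothetical, and pow with a negative exponent needs Python 3.8 or later.

```python
import hashlib

# TOY KEY: textbook RSA over two Mersenne primes, no padding. Illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                             # public modulus
E = 65537                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))     # private exponent

def _digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it into the range the toy modulus can handle."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Private-key operation: only the key holder can produce this value."""
    return pow(_digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Public-key operation: anyone holding (N, E) can check the signature."""
    return pow(signature, E, N) == _digest_as_int(message)

signature = sign(b"deploy v1.2.3")
print(verify(b"deploy v1.2.3", signature))   # untouched message
print(verify(b"deploy v6.6.6", signature))   # tampered message
```

Nothing here is encrypted. The message travels in the clear, and the signature only proves who produced it and that it arrived unchanged.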
IP Spoofing — Why the Internet's Address System Has a Built-in Trust Problem
IP spoofing is trivial. The IP header's source address is just a field. I can set it to anything. There's no authentication baked into IPv4. If you accept packets based solely on source IP, you're asking to get owned. This is how DDoS amplification works — attackers spoof your IP, send a small request to an open DNS resolver, and the resolver floods you with 50x the traffic.
Why can't we just fix this? Because the internet was built on trust. BGP doesn't verify origin AS. Routers forward packets based on destination, period. The fix is ingress filtering — RFC 2827. Network operators should block packets leaving their network with a source IP not in their prefix. But not everyone does. That's why spoofing still works in 2024.
Production takeaway: Never rely on source IP for authentication. That's what tokens, mTLS, and signatures are for. If you must use IP allowlists, put them behind a VPN with mutual auth. Otherwise you're one spoofed packet away from a breach.
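The ingress filtering idea from RFC 2827 is a membership test: does the packet's source address belong to a prefix this customer port is supposed to use? A minimal sketch with the standard ipaddress module; the helper name is hypothetical and the prefixes come from the documentation ranges.

```python
import ipaddress
from typing import Iterable

def passes_ingress_filter(source_ip: str, customer_prefixes: Iterable[str]) -> bool:
    """Accept a packet only if its source address lies inside an assigned prefix."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(prefix) for prefix in customer_prefixes)

assigned = ["203.0.113.0/24"]
print(passes_ingress_filter("203.0.113.7", assigned))    # legitimate customer source
print(passes_ingress_filter("198.51.100.9", assigned))   # spoofed source: drop at the edge
```

This check only works at the network edge where the operator knows which prefixes belong behind the port. By the time the packet reaches your server, that context is gone, which is why source IP alone proves nothing.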
Why Twisted-Pair Cabling Twists Matter for Signal Integrity
The twist in twisted-pair cable is not for aesthetics; it is noise control. Ethernet sends each signal differentially: one wire of the pair carries the signal, the other carries its inverse, and the receiver takes the difference. In two parallel, untwisted wires, the wire nearer a noise source picks up more interference than its partner, so the difference still contains noise. Twisting swaps the wires' positions every few millimetres, so both pick up nearly the same interference, and the subtraction at the receiver cancels it. This is common-mode rejection in action. Each pair in a cable twists at a different rate so that adjacent pairs do not couple into each other (crosstalk). Twist rates vary by manufacturer; commonly quoted figures are roughly 4–5 twists per inch for Cat5e, with tighter twists in Cat6a to support 10GbE at higher frequencies. The 100-meter channel limit comes mainly from attenuation and delay budgets rather than from the twist itself, but untwisting more than 1/2 inch at termination points measurably degrades crosstalk performance. Interviewers ask this to see if you understand signal physics beyond memorized specs.
Authorization vs Authentication — The Gatekeeper and the Key
Authentication proves who you are; authorization determines what you are allowed to do. Mixing them causes breaches. Consider JWT: the token is authenticated via its signature, but the claims inside (roles, scopes) drive authorization. A common failure: verifying the token signature but not checking whether the user's role allows DELETE on /api/users. That is broken function-level authorization; its sibling, IDOR (insecure direct object reference), is when a user reaches another user's object, such as /api/users/42, simply by changing the ID. In practice, authorization must be enforced on the server at every API endpoint; never trust the client to send only authorized requests. The principle of least privilege says grant the minimum permissions for the minimum time. OAuth2 scopes handle coarse authorization; fine-grained systems use ABAC (Attribute-Based Access Control) with policies evaluated at runtime. Interviewers press on this because broken access control ranks first in the OWASP Top 10 (2021), and a classic cause is a missing server-side check behind a button the front-end UI merely hides.
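The two checks are separate functions, and the bug is calling only the first. A minimal sketch, assuming an HMAC-signed token that is deliberately simpler than a real JWT; the secret, role table, and function names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"  # hypothetical signing key; never hard-code one in real code

# Role -> set of (method, path) pairs that role may call.
ROLE_PERMISSIONS = {
    "admin": {("GET", "/api/users"), ("DELETE", "/api/users")},
    "viewer": {("GET", "/api/users")},
}

def issue_token(claims: dict) -> str:
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + signature

def authenticate(token: str) -> Optional[dict]:
    """Authentication: is this token genuine? Returns its claims, or None."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(body))

def authorize(claims: dict, method: str, path: str) -> bool:
    """Authorization: may this identity perform this action?"""
    return (method, path) in ROLE_PERMISSIONS.get(claims.get("role"), set())

token = issue_token({"sub": "alice", "role": "viewer"})
claims = authenticate(token)
print(claims is not None)                            # the token is genuine
print(authorize(claims, "GET", "/api/users"))        # a viewer may read
print(authorize(claims, "DELETE", "/api/users"))     # a viewer may not delete
```

A handler that stops after authenticate() would happily run the DELETE for Alice. The token was valid; the action was not.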
Threat, Vulnerability, and Risk — The Security Triad That Drives Mitigation
A threat is a potential danger (hacker, storm). A vulnerability is a weakness (unpatched SSH, open S3 bucket). Risk is the probability and impact of a threat exploiting a vulnerability, often summarized as Risk = Threat x Vulnerability x Consequence. A common scoring approach, similar in spirit to FMEA (Failure Mode and Effects Analysis): you identify threats, score vulnerabilities by CVSS, then calculate risk as likelihood times impact. In production, you don't eliminate all threats; you reduce risk to an acceptable level. This drives decisions: patching a critical CVE (vulnerability) reduces exposure to a known exploit (threat), lowering risk. The OWASP Top 10 catalogs classes of weakness, not threat actors. Threat modeling (STRIDE) identifies threats; vulnerability scanning finds weaknesses. Risk registers track both. Know that accepting risk is a valid response, but document it. Failing to distinguish these three causes misallocated budget: buying DDoS protection (threat) when the real problem is unpatched software (vulnerability).
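Likelihood times impact is simple enough to put in a tiny risk register. A sketch with made-up entries and a hypothetical 1-to-5 scale on both axes; real programs calibrate these scales to their own environment.

```python
from dataclasses import dataclass

@dataclass
class RiskEntry:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (negligible) to 5 (severe); hypothetical scale

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

# Illustrative entries only, not data from a real assessment.
register = [
    RiskEntry("Unpatched SSH on bastion host", likelihood=4, impact=5),
    RiskEntry("Volumetric DDoS on marketing site", likelihood=2, impact=3),
    RiskEntry("Open S3 bucket holding logs", likelihood=3, impact=4),
]

ranked = sorted(register, key=lambda entry: entry.score, reverse=True)
for entry in ranked:
    print(f"{entry.score:>2}  {entry.name}")
```

Sorting by score is what turns the vocabulary into a budget decision: the unpatched host outranks the DDoS scenario, so that is where the next dollar goes.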
Gateway: The Protocol Translator That Keeps Networks Talking
A gateway is a network node that acts as an entrance to another network, often translating between different protocols or data formats. Unlike a router, which forwards packets based on IP addresses within the same protocol family, a protocol gateway can convert between entirely different network architectures, such as from IPv4 to IPv6, or from HTTP to a legacy mainframe protocol. This makes it essential for connecting corporate intranets to the internet, or for linking IoT sensor networks using Zigbee to a cloud API running on TCP/IP. A protocol-translating gateway can operate at any layer up to Layer 7 of the OSI model (Application Layer), since it often rewrites payloads rather than just forwarding packets. The term is also used more loosely: a default gateway is simply the Layer 3 router a host sends traffic to when the destination is outside its own subnet. In interview contexts, expect questions about default gateways: every device needs one to reach external hosts, and the gateway itself must have a route to the destination. Why this matters: without a gateway, your internal LAN is an island. Misconfigured default gateways are a common cause of "no internet" tickets. The trade-off is that a gateway can become a single point of failure and adds latency, so production deployments pair them with redundant failover.
Modem: The Signal Shaper That Makes Digital Travel Analog
A modem (modulator-demodulator) converts digital data from a computer into analog signals for transmission over telephone lines, coaxial cables, or fiber optics, and then demodulates incoming analog signals back into digital form. This conversion is necessary because physical media like copper phone lines carry continuous waveforms, not discrete bits. Key modulation techniques include QAM (Quadrature Amplitude Modulation) for high throughput, and DMT (Discrete Multi-Tone), used in DSL to split bandwidth into sub-channels. Why this matters in interviews: modems operate at Layer 1 (Physical) of the OSI model, but poor modulation or line noise directly impacts Layer 3 throughput, a classic example of how physical-layer problems masquerade as network-layer issues. Today, cable and DSL modems are often combined with routers into a single "gateway" device, but the modem function remains distinct: it handles the physical-layer handshake (line training) and error control, typically forward error correction such as Reed-Solomon coding, alongside CRC checks that detect rather than correct errors. In production, watch for signal-to-noise ratio degradation: as SNR drops, the modem renegotiates to lower speeds to maintain link stability, silently crippling bandwidth.
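Why falling SNR costs bandwidth can be made concrete with the Shannon-Hartley limit, C = B * log2(1 + SNR). This is a theoretical ceiling brought in for illustration, not what modem firmware actually computes during rate adaptation; the function name and the 1 MHz channel are assumptions.

```python
import math

def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Upper bound on error-free bit rate for a channel of given bandwidth and SNR."""
    snr_linear = 10 ** (snr_db / 10)   # convert decibels to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

# The same 1 MHz channel as the line gets noisier.
for snr_db in (30, 20, 10):
    capacity_mbps = shannon_capacity_bps(1e6, snr_db) / 1e6
    print(f"SNR {snr_db} dB -> at most {capacity_mbps:.2f} Mbit/s")
```

Every 10 dB lost removes roughly a third of the ceiling in this range, which is why a corroded connector shows up as a slow connection long before it shows up as a dead one.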
The DNS TTL That Killed a Migration
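- Always lower TTL to 60–300 seconds at least one full old-TTL period before any IP change (24 hours for an 86400-second TTL), so caches holding the long-TTL record have expired by cutover.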
- Monitor DNS propagation with tools like dig +trace or whatsmydns.net.
- Keep the old server running until traffic drops to zero — not just until you flip the record.
ping 8.8.8.8 # L3 connectivity test
dig +short google.com # DNS resolution test
Common mistakes to avoid
- Thinking a 'ping' failure always means the server is down
- Confusing the 3-way handshake (TCP) with the SSL handshake (TLS/HTTPS)
- Not knowing the difference between a Recursive and Iterative DNS query
- Ignoring the 'Ephemeral Port' range when debugging why a server can't make new outgoing connections
  netstat -n | wc -l and tune ip_local_port_range if needed.
- Assuming HTTP 503 means the server is overloaded
Interview Questions on This Topic
What is the difference between an IP address and a MAC address, and at which OSI layers do they operate?