Senior 9 min · March 06, 2026
DNS — Domain Name System

DNS Lookups — Why Your MX Change Took 48 Hours to Propagate

Half your email hit the old server for 48 hours after an MX update.

Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Written from production experience, not tutorials.

Last updated: May 24, 2026
Quick Answer
  • DNS translates domain names to IP addresses using a distributed hierarchy
  • Recursive resolver queries root, TLD, then authoritative nameservers
  • A, CNAME, MX, TXT records control website, email, and verification
  • Caching via TTL makes subsequent lookups take microseconds, not milliseconds
  • Production trap: changing DNS without reducing TTL first causes hours of stale cache
  • Biggest mistake: CNAME at the root domain — it's forbidden by the RFC
✦ Definition~90s read
What is DNS?

The Domain Name System (DNS) is the internet's phonebook — a distributed, hierarchical database that maps human-readable domain names like example.com to machine-routable IP addresses like 93.184.216.34. Without DNS, you'd be typing raw IP addresses into your browser, which is impractical at scale.

DNS solves the fundamental problem of translating names to numbers across a global network, using a decentralized architecture that avoids a single point of failure. It's not a single server but a multi-layered system of resolvers, root servers, TLD servers, and authoritative nameservers, each caching results to reduce latency and load.

DNS is critical for nearly every internet service — email routing (MX records), website hosting (A/AAAA records), domain verification (TXT records), and service discovery (SRV records). When you change an MX record for email delivery, you're updating a piece of data on your authoritative nameserver, but the old value may live for hours or days in caches across ISPs, corporate networks, and public resolvers like Google (8.8.8.8) or Cloudflare (1.1.1.1).

This caching is governed by Time-To-Live (TTL) values, which you set per record. The infamous "48-hour propagation" is a myth — DNS doesn't propagate; it expires. Changes take effect only after every cached copy's TTL expires and the next query fetches the fresh value.

Related protocols like mDNS (for local networks) or DNS-over-HTTPS (for privacy) exist, but the core DNS protocol remains the backbone of internet naming. When NOT to rely on DNS alone? For real-time service discovery in highly dynamic environments, where endpoints change within seconds, DNS caching and TTL delays can leave clients pointing at dead instances; registries like etcd or Consul are a better fit there.

In production, you monitor DNS with tools like dig, nslookup, and dnspython, and you watch for pitfalls like stale records, misconfigured TTLs (too long for planned changes, too short for stability), and DNSSEC validation failures. Understanding DNS caching behavior is the difference between a smooth migration and a weekend war room.

Plain-English First

Imagine every person on Earth had a phone, but instead of saving contacts by name, you had to memorise everyone's 12-digit phone number. That would be awful. So we invented contact books — you type 'Mum' and your phone looks up her number for you. DNS is exactly that contact book for the internet. You type 'google.com' and DNS quietly looks up the actual number (called an IP address) that your computer needs to find Google's servers. You never see the number. You never need to.

Every single time you open a browser and visit a website, something remarkable happens in the background — something so fast and so reliable that most developers take it completely for granted. Your computer doesn't actually know where 'youtube.com' lives. It only speaks in numbers. DNS is the invisible system that bridges the human-readable world of domain names and the machine-readable world of IP addresses, and it handles over a trillion lookups every single day across the planet.

Before DNS existed — and yes, there was a time before it — every computer on the internet had to maintain a single text file called HOSTS.TXT that listed every known hostname and its IP address. A central team at Stanford Research Institute would update it, and sysadmins everywhere had to manually download the latest copy. When the internet had a few hundred machines, this was just about manageable. When it grew to thousands? The system collapsed under its own weight. DNS was invented in 1983 by Paul Mockapetris to solve exactly this scalability crisis — and the solution he came up with is still running the internet today.

By the end of this article you'll be able to explain exactly what happens between typing a URL and a webpage loading, describe the hierarchy of DNS servers and the role each one plays, understand what DNS records like A, CNAME, and MX actually do, and confidently answer DNS questions in a technical interview. No prior networking knowledge needed — we're building this from the ground up.

Why DNS Propagation Is a Lie — And What Actually Happens

The Domain Name System (DNS) is a hierarchical, distributed key-value store that maps human-readable names (like example.com) to machine-oriented records (A, AAAA, MX, CNAME, etc.). The core mechanic: delegation. No single server holds the entire namespace. Instead, authority is delegated from root nameservers to TLD servers to authoritative nameservers for each domain. This design gives fault tolerance and global scale, but it also means that when you change a record, every caching resolver between the client and your authoritative server must expire its cached copy before it sees the new value.

DNS relies on Time-To-Live (TTL) values, set in seconds on each record. A resolver that fetches your MX record with a TTL of 86400 (24 hours) will not query again for that record until the TTL expires, even if you changed the record 5 minutes ago. This is not "propagation"; it's cache expiration. The 48-hour horror stories come from two things: some resolvers ignore the published TTL and apply their own, longer minimum, and the caches in the chain (ISP, corporate proxy, browser) each expire independently, so their delays can stack.
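To make the cache-expiry behaviour concrete, here is a minimal sketch of a caching resolver with an injected clock. `ToyResolver`, the zone dictionary, and the provider hostnames are hypothetical stand-ins, and the sketch assumes the resolver honours the published TTL exactly.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    value: str
    expires_at: float  # absolute time (seconds) after which the entry is refetched


class ToyResolver:
    """Hypothetical caching resolver; `authoritative` stands in for the real zone."""

    def __init__(self, authoritative):
        self.authoritative = authoritative  # name -> (value, ttl_seconds)
        self.cache = {}

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.value  # still fresh: the authoritative server is never asked
        value, ttl = self.authoritative[name]
        self.cache[name] = CachedAnswer(value, now + ttl)
        return value


zone = {"example.com/MX": ("mail.old-provider.example", 86400)}
resolver = ToyResolver(zone)

first = resolver.resolve("example.com/MX", now=0)            # cache miss, TTL clock starts
zone["example.com/MX"] = ("mail.new-provider.example", 300)  # record changed afterwards
stale = resolver.resolve("example.com/MX", now=300)          # still inside the 24 h TTL
fresh = resolver.resolve("example.com/MX", now=86400)        # TTL elapsed, refetch

print(first, stale, fresh, sep="\n")
```

The second lookup still returns the old mail host even though the zone has changed, which is the whole "propagation" effect in miniature.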

In practice, you must plan for DNS changes to take up to the maximum TTL of the old record plus any non-compliant resolver caches. For critical changes like MX (email routing), a 48-hour window is realistic if you had a 24-hour TTL and there are intermediate caches. Always lower the TTL to 300 (5 minutes) at least 48 hours before a planned change, then change the record, then raise the TTL back after the change is verified.
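The staging steps can be turned into a small schedule calculation. This is a sketch under stated assumptions: `plan_dns_change` and its parameters are hypothetical names, the 2× margin is this article's rule of thumb rather than a protocol requirement, and resolvers that ignore TTLs can stay stale longer than the plan predicts.

```python
from datetime import datetime, timedelta


def plan_dns_change(change_at, old_ttl, staging_ttl=300, safety_factor=2):
    """Return the key timestamps for a staged record change.

    Assumes resolvers honour the published TTL. `safety_factor` is the
    margin applied to the old TTL (2 matches the article's rule of thumb).
    """
    return {
        # Publish the low TTL this early so caches holding the old long-TTL
        # copy have expired before the record itself changes.
        "lower_ttl_by": change_at - timedelta(seconds=old_ttl * safety_factor),
        "change_record_at": change_at,
        # After the change, compliant caches keep the old value for at most staging_ttl.
        "old_value_gone_by": change_at + timedelta(seconds=staging_ttl),
    }


plan = plan_dns_change(datetime(2026, 3, 6, 12, 0), old_ttl=86400)
for step, when in plan.items():
    print(f"{step:>18}: {when:%Y-%m-%d %H:%M}")
```

With a 24-hour TTL and the default margin, the low TTL has to be published 48 hours before the cutover, and compliant caches drop the old value five minutes after it.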

TTL Is Not a Suggestion
Many operators mistakenly think 'propagation' is a network-wide flush. It's not — each resolver independently respects its cached TTL, and some ignore yours entirely.
Production Insight
A team changed their MX record to a new email provider but kept the old TTL at 86400. For the next 48 hours, roughly half their inbound email still went to the old server (which they had decommissioned), causing silent message loss.
The symptom: intermittent email delivery failures with no error from the sender — messages accepted by the old server then lost when it shut down.
Rule of thumb: stage DNS changes by lowering the TTL to 300 at least two full old-TTL periods before the change (48 hours for a 24-hour TTL), then make the change, then raise the TTL again after 24 hours of stable traffic.
Key Takeaway
DNS is a distributed cache system, not a broadcast network — 'propagation' is just cache expiry.
Always lower TTL to 300 seconds at least 48 hours before any critical record change.
For MX changes, expect up to 48 hours of mixed routing unless you control every resolver in the path.
Figure: DNS Resolution: From Query to Answer. How a domain name becomes an IP address via delegation: Recursive Resolver (starts the query on behalf of the client) → Root Server (points to the TLD nameserver) → TLD Nameserver (points to the authoritative nameserver) → Authoritative Nameserver (returns the final DNS record: A, MX, etc.). Cached records ignore your TTL changes, so always lower the TTL before making changes, then wait out the old TTL.

IP Addresses: The Numbers the Internet Actually Uses

Before DNS makes any sense, you need to understand what it's translating TO. Every device connected to the internet — your laptop, your phone, a web server in a data centre in Virginia — has a unique numerical address called an IP address. Think of it like a home address, except for computers.

An IPv4 address looks like this: 142.250.80.46. Four numbers, each between 0 and 255, separated by dots. That specific address is one of Google's servers. You could reach Google by typing that number directly into your browser. But nobody wants to memorise 142.250.80.46 instead of 'google.com'.

IPv6 addresses look even worse: 2607:f8b0:4004:c09::6a. Humans are terrible at remembering these. Computers are great at them. DNS exists purely to bridge that gap — to let humans use words while computers use numbers. Every domain name on the internet is ultimately just a friendly alias for one or more IP addresses.
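For a feel of what these addresses are to a machine, the sketch below parses the two example addresses with Python's standard `ipaddress` module and shows them as plain integers. The variable names are illustrative only.

```python
import ipaddress

# The two example addresses from the text above.
parsed = [ipaddress.ip_address(text) for text in ("142.250.80.46", "2607:f8b0:4004:c09::6a")]

for addr in parsed:
    # An address is just a 32-bit (IPv4) or 128-bit (IPv6) number.
    print(f"{addr}: IPv{addr.version}, {addr.max_prefixlen} bits, as integer {int(addr)}")

# The '::' in an IPv6 address is shorthand for a run of zero groups.
print(parsed[1].exploded)
```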

dns_basic_lookup.sh (Bash)
# This command performs a DNS lookup from your terminal.
# It asks: "What IP address does 'google.com' map to?"
# nslookup is available on Windows, macOS, and Linux.

nslookup google.com

# You can also use 'dig' on macOS/Linux for more detail:
dig google.com +short

# Let's also look up a smaller site to see a simpler single-IP result:
nslookup github.com
Output
Server: 192.168.1.1
Address: 192.168.1.1#53
Non-authoritative answer:
Name: google.com
Addresses: 142.250.80.46
142.250.80.110
# dig output (cleaner):
142.250.80.46
# github.com lookup:
Name: github.com
Address: 140.82.114.4
Why Multiple IPs?
The nslookup for google.com returns several IP addresses. That's DNS-based load balancing: Google runs thousands of servers, and DNS can return multiple IPs so clients spread traffic across them. If one address doesn't respond, your browser can try the next IP in the list.
Production Insight
Many production outages start with a stale IP that no longer exists.
Always verify the A record your application is resolving — use dig @public-resolver.
If you're migrating servers, lower TTL to 300s 48 hours before, not after.
Key Takeaway
IP addresses are the real destination.
DNS is just a lookup table.
Don't confuse the alias with the address.

The DNS Hierarchy: Four Servers, One Answer

DNS isn't a single giant database. If it were, it would be the most catastrophic single point of failure in human history. Instead, DNS is a beautifully distributed hierarchy of servers spread across the entire planet. When your computer needs to resolve a domain name, it talks to up to four different types of server in sequence until it gets an answer.

Think of it like asking for directions in an unfamiliar city. You first ask your hotel concierge (your local DNS resolver). They might know the answer from memory. If not, they call the city's central tourist office (the Root Nameserver). The tourist office doesn't know the exact restaurant, but they say 'for Italian restaurants, call the Italian Quarter office' (the TLD Nameserver). The Italian Quarter office then gives you the exact address (the Authoritative Nameserver). Four stops, one answer.

  1. RECURSIVE RESOLVER: Your ISP or a public DNS service (like 8.8.8.8, Google's DNS). This is the server your device asks first. It does all the legwork of talking to the other servers on your behalf.
  2. ROOT NAMESERVER: There are 13 logical root nameservers (labelled A through M) run by organisations like NASA and ICANN. They don't know your domain's IP address; they only know which TLD nameservers to contact for '.com', '.org', '.uk', etc.
  3. TLD NAMESERVER: TLD stands for Top-Level Domain. The .com TLD nameserver knows which Authoritative Nameserver is responsible for each registered .com domain.
  4. AUTHORITATIVE NAMESERVER: This is the final authority. It's managed by whoever owns the domain (or their hosting provider). It holds the actual DNS records and gives the definitive answer.
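The four roles above can be sketched as a toy in-memory hierarchy in which each server either refers the query one level down or returns the final answer. The server names and tables are hypothetical, and the IP is only the sample value used in this article's trace output.

```python
# Each toy "server" maps a zone name to either a referral or a final answer.
SERVERS = {
    "root": {"io.": ("referral", "tld-io")},
    "tld-io": {"thecodeforge.io.": ("referral", "auth-thecodeforge")},
    "auth-thecodeforge": {"thecodeforge.io.": ("answer", "104.21.45.67")},
}


def iterative_resolve(name: str) -> tuple[str, list[str]]:
    """Walk root -> TLD -> authoritative, the way a recursive resolver does."""
    server, path = "root", []
    while True:
        path.append(server)
        table = SERVERS[server]
        # Pick the most specific zone on this server that contains `name`.
        candidates = [z for z in table if name == z or name.endswith("." + z)]
        if not candidates:
            raise LookupError(f"{server} has no data for {name}")
        kind, value = table[max(candidates, key=len)]
        if kind == "answer":
            return value, path
        server = value  # follow the referral one level down


ip, path = iterative_resolve("thecodeforge.io.")
print(" -> ".join(path), "=>", ip)
```

No single table holds the whole answer; each one only knows who to ask next, which is the point of delegation.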
dns_full_trace.sh (Bash)
# The 'dig +trace' command shows you the FULL DNS resolution journey.
# Watch it walk through the delegation chain step by step.
# Run this in your terminal (macOS or Linux). On Windows, use WSL.

dig thecodeforge.io +trace

# WHAT YOU'LL SEE:
# Step 1: dig contacts a ROOT nameserver (e.g., a.root-servers.net)
# Step 2: Root server says: "for .io domains, ask this TLD server"
# Step 3: TLD server says: "for thecodeforge.io, ask THIS authoritative server"
# Step 4: Authoritative server gives the actual IP address

# You can also see exactly WHICH root servers exist:
dig . NS +short
Output
; <<>> DiG 9.18 <<>> thecodeforge.io +trace
;; QUESTION SECTION:
;thecodeforge.io. IN A
. 518400 IN NS a.root-servers.net. ; Root server responds
. 518400 IN NS b.root-servers.net.
io. 172800 IN NS a0.nic.io. ; .io TLD server responds
io. 172800 IN NS b0.nic.io.
thecodeforge.io. 3600 IN NS ns1.hostingprovider.com. ; Authoritative server
thecodeforge.io. 300 IN A 104.21.45.67 ; FINAL ANSWER — the IP address
Pro Tip: This Journey Takes ~100ms Once. After That? Microseconds.
This four-step lookup sounds slow, but it only happens once per domain per resolver (or until the TTL expires). Your recursive resolver, your operating system, and often your browser cache the result. The second time you visit google.com, the answer comes from a nearby cache and all four steps are skipped.
Production Insight
If your resolver has a cold cache and cannot reach any root server, it cannot resolve anything.
Root servers are anycast: many physical servers around the world share each root server IP address.
Check root server reachability with dig . NS before blaming your domain.
Key Takeaway
Four servers collaborate to resolve one domain.
Caching makes it fast.
If every server at one level is unreachable and nothing is cached, the lookup fails. There is no partial answer.

DNS Records Demystified: A, CNAME, MX, TXT and More

DNS isn't just about mapping domain names to IP addresses. It's a full-blown database of records that controls how your domain behaves across the internet. Each record type answers a different question about your domain.

Think of DNS records like different departments in a company. The A Record department handles 'where is the website?'. The MX Record department handles 'where should emails go?'. The CNAME department handles 'this name is just an alias for another name'. They all live in the same building (your Authoritative Nameserver) but do completely different jobs.

— A RECORD: Maps a domain name directly to an IPv4 address. This is the most fundamental record. 'thecodeforge.io → 104.21.45.67'.

— AAAA RECORD: Same as A, but for IPv6 addresses. Pronounced 'quad-A': an IPv6 address is 128 bits, four times the length of a 32-bit IPv4 address.

— CNAME RECORD: 'Canonical Name' — an alias. 'www.thecodeforge.io → thecodeforge.io'. Points one name to another name, not directly to an IP.

— MX RECORD: 'Mail Exchange'. Tells the internet which server handles email for your domain. Without it, sending servers fall back to the domain's A/AAAA record, which usually isn't running a mail server, so mail to your domain fails.

— TXT RECORD: Plain text attached to a domain. Used for email verification (SPF, DKIM), proving domain ownership to Google/GitHub, and security configs.

— NS RECORD: 'Nameserver' — declares which servers are authoritative for this domain. When you buy a domain and point it to Cloudflare, you're updating NS records.

— TTL (Time To Live): Not a record type, but every record has one. It's a number in seconds telling DNS servers how long to cache this record before checking again. TTL of 3600 means 'cache this for 1 hour'.
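As a rough illustration of how these record types fit together, the sketch below models a tiny zone as a dictionary, follows a CNAME alias to reach an A record, and orders MX targets by preference. The zone contents, mail hostnames, and function names are hypothetical.

```python
# Hypothetical zone data: (name, type) -> list of (ttl_seconds, value).
ZONE = {
    ("thecodeforge.io.", "A"): [(300, "104.21.45.67")],
    ("www.thecodeforge.io.", "CNAME"): [(3600, "thecodeforge.io.")],
    ("thecodeforge.io.", "MX"): [(3600, "5 alt1.mail.example."), (3600, "1 mail.example.")],
}


def lookup(name: str, rtype: str, max_hops: int = 8) -> list[str]:
    """Return record values for (name, rtype), following CNAME aliases."""
    for _ in range(max_hops):
        if (name, rtype) in ZONE:
            return [value for _ttl, value in ZONE[(name, rtype)]]
        alias = ZONE.get((name, "CNAME"))
        if alias is None:
            return []
        name = alias[0][1]  # restart the lookup at the canonical name
    raise RuntimeError("CNAME chain too long")


def mx_by_preference(name: str) -> list[str]:
    """Order MX targets so the lowest preference number (highest priority) is first."""
    pairs = (value.split() for value in lookup(name, "MX"))
    return [host for _pref, host in sorted((int(pref), host) for pref, host in pairs)]


print(lookup("www.thecodeforge.io.", "A"))
print(mx_by_preference("thecodeforge.io."))
```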

dns_record_types.sh (Bash)
# Let's query specific DNS record types for real domains.
# The syntax is: dig <domain> <RECORD_TYPE>

# 1. A Record — get the IPv4 address for github.com
dig github.com A +short
# Output: 140.82.114.4

# 2. MX Record — find where GitHub's email servers are
dig github.com MX +short
# Output: 1 aspmx.l.google.com.  (GitHub uses Google Workspace for email)

# 3. CNAME Record — www is often a CNAME alias
dig www.github.com CNAME +short
# Output: github.com.  (www.github.com is just an alias for github.com)

# 4. TXT Records — see domain verification and security records
dig github.com TXT +short
# Output includes SPF record: "v=spf1 ip4:192.30.252.0/22 include:_netblocks.google.com ~all"

# 5. NS Records — which nameservers are authoritative for github.com?
dig github.com NS +short
# Output:
# ns-1283.awsdns-32.org.   <-- GitHub uses Amazon Route 53
# ns-1707.awsdns-21.co.uk.
# ns-421.awsdns-52.com.
# ns-520.awsdns-01.net.

# 6. Check TTL — how long is this record cached? (look at the number after IN)
dig github.com A
# In the ANSWER SECTION you'll see something like:
# github.com.  60  IN  A  140.82.114.4
#              ^^--- TTL of 60 seconds — GitHub refreshes DNS frequently
Output
# A Record:
140.82.114.4
# MX Record:
1 aspmx.l.google.com.
# CNAME:
github.com.
# TXT (excerpt):
"v=spf1 ip4:192.30.252.0/22 include:_netblocks.google.com ~all"
# NS Records:
ns-1283.awsdns-32.org.
ns-1707.awsdns-21.co.uk.
ns-421.awsdns-52.com.
ns-520.awsdns-01.net.
# Full A record with TTL:
;; ANSWER SECTION:
github.com. 60 IN A 140.82.114.4
Watch Out: CNAME Records Can't Live at the Root Domain
You cannot create a CNAME for 'thecodeforge.io' itself (called the apex or naked domain) — only for subdomains like 'www.thecodeforge.io'. This is a DNS spec limitation. Cloudflare and Route 53 solve this with proprietary 'CNAME flattening' or 'ALIAS' records. If your root domain is broken but www works, this is often why.
Production Insight
Missing MX records silently kill email delivery for days.
A CNAME at the apex will be rejected by most providers.
Always use an A record for the root domain and CNAME only for subdomains.
Key Takeaway
A record for IP, MX for email, CNAME for aliases.
TXT for verification, NS for authority.
CNAME cannot coexist with other records at the same name.
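The coexistence rule is easy to check mechanically. The following is a simplified sketch: `cname_violations` and the sample zone are hypothetical, and it ignores the DNSSEC record types that are permitted next to a CNAME.

```python
from collections import defaultdict


def cname_violations(records: list[tuple[str, str]]) -> list[str]:
    """Return names that hold a CNAME alongside any other record type."""
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name].add(rtype)
    return sorted(
        name for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


# Hypothetical zone: the apex always carries SOA and NS, so a CNAME there collides.
zone = [
    ("thecodeforge.io.", "SOA"),
    ("thecodeforge.io.", "NS"),
    ("thecodeforge.io.", "CNAME"),      # invalid: apex already has SOA/NS
    ("www.thecodeforge.io.", "CNAME"),  # fine: nothing else lives at this name
]
print(cname_violations(zone))
```

This is why the apex can never be a plain CNAME: the SOA and NS records that define the zone already live there.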

DNS Caching, TTL and Why Your Changes Seem to Take Forever

You've just launched your new website. You updated the DNS records. You refresh the browser. Still showing the old site. You check the DNS settings — everything looks right. What's happening? The answer is caching, and understanding it will save you hours of frustration.

Every DNS resolver along the lookup chain caches the answer it receives for as long as the TTL (Time To Live) allows. If your A record has a TTL of 86400 (24 hours), every resolver that looked up your domain in the past 24 hours won't check again until that timer runs out. Even if you change the IP address at the Authoritative Nameserver, millions of resolvers worldwide are still serving the old answer from their cache.

This is called 'DNS propagation' — and it's not actually propagation in the way people imagine (records pushing outward). It's really just the world's cached copies expiring and fetching fresh answers at different times. Different users around the world will see the new records at different times depending on when their local resolver's cache expires.
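The staggered expiry can be simulated in a few lines. This sketch assumes four hypothetical resolvers that each cached the old record at a different time before the change (at t = 0), honour the TTL, and are queried again as soon as their copy expires.

```python
def fraction_updated(last_fetch: list[int], ttl: int, now: int) -> float:
    """Share of resolvers whose cached copy of the OLD record has expired by `now`.

    last_fetch[i] is when resolver i last cached the old record, in seconds
    relative to the change (negative means before the change).
    """
    return sum(now >= fetched + ttl for fetched in last_fetch) / len(last_fetch)


HOUR = 3600
TTL = 24 * HOUR
# Cached the old record 20 h, 12 h, 6 h and 1 h before the change.
last_fetch = [-20 * HOUR, -12 * HOUR, -6 * HOUR, -1 * HOUR]

for hours_after in (0, 6, 12, 18, 24):
    share = fraction_updated(last_fetch, TTL, hours_after * HOUR)
    print(f"{hours_after:>2} h after the change: {share:.0%} of resolvers see the new record")
```

Nothing is pushed anywhere; the share climbs only because individual timers run out at different moments.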

The professional move: before making a big DNS change (like moving a site to a new host), lower your TTL to 300 seconds (5 minutes) a day or two in advance. Once traffic is migrated successfully, raise it back to 3600 or 86400 for better performance.

dns_lookup_with_python.py (Python)
# Python's built-in 'socket' library can perform basic DNS lookups.
# For richer DNS querying, we'll also use 'dnspython' — install it with:
# pip install dnspython

import socket
import dns.resolver  # from the dnspython library

# ─────────────────────────────────────────────
# METHOD 1: Basic lookup using Python's socket module
# This asks your OS's configured DNS resolver for an IP address
# ─────────────────────────────────────────────
domain_name = "github.com"

# getaddrinfo returns a list of (family, type, proto, canonname, sockaddr) tuples.
# Restricting to TCP avoids one duplicate entry per socket type.
results = socket.getaddrinfo(domain_name, 80, proto=socket.IPPROTO_TCP)  # port 80 = HTTP

print(f"IP addresses for {domain_name}:")
for ip_address in sorted({result[4][0] for result in results}):  # sockaddr[0] is the IP
    print(f"  → {ip_address}")

print()

# ─────────────────────────────────────────────
# METHOD 2: Querying specific record types with dnspython
# This gives us full control — like running 'dig' from Python
# ─────────────────────────────────────────────

# Query the A record (IPv4 address)
print(f"A Records for {domain_name}:")
a_records = dns.resolver.resolve(domain_name, 'A')
for record in a_records:
    print(f"  IP Address : {record.address}")
# The TTL belongs to the whole record set, so print it once.
print(f"  TTL        : {a_records.rrset.ttl} seconds")

print()

# Query the MX record (mail servers)
print(f"MX Records for {domain_name}:")
mx_records = dns.resolver.resolve(domain_name, 'MX')
for record in mx_records:
    # MX records have a 'preference' (priority) — lower number = higher priority
    print(f"  Mail Server : {record.exchange}  (Priority: {record.preference})")

print()

# Query TXT records (often used for domain verification)
print(f"TXT Records for {domain_name}:")
txt_records = dns.resolver.resolve(domain_name, 'TXT')
for record in txt_records:
    # Decode bytes to string for readable output
    decoded_text = b"".join(record.strings).decode('utf-8')
    print(f"  TXT : {decoded_text[:80]}...")  # Truncate long records for display
Output
IP addresses for github.com:
→ 140.82.114.4
A Records for github.com:
IP Address : 140.82.114.4
TTL : 60 seconds
MX Records for github.com:
Mail Server : aspmx.l.google.com. (Priority: 1)
Mail Server : alt1.aspmx.l.google.com. (Priority: 5)
Mail Server : alt2.aspmx.l.google.com. (Priority: 5)
TXT Records for github.com:
TXT : v=spf1 ip4:192.30.252.0/22 include:_netblocks.google.com ~all...
Pro Tip: Use a Public DNS Checker to Bypass Your Local Cache
When debugging DNS changes, your own computer's cache may be stale. Use whatsmydns.net or dnschecker.org to see what DNS servers worldwide are currently returning for your domain — without your local cache interfering. This tells you exactly how far 'propagation' has progressed.
Production Insight
The biggest production DNS mistake? Changing records without reducing TTL first.
Lower TTL to 300s at least 48 hours before a migration.
After migration, verify global propagation with dig @resolver, then raise TTL back.
Key Takeaway
TTL controls how long caches hold your DNS data.
Low TTL = fast changes, more queries.
High TTL = faster performance, slow rollouts.

DNS in Production: Monitoring, Troubleshooting, and Pitfalls

DNS looks simple on paper — a few records, a cache, done. But in a production environment, DNS failures are notoriously hard to diagnose because they manifest as unrelated symptoms: 'the site is down', 'email isn't sending', 'API calls timeout'. The root cause is often a DNS misconfiguration that has been silently wrong for hours or days.

Here's what you need to monitor and how to troubleshoot the most common production DNS scenarios:

Monitoring
  • Track DNS query latency from your application servers. Spikes often indicate resolver overload or network issues.
  • Set up synthetic checks that resolve your own domains from multiple geographic locations. If any region fails, you'll know before users do.
  • Watch for NXDOMAIN (non-existent domain) responses. A sudden increase often means a client typo or a misconfigured CNAME.

Troubleshooting flow
  1. Start with 'dig @your-resolver yourdomain.com A'. Check the ANSWER SECTION.
  2. If you get a response but it's wrong, check the authoritative NS: 'dig @your-ns yourdomain.com A'.
  3. If authoritative is correct but public resolvers show old data, you're waiting out TTL.
  4. If no response at all, check firewall rules (UDP/53 and TCP/53) and that your resolver IP is reachable.
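The troubleshooting flow can be written down as a small decision function. This is a sketch: `diagnose` and its arguments are hypothetical, and the answers are assumed to have been collected already (for example with dig), with `None` meaning no response.

```python
from typing import Optional


def diagnose(expected: str, resolver_answer: Optional[str], authoritative_answer: Optional[str]) -> str:
    """Map a resolver answer and an authoritative answer to the next step."""
    if resolver_answer is None:
        return "no response: check firewall rules (UDP/53 and TCP/53) and resolver reachability"
    if resolver_answer == expected:
        return "ok: the resolver returns the expected record"
    if authoritative_answer == expected:
        return "stale cache: the authoritative record is correct, wait out the old TTL"
    return "wrong at the source: fix the record on the authoritative nameserver"


print(diagnose("203.0.113.10", "198.51.100.7", "203.0.113.10"))
```

Keeping the order of checks identical to the manual flow makes it easy to compare what an automated check reports with what you see on the command line.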

Common pitfalls in production
  • Using the same TTL for all records without planning for changes.
  • Putting a CNAME at the root domain: many providers now throw an error, but some silently break.
  • Not having a fallback resolver. If your primary resolver (like your ISP's) goes down, your app goes down. Configure a secondary.
  • Forgetting that DNSSEC can break resolution if misconfigured. Always test with 'delv' before enabling.

dns_troubleshooting_script.sh (Bash)
#!/usr/bin/env bash
# Quick DNS health check script
# Run this to verify your domain's records from multiple angles

DOMAIN="yourdomain.com"
EXPECTED_IP="203.0.113.10"

# 1. Query a public resolver to see what users see
echo "=== Public resolver (Google 8.8.8.8) ==="
PUBLIC_IP=$(dig @8.8.8.8 "$DOMAIN" A +short)
echo "Resolved IP: $PUBLIC_IP"

# grep -x matches whole lines, so this also works when several A records come back
if ! grep -qxF "$EXPECTED_IP" <<<"$PUBLIC_IP"; then
  echo "WARNING: Public resolver does not match expected IP!"
fi

# 2. Query the authoritative nameserver directly
echo ""
echo "=== Authoritative nameserver ==="
NS=$(dig "$DOMAIN" NS +short | head -n 1)
dig "@$NS" "$DOMAIN" A +short

# 3. Check MX records for email
echo ""
echo "=== MX Records ==="
dig "$DOMAIN" MX +short

# 4. Check DNS response time
echo ""
echo "=== Response time ==="
time dig @8.8.8.8 "$DOMAIN" A +short > /dev/null
Output
=== Public resolver (Google 8.8.8.8) ===
Resolved IP: 203.0.113.10
=== Authoritative nameserver ===
203.0.113.10
=== MX Records ===
10 mail.yourdomain.com.
=== Response time ===
real 0m0.045s
DNS Is Like a Distributed Directory Service
  • DNS is someone else's infrastructure — you only control your authoritative NS and your own application's resolver config.
  • Caching is a feature, not a bug. It makes the internet fast, but it also delays your changes.
  • Every DNS query is a potential failure point: resolver down, network partition, authoritative server offline.
  • Treat DNS as critical infrastructure. Monitor it, test it, plan changes around TTLs.
Production Insight
A misconfigured CNAME can silently redirect traffic to a dead host.
Use dig +trace to verify the full chain after any NS record change.
Always have a secondary resolver configured — don't rely on a single one.
Key Takeaway
Production DNS needs monitoring and testing.
TTL management is essential for zero-downtime migrations.
When DNS breaks, it looks like everything is down — troubleshoot methodically.

Zone Transfers: The Silent Backdoor Nobody Locks

You think DNS is just lookups. It's also replication. Zone transfers are how secondary DNS servers get the full record set from a primary. AXFR transfers the whole zone. IXFR sends only the changes. Both are terrifying if left open. Attackers don't brute force DNS. They ask nicely. A misconfigured primary server will happily dump every A, MX, and TXT record for your entire domain. That's a network diagram handed to a stranger. Protect zone transfers. Restrict them by IP. Use TSIG signatures for authentication. No exceptions. If a junior on your team says "it's internal only," audit it yourself. I've seen an internal breach spread to prod in 12 minutes because someone opened AXFR to 0.0.0.0/0. DNS replication is powerful. Lock it down.
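The "restrict by IP" idea can be illustrated outside of any particular nameserver. The sketch below mimics an allow-list check with the standard `ipaddress` module. The function names are hypothetical, the two addresses are placeholders for your real secondaries, and it says nothing about TSIG, which a real server would verify separately.

```python
import ipaddress

# Hypothetical allow-list of authorised secondary nameservers.
ALLOWED_SECONDARIES = [ipaddress.ip_network(n) for n in ("192.168.1.2/32", "10.0.0.3/32")]


def transfer_allowed(client_ip: str, allowed=ALLOWED_SECONDARIES) -> bool:
    """True only if the requesting address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)


def acl_is_wide_open(networks) -> bool:
    """Flag entries such as 0.0.0.0/0 that admit every client."""
    return any(net.prefixlen == 0 for net in networks)


print(transfer_allowed("192.168.1.2"))   # an authorised secondary
print(transfer_allowed("203.0.113.7"))   # an arbitrary outside host
print(acl_is_wide_open([ipaddress.ip_network("0.0.0.0/0")]))
```

A check like `acl_is_wide_open` is the kind of audit worth running against any exported ACL before trusting an "internal only" claim.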

check-axfr.sh (Bash)
#!/bin/bash
# io.thecodeforge - quick zone transfer audit
DOMAIN="thecodeforge.io"
# Attempt AXFR from every nameserver listed for the domain
for ns in $(dig NS "$DOMAIN" +short); do
    echo "Testing $ns for open zone transfer..."
    # dig can exit 0 even when the transfer is refused, so inspect the output instead:
    # a successful AXFR returns real records, starting and ending with the SOA.
    if dig AXFR "$DOMAIN" "@$ns" 2>/dev/null | grep -v '^;' | grep -qw 'SOA'; then
        echo "[!] VULNERABLE: $ns allows zone transfers"
    fi
done
# Production check: BIND ACL example
# allow-transfer { 192.168.1.2; 10.0.0.3; };
Output
Testing ns1.thecodeforge.io for open zone transfer...
[!] VULNERABLE: ns1.thecodeforge.io allows zone transfers
Production Trap:
Never allow inbound AXFR from the public internet. Use TSIG or restrict to authorized secondaries only. One open transfer = your entire DNS map exposed.
Key Takeaway
Zone transfers leak your entire network topology if unauthenticated. Restrict by IP and sign with TSIG.

DNS Resolution: Not Magic, Just Delegation

When you type a domain, nothing magical happens. It's a chain of delegation. Your browser asks a resolver (usually your ISP or a public one like 8.8.8.8). The resolver doesn't know the answer. It asks the root servers. The root doesn't know either. It points to the TLD server for .com. The TLD server points to your authoritative nameserver. That finally returns the IP. Each server only knows who to ask next. That's the whole trick. No single server knows everything. This layering prevents bottlenecks and global failure. But it also means every hop can fail or lie. Resolvers cache aggressively. Authoritative servers can serve stale data. The hierarchy exists for scaling, not for performance. Understand it. Debug it. Use dig +trace to see each step live. When DNS breaks, it's almost always a broken delegation or a poisoned cache. Not a cosmic ray.

trace-dns.sh (Bash)
#!/bin/bash
# io.thecodeforge - trace each delegation step manually
domain="thecodeforge.io"
echo "Tracing DNS delegation for $domain..."
# Step 1: Ask a root server (a.root-servers.net). The referral arrives in the
# AUTHORITY section, which +short would hide, so print that section instead.
dig @198.41.0.4 "$domain" A +norecurse +noall +authority | head -n 1
echo "-> Root gave us TLD servers"
# Step 2: Ask a .io TLD server (a0.nic.io)
dig @a0.nic.io "$domain" A +norecurse +noall +authority | head -n 1
echo "-> TLD gave us Authoritative servers"
# Step 3: Ask authoritative (ns1.thecodeforge.io)
AUTHORITATIVE=$(dig NS "$domain" +short | head -n 1)
dig "@$AUTHORITATIVE" "$domain" A +short
echo "-> Authoritative returned final IP"
# Production debugging:
# dig +trace $domain  # one-liner for the same effect
Output
Tracing DNS delegation for thecodeforge.io...
io. 172800 IN NS a0.nic.io.
-> Root gave us TLD servers
thecodeforge.io. 3600 IN NS ns1.thecodeforge.io.
-> TLD gave us Authoritative servers
198.51.100.42
-> Authoritative returned final IP
Debug Fast:
Use dig +trace example.com to see every delegation step in real time. Pinpoints broken delegations instantly.
Key Takeaway
DNS resolution is delegation, not knowledge. Each server points down the hierarchy.
● Production incident · POST-MORTEM · severity: high

The 48-Hour Email Blackout From a Single MX Record Change

Symptom
After updating MX records at the authoritative nameserver, about half of incoming email continued arriving at the old mail server, while the other half went to the new one. Users reported missing messages and bouncebacks.
Assumption
The team assumed DNS changes take effect 'within a few hours' and that restarting the mail servers would force the resolver to re-query.
Root cause
The MX record had a TTL of 86400 seconds (24 hours). Many ISPs and corporate resolvers cache DNS records for the full TTL, and some hold them longer. Even after the change, those resolvers kept returning the old MX record, so senders continued delivering email to the old server for up to 48 hours. No amount of server restarts could override a cached TTL.
Fix
Immediately set the old MX record with a TTL of 300 seconds (5 minutes) at the old provider — but since the domain was already pointed to new NS, they had to wait out the original TTL. The lesson: always reduce TTL to 300 at least 48 hours before any record change.
Production debug guide
When the network isn't resolving as expected, follow these steps to isolate the problem.
Symptom · 01
Website loads in some regions but not others
Fix
Check DNS propagation with a global checker (whatsmydns.net). If still mixed, wait out the TTL. Use dig @8.8.8.8 to bypass local caches.
Symptom · 02
dig returns an IP, but curl fails
Fix
Compare the IP from dig to the expected server IP. If mismatch, check for stale cached records at your local resolver. Flush your local DNS cache.
Symptom · 03
Email bounces with 'domain not found'
Fix
Query the MX record: dig MX yourdomain.com. Ensure the mail server hostname in the MX record resolves to an A record. Check SPF/TXT records for email authentication.
Symptom · 04
nslookup times out
Fix
Verify the configured nameservers are reachable: ping them, or run nc -zv against port 53 (this tests TCP only). Check for firewall rules blocking UDP/53 or TCP/53. Try a public resolver such as 8.8.8.8 to rule out your own resolver.
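The comparison behind the first two symptoms can be made mechanical. A rough sketch, assuming you paste in the addresses that dig +short printed for the authoritative nameserver and for each public resolver. The function name is made up for illustration, and the addresses come from documentation ranges:

```python
def classify_propagation(authoritative, resolver_answers):
    """Compare each resolver's answer set with the authoritative answer set.

    authoritative: iterable of IP strings from the authoritative nameserver.
    resolver_answers: dict mapping a resolver label to its iterable of IPs.
    """
    stale = sorted(
        resolver
        for resolver, ips in resolver_answers.items()
        if set(ips) != set(authoritative)
    )
    if not stale:
        return "propagated"
    if len(stale) == len(resolver_answers):
        return "not propagated (stale at: " + ", ".join(stale) + ")"
    return "mixed (stale at: " + ", ".join(stale) + ")"

# Hypothetical answers: 192.0.2.10 is the new server, 198.51.100.20 the old one.
new_ip = {"192.0.2.10"}
answers = {"8.8.8.8": {"192.0.2.10"}, "1.1.1.1": {"198.51.100.20"}}
print(classify_propagation(new_ip, answers))  # mixed (stale at: 1.1.1.1)
```

A "mixed" result with a correct authoritative answer means the only remedy is waiting out the TTL.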
★ Quick DNS Debug Cheat Sheet
When DNS breaks in production, these commands cut through the noise. Run them in order.
Propagation not complete
Immediate action
Query a public resolver directly to bypass local cache
Commands
dig @8.8.8.8 yourdomain.com A
dig @1.1.1.1 yourdomain.com A
Fix now
Compare both outputs. If they differ, wait for caches to expire — you cannot force it.
Wrong A record returned
Immediate action
Check what your authoritative NS says
Commands
dig @your-ns-server.com yourdomain.com A
dig +trace yourdomain.com
Fix now
If the authoritative NS has the correct record but external resolvers return old IP, you're waiting out old TTL. If authoritative NS is wrong, update your DNS provider.
Email delivery failing
Immediate action
Verify MX and SPF records resolve
Commands
dig yourdomain.com MX +short
dig yourdomain.com TXT +short | grep spf
Fix now
Ensure MX hostname has an A record. If missing, add one. If SPF is missing, add a TXT record with v=spf1 include:... ~all.
DNSSEC validation failure
Immediate action
Check if the domain has DS records at the parent zone
Commands
dig DS yourdomain.com +dnssec
delv yourdomain.com
Fix now
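If the DS record at the parent zone does not match the zone's DNSKEY, validating resolvers return SERVFAIL. Correct or remove the DS record at the registrar. If you need to disable DNSSEC temporarily, remove the DS record first and wait for its TTL to expire before unsigning the zone, then re-add the correct DS record once the zone is signed again.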
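For the email-delivery card above, the SPF check that grep does by eye can be scripted. A minimal sketch, assuming each input string is one line of dig +short TXT output. The function names and the include target are hypothetical, and long TXT records that dig prints as several quoted chunks are not reassembled here:

```python
def find_spf(txt_records):
    """Return the TXT strings that are SPF policies (they start with 'v=spf1')."""
    cleaned = [record.strip().strip('"') for record in txt_records]
    return [
        record for record in cleaned
        if record.lower() == "v=spf1" or record.lower().startswith("v=spf1 ")
    ]

def spf_status(txt_records):
    """Classify a domain's TXT records as 'missing', 'multiple', or 'ok'."""
    spf = find_spf(txt_records)
    if not spf:
        return "missing"
    if len(spf) > 1:
        # More than one SPF policy on a name makes SPF evaluation fail.
        return "multiple"
    return "ok"

# Hypothetical dig output; _spf.example.net is a placeholder include target.
records = ['"v=spf1 include:_spf.example.net ~all"', '"site-verification=abc123"']
print(spf_status(records))  # ok
```

Both "missing" and "multiple" need a fix at the DNS provider: publish exactly one SPF TXT record per name.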
DNS Record Types at a Glance
DNS Record Type | What It Maps | Real-World Use Case | Can Be at Root Domain?
A Record | Domain → IPv4 address | Point thecodeforge.io to your web server's IP | Yes
AAAA Record | Domain → IPv6 address | Same as A record but for IPv6-enabled servers | Yes
CNAME Record | Domain → Another domain name | www.thecodeforge.io → thecodeforge.io | No (subdomains only)
MX Record | Domain → Mail server hostname | Route emails sent to @thecodeforge.io to Google Workspace | Yes
TXT Record | Domain → Arbitrary text string | SPF email security, proving domain ownership to Google/GitHub | Yes
NS Record | Domain → Authoritative nameserver | Tell the internet which DNS servers manage your domain | Yes
TTL (on any record) | Cache lifetime in seconds | 300s before migrations, 86400s for stable production records | N/A

Key takeaways

1
DNS is a distributed, hierarchical database, not a single server. Four server types collaborate on every lookup: Recursive Resolver → Root Nameserver → TLD Nameserver → Authoritative Nameserver.
2
TTL controls caching. A low TTL (300s) means faster propagation of changes but more DNS queries. A high TTL (86400s) means faster performance for users but slow DNS change rollouts. Tune it intentionally, especially before migrations.
3
CNAME records create aliases between domain names, but they can only be used on subdomains, never at the root/apex domain (e.g., you can CNAME 'www.site.com' but not 'site.com' itself).
4
Your computer has its own DNS cache, your router has one, and your ISP's resolver has one. When debugging DNS issues, test against a specific public resolver (dig @8.8.8.8) to bypass your local caches, and remember that the public resolver caches too: only the authoritative nameserver gives the true current answer.
5
Production DNS failures often look like app failures. Monitor DNS latency, set up synthetic checks from multiple regions, and always have a secondary resolver configured.
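The monitoring advice in the last takeaway can start as small as a timed lookup. A sketch under stated assumptions: the 200 ms threshold is an arbitrary placeholder, the function names are illustrative, and the resolver is injectable so the check can be exercised without network access. By default it uses the system resolver, so it measures what your application would see, local caches included:

```python
import socket
import time
from typing import Callable

def time_lookup(name: str, resolve: Callable[[str], object] = socket.gethostbyname) -> tuple[bool, float]:
    """Resolve name once and return (succeeded, elapsed milliseconds)."""
    start = time.perf_counter()
    try:
        resolve(name)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    elapsed_ms = (time.perf_counter() - start) * 1000
    return ok, elapsed_ms

def check(name: str, threshold_ms: float = 200.0,
          resolve: Callable[[str], object] = socket.gethostbyname) -> str:
    """Return 'FAIL', 'SLOW', or 'OK' for a single lookup."""
    ok, elapsed_ms = time_lookup(name, resolve)
    if not ok:
        return "FAIL"
    return "SLOW" if elapsed_ms > threshold_ms else "OK"

# Stub resolver for demonstration; drop the resolve argument to hit real DNS.
print(check("example.com", resolve=lambda name: "192.0.2.1"))  # OK
```

Running the same check on a schedule from several regions, and alerting on "FAIL" or repeated "SLOW", gives an early signal that an apparent app failure is really a DNS problem.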

Common mistakes to avoid

×

Changing DNS records and expecting instant results

Symptom
You update your A record but the old site still loads for hours. Users in different regions see different versions of your site.
Fix
This is TTL caching at work. Plan ahead — reduce TTL to 300 seconds at least 48 hours before a migration. After the migration, verify with 'dig @8.8.8.8 yourdomain.com A' to bypass your local cache and query Google's resolver directly.
×

Creating a CNAME at the root/apex domain

Symptom
Your DNS provider rejects the record, or email stops working entirely. The DNS spec forbids a CNAME at the zone apex (e.g., thecodeforge.io): a CNAME must be the only record at its name, but the apex must also carry NS and SOA records, and usually MX records too.
Fix
Use an ALIAS or ANAME style record if your DNS provider offers one (Route 53 calls these alias records; Cloudflare does the equivalent with CNAME flattening), or simply use an A record at the root domain and a CNAME only for 'www'.
×

Forgetting that DNS is public and fully inspectable

Symptom
Developers accidentally expose internal infrastructure details (server IPs, internal subdomain names like 'staging.company.com' or 'vpn.company.com') in DNS records that anyone on the internet can query.
Fix
Audit your DNS records regularly. Never put sensitive hostnames in public DNS. Use split-horizon DNS (different DNS records for internal vs. external networks) if you need internal names to resolve only within your organisation.
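The CNAME-at-apex mistake above follows from one rule: a name that has a CNAME may not have other record types. That rule is easy to lint for before pushing a zone. A simplified sketch, assuming the zone is available as (owner name, record type) pairs. The function name is illustrative, and DNSSEC records that are allowed to sit beside a CNAME are not special-cased:

```python
from collections import defaultdict

def cname_conflicts(records):
    """Map each name that has a CNAME plus other record types to those types.

    records: iterable of (owner name, record type) pairs.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        # Normalise case and the optional trailing dot so names compare equal.
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return {
        name: sorted(types - {"CNAME"})
        for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    }

zone = [
    ("thecodeforge.io.", "SOA"),
    ("thecodeforge.io.", "NS"),
    ("thecodeforge.io.", "CNAME"),      # the apex CNAME mistake
    ("www.thecodeforge.io.", "CNAME"),  # fine: nothing else at this name
]
print(cname_conflicts(zone))  # {'thecodeforge.io': ['NS', 'SOA']}
```

An empty result means no name mixes a CNAME with other records; because every apex carries SOA and NS, an apex CNAME always shows up here.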
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
Walk me through exactly what happens — at the DNS level — from the momen...
Q02 · JUNIOR
What is the difference between an A record and a CNAME record, and when ...
Q03 · SENIOR
What is DNS TTL and why does it matter in a production deployment? If yo...
Q01 of 03 · SENIOR

Walk me through exactly what happens — at the DNS level — from the moment I type 'google.com' in my browser and hit Enter to when the page starts loading.

ANSWER
Your browser first checks its local DNS cache. If the name is not there, it asks the operating system's cache, then the configured recursive resolver (usually your ISP's, or a public one such as 8.8.8.8). If the resolver has no cached answer, it resolves the name iteratively on your behalf: it queries a root nameserver (e.g., a.root-servers.net), which returns a referral to the .com TLD nameservers. The resolver then queries a .com TLD nameserver, which returns the authoritative nameservers for google.com. Finally, the resolver queries an authoritative server for the A or AAAA record, gets the IP address (e.g., 142.250.80.46), caches it for the record's TTL, and returns it to your browser. Your browser then opens a TCP connection to that IP, performs a TLS handshake if the URL is HTTPS, and begins fetching the page. An uncached lookup typically takes 20-120 ms; subsequent lookups are served from cache in microseconds.
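The "caches it according to TTL" step is what makes a stale answer survive a record change. A toy sketch of that behaviour, not how any real resolver is implemented: the class name is made up, and the clock is injectable so expiry can be shown without waiting:

```python
import time

class TtlCache:
    """Minimal name -> value cache where each entry expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached value, or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the next lookup must re-query
            return None
        return value

# Fake clock so the example runs instantly; 192.0.2.10 is a documentation IP.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("example.com", "192.0.2.10", ttl_seconds=300)
now[0] = 299.0
print(cache.get("example.com"))  # 192.0.2.10
now[0] = 300.0
print(cache.get("example.com"))  # None
```

Until the clock passes the expiry time, the cache keeps returning the old address no matter what the authoritative server now says, which is why changes appear to propagate slowly.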
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is DNS in simple terms?
02
How long does DNS propagation take?
03
What is the difference between a DNS resolver and an authoritative nameserver?
04
Can I use a CNAME for my root domain?
05
What happens if my authoritative nameserver goes down?