Beginner 10 min · March 06, 2026

DNS — Domain Name System

DNS Lookups — Why Your MX Change Took 48 Hours to Propagate

Q: What is DNS in simple terms?

DNS (Domain Name System) is the internet's contact book. It translates human-readable domain names like 'google.com' into numerical IP addresses like '142.250.80.46' that computers use to find each other. Without DNS, you'd have to memorise a unique number for every website you want to visit.

Q: How long does DNS propagation take?

DNS propagation isn't a single event. It's the process of the world's cached DNS records expiring and being refreshed. It can take anywhere from a few minutes (if the previous TTL was low) to a day or more (if the previous TTL was 86400 seconds, i.e. 24 hours). The oft-quoted 48 hours is a safety margin for caches that hold records longer than they should. You can speed this up significantly by lowering your TTL to 300 seconds at least one old-TTL period (typically a day or two) before making DNS changes.

Q: What is the difference between a DNS resolver and an authoritative nameserver?

A recursive resolver (like Google's 8.8.8.8 or your ISP's DNS server) is the middleman: it receives your query and hunts down the answer by talking to other servers on your behalf. An authoritative nameserver is the final authority: it's the server that actually holds the DNS records for a specific domain and gives the definitive, canonical answer. When you buy a domain and set up DNS records in Cloudflare or Route 53, that provider's nameservers are your authoritative nameservers.

Q: Can I use a CNAME for my root domain?

No, the DNS specification forbids CNAME records at the zone apex (root domain) because a CNAME must be the only record for that name, but the root domain always needs NS and SOA records. Some DNS providers offer workarounds like 'CNAME flattening' or 'ALIAS' records that behave like CNAMEs but can coexist with other records.

Q: What happens if my authoritative nameserver goes down?

If the authoritative nameserver for your domain becomes unreachable, any resolver that doesn't have your records cached will be unable to resolve your domain — your site will appear down, email will be undeliverable, etc. That's why production domains typically have multiple nameservers (often 2-4) configured via NS records, spread across different networks or providers.

Half your email hit the old server for 48 hours after an MX update.

Naren, Founder & Principal Engineer

20+ years shipping production systems from the metal up. Written from production experience, not tutorials.

Production tested · Last updated July 18, 2026 · 2,466 articles, all by Naren

Before you start (⏱ 20 min)

✓Basic programming fundamentals
✓A computer with internet access
✓Willingness to follow along with examples


⚡Quick Answer

DNS translates domain names to IP addresses using a distributed hierarchy
Recursive resolver queries root, TLD, then authoritative nameservers
A, CNAME, MX, TXT records control website, email, and verification
Caching via TTL lets repeat lookups skip the full resolution chain and come straight from a nearby cache
Production trap: changing DNS without reducing TTL first causes hours of stale cache
Biggest mistake: CNAME at the root domain — it's forbidden by the RFC

✦ Definition (~90s read)

What is DNS?

The Domain Name System (DNS) is the internet's phonebook — a distributed, hierarchical database that maps human-readable domain names like example.com to machine-routable IP addresses like 93.184.216.34. Without DNS, you'd be typing raw IP addresses into your browser, which is impractical at scale.


DNS solves the fundamental problem of translating names to numbers across a global network, using a decentralized architecture that avoids a single point of failure. It's not a single server but a multi-layered system of resolvers, root servers, TLD servers, and authoritative nameservers. Resolvers cache the answers they receive to reduce latency and load.

DNS is critical for nearly every internet service — email routing (MX records), website hosting (A/AAAA records), domain verification (TXT records), and service discovery (SRV records). When you change an MX record for email delivery, you're updating a piece of data on your authoritative nameserver, but the old value may live for hours or days in caches across ISPs, corporate networks, and public resolvers like Google (8.8.8.8) or Cloudflare (1.1.1.1).

This caching is governed by Time-To-Live (TTL) values, which you set per record. The infamous "48-hour propagation" is a myth: DNS doesn't propagate; it expires. A given user sees your change only after the copy cached by their resolver expires and the next query fetches the fresh value. As a result, different users switch over at different times.
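Here is a minimal Python sketch of a resolver cache that honours TTLs, to make the expiry model concrete. `Authoritative` and `CachingResolver` are hypothetical toy classes, not code from any real resolver. Time is passed in explicitly as seconds so the behaviour is easy to trace.

```python
from dataclasses import dataclass, field

@dataclass
class Authoritative:
    """Hypothetical authoritative server: the single source of truth for a zone."""
    records: dict  # (name, rtype) -> (value, ttl_seconds)

    def query(self, name, rtype):
        return self.records[(name, rtype)]

@dataclass
class CachingResolver:
    """Hypothetical recursive resolver that honours TTLs."""
    upstream: Authoritative
    cache: dict = field(default_factory=dict)  # (name, rtype) -> (value, expires_at)

    def resolve(self, name, rtype, now):
        cached = self.cache.get((name, rtype))
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: answer from cache, no upstream query
        value, ttl = self.upstream.query(name, rtype)
        self.cache[(name, rtype)] = (value, now + ttl)  # keep it for `ttl` seconds
        return value

auth = Authoritative({("example.com", "MX"): ("old-mail.example.com", 86400)})
resolver = CachingResolver(auth)

print(resolver.resolve("example.com", "MX", now=0))      # fills the cache for 24h
auth.records[("example.com", "MX")] = ("new-mail.example.com", 300)  # you edit the zone
print(resolver.resolve("example.com", "MX", now=3600))   # 1h later: still old-mail
print(resolver.resolve("example.com", "MX", now=86400))  # TTL expired: new-mail
```

Nothing is pushed to the resolver. It picks up the new value only when its own copy expires, so two resolvers that cached the record at different times switch over at different times.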

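Alternatives like mDNS (for local networks) or DNS-over-HTTPS (for privacy) exist, but the core DNS protocol remains the backbone of internet naming. When NOT to use DNS? Avoid it for real-time discovery of endpoints that change second by second. DNS's caching and TTL delays can hand clients stale answers there, and registries like etcd or Consul, queried through their APIs, react faster. Kubernetes still relies on DNS for stable Service names, because each Service IP stays fixed while the pods behind it churn.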

In production, you monitor DNS with tools like dig, nslookup, and dnspython, and you watch for pitfalls like stale records, misconfigured TTLs (too long for planned changes, too short for stability), and DNSSEC validation failures. Understanding DNS caching behavior is the difference between a smooth migration and a weekend war room.

Plain-English First

Imagine every person on Earth had a phone, but instead of saving contacts by name, you had to memorise everyone's 12-digit phone number. That would be awful. So we invented contact books — you type 'Mum' and your phone looks up her number for you. DNS is exactly that contact book for the internet. You type 'google.com' and DNS quietly looks up the actual number (called an IP address) that your computer needs to find Google's servers. You never see the number. You never need to.

Every single time you open a browser and visit a website, something remarkable happens in the background — something so fast and so reliable that most developers take it completely for granted. Your computer doesn't actually know where 'youtube.com' lives. It only speaks in numbers. DNS is the invisible system that bridges the human-readable world of domain names and the machine-readable world of IP addresses, and it handles over a trillion lookups every single day across the planet.

Before DNS existed — and yes, there was a time before it — every computer on the internet had to maintain a single text file called HOSTS.TXT that listed every known hostname and its IP address. A central team at Stanford Research Institute would update it, and sysadmins everywhere had to manually download the latest copy. When the internet had a few hundred machines, this was just about manageable. When it grew to thousands? The system collapsed under its own weight. DNS was invented in 1983 by Paul Mockapetris to solve exactly this scalability crisis — and the solution he came up with is still running the internet today.

By the end of this article you'll be able to explain exactly what happens between typing a URL and a webpage loading, describe the hierarchy of DNS servers and the role each one plays, understand what DNS records like A, CNAME, and MX actually do, and confidently answer DNS questions in a technical interview. No prior networking knowledge needed — we're building this from the ground up.

Why DNS Propagation Is a Lie — And What Actually Happens

The Domain Name System (DNS) is a hierarchical, distributed key-value store that maps human-readable names (like example.com) to machine-oriented records (A, AAAA, MX, CNAME, etc.). The core mechanic: delegation. No single server holds the entire namespace. Instead, authority is delegated from root nameservers to TLD servers to authoritative nameservers for each domain. This design gives fault tolerance and global scale, but it also means that when you change a record, every caching resolver between the client and your authoritative server must expire its cached copy before it sees the new value.

DNS relies on Time-To-Live (TTL) values, set in seconds on each record. A resolver that fetches your MX record with a TTL of 86400 (24 hours) will not query again for that record until the TTL expires, even if you changed the record 5 minutes ago. This is not "propagation"; it's cache expiration. The 48-hour horror stories come from a few sources:
- some resolvers clamp or ignore TTLs;
- browsers, operating systems and applications can keep caches of their own;
- nameserver (NS) changes are governed by the delegation TTL in the parent zone, which for many TLDs is 172800 seconds (48 hours).

In practice, you must plan for DNS changes to take up to the maximum TTL of the old record plus any non-compliant resolver caches. For critical changes like MX (email routing), a 48-hour window is realistic if you had a 24-hour TTL and there are intermediate caches. Always lower the TTL to 300 (5 minutes) at least 48 hours before a planned change, then change the record, then raise the TTL back after the change is verified.
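The staging rule is plain arithmetic on TTLs. The small sketch below assumes resolvers honour TTLs. Its `margin` factor (default 2, matching the "2× the old TTL" rule of thumb used in this article) is there to absorb caches that don't. `plan_dns_change` and the dates are illustrative.

```python
from datetime import datetime, timedelta

def plan_dns_change(old_ttl, low_ttl, lower_at, margin=2.0):
    """Return key timestamps for a TTL-staged DNS change (TTLs in seconds)."""
    # A cache that fetched the record just before you lowered the TTL
    # may keep the old high-TTL copy for up to old_ttl seconds.
    earliest_change = lower_at + timedelta(seconds=old_ttl)
    # Extra headroom for slow or non-compliant caches
    safe_change = lower_at + timedelta(seconds=old_ttl * margin)
    # Once you change the record, compliant caches hold the old value at most low_ttl
    fully_switched = safe_change + timedelta(seconds=low_ttl)
    return {
        "earliest_change": earliest_change,
        "safe_change": safe_change,
        "fully_switched": fully_switched,
    }

plan = plan_dns_change(old_ttl=86400, low_ttl=300,
                       lower_at=datetime(2026, 3, 2, 9, 0))
for step, when in plan.items():
    print(f"{step:16} {when:%Y-%m-%d %H:%M}")
```

Read the result like this. Don't touch the record before `safe_change`. After you make the change, every compliant cache should hold the new value within `low_ttl` seconds.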

⚠ TTL Is Not a Suggestion

Many operators mistakenly think 'propagation' is a network-wide flush. It's not. Each resolver independently keeps its cached copy until the TTL runs out, and some ignore your TTL entirely.

📊 Production Insight

A team changed their MX record to a new email provider but kept the old TTL at 86400. For the next 48 hours, roughly half their inbound email still went to the old server. They were in the middle of decommissioning it, so those messages were silently lost.

The symptom: intermittent email delivery failures with no error from the sender — messages accepted by the old server then lost when it shut down.

Rule of thumb: always stage DNS changes by lowering TTL to 300 at least 2× the old TTL before the change, then change, then raise TTL after 24 hours of stable traffic.

🎯 Key Takeaway

DNS is a distributed cache system, not a broadcast network — 'propagation' is just cache expiry.

Always lower TTL to 300 seconds at least 48 hours before any critical record change.

For MX changes, expect up to 48 hours of mixed routing unless you control every resolver in the path.

thecodeforge.io

Dns Domain Name System

IP Addresses: The Numbers the Internet Actually Uses

Before DNS makes any sense, you need to understand what it's translating TO. Every device connected to the internet — your laptop, your phone, a web server in a data centre in Virginia — has a unique numerical address called an IP address. Think of it like a home address, except for computers.

An IPv4 address looks like this: 142.250.80.46. Four numbers, each between 0 and 255, separated by dots. That specific address is one of Google's servers. Your computer could reach Google by typing that number directly into your browser — try it. It works. But nobody wants to memorise 142.250.80.46 instead of 'google.com'.

IPv6 addresses look even worse: 2607:f8b0:4004:c09::6a. Humans are terrible at remembering these. Computers are great at them. At its core, DNS exists to bridge that gap, letting humans use words while computers use numbers. For a website, the domain name is ultimately just a friendly alias for one or more IP addresses.
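Python's standard `ipaddress` module makes the size difference visible. An IPv4 address is a 32-bit number and an IPv6 address is a 128-bit one. The two addresses below are the examples used in this section.

```python
import ipaddress

v4 = ipaddress.ip_address("142.250.80.46")
v6 = ipaddress.ip_address("2607:f8b0:4004:c09::6a")

for addr in (v4, v6):
    # .version is 4 or 6; .max_prefixlen is the address length in bits
    print(f"IPv{addr.version}: {addr}  ({addr.max_prefixlen} bits)")
    # Under the hood, each address is just one big integer
    print(f"  as an integer: {int(addr)}")

# The compressed IPv6 form hides runs of zeros; .exploded shows every digit
print(v6.exploded)
```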

dns_basic_lookup.sh (Bash)

# This command performs a DNS lookup from your terminal.
# It asks: "What IP address does 'google.com' map to?"
# nslookup is available on Windows, macOS, and Linux.

nslookup google.com

# You can also use 'dig' on macOS/Linux; '+short' prints only the answers:
dig google.com +short

# Let's also look up another site; github.com often returns a single IP:
nslookup github.com

Output

Server: 192.168.1.1
Address: 192.168.1.1#53

Non-authoritative answer:
Name: google.com
Addresses: 142.250.80.46
           142.250.80.110

# dig output (cleaner):
142.250.80.46

# github.com lookup:
Name: github.com
Address: 140.82.114.4

🔥Why Multiple IPs?

The nslookup for google.com returns several IP addresses. That's DNS-based load balancing. Google has thousands of servers, and returning multiple IPs lets clients spread traffic across them. If one address doesn't respond, most browsers and HTTP clients try the next one in the list.
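Clients differ in exactly how they fall back. Modern browsers, for instance, also race IPv4 and IPv6 connections. The basic idea is still a loop over the returned addresses, which the sketch below shows. `connect_first` is a hypothetical helper, and the connection function is passed in so the logic can run without a network. In real use you would feed it the addresses from `getaddrinfo`.

```python
import socket

def connect_first(addresses, port, connect=socket.create_connection, timeout=2.0):
    """Try each resolved address in order; return the first successful connection."""
    errors = []
    for ip in addresses:
        try:
            return connect((ip, port), timeout=timeout)
        except OSError as exc:  # refused, unreachable, timed out...
            errors.append((ip, exc))
    raise ConnectionError(f"all {len(addresses)} addresses failed: {errors}")

# Offline demo: a fake connector where the first server is "down"
def fake_connect(address, timeout):
    ip, port = address
    if ip == "142.250.80.46":
        raise ConnectionRefusedError(f"{ip}:{port} refused")
    return f"connected to {ip}:{port}"

print(connect_first(["142.250.80.46", "142.250.80.110"], 443, connect=fake_connect))
```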

📊 Production Insight

Many production outages start with a stale IP that no longer exists.

Always verify the A record your application is resolving, for example by querying a public resolver directly: dig @8.8.8.8 yourdomain.com A.

If you're migrating servers, lower TTL to 300s 48 hours before, not after.

🎯 Key Takeaway

IP addresses are the real destination.

DNS is just a lookup table.

Don't confuse the alias with the address.

The DNS Hierarchy: Four Servers, One Answer

DNS isn't a single giant database. If it were, it would be the most catastrophic single point of failure in human history. Instead, DNS is a beautifully distributed hierarchy of servers spread across the entire planet. When your computer needs to resolve a domain name, the lookup can involve up to four different types of server, queried in sequence until one of them returns the answer.

Think of it like asking for directions in an unfamiliar city. You first ask your hotel concierge (your local DNS resolver). They might know the answer from memory. If not, they call the city's central tourist office (the Root Nameserver). The tourist office doesn't know the exact restaurant, but they say 'for Italian restaurants, call the Italian Quarter office' (the TLD Nameserver). The Italian Quarter office then gives you the exact address (the Authoritative Nameserver). Four stops, one answer.

Here are the four players:

RECURSIVE RESOLVER — Your ISP or a public DNS service (like 8.8.8.8 — Google's DNS). This is the server your device asks first. It does all the legwork of talking to the other servers on your behalf.
ROOT NAMESERVER — There are 13 logical root nameservers (labelled A through M) run by organisations like NASA and ICANN. They don't know your domain's IP address; they just know which TLD nameservers to contact for '.com', '.org', '.uk', etc.
TLD NAMESERVER — TLD stands for Top-Level Domain. The .com TLD nameserver knows which Authoritative Nameserver is responsible for every registered .com domain.
AUTHORITATIVE NAMESERVER — This is the final authority. It's managed by whoever owns the domain (or their hosting provider). It holds the actual DNS records and gives the definitive answer.
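The toy Python model below shows how these four roles hand off to one another. Each hypothetical server is a dictionary that either answers or refers the resolver onward. The server names and the 104.21.45.67 address are the illustrative values from the dig +trace output in this section, not live data. Real resolvers also handle caching, retries, CNAMEs and multiple nameservers per zone.

```python
# Root hints: a recursive resolver ships knowing how to reach the root servers.
ROOT = "a.root-servers.net"

# Each hypothetical server maps the zones it knows about to a referral or an answer.
SERVERS = {
    "a.root-servers.net": {"io.": ("referral", "a0.nic.io")},
    "a0.nic.io": {"thecodeforge.io.": ("referral", "ns1.hostingprovider.com")},
    "ns1.hostingprovider.com": {"thecodeforge.io.": ("answer", "104.21.45.67")},
}

def ask(server, name):
    """Return the most specific (longest) matching entry this server holds for `name`."""
    zones = [z for z in SERVERS[server] if name == z or name.endswith("." + z)]
    if not zones:
        raise LookupError(f"{server} has no data for {name}")
    return SERVERS[server][max(zones, key=len)]

def resolve(name, max_hops=10):
    """Follow referrals from the root down, as a recursive resolver does."""
    server, path = ROOT, [ROOT]
    for _ in range(max_hops):
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        server = value  # referral: ask the next server down the hierarchy
        path.append(server)
    raise RuntimeError("too many referrals")

ip, path = resolve("thecodeforge.io.")
print(" -> ".join(path))  # a.root-servers.net -> a0.nic.io -> ns1.hostingprovider.com
print(ip)                 # 104.21.45.67
```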

dns_full_trace.sh (Bash)

# The 'dig +trace' command shows you the FULL DNS resolution journey.
# Watch it walk down the hierarchy step by step.
# Run this in your terminal (macOS or Linux). On Windows, use WSL.

dig thecodeforge.io +trace

# WHAT YOU'LL SEE:
# Step 1: dig asks your local resolver for the list of ROOT nameservers
# Step 2: A root server (e.g., a.root-servers.net) says: "for .io domains, ask this TLD server"
# Step 3: The TLD server says: "for thecodeforge.io, ask THIS authoritative server"
# Step 4: The authoritative server gives the actual IP address
# (With +trace, dig follows each referral itself instead of letting your resolver do it.)

# You can also see exactly WHICH root servers exist:
dig . NS +short

Output

; <<>> DiG 9.18 <<>> thecodeforge.io +trace
;; QUESTION SECTION:
;thecodeforge.io. IN A

. 518400 IN NS a.root-servers.net. ; root server list (from your local resolver)
. 518400 IN NS b.root-servers.net.

io. 172800 IN NS a0.nic.io. ; referral from a root server to the .io TLD servers
io. 172800 IN NS b0.nic.io.

thecodeforge.io. 3600 IN NS ns1.hostingprovider.com. ; referral from .io to the authoritative server
thecodeforge.io. 300 IN A 104.21.45.67 ; FINAL ANSWER from the authoritative server

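💡Pro Tip: This Journey Takes ~100ms Once. After That? Near-Instant.

This four-step lookup sounds slow, but it only happens when nothing along the way has a cached answer. Your resolver caches each answer until its TTL expires, and it shares that cache with everyone else who uses the same resolver. Your operating system and browser keep small caches of their own. The second time you visit google.com, the answer usually comes straight from one of those caches, skipping the root, TLD and authoritative servers entirely.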

📊 Production Insight

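If your resolver can't reach any root server, lookups for names it hasn't already cached start failing.

Root servers are anycast: each of the 13 root server addresses is served by many physical instances around the world.

Check root server reachability directly with dig @a.root-servers.net . NS before blaming your domain. A plain dig . NS may simply be answered from your resolver's cache.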

🎯 Key Takeaway

Four servers collaborate to resolve one domain.

Caching makes it fast.

If every server at any one level is unreachable, the lookup fails. There's no partial answer.


DNS Records Demystified: A, CNAME, MX, TXT and More

DNS isn't just about mapping domain names to IP addresses. It's a full-blown database of records that controls how your domain behaves across the internet. Each record type answers a different question about your domain.

Think of DNS records like different departments in a company. The A Record department handles 'where is the website?'. The MX Record department handles 'where should emails go?'. The CNAME department handles 'this name is just an alias for another name'. They all live in the same building (your Authoritative Nameserver) but do completely different jobs.

Here are the records every developer needs to know:

— A RECORD: Maps a domain name directly to an IPv4 address. This is the most fundamental record. 'thecodeforge.io → 104.21.45.67'.

— AAAA RECORD: Same as A, but for IPv6 addresses. It's pronounced 'quad-A'. The four A's reflect that an IPv6 address (128 bits) is four times the size of an IPv4 address (32 bits).

— CNAME RECORD: 'Canonical Name' — an alias. 'www.thecodeforge.io → thecodeforge.io'. Points one name to another name, not directly to an IP.

— MX RECORD: 'Mail Exchange': tells the internet which server handles email for your domain. Without one, sending servers fall back to your domain's A/AAAA record, which usually isn't a mail server, so in practice email to your domain fails.

— TXT RECORD: Plain text attached to a domain. Used for email authentication (SPF, DKIM, DMARC), proving domain ownership to Google/GitHub, and security configs.

— NS RECORD: 'Nameserver' — declares which servers are authoritative for this domain. When you buy a domain and point it to Cloudflare, you're updating NS records.

— TTL (Time To Live): Not a record type, but every record has one. It's a number in seconds telling DNS servers how long to cache this record before checking again. TTL of 3600 means 'cache this for 1 hour'.
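Every record line `dig` prints has the same layout: name, TTL, class, type, data. The TTL is therefore the number just before the class `IN`. The minimal parser below handles one such line and assumes that five-field layout. `parse_rr` and `ResourceRecord` are hypothetical helpers. The github.com line is the one from the commands that follow, and the MX line is a made-up example.com sample.

```python
from typing import NamedTuple

class ResourceRecord(NamedTuple):
    name: str
    ttl: int      # seconds this answer may be cached
    rclass: str   # almost always IN (Internet)
    rtype: str    # A, AAAA, CNAME, MX, TXT, NS...
    data: str     # the record's value; may contain spaces (MX, TXT)

def parse_rr(line: str) -> ResourceRecord:
    """Split one zone-file-style record line into its five fields."""
    name, ttl, rclass, rtype, data = line.split(None, 4)  # data keeps its inner spaces
    return ResourceRecord(name, int(ttl), rclass, rtype, data)

rr = parse_rr("github.com.  60  IN  A  140.82.114.4")
print(rr.ttl, rr.rtype, rr.data)   # 60 A 140.82.114.4

mx = parse_rr("example.com.  3600  IN  MX  10 mail.example.com.")
print(mx.data)                     # 10 mail.example.com.
```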

dns_record_types.sh (Bash)

# Let's query specific DNS record types for real domains.
# The syntax is: dig <domain> <RECORD_TYPE>

# 1. A Record — get the IPv4 address for github.com
dig github.com A +short
# Output: 140.82.114.4

# 2. MX Record — find where GitHub's email servers are
dig github.com MX +short
# Output: 1 aspmx.l.google.com.  (GitHub uses Google Workspace for email)

# 3. CNAME Record — www is often a CNAME alias
dig www.github.com CNAME +short
# Output: github.com.  (www.github.com is just an alias for github.com)

# 4. TXT Records — see domain verification and security records
dig github.com TXT +short
# Output includes SPF record: "v=spf1 ip4:192.30.252.0/22 include:_netblocks.google.com ~all"

# 5. NS Records — which nameservers are authoritative for github.com?
dig github.com NS +short
# Output:
# ns-1283.awsdns-32.org.   <-- GitHub uses Amazon Route 53
# ns-1707.awsdns-21.co.uk.
# ns-421.awsdns-52.com.
# ns-520.awsdns-01.net.

# 6. Check TTL: how long is this record cached? (look at the number just before IN)
dig github.com A
# In the ANSWER SECTION you'll see something like:
# github.com.  60  IN  A  140.82.114.4
#              ^^--- TTL of 60 seconds: short, so changes reach users quickly
# (Via a recursive resolver this is the time LEFT in its cache, so it can be lower than 60.)

Output

# A Record:
140.82.114.4

# MX Record:
1 aspmx.l.google.com.

# CNAME:
github.com.

# TXT (excerpt):
"v=spf1 ip4:192.30.252.0/22 include:_netblocks.google.com ~all"

# NS Records:
ns-1283.awsdns-32.org.
ns-1707.awsdns-21.co.uk.
ns-421.awsdns-52.com.
ns-520.awsdns-01.net.

# Full A record with TTL:
;; ANSWER SECTION:
github.com. 60 IN A 140.82.114.4

⚠ Watch Out: CNAME Records Can't Live at the Root Domain

You cannot create a CNAME for 'thecodeforge.io' itself (called the apex or naked domain) — only for subdomains like 'www.thecodeforge.io'. This is a DNS spec limitation. Cloudflare and Route 53 solve this with proprietary 'CNAME flattening' or 'ALIAS' records. If your root domain is broken but www works, this is often why.
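The rule behind this is simple enough to check mechanically. A name that has a CNAME may have no other records, and the apex always carries SOA and NS. The sketch below runs that check over a list of `(name, type)` pairs. `find_cname_conflicts` is hypothetical. It exempts the DNSSEC record types RRSIG and NSEC, which are allowed alongside a CNAME.

```python
from collections import defaultdict

# DNSSEC signature/denial records are the only types permitted next to a CNAME
ALLOWED_WITH_CNAME = {"CNAME", "RRSIG", "NSEC"}

def find_cname_conflicts(records):
    """Return {name: sorted conflicting types} for names where a CNAME shares the name."""
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return {
        name: sorted(types - ALLOWED_WITH_CNAME)
        for name, types in types_by_name.items()
        if "CNAME" in types and types - ALLOWED_WITH_CNAME
    }

zone = [
    ("thecodeforge.io", "SOA"),
    ("thecodeforge.io", "NS"),
    ("thecodeforge.io", "CNAME"),      # the forbidden apex CNAME
    ("www.thecodeforge.io", "CNAME"),  # fine: www has nothing else
]
print(find_cname_conflicts(zone))  # {'thecodeforge.io': ['NS', 'SOA']}
```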

📊 Production Insight

Missing MX records silently kill email delivery for days.

A CNAME at the apex will be rejected by most providers.

At the root domain, use A/AAAA records (or your provider's ALIAS/CNAME-flattening feature), and use CNAME only for subdomains.

🎯 Key Takeaway

A record for IP, MX for email, CNAME for aliases.

TXT for verification, NS for authority.

CNAME cannot coexist with other records at the same name.

DNS Caching, TTL and Why Your Changes Seem to Take Forever

You've just launched your new website. You updated the DNS records. You refresh the browser. Still showing the old site. You check the DNS settings — everything looks right. What's happening? The answer is caching, and understanding it will save you hours of frustration.

Every DNS resolver along the lookup chain caches the answer it receives for up to as long as the TTL (Time To Live) says. If your A record has a TTL of 86400 (24 hours), every resolver that looked up your domain in the past 24 hours won't check again until that timer runs out. Even if you change the IP address at the Authoritative Nameserver, resolvers all over the world keep serving the old answer from their caches until then.

This is called 'DNS propagation' — and it's not actually propagation in the way people imagine (records pushing outward). It's really just the world's cached copies expiring and fetching fresh answers at different times. Different users around the world will see the new records at different times depending on when their local resolver's cache expires.

The professional move: before making a big DNS change (like moving a site to a new host), lower your TTL to 300 seconds (5 minutes) at least one old-TTL period in advance. For a 24-hour TTL, that means a day or two. Once traffic is migrated successfully, raise it back to 3600 or 86400 for better performance.

dns_lookup_with_python.py (Python)

# Python's built-in 'socket' library can perform basic DNS lookups.
# For richer DNS querying, we'll also use 'dnspython' (2.0+ for resolve()); install it with:
# pip install dnspython

import socket
import dns.resolver  # from the dnspython library

# ─────────────────────────────────────────────
# METHOD 1: Basic lookup using Python's socket module
# This asks your OS's configured DNS resolver for an IP address
# ─────────────────────────────────────────────
domain_name = "github.com"

# getaddrinfo returns a list of (family, type, proto, canonname, sockaddr) tuples.
# Restricting to TCP avoids one duplicate entry per socket type for the same IP.
results = socket.getaddrinfo(domain_name, 80, proto=socket.IPPROTO_TCP)  # port 80 = HTTP

print(f"IP addresses for {domain_name}:")
seen = set()
for result in results:
    ip_address = result[4][0]  # sockaddr is a tuple; index 0 is the IP
    if ip_address not in seen:  # de-duplicate while keeping the OS's order
        seen.add(ip_address)
        print(f"  → {ip_address}")

print()

# ─────────────────────────────────────────────
# METHOD 2: Querying specific record types with dnspython
# This gives us full control, like running 'dig' from Python
# ─────────────────────────────────────────────

# Query the A record (IPv4 address)
print(f"A Records for {domain_name}:")
a_records = dns.resolver.resolve(domain_name, 'A')
for record in a_records:
    print(f"  IP Address : {record.address}")
# The TTL belongs to the whole record set. From a recursive resolver it is the
# time remaining in that resolver's cache, so it can be lower than the zone's TTL.
print(f"  TTL        : {a_records.rrset.ttl} seconds")

print()

# Query the MX record (mail servers)
print(f"MX Records for {domain_name}:")
mx_records = dns.resolver.resolve(domain_name, 'MX')
for record in mx_records:
    # MX records have a 'preference' (priority): lower number = higher priority
    print(f"  Mail Server : {record.exchange}  (Priority: {record.preference})")

print()

# Query TXT records (often used for domain verification)
print(f"TXT Records for {domain_name}:")
txt_records = dns.resolver.resolve(domain_name, 'TXT')
for record in txt_records:
    # A TXT record is a sequence of byte strings; join and decode for display
    decoded_text = b"".join(record.strings).decode('utf-8')
    print(f"  TXT : {decoded_text[:80]}...")  # Truncate long records for display

Output

IP addresses for github.com:
  → 140.82.114.4

A Records for github.com:
  IP Address : 140.82.114.4
  TTL        : 60 seconds

MX Records for github.com:
  Mail Server : aspmx.l.google.com.  (Priority: 1)
  Mail Server : alt1.aspmx.l.google.com.  (Priority: 5)
  Mail Server : alt2.aspmx.l.google.com.  (Priority: 5)

TXT Records for github.com:
  TXT : v=spf1 ip4:192.30.252.0/22 include:_netblocks.google.com ~all...
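The preference numbers in that MX output decide the order in which a sending mail server tries the hosts. Lower preferences go first, and the SMTP specification (RFC 5321) has senders randomize the order among hosts that share a preference. The sketch below applies that ordering to the MX data shown above. `order_mx_hosts` is a hypothetical helper, not part of dnspython.

```python
import random
from itertools import groupby

def order_mx_hosts(mx_records, rng=random):
    """Order (preference, host) pairs lowest preference first, shuffling ties."""
    ordered = []
    by_pref = sorted(mx_records, key=lambda rec: rec[0])
    for _, group in groupby(by_pref, key=lambda rec: rec[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # equal preference: spread the load randomly
        ordered.extend(hosts)
    return ordered

github_mx = [
    (5, "alt1.aspmx.l.google.com."),
    (1, "aspmx.l.google.com."),
    (5, "alt2.aspmx.l.google.com."),
]
print(order_mx_hosts(github_mx))
```

With dnspython, the pairs would come from `(record.preference, str(record.exchange))` for each MX answer.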

💡Pro Tip: Use a Public DNS Checker to Bypass Your Local Cache

When debugging DNS changes, your own computer's cache may be stale. Use whatsmydns.net or dnschecker.org to see what DNS servers worldwide are currently returning for your domain, without your local cache interfering. This gives you a good picture of how far 'propagation' has progressed, though each tool only samples a limited set of resolvers.

📊 Production Insight

The biggest production DNS mistake? Changing records without reducing TTL first.

Lower TTL to 300s at least 48 hours before a migration.

After migration, verify global propagation with dig @resolver, then raise TTL back.

🎯 Key Takeaway

TTL controls how long caches hold your DNS data.

Low TTL = fast changes, more queries.

High TTL = faster performance, slow rollouts.

DNS in Production: Monitoring, Troubleshooting, and Pitfalls

DNS looks simple on paper — a few records, a cache, done. But in a production environment, DNS failures are notoriously hard to diagnose because they manifest as unrelated symptoms: 'the site is down', 'email isn't sending', 'API calls timeout'. The root cause is often a DNS misconfiguration that has been silently wrong for hours or days.

Here's what you need to monitor and how to troubleshoot the most common production DNS scenarios:

Monitoring
- Track DNS query latency from your application servers. Spikes often indicate resolver overload or network issues.
- Set up synthetic checks that resolve your own domains from multiple geographic locations. If any region fails, you'll know before users do.
- Watch for NXDOMAIN (non-existent domain) responses. A sudden increase often means a client typo or a misconfigured CNAME.

Troubleshooting flow
1. Start with 'dig @your-resolver yourdomain.com A'. Check the ANSWER SECTION.
2. If you get a response but it's wrong, check the authoritative NS: 'dig @your-ns yourdomain.com A'.
3. If authoritative is correct but public resolvers show old data, you're waiting out TTL.
4. If no response at all, check firewall rules (UDP/53 and TCP/53) and that your resolver IP is reachable.
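The troubleshooting flow boils down to comparing three things: what a public resolver returns, what your authoritative server returns, and what you expect. The small function below captures that decision logic. `diagnose` is hypothetical and takes answers you have already collected (for example with dig), so it runs without a network. Each answer is a set of IPs, so the order of multiple A records doesn't matter, and `None` stands for "no response".

```python
def diagnose(public, authoritative, expected):
    """Map the troubleshooting flow onto answers you have already collected.

    Each argument is a set of IP strings; pass None when a query got no response.
    """
    if public is None or authoritative is None:
        return "no response: check UDP/TCP 53 firewall rules and resolver reachability"
    if public == expected:
        return "ok: public resolvers already return the expected records"
    if authoritative != expected:
        return "fix the zone: the authoritative server itself has the wrong data"
    return "wait: authoritative is correct, resolvers are serving a cached copy until TTL expiry"

print(diagnose({"198.51.100.7"}, {"203.0.113.10"}, {"203.0.113.10"}))
```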

Common pitfalls in production
- Using the same TTL for all records without planning for changes.
- Putting a CNAME at the root domain. Many providers now throw an error, but some silently break.
- Not having a fallback resolver. If your primary resolver (like your ISP's) goes down, your app goes down. Configure a secondary.
- Forgetting that DNSSEC can break resolution if misconfigured. Always test with 'delv' before enabling.

dns_troubleshooting_script.sh (Bash)

#!/usr/bin/env bash
# Quick DNS health check script
# Run this to verify your domain's records from multiple angles

DOMAIN="yourdomain.com"
EXPECTED_IP="203.0.113.10"

# 1. Query a public resolver to see what users see
echo "=== Public resolver (Google 8.8.8.8) ==="
PUBLIC_IP=$(dig @8.8.8.8 "$DOMAIN" A +short)
echo "Resolved IP: $PUBLIC_IP"

# +short prints one line per record, so check that the expected IP is among them
if ! grep -qxF "$EXPECTED_IP" <<< "$PUBLIC_IP"; then
  echo "WARNING: Public resolver does not match expected IP!"
fi

# 2. Query the authoritative nameserver directly (bypasses every cache)
echo ""
echo "=== Authoritative nameserver ==="
NS=$(dig "$DOMAIN" NS +short | head -1)
dig @"$NS" "$DOMAIN" A +short

# 3. Check MX records for email
echo ""
echo "=== MX Records ==="
dig "$DOMAIN" MX +short

# 4. Check DNS response time
echo ""
echo "=== Response time ==="
time dig @8.8.8.8 "$DOMAIN" A +short > /dev/null

Output

=== Public resolver (Google 8.8.8.8) ===
Resolved IP: 203.0.113.10

=== Authoritative nameserver ===
203.0.113.10

=== MX Records ===
10 mail.yourdomain.com.

=== Response time ===
real 0m0.045s

Mental Model

DNS Is Like a Distributed Directory Service

Your application's reliability depends on a system you don't control.

DNS is someone else's infrastructure — you only control your authoritative NS and your own application's resolver config.
Caching is a feature, not a bug. It makes the internet fast, but it also delays your changes.
Every DNS query is a potential failure point: resolver down, network partition, authoritative server offline.
Treat DNS as critical infrastructure. Monitor it, test it, plan changes around TTLs.

📊 Production Insight

A misconfigured CNAME can silently redirect traffic to a dead host.

Use dig +trace to verify the full chain after any NS record change.

Always have a secondary resolver configured — don't rely on a single one.

🎯 Key Takeaway

Production DNS needs monitoring and testing.

TTL management is essential for zero-downtime migrations.

When DNS breaks, it looks like everything is down — troubleshoot methodically.

Zone Transfers: The Silent Backdoor Nobody Locks

You think DNS is just lookups. It's also replication. Zone transfers are how secondary DNS servers get the full record set from a primary. AXFR transfers the entire zone; IXFR (incremental transfer) sends only the changes. Both are terrifying if left open. Attackers don't brute force DNS. They ask nicely. A misconfigured primary server will happily dump every A, MX and TXT record for your entire domain. That's a network diagram handed to a stranger. Protect zone transfers. Restrict them by IP. Use TSIG signatures for authentication. No exceptions. If a junior on your team says "it's internal only," audit it yourself. I've seen an internal breach spread to prod in 12 minutes because someone opened AXFR to 0.0.0.0/0. DNS replication is powerful. Lock it down.
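Both defences come down to a check the primary makes before sending the zone. The toy Python model below makes that decision explicit. The allow-list mirrors the BIND `allow-transfer` ACL shown at the end of the script that follows. The HMAC check stands in for TSIG, which signs DNS messages with a shared secret. This is deliberately not the TSIG wire format, since RFC 8945 defines exactly which fields get signed. `transfer_allowed`, the key and the request bytes are all illustrative.

```python
import hashlib
import hmac
import ipaddress

# Secondaries allowed to pull the zone (mirrors a BIND allow-transfer ACL)
ALLOWED_SECONDARIES = [ipaddress.ip_network("192.168.1.2/32"),
                       ipaddress.ip_network("10.0.0.3/32")]
SHARED_KEY = b"example-tsig-secret"  # hypothetical secret shared with the secondaries

def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """HMAC-SHA256 over the request, standing in for a TSIG signature."""
    return hmac.new(key, message, hashlib.sha256).digest()

def transfer_allowed(client_ip: str, message: bytes, signature: bytes) -> bool:
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in net for net in ALLOWED_SECONDARIES):
        return False  # source address not on the allow-list
    # compare_digest avoids leaking how many bytes matched via timing
    return hmac.compare_digest(sign(message), signature)

request = b"AXFR thecodeforge.io"
print(transfer_allowed("10.0.0.3", request, sign(request)))      # True
print(transfer_allowed("203.0.113.50", request, sign(request)))  # False: not on the list
print(transfer_allowed("10.0.0.3", request, b"forged"))          # False: bad signature
```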

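check-axfr.sh (Bash)

#!/bin/bash
# io.thecodeforge - quick zone transfer audit
DOMAIN="thecodeforge.io"
# Attempt AXFR against each of the domain's authoritative nameservers
for ns in $(dig NS "$DOMAIN" +short); do
    echo "Testing $ns for open zone transfer..."
    # dig's exit status only says whether a reply arrived, not whether the transfer
    # succeeded, so look for the zone's SOA record (a successful AXFR starts with it)
    if dig AXFR "$DOMAIN" @"$ns" +noall +answer 2>/dev/null | grep -q "[[:space:]]SOA[[:space:]]"; then
        echo "[!] VULNERABLE: $ns allows zone transfers"
    fi
done
# Production check: BIND ACL example
# allow-transfer { 192.168.1.2; 10.0.0.3; };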

Output

Testing ns1.thecodeforge.io for open zone transfer...
[!] VULNERABLE: ns1.thecodeforge.io allows zone transfers

⚠ Production Trap:

Never allow inbound AXFR from the public internet. Use TSIG or restrict to authorized secondaries only. One open transfer = your entire DNS map exposed.

🎯 Key Takeaway

Zone transfers leak your entire network topology if unauthenticated. Restrict by IP and sign with TSIG.

DNS Resolution: Not Magic, Just Delegation

When you type a domain, nothing magical happens. It's a chain of delegation. Your browser asks a resolver (usually your ISP or a public one like 8.8.8.8). Assuming nothing is cached, the resolver doesn't know the answer. It asks the root servers. The root doesn't know either. It points to the TLD server for .com. The TLD server points to your authoritative nameserver. That finally returns the IP. Each server only knows who to ask next. That's the whole trick. No single server knows everything. This layering prevents bottlenecks and global failure. But it also means every hop can fail or lie. Resolvers cache aggressively. Authoritative servers can serve stale data. The hierarchy exists for scaling, not for performance. Understand it. Debug it. Use dig +trace to see each step live. When DNS breaks, it's usually a broken delegation, a bad record or a stale (occasionally poisoned) cache. Not a cosmic ray.
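The order of those questions follows the name's labels, read right to left. The short sketch below lists the zones a resolver walks past for a given name. `delegation_path` is a hypothetical helper, and it only shows candidate zone cuts. In real DNS a zone cut need not exist at every label, since one zone can serve `www.example.com` directly. Finding out where the cuts actually are is exactly what the NS referrals tell the resolver.

```python
def delegation_path(name):
    """List candidate zones from the root down to the full name."""
    labels = name.rstrip(".").split(".")
    path = ["."]
    # Walk labels right to left: "io.", then "thecodeforge.io.", then the full name
    for i in range(len(labels) - 1, -1, -1):
        path.append(".".join(labels[i:]) + ".")
    return path

print(delegation_path("www.thecodeforge.io"))
# ['.', 'io.', 'thecodeforge.io.', 'www.thecodeforge.io.']
```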

trace-dns.sh (Bash)

#!/bin/bash
# io.thecodeforge - trace each delegation step manually
domain="thecodeforge.io"
echo "Tracing DNS delegation for $domain..."

# Ask a server without recursion and pull the first NS name out of the
# referral it returns in the AUTHORITY section
next_ns() {
    dig @"$1" "$domain" A +norecurse +noall +authority | awk '$4 == "NS" {print $5; exit}'
}

# Step 1: Root server a.root-servers.net (198.41.0.4) refers us to the .io TLD servers
TLD_NS=$(next_ns 198.41.0.4)
echo "-> Root gave us TLD servers"
# Step 2: The .io TLD server (e.g. a0.nic.io) refers us to the authoritative servers
AUTH_NS=$(next_ns "$TLD_NS")
echo "-> TLD gave us Authoritative servers"
# Step 3: Ask the authoritative server for the final answer
dig @"$AUTH_NS" "$domain" A +norecurse +short
echo "-> Authoritative returned final IP"
# Production debugging:
# dig +trace $domain  # one-liner for the same effect

Output

Tracing DNS delegation for thecodeforge.io...
-> Root gave us TLD servers
-> TLD gave us Authoritative servers
198.51.100.42
-> Authoritative returned final IP

🔥Debug Fast:

Use dig +trace example.com to see every delegation step in real time. Pinpoints broken delegations instantly.

🎯 Key Takeaway

DNS resolution is delegation, not knowledge. Each server points down the hierarchy.

DNS over HTTPS and DNS over TLS: Modern Privacy

Traditional DNS queries are sent in plaintext over UDP or TCP, meaning anyone on the network path (your ISP, a Wi-Fi hotspot operator, or an attacker) can see which domains you're visiting. DNS over HTTPS (DoH) and DNS over TLS (DoT) encrypt these queries to protect user privacy and integrity. DoT uses a dedicated port (853) and TLS encryption, while DoH wraps DNS queries in HTTPS traffic on port 443, making them hard to distinguish from regular web traffic.

For example, configuring your browser to use Cloudflare's 1.1.1.1 over DoH encrypts your lookups between the browser and Cloudflare's resolver. The resolver's own queries to authoritative servers are still ordinary DNS. In practice, DoH is easier to deploy in environments where port 853 is blocked, but DoT is simpler to implement at the system level. Both protocols stop on-path eavesdropping and tampering between you and your resolver, though they shift trust to the resolver provider; validating the records themselves is DNSSEC's job.

As of 2025, major browsers and operating systems support DoH/DoT, but adoption is not universal due to concerns about centralization and enterprise control. For system administrators, enabling DoH on recursive resolvers like Unbound or using a forwarder that supports DoT can enhance privacy for all clients on a network.
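Under the hood, DoH carries the same binary DNS message as classic DNS. RFC 8484 sends it either as the body of a POST or, for GET, base64url-encoded without padding in a `dns` query parameter. The sketch below builds a minimal A query for www.example.com by hand and encodes it for a GET. `build_dns_query` and `doh_get_parameter` are hypothetical helpers. The query ID is 0, which RFC 8484 recommends so that responses stay HTTP-cache friendly. Real clients would use a library such as dnspython instead.

```python
import base64
import struct

def build_dns_query(name, qtype=1, query_id=0):
    """Encode a minimal DNS query message in wire format (RFC 1035 layout)."""
    # Header: ID, flags (0x0100 = RD, "recursion desired"), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!6H", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label as <length byte><ASCII bytes>, ended by the zero-length root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A) and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)

def doh_get_parameter(message):
    """base64url-encode a DNS message without padding, for a DoH GET's ?dns= parameter."""
    return base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")

query = build_dns_query("www.example.com")
print(len(query), "bytes")
print("?dns=" + doh_get_parameter(query))
```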

unbound-doh.confINI

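server:
    # Serve plain DNS to local clients on port 53 (all IPv4 interfaces)
    interface: 0.0.0.0
    # Only answer clients on the local network (adjust to your subnet)
    access-control: 192.168.0.0/16 allow
    # CA bundle used to verify the upstream's TLS certificate (path varies by distro)
    tls-cert-bundle: /etc/ssl/certs/ca-certificates.crt

# Forward every query to Cloudflare over DNS over TLS (DoT, port 853);
# the name after '#' is checked against the upstream's certificate
forward-zone:
    name: "."
    forward-tls-upstream: yes
    forward-addr: 1.1.1.1@853#cloudflare-dns.com
    forward-addr: 1.0.0.1@853#cloudflare-dns.com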
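
Both protocols carry the same binary DNS message; only the wrapping differs. The Python sketch below builds a minimal query in RFC 1035 wire format, then shows the two wrappings: DoH's GET form, which base64url-encodes the message (with '=' padding stripped) into a `dns` query parameter per RFC 8484, and DoT's framing, which prefixes the message with a two-byte length exactly as DNS over TCP does. It only builds bytes and a URL and sends nothing; the Cloudflare endpoint URL is an example default, and the helper names are hypothetical. The message ID is set to 0, which RFC 8484 suggests for cache-friendly DoH requests.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Encode a minimal DNS query (RFC 1035 wire format); qtype 1 = A."""
    # Header: ID=0, flags=0x0100 (recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label {label!r}")
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # the empty root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def doh_get_url(name: str, endpoint: str = "https://cloudflare-dns.com/dns-query") -> str:
    """DoH GET form: base64url-encoded message with '=' padding removed."""
    encoded = base64.urlsafe_b64encode(build_query(name)).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


def dot_frame(name: str) -> bytes:
    """DoT framing: 2-byte big-endian length, then the message (sent over TLS on 853)."""
    message = build_query(name)
    return struct.pack("!H", len(message)) + message


if __name__ == "__main__":
    print(doh_get_url("www.example.com"))
    print(dot_frame("www.example.com").hex())
```

For 'www.example.com' the encoded query should match the GET example in RFC 8484 (`AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB`), which is a handy sanity check when hand-testing a DoH endpoint.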

🔥Privacy vs. Performance

📊 Production Insight

In enterprise networks, be cautious with DoH: it can bypass corporate DNS filtering policies. Use DNS sinkholes or firewall rules to block DoH endpoints if you need to enforce content filtering. For public resolvers, monitor for any changes in resolver IPs or certificates to avoid outages.

🎯 Key Takeaway

DNS over HTTPS and DNS over TLS encrypt DNS queries to prevent eavesdropping and tampering, with DoH being more firewall-friendly and DoT offering simpler server-side configuration.

DNS Resolution in Kubernetes: CoreDNS Deep-Dive

In Kubernetes, DNS is a critical service for service discovery. CoreDNS has been the default cluster DNS server since Kubernetes 1.13, replacing kube-dns. It runs as a set of pods in the kube-system namespace and serves DNS records for Services and Pods. For example, a Service named 'my-svc' in namespace 'default' is resolvable as 'my-svc.default.svc.cluster.local'. CoreDNS uses a plugin architecture; the 'kubernetes' plugin handles service discovery by watching the Kubernetes API. When a Pod starts, the kubelet writes the Pod's /etc/resolv.conf so that it points at the CoreDNS Service IP (typically 10.96.0.10 on kubeadm clusters). CoreDNS then answers from its cache where it can, serves cluster names (such as *.cluster.local) from the kubernetes plugin, and forwards everything else to the configured upstream resolvers (by default those in the node's /etc/resolv.conf; the example below uses Google's 8.8.8.8). For custom domains, you can add the 'rewrite' plugin to map external names to internal services.

Troubleshooting DNS in Kubernetes often involves checking CoreDNS logs, verifying the Pod's DNS config, and ensuring network policies allow DNS traffic (UDP and TCP port 53). A common pitfall is search-domain expansion causing unexpected latency. Pods default to ndots:5, so any name with fewer than five dots is first tried against every search domain: a query for 'database' expands to 'database.default.svc.cluster.local' and its siblings, and even an external name like 'api.github.com' generates several NXDOMAIN lookups inside the cluster before the absolute name is tried. To optimize, lower the ndots option in the Pod's dnsConfig, or write external names fully qualified with a trailing dot ('api.github.com.').

coredns-custom.yamlYAML

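apiVersion: v1
kind: ConfigMap
metadata:
  # Note: the stock kubeadm CoreDNS Deployment loads the ConfigMap named
  # "coredns" (key "Corefile"); a ConfigMap under any other name has no
  # effect unless the Deployment is changed to mount it.
  name: coredns-custom
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        # Log errors to stdout
        errors
        # Liveness endpoint (:8080/health); keep reporting healthy 5s into shutdown
        health {
            lameduck 5s
        }
        # Readiness endpoint (:8181/ready)
        ready
        # Answer Service/Pod names and reverse lookups from the Kubernetes API
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        # Prometheus metrics
        prometheus :9153
        # Send all other names upstream (the stock config uses /etc/resolv.conf)
        forward . 8.8.8.8 8.8.4.4 {
            max_concurrent 1000
        }
        # Cache answers for up to 30 seconds
        cache 30
        # Detect forwarding loops and halt
        loop
        # Reload the Corefile automatically when it changes
        reload
        # Shuffle the order of A/AAAA/MX records in answers
        loadbalance
    }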
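
The search-domain pitfall described above follows from how the stub resolver inside each Pod builds candidate names. The Python sketch below is a simplified model of glibc's logic (musl and other resolvers differ in the details): a name with at least `ndots` dots is tried as-is first, anything shorter walks the search list first, and a trailing dot disables the search list entirely. The search list is the usual one for a Pod in the 'default' namespace, without any search domains inherited from the node, and the function name is hypothetical.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Fully qualified names a glibc-style stub resolver tries, in order (simplified)."""
    if name.endswith("."):
        return [name]  # absolute name: the search list is skipped
    searched = [f"{name}.{domain}." for domain in search]
    as_is = [f"{name}."]
    if name.count(".") >= ndots:
        return as_is + searched  # "enough" dots: try the name itself first
    return searched + as_is  # too few dots: walk the search list first


# Typical search list for a Pod in the 'default' namespace (node domains omitted)
POD_SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

if __name__ == "__main__":
    for name, ndots in [("database", 5), ("api.github.com", 5),
                        ("api.github.com", 2), ("api.github.com.", 5)]:
        print(f"{name!r} (ndots={ndots}):", candidate_queries(name, POD_SEARCH, ndots))
```

In a real Pod each candidate usually costs both an A and an AAAA query, so under ndots:5 an external name pays for every failed search-list attempt before the real lookup; lowering ndots or adding the trailing dot moves the absolute name to the front.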

⚠ CoreDNS Resource Limits

📊 Production Insight

Always configure CoreDNS with a fallback forwarder (e.g., 8.8.8.8) and set appropriate cache TTLs. For large clusters, consider using NodeLocal DNSCache to reduce latency and CoreDNS load. Regularly update CoreDNS to patch security vulnerabilities.

🎯 Key Takeaway

CoreDNS is the backbone of Kubernetes service discovery, using plugins to resolve cluster-internal names and forward external queries, with caching and health checks for reliability.

DNSSEC: How It Works and Why Adoption Is Slow

DNSSEC (DNS Security Extensions) adds cryptographic signatures to DNS records to ensure authenticity and integrity. It defends against DNS spoofing and cache poisoning by letting resolvers verify that responses come from the zone's owner and haven't been tampered with. The chain of trust starts at the root zone, whose key is configured in validating resolvers as a trust anchor. Each zone signs its records with a Zone Signing Key (ZSK), and its Key Signing Key (KSK) signs the zone's DNSKEY set, which contains the ZSK. The parent zone publishes a DS record holding a hash (digest) of the child's KSK and signs that DS record with its own keys, linking each zone to its parent. When a validating resolver queries a signed zone (setting the DNSSEC OK bit), it receives RRSIG records along with the answer and checks them against the zone's DNSKEY records. For example, an RRSIG on 'example.com' is verified with example.com's DNSKEY, that DNSKEY is matched against the DS record in '.com', and the chain repeats through the .com keys up to the root trust anchor.

Despite its security benefits, DNSSEC adoption remains low: only a small fraction of .com domains are signed as of 2025. Reasons include operational complexity (key management, regular key rollovers) and the risk that a misconfiguration makes the whole domain fail to resolve for validating resolvers. Additionally, DNSSEC does not encrypt queries, so it doesn't provide privacy. Many organizations find the overhead outweighs the benefits, especially with the rise of DoH/DoT, even though those protocols protect only the client-to-resolver hop and do not authenticate the records themselves. However, for high-security domains (e.g., banking, government), DNSSEC is increasingly mandated. Tools like 'dnssec-keygen' and 'dnssec-signzone' (from BIND) help manage signing, and providers like Cloudflare offer one-click DNSSEC.

dnssec-sign.shBASH

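#!/bin/bash
set -euo pipefail
# Generate a ZSK and a KSK (-f KSK sets the SEP flag) for example.com
dnssec-keygen -a ECDSAP256SHA256 example.com
dnssec-keygen -f KSK -a ECDSAP256SHA256 example.com
# Sign the zone: -S (smart signing) picks up the keys above and adds their DNSKEY
# records; -3 - uses NSEC3 with an empty salt, as RFC 9276 recommends;
# -N INCREMENT bumps the SOA serial; -o sets the origin; -t prints statistics.
# Writes example.com.zone.signed next to the input file.
dnssec-signzone -S -3 - -N INCREMENT -o example.com -t example.com.zone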
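
To see what "the DS record holds a digest of the KSK" means in practice, here is a Python sketch of the DS computation from RFC 4034 and RFC 4509: the digest is SHA-256 over the owner name in canonical wire format followed by the DNSKEY RDATA (flags, protocol, algorithm, public key), and the key tag is the RFC 4034 Appendix B checksum of that RDATA. The 64 zero bytes standing in for the public key are a placeholder, not a real key, and the helper names are hypothetical; with real key files, BIND's `dnssec-dsfromkey` produces the DS record for you.

```python
import hashlib
import struct


def owner_wire(name: str) -> bytes:
    """Owner name in canonical (lowercase) DNS wire format."""
    out = b""
    for label in name.rstrip(".").lower().split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: 2-byte flags, 1-byte protocol, 1-byte algorithm, key bytes."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag checksum from RFC 4034, Appendix B."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_record(zone: str, flags: int, algorithm: int, public_key: bytes) -> str:
    """Build a DS record (digest type 2 = SHA-256) for a zone's KSK."""
    rdata = dnskey_rdata(flags, 3, algorithm, public_key)  # protocol is always 3
    digest = hashlib.sha256(owner_wire(zone) + rdata).hexdigest().upper()
    return f"{zone} IN DS {key_tag(rdata)} {algorithm} 2 {digest}"


if __name__ == "__main__":
    # Placeholder KSK: flags 257 (zone key + SEP), algorithm 13 (ECDSAP256SHA256)
    fake_key = bytes(64)  # NOT a real public key
    print(ds_record("example.com.", 257, 13, fake_key))
```

Because the parent stores only this digest, rolling the KSK means publishing a new DS record at the registrar, which is one reason KSK rollovers are the step teams most often get wrong.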

💡Automate Key Rollovers

📊 Production Insight

Before enabling DNSSEC, ensure your DNS provider supports it and test with tools like 'dnsviz.net'. Monitor for validation failures, which can cause legitimate domains to become unreachable. Consider DNSSEC only for critical domains where authenticity is paramount.

🎯 Key Takeaway

DNSSEC provides cryptographic verification of DNS data, preventing spoofing, but adoption is hindered by operational complexity and lack of privacy protection.

● Production incident · POST-MORTEM · severity: high

The 48-Hour Email Blackout From a Single MX Record Change

Symptom

After updating MX records at the authoritative nameserver, about half of incoming email continued arriving at the old mail server, while the other half went to the new one. Users reported missing messages and bouncebacks.

Assumption

The team assumed DNS changes take effect 'within a few hours' and that restarting the mail servers would force the resolver to re-query.

Root cause

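The MX record had a TTL of 86400 seconds (24 hours), and many ISP and corporate resolvers cache records for the full TTL. Sending mail servers whose resolvers had cached the old MX answer kept delivering to the old mail server, in some cases for up to 48 hours. The nameserver switch mentioned in the fix likely added a second layer of caching: resolvers that still held the old NS delegation kept querying the old provider, which still served the old MX. Restarting the team's own mail servers could not clear caches held by other organizations' resolvers.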

Fix

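They lowered the old MX record's TTL to 300 seconds (5 minutes) at the old provider, but a lower TTL only applies to copies fetched after the change. Resolvers that had already cached the record kept it for the original 86400 seconds, and because the domain was already pointed at the new nameservers, the team had to wait out the original TTL. The lesson: always reduce the TTL to 300 seconds at least 48 hours (and never less than one full old TTL) before any record change.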
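
The timing rule in the fix is simple arithmetic, sketched below in Python under the assumption that every resolver honours the TTL exactly (some clamp or extend it, and nameserver-delegation TTLs at the parent add their own delay; neither is modelled). Lowering the TTL only shortens copies fetched afterwards, so the record must not be edited until one full old TTL has passed; after that, stale copies survive for at most the new TTL. The function name and dates are illustrative.

```python
from datetime import datetime, timedelta


def plan_record_change(ttl_lowered_at: datetime, old_ttl_s: int, new_ttl_s: int):
    """Return (earliest safe time to change the record, latest time a stale copy survives).

    Assumes every resolver honours TTLs exactly; real resolvers may not.
    """
    # A resolver may have fetched the record just before the TTL was lowered,
    # so its copy still lives for the full OLD TTL.
    safe_change_at = ttl_lowered_at + timedelta(seconds=old_ttl_s)
    # Changing at that moment: every cached copy has at most the NEW TTL left.
    stale_until = safe_change_at + timedelta(seconds=new_ttl_s)
    return safe_change_at, stale_until


if __name__ == "__main__":
    lowered = datetime(2026, 3, 1, 9, 0)
    safe, stale = plan_record_change(lowered, old_ttl_s=86400, new_ttl_s=300)
    print("Change the MX record no earlier than:", safe)  # 2026-03-02 09:00:00
    print("Old answers expire everywhere by:", stale)     # 2026-03-02 09:05:00

    # Without the preparation step, the old 24-hour TTL applies to the change itself.
    print("Unprepared change, stale until:", lowered + timedelta(seconds=86400))
```

The trade-off is a day of waiting before the change in exchange for a five-minute exposure window after it, instead of a full day of split delivery.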

Production debug guide: when the network isn't resolving names as expected, follow these steps to isolate the problem.

Symptom · 01

Website loads in some regions but not others

→

Fix

Check DNS propagation with a global checker (whatsmydns.net). If still mixed, wait out the TTL. Use dig @8.8.8.8 to bypass local caches.

Symptom · 02

dig returns an IP, but curl fails

→

Fix

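Compare the IP from dig with the IP curl actually connects to (curl -v prints it). dig queries DNS directly, while curl goes through the system resolver, which also consults /etc/hosts and local caches. If they don't match, check /etc/hosts for an override and flush your local DNS cache. If they do match, the problem lies past DNS (firewall, TLS, or the server itself).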

Symptom · 03

Email bounces with 'domain not found'

→

Fix

Query the MX record with dig MX yourdomain.com and make sure each MX target hostname resolves to an A or AAAA record (MX targets must not be CNAMEs). Check SPF/TXT records for email authentication.

Symptom · 04

nslookup times out

→

Fix

Find the configured nameservers (on Linux, in /etc/resolv.conf) and verify they are reachable: nc -zv <server> 53 checks TCP/53, but most queries use UDP/53, so also check firewall rules for UDP/53. Then try a public resolver like 8.8.8.8.

★ Quick DNS Debug Cheat Sheet: When DNS breaks in production, these commands cut through the noise. Run them in order.

Propagation not complete

Immediate action

Query a public resolver directly to bypass local cache

Commands

dig @8.8.8.8 yourdomain.com A

dig @1.1.1.1 yourdomain.com A

Fix now

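Compare both outputs, and compare them with your authoritative nameserver's answer (dig @ns1.yourdomain.com yourdomain.com A). If the public resolvers still return the old value, their cached copy hasn't expired yet; you cannot force other resolvers' caches to expire, so wait out the TTL.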

Wrong A record returned

Email delivery failing

DNSSEC validation failure

DNS Record Types at a Glance

DNS Record Type	What It Maps	Real-World Use Case	Can Be at Root Domain?
A Record	Domain → IPv4 address	Point thecodeforge.io to your web server's IP	Yes
AAAA Record	Domain → IPv6 address	Same as A record but for IPv6-enabled servers	Yes
CNAME Record	Domain → Another domain name	www.thecodeforge.io → thecodeforge.io	No — subdomains only
MX Record	Domain → Mail server hostname	Route emails sent to @thecodeforge.io to Google Workspace	Yes
TXT Record	Domain → Arbitrary text string	SPF email security, proving domain ownership to Google/GitHub	Yes
NS Record	Domain → Authoritative nameserver	Tell the internet which DNS servers manage your domain	Yes
TTL (on any record)	Cache lifetime in seconds	300s before migrations, 86400s for stable production records	N/A

⚙ Quick Reference

10 commands from this guide

File	Command / Code	Purpose
dns_basic_lookup.sh	nslookup google.com	IP Addresses
dns_full_trace.sh	dig thecodeforge.io +trace	The DNS Hierarchy
dns_record_types.sh	dig github.com A +short	DNS Records Demystified
dns_lookup_with_python.py	domain_name = "github.com"	DNS Caching, TTL and Why Your Changes Seem to Take Forever
dns_troubleshooting_script.sh	DOMAIN="yourdomain.com"	DNS in Production
check-axfr.sh	DOMAIN="thecodeforge.io"	Zone Transfers
trace-dns.sh	domain="thecodeforge.io"	DNS Resolution
unbound-doh.conf	server:	DNS over HTTPS and DNS over TLS
coredns-custom.yaml	apiVersion: v1	DNS Resolution in Kubernetes
dnssec-sign.sh	dnssec-keygen -a ECDSAP256SHA256 example.com	DNSSEC

Key takeaways

DNS is a distributed, hierarchical database, not a single server. Four server types collaborate on every uncached lookup: Recursive Resolver → Root Nameserver → TLD Nameserver → Authoritative Nameserver.

TTL controls caching. A low TTL (300s) means faster propagation of changes but more DNS queries. A high TTL (86400s) means fewer lookups and faster cached responses for users, but slow rollouts of DNS changes. Tune it intentionally, especially before migrations.

CNAME records create aliases between domain names, but they can only be used on subdomains, never at the root/apex domain (e.g., you can CNAME 'www.site.com' but not 'site.com' itself).

Your computer has its own DNS cache, your router has one, your ISP's resolver has one, and public resolvers have their own. When debugging DNS issues, test with a specific public resolver (dig @8.8.8.8) to bypass your local caches, and query your authoritative nameserver directly (dig @ns1.yourdomain.com yourdomain.com) to get the true current answer.

Production DNS failures often look like app failures. Monitor DNS latency, set up synthetic checks from multiple regions, and always have a secondary resolver configured.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Walk me through exactly what happens — at the DNS level — from the momen...

Q02 · JUNIOR

What is the difference between an A record and a CNAME record, and when ...

Q03 · SENIOR

What is DNS TTL and why does it matter in a production deployment? If yo...

Q01 of 03 · SENIOR

Walk me through exactly what happens — at the DNS level — from the moment I type 'google.com' in my browser and hit Enter to when the page starts loading.

ANSWER

Your browser first checks its local DNS cache. If not found, it asks the operating system's cache, then the configured recursive resolver (usually your ISP's or 8.8.8.8). The resolver, if it doesn't have it cached, starts a recursive query: it contacts a root nameserver (e.g., a.root-servers.net) which returns a referral to the .com TLD nameserver. The resolver then queries the .com TLD, which returns the authoritative nameserver for google.com. Finally, the resolver queries that authoritative server for the A or AAAA record, gets the IP address (e.g., 142.250.80.46), caches it according to TTL, and returns it to your browser. Your browser then opens a TCP connection to that IP, performs TLS handshake if HTTPS, and begins fetching the page. Total time for uncached lookup: 20-120ms. Subsequent visits: microseconds due to caching.


Naren · Founder & Principal Engineer

20+ years shipping production systems from the metal up. Written from production experience, not tutorials.

✓ Verified · production tested

Last updated: July 18, 2026

2,466 articles · all by Naren
