The ip command replaces ifconfig for interface and route management
ss replaces netstat for socket statistics and is dramatically faster on busy servers
dig tests the DNS layer; curl -w reports cumulative timing through the DNS, TCP, and TLS phases
tcpdump requires filters: always specify port or host to avoid flooding disk
ping and traceroute can mislead when ICMP is blocked — verify with TCP probes
Plain-English First
Imagine your computer is a post office. Networking commands are the tools the postmaster uses to check which delivery routes are open, which packages are stuck, and whether the roads between buildings are working. Just like a postmaster can trace a lost parcel or see which trucks are currently on the road, Linux networking commands let you trace packets, spot blocked ports, and see exactly which processes are talking to the outside world.
Every production outage has a moment, usually at 2am, where someone types a networking command into a terminal and either finds the problem in 30 seconds or spends three hours guessing. Linux networking tools are what separate a DevOps engineer who can diagnose a flaky microservice from one who just restarts containers and hopes for the best. These commands are your stethoscope for the network layer.
ip vs ifconfig — Why the Old Tool Is Dead and What Replaced It
For years, ifconfig was the go-to command for inspecting network interfaces. It still works on many systems, but it's been deprecated and is no longer installed by default on modern Linux distributions like Ubuntu 20.04+ and RHEL 8+. The replacement is the ip command, which is part of the iproute2 package and talks to the kernel over a netlink socket instead of relying on legacy ioctl calls and /proc files.
The key difference isn't just syntax — it's capability. ip can manage routing tables, network namespaces, tunnels, and ARP/NDP caches all through one unified tool. ifconfig could only touch interfaces and basic IP configuration.
When you're debugging a container networking issue in Kubernetes, you'll often enter a pod's network namespace and run ip addr to see what the container thinks its IP is. ifconfig has no namespace support of its own: it cannot create, list, or enter namespaces, whereas ip netns handles all three. Know ip deeply and you'll be comfortable anywhere from a bare-metal server to a Docker container.
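One way to look inside a container's namespace from the host is nsenter, as in the minimal sketch below. It assumes nsenter (from util-linux) is installed and that you already know the PID of a process running inside the container; netns_addrs is a hypothetical helper name, not a standard command.

```shell
# netns_addrs PID
# Print interface addresses as seen from inside the network namespace
# of the given process. Entering another namespace usually needs root.
netns_addrs() {
    if [ -z "$1" ]; then
        echo "usage: netns_addrs PID" >&2
        return 2
    fi
    # -t PID picks the target process, -n enters only its network namespace
    nsenter -t "$1" -n ip -brief addr show
}

# Example (12345 is a placeholder for a PID inside the container):
#   netns_addrs 12345
```

Because only the network namespace is entered, the ip binary comes from the host, so this works even when the container image ships without iproute2.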
network_interface_inspection.sh (Bash)
#!/bin/bash
# Shows practical ip command usage for interface inspection
# Run these on any modern Linux host (commands that change state need root)

# List all network interfaces with their IP addresses
# The 'addr' subcommand shows Layer 3 (IP) info bound to each interface
ip addr show

# Filter to just one interface; useful when you have many (eth0, lo, docker0, etc.)
ip addr show dev eth0

# Show the routing table: which gateway handles which destination networks
# This is the first thing to check when packets aren't leaving the host
ip route show

# Show the default gateway specifically, the 'exit door' for all unknown traffic
ip route show default

# Add a temporary static route for a specific subnet via a specific gateway
# This lasts only until reboot; use /etc/netplan or /etc/network/interfaces for persistence
ip route add 192.168.50.0/24 via 10.0.0.1 dev eth0

# Bring an interface DOWN and back UP without rebooting
# Useful after changing IP config or when an interface is in a bad state
# Caution: if your SSH session runs over eth0, issue both commands on one line or you will lock yourself out
ip link set eth0 down
ip link set eth0 up

# Check the ARP cache, which maps IP addresses to MAC addresses on your local network
# If an entry shows INCOMPLETE or FAILED, ARP resolution is failing; suspect a firewall or VLAN issue
ip neigh show
Output
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
inet 10.0.1.15/24 brd 10.0.1.255 scope global dynamic eth0
# ip route show default
default via 10.0.1.1 dev eth0 proto dhcp src 10.0.1.15 metric 100
# ip neigh show
10.0.1.1 dev eth0 lladdr 52:54:00:11:22:33 REACHABLE
10.0.1.20 dev eth0 lladdr 52:54:00:aa:bb:cc STALE
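On a busy subnet the neighbour table can run to hundreds of lines, so it helps to filter for the entries that failed to resolve. The sketch below reads ip neigh show output on stdin and assumes the state is the last field on each line, which is how iproute2 prints it; unresolved_neighbours is a hypothetical helper name.

```shell
# unresolved_neighbours
# Read `ip neigh show` output on stdin and print "ADDRESS STATE" for every
# entry whose state is FAILED or INCOMPLETE (ARP/NDP did not resolve).
unresolved_neighbours() {
    awk '$NF == "FAILED" || $NF == "INCOMPLETE" { print $1, $NF }'
}

# Typical use:
#   ip neigh show | unresolved_neighbours
```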
Watch Out: ip and ifconfig Changes Don't Stick
Any interface change made with ip (or ifconfig) is wiped on reboot. To persist changes, use your distro's config files — netplan YAML files on Ubuntu, nmcli/NetworkManager on RHEL, or /etc/network/interfaces on Debian. Making changes with ip and forgetting to persist them is a classic 'works until the server reboots' trap.
Production Insight
In one incident, a team added a secondary IP with 'ip addr add' during a migration and forgot to persist it.
When the server rebooted for a kernel update, the IP vanished and the application broke.
Rule: always configure persistent network settings in distro-specific files, not ad-hoc ip commands.
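A rough way to catch this before the next reboot is to compare the addresses the kernel currently holds against your config files. The sketch below is a heuristic under stated assumptions: it only makes sense for statically configured addresses (DHCP leases never appear in config files), it does a plain text search rather than parsing netplan or NetworkManager formats, and unpersisted_addrs is a hypothetical helper name.

```shell
# unpersisted_addrs CONFIG_PATH...
# Read `ip -o -4 addr show` output on stdin and print every IPv4 address
# that is not mentioned anywhere under the given config files/directories.
unpersisted_addrs() {
    if [ "$#" -eq 0 ]; then
        echo "usage: unpersisted_addrs CONFIG_PATH..." >&2
        return 2
    fi
    # Fields in `ip -o` output: index, interface, family, address/prefix, rest
    while read -r _idx _ifname _family cidr _rest; do
        addr=${cidr%/*}                       # strip the /prefix length
        [ "$addr" = "127.0.0.1" ] && continue # loopback is never persisted
        # -F fixed string, -w whole word, so 10.0.1.5 does not match 10.0.1.50
        if ! grep -rqwF -- "$addr" "$@" 2>/dev/null; then
            echo "$addr"
        fi
    done
}

# Typical use on an Ubuntu host (paths are examples):
#   ip -o -4 addr show | unpersisted_addrs /etc/netplan
```

Any address it prints is a candidate for the 'works until the server reboots' trap and deserves a second look.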
Key Takeaway
ip is the single command you need for Layer 2-3 inspection.
ifconfig is dead. Don't learn it.
ip addr, ip route, ip link, ip neigh — these four subcommands cover 90% of daily needs.
When to use ip vs ifconfig
If: You need to inspect or modify the ARP/NDP cache → Use: 'ip neigh'. ifconfig cannot manage ARP at all.
If: You are inside a container or network namespace → Use: 'ip'. ifconfig has no namespace awareness.
If: You only need to show the IP and MAC of local interfaces → Use: Either works, but 'ip addr show' is more detailed.
If: You are on an ancient system without iproute2 → Use: ifconfig as a fallback, but plan to migrate.
ss and netstat — Seeing Every Open Door on Your Server
Think of your server as a building with thousands of numbered doors (ports). ss and netstat let you see exactly which doors are open, who's standing at each one, and which processes are responsible. This is critical when deploying a new service — you need to know whether port 8080 is already taken before your app fails to bind to it.
netstat is the old tool; ss (Socket Statistics) is the modern replacement. ss queries the kernel directly via netlink, whereas netstat parses text files under /proc/net, which makes ss dramatically faster on systems with thousands of connections. On a busy web server, netstat can take 10+ seconds while ss returns almost instantly.
The real power of ss is in its filtering. You can filter by state (ESTABLISHED, LISTEN, TIME_WAIT), by port, by process, or by remote address. In a microservices environment, you might want to see all connections from this service to the database on port 5432 — ss makes that a one-liner.
Understanding TCP connection states matters here. TIME_WAIT is normal and means your server is waiting for late packets before closing a connection. A flood of TIME_WAIT entries is usually fine. CLOSE_WAIT means the remote side closed but your application hasn't — that often points to a bug in connection handling code.
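To see how your sockets are distributed across these states, you can tally the first column of ss output. A minimal sketch, assuming the default ss -tan layout where the state is the first field; note that ss prints states with hyphens and abbreviations (ESTAB, TIME-WAIT, CLOSE-WAIT). tcp_state_counts is a hypothetical helper name.

```shell
# tcp_state_counts
# Read `ss -tan` output on stdin and print "STATE COUNT" lines, sorted by state.
tcp_state_counts() {
    # NR > 1 skips the header row; $1 is the State column
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Typical use:
#   ss -tan | tcp_state_counts
```

A CLOSE-WAIT number that keeps growing between runs is the pattern to watch for.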
socket_inspection.sh (Bash)
#!/bin/bash
# Practical ss usage patterns for diagnosing connection issues

# Show all LISTENING TCP ports with the process that owns them
# -t = TCP only, -l = listening sockets, -n = numeric (don't resolve hostnames, much faster),
# -p = show the process (requires root or sudo for processes owned by other users)
ss -tlnp

# Show all ESTABLISHED connections: who is currently connected to this server
ss -tn state established

# Find what's using a specific port (e.g., 8080)
# The 'sport' filter means 'source port', the local port your service is bound to
# The filter is quoted so the shell passes it to ss unchanged
ss -tlnp 'sport = :8080'

# Count connections per remote IP; useful for spotting a single client hammering your API
# With a 'state' filter ss omits the State column, so the peer address is field 4
# NR > 1 skips the header; sed strips the trailing :port (IPv6 addresses keep their brackets)
ss -tn state established | awk 'NR > 1 {print $4}' | sed 's/:[0-9]*$//' | sort | uniq -c | sort -rn | head -20

# Show all UDP sockets (DNS, DHCP, NTP all use UDP, so don't forget to check these)
ss -ulnp

# Show connection summary statistics (totals plus established and time-wait counts)
# Huge TIME_WAIT count is normal; for a CLOSE_WAIT count run: ss -Htn state close-wait | wc -l
ss -s

# The netstat equivalent for those on older systems without ss
# -a = all sockets, -n = numeric, -t = TCP, -p = process (add -u to include UDP)
netstat -antp
Output
# ss -tlnp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
Pro Tip: 127.0.0.1 vs 0.0.0.0 in the Local Address Column
When ss shows a service listening on 127.0.0.1:5432, it's only reachable from localhost — that's correct for a database. If it shows 0.0.0.0:5432, it's accepting connections from any interface — which may be a security misconfiguration. Always verify your database and internal services are NOT listening on 0.0.0.0 unless you have a firewall rule protecting them.
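You can turn this check into a quick audit by filtering for listeners bound to every interface. The sketch below reads ss -tln output on stdin and assumes the default column order, where the local address is the fourth field; it treats 0.0.0.0, [::] and * as wildcard binds. wildcard_listeners is a hypothetical helper name.

```shell
# wildcard_listeners
# Read `ss -tln` output on stdin and print the local address:port of every
# socket that listens on all interfaces rather than a specific address.
wildcard_listeners() {
    awk 'NR > 1 && $4 ~ /^(0\.0\.0\.0|\[::\]|\*):/ { print $4 }'
}

# Typical use:
#   ss -tln | wildcard_listeners
```

Anything in the result that is a database or an internal-only service should either be rebound or covered by a firewall rule.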
Production Insight
CLOSE_WAIT is your enemy. The kernel never times this state out: a socket stays in CLOSE_WAIT until your own application closes it.
In one event, a proxy pool leaked connections because the proxy waited for data it would never get.
Rule: if you see >100 CLOSE_WAIT connections on your web server, audit your connection cleanup logic.
Key Takeaway
ss -tlnp is your first command for port listening status.
Watch for 127.0.0.1 vs 0.0.0.0 — security difference.
CLOSE_WAIT >100 indicates a connection leak in your app.
TIME_WAIT is normal, not a problem.
Choosing between ss and netstat
If: The system has netstat installed and you need a quick check → Use: netstat -antp, but be aware it's slow on busy systems.
If: You need fast, filtered output on a production server with many connections → Use: ss. It's faster and supports complex filters.
If: You need to see process names without root → Use: ss -p, which shows only the sockets your own user owns; other users' processes require root/sudo.
If: You need to analyze UDP or Unix domain sockets → Use: ss -ulnp for UDP, ss -xl for Unix sockets.
curl, wget and dig — Testing Connectivity Layer by Layer
When a service is unreachable, you need to narrow down the layer where things break. Is it DNS? TCP connectivity? HTTP routing? TLS? curl is exceptional here because it can test each layer independently and gives you precise timing data. dig is your dedicated DNS debugging tool, and together they let you methodically eliminate suspects.
curl's --verbose flag is one of the most useful things in networking debugging. It shows you the DNS resolution, the TCP handshake, the TLS negotiation, and the HTTP headers — all in sequence. When your HTTPS endpoint is slow, curl -w timing reveals whether the slowness is in DNS, in TCP, in TLS, or in the actual server response time.
dig is purpose-built for DNS and goes far beyond what you can learn from ping or curl. You can query specific record types, target specific nameservers, and trace the full delegation chain from root to authoritative server. This matters when you're investigating DNS propagation issues after a domain change, or debugging split-horizon DNS in a VPN setup.
wget is simpler and better for quick file downloads or when you need recursive mirroring. For API testing and network diagnosis, curl is almost always the right choice.
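When you suspect split-horizon DNS or a stale cache, comparing two resolvers side by side makes the difference obvious. A minimal sketch, assuming dig is installed; compare_resolvers is a hypothetical helper, and the resolver addresses in the example are placeholders. Keep in mind that CDNs and geo-aware DNS can legitimately return different answers to different resolvers.

```shell
# compare_resolvers NAME RESOLVER_A RESOLVER_B
# Query the A records for NAME on two resolvers, print both answer sets,
# and return 0 if they match or 1 if they differ.
compare_resolvers() {
    name="$1"
    # sort so that the same records in a different order still compare equal
    a=$(dig +short "@$2" "$name" A | sort)
    b=$(dig +short "@$3" "$name" A | sort)
    echo "$2: $a"
    echo "$3: $b"
    [ "$a" = "$b" ]
}

# Example: public resolver vs. the local systemd-resolved stub
#   compare_resolvers api.example.com 8.8.8.8 127.0.0.53 || echo "answers differ"
```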
connectivity_layer_testing.sh (Bash)
#!/bin/bash
# Test each networking layer independently to isolate where failures occur

# ─── DNS TESTING WITH dig ───────────────────────────────────────────────
# Basic A record lookup: what IP does this hostname resolve to?
dig api.example.com A

# Query a SPECIFIC nameserver (e.g., Google's 8.8.8.8) to bypass your local resolver
# Useful when debugging 'works on my machine' DNS issues caused by local caching
dig @8.8.8.8 api.example.com A

# Trace the full DNS delegation from root servers down to the authoritative nameserver
# This is the gold standard for debugging DNS propagation issues
dig +trace api.example.com

# Check MX records (mail routing); +short gives just the values, no noise
dig api.example.com MX +short

# Reverse DNS lookup: find the hostname for an IP address
dig -x 93.184.216.34 +short

# ─── HTTP/HTTPS TESTING WITH curl ───────────────────────────────────────
# Follow redirects (-L), show verbose output (-v) to see every step of the connection
curl -Lv https://api.example.com/health

# Time each phase of the connection; invaluable for performance diagnosis
# This custom format prints one timer per line
# Each value is cumulative, measured from the start of the request
curl -o /dev/null -s -w "
DNS lookup: %{time_namelookup}s
TCP connect: %{time_connect}s
TLS handshake: %{time_appconnect}s
Time to first byte: %{time_starttransfer}s
Total time: %{time_total}s
HTTP status code: %{http_code}
" https://api.example.com/health

# Test an API endpoint with a JSON POST body; simulates what your app does
curl -X POST https://api.example.com/users \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-token-here" \
-d '{"username": "alice", "email": "alice@example.com"}' \
--max-time 10 # fail after 10 seconds instead of hanging forever

# Test TLS certificate details: check expiry, issuer, and SANs
curl -vI https://api.example.com 2>&1 | grep -E '(subject|issuer|expire|SSL)'

# Skip TLS verification (ONLY for debugging self-signed certs, never in production)
curl -k https://internal-service.local/health
Output
# dig api.example.com A +short
93.184.216.34
# curl timing output
DNS lookup: 0.012s
TCP connect: 0.034s
TLS handshake: 0.089s
Time to first byte: 0.201s
Total time: 0.203s
HTTP status code: 200
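# These values are cumulative from the start of the request, so compare the gaps between them
# If DNS takes 2+ seconds, your resolver is slow or the record isn't cached
# If the gap between TCP connect and TLS handshake is 500ms+, suspect certificate chain or OCSP issues
# If 'time to first byte' is slow but TLS is fast, the app itself is the bottleneck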
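curl's timers are cumulative: each one counts from the start of the request, so the time spent in a single phase is the difference between two neighbouring values. The sketch below does that subtraction; phase_durations is a hypothetical helper name, and the 'server' figure is only an approximation of server think time because it also includes sending the request.

```shell
# phase_durations NAMELOOKUP CONNECT APPCONNECT STARTTRANSFER TOTAL
# Convert curl's cumulative timers (in seconds) into per-phase durations.
phase_durations() {
    awk -v dns="$1" -v tcp="$2" -v tls="$3" -v ttfb="$4" -v total="$5" 'BEGIN {
        printf "dns=%.3f tcp=%.3f tls=%.3f server=%.3f transfer=%.3f\n",
            dns, tcp - dns, tls - tcp, ttfb - tls, total - ttfb
    }'
}

# Using the sample numbers from the output above:
#   phase_durations 0.012 0.034 0.089 0.201 0.203
#   → dns=0.012 tcp=0.022 tls=0.055 server=0.112 transfer=0.002
#
# Feeding it from a live request (HTTPS only; time_appconnect is 0 for plain HTTP):
#   phase_durations $(curl -o /dev/null -s -w \
#     '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}' \
#     https://api.example.com/health)
```

In the sample, the TLS handshake itself cost about 55 ms and the server took roughly 112 ms to start answering, so the application is the largest contributor.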
Interview Gold: The Layers of a Failed Request
Interviewers love asking 'how would you debug a service that users say is slow?' The layered answer (check DNS with dig, check TCP with ss/telnet, check HTTP timing with curl -w) demonstrates real operational thinking. Always start with the first step of the request (DNS resolution) and work forward. Jumping straight to application logs while skipping network diagnostics is the classic junior mistake.
Production Insight
A production outage was caused by a TLS handshake taking 3 seconds because the server was doing OCSP stapling with a slow responder.
curl -w revealed TLS time was 10x normal, but the app logs showed nothing.
Rule: always isolate timing with curl -w before blaming the application code.
Key Takeaway
dig first, then curl -w, then investigate the slowest phase.
curl -v shows every layer of the connection.
Never jump to app logs without first ruling out DNS and network timing issues.
Which tool for which layer?
If: You need to verify DNS resolution for a hostname → Use: dig. Start with @8.8.8.8 to bypass the local resolver.
If: You need to test the full HTTP/HTTPS path including headers and response → Use: curl with -v, or -w for timing.
If: You need to download a file or mirror a website → Use: wget for its recursive download capability.
If: You need to check the TLS certificate chain and expiry → Use: 'curl -vI https://host' and grep for subject/issuer/expire.
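For certificate expiry, the 'expire date' line from curl's verbose output can be turned into a number of days. This is a sketch under two assumptions: curl prints a line of the form '*  expire date: Jan 13 23:59:59 2026 GMT' (the exact wording depends on the TLS backend curl was built with), and GNU date is available for the -d option. cert_days_left is a hypothetical helper name.

```shell
# cert_days_left
# Read `curl -vI https://host 2>&1` output on stdin and print the number of
# whole days until the server certificate expires (negative if already expired).
cert_days_left() {
    expiry=$(sed -n 's/^\*[[:space:]]*expire date:[[:space:]]*//p' | head -n 1)
    [ -n "$expiry" ] || return 1            # no expiry line found in the input
    end=$(date -d "$expiry" +%s) || return 1  # GNU date parses the timestamp
    now=$(date +%s)
    echo $(( (end - now) / 86400 ))
}

# Typical use:
#   curl -vI https://api.example.com 2>&1 | cert_days_left
```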
ping, traceroute and tcpdump — Tracing the Path and Catching Packets
Once you've confirmed DNS resolves correctly and the service is listening, the next question is whether packets are actually reaching their destination. ping tells you if a host is reachable and measures round-trip time. traceroute shows you every hop between you and the destination. tcpdump lets you actually capture and inspect raw packets on the wire.
ping is often misused as a binary 'is it up?' test, but it's more nuanced than that. ICMP packets (what ping uses) can be blocked by firewalls while TCP traffic flows fine. A failed ping doesn't mean a service is down — it might just mean ICMP is blocked. Always follow up a failed ping with a TCP-level check.
traceroute reveals the routing path and where latency is introduced. Each hop shows you a router, and timing spikes between hops show you where delays occur. When a cloud VM can't reach an external API, traceroute often reveals the packet dying at a NAT gateway or security group that's silently dropping traffic.
tcpdump is the most powerful of the three but also the most complex. It captures actual packet data, which is essential for diagnosing issues that higher-level tools can't see — like retransmissions, RST floods, or malformed HTTP headers. Always combine it with Wireshark for complex analysis.
packet_path_tracing.sh (Bash)
#!/bin/bash
# Trace packet paths and capture traffic for deep network diagnosis

# ─── PING ───────────────────────────────────────────────────────────────
# Basic connectivity check: send 5 packets, show statistics
ping -c 5 8.8.8.8

# Ping with a larger packet size to test MTU issues (1472 bytes + 28-byte IP/ICMP header = 1500 MTU)
# If this fails but small pings work, you likely have an MTU mismatch (common with VPNs)
ping -c 4 -s 1472 -M do 8.8.8.8 # -M do means 'don't fragment'

# ─── TRACEROUTE ─────────────────────────────────────────────────────────
# Standard traceroute using UDP probes
traceroute api.example.com

# Use TCP SYN probes on port 443 instead of UDP
# More likely to get through firewalls that block ICMP and UDP probes
# TCP mode needs root because it sends raw packets
sudo traceroute -T -p 443 api.example.com

# mtr combines ping and traceroute and shows packet loss per hop
# This is often the single best tool for diagnosing intermittent routing issues
mtr --report --report-cycles 20 api.example.com

# ─── TCPDUMP (needs root or sudo) ───────────────────────────────────────
# Capture all traffic on eth0; CTRL+C to stop (fine on a quiet host, far too noisy on a busy one)
# -n = don't resolve hostnames (faster), -i = interface
tcpdump -i eth0 -n

# Capture only HTTP/HTTPS traffic (port 80 or 443) to/from a specific host
# 'host' filters by IP in either direction
tcpdump -i eth0 -n host 10.0.1.20 and \( port 80 or port 443 \)

# Capture DNS queries: see what your server is resolving and to which nameserver
tcpdump -i eth0 -n port 53

# Save capture to a file for later analysis in Wireshark
# -w writes raw packets, -C 100 starts a new file roughly every 100 MB,
# -W 10 keeps at most 10 files and then overwrites the oldest, so disk use stays bounded
tcpdump -i eth0 -n -w /tmp/capture.pcap -C 100 -W 10 host 10.0.1.20

# Capture and print packet payloads in ASCII to see the actual HTTP request/response text
# -A = ASCII output, -s 0 = capture full packet (no truncation), -l = line-buffered so grep sees output immediately
tcpdump -i eth0 -n -l -A -s 0 port 8080 | grep -E '(GET|POST|HTTP|Host:)'
Output
# ping -c 5 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=4.21 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=4.18 ms
--- 8.8.8.8 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss
rtt min/avg/max/mdev = 4.18/4.22/4.31/0.04 ms
# traceroute -T -p 443 api.example.com
traceroute to api.example.com (93.184.216.34), 30 hops max
1 10.0.1.1 (10.0.1.1) 0.432 ms
2 203.0.113.1 (203.0.113.1) 3.211 ms
3 * * * <-- asterisks mean this hop is not responding (ICMP filtered)
4 93.184.216.34 12.44 ms
# mtr output (after 20 cycles)
Host Loss% Snt Last Avg Best Wrst
1. 10.0.1.1 0.0% 20 0.4 0.4 0.3 0.6
2. 203.0.113.1 0.0% 20 3.2 3.1 2.9 3.8
3. ??? -- hop doesn't respond to probes
4. 93.184.216.34 0.0% 20 12.4 12.3 12.0 13.1
Pro Tip: Asterisks in traceroute Don't Mean Packet Loss
Asterisks (* * *) on a traceroute hop mean that router isn't sending back the ICMP Time Exceeded replies traceroute relies on, not that your packets aren't passing through it. If the next hop appears and the final destination is reachable, that router is simply configured to stay silent or to rate-limit those replies. Only worry about asterisks if all subsequent hops also show asterisks.
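The rule of thumb can be written down as a small filter over traceroute output. This is a sketch, assuming the classic Linux traceroute layout (one banner line, then one line per hop with the hop number first) and treating a hop as silent only when all three probes show an asterisk; classify_trace is a hypothetical helper name.

```shell
# classify_trace
# Read traceroute output on stdin and report whether silent hops matter:
# they are harmless if a later hop answered, suspicious if the path ends in them.
classify_trace() {
    awk '
        NR == 1 { next }    # skip the banner line
        {
            hops++
            silent_hop = ($2 == "*" && $3 == "*" && $4 == "*")
            if (silent_hop) silent++
            else last = $1  # remember the last hop number that answered
        }
        END {
            if (hops == 0) { print "no hops in input"; exit 1 }
            if (silent_hop)
                printf "path goes silent after hop %d\n", last
            else
                printf "final hop answered; %d silent hop(s) on the way are harmless\n", silent
        }'
}

# Typical use:
#   traceroute -n api.example.com | classify_trace
```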
Production Insight
A critical incident: 'ping' to an API returned fine, but curl failed with connection timeouts.
tcpdump on the server showed TCP SYNs arriving but no SYN-ACKs — the server was dropping them.
It turned out an iptables rule was dropping TCP from that source IP while letting ICMP through.
Rule: ping is not a proxy for TCP connectivity. Always test on the same protocol.
Key Takeaway
tcpdump is the last resort, not the first.
Always use ping -c 5, traceroute -T, mtr before tcpdump.
A failed ping does not mean the service is down — verify with curl or nc.
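Checking ICMP and TCP side by side makes a mismatch between the two obvious. A minimal sketch, assuming iputils ping (for -W) and a netcat that supports -z and -w; reachability is a hypothetical helper name, and the result labels are deliberately vague because neither probe can tell a firewall drop from a dead host.

```shell
# reachability HOST PORT
# Probe the same host over ICMP and over TCP and print both results.
reachability() {
    host="$1"
    port="$2"
    # -c 2 sends two echo requests, -W 2 waits at most 2 seconds per reply
    if ping -c 2 -W 2 "$host" >/dev/null 2>&1; then
        icmp=ok
    else
        icmp=blocked-or-down
    fi
    # -z only tests whether the port accepts a connection, -w 3 is the timeout
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
        tcp=open
    else
        tcp=closed-or-filtered
    fi
    echo "icmp=$icmp tcp/$port=$tcp"
}

# Example:
#   reachability api.example.com 443
```

A result like icmp=blocked-or-down tcp/443=open means the service is fine and only ICMP is filtered; the opposite combination is the situation from the incident above.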
When to use ping, traceroute or tcpdump
If: You want to check basic reachability and latency to a host → Use: ping. But remember ICMP may be blocked.
If: You need to see the full path and identify latency spikes → Use: traceroute -T -p 443 for TCP-based probing, or mtr for continuous monitoring.
If: You need to inspect actual packet contents (e.g., malformed packets, retransmissions) → Use: tcpdump with filters. Always write to file with -w and limit size with -C.
If: You suspect packet loss or MTU issues → Use: ping with -M do and varying -s sizes to find the MTU limit.
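Trying sizes by hand gets tedious, so the search for the MTU limit can be automated with a binary search. The sketch below assumes iputils ping (the -M do option is Linux-specific), assumes a 548-byte payload always passes, and adds 28 bytes of IPv4 and ICMP headers to turn the largest working payload into an MTU; find_path_mtu is a hypothetical helper name.

```shell
# find_path_mtu HOST
# Binary-search the largest ICMP payload that reaches HOST with the
# Don't Fragment bit set, then print the corresponding path MTU.
find_path_mtu() {
    host="$1"
    low=548     # assumed to pass: payload that fits a 576-byte MTU
    high=1472   # payload that exactly fills a standard 1500-byte MTU
    while [ "$low" -lt "$high" ]; do
        mid=$(( (low + high + 1) / 2 ))
        if ping -c 1 -W 2 -M do -s "$mid" "$host" >/dev/null 2>&1; then
            low=$mid                # this size passed, try larger
        else
            high=$(( mid - 1 ))     # too big (or lost), try smaller
        fi
    done
    echo $(( low + 28 ))            # payload + 20-byte IP + 8-byte ICMP header
}

# Example:
#   find_path_mtu 10.8.0.1
```

A single lost packet can mislead the search, so treat a surprising result as a hint and rerun it before changing any interface settings.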
Network Performance Monitoring — Bandwidth, Throughput and Bottlenecks
Production servers don't just need connectivity — they need performance. When a service becomes slow, you need to know whether the bottleneck is your server's network interface, the application itself, or somewhere in between. Tools like nload, iftop, iptraf-ng, and nethogs give you real-time bandwidth and per-process traffic data.
nload shows total incoming and outgoing traffic on each interface with a live graph. iftop shows traffic per connection — which remote IPs are consuming the most bandwidth. iptraf-ng adds detailed statistics per protocol and interface. nethogs breaks down traffic per process, so you can see which application is saturating the link.
A common production scenario: a misconfigured backup job or a rogue cron script starts transferring gigabytes of data and saturates the NIC. nethogs reveals 'python3' as the culprit. iftop shows the destination IP is an internal backup server. You then find the backup script is running on the wrong schedule.
For throughput testing, iperf3 is essential. It measures actual TCP/UDP throughput between two hosts, revealing issues like bufferbloat or misconfigured flow control that won't appear in ping.
performance_monitoring.sh (Bash)
#!/bin/bash
# Real-time network performance monitoring tools

# ─── INSTALLATION ───────────────────────────────────────────────────────
# sudo apt install nload iftop iptraf-ng nethogs iperf3 -y

# ─── NLOAD: total bandwidth per interface ───────────────────────────────
# Pass an interface name, or omit it to monitor all detected interfaces
nload eth0

# ─── IFTOP: bandwidth per connection ────────────────────────────────────
# Shows top talkers; press 'p' to toggle port display
sudo iftop -i eth0

# ─── IPTRAF-NG: per-protocol statistics ────────────────────────────────
sudo iptraf-ng
# Then choose 'IP traffic monitor' or 'Detailed interface statistics'

# ─── NETHOGS: bandwidth per process ────────────────────────────────────
sudo nethogs eth0

# ─── IPERF3: throughput test between two hosts ──────────────────────────
# Start server on host B
iperf3 -s

# Connect from host A (client)
iperf3 -c 10.0.1.20 -t 30 # run for 30 seconds

# Test UDP throughput (includes jitter and packet loss)
iperf3 -c 10.0.1.20 -u -b 100M -t 30

# ─── CHECK INTERFACE ERRORS ────────────────────────────────────────────
# Look at 'errors', 'dropped', 'overruns' counters
ip -s link show eth0
When the on-call phone rings with 'the API is slow', the first thing after checking CPU and memory is to check network bandwidth. If the interface is saturated, a process-level tool like nethogs will tell you exactly which PID is causing it. In one case, a developer left a large database dump running on a cron job that overlapped with peak traffic — nethogs caught it in 30 seconds.
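If none of the monitoring tools are installed on the box, the kernel's own byte counters are enough for a quick saturation check. A minimal sketch, assuming the standard sysfs path /sys/class/net/IFACE/statistics/rx_bytes; rate_mbps and iface_rx_mbps are hypothetical helper names.

```shell
# rate_mbps BYTES_BEFORE BYTES_AFTER SECONDS
# Convert two byte-counter samples into megabits per second.
rate_mbps() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (b - a) * 8 / s / 1000000 }'
}

# iface_rx_mbps IFACE [SECONDS]
# Sample the receive counter of an interface twice and print the rate.
iface_rx_mbps() {
    stats="/sys/class/net/$1/statistics/rx_bytes"
    secs="${2:-1}"
    before=$(cat "$stats") || return 1
    sleep "$secs"
    after=$(cat "$stats") || return 1
    rate_mbps "$before" "$after" "$secs"
}

# Example: average receive rate on eth0 over 5 seconds
#   iface_rx_mbps eth0 5
```

Swap rx_bytes for tx_bytes to measure the outbound direction. A reading close to the link speed reported by ethtool means the interface is saturated, and nethogs is the next tool to reach for.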
Production Insight
A burst of outgoing traffic from an unexpected process is a common cause of network latency in production.
nethogs revealed a node.js process sending gigabytes of logs to a centralized logging server.
Rule: when latency spikes without a traffic increase, check for accumulated buffering on the NIC.
Key Takeaway
When a service slows down, check network bandwidth before blaming the app.
nethogs shows which process is using the interface.
iperf3 tells you the actual throughput capacity, not just ping latency.
Choosing a performance monitoring tool
If: You need total bandwidth per interface with a quick view → Use: nload. Simple, real-time, no root required for monitoring.
If: You need to see which remote IPs consume the most bandwidth → Use: iftop. Good for identifying DDoS or heavy client patterns.
If: You need to know which process is generating traffic → Use: nethogs. Perfect for finding misbehaving applications.
If: You need to measure maximum achievable throughput between two hosts → Use: iperf3. Essential for network capacity planning and troubleshooting flow control.
Production incident · POST-MORTEM · severity: high
The Silent Blackhole: When a New Server Stops Responding After a Router Reboot
Symptom
Server accessible via IPMI/console. ping responses normal, but HTTP clients got timeouts. tcpdump showed no incoming TCP SYN packets despite ping replies. From the server itself, outgoing traffic worked fine.
Assumption
Firewall rules on the server or upstream ACLs are blocking inbound traffic. Engineering team spent 45 minutes reviewing security groups and iptables rules.
Root cause
The router's ARP cache held the old MAC address for the server's IP from a previous deployment. The router forwarded packets to the old MAC, which belonged to a different (now decommissioned) host. The new server's NIC never received the traffic. ARP broadcast was failing because the new server's ARP table was stuck with a stale entry from a duplicate IP conflict during provisioning.
Fix
Cleared ARP cache on both the server and the upstream router. On the server: 'ip neigh flush all'. On the router: cleared ARP entry for the server's IP. Then forced a gratuitous ARP broadcast from the server by bouncing the interface with 'ip link set eth0 down && ip link set eth0 up'. Verified with 'arping -c 3 -I eth0 SERVER_IP'.
Key lesson
Always verify both arp and route tables after server provisioning or network changes.
A single ping reply does not mean the server is reachable on all protocols — ARP is link-layer.
After moving IPs between physical hosts, flush the ARP cache on the local subnet's gateway.
Production debug guide
Quick reference for common network failure symptoms and the exact commands to run.
Symptom · 01
Service unreachable, no response from curl
→
Fix
Check DNS first: dig @8.8.8.8 service.example.com. Then check local listener: ss -tlnp | grep :PORT. Then check remote: nc -zv TARGET_IP PORT. If no response, check route: ip route get TARGET_IP.
Symptom · 02
Intermittent timeouts to an external API
→
Fix
Run curl with timing: curl -w '@timing_format' -o /dev/null -s https://api.example.com. Check for packet loss: mtr --report --report-cycles 20 api.example.com. Compare to a known-good endpoint.
Symptom · 03
Server sees incoming connections but app doesn't
→
Fix
Check ss -tlnp to confirm process is listening on expected IP (0.0.0.0 vs 127.0.0.1). Check app logs for bind errors. Verify kernel reverse path filtering: sysctl net.ipv4.conf.all.rp_filter.
Symptom · 04
High latency to a single hop in traceroute
→
Fix
Run mtr --report --report-cycles 50 TARGET for packet loss per hop. If latency spike is consistent, check interface errors: ip -s link show dev eth0. If the hop is your own router, check ARP table: ip neigh show.
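The sequence from the first entry (DNS, then route, then TCP) is easy to script so it runs the same way every time. This is a sketch that assumes dig, iproute2 and netcat are installed and that the target has an IPv4 address; triage is a hypothetical helper name, and it stops at the first failing layer.

```shell
# triage HOST PORT
# Walk the layers in order and stop at the first one that fails.
triage() {
    host="$1"
    port="$2"

    # 1. DNS: take the first IPv4 address (skips any CNAME lines in the answer)
    addr=$(dig +short "$host" A | grep -E '^[0-9.]+$' | head -n 1)
    if [ -z "$addr" ]; then
        echo "FAIL dns: $host did not resolve"
        return 1
    fi
    echo "ok dns: $host -> $addr"

    # 2. Routing: does the kernel know how to reach that address?
    if ! ip route get "$addr" >/dev/null 2>&1; then
        echo "FAIL route: no route to $addr"
        return 1
    fi
    echo "ok route: kernel has a route to $addr"

    # 3. TCP: does the port accept a connection within 3 seconds?
    if ! nc -z -w 3 "$addr" "$port" >/dev/null 2>&1; then
        echo "FAIL tcp: $addr:$port is closed or filtered"
        return 1
    fi
    echo "ok tcp: $addr:$port accepts connections"
}

# Example:
#   triage api.example.com 443
```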
Quick Debug Cheat Sheet
Five-second answers for the most common networking emergencies.
Can't reach any external host
Immediate action
Check default route and DNS
Commands
ip route show default
dig @8.8.8.8 google.com
Fix now
If default route missing: 'ip route add default via GATEWAY_IP dev eth0'. If DNS fails: check /etc/resolv.conf and restart systemd-resolved.
Port 8080 shows as already in use
Immediate action
Find the process using the port
Commands
ss -tlnp sport = :8080
ps -p PID -o pid,user,cmd
Fix now
If you want to free the port: stop the service, or run 'kill PID' and fall back to 'kill -9 PID' only if the process ignores the first signal. If you want to keep the process, change your app's config to use a different port.
curl returns 'Connection refused'
Immediate action
Check if service is listening on expected interface
Commands
ss -tlnp | grep PORT
netstat -antp | grep PORT (if ss not available)
Fix now
If not listening: start the service. If listening on 127.0.0.1 only: reconfigure service to listen on 0.0.0.0 or the external interface.
DNS resolution is slow (2+ seconds)
Immediate action
Query a public resolver to isolate the problem
Commands
dig @8.8.8.8 example.com +stats
cat /etc/resolv.conf
Fix now
If local resolver slow: check systemd-resolved or dnsmasq. If your ISP's resolver is slow: change to public DNS (8.8.8.8, 1.1.1.1) in /etc/resolv.conf.
Linux Networking Commands Quick Reference
| Command | Layer | Best Use Case | Requires Root? | Modern Alternative |
|---|---|---|---|---|
| ifconfig | Layer 3 (IP) | Legacy interface config inspection | No | ip addr |
| ip | Layer 2-3 | Interfaces, routes, ARP, namespaces | For changes, yes | Still current (no replacement) |
| netstat | Layer 4 (TCP/UDP) | Port and socket inspection | For -p on other users' processes | ss |
| ss | Layer 4 (TCP/UDP) | Fast socket stats, large systems | For -p on other users' processes | Still current (no replacement) |
| ping | Layer 3 (ICMP) | Basic reachability and latency check | No | Still current |
| traceroute | Layer 3 | Routing path and hop latency | No (yes for -T) | mtr (combines with ping) |
| tcpdump | Layer 2-7 | Raw packet capture and inspection | Yes | tshark (CLI Wireshark) |
| curl | Layer 7 (HTTP/S) | API testing, TLS and timing diagnosis | No | Still current |
| dig | Layer 7 (DNS) | DNS record queries and propagation debug | No | drill (alternative on some distros) |
| nload | Layer 2-4 | Real-time total bandwidth per interface | No | Still current |
| iftop | Layer 4 | Bandwidth per connection | Yes | Still current |
| nethogs | Layer 4 | Bandwidth per process | Yes | Still current |
| iperf3 | Layer 4 | Throughput and jitter measurement | No | Still current |
Key takeaways
1. ip and ss are the modern replacements for ifconfig and netstat: learn ip addr, ip route, and ss -tlnp as your default starting point on any system.
2. Always debug network failures layer by layer: DNS first with dig, TCP connectivity second with ss/curl, then HTTP/TLS with curl -v, and packet-level last with tcpdump. Jumping layers wastes time.
3. tcpdump is your last resort, not your first: always filter by host and port, always write to a file with -w, and bound disk use with -C (file size) plus -W (file count) on production servers.
4. Asterisks in traceroute and a failed ping are not proof of failure: ICMP is routinely blocked by firewalls while TCP traffic flows freely, so always verify with a TCP-level tool before declaring a host unreachable.
5. When a service slows, check network bandwidth with nload or nethogs before blaming the application: the problem is often a rogue process saturating the interface.
Common mistakes to avoid
Using ping to confirm a service is down
Symptom
ping fails so you conclude the server is unreachable, but SSH and HTTP still work
Fix
ICMP is frequently blocked by cloud security groups and firewalls. Always follow a failed ping with a TCP check: 'curl -v http://hostname' or 'nc -zv hostname 22' to confirm the service is actually listening.
Running tcpdump without a filter on a busy server
Symptom
Terminal freezes or disk fills within seconds on a server handling thousands of connections per second
Fix
Always specify a host, port, or protocol filter: 'tcpdump -i eth0 host 10.0.1.20 and port 443'. Add '-w filename.pcap' to write to a file instead of flooding stdout, and use '-C 50' to cap file size at 50MB.
Trusting a single DNS resolver when debugging resolution failures
Symptom
'dig api.example.com' returns an IP but the service still can't connect from your app
Fix
Your app may be using a different resolver than your shell. Use 'dig @127.0.0.53 api.example.com' to query systemd-resolved specifically, 'cat /etc/resolv.conf' to see which resolver is configured, and compare results. Containers often have their own /etc/resolv.conf pointing to a different resolver entirely.
Assuming that a high TIME_WAIT count indicates a problem
Symptom
You see thousands of TIME_WAIT connections in ss output and think the server is misconfigured
Fix
TIME_WAIT is a normal TCP state. It means the connection is closing and waiting for late packets. It's harmless unless the socket table is full (which is rare on modern kernels). Spend your time investigating CLOSE_WAIT instead — that's the real connection leak indicator.
Interview Questions on This Topic
Q01 of 04 · SENIOR
A user reports that a web service is intermittently unreachable but your monitoring shows the server is up. Walk me through exactly how you would diagnose this — which commands would you run and in what order?
ANSWER
I'd start by isolating the problem layer by layer:
1. DNS: dig @8.8.8.8 service.example.com to check resolution.
2. TCP connectivity: nc -zv service.example.com 443 from a client outside the cluster.
3. Local listener: ssh into the server and run ss -tlnp | grep :443 to confirm the process is listening.
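4. Firewall: check sudo iptables -L -n -v (or sudo nft list ruleset on nftables systems) and cloud security groups.
5. Application logs: if all network layers work, check the app's logs for errors.
Because the failure is intermittent, a single passing check proves little. I'd run curl -w in a loop every few seconds, log the response code and per-phase timings, and correlate the failures with deploys, load spikes, or specific backends.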
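The continuous curl -w check could look something like this. It is a sketch, not a monitoring system: `probe_once` is a hypothetical helper name, and the 5-second timeout, the log path, and the interval in the usage comment are arbitrary assumptions.

```shell
# probe_once URL
# Makes one request and prints a single timestamped log line with the HTTP
# status and the time spent in each phase. code=000 means no HTTP response
# was received at all (DNS failure, connection refused, or timeout).
probe_once() {
    curl -s -o /dev/null --max-time 5 \
        -w "$(date -u +%Y-%m-%dT%H:%M:%SZ) code=%{http_code} dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n" \
        "$1"
}

# Usage: probe every 5 seconds and keep a log to correlate later.
#   while true; do
#       probe_once https://service.example.com/ >> /tmp/probe.log
#       sleep 5
#   done
```

Reading the log, a jump in `dns=` points at resolution, a jump in `connect=` at the network path or a full accept queue, and a large gap between `tls=` and `total=` at the application itself.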
Q02 of 04 · SENIOR
What's the difference between a port that shows as LISTEN on 127.0.0.1 versus 0.0.0.0 in ss output, and why does it matter from a security perspective?
ANSWER
127.0.0.1 means the service is only reachable from the local machine (loopback interface). This is correct for databases, caches, or internal services that should not be exposed. 0.0.0.0 means the service is listening on all available network interfaces, including public ones. If a database or an internal API is listening on 0.0.0.0, it can be accessed remotely, which is a major security risk if not protected by a firewall. Always check that sensitive services bind to 127.0.0.1 or a specific private IP.
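To audit a host for this, one option is to filter ss output for listeners bound to a wildcard address. This is a sketch; `wildcard_listeners` is a hypothetical helper that reads `ss -tln` or `ss -tlnp` output on stdin. It treats `0.0.0.0`, `[::]`, and `*` as wildcards, since ss uses all three forms.

```shell
# wildcard_listeners: reads `ss -tln` / `ss -tlnp` output on stdin and prints
# the local address (and process column, if present) of every listener that
# is bound to all addresses instead of loopback or one specific IP.
wildcard_listeners() {
    awk 'NR > 1 && $4 ~ /^(0\.0\.0\.0|\*|\[::\]):[0-9]+$/ {
        if ($6 != "") print $4, $6; else print $4
    }'
}

# Usage:
#   sudo ss -tlnp | wildcard_listeners
```

Anything in the output that is a database, cache, or internal API deserves a second look at its bind address and firewall rules.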
Q03 of 04 · JUNIOR
You run traceroute to an external API endpoint and see the third hop returns asterisks (* * *) but the final destination is reachable with normal latency. Is there a problem, and how do you explain this to a non-technical stakeholder?
ANSWER
No, there is no problem. Asterisks mean that router did not send back the ICMP Time Exceeded replies that traceroute relies on. Many routers are configured to drop or rate-limit those replies to reduce load or for security. As long as the final destination is reachable and latency is normal, the path is functioning correctly. To a non-technical stakeholder, I'd say: 'That router doesn't answer our diagnostic probes, but it's still forwarding our traffic correctly. Think of it like a postal sorting office that doesn't answer the phone but still processes all packages.'
Q04 of 04 · SENIOR
How would you measure the actual available bandwidth between two cloud instances with iperf3, and what would you look for beyond just the transfer rate?
ANSWER
On one instance run 'iperf3 -s' as server, on the other run 'iperf3 -c <SERVER_IP> -t 30' for a 30-second TCP test. The result shows bandwidth in Mbits/sec. Beyond the raw rate, check for:
- Retransmissions: high retransmits indicate packet loss or congestion.
- Jitter (UDP test: iperf3 -u -b 100M -t 30): high jitter can affect real-time applications.
- CPU usage on both ends: if CPU is near 100% during the test, the bottleneck may be the host rather than the network.
- Multiple parallel streams: sometimes single-stream limits are due to TCP window sizing; use '-P 4' for parallel tests.
FAQ · 5 QUESTIONS
Frequently Asked Questions
01
What is the difference between ss and netstat in Linux?
Both show socket and port information, but ss is the modern replacement for netstat. ss communicates directly with the Linux kernel via netlink sockets, which makes it far faster on systems with thousands of connections. netstat relies on parsing /proc filesystem files, which becomes slow under high load. On modern distributions (Ubuntu 20.04+, RHEL 8+), netstat may not even be installed by default — use ss -tlnp instead.
02
How do I find which process is using a specific port in Linux?
Run 'sudo ss -tlnp sport = :PORT_NUMBER' replacing PORT_NUMBER with the port you're investigating — for example 'sudo ss -tlnp sport = :8080'. The -p flag shows the process name and PID. You need sudo because processes owned by other users aren't visible without root privileges. Alternatively, 'sudo lsof -i :8080' also works and gives similar output.
03
Why does ping fail but the service is still accessible in a browser?
ping uses ICMP, which is a completely different protocol from TCP (which HTTP and HTTPS use). Cloud security groups and firewalls commonly drop inbound ICMP unless a rule explicitly allows it; an AWS security group, for example, denies any inbound traffic that no rule permits. So a host can be fully accessible via HTTP on port 80 while never answering ping. This is often intentional, since answering ICMP makes network reconnaissance easier. Don't rely on ping alone: use a TCP-based check like 'curl -v http://hostname' before concluding that a service is unreachable.
04
What does 'ss -tlnp' stand for and when would I use it?
It isn't an acronym but four flags combined: -t (TCP sockets only), -l (listening sockets only), -n (numeric output: show port numbers instead of resolving them to service names), -p (show the process name and PID that owns each socket). Use this command immediately after deploying a new service to verify it's listening on the expected port and interface. For example, after starting a Node.js app on port 3000, run 'ss -tlnp | grep :3000' to confirm it's listening.
05
How can I capture traffic for later analysis without overloading my production server?
Use tcpdump with strict filters and file size limits: 'sudo tcpdump -i eth0 -w /tmp/capture.pcap -C 100 host 10.0.1.20 and port 443'. This writes to a file instead of stdout, starts a new file every 100 MB, and only captures traffic to or from that specific host on port 443. Add '-W 5' to cap the number of rotated files so the capture cannot fill the disk. After the capture, copy the .pcap files to your laptop and analyze them with Wireshark for deep inspection.