Senior 5 min · March 06, 2026

Linux Networking Commands — Why Ping Works But TCP Fails

Ping responses normal but TCP SYN packets vanished due to a stale ARP cache.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • The ip command replaces ifconfig for interface and route management
  • ss replaces netstat for socket statistics and is dramatically faster on busy servers
  • dig tests DNS layer; curl -w gives exact timing for each TLS/TCP phase
  • tcpdump needs filters: always specify a port or host, and cap capture files, so you don't flood the terminal or fill the disk
  • ping and traceroute can mislead when ICMP is blocked — verify with TCP probes
Plain-English First

Imagine your computer is a post office. Networking commands are the tools the postmaster uses to check which delivery routes are open, which packages are stuck, and whether the roads between buildings are working. Just like a postmaster can trace a lost parcel or see which trucks are currently on the road, Linux networking commands let you trace packets, spot blocked ports, and see exactly which processes are talking to the outside world.

Every production outage has a moment — usually at 2am — where someone types a networking command into a terminal and either finds the problem in 30 seconds or spends three hours guessing. Linux networking tools are what separates a DevOps engineer who can diagnose a flaky microservice from one who just restarts containers and hopes for the best. These commands are your stethoscope for the network layer.

ip vs ifconfig — Why the Old Tool Is Dead and What Replaced It

For years, ifconfig was the go-to command for inspecting network interfaces. It still works where installed, but its package (net-tools) is deprecated and is no longer installed by default on many modern distributions, including recent Ubuntu and RHEL releases. The replacement is the ip command from the iproute2 package, which talks to the kernel over netlink sockets instead of the older ioctl calls and /proc files that ifconfig relies on.

The key difference isn't just syntax — it's capability. ip can manage routing tables, network namespaces, tunnels, and ARP/NDP caches all through one unified tool. ifconfig could only touch interfaces and basic IP configuration.

When you're debugging a container networking issue in Kubernetes, you'll often enter a pod's network namespace and run ip addr to see what the container thinks its IP is. ip can also list, create, and run commands inside namespaces itself (ip netns); ifconfig has no concept of namespaces at all. Know ip deeply and you'll be comfortable anywhere from a bare-metal server to a Docker container.
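To script that check (for example, from inside a pod's namespace), the one-line output mode `ip -o` is easier to parse than the default layout. The sketch below assumes the iproute2 `ip -o -4 addr show` format, in which field 2 is the interface name and field 4 is the address with its prefix length; `list_ipv4_addrs` is a hypothetical helper name, not part of iproute2.

```bash
#!/usr/bin/env bash
# list_ipv4_addrs (hypothetical helper): print "interface address/prefix" pairs.
# Reads `ip -o -4 addr show` output on stdin. In -o mode every address sits on
# a single line: "<index>: <ifname>    inet <addr>/<prefix> ...".
list_ipv4_addrs() {
  awk '$3 == "inet" { print $2, $4 }'
}

# Usage on a live host, or inside a namespace via `ip netns exec NAME ...`:
#   ip -o -4 addr show | list_ipv4_addrs
```

Parsing the `-o` form avoids having to track which indented line belongs to which interface in the default multi-line output.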

network_interface_inspection.sh (Bash)
#!/bin/bash
# Shows practical ip command usage for interface inspection
# Run these on any modern Linux host

# List all network interfaces with their IP addresses
# The 'addr' subcommand shows Layer 3 (IP) info bound to each interface
ip addr show

# Filter to just one interface — useful when you have many (eth0, lo, docker0, etc.)
ip addr show dev eth0

# Show the routing table — which gateway handles which destination networks
# This is the first thing to check when packets aren't leaving the host
ip route show

# Show the default gateway specifically — the 'exit door' for all unknown traffic
ip route show default

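# Add a temporary static route for a specific subnet via a specific gateway (needs root)
# The gateway must be directly reachable from eth0 (here, on the 10.0.1.0/24 subnet shown below)
# This lasts only until reboot; use /etc/netplan or /etc/network/interfaces for persistence
sudo ip route add 192.168.50.0/24 via 10.0.1.1 dev eth0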

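# Bring an interface DOWN and back UP without rebooting (needs root)
# Useful after changing IP config or when an interface is in a bad state
# If you are logged in over SSH through eth0, run both steps in one command:
# after the 'down', your session can no longer send anything to bring it back 'up'
sudo sh -c 'ip link set eth0 down && ip link set eth0 up'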

# Check ARP cache — maps IP addresses to MAC addresses on your local network
# If a host's MAC shows as INCOMPLETE, ARP resolution is failing — suspect a firewall or VLAN issue
ip neigh show
Output
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP
link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff
inet 10.0.1.15/24 brd 10.0.1.255 scope global dynamic eth0
# ip route show default
default via 10.0.1.1 dev eth0 proto dhcp src 10.0.1.15 metric 100
# ip neigh show
10.0.1.1 dev eth0 lladdr 52:54:00:11:22:33 REACHABLE
10.0.1.20 dev eth0 lladdr 52:54:00:aa:bb:cc STALE
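The INCOMPLETE check from the comment in the script above is easy to automate. A minimal sketch, assuming the `ip neigh show` layout in the output above, where the entry state is the last field; `bad_neighbors` is a hypothetical helper. Note that STALE, as on 10.0.1.20, is normal: the entry is still usable and the kernel re-verifies it the next time traffic needs it.

```bash
#!/usr/bin/env bash
# bad_neighbors (hypothetical helper): print IPs whose ARP/NDP entry is stuck.
# Reads `ip neigh show` output on stdin; the entry state is the last field.
# INCOMPLETE = resolution started but no reply yet; FAILED = resolution gave up.
bad_neighbors() {
  awk '$NF == "INCOMPLETE" || $NF == "FAILED" { print $1, $NF }'
}

# Usage: ip neigh show | bad_neighbors
```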
Watch Out: ip and ifconfig Changes Don't Stick
Any interface change made with ip (or ifconfig) is wiped on reboot. To persist changes, use your distro's config files — netplan YAML files on Ubuntu, nmcli/NetworkManager on RHEL, or /etc/network/interfaces on Debian. Making changes with ip and forgetting to persist them is a classic 'works until the server reboots' trap.
Production Insight
In one incident, a team added a secondary IP with 'ip addr add' during a migration and forgot to persist it.
When the server rebooted for a kernel update, the IP vanished and the application broke.
Rule: always configure persistent network settings in distro-specific files, not ad-hoc ip commands.
Key Takeaway
ip is the single command you need for Layer 2-3 inspection.
ifconfig is deprecated: learn it only well enough to read it on legacy systems.
ip addr, ip route, ip link, ip neigh — these four subcommands cover 90% of daily needs.
When to use ip vs ifconfig
  • If you need to inspect or modify the ARP/NDP cache: use 'ip neigh'. ifconfig cannot manage ARP at all.
  • If you are inside a container or network namespace: use 'ip'. It can also list and enter namespaces (ip netns), which ifconfig cannot.
  • If you only need to show the IP and MAC of local interfaces: both work, but 'ip addr show' is more detailed and also lists secondary addresses that ifconfig can miss.
  • If you are on an ancient system without iproute2: use ifconfig as a fallback, but plan to migrate.

ss and netstat — Seeing Every Open Door on Your Server

Think of your server as a building with thousands of numbered doors (ports). ss and netstat let you see exactly which doors are open, who's standing at each one, and which processes are responsible. This is critical when deploying a new service — you need to know whether port 8080 is already taken before your app fails to bind to it.
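That "is the port already taken?" check can also be scripted, for example as a pre-deploy guard. A rough sketch, assuming the default `ss -tln` column layout (State, Recv-Q, Send-Q, Local Address:Port, ...); `port_in_use` is a hypothetical helper, and it matches on the ":PORT" suffix so that 0.0.0.0:8080, 127.0.0.1:8080 and [::]:8080 all count, while 18080 does not.

```bash
#!/usr/bin/env bash
# port_in_use (hypothetical helper): exit 0 if some socket already LISTENs on
# the given port, 1 otherwise. Reads `ss -tln` output on stdin; column 4 is
# "Local Address:Port" (e.g. 0.0.0.0:8080, 127.0.0.1:5432 or [::]:8080).
port_in_use() {
  awk -v p=":$1" '
    NR > 1 && substr($4, length($4) - length(p) + 1) == p { found = 1 }
    END { exit !found }'
}

# Usage before deploying:
#   if ss -tln | port_in_use 8080; then echo "port 8080 is already taken"; fi
```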

netstat is the old tool; ss (Socket Statistics) is the modern replacement. ss queries the kernel directly over netlink, while netstat parses text files under /proc, which makes ss dramatically faster on systems with thousands of connections. On a busy web server, netstat can take many seconds where ss returns almost instantly.

The real power of ss is in its filtering. You can filter by state (ESTABLISHED, LISTEN, TIME_WAIT), by port, by process, or by remote address. In a microservices environment, you might want to see all connections from this service to the database on port 5432, and ss makes that a one-liner: ss -tn state established '( dport = :5432 )'.

Understanding TCP connection states matters here. TIME_WAIT is normal: the side that closed the connection first keeps the socket around for a short time (60 seconds on Linux) so that late packets from the old connection can't be mistaken for a new one. A flood of TIME_WAIT entries is usually fine. CLOSE_WAIT means the remote side closed but your application hasn't closed its end yet, which often points to a bug in connection handling code.
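To see these states at a glance, you can tally `ss -tan` output by its State column. A minimal sketch under two assumptions: ss spells the states ESTAB, TIME-WAIT, CLOSE-WAIT and so on in column 1, and the default warning threshold of 100 CLOSE-WAIT sockets is only the rule of thumb used in this section's Production Insight, not a kernel limit. `tcp_state_report` is a hypothetical name.

```bash
#!/usr/bin/env bash
# tcp_state_report (hypothetical helper): count TCP sockets per state and warn
# on stderr when CLOSE-WAIT exceeds a threshold (default 100).
# Reads `ss -tan` output on stdin; ss prints a header row first, then the
# state (ESTAB, TIME-WAIT, CLOSE-WAIT, LISTEN, ...) in column 1.
tcp_state_report() {
  local threshold="${1:-100}"
  awk -v limit="$threshold" '
    NR > 1 { count[$1]++ }            # row 1 is the column header
    END {
      for (s in count) print s, count[s]
      if (count["CLOSE-WAIT"] > limit)
        printf("WARNING: %d sockets in CLOSE-WAIT (limit %d); check that the app closes its connections\n", count["CLOSE-WAIT"], limit) > "/dev/stderr"
    }' | sort -k2,2nr                 # busiest states first
}

# Usage: ss -tan | tcp_state_report
```

The warning goes to stderr so the per-state counts on stdout stay clean for graphing or alerting.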

socket_inspection.sh (Bash)
#!/bin/bash
# Practical ss usage patterns for diagnosing connection issues

# Show all LISTENING TCP ports with the process that owns them
# -t = TCP only, -l = listening sockets, -n = numeric (don't resolve hostnames, much faster),
# -p = show the process (requires root or sudo for processes owned by other users)
ss -tlnp

# Show all ESTABLISHED TCP connections, both inbound and outbound
ss -tn state established

# Find what's using a specific port (e.g., 8080)
# The 'sport' filter means 'source port' — the local port your service is bound to
ss -tlnp sport = :8080

# Count connections per remote IP: useful for spotting a single client hammering your API
# With a state filter, ss drops the State column, so the peer address is field 4;
# NR>1 skips the header, and sed strips the trailing :port (also safe for IPv6)
ss -tn state established | awk 'NR>1 {print $4}' | sed 's/:[0-9]*$//' | sort | uniq -c | sort -rn | head -20

# Show all UDP sockets (DNS, DHCP, NTP all use UDP — don't forget to check these)
ss -ulnp

# Show connection summary statistics by state
# Huge TIME_WAIT count is normal; huge CLOSE_WAIT count may indicate a connection leak
ss -s

# The netstat equivalent for those on older systems without ss
# -a = all sockets, -n = numeric, -t = TCP, -p = process (add -u to include UDP)
netstat -antp
Output
# ss -tlnp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1023,fd=3))
LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=2150,fd=6))
LISTEN 0 128 127.0.0.1:5432 0.0.0.0:* users:(("postgres",pid=3201,fd=5))
LISTEN 0 128 0.0.0.0:8080 0.0.0.0:* users:(("node",pid=4422,fd=12))
# ss -s
Total: 234
TCP: 89 (estab 41, closed 12, orphaned 0, timewait 11)
UDP: 8
RAW: 0
FRAG: 0
Pro Tip: 127.0.0.1 vs 0.0.0.0 in the Local Address Column
When ss shows a service listening on 127.0.0.1:5432, it's only reachable from localhost — that's correct for a database. If it shows 0.0.0.0:5432, it's accepting connections from any interface — which may be a security misconfiguration. Always verify your database and internal services are NOT listening on 0.0.0.0 unless you have a firewall rule protecting them.
Production Insight
CLOSE_WAIT is your enemy. A connection stays in CLOSE_WAIT until your own application closes the socket, so if the code never does, it stays there indefinitely, long after the client has gone.
In one event, a proxy pool leaked connections because the proxy waited for data it would never get.
Rule: if you see a steadily growing number of CLOSE_WAIT connections (say, more than 100) on your web server, audit your connection cleanup logic.
Key Takeaway
ss -tlnp is your first command for port listening status.
Watch for 127.0.0.1 vs 0.0.0.0 — security difference.
A growing CLOSE_WAIT count (roughly >100) indicates a connection leak in your app.
TIME_WAIT is normal, not a problem.
Choosing between ss and netstat
  • If the system has netstat installed and you need a quick check: use netstat -antp, but be aware it's slow on busy systems.
  • If you need fast, filtered output on a production server with many connections: use ss. It's faster and supports complex filters.
  • If you need to see process names without root: either tool's -p flag shows your own processes; other users' processes require root/sudo.
  • If you need to analyze UDP or Unix domain sockets: use ss -ulnp for UDP, ss -xl for Unix sockets.

curl, wget and dig — Testing Connectivity Layer by Layer

When a service is unreachable, you need to narrow down the layer where things break. Is it DNS? TCP connectivity? HTTP routing? TLS? curl is exceptional here because it can test each layer independently and gives you precise timing data. dig is your dedicated DNS debugging tool, and together they let you methodically eliminate suspects.

curl's --verbose flag is one of the most useful things in networking debugging. It shows you the DNS resolution, the TCP handshake, the TLS negotiation, and the HTTP headers — all in sequence. When your HTTPS endpoint is slow, curl -w timing reveals whether the slowness is in DNS, in TCP, in TLS, or in the actual server response time.
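One detail trips people up: every curl -w timer is cumulative, measured from the start of the request, so time_appconnect is not the TLS handshake on its own. A small sketch that converts the five cumulative values into per-phase durations; `curl_phases` is a hypothetical helper, and the commented usage assumes bash (for process substitution) and the placeholder URL used elsewhere in this article.

```bash
#!/usr/bin/env bash
# curl_phases (hypothetical helper): convert curl's cumulative -w timers into
# per-phase durations. Arguments, all in seconds since the request started:
#   $1 time_namelookup  $2 time_connect  $3 time_appconnect
#   $4 time_starttransfer  $5 time_total
curl_phases() {
  awk -v dns="$1" -v tcp="$2" -v tls="$3" -v ttfb="$4" -v total="$5" 'BEGIN {
    ready = (tls > 0) ? tls : tcp     # time_appconnect is 0 for plain HTTP
    printf "%-8s%.3fs\n", "dns", dns
    printf "%-8s%.3fs\n", "tcp", tcp - dns
    printf "%-8s%.3fs\n", "tls", ready - tcp
    printf "%-8s%.3fs\n", "server", ttfb - ready   # request sent + server think time
    printf "%-8s%.3fs\n", "body", total - ttfb
  }'
}

# Usage (bash):
#   read -r dns tcp tls ttfb total < <(curl -o /dev/null -s \
#     -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n' \
#     https://api.example.com/health)
#   curl_phases "$dns" "$tcp" "$tls" "$ttfb" "$total"
```

Fed the sample timings from the output below (0.012, 0.034, 0.089, 0.201, 0.203), this reports a 0.055s TLS handshake and 0.112s of server time, which is the comparison you actually want when deciding where the slowness lives.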

dig is purpose-built for DNS and goes far beyond what you can learn from ping or curl. You can query specific record types, target specific nameservers, and trace the full delegation chain from root to authoritative server. This matters when you're investigating DNS propagation issues after a domain change, or debugging split-horizon DNS in a VPN setup.
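Before comparing answers across nameservers, it helps to know which resolver the host is actually configured to use. A minimal sketch, assuming a standard resolv.conf layout (one `nameserver IP` line per resolver); `list_nameservers` is a hypothetical helper, and 8.8.8.8 in the usage loop is simply the public reference resolver used elsewhere in this article. Run it inside a container when debugging one, because the container has its own resolv.conf.

```bash
#!/usr/bin/env bash
# list_nameservers (hypothetical helper): print the resolvers a resolv.conf-style
# file points at. Defaults to /etc/resolv.conf; comment lines and other
# directives (search, options) are ignored.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage: ask every configured resolver, plus a public one, for the same name
#   for ns in $(list_nameservers) 8.8.8.8; do
#     echo "== $ns"; dig @"$ns" api.example.com A +short
#   done
```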

wget is simpler and better for quick file downloads or when you need recursive mirroring. For API testing and network diagnosis, curl is almost always the right choice.

connectivity_layer_testing.sh (Bash)
#!/bin/bash
# Test each networking layer independently to isolate where failures occur

# ─── DNS TESTING WITH dig ───────────────────────────────────────────────

# Basic A record lookup — what IP does this hostname resolve to?
dig api.example.com A

# Query a SPECIFIC nameserver (e.g., Google's 8.8.8.8) to bypass your local resolver
# Useful when debugging 'works on my machine' DNS issues caused by local caching
dig @8.8.8.8 api.example.com A

# Trace the full DNS delegation from root servers down to the authoritative nameserver
# This is the gold standard for debugging DNS propagation issues
dig +trace api.example.com

# Check MX records (mail routing) — +short gives just the values, no noise
dig api.example.com MX +short

# Reverse DNS lookup — find the hostname for an IP address
dig -x 93.184.216.34 +short

# ─── HTTP/HTTPS TESTING WITH curl ───────────────────────────────────────

# Follow redirects (-L), show verbose output (-v) — see every step of the connection
curl -Lv https://api.example.com/health

# Time each phase of the connection: invaluable for performance diagnosis
# Each timer is cumulative (seconds since the request started), not a per-phase duration
curl -o /dev/null -s -w "
DNS lookup:        %{time_namelookup}s
TCP connect:       %{time_connect}s
TLS handshake:     %{time_appconnect}s
Time to first byte: %{time_starttransfer}s
Total time:        %{time_total}s
HTTP status code:  %{http_code}
" https://api.example.com/health

# Test an API endpoint with a JSON POST body — simulates what your app does
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token-here" \
  -d '{"username": "alice", "email": "alice@example.com"}' \
  --max-time 10  # fail after 10 seconds instead of hanging forever

# Test TLS certificate details — check expiry, issuer, and SANs
curl -vI https://api.example.com 2>&1 | grep -E '(subject|issuer|expire|SSL)'

# Skip TLS verification (ONLY for debugging self-signed certs — never in production)
curl -k https://internal-service.local/health
Output
# dig api.example.com A +short
93.184.216.34
# curl timing output
DNS lookup: 0.012s
TCP connect: 0.034s
TLS handshake: 0.089s
Time to first byte: 0.201s
Total time: 0.203s
HTTP status code: 200
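# The timers are cumulative: subtract time_connect from time_appconnect to get the TLS handshake alone
# If the DNS lookup takes 2+ seconds, your resolver is slow or the record isn't cached
# If the TLS phase takes 500ms+, suspect certificate chain or OCSP issues
# If the gap between the TLS handshake and the first byte is large while TLS itself is fast, the app is the bottleneck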
Interview Gold: The Layers of a Failed Request
Interviewers love asking 'how would you debug a service that users say is slow?' The layered answer (check DNS with dig, check TCP with ss/telnet, check HTTP timing with curl -w) demonstrates real operational thinking. Always start with the first thing every request depends on (DNS resolution) and work up the stack. Jumping straight to application logs while skipping network diagnostics is the classic junior mistake.
Production Insight
A production outage was caused by a TLS handshake taking 3 seconds because the server was doing OCSP stapling with a slow responder.
curl -w revealed TLS time was 10x normal, but the app logs showed nothing.
Rule: always isolate timing with curl -w before blaming the application code.
Key Takeaway
dig first, then curl -w, then investigate the slowest phase.
curl -v shows every layer of the connection.
Never jump to app logs without first ruling out DNS and network timing issues.
Which tool for which layer?
  • If you need to verify DNS resolution for a hostname: use dig. Start with @8.8.8.8 to bypass the local resolver.
  • If you need to test the full HTTP/HTTPS path including headers and response: use curl with -v, or -w for timing.
  • If you need to download a file or mirror a website: use wget for its recursive download capability.
  • If you need to check the TLS certificate chain and expiry: use 'curl -vI https://host' and grep for subject/issuer/expire.

ping, traceroute and tcpdump — Tracing the Path and Catching Packets

Once you've confirmed DNS resolves correctly and the service is listening, the next question is whether packets are actually reaching their destination. ping tells you if a host is reachable and measures round-trip time. traceroute shows you every hop between you and the destination. tcpdump lets you actually capture and inspect raw packets on the wire.

ping is often misused as a binary 'is it up?' test, but it's more nuanced than that. ICMP packets (what ping uses) can be blocked by firewalls while TCP traffic flows fine. A failed ping doesn't mean a service is down — it might just mean ICMP is blocked. Always follow up a failed ping with a TCP-level check.
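If nc or telnet isn't installed, bash itself can make that TCP-level check. A sketch assuming bash was built with /dev/tcp support (most current distributions ship it enabled) and that coreutils `timeout` is available; `tcp_port_open` is a hypothetical helper. A zero exit status means the TCP handshake completed; a refused connection or a timeout gives a non-zero status.

```bash
#!/usr/bin/env bash
# tcp_port_open (hypothetical helper): TCP reachability test that works even
# when ICMP is blocked. bash opens the connection through its /dev/tcp/HOST/PORT
# redirection; timeout(1) stops the attempt after SECS seconds (default 3).
tcp_port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage:
#   if tcp_port_open api.example.com 443; then echo "TCP 443 reachable"; else echo "no TCP connection"; fi
```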

traceroute reveals the routing path and where latency is introduced. Each hop shows you a router, and timing spikes between hops show you where delays occur. When a cloud VM can't reach an external API, traceroute often reveals the packet dying at a NAT gateway or security group that's silently dropping traffic.

tcpdump is the most powerful of the three but also the most complex. It captures actual packet data, which is essential for diagnosing issues that higher-level tools can't see, like retransmissions, RST floods, or malformed HTTP headers. For complex analysis, save the capture to a file and open it in Wireshark.
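Because an unfiltered capture is the usual way tcpdump hurts a production box, some teams wrap it so that a filter is mandatory. A sketch of that idea; `safe_capture` and its DRY_RUN switch are hypothetical, while -i, -n, -w, -C and -W are standard tcpdump options. With -W, tcpdump rotates through a fixed number of files instead of creating new ones indefinitely, which is what actually bounds disk usage.

```bash
#!/usr/bin/env bash
# safe_capture (hypothetical wrapper): refuse to run tcpdump without a filter,
# and always write to a size-capped ring of files instead of the terminal.
#   -C 100 : start a new file after ~100 MB (tcpdump counts in millions of bytes)
#   -W 5   : keep at most 5 files, then overwrite the oldest
# Set DRY_RUN=1 to print the command instead of running it with sudo.
safe_capture() {
  local iface="$1" out="$2"
  shift 2 || return 2
  if [ "$#" -eq 0 ]; then
    echo "usage: safe_capture IFACE OUTFILE FILTER..." >&2
    return 2
  fi
  local cmd=(tcpdump -i "$iface" -n -w "$out" -C 100 -W 5 "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    sudo "${cmd[@]}"
  fi
}

# Usage: safe_capture eth0 /tmp/capture.pcap host 10.0.1.20 and port 443
```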

packet_path_tracing.sh (Bash)
#!/bin/bash
# Trace packet paths and capture traffic for deep network diagnosis
# Note: tcpdump needs root (or CAP_NET_RAW), so prefix the tcpdump commands with sudo

# ─── PING ───────────────────────────────────────────────────────────────

# Basic connectivity check — send 5 packets, show statistics
ping -c 5 8.8.8.8

# Ping with a larger packet size to test MTU issues (1472 bytes + 28 byte IP/ICMP header = 1500 MTU)
# If this fails but small pings work, you likely have an MTU mismatch (common with VPNs)
ping -c 4 -s 1472 -M do 8.8.8.8  # -M do means 'don't fragment'

# ─── TRACEROUTE ─────────────────────────────────────────────────────────

# Standard traceroute using UDP probes
traceroute api.example.com

# Use TCP SYN probes on port 443 instead of UDP (needs root)
# More likely to get through firewalls that block ICMP and UDP probes
sudo traceroute -T -p 443 api.example.com

# mtr combines ping and traceroute — shows live packet loss per hop
# This is often the single best tool for diagnosing intermittent routing issues
mtr --report --report-cycles 20 api.example.com

# ─── TCPDUMP ────────────────────────────────────────────────────────────

# Capture all traffic on eth0 (CTRL+C to stop); avoid this unfiltered form on busy servers
# -n = don't resolve hostnames (faster), -i = interface
tcpdump -i eth0 -n

# Capture only web traffic (HTTP on port 80, HTTPS on port 443) to/from a specific host
# 'host' filters by IP in either direction
tcpdump -i eth0 -n host 10.0.1.20 and \( port 80 or port 443 \)

# Capture DNS queries — see what your server is resolving and to which nameserver
tcpdump -i eth0 -n port 53

# Save capture to a file for later analysis in Wireshark
# -w writes raw packets; -C 100 starts a new file every ~100 MB (millions of bytes),
# and -W 5 keeps at most 5 files, overwriting the oldest, so the disk can't fill up
tcpdump -i eth0 -n -w /tmp/capture.pcap -C 100 -W 5 host 10.0.1.20

# Capture and print packet payloads in ASCII: see the actual HTTP request/response text
# -A = ASCII output, -s 0 = capture full packet (no truncation),
# -l = line-buffer output so grep sees each packet immediately instead of in large chunks
tcpdump -i eth0 -n -l -A -s 0 port 8080 | grep -E '(GET|POST|HTTP|Host:)'
Output
# ping -c 5 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=4.21 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=4.18 ms
--- 8.8.8.8 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss
rtt min/avg/max/mdev = 4.18/4.22/4.31/0.04 ms
# traceroute -T -p 443 api.example.com
traceroute to api.example.com (93.184.216.34), 30 hops max
1 10.0.1.1 (10.0.1.1) 0.432 ms
2 203.0.113.1 (203.0.113.1) 3.211 ms
3 * * * <-- asterisks mean this hop is not responding (ICMP filtered)
4 93.184.216.34 12.44 ms
# mtr output (after 20 cycles)
Host Loss% Snt Last Avg Best Wrst
1. 10.0.1.1 0.0% 20 0.4 0.4 0.3 0.6
2. 203.0.113.1 0.0% 20 3.2 3.1 2.9 3.8
3. ??? -- hop doesn't respond to probes
4. 93.184.216.34 0.0% 20 12.4 12.3 12.0 13.1
Pro Tip: Asterisks in traceroute Don't Mean Packet Loss
Asterisks (*) on a traceroute hop mean that router isn't sending back the ICMP Time Exceeded replies traceroute relies on, not that your packets aren't passing through it. If later hops appear and the final destination is reachable, that router is simply configured not to answer probes. Only worry about asterisks if every subsequent hop also shows asterisks.
Production Insight
A critical incident: 'ping' to an API returned fine, but curl failed with connection timeouts.
tcpdump on the server showed TCP SYNs arriving but no SYN-ACKs going back: the server was dropping them.
It turned out an iptables rule dropped TCP traffic from that source IP while still allowing ICMP.
Rule: ping is not a proxy for TCP connectivity. Always test on the same protocol.
Key Takeaway
tcpdump is the last resort, not the first.
Always try ping -c 5, traceroute -T, and mtr before tcpdump.
A failed ping does not mean the service is down — verify with curl or nc.
When to use ping, traceroute or tcpdump
  • If you want to check basic reachability and latency to a host: use ping, but remember ICMP may be blocked.
  • If you need to see the full path and identify latency spikes: use traceroute -T -p 443 for TCP-based probes, or mtr for continuous monitoring.
  • If you need to inspect actual packet contents (e.g., malformed packets, retransmissions): use tcpdump with filters. Always write to a file with -w and bound disk use with -C and -W.
  • If you suspect packet loss or MTU issues: use mtr to measure loss per hop, and ping with -M do and varying -s sizes to find the MTU limit.
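The "vary -s until it fails" approach to finding the MTU limit can be done as a binary search. A minimal sketch, assuming iputils ping (where -M do sets Don't Fragment and -W is the reply timeout in seconds) and IPv4, where path MTU = ICMP payload + 28 bytes. `find_max_payload` and the `df_ping` probe are hypothetical; taking the probe as a parameter lets the search logic be tested without a network. It also assumes the path behaves consistently: sizes at or below the limit pass and everything above fails.

```bash
#!/usr/bin/env bash
# find_max_payload (hypothetical helper): binary-search the largest ICMP payload
# that gets through with Don't Fragment set. PROBE is a command that receives a
# payload size and exits 0 if a reply came back.
# Usage: find_max_payload PROBE [LOW=0] [HIGH=1472]
find_max_payload() {
  local probe="$1" lo="${2:-0}" hi="${3:-1472}" mid
  "$probe" "$lo" || return 1           # even the smallest size fails: not an MTU problem
  while (( lo < hi )); do
    mid=$(( (lo + hi + 1) / 2 ))       # round up so the loop always makes progress
    if "$probe" "$mid"; then lo=$mid; else hi=$(( mid - 1 )); fi
  done
  echo "$lo"
}

# Example probe for a real host (TARGET is a placeholder):
#   df_ping() { ping -c 1 -W 1 -M do -s "$1" TARGET >/dev/null 2>&1; }
#   payload=$(find_max_payload df_ping) && echo "path MTU ~ $(( payload + 28 ))"
```

A binary search needs about 11 probes to cover 0 to 1472 bytes, versus hundreds if you step through sizes one at a time.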

Network Performance Monitoring — Bandwidth, Throughput and Bottlenecks

Production servers don't just need connectivity — they need performance. When a service becomes slow, you need to know whether the bottleneck is your server's network interface, the application itself, or somewhere in between. Tools like nload, iftop, iptraf-ng, and nethogs give you real-time bandwidth and per-process traffic data.

nload shows total incoming and outgoing traffic on each interface with a live graph. iftop shows traffic per connection — which remote IPs are consuming the most bandwidth. iptraf-ng adds detailed statistics per protocol and interface. nethogs breaks down traffic per process, so you can see which application is saturating the link.
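Interface-level totals like nload's come from the kernel's per-interface byte counters, so on a host where none of these tools is installed you can approximate the same reading by sampling a counter twice. A sketch assuming Linux sysfs (/sys/class/net/IFACE/statistics/rx_bytes, readable without root); `rate_mbps` and `iface_rx_mbps` are hypothetical helpers, and 1 Mbit is taken as 1,000,000 bits.

```bash
#!/usr/bin/env bash
# rate_mbps (hypothetical helper): throughput in Mbit/s from two byte-counter
# readings taken SECONDS apart.  Usage: rate_mbps BEFORE AFTER SECONDS
rate_mbps() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.2f\n", (b - a) * 8 / s / 1000000 }'
}

# iface_rx_mbps (hypothetical helper): sample the kernel RX byte counter for an
# interface twice and print the average receive rate over the interval.
iface_rx_mbps() {
  local iface="$1" secs="${2:-1}" f before after
  f="/sys/class/net/$iface/statistics/rx_bytes"
  [ -r "$f" ] || { echo "no such interface: $iface" >&2; return 1; }
  before=$(<"$f"); sleep "$secs"; after=$(<"$f")
  rate_mbps "$before" "$after" "$secs"
}

# Usage: iface_rx_mbps eth0 5    (use tx_bytes instead of rx_bytes for outgoing)
```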

A common production scenario: a misconfigured backup job or a rogue cron script starts transferring gigabytes of data and saturates the NIC. nethogs reveals 'python3' as the culprit. iftop shows the destination IP is an internal backup server. You then find the backup script is running on the wrong schedule.

For throughput testing, iperf3 is essential. It measures actual TCP/UDP throughput between two hosts, revealing issues like bufferbloat or misconfigured flow control that won't appear in ping.

performance_monitoring.sh (Bash)
#!/bin/bash
# Real-time network performance monitoring tools

# ─── INSTALLATION ───────────────────────────────────────────────────────
# sudo apt install nload iftop iptraf-ng nethogs iperf3 -y

# ─── NLOAD: total bandwidth per interface ───────────────────────────────
# Pass an interface name, or run without arguments and switch between interfaces with the arrow keys
nload eth0

# ─── IFTOP: bandwidth per connection ────────────────────────────────────
# Shows top talkers; press 'p' to toggle port display
sudo iftop -i eth0

# ─── IPTRAF-NG: per-protocol statistics ────────────────────────────────
sudo iptraf-ng
# Then choose 'IP traffic monitor' or 'Detailed interface statistics'

# ─── NETHOGS: bandwidth per process ────────────────────────────────────
sudo nethogs eth0

# ─── IPERF3: throughput test between two hosts ──────────────────────────
# Start the server on host B (listens on port 5201 by default; allow it through the firewall)
iperf3 -s

# Connect from host A (client)
iperf3 -c 10.0.1.20 -t 30  # run for 30 seconds

# Test UDP throughput (includes jitter and packet loss)
iperf3 -c 10.0.1.20 -u -b 100M -t 30

# ─── CHECK INTERFACE ERRORS ────────────────────────────────────────────
# Look at 'errors', 'dropped', 'overruns' counters
ip -s link show eth0
Output
# ip -s link show eth0 (partial)
RX: bytes packets errors dropped overrun mcast
12G 15M 0 0 0 5.2M
TX: bytes packets errors dropped carrier collsns
8.4G 11M 0 0 0 0
# nload output (live updating)
Device eth0 (10.0.1.15):
Incoming: 12.34 Mbps (Cumulative: 1.2 GB)
Outgoing: 5.67 Mbps (Cumulative: 500 MB)
# iperf3 client result example
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-30.00 sec 3.48 GBytes 998 Mbits/sec sender
[ 4] 0.00-30.00 sec 3.48 GBytes 997 Mbits/sec receiver
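The error and drop counters that `ip -s link show` summarizes are also exposed as individual files in sysfs, which makes them easy to watch from a script or cron job. A minimal sketch; `nic_error_counters` is a hypothetical helper, the counter file names are the standard ones under /sys/class/net/IFACE/statistics, and the directory is a parameter only so the function can be tested against a copy.

```bash
#!/usr/bin/env bash
# nic_error_counters (hypothetical helper): print the non-zero error and drop
# counters for an interface. Optional second argument overrides the statistics
# directory (default /sys/class/net/IFACE/statistics).
nic_error_counters() {
  local dir="${2:-/sys/class/net/$1/statistics}" name val
  for name in rx_errors rx_dropped rx_over_errors tx_errors tx_dropped tx_carrier_errors collisions; do
    [ -r "$dir/$name" ] || continue    # skip counters this driver does not expose
    val=$(<"$dir/$name")
    [ "$val" -ne 0 ] && echo "$name $val"
  done
  return 0
}

# Usage: nic_error_counters eth0   (no output means all counters are zero)
```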
Bandwidth Monitoring for Incident Response
When the on-call phone rings with 'the API is slow', the first thing after checking CPU and memory is to check network bandwidth. If the interface is saturated, a process-level tool like nethogs will tell you exactly which PID is causing it. In one case, a developer left a large database dump running on a cron job that overlapped with peak traffic — nethogs caught it in 30 seconds.
Production Insight
A burst of outgoing traffic from an unexpected process is a common cause of network latency in production.
In one case, nethogs revealed a node.js process sending gigabytes of logs to a centralized logging server.
Rule: when latency spikes without a traffic increase, check the NIC's dropped and overrun counters with 'ip -s link show eth0'.
Key Takeaway
When a service slows down, check network bandwidth before blaming the app.
nethogs shows which process is using the interface.
iperf3 tells you the actual throughput capacity, not just ping latency.
Choosing a performance monitoring tool
  • If you need total bandwidth per interface at a glance: use nload. It's simple, real-time, and needs no root.
  • If you need to see which remote IPs consume the most bandwidth: use iftop. Good for identifying DDoS or heavy client patterns.
  • If you need to know which process is generating traffic: use nethogs. Perfect for finding misbehaving applications.
  • If you need to measure the maximum achievable throughput between two hosts: use iperf3. Essential for network capacity planning and troubleshooting flow control.
Production incident · POST-MORTEM · Severity: high

The Silent Blackhole: When a New Server Stops Responding After a Router Reboot

Symptom
Server accessible via IPMI/console. ping responses normal, but HTTP clients got timeouts. tcpdump showed no incoming TCP SYN packets despite ping replies. From the server itself, outgoing traffic worked fine.
Assumption
Firewall rules on the server or upstream ACLs are blocking inbound traffic. The engineering team spent 45 minutes reviewing security groups and iptables rules.
Root cause
The router's ARP cache held the old MAC address for the server's IP from a previous deployment. The router forwarded packets to the old MAC, which belonged to a different (now decommissioned) host. The new server's NIC never received the traffic. ARP broadcast was failing because the new server's ARP table was stuck with a stale entry from a duplicate IP conflict during provisioning.
Fix
Cleared ARP cache on both the server and the upstream router. On the server: 'ip neigh flush all'. On the router: cleared ARP entry for the server's IP. Then forced a gratuitous ARP broadcast from the server by bouncing the interface with 'ip link set eth0 down && ip link set eth0 up'. Verified with 'arping -c 3 -I eth0 SERVER_IP'.
Key lesson
  • Always verify both arp and route tables after server provisioning or network changes.
  • A single ping reply does not mean the server is reachable on all protocols — ARP is link-layer.
  • After moving IPs between physical hosts, flush the ARP cache on the local subnet's gateway.
Production debug guide: quick reference for common network failure symptoms and the exact commands to run.
Symptom · 01
Service unreachable, no response from curl
Fix
Check DNS first: dig @8.8.8.8 service.example.com. Then check local listener: ss -tlnp | grep :PORT. Then check remote: nc -zv TARGET_IP PORT. If no response, check route: ip route get TARGET_IP.
Symptom · 02
Intermittent timeouts to an external API
Fix
Run curl with timing: curl -w '@timing_format' -o /dev/null -s https://api.example.com, where timing_format is a file containing a -w template like the one in the curl section. Check for packet loss: mtr --report --report-cycles 20 api.example.com. Compare to a known-good endpoint.
Symptom · 03
Server sees incoming connections but app doesn't
Fix
Check ss -tlnp to confirm process is listening on expected IP (0.0.0.0 vs 127.0.0.1). Check app logs for bind errors. Verify kernel reverse path filtering: sysctl net.ipv4.conf.all.rp_filter.
Symptom · 04
High latency to a single hop in traceroute
Fix
Run mtr --report --report-cycles 50 TARGET for packet loss per hop. If latency spike is consistent, check interface errors: ip -s link show dev eth0. If the hop is your own router, check ARP table: ip neigh show.
Quick Debug Cheat Sheet: five-second answers for the most common networking emergencies.
Can't reach any external host
Immediate action
Check default route and DNS
Commands
ip route show default
dig @8.8.8.8 google.com
Fix now
If default route missing: 'ip route add default via GATEWAY_IP dev eth0'. If DNS fails: check /etc/resolv.conf and restart systemd-resolved.
Port 8080 shows as already in use
Immediate action
Find the process using the port
Commands
ss -tlnp sport = :8080
ps -p PID -o pid,user,args
Fix now
If you want to free the port: stop the service, or run 'kill PID' (escalate to 'kill -9 PID' only if it ignores SIGTERM). If you want to keep the process, change your app's config to use a different port.
curl returns 'Connection refused'
Immediate action
Check if service is listening on expected interface
Commands
ss -tlnp | grep PORT
netstat -antp | grep PORT (if ss not available)
Fix now
If not listening: start the service. If listening on 127.0.0.1 only: reconfigure the service to listen on the external interface's address (or 0.0.0.0), and make sure a firewall rule protects it.
DNS resolution is slow (2+ seconds)
Immediate action
Query a public resolver to isolate the problem
Commands
dig @8.8.8.8 example.com +stats
cat /etc/resolv.conf
Fix now
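If the local resolver is slow: check systemd-resolved or dnsmasq. If your ISP's resolver is slow: switch to public DNS (8.8.8.8, 1.1.1.1). Edit /etc/resolv.conf only if nothing regenerates it; on systemd-resolved hosts, set DNS= in /etc/systemd/resolved.conf instead.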
Linux Networking Commands Quick Reference
Command | Layer | Best use case | Requires root? | Modern alternative
ifconfig | Layer 3 (IP) | Legacy interface config inspection | For changes, yes | ip addr
ip | Layer 2-3 | Interfaces, routes, ARP, namespaces | For changes, yes | Still current
netstat | Layer 4 (TCP/UDP) | Port and socket inspection | For -p on other users' processes | ss
ss | Layer 4 (TCP/UDP) | Fast socket stats, large systems | For -p on other users' processes | Still current
ping | Layer 3 (ICMP) | Basic reachability and latency check | No | Still current
traceroute | Layer 3 | Routing path and hop latency | Only for -T (TCP probes) | mtr (combines with ping)
tcpdump | Layer 2-7 | Raw packet capture and inspection | Yes | tshark (CLI Wireshark)
curl | Layer 7 (HTTP/S) | API testing, TLS and timing diagnosis | No | Still current
dig | Layer 7 (DNS) | DNS record queries and propagation debugging | No | drill (alternative on some distros)
nload | Layer 2-4 | Real-time total bandwidth per interface | No | Still current
iftop | Layer 4 | Bandwidth per connection | Yes | Still current
nethogs | Layer 4 | Bandwidth per process | Yes | Still current
iperf3 | Layer 4 | Throughput and jitter measurement | No | Still current

Key takeaways

1. ip and ss are the modern replacements for ifconfig and netstat. Learn ip addr, ip route, and ss -tlnp as your default starting point on any system.
2. Always debug network failures layer by layer: DNS first with dig, TCP connectivity second with ss/curl, then HTTP/TLS with curl -v, and packet-level last with tcpdump. Jumping layers wastes time.
3. tcpdump is your last resort, not your first. Always filter by host and port, always write to a file with -w, and bound disk use with -C and -W to avoid filling the disk on production servers.
4. Asterisks in traceroute and a failed ping are not proof of failure. ICMP is routinely blocked by firewalls while TCP traffic flows freely, so always verify with a TCP-level tool before declaring a host unreachable.
5. When a service slows, check network bandwidth with nload or nethogs before blaming the application; the problem is often a rogue process saturating the interface.

Common mistakes to avoid


Using ping to confirm a service is down

Symptom
ping fails so you conclude the server is unreachable, but SSH and HTTP still work
Fix
ICMP is frequently blocked by cloud security groups and firewalls. Always follow a failed ping with a TCP check: 'curl -v http://hostname' or 'nc -zv hostname 22' to confirm the service is actually listening.

Running tcpdump without a filter on a busy server

Symptom
Terminal freezes or disk fills within seconds on a server handling thousands of connections per second
Fix
Always specify a host, port, or protocol filter: 'tcpdump -i eth0 host 10.0.1.20 and port 443'. Add '-w filename.pcap' to write to a file instead of flooding stdout, and use '-C 50 -W 4' to rotate through at most four files of about 50 MB each.

Trusting a single DNS resolver when debugging resolution failures

Symptom
'dig api.example.com' returns an IP but the service still can't connect from your app
Fix
Your app may be using a different resolver than your shell. Use 'dig @127.0.0.53 api.example.com' to query systemd-resolved specifically, 'cat /etc/resolv.conf' to see which resolver is configured, and compare results. Containers often have their own /etc/resolv.conf pointing to a different resolver entirely.

Assuming that a high TIME_WAIT count indicates a problem

Symptom
You see thousands of TIME_WAIT connections in ss output and think the server is misconfigured
Fix
TIME_WAIT is a normal TCP state. It means the connection is closing and waiting for late packets. It's harmless unless the socket table is full (which is rare on modern kernels). Spend your time investigating CLOSE_WAIT instead — that's the real connection leak indicator.

Interview Questions on This Topic

Q01 (Senior)
A user reports that a web service is intermittently unreachable but your...
Q02 (Senior)
What's the difference between a port that shows as LISTEN on 127.0.0.1 v...
Q03 (Junior)
You run traceroute to an external API endpoint and see the third hop ret...
Q04 (Senior)
How would you measure the actual available bandwidth between two cloud i...
Q01 of 04 (Senior)

A user reports that a web service is intermittently unreachable but your monitoring shows the server is up. Walk me through exactly how you would diagnose this — which commands would you run and in what order?

ANSWER
I'd start by isolating the problem layer by layer:
1. DNS: dig @8.8.8.8 service.example.com to check resolution.
2. TCP connectivity: nc -zv service.example.com 443 from a client outside the cluster.
3. Local listener: ssh into the server and run ss -tlnp | grep :443 to confirm the process is listening.
4. Firewall: check iptables -L and cloud security groups.
5. Application logs: if all network layers work, check the app's logs for errors.
Because the failures are intermittent, I'd also run curl -w every few seconds and correlate the failures with the timing data and response codes.

Frequently Asked Questions

01
What is the difference between ss and netstat in Linux?
02
How do I find which process is using a specific port in Linux?
03
Why does ping fail but the service is still accessible in a browser?
04
What does 'ss -tlnp' stand for and when would I use it?
05
How can I capture traffic for later analysis without overloading my production server?