Intermediate 6 min · March 06, 2026

Linux Networking Commands — Why Ping Works But TCP Fails

Q: What is the difference between ss and netstat in Linux?

Both show socket and port information, but ss is the modern replacement for netstat. ss communicates directly with the Linux kernel via netlink sockets, which makes it far faster on systems with thousands of connections. netstat relies on parsing /proc filesystem files, which becomes slow under high load. On modern distributions (Ubuntu 20.04+, RHEL 8+), netstat may not even be installed by default — use ss -tlnp instead.

Q: How do I find which process is using a specific port in Linux?

Run 'sudo ss -tlnp sport = :PORT_NUMBER' replacing PORT_NUMBER with the port you're investigating — for example 'sudo ss -tlnp sport = :8080'. The -p flag shows the process name and PID. You need sudo because processes owned by other users aren't visible without root privileges. Alternatively, 'sudo lsof -i :8080' also works and gives similar output.

Q: Why does ping fail but the service is still accessible in a browser?

ping uses ICMP, which is a completely different protocol from TCP (which HTTP and HTTPS use). Cloud firewalls commonly drop it: AWS security groups, for example, deny all inbound traffic, ICMP included, unless a rule explicitly allows it. So a host can be fully accessible via HTTP on port 80 while not answering ping at all. This is often intentional, because answering ICMP makes network reconnaissance easier. Always use a TCP-based check like 'curl -v http://hostname' before concluding a service is unreachable; a failed ping alone proves nothing.

Q: What does 'ss -tlnp' stand for and when would I use it?

It stands for: -t (TCP sockets only), -l (listening sockets only), -n (numeric output: don't resolve hostnames or port names), -p (show process name and PID). You use this command immediately after deploying a new service to verify it's listening on the expected port and interface. For example, after starting a Node.js app on port 3000, run 'ss -tlnp | grep 3000' to confirm it's listening.

Q: How can I capture traffic for later analysis without overloading my production server?

Use tcpdump with strict filters and bounded output: 'sudo tcpdump -i eth0 -w /tmp/capture.pcap -C 100 -W 5 host 10.0.1.20 and port 443'. This writes raw packets to a file instead of printing them, starts a new file every 100 MB (-C counts millions of bytes), and with -W 5 keeps at most five files, overwriting the oldest, so the capture never uses more than about 500 MB. The filter keeps only traffic to/from that host on port 443. After the capture, copy the .pcap files to your laptop and analyze them in Wireshark for deep inspection.

Ping responses normal but TCP SYN packets vanished due to a stale ARP cache.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

✓ Production tested · last updated July 19, 2026

Before you start⏱ 25 min

✓Solid grasp of DevOps fundamentals
✓Comfortable with command-line tools
✓Basic Linux administration knowledge


⚡Quick Answer

The ip command replaces ifconfig for interface and route management
ss replaces netstat for socket statistics and is dramatically faster on busy servers
dig tests the DNS layer; curl -w gives exact timing for each DNS, TCP, and TLS phase
tcpdump needs filters: always specify a port or host to avoid flooding the disk
ping and traceroute can mislead when ICMP is blocked — verify with TCP probes

✦ Definition (~90s read)

What Are Linux Networking Commands?

Linux networking commands are the system administrator's toolkit for diagnosing, debugging, and monitoring network connectivity from the command line. They exist because Linux servers rarely have a GUI, and network problems—whether a misconfigured firewall, a dead route, or a saturated link—require precise, low-level interrogation.


These commands let you inspect interfaces (ip, ifconfig), check listening ports (ss, netstat), test application-layer reachability (curl, wget), resolve DNS (dig), trace packet paths (traceroute), and capture raw traffic (tcpdump). Without them, you're flying blind in a distributed system.

The classic distinction between ICMP (ping) and TCP failures is the bread and butter of network troubleshooting. Ping works because ICMP is often allowed through firewalls and doesn't require a listening service—it tests Layer 3 reachability. TCP fails when a port is closed, a firewall drops SYN packets, or an application isn't running.

Understanding this gap is why you need ss to verify a service is actually listening, curl to test HTTP specifically, and tcpdump to see exactly where packets die. The modern replacements (ip over ifconfig, ss over netstat) exist because the old tools couldn't handle namespaces, multiple routing tables, or the scale of modern containerized networks.

In practice, you'll use these commands in a layered approach: start with ping to confirm basic IP connectivity, then dig to verify DNS resolution, then curl to test the application port, and finally tcpdump or traceroute to pinpoint where the chain breaks. For performance, iperf3 measures throughput, ss shows connection queues, and tc -s qdisc (traffic control statistics) exposes queue backlog and drops that point to bottlenecks.
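The layered order above can be sketched as a small Bash function. This is an illustrative sketch, not a standard tool: the function names (check_icmp, check_dns, check_tcp, check_http, triage) are hypothetical, and it assumes iputils ping, dig, curl, and coreutils timeout are installed. A failed ping only produces a warning, since ICMP is often filtered on purpose.

```bash
#!/usr/bin/env bash
# Hypothetical layered triage sketch: ICMP -> DNS -> TCP -> HTTP.
# Each check is its own function so individual probes can be swapped out.

check_icmp() { ping -c 2 -W 2 "$1" >/dev/null 2>&1; }
# dig +short prints nothing when the name has no A record
check_dns()  { [ -n "$(dig +short "$1" A 2>/dev/null)" ]; }
# Open a TCP connection via bash's /dev/tcp pseudo-device, give up after 3s
check_tcp()  { timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; }
# -f makes curl fail on HTTP errors (4xx/5xx), not just connection errors
check_http() { curl -fsS -o /dev/null --max-time 5 "$1" 2>/dev/null; }

triage() {
  local host=$1 port=$2 url=$3
  # ICMP failure is only a warning: firewalls often drop it deliberately
  check_icmp "$host" || echo "warn: no ICMP reply from $host (may be filtered)"
  check_dns "$host" || { echo "fail: DNS lookup for $host"; return 1; }
  check_tcp "$host" "$port" || { echo "fail: TCP connect to $host:$port"; return 1; }
  check_http "$url" || { echo "fail: HTTP request to $url"; return 1; }
  echo "ok: DNS, TCP and HTTP all pass"
}

# Example (placeholder host): triage api.example.com 443 https://api.example.com/health
```

Keeping ping as a warning rather than a hard stop reflects the point this article keeps returning to: ICMP and TCP fail independently.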

These tools are essential for any engineer running production systems—they're the difference between guessing and knowing.

Plain-English First

Imagine your computer is a post office. Networking commands are the tools the postmaster uses to check which delivery routes are open, which packages are stuck, and whether the roads between buildings are working. Just like a postmaster can trace a lost parcel or see which trucks are currently on the road, Linux networking commands let you trace packets, spot blocked ports, and see exactly which processes are talking to the outside world.

Every production outage has a moment — usually at 2am — where someone types a networking command into a terminal and either finds the problem in 30 seconds or spends three hours guessing. Linux networking tools are what separates a DevOps engineer who can diagnose a flaky microservice from one who just restarts containers and hopes for the best. These commands are your stethoscope for the network layer.

Why Ping Works But TCP Fails

Linux networking commands are the tools that expose the OSI stack's raw behavior: they let you probe, diagnose, and manipulate network state from user space. Ping uses ICMP echo requests, which operate at the network layer (L3) and require no port or connection state. TCP, on the other hand, needs a listening socket, a completed three-way handshake, and a path through firewall rules that inspect L4 headers. This fundamental difference means ping can succeed while TCP connections hang or reset.

In practice, ICMP packets aren't matched by iptables rules written for specific TCP ports, and they don't consume ephemeral ports or socket file descriptors. A server can answer ping even when its TCP listen backlog is full, or when the application process has crashed, because the kernel network stack still handles ICMP on its own. This is why ping is a poor proxy for application health: it tests L3 reachability, not service availability.

Use ping first to rule out L1-L3 issues (cable, ARP, routing). When ping succeeds but TCP fails, focus on firewall rules (iptables, nftables), socket limits (net.core.somaxconn, net.ipv4.tcp_max_syn_backlog), and application process state. In production, always pair ping with a TCP-specific probe like nc -zv or curl to get the full picture.
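When ping succeeds but TCP fails, it helps to know how the connection failed. The sketch below (a hypothetical helper, tcp_probe, assuming Bash with /dev/tcp support and coreutils timeout) separates a fast failure, which usually means a RST because nothing is listening, from a timeout, which is what a firewall silently dropping SYN packets looks like.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a TCP connection attempt as open / closed / timeout.
# Requires bash (for /dev/tcp) and coreutils `timeout`.
tcp_probe() {
  local host=$1 port=$2 secs=${3:-3}
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: tcp_probe HOST PORT [TIMEOUT_SECS]" >&2
    return 2
  fi
  # Inside the inner script, $0 and $1 are the host and port passed as arguments
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
  case $? in
    0)   echo "open" ;;
    124) echo "timeout"; return 1 ;;  # no SYN-ACK and no RST: likely dropped by a firewall
    *)   echo "closed"; return 1 ;;   # fast failure: RST (nothing listening) or unreachable
  esac
}

# Example: tcp_probe 10.0.1.20 8080
```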

⚠ Ping ≠ Health Check

A server that pings fine can still have a dead application, full listen backlog, or firewall silently dropping TCP SYN packets — never rely on ping alone for monitoring.

📊 Production Insight

A Kubernetes pod watched by a ping-based health check (Kubernetes has no native ICMP probe type, so this means an external monitor) keeps passing while the app inside is OOM-killed, because the pod's network stack still answers ICMP.

Symptom: the load balancer reports the pod as healthy, but TCP connections to the service port get no SYN-ACK.

Rule: always probe the exact port and protocol your service uses — ICMP success means nothing for L4+ availability.

🎯 Key Takeaway

Ping tests L3 reachability, not service health — TCP failure with ping success points to L4+ issues.

Firewalls, socket backlogs, and application crashes all break TCP while leaving ICMP intact.

Always use a TCP-specific tool (nc, curl, ss -tlnp) to confirm actual service availability.

thecodeforge.io

Linux Networking Commands

ip vs ifconfig — Why the Old Tool Is Dead and What Replaced It

For years, ifconfig was the go-to command for inspecting network interfaces. It still works on many systems, but it's deprecated and no longer installed by default on modern Linux distributions like Ubuntu 20.04+ and RHEL 8+. The replacement is the ip command from the iproute2 package, which talks to the kernel over netlink sockets instead of the older ioctl calls and /proc files that ifconfig relies on.

The key difference isn't just syntax — it's capability. ip can manage routing tables, network namespaces, tunnels, and ARP/NDP caches all through one unified tool. ifconfig could only touch interfaces and basic IP configuration.

When you're debugging a container networking issue in Kubernetes, you'll often drop into a pod's network namespace and run ip addr to see what the container thinks its IP is. That's impossible with ifconfig, which has no namespace awareness. Know ip deeply and you'll be comfortable anywhere from a bare-metal server to a Docker container.

network_interface_inspection.sh · BASH

#!/bin/bash
# Shows practical ip command usage for interface inspection
# Run these on any modern Linux host

# List all network interfaces with their IP addresses
# The 'addr' subcommand shows Layer 3 (IP) info bound to each interface
ip addr show

# Filter to just one interface — useful when you have many (eth0, lo, docker0, etc.)
ip addr show dev eth0

# Show the routing table — which gateway handles which destination networks
# This is the first thing to check when packets aren't leaving the host
ip route show

# Show the default gateway specifically — the 'exit door' for all unknown traffic
ip route show default

# Add a temporary static route for a specific subnet via a specific gateway (requires root)
# The route lasts only until reboot; use /etc/netplan or /etc/network/interfaces for persistence
sudo ip route add 192.168.50.0/24 via 10.0.0.1 dev eth0

# Bring an interface DOWN and back UP without rebooting (requires root)
# Useful after changing IP config or when an interface is in a bad state
# Run both steps in one command: if eth0 carries your SSH session, a separate
# 'down' would disconnect you before you could type 'up'
sudo sh -c 'ip link set eth0 down && ip link set eth0 up'

# Check the neighbour (ARP) cache, which maps IP addresses to MAC addresses on your local network
# If an entry shows INCOMPLETE or FAILED, ARP resolution is failing; suspect a firewall or VLAN issue
ip neigh show

Output

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP
    link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.15/24 brd 10.0.1.255 scope global dynamic eth0

# ip route show default
default via 10.0.1.1 dev eth0 proto dhcp src 10.0.1.15 metric 100

# ip neigh show
10.0.1.1 dev eth0 lladdr 52:54:00:11:22:33 REACHABLE
10.0.1.20 dev eth0 lladdr 52:54:00:aa:bb:cc STALE
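Checking the neighbour table by eye gets tedious on hosts with many entries. Here is a small filter sketch (the function name neigh_problems is hypothetical) that reads ip neigh show output on stdin and prints only INCOMPLETE or FAILED entries, the states that mean ARP never got an answer. It assumes the state is the last field on each line, which matches the output format shown above. STALE entries like 10.0.1.20 are deliberately ignored: STALE only means the kernel will re-confirm the entry the next time it is used.

```bash
#!/usr/bin/env bash
# Hypothetical filter: show neighbour entries whose ARP/NDP resolution failed.
# Input: `ip neigh show` output, e.g.
#   10.0.1.30 dev eth0 INCOMPLETE
#   10.0.1.1 dev eth0 lladdr 52:54:00:11:22:33 REACHABLE
neigh_problems() {
  # $1 = IP, $3 = device (the word after "dev"), $NF = neighbour state
  awk '$NF == "INCOMPLETE" || $NF == "FAILED" { print $1, $3, $NF }'
}

# Usage: ip neigh show | neigh_problems
```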

⚠ Watch Out: ip and ifconfig Changes Don't Stick

Any interface change made with ip (or ifconfig) is wiped on reboot. To persist changes, use your distro's config files — netplan YAML files on Ubuntu, nmcli/NetworkManager on RHEL, or /etc/network/interfaces on Debian. Making changes with ip and forgetting to persist them is a classic 'works until the server reboots' trap.

📊 Production Insight

In one incident, a team added a secondary IP with 'ip addr add' during a migration and forgot to persist it.

When the server rebooted for a kernel update, the IP vanished and the application broke.

Rule: always configure persistent network settings in distro-specific files, not ad-hoc ip commands.

🎯 Key Takeaway

ip is the single command you need for Layer 2-3 inspection.

ifconfig is deprecated. Learn ip first and treat ifconfig as a legacy fallback.

ip addr, ip route, ip link, ip neigh — these four subcommands cover 90% of daily needs.

When to use ip vs ifconfig

If you need to inspect or modify the ARP/NDP cache → use 'ip neigh'; ifconfig cannot manage ARP at all.

If you are inside a container or network namespace → use 'ip'; ifconfig has no namespace awareness.

If you only need to show the IP and MAC of local interfaces → both work, but 'ip addr show' is more detailed.

If you are on an ancient system without iproute2 → use ifconfig as a fallback, but plan to migrate.

ss and netstat — Seeing Every Open Door on Your Server

Think of your server as a building with thousands of numbered doors (ports). ss and netstat let you see exactly which doors are open, who's standing at each one, and which processes are responsible. This is critical when deploying a new service — you need to know whether port 8080 is already taken before your app fails to bind to it.

netstat is the old tool, ss (Socket Statistics) is the modern replacement. ss talks directly to the kernel via netlink, which makes it dramatically faster on systems with thousands of connections. On a busy web server, netstat can take 10+ seconds while ss returns instantly.

The real power of ss is in its filtering. You can filter by state (ESTABLISHED, LISTEN, TIME_WAIT), by port, by process, or by remote address. In a microservices environment, you might want to see all connections from this service to the database on port 5432 — ss makes that a one-liner.

Understanding TCP connection states matters here. TIME_WAIT is normal: the side that closes a connection first keeps the socket in TIME_WAIT for a fixed period (60 seconds on Linux) so that late packets from the old connection can't be mistaken for a new one. A flood of TIME_WAIT entries is usually fine. CLOSE_WAIT means the remote side closed the connection but your application hasn't closed its end, which often points to a bug in connection-handling code.

socket_inspection.sh · BASH

#!/bin/bash
# Practical ss usage patterns for diagnosing connection issues

# Show all LISTENING TCP ports with the process that owns them
# -t = TCP only, -l = listening sockets, -n = numeric (don't resolve hostnames, much faster),
# -p = show the process (requires root or sudo for processes owned by other users)
ss -tlnp

# Show all ESTABLISHED connections — who is currently connected to this server
ss -tn state established

# Find what's using a specific port (e.g., 8080)
# The 'sport' filter means 'source port' — the local port your service is bound to
ss -tlnp sport = :8080

# Count connections per remote IP, useful for spotting a single client hammering your API
# NR>1 skips the header; $NF is the peer address:port (the last column when -p isn't used);
# sed strips the trailing :port so both IPv4 and bracketed IPv6 addresses work
ss -tn state established | awk 'NR>1 {print $NF}' | sed -E 's/:[0-9]+$//' | sort | uniq -c | sort -rn | head -20

# Show all UDP sockets (DNS, DHCP, NTP all use UDP — don't forget to check these)
ss -ulnp

# Show connection summary statistics by state
# Huge TIME_WAIT count is normal; huge CLOSE_WAIT count may indicate a connection leak
ss -s

# The netstat equivalent for those on older systems without ss
# -a = all sockets, -n = numeric, -t = TCP, -u = UDP, -p = process
netstat -antp

Output

# ss -tlnp
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*          users:(("sshd",pid=1023,fd=3))
LISTEN  0       511     0.0.0.0:80          0.0.0.0:*          users:(("nginx",pid=2150,fd=6))
LISTEN  0       128     127.0.0.1:5432      0.0.0.0:*          users:(("postgres",pid=3201,fd=5))
LISTEN  0       128     0.0.0.0:8080        0.0.0.0:*          users:(("node",pid=4422,fd=12))

# ss -s
Total: 234
TCP:   89 (estab 41, closed 12, orphaned 0, timewait 11)
UDP:   8
RAW:   0
FRAG:  0

💡Pro Tip: 127.0.0.1 vs 0.0.0.0 in the Local Address Column

When ss shows a service listening on 127.0.0.1:5432, it's only reachable from localhost — that's correct for a database. If it shows 0.0.0.0:5432, it's accepting connections from any interface — which may be a security misconfiguration. Always verify your database and internal services are NOT listening on 0.0.0.0 unless you have a firewall rule protecting them.
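To apply this check across every listener at once, a filter like the following sketch can help. The function name exposed_listeners is hypothetical, and the sketch assumes ss -tln output (TCP only, so ss prints no Netid column and the local address is the fourth field). It prints listeners bound to a wildcard address (0.0.0.0, [::] or *), which you can then compare against your firewall rules.

```bash
#!/usr/bin/env bash
# Hypothetical filter: list TCP listeners bound to all interfaces.
# Input: `ss -tln` output; column 4 is "Local Address:Port".
exposed_listeners() {
  # Skip the header line, then match IPv4 (0.0.0.0), IPv6 ([::]) and
  # dual-stack (*) wildcard bindings followed by a port number
  awk '$1 != "State" && $4 ~ /^(0\.0\.0\.0|\[::\]|\*):[0-9]+$/ { print $4 }'
}

# Usage: ss -tln | exposed_listeners
```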

📊 Production Insight

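CLOSE_WAIT is your enemy. A socket stays in CLOSE_WAIT until your application closes its end, so a single code path that forgets to close connections can leave them stuck there indefinitely.

In one incident, a proxy pool leaked connections because the proxy kept waiting for data from upstreams that had already closed, and never closed its own side.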

Rule: if you see >100 CLOSE_WAIT connections on your web server, audit your connection cleanup logic.
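The >100 rule of thumb is easy to automate. A minimal sketch, assuming ss -tan output (which labels the state CLOSE-WAIT, with a hyphen) and a hypothetical function name, close_wait_check; the threshold is the same heuristic as above, not a kernel limit. It exits non-zero above the threshold, so it can drive a cron alert.

```bash
#!/usr/bin/env bash
# Hypothetical check: count CLOSE-WAIT sockets and flag a likely connection leak.
# Input: `ss -tan` output (first column is the state, e.g. ESTAB, CLOSE-WAIT).
close_wait_check() {
  local threshold=${1:-100}
  awk -v t="$threshold" '
    $1 == "CLOSE-WAIT" { n++ }
    END {
      n += 0  # force a numeric 0 when nothing matched
      if (n > t + 0) {
        print "CLOSE-WAIT=" n " (above " t "): audit connection cleanup"
        exit 1
      }
      print "CLOSE-WAIT=" n " (ok)"
    }'
}

# Usage: ss -tan | close_wait_check 100 || echo "possible connection leak"
```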

🎯 Key Takeaway

ss -tlnp is your first command for port listening status.

Watch for 127.0.0.1 vs 0.0.0.0 — security difference.

CLOSE_WAIT >100 indicates a connection leak in your app.

TIME_WAIT is normal, not a problem.

Choosing between ss and netstat

If the system has netstat installed and you need a quick check → use netstat -antp, but be aware it's slow on busy systems.

If you need fast, filtered output on a production server with many connections → use ss; it's faster and supports complex filters.

If you need to see process names without root → either tool shows your own processes with -p; seeing other users' processes requires root/sudo.

If you need to analyze UDP or Unix domain sockets → use ss -ulnp for UDP, ss -xl for Unix sockets.


curl, wget and dig — Testing Connectivity Layer by Layer

When a service is unreachable, you need to narrow down the layer where things break. Is it DNS? TCP connectivity? HTTP routing? TLS? curl is exceptional here because it can test each layer independently and gives you precise timing data. dig is your dedicated DNS debugging tool, and together they let you methodically eliminate suspects.

curl's --verbose flag is one of the most useful things in networking debugging. It shows you the DNS resolution, the TCP handshake, the TLS negotiation, and the HTTP headers — all in sequence. When your HTTPS endpoint is slow, curl -w timing reveals whether the slowness is in DNS, in TCP, in TLS, or in the actual server response time.

dig is purpose-built for DNS and goes far beyond what you can learn from ping or curl. You can query specific record types, target specific nameservers, and trace the full delegation chain from root to authoritative server. This matters when you're investigating DNS propagation issues after a domain change, or debugging split-horizon DNS in a VPN setup.

wget is simpler and better for quick file downloads or when you need recursive mirroring. For API testing and network diagnosis, curl is almost always the right choice.

connectivity_layer_testing.sh · BASH

#!/bin/bash
# Test each networking layer independently to isolate where failures occur

# ─── DNS TESTING WITH dig ───────────────────────────────────────────────

# Basic A record lookup — what IP does this hostname resolve to?
dig api.example.com A

# Query a SPECIFIC nameserver (e.g., Google's 8.8.8.8) to bypass your local resolver
# Useful when debugging 'works on my machine' DNS issues caused by local caching
dig @8.8.8.8 api.example.com A

# Trace the full DNS delegation from root servers down to the authoritative nameserver
# This is the gold standard for debugging DNS propagation issues
dig +trace api.example.com

# Check MX records (mail routing) — +short gives just the values, no noise
dig api.example.com MX +short

# Reverse DNS lookup — find the hostname for an IP address
dig -x 93.184.216.34 +short

# ─── HTTP/HTTPS TESTING WITH curl ───────────────────────────────────────

# Follow redirects (-L), show verbose output (-v) — see every step of the connection
curl -Lv https://api.example.com/health

# Time each phase of the connection; invaluable for performance diagnosis
# Each value is cumulative from the start of the request, so a phase's cost is the
# difference from the previous line (e.g. TLS cost = time_appconnect - time_connect)
curl -o /dev/null -s -w "
DNS lookup:         %{time_namelookup}s
TCP connect:        %{time_connect}s
TLS handshake:      %{time_appconnect}s
Time to first byte: %{time_starttransfer}s
Total time:         %{time_total}s
HTTP status code:   %{http_code}
" https://api.example.com/health

# Test an API endpoint with a JSON POST body — simulates what your app does
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token-here" \
  -d '{"username": "alice", "email": "alice@example.com"}' \
  --max-time 10  # fail after 10 seconds instead of hanging forever

# Test TLS certificate details — check expiry, issuer, and SANs
curl -vI https://api.example.com 2>&1 | grep -E '(subject|issuer|expire|SSL)'

# Skip TLS verification (ONLY for debugging self-signed certs — never in production)
curl -k https://internal-service.local/health

Output

# dig api.example.com A +short
93.184.216.34

# curl timing output (cumulative seconds since the request started)
DNS lookup:         0.012s
TCP connect:        0.034s
TLS handshake:      0.089s
Time to first byte: 0.201s
Total time:         0.203s
HTTP status code:   200

# If DNS lookup takes 2+ seconds, your resolver is slow or the record isn't cached
# If the TLS step (TLS handshake minus TCP connect) takes 500ms+, suspect certificate chain or OCSP issues
# If time to first byte minus TLS handshake is large while the earlier steps are fast, the app itself is the bottleneck
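Because curl's timing variables are cumulative, the per-phase cost is the difference between consecutive values. The sketch below (hypothetical function name curl_phases) does that subtraction in awk. It assumes you ask curl for the five values on one line, space-separated, and treats a time_appconnect of 0 (plain HTTP, no TLS) as a zero-length TLS phase.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn curl's cumulative timings into per-phase durations.
# Input (one line): namelookup connect appconnect starttransfer total
curl_phases() {
  awk '{
    tls_end = ($3 > 0) ? $3 : $2   # appconnect is 0 for plain HTTP
    printf "dns=%.3f tcp=%.3f tls=%.3f server=%.3f transfer=%.3f\n",
      $1, $2 - $1, tls_end - $2, $4 - tls_end, $5 - $4
  }'
}

# Usage:
# curl -o /dev/null -s \
#   -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n' \
#   https://api.example.com/health | curl_phases
```

With the sample timings above, this reports roughly 22 ms for TCP, 55 ms for TLS and 112 ms of server time before the first byte.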

🔥Interview Gold: The Layers of a Failed Request

Interviewers love asking 'how would you debug a service that users say is slow?' The layered answer (check DNS with dig, check TCP with nc or telnet from the client and ss on the server, check HTTP timing with curl -w) demonstrates real operational thinking. Always start with name resolution and basic connectivity, then work up the stack. Jumping straight to application logs while skipping network diagnostics is the classic junior mistake.

📊 Production Insight

A production outage was caused by a TLS handshake taking 3 seconds because the server was doing OCSP stapling with a slow responder.

curl -w revealed TLS time was 10x normal, but the app logs showed nothing.

Rule: always isolate timing with curl -w before blaming the application code.

🎯 Key Takeaway

dig first, then curl -w, then investigate the slowest phase.

curl -v shows every layer of the connection.

Never jump to app logs without first ruling out DNS and network timing issues.

Which tool for which layer?

If you need to verify DNS resolution for a hostname → use dig; start with @8.8.8.8 to bypass the local resolver.

If you need to test the full HTTP/HTTPS path, including headers and response → use curl with -v, or -w for timing.

If you need to download a file or mirror a website → use wget for its recursive download capability.

If you need to check the TLS certificate chain and expiry → use 'curl -vI https://host' and grep for subject/issuer/expire.

ping, traceroute and tcpdump — Tracing the Path and Catching Packets

Once you've confirmed DNS resolves correctly and the service is listening, the next question is whether packets are actually reaching their destination. ping tells you if a host is reachable and measures round-trip time. traceroute shows you every hop between you and the destination. tcpdump lets you actually capture and inspect raw packets on the wire.

ping is often misused as a binary 'is it up?' test, but it's more nuanced than that. ICMP packets (what ping uses) can be blocked by firewalls while TCP traffic flows fine. A failed ping doesn't mean a service is down — it might just mean ICMP is blocked. Always follow up a failed ping with a TCP-level check.

traceroute reveals the routing path and where latency is introduced. Each hop shows you a router, and timing spikes between hops show you where delays occur. When a cloud VM can't reach an external API, traceroute often reveals the packet dying at a NAT gateway or security group that's silently dropping traffic.

tcpdump is the most powerful of the three but also the most complex. It captures actual packet data, which is essential for diagnosing issues that higher-level tools can't see — like retransmissions, RST floods, or malformed HTTP headers. Always combine it with Wireshark for complex analysis.

packet_path_tracing.sh · BASH

#!/bin/bash
# Trace packet paths and capture traffic for deep network diagnosis

# ─── PING ───────────────────────────────────────────────────────────────

# Basic connectivity check — send 5 packets, show statistics
ping -c 5 8.8.8.8

# Ping with a larger packet size to test MTU issues (1472 bytes + 28 byte IP/ICMP header = 1500 MTU)
# If this fails but small pings work, you likely have an MTU mismatch (common with VPNs)
ping -c 4 -s 1472 -M do 8.8.8.8  # -M do means 'don't fragment'

# ─── TRACEROUTE ─────────────────────────────────────────────────────────

# Standard traceroute using UDP probes
traceroute api.example.com

# Use TCP SYN probes on port 443 instead of UDP (TCP mode needs root)
# More likely to get through firewalls that block ICMP and UDP probes
sudo traceroute -T -p 443 api.example.com

# mtr combines ping and traceroute and shows live packet loss per hop
# This is often the single best tool for diagnosing intermittent routing issues
mtr --report --report-cycles 20 api.example.com

# ─── TCPDUMP ────────────────────────────────────────────────────────────
# Capturing packets requires root

# Capture all traffic on eth0; CTRL+C to stop
# -n = don't resolve hostnames (faster), -i = interface
sudo tcpdump -i eth0 -n

# Capture only web traffic (HTTP port 80 or HTTPS port 443) to/from a specific host
# 'host' filters by IP in either direction
sudo tcpdump -i eth0 -n host 10.0.1.20 and \( port 80 or port 443 \)

# Capture DNS queries: see what your server is resolving and to which nameserver
sudo tcpdump -i eth0 -n port 53

# Save capture to files for later analysis in Wireshark
# -w writes raw packets, -C 100 starts a new file every 100 MB,
# -W 5 keeps at most 5 files (oldest overwritten) so the disk can't fill up
sudo tcpdump -i eth0 -n -w /tmp/capture.pcap -C 100 -W 5 host 10.0.1.20

# Print packet payloads in ASCII to see the actual HTTP request/response text
# -A = ASCII output, -s 0 = capture full packets (no truncation),
# -l = line-buffer output so grep sees packets as they arrive
sudo tcpdump -i eth0 -n -l -A -s 0 port 8080 | grep -E '(GET|POST|HTTP|Host:)'

Output

# ping -c 5 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=4.21 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=4.18 ms
--- 8.8.8.8 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss
rtt min/avg/max/mdev = 4.18/4.22/4.31/0.04 ms

# traceroute -T -p 443 api.example.com
traceroute to api.example.com (93.184.216.34), 30 hops max
 1  10.0.1.1 (10.0.1.1)  0.432 ms
 2  203.0.113.1 (203.0.113.1)  3.211 ms
 3  * * *    <-- asterisks mean this hop is not responding (ICMP filtered)
 4  93.184.216.34  12.44 ms

# mtr output (after 20 cycles)
Host                Loss%  Snt  Last   Avg  Best  Wrst
1. 10.0.1.1          0.0%   20   0.4   0.4   0.3   0.6
2. 203.0.113.1       0.0%   20   3.2   3.1   2.9   3.8
3. ???               -- hop doesn't respond to probes
4. 93.184.216.34     0.0%   20  12.4  12.3  12.0  13.1

💡Pro Tip: Asterisks in traceroute Don't Mean Packet Loss

Asterisks (*) on a traceroute hop mean no ICMP Time Exceeded reply came back from that router before the timeout: it may be rate-limiting or not sending those replies, or a firewall may be filtering them. They don't mean your packets aren't passing through it. If later hops appear and the final destination is reachable, that router is simply staying silent to probes. Only worry about asterisks if all subsequent hops also show asterisks.
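The rule in the tip above (only trailing asterisks matter) can be checked mechanically. A sketch with a hypothetical function name, trailing_silent_hops, that reads traceroute output and counts how many hops at the end of the trace produced only asterisks. It assumes the classic Linux traceroute format, where each hop line starts with the hop number.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count consecutive all-asterisk hops at the END of a trace.
# 0  -> the last hop answered (silent hops in the middle are harmless)
# >0 -> probes stop getting replies here; packets may be dying at this point
trailing_silent_hops() {
  awk '
    $1 ~ /^[0-9]+$/ {               # hop lines start with the hop number
      silent = 1
      for (i = 2; i <= NF; i++)
        if ($i != "*") { silent = 0; break }
      run = silent ? run + 1 : 0    # reset the streak when a hop answers
    }
    END { print run + 0 }'
}

# Usage: traceroute -T -p 443 api.example.com | trailing_silent_hops
```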

📊 Production Insight

A critical incident: 'ping' to an API returned fine, but curl failed with connection timeouts.

tcpdump on the server showed TCP SYNs arriving but no SYN-ACKs — the server was dropping them.

The cause: an iptables rule dropped TCP traffic from the client's source IP but didn't match ICMP, so pings kept succeeding.

Rule: ping is not a proxy for TCP connectivity. Always test on the same protocol.

🎯 Key Takeaway

tcpdump is the last resort, not the first.

Always use ping -c 5, traceroute -T, mtr before tcpdump.

A failed ping does not mean the service is down — verify with curl or nc.

When to use ping, traceroute or tcpdump

If you want to check basic reachability and latency to a host → use ping, but remember ICMP may be blocked.

If you need to see the full path and identify latency spikes → use traceroute -T -p 443 for TCP-based probes, or mtr for continuous monitoring.

If you need to inspect actual packet contents (e.g., malformed packets, retransmissions) → use tcpdump with filters; always write to a file with -w and bound disk use with -C and -W.

If you suspect packet loss or MTU issues → use ping with -M do and varying -s sizes to find the MTU limit.
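Trying sizes by hand works, but a binary search finds the largest unfragmented payload in a handful of probes. This sketch assumes Linux iputils ping (-M do, -W) and IPv4, where path MTU = payload + 28 bytes; the function names df_ping and max_df_payload are hypothetical. The probe is passed in as a command name so the search logic can be exercised without a network.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: binary-search the largest ICMP payload that passes
# with Don't Fragment set. IPv4 path MTU = payload + 28 (20 IP + 8 ICMP).

# Real probe: one ping with DF set (Linux iputils). Args: HOST SIZE
df_ping() { ping -c 1 -W 2 -M do -s "$2" "$1" >/dev/null 2>&1; }

# Args: PROBE_CMD HOST [LOW] [HIGH]; prints the largest passing payload size
max_df_payload() {
  local probe=$1 host=$2 lo=${3:-0} hi=${4:-1472} mid
  "$probe" "$host" "$lo" || return 1      # even the smallest size fails
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))          # round up so the range always shrinks
    if "$probe" "$host" "$mid"; then
      lo=$mid                             # mid fits: answer is >= mid
    else
      hi=$(( mid - 1 ))                   # mid too big: answer is < mid
    fi
  done
  echo "$lo"
}

# Example: p=$(max_df_payload df_ping 8.8.8.8) && echo "path MTU is about $((p + 28))"
```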

Network Performance Monitoring — Bandwidth, Throughput and Bottlenecks

Production servers don't just need connectivity — they need performance. When a service becomes slow, you need to know whether the bottleneck is your server's network interface, the application itself, or somewhere in between. Tools like nload, iftop, iptraf-ng, and nethogs give you real-time bandwidth and per-process traffic data.

nload shows total incoming and outgoing traffic on each interface with a live graph. iftop shows traffic per connection — which remote IPs are consuming the most bandwidth. iptraf-ng adds detailed statistics per protocol and interface. nethogs breaks down traffic per process, so you can see which application is saturating the link.

A common production scenario: a misconfigured backup job or a rogue cron script starts transferring gigabytes of data and saturates the NIC. nethogs reveals 'python3' as the culprit. iftop shows the destination IP is an internal backup server. You then find the backup script is running on the wrong schedule.

For throughput testing, iperf3 is essential. It measures actual TCP/UDP throughput between two hosts, revealing problems such as bufferbloat or misconfigured flow control that an idle ping won't show.

performance_monitoring.sh · BASH

#!/bin/bash
# Real-time network performance monitoring tools

# ─── INSTALLATION ───────────────────────────────────────────────────────
# sudo apt install nload iftop iptraf-ng nethogs iperf3 -y

# ─── NLOAD: total bandwidth per interface ───────────────────────────────
# Pass an interface name; with no argument, nload shows all interfaces (arrow keys switch between them)
nload eth0

# ─── IFTOP: bandwidth per connection ────────────────────────────────────
# Shows top talkers; press 'p' to toggle port display
sudo iftop -i eth0

# ─── IPTRAF-NG: per-protocol statistics ────────────────────────────────
sudo iptraf-ng
# Then choose 'IP traffic monitor' or 'Detailed interface statistics'

# ─── NETHOGS: bandwidth per process ────────────────────────────────────
sudo nethogs eth0

# ─── IPERF3: throughput test between two hosts ──────────────────────────
# Start server on host B
iperf3 -s

# Connect from host A (client)
iperf3 -c 10.0.1.20 -t 30  # run for 30 seconds

# Test UDP throughput (includes jitter and packet loss)
iperf3 -c 10.0.1.20 -u -b 100M -t 30

# ─── CHECK INTERFACE ERRORS ────────────────────────────────────────────
# Look at 'errors', 'dropped', 'overruns' counters
ip -s link show eth0

Output

# ip -s link show eth0 (partial)
    RX: bytes  packets  errors  dropped  overrun  mcast
    12G        15M      0       0        0        5.2M
    TX: bytes  packets  errors  dropped  carrier  collsns
    8.4G       11M      0       0        0        0

# nload output (live updating)
Device eth0 (10.0.1.15):
Incoming: 12.34 Mbps (Cumulative: 1.2 GB)
Outgoing: 5.67 Mbps (Cumulative: 500 MB)

# iperf3 client result example
[ ID] Interval           Transfer     Bandwidth
[  4] 0.00-30.00 sec  3.48 GBytes   998 Mbits/sec   sender
[  4] 0.00-30.00 sec  3.48 GBytes   997 Mbits/sec   receiver
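The same counters that ip -s link prints are exposed as plain files under /sys/class/net/<iface>/statistics, which makes them easy to sample from a script. A sketch with a hypothetical function name, nic_error_counters; the optional second argument (a different sysfs root) exists only so the function can be tested. What matters in practice is whether the counters grow, so take two samples a few seconds apart and compare.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print NIC error/drop counters from sysfs.
# Args: IFACE [SYSFS_NET_ROOT]  (root defaults to /sys/class/net)
nic_error_counters() {
  local iface=$1 root=${2:-/sys/class/net} f
  local dir="$root/$iface/statistics"
  if [ ! -d "$dir" ]; then
    echo "no statistics directory for $iface" >&2
    return 1
  fi
  for f in rx_errors rx_dropped rx_over_errors tx_errors tx_dropped; do
    if [ -r "$dir/$f" ]; then
      printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
    fi
  done
}

# Usage: nic_error_counters eth0; sleep 10; nic_error_counters eth0
```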

🔥Bandwidth Monitoring for Incident Response

When the on-call phone rings with 'the API is slow', the first thing after checking CPU and memory is to check network bandwidth. If the interface is saturated, a process-level tool like nethogs will tell you exactly which PID is causing it. In one case, a developer left a large database dump running on a cron job that overlapped with peak traffic — nethogs caught it in 30 seconds.

📊 Production Insight

A burst of outgoing traffic from an unexpected process is a common cause of network latency in production.

In one case, nethogs revealed a Node.js process sending gigabytes of logs to a centralized logging server.

Rule: when latency spikes without a traffic increase, look for queueing on the NIC: check the dropped and overrun counters in ip -s link and the backlog reported by tc -s qdisc show dev eth0.

🎯 Key Takeaway

When a service slows down, check network bandwidth before blaming the app.

nethogs shows which process is using the interface.

iperf3 tells you the actual throughput capacity, not just ping latency.

Choosing a performance monitoring tool

If you need total bandwidth per interface at a glance → use nload: simple, real-time, and no root required.

If you need to see which remote IPs consume the most bandwidth → use iftop; good for spotting DDoS traffic or heavy client patterns.

If you need to know which process is generating traffic → use nethogs, ideal for finding misbehaving applications.

If you need to measure maximum achievable throughput between two hosts → use iperf3, essential for capacity planning and troubleshooting flow control.

DNS Debugging — Why Your App Connects but Resolves to Nothing

DNS looks simple. It isn't. The worst outages I've seen weren't packet loss or firewall rules — they were DNS caching a dead record for 24 hours. Before you blame the network, prove the name resolution is working in isolation.

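Start with dig. It talks to DNS servers directly instead of going through the libc/NSS path that applications use (/etc/hosts, /etc/nsswitch.conf, nscd). By default it queries the servers listed in /etc/resolv.conf; name a specific one with @server, or add +trace to follow the delegation down to the authoritative nameservers. If dig returns the right IP but getent hosts google.com (which resolves the way applications do) returns nothing or a different address, the fault is on the local path: /etc/hosts, nsswitch.conf, or a caching daemon such as nscd. If both agree on a stale answer, flush the local cache with resolvectl flush-caches on systemd systems, or restart systemd-resolved.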
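That dig-versus-application comparison can be scripted. A minimal sketch, assuming dig (bind-utils or dnsutils) and glibc's `getent` are installed; `compare_resolution` and `same_ip_set` are hypothetical helpers, and only IPv4 A records are compared.

```bash
#!/usr/bin/env bash
# Sketch: compare the A records DNS returns (dig) with what applications get
# through the NSS path (getent). Hypothetical helper names; IPv4 only.
# Requires dig (bind-utils / dnsutils) and glibc's getent.

# Succeeds when two newline-separated address lists hold the same set of IPs
same_ip_set() {
  [ "$(printf '%s\n' "$1" | sort -u)" = "$(printf '%s\n' "$2" | sort -u)" ]
}

compare_resolution() {
  local name="$1" dns nss
  # dig +short can print CNAME targets too; keep only IPv4 addresses
  dns=$(dig +short A "$name" | grep -E '^[0-9]+(\.[0-9]+){3}$' || true)
  # getent follows /etc/nsswitch.conf (files, dns, caches), like an application
  nss=$(getent ahostsv4 "$name" | awk '{ print $1 }' | sort -u || true)

  # Unquoted on purpose: print each address list on a single line
  echo "dig:    $(printf '%s ' $dns)"
  echo "getent: $(printf '%s ' $nss)"
  if same_ip_set "$dns" "$nss"; then
    echo "MATCH"
  else
    echo "MISMATCH: check /etc/hosts, /etc/nsswitch.conf and local caches"
    return 1
  fi
}
```

Usage: `compare_resolution api.example.com`. Large CDNs may rotate addresses between queries, so an occasional mismatch for a third-party name is expected; the check is most telling for your own records.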

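nslookup is the old standby, but its default output omits TTLs and most response flags, which are exactly the details you need here. Use dig +short for scripting. Check TTL values carefully: a high TTL keeps a bad record in caches until it expires, so failures (and fixes) reach clients late. The host command is great for quick reverse lookups.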
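To see TTLs, ask dig for the answer section only. A minimal sketch, assuming dig's standard answer layout (name, TTL, class, type, data); `answer_ttls` is a hypothetical helper that prints one `TYPE TTL DATA` line per record.

```bash
#!/usr/bin/env bash
# Sketch: print "TYPE TTL DATA" for every answer record in dig output (stdin).
# Assumes dig's answer-line layout: NAME TTL CLASS TYPE DATA...
answer_ttls() {
  awk '!/^;/ && NF >= 5 {
    data = $5
    for (i = 6; i <= NF; i++) data = data " " $i   # MX, SRV, etc. have multi-field data
    print $4, $2, data
  }'
}
```

Usage: `dig +noall +answer example.com A | answer_ttls`. Against a recursive resolver the TTL is the time left in that resolver's cache and counts down between queries; against the authoritative server it is the configured value.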

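For production: always test with dig @1.1.1.1 first and compare the answer with your default resolver's. Querying a public resolver bypasses your corporate DNS server and tells you whether the problem is upstream or local. This only works for public names; internal zones will not resolve on a public resolver.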

dns_debug.shBASH

# Test DNS resolution bypassing local cache
dig +short google.com @1.1.1.1

# List the zone's authoritative nameservers (+short drops TTLs; use +noall +answer to see them)
dig google.com NS +short

# Reverse lookup for a known IP
dig -x 8.8.8.8 +short

Output

142.250.80.46

ns1.google.com.

dns.google.

⚠ Production Trap:

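nslookup and a plain dig both ask a recursive resolver, so either can hand back a cached answer. To see what the authoritative server publishes right now, query it directly with dig @<authoritative-nameserver> name, or follow the delegation with dig +trace.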

🎯 Key Takeaway

Always isolate DNS from network — dig @1.1.1.1 first, then blame the cable.

Firewall Forensics — iptables Is Dead, nftables Is What You Debug at 3 AM

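The old iptables tool is deprecated. RHEL 9, Debian 12, and Ubuntu 24.04+ all ship with nftables as the default firewall engine. If you still run iptables -L, you're looking through a compatibility layer that shows only the rules created via iptables; rules added natively with nft do not appear, so the output can hide the rule that is actually dropping traffic.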

Why this matters: I once spent four hours debugging why a Kubernetes node couldn't reach itself on port 8443. The culprit was an nftables set with a typo in the service CIDR. iptables -L showed nothing. nft list ruleset showed the exact broken line.

Use nft list ruleset to dump all rules. Use nft add rule for temporary fixes during incidents. For persistent changes, edit /etc/nftables.conf (/etc/sysconfig/nftables.conf on RHEL), validate it with nft -c -f, then load it with nft -f.

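Key difference: nftables uses tables, chains, and sets. Think of sets as efficient IP address groups that one rule can match against. To watch packets hitting rules in real time, mark them with a meta nftrace set 1 rule and run nft monitor trace; plain nft monitor reports ruleset changes, and trace events appear only for marked packets. That is a game changer for chasing phantom drops.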

nftables_debug.shBASH

# Show all active nftables rules
nft list ruleset

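# Trace packets through the ruleset: mark matching packets with nftrace first
# (TCP port 8443 is only an example match; remove trace_chain when finished)
nft add chain inet filter trace_chain '{ type filter hook prerouting priority -301; }'
nft add rule inet filter trace_chain tcp dport 8443 meta nftrace set 1
nft monitor trace

# Add a temporary rule to log dropped packets (escape the quotes so nft receives them)
nft add rule inet filter input log prefix \"DROPPED: \" drop

# Log lines land in the kernel log, not in nft monitor
journalctl -k | grep 'DROPPED: '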

Output

table inet filter {
  chain input {
    type filter hook input priority filter; policy drop;
    ct state established,related accept
  }
}

[timestamp] DROPPED: IN=eth0 SRC=10.0.0.5

🔥Migration Note:

iptables wrapper is still installed for compatibility. Check if you're using real nftables with iptables --version — if it says 'nf_tables', you're on the new engine.
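The same check can run as a pre-flight step in an incident runbook. A minimal sketch; `iptables_backend` is a hypothetical helper that classifies the string printed by `iptables --version`, which on iptables 1.8 and later ends in `(nf_tables)` or `(legacy)`.

```bash
#!/usr/bin/env bash
# Sketch: classify the iptables backend from `iptables --version` output.
# iptables 1.8+ prints e.g. "iptables v1.8.7 (nf_tables)" or "iptables v1.8.7 (legacy)".
iptables_backend() {
  local version_line="${1:-$(iptables --version 2>/dev/null)}"
  case "$version_line" in
    *"(nf_tables)"*) echo "nf_tables" ;;      # iptables-nft: rules live in nftables
    *"(legacy)"*)    echo "legacy" ;;         # iptables-legacy: separate x_tables engine
    "")              echo "not-installed" ;;
    *)               echo "unknown" ;;        # e.g. pre-1.8 versions print no backend
  esac
}
```

Called with no argument it inspects the local binary; pass a captured string to check a remote host, e.g. `iptables_backend "$(ssh web01 iptables --version)"` (web01 is a placeholder).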

🎯 Key Takeaway

Stop using iptables for debugging. Run nft list ruleset; that's the source of truth.

● Production incident · POST-MORTEM · severity: high

The Silent Blackhole: When a New Server Stops Responding After a Router Reboot

Symptom

Server accessible via IPMI/console. ping responses normal, but HTTP clients got timeouts. tcpdump showed no incoming TCP SYN packets despite ping replies. From the server itself, outgoing traffic worked fine.

Assumption

Firewall rules on the server or upstream ACLs are blocking inbound traffic. Engineering team spent 45 minutes reviewing security groups and iptables rules.

Root cause

The router's ARP cache held the old MAC address for the server's IP from a previous deployment. The router forwarded packets to the old MAC, which belonged to a different (now decommissioned) host. The new server's NIC never received the traffic. ARP broadcast was failing because the new server's ARP table was stuck with a stale entry from a duplicate IP conflict during provisioning.

Fix

Cleared ARP cache on both the server and the upstream router. On the server: 'ip neigh flush all'. On the router: cleared ARP entry for the server's IP. Then forced a gratuitous ARP broadcast from the server by bouncing the interface with 'ip link set eth0 down && ip link set eth0 up'. Verified with 'arping -c 3 -I eth0 SERVER_IP'.

Key lesson

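Always verify both the ARP (ip neigh) and routing (ip route) tables after server provisioning or network changes.
A single ping reply does not prove the server is reachable from every client: ARP works at the link layer and is cached per neighbor, so the gateway can keep forwarding to a stale MAC while other hosts get replies.
After moving IPs between physical hosts, flush the ARP cache on the local subnet's gateway.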
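The ARP check from these lessons can be partly automated. A minimal sketch, assuming iproute2's `ip neigh show` line format (`IP dev IFACE lladdr MAC [router] STATE`, or `IP dev IFACE STATE` when unresolved); `neigh_problems` is a hypothetical helper that lists FAILED/INCOMPLETE entries and any MAC address answering for more than one IP. A shared MAC can be legitimate (a router with several addresses), but right after an IP move it is worth a look.

```bash
#!/usr/bin/env bash
# Sketch: flag suspicious neighbor (ARP) entries from `ip neigh show` output (stdin).
neigh_problems() {
  awk '
    {
      ip = $1; state = $NF; mac = ""
      for (i = 2; i < NF; i++) if ($i == "lladdr") mac = $(i + 1)
      if (state == "FAILED" || state == "INCOMPLETE") print "unresolved " ip " (" state ")"
      if (mac != "") {
        if (mac in seen) seen[mac] = seen[mac] " " ip; else seen[mac] = ip
        count[mac]++
      }
    }
    END { for (m in count) if (count[m] > 1) print "shared-mac " m ": " seen[m] }
  '
}

# Usage on the server or any Linux host on the subnet:
#   ip neigh show dev eth0 | neigh_problems
# After moving an IP, announce the new MAC to neighbors (iputils arping, needs root):
#   arping -U -c 3 -I eth0 SERVER_IP
```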

Production debug guide: quick reference for common network failure symptoms and the exact commands to run.

Symptom · 01

Service unreachable, no response from curl

→

Fix

Check DNS first: dig @8.8.8.8 service.example.com. Then, on the server, check the local listener: ss -tlnp | grep :PORT. Then, from the client, check the remote port: nc -zv TARGET_IP PORT. If there is no response, check the route: ip route get TARGET_IP.
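Minimal container images often lack nc. Bash can open a TCP connection itself through its /dev/tcp pseudo-path, which is enough for a reachability check. A minimal sketch, assuming bash and coreutils `timeout`; `tcp_check` is a hypothetical helper and the 3-second default is arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: TCP reachability check without nc, using bash's /dev/tcp redirection.
# Needs bash (not sh/dash) and coreutils timeout. tcp_check is a hypothetical name.
tcp_check() {
  local host="$1" port="$2" secs="${3:-3}"
  # The child bash opens fd 3 to host:port; timeout kills it if the SYN gets no answer
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open ${host}:${port}"
  else
    # Refused (RST), filtered (timed out), or the name did not resolve
    echo "closed-or-filtered ${host}:${port}"
    return 1
  fi
}
```

`tcp_check 10.0.1.20 443 || ip route get 10.0.1.20` follows the guide's order: port first, route if the port check fails. A refused connection returns quickly; a silent drop by a firewall shows up as the timeout.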

Symptom · 02

Intermittent timeouts to an external API

→

Fix

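Run curl with timing: curl -w '@timing_format' -o /dev/null -s https://api.example.com, where timing_format is a file holding a write-out template built from variables such as %{time_namelookup}, %{time_connect}, %{time_appconnect}, %{time_starttransfer} and %{time_total}. Check for packet loss: mtr --report --report-cycles 20 api.example.com. Compare the results against a known-good endpoint.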
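The write-out template can also be passed inline instead of from a file. A minimal sketch; the `%{...}` names are standard curl `-w` variables, and `curl_timing` is a hypothetical helper. Each value is cumulative seconds since the request started, so the gap between two fields is the cost of that phase.

```bash
#!/usr/bin/env bash
# Sketch: per-phase request timing with curl's --write-out variables.
# All times are cumulative seconds from the start of the request.
curl_timing() {
  local url="$1"
  local fmt='dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total} code=%{http_code}\n'
  curl -o /dev/null -s --max-time 10 -w "$fmt" "$url"
}
```

Run it in a loop (`for i in 1 2 3 4 5; do curl_timing https://api.example.com; sleep 2; done`) and compare slow runs with fast ones: a jump in `dns` points at resolution, a jump between `connect` and `tls` at the TLS handshake, and a large `ttfb` with fast earlier phases at the server itself.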

Symptom · 03

Server sees incoming connections but app doesn't

→

Fix

Check ss -tlnp to confirm process is listening on expected IP (0.0.0.0 vs 127.0.0.1). Check app logs for bind errors. Verify kernel reverse path filtering: sysctl net.ipv4.conf.all.rp_filter.
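The bind-address check can be made mechanical. A minimal sketch; `listener_scope` is a hypothetical helper that classifies the Local Address:Port column of `ss -tln` (`127.0.0.1:8080`, `0.0.0.0:8080`, `*:8080`, `[::]:8080`, and so on).

```bash
#!/usr/bin/env bash
# Sketch: classify a listener's bind address from ss's "Local Address:Port" column.
# Related kernel knob: sysctl net.ipv4.conf.all.rp_filter -> 0 off, 1 strict
# (drops packets arriving on an interface that is not the route back to their source), 2 loose.
listener_scope() {
  local addr="${1%:*}"   # strip the trailing :PORT
  case "$addr" in
    127.*|"[::1]"|"[::ffff:127."*) echo "loopback-only" ;;   # unreachable from other hosts
    0.0.0.0|"*"|"[::]")             echo "all-interfaces" ;;
    *)                              echo "specific:${addr}" ;;
  esac
}

# Usage: classify every listener on port 8080
#   ss -tln 'sport = :8080' | awk 'NR > 1 { print $4 }' | while read -r a; do listener_scope "$a"; done
```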

Symptom · 04

High latency to a single hop in traceroute

→

Fix

Run mtr --report --report-cycles 50 TARGET for packet loss per hop. If latency spike is consistent, check interface errors: ip -s link show dev eth0. If the hop is your own router, check ARP table: ip neigh show.
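Loss or latency that appears at one intermediate hop but not at the hops after it is usually that router rate-limiting or deprioritizing ICMP replies to itself, not real loss. A minimal sketch that reads `mtr --report` output and bases its verdict on the final hop only, assuming hop lines of the form `3.|-- HOST  Loss%  Snt ...`; `mtr_verdict` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: summarize `mtr --report` output (stdin). Per-hop loss is printed, but the
# verdict looks only at the LAST hop, because routers often rate-limit ICMP replies
# to themselves while still forwarding traffic normally.
# Assumes hop lines like "  3.|-- 10.0.0.1   0.0%   50  ..." (host, then Loss%).
mtr_verdict() {
  awk '
    $1 ~ /^[0-9]+\.[|]--$/ {
      loss = $3; sub(/%$/, "", loss)
      printf "hop %d %s loss=%s%%\n", $1, $2, loss
      last_loss = loss + 0; hops++
    }
    END {
      if (hops == 0) print "verdict: no hop lines found"
      else if (last_loss > 0) print "verdict: loss reaches the destination"
      else print "verdict: destination clean; intermediate loss, if any, is likely ICMP rate limiting"
    }
  '
}

# Usage: mtr --report --report-cycles 50 TARGET | mtr_verdict
```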

★ Quick Debug Cheat Sheet: five-second answers for the most common networking emergencies.

Can't reach any external host

Immediate action

Check default route and DNS

Commands

ip route show default

dig @8.8.8.8 google.com

Fix now

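If the default route is missing: 'ip route add default via GATEWAY_IP dev eth0'. If dig @8.8.8.8 works but normal lookups fail, the local resolver configuration is the problem: check /etc/resolv.conf and restart systemd-resolved. If dig @8.8.8.8 also fails, suspect connectivity (routing, or outbound port 53 blocked) rather than DNS configuration.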

Port 8080 shows as already in use

curl returns 'Connection refused'

DNS resolution is slow (2+ seconds)

Linux Networking Commands Quick Reference

Command	Layer	Best Use Case	Requires Root?	Modern Alternative
ifconfig	Layer 3 (IP)	Legacy interface config inspection	No	ip addr
ip	Layer 2-3	Interfaces, routes, ARP, namespaces	For changes, yes	Still current — no replacement
netstat	Layer 4 (TCP/UDP)	Port and socket inspection	For -p flag	ss
ss	Layer 4 (TCP/UDP)	Fast socket stats, large systems	For -p flag	Still current — no replacement
ping	Layer 3 (ICMP)	Basic reachability and latency check	No	Still current
traceroute	Layer 3	Routing path and hop latency	No	mtr (combines traceroute and ping)
tcpdump	Layer 2-7	Raw packet capture and inspection	Yes	tshark (CLI Wireshark)
curl	Layer 7 (HTTP/S)	API testing, TLS and timing diagnosis	No	Still current
dig	Layer 7 (DNS)	DNS record queries and propagation debug	No	drill (alternative on some distros)
nload	Layer 2-4	Real-time total bandwidth per interface	No	Still current
iftop	Layer 4	Bandwidth per connection	Yes	Still current
nethogs	Layer 4	Bandwidth per process	Yes	Still current
iperf3	Layer 4	Throughput and jitter measurement	No (default port 5201 is unprivileged)	Still current

⚙ Quick Reference

7 commands from this guide

File	Command / Code	Purpose
network_interface_inspection.sh	ip addr show	ip vs ifconfig
socket_inspection.sh	ss -tlnp	ss and netstat
connectivity_layer_testing.sh	dig api.example.com A	curl, wget and dig
packet_path_tracing.sh	ping -c 5 8.8.8.8	ping, traceroute and tcpdump
performance_monitoring.sh	nload eth0	Network Performance Monitoring
dns_debug.sh	dig +short google.com @1.1.1.1	DNS Debugging
nftables_debug.sh	nft list ruleset	Firewall Forensics

Key takeaways

ip and ss are the modern replacements for ifconfig and netstat

learn ip addr, ip route, and ss -tlnp as your default starting point on any system

Always debug network failures layer by layer

DNS first with dig, TCP connectivity second with ss/curl, then HTTP/TLS with curl -v, and packet-level last with tcpdump — jumping layers wastes time

tcpdump is your last resort, not your first

always filter by host and port, always write to a file with -w, and cap file size with -C to avoid filling disk on production servers

Asterisks in traceroute and a failed ping are not proof of failure

ICMP is routinely blocked by firewalls while TCP traffic flows freely, so always verify with a TCP-level tool before declaring a host unreachable

When a service slows, check network bandwidth with nload or nethogs before blaming the application

the problem is often a rogue process saturating the interface
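The tcpdump takeaway translates into a command like the one below. A minimal sketch, assuming tcpdump's standard `-C` (file size in millions of bytes) and `-W` (number of files in the ring) options; `start_capture`, the /var/tmp path, and the 100 MB by 5 file limits are illustrative choices. Setting `DRY_RUN=1` prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: bounded, filtered packet capture for production hosts.
# -C 100 rotates after ~100 MB per file, -W 5 keeps at most 5 files (ring buffer),
# -n skips reverse DNS, and the BPF filter limits capture to one host and port.
start_capture() {
  local iface="$1" host="$2" port="$3" out="${4:-/var/tmp/capture.pcap}"
  local cmd=(tcpdump -i "$iface" -n -w "$out" -C 100 -W 5 "host $host and port $port")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"   # show what would run
  else
    "${cmd[@]}"                 # needs root or CAP_NET_RAW
  fi
}
```

Running `start_capture eth0 10.0.1.20 443` as root caps disk use at roughly 500 MB no matter how long it runs; copy the rotated files off the host and open them in Wireshark or with `tcpdump -r`.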

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

A user reports that a web service is intermittently unreachable but your...

Q02 · SENIOR

What's the difference between a port that shows as LISTEN on 127.0.0.1 v...

Q03 · JUNIOR

You run traceroute to an external API endpoint and see the third hop ret...

Q04 · SENIOR

How would you measure the actual available bandwidth between two cloud i...

Q01 of 04 · SENIOR

A user reports that a web service is intermittently unreachable but your monitoring shows the server is up. Walk me through exactly how you would diagnose this — which commands would you run and in what order?

ANSWER

I'd start by isolating the problem layer by layer:
1. DNS: dig @8.8.8.8 service.example.com to check resolution.
2. TCP connectivity: nc -zv service.example.com 443 from a client outside the cluster.
3. Local listener: SSH into the server and run ss -tlnp | grep :443 to confirm the process is listening.
4. Firewall: check nft list ruleset (iptables -L only on legacy hosts) and the cloud security groups.
5. Application logs: if every network layer checks out, look for errors in the app's logs.
Because the failure is intermittent, I'd also run curl -w every few seconds from outside and correlate timing and response codes with the reported failures.


Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

✓ Verified · production tested · last updated July 19, 2026 · 2,466 articles, all by Naren
