
Docker Networking Deep Dive: Internals, Drivers, and Production Patterns

📍 Part of: Docker → Topic 15 of 18
Docker networking explained from the kernel up — bridge, overlay, macvlan drivers, CNM internals, iptables rules, and production gotchas every DevOps engineer must know.
🔥 Advanced — solid DevOps foundation required
In this tutorial, you'll learn
  • Docker networking is built on Linux kernel primitives: namespaces (isolation), veth pairs (virtual cables), bridges (virtual switches), and iptables (firewall/NAT). Understanding these primitives is essential for production debugging.
  • The default bridge network does not support DNS resolution by container name and lacks isolation. Always use custom bridge networks in production. This is the single most common cause of container connectivity failures.
  • Docker's embedded DNS at 127.0.0.11 resolves service names to container IPs on custom bridge networks. It is network-scoped — containers on different networks cannot resolve each other.
Quick Answer
  • Each container gets its own network namespace (isolated network stack)
  • Virtual Ethernet (veth) pairs connect the container namespace to a Linux bridge
  • iptables rules handle NAT for port publishing and inter-network routing
  • Docker's embedded DNS (127.0.0.11) resolves service names to container IPs
  • bridge: default, single-host, private virtual switch
  • overlay: multi-host via VXLAN tunnels (Swarm, Kubernetes)
  • macvlan: container gets a MAC address on the physical network
  • host: no isolation, container shares host network stack
  • none: no networking at all
🚨 START HERE
Docker Networking Triage Cheat Sheet
First-response commands when a networking issue is reported in production.
🟡 Container cannot reach another container by name.
Immediate Action: Check network membership and DNS resolution.
Commands
docker network inspect <network> | grep -A 20 Containers
docker exec <container> nslookup <target> 127.0.0.11
Fix Now: If DNS fails, containers are on different networks or the default bridge. Create a custom network: docker network create app-net. Reconnect: docker network connect app-net <container>.
🟡 Published port not accessible from host or external network.
Immediate Action: Verify port mapping and check for conflicts.
Commands
docker port <container>
ss -tlnp | grep <port>
Fix Now: If the port is in use by another process, kill it or change the host-side port. If iptables is blocking, check: sudo iptables -t nat -L DOCKER -n.
🟠 High latency between containers on different hosts (overlay network).
Immediate Action: Check the underlay network and VXLAN overhead.
Commands
ping <other-host-ip>
docker network inspect <overlay-network> | grep -i encrypted
Fix Now: Overlay encryption is opt-in (--opt encrypted) and adds IPsec overhead. If encryption is not required, recreate the network without that option. For latency-critical paths, use macvlan or host networking.
🟡 Container has no internet access (cannot pull images, reach external APIs).
Immediate Action: Check DNS and the default gateway inside the container.
Commands
docker exec <container> cat /etc/resolv.conf
docker exec <container> ip route
Fix Now: If resolv.conf is empty or wrong, check the Docker daemon DNS config in /etc/docker/daemon.json. If the default route is missing, the container network is misconfigured. Restart the Docker daemon: systemctl restart docker.
🟡 IP address conflict — two containers have the same IP.
Immediate Action: Identify the conflicting containers and their networks.
Commands
docker network inspect <network> | grep -A 5 IPv4Address
docker inspect <container> --format='{{json .NetworkSettings.Networks}}'
Fix Now: Disconnect and reconnect the affected container: docker network disconnect <network> <container> && docker network connect <network> <container>. If the conflict persists, restart the Docker daemon to reset IPAM.
🟠 iptables rules are accumulating and slowing down network performance.
Immediate Action: Check the iptables rule count.
Commands
sudo iptables -L -n | wc -l
sudo iptables -t nat -L DOCKER -n | wc -l
Fix Now: Clean up unused containers and networks: docker system prune. If the rule count exceeds 10,000, consider a different network driver or reducing the number of published ports.
Production Incident: Intermittent Connection Refused Between Containers on Default Bridge Network
A microservices team migrated from docker-compose (custom networks) to raw docker run commands using the default bridge. Services intermittently could not reach each other by name, causing 15-minute outages during peak traffic.
Symptom: The API container intermittently returned 'connection refused' when connecting to the Redis container. The failure rate was approximately 5% of requests. Restarting either container temporarily fixed the issue. DNS resolution from inside the API container using nslookup redis failed, but connecting by IP address (172.17.0.3) worked.
Assumption: The team assumed a Redis connection pool issue — perhaps connections were being dropped under load. They increased the Redis maxclients setting and added connection retry logic. The intermittent failures persisted. Second assumption: a Docker daemon bug. They upgraded Docker Engine from 24.x to 25.x. The failures persisted.
Root cause: The team was using docker run without specifying a custom network. Containers on the default bridge network cannot resolve each other by name — Docker's embedded DNS server only works on user-defined bridge networks. The team had been using --link (deprecated) in their original setup, which provided legacy name resolution. When they removed --link and relied on DNS, name resolution broke. The intermittent nature was caused by the application sometimes using cached IP addresses from a previous successful resolution and sometimes attempting fresh DNS lookups that failed.
Fix:
1. Created a custom bridge network: docker network create app-network.
2. Connected all containers to the custom network: docker run --network app-network.
3. Removed all --link flags.
4. Updated connection strings to use service names (redis:6379 instead of 172.17.0.3:6379).
5. Added a CI validation step that verifies all containers in the compose file share at least one custom network.
Key Lesson
  • The default bridge network does not support DNS resolution by container name. Only custom bridge networks do.
  • --link is deprecated and should never be used. Custom networks with embedded DNS replace it entirely.
  • Intermittent DNS failures in Docker are almost always caused by mixing default bridge and custom network configurations.
  • Always use custom bridge networks in production — the default bridge is a legacy artifact.
  • Validate network configuration in CI to catch these issues before deployment.
Production Debug Guide
From connection refused to latency spikes — systematic debugging paths.
Container cannot resolve another container by name. Verify both containers are on the same user-defined network. The default bridge does not support DNS. Check: docker network inspect <network> | grep -A 20 Containers. Test DNS: docker exec <container> nslookup <target> 127.0.0.11.
Container resolves the name but connection is refused. Verify the target service is listening on 0.0.0.0, not 127.0.0.1. Check: docker exec <target> ss -tlnp. Verify you are using the container port, not the host-mapped port. Check if a healthcheck or depends_on race condition is the cause.
Port published to host but not accessible from outside. Check iptables rules: sudo iptables -t nat -L -n | grep DOCKER. Verify the container is running: docker ps. Check if another process is using the host port: ss -tlnp | grep <port>. Check if the host firewall (ufw, firewalld) is blocking the port.
Overlay network has high latency between hosts. Check VXLAN overhead: overlay adds ~100-200 microseconds per packet. Verify the underlay network between hosts: ping between host IPs. Check if overlay network encryption is enabled (it adds overhead). Consider macvlan or host networking for latency-critical paths.
Container can reach external services but external services cannot reach the container. Verify port publishing: docker port <container>. Check if the host has multiple network interfaces and the published port is bound to the wrong interface. Check the iptables FORWARD chain: sudo iptables -L FORWARD -n. Verify the Docker daemon is running: systemctl status docker.
Intermittent connection drops between containers. Check if containers are on the default bridge (no DNS, no isolation). Check for IP address conflicts: docker network inspect <network>. Check if the Docker daemon is restarting (this recreates network namespaces). Check for an MTU mismatch between the overlay underlay and container interfaces.

Docker networking failures are the most common source of production outages involving containers. When containers cannot reach each other, when DNS resolution silently fails, when latency spikes for no obvious reason — these are predictable consequences of decisions made at the kernel level that most engineers never inspect because 'it just worked in dev.'

Docker networking solves a hard problem: giving hundreds of isolated processes their own private network stack while still allowing controlled, performant communication with each other and the outside world. The answer involves Linux network namespaces, virtual Ethernet pairs, iptables chains, overlay tunnels, and a pluggable driver architecture — all wired together transparently on every docker run.

Common misconceptions: the default bridge network supports service name DNS (it does not — only custom bridge networks do), port publishing is required for container-to-container communication (it is not — containers on the same network communicate directly via container port), and overlay networks have zero overhead (they add VXLAN encapsulation latency). Understanding these distinctions prevents hours of debugging.
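The second misconception is easy to verify yourself. A minimal sketch, assuming the network name demo-net, the container names, and the nginx:alpine/alpine images as illustrative choices:

```shell
#!/bin/bash
# Two containers on the same custom network talk directly on the
# container port -- no -p flag involved.
docker network create demo-net

# Start nginx WITHOUT publishing any ports
docker run -d --name web --network demo-net nginx:alpine

# A second container reaches it on the container port (80) by name
docker run --rm --network demo-net alpine \
  wget -qO- http://web:80 >/dev/null && echo "container-to-container OK"

# From the host, nothing was published, so this should time out
# (assuming nothing else on the host is listening on port 80)
curl -s --max-time 2 http://localhost:80 || echo "host access fails as expected"

# Cleanup
docker rm -f web && docker network rm demo-net
```

Port publishing (-p) exists solely for traffic entering from outside the Docker network.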

CNM Architecture — How Docker Networking Is Organized

Docker networking is built on the Container Network Model (CNM), which defines three abstractions: sandboxes, endpoints, and networks.

Sandbox: An isolated network stack for a container. Created when a container starts. Contains the container's interfaces, routes, and DNS config. Maps to a Linux network namespace.

Endpoint: A connection point that joins a sandbox to a network. Each endpoint is a veth pair — one end in the container's namespace, the other end attached to the network (bridge, overlay, etc.). A container can have multiple endpoints on different networks.

Network: A group of endpoints that can communicate with each other. Implemented by a driver (bridge, overlay, macvlan). The driver determines how packets are forwarded between endpoints.

The Docker daemon manages all three objects. When you run docker network create, you create a network object. When you run docker run --network mynet, Docker creates a sandbox, an endpoint, and joins the endpoint to the network.

Why this matters: The CNM abstraction allows pluggable drivers. The same container can be on a bridge network (local communication) and an overlay network (multi-host communication) simultaneously. The sandbox isolates the container's view — it sees only the endpoints that are connected to it.
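A minimal sketch of those three objects in action — one sandbox with two endpoints on two networks (the names front-net, back-net, and api are illustrative):

```shell
#!/bin/bash
# One sandbox, two endpoints: attach a container to two networks.
docker network create front-net
docker network create back-net

docker run -d --name api --network front-net nginx:alpine
docker network connect back-net api

# The container now has one endpoint (and one IP) per network
docker inspect api \
  --format='{{range $name, $cfg := .NetworkSettings.Networks}}{{$name}}: {{$cfg.IPAddress}}{{"\n"}}{{end}}'

# Cleanup
docker rm -f api && docker network rm front-net back-net
```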

io/thecodeforge/network-inspection.sh · BASH
#!/bin/bash
# Inspect the CNM objects for a running container

# List all networks
docker network ls

# Inspect a specific network — shows endpoints, IPAM, driver
docker network inspect bridge

# Inspect a container's network sandbox
docker inspect <container> --format='{{json .NetworkSettings}}' | python3 -m json.tool

# Show the veth pair for a container
# Find the container's PID
CONTAINER_PID=$(docker inspect --format='{{.State.Pid}}' <container>)

# List network interfaces in the container's namespace
nsenter -t $CONTAINER_PID -n ip addr

# List network interfaces on the host side
ip link show | grep veth

# Show iptables NAT rules for Docker
sudo iptables -t nat -L DOCKER -n -v

# Show iptables filter rules for Docker
sudo iptables -L DOCKER -n -v

# Show iptables rules for inter-container traffic
sudo iptables -L DOCKER-ISOLATION-STAGE-1 -n -v
▶ Output
# docker network ls
NETWORK ID     NAME     DRIVER   SCOPE
a1b2c3d4e5f6   bridge   bridge   local
f6e5d4c3b2a1   host     host     local
1a2b3c4d5e6f   none     null     local

# docker network inspect bridge
[
{
"Name": "bridge",
"Id": "a1b2c3d4e5f6...",
"Driver": "bridge",
"IPAM": {
"Config": [
{"Subnet": "172.17.0.0/16", "Gateway": "172.17.0.1"}
]
}
}
]
Mental Model
CNM as a Telephone System
Why does Docker use a pluggable driver architecture instead of a single networking implementation?
  • Different workloads need different networking: single-host (bridge), multi-host (overlay), bare-metal performance (macvlan).
  • A pluggable architecture lets third-party drivers (Calico, Cilium, Weave) integrate without modifying Docker.
  • The CNM abstraction separates the 'what' (connect containers) from the 'how' (bridge, overlay, macvlan).
  • The same container can be on multiple networks with different drivers simultaneously.
📊 Production Insight
The CNM abstraction is what allows Kubernetes to use Docker networking (or more commonly, CNI plugins like Calico and Cilium) without being tied to Docker's implementation. Understanding CNM helps you debug networking issues in both Docker and Kubernetes because the underlying kernel primitives (namespaces, veth pairs, bridges) are the same.
🎯 Key Takeaway
CNM separates networking into sandbox (isolation), endpoint (connection), and network (routing). This abstraction enables pluggable drivers. Understanding CNM is essential for debugging networking issues in both Docker and Kubernetes because the kernel primitives are identical.
Network Driver Selection
If: Single host, multiple containers that need to communicate
Use: bridge networking — create a custom bridge network for DNS resolution by service name
If: Multiple hosts, containers need to communicate across hosts
Use: overlay networking (Docker Swarm) or a CNI plugin (Kubernetes)
If: Container needs to appear as a physical device on the LAN
Use: macvlan — container gets its own MAC address and IP on the physical network
If: Latency-critical workload, no isolation needed
Use: host networking — eliminates all virtual network overhead
If: Security scanning, air-gapped batch job
Use: none networking — no network access at all

Bridge Networking Internals — The Linux Kernel View

Bridge networking is the default and most common Docker network driver. Understanding what happens at the kernel level when two containers on the same bridge network communicate is essential for debugging.

Step 1: Network namespace creation. When a container starts, Docker creates a new Linux network namespace. This namespace has its own interfaces, routing table, and iptables rules — completely isolated from the host and other containers.

Step 2: veth pair creation. Docker creates a virtual Ethernet (veth) pair. One end (vethXXXX) stays in the host's namespace. The other end (eth0) is moved into the container's namespace. They are connected like a virtual cable — packets sent into one end appear on the other.

Step 3: Bridge attachment. The host end of the veth pair is attached to a Linux bridge (br-XXXX). The bridge acts as a virtual switch — it forwards frames between all attached veth interfaces based on MAC address learning.

Step 4: IP address assignment. Docker's IPAM (IP Address Management) assigns an IP address from the bridge's subnet to the container's eth0 interface. The bridge itself gets the gateway IP (e.g., 172.18.0.1).

Step 5: iptables rules. Docker adds iptables rules to handle NAT for published ports, inter-network isolation (DOCKER-ISOLATION chains), and masquerading for outbound traffic.

Packet flow for container-to-container communication: Container A (172.18.0.2) sends a packet to Container B (172.18.0.3). The packet exits Container A's eth0, travels through the veth pair to the bridge. The bridge learns Container A's MAC address, looks up Container B's MAC in its forwarding table, and forwards the frame to Container B's veth host end. The packet travels through the veth pair into Container B's namespace and arrives at Container B's eth0. No iptables NAT is involved — this is direct L2 switching.

Packet flow for port publishing (external access): An external client sends a packet to the host's port 8000. iptables NAT rules (DOCKER chain) rewrite the destination from the host IP to the container's IP (DNAT). The packet enters the bridge and is forwarded to the container. The container processes the request and sends a reply. iptables rewrites the source from the container's IP to the host's IP (SNAT/masquerade). The reply reaches the external client.
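That DNAT path can be observed directly. A short sketch (the container name web, host port 8000, and the nginx:alpine image are illustrative; iptables inspection requires sudo):

```shell
#!/bin/bash
# Publish container port 80 as host port 8000 and observe the DNAT rule.
docker run -d --name web -p 8000:80 nginx:alpine

# The DOCKER chain now contains a DNAT rule rewriting
# <host>:8000 -> <container-ip>:80
sudo iptables -t nat -L DOCKER -n | grep 'dpt:8000'

# This request exercises DNAT on the way in and masquerade on the way out
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8000

# Cleanup
docker rm -f web
```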

io/thecodeforge/bridge-inspection.sh · BASH
#!/bin/bash
# Inspect the Linux kernel primitives behind Docker bridge networking

# 1. Show the Linux bridge Docker created
brctl show
# bridge name     bridge id           STP enabled   interfaces
# br-a1b2c3d4e5f6 8000.0242a1b2c3d4   no            veth1234abcd
#                                                      veth5678efgh

# 2. Show veth pairs — one end in host namespace, other in container
ip link show type veth

# 3. Show network namespaces (one per container)
ip netns list
# Or find by container PID:
CONTAINER_PID=$(docker inspect --format='{{.State.Pid}}' <container>)
ls -la /proc/$CONTAINER_PID/ns/net

# 4. Enter a container's network namespace and inspect
nsenter -t $CONTAINER_PID -n ip addr
nsenter -t $CONTAINER_PID -n ip route
nsenter -t $CONTAINER_PID -n cat /etc/resolv.conf

# 5. Show iptables NAT rules for port publishing
sudo iptables -t nat -L DOCKER -n -v
# Chain DOCKER (2 references)
# pkts bytes target   prot opt in       out     source    destination
#   12  720  DNAT     tcp  --  !br-a1b  *       0.0.0.0/0 0.0.0.0/0  tcp dpt:8000 to:172.18.0.2:8000

# 6. Show inter-container isolation rules
sudo iptables -L DOCKER-ISOLATION-STAGE-1 -n -v
sudo iptables -L DOCKER-ISOLATION-STAGE-2 -n -v

# 7. Show the bridge's forwarding table (MAC learning)
brctl showmacs br-a1b2c3d4e5f6
# port no  mac addr                is local?   ageing timer
#   1      02:42:ac:12:00:02       yes           0.00
#   2      02:42:ac:12:00:03       yes           0.00
▶ Output
# nsenter -t 12345 -n ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
inet 127.0.0.1/8 scope host lo
2: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0

# nsenter -t 12345 -n ip route
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
Mental Model
Bridge Network as a Physical Switch
Why does bridge networking add latency compared to host networking?
  • Every packet must traverse the veth pair — a virtual cable with kernel-level copying overhead.
  • Published ports require iptables NAT (DNAT + SNAT) — each NAT rule adds processing time.
  • The bridge performs MAC address lookup and forwarding — similar to a physical switch but in software.
  • Combined overhead: ~10-50 microseconds per packet. For most workloads, this is negligible. For latency-critical workloads (trading, gaming), it matters.
📊 Production Insight
The iptables NAT rules for published ports are the most common source of 'works in dev, fails in production' networking issues. Dev environments typically have simple iptables rules. Production hosts may have complex firewall configurations (ufw, firewalld, cloud security groups) that conflict with Docker's iptables rules. Always verify that the host firewall allows traffic to published ports.
🎯 Key Takeaway
Bridge networking uses Linux namespaces (isolation), veth pairs (virtual cables), and a Linux bridge (virtual switch). Published ports use iptables NAT (DNAT/SNAT). Understanding these kernel primitives is essential for debugging networking issues that the Docker CLI cannot diagnose.

Docker DNS Resolution — How Service Names Become IP Addresses

Docker's embedded DNS server is the mechanism that allows containers to reach each other by service name instead of IP address. Understanding how it works — and its limitations — prevents hours of debugging.

How it works: When you create a custom bridge network, Docker runs a DNS server at 127.0.0.11 inside each container on that network. The container's /etc/resolv.conf points to this DNS server. When a container resolves a service name, the request goes to 127.0.0.11, which maps the name to the container's internal IP on that network.

What gets resolved:
  • Container name: the name you gave the container with --name
  • Service name (Compose): the service name in docker-compose.yml
  • Container ID: the full or truncated container ID
  • Network alias: an additional name set with docker network connect --alias

What does NOT get resolved:
  • Containers on the default bridge network (no embedded DNS)
  • Containers on different networks (DNS is network-scoped)
  • Host names outside the Docker network (these fall through to the host's DNS)

The default bridge limitation: The default bridge network does NOT use Docker's embedded DNS. It uses the legacy /etc/hosts file approach, which only supports --link (deprecated). This is why containers on the default bridge cannot resolve each other by name without --link. Custom bridge networks use the embedded DNS server and support name resolution natively.
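A quick way to see this limitation first-hand (container and network names are illustrative):

```shell
#!/bin/bash
# Default bridge: no embedded DNS, so name resolution fails
docker run -d --name web-default nginx:alpine
docker run --rm alpine nslookup web-default || echo "default bridge: lookup fails"

# Custom bridge: the embedded DNS at 127.0.0.11 resolves the name
docker network create dns-demo
docker run -d --name web-custom --network dns-demo nginx:alpine
docker run --rm --network dns-demo alpine nslookup web-custom

# Cleanup
docker rm -f web-default web-custom && docker network rm dns-demo
```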

Round-robin DNS for load balancing: If multiple containers have the same network alias, Docker's DNS returns all IPs in round-robin order. This provides basic client-side load balancing without a dedicated load balancer.

io/thecodeforge/dns-debug.sh · BASH
#!/bin/bash
# Debug Docker DNS resolution

# 1. Check the container's DNS configuration
docker exec <container> cat /etc/resolv.conf
# Expected output for custom bridge:
# nameserver 127.0.0.11
# options ndots:0

# 2. Test DNS resolution using the embedded DNS server
docker exec <container> nslookup <target-service> 127.0.0.11
# Server:    127.0.0.11
# Address:   127.0.0.11:53
# Name:      target-service
# Address:   172.18.0.3

# 3. Test DNS resolution using the default resolver
docker exec <container> nslookup <target-service>
# If this fails but the above works, the container's
# resolv.conf is misconfigured or the default bridge is in use.

# 4. Check if containers are on the same network
docker inspect <container-a> --format='{{range $k,$v := .NetworkSettings.Networks}}{{$k}} {{end}}'
docker inspect <container-b> --format='{{range $k,$v := .NetworkSettings.Networks}}{{$k}} {{end}}'
# If the networks differ, DNS resolution will fail.

# 5. Test round-robin DNS (multiple containers with same alias)
docker network create test-net
docker run -d --network test-net --network-alias backend nginx:alpine
docker run -d --network test-net --network-alias backend nginx:alpine
docker run --rm --network test-net alpine nslookup backend
# Returns two IP addresses — round-robin load balancing

# 6. Check Docker daemon DNS configuration
cat /etc/docker/daemon.json
# Look for "dns" key — overrides the default upstream DNS
▶ Output
# docker exec api cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

# docker exec api nslookup database 127.0.0.11
Server: 127.0.0.11
Address: 127.0.0.11:53

Non-authoritative answer:
Name: database
Address: 172.18.0.3
Mental Model
Docker DNS as an Internal Phone Book
Why does the default bridge network not support DNS resolution by name?
  • The default bridge predates the embedded DNS server — it was designed before user-defined networks existed.
  • The default bridge uses the legacy /etc/hosts approach, which requires --link to add entries.
  • --link is deprecated because it creates a hard coupling between containers and does not scale.
  • Custom bridge networks use the embedded DNS server (127.0.0.11) which supports dynamic name resolution.
  • Moral: never use the default bridge in production. Always create custom networks.
📊 Production Insight
The DNS resolution limitation of the default bridge is the single most common cause of 'connection refused' errors in Docker networking. Teams start with the default bridge, add containers, and wonder why they cannot communicate by name. The fix is always the same: create a custom bridge network and connect all containers to it. This should be the first thing you check when debugging container connectivity issues.
🎯 Key Takeaway
Docker's embedded DNS at 127.0.0.11 only works on custom bridge networks, not the default bridge. This is the single most common cause of container connectivity failures. Always use custom networks in production. The embedded DNS also supports round-robin for network aliases, providing basic client-side load balancing.
DNS Resolution Debugging
If: nslookup fails but direct IP connection works
Use: Containers are on the default bridge or different networks. Create a custom network and reconnect.
If: nslookup returns wrong IP
Use: Another container with the same name exists on the network. Check: docker network inspect <network> | grep Name
If: DNS resolution works but connection is refused
Use: Target service is not listening on 0.0.0.0 or is not ready. Check: docker exec <target> ss -tlnp
If: DNS resolution is slow (>100ms)
Use: Check if the container's resolv.conf has external DNS servers that are timing out. The embedded DNS (127.0.0.11) should be the first nameserver.

Overlay Networks — Multi-Host Container Communication

Overlay networks extend Docker networking across multiple hosts using VXLAN tunnels. This is the networking foundation for Docker Swarm and is also used by Kubernetes (via CNI plugins like Flannel and Calico).

How VXLAN works: VXLAN (Virtual Extensible LAN) encapsulates Layer 2 Ethernet frames inside Layer 3 UDP packets. Each overlay network gets a VNI (VXLAN Network Identifier) — like a VLAN ID but with a 24-bit address space (16 million networks vs 4096 VLANs). The encapsulation adds 50 bytes of overhead per packet (outer Ethernet + outer IP + outer UDP + VXLAN header).
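The 50-byte figure is just arithmetic over the outer headers, which you can sanity-check in shell:

```shell
#!/bin/bash
# VXLAN encapsulation overhead, header by header
OUTER_ETH=14   # outer Ethernet header
OUTER_IP=20    # outer IPv4 header
OUTER_UDP=8    # outer UDP header (destination port 4789)
VXLAN_HDR=8    # VXLAN header (carries the 24-bit VNI)

OVERHEAD=$((OUTER_ETH + OUTER_IP + OUTER_UDP + VXLAN_HDR))
echo "VXLAN overhead: ${OVERHEAD} bytes"          # 50
echo "Effective MTU:  $((1500 - OVERHEAD)) bytes" # 1450
```

This is why containers on overlay networks typically report an MTU of 1450 instead of 1500.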

Packet flow across hosts: Container A on Host 1 sends a packet to Container B on Host 2. The packet exits Container A's veth pair, reaches the overlay bridge on Host 1. The overlay driver encapsulates the packet in a VXLAN frame with the network's VNI. The encapsulated packet is sent via UDP (port 4789) to Host 2's physical IP. Host 2 decapsulates the packet, strips the VXLAN header, and forwards the inner frame to Container B's veth pair.

Performance impact: VXLAN encapsulation adds ~100-200 microseconds of latency per packet compared to bridge networking. The 50-byte overhead reduces effective MTU (typically 1450 bytes instead of 1500). For high-throughput workloads, this overhead is measurable — overlay networks typically achieve 85-90% of bare-metal throughput.

When to use overlay: Multi-host container communication in Docker Swarm or when you need a flat network across hosts without complex routing. Overlay is the right choice when simplicity matters more than raw performance.

When to avoid overlay: Latency-critical workloads (real-time trading, gaming), high-throughput data pipelines, or environments where the underlay network already provides L3 connectivity (use macvlan or direct routing instead).

io/thecodeforge/overlay-setup.sh · BASH
#!/bin/bash
# Set up and inspect overlay networks

# Prerequisites: Docker Swarm must be initialized
docker swarm init --advertise-addr $(hostname -I | awk '{print $1}')

# Create an overlay network
docker network create \
  --driver overlay \
  --subnet 10.0.10.0/24 \
  --opt encrypted \
  my-overlay

# Deploy a service that uses the overlay network
docker service create \
  --name api \
  --network my-overlay \
  --replicas 3 \
  -p 8000:8000 \
  io.thecodeforge/api:latest

# Deploy a database service on the same overlay
docker service create \
  --name postgres \
  --network my-overlay \
  --replicas 1 \
  postgres:16-alpine

# The 'api' containers can reach 'postgres' by service name
# regardless of which host they are running on.

# Inspect the overlay network
docker network inspect my-overlay

# Check VXLAN tunnel endpoints
docker network inspect my-overlay --format='{{json .Peers}}' | python3 -m json.tool
# [
#   {"Name": "host-1", "IP": "10.0.0.1"},
#   {"Name": "host-2", "IP": "10.0.0.2"}
# ]

# Check VXLAN overhead — MTU is reduced
# From inside a container on the overlay:
docker exec <container> ip link show eth0
# eth0@if5: mtu 1450  <-- reduced from 1500 due to VXLAN overhead
▶ Output
# docker network inspect my-overlay
[
{
"Name": "my-overlay",
"Driver": "overlay",
"Options": {"encrypted": ""},
"IPAM": {
"Config": [{"Subnet": "10.0.10.0/24"}]
}
}
]
Mental Model
Overlay Network as a VPN Between Hosts
Why does overlay networking add more latency than bridge networking?
  • Bridge networking: packet goes from container -> veth -> bridge -> veth -> container. All on the same host.
  • Overlay networking: packet goes from container -> veth -> overlay bridge -> VXLAN encapsulation -> physical NIC -> network -> physical NIC -> VXLAN decapsulation -> overlay bridge -> veth -> container.
  • The VXLAN encapsulation/decapsulation adds ~100-200 microseconds.
  • The 50-byte overhead reduces effective MTU, causing more packets for large transfers.
  • Encryption (if enabled) adds additional overhead.
📊 Production Insight
The MTU reduction from VXLAN (1500 -> 1450) causes subtle issues with applications that assume 1500-byte MTU. Large packets that fit in 1500 bytes on a bridge network will be fragmented on an overlay network, causing performance degradation and potential packet loss. The fix: set MTU explicitly in the application or configure the overlay network with --opt com.docker.network.driver.mtu=1450.
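A sketch of that fix (the network name is illustrative; requires an initialized Swarm):

```shell
#!/bin/bash
# Pin the overlay MTU explicitly so nothing on the network assumes 1500
docker network create \
  --driver overlay \
  --opt com.docker.network.driver.mtu=1450 \
  mtu-safe-overlay

# Verify from any container attached to this network:
# docker exec <container> ip link show eth0   # should report mtu 1450
```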
🎯 Key Takeaway
Overlay networks use VXLAN encapsulation to extend Docker networking across hosts. They add ~100-200 microseconds of latency and reduce MTU from 1500 to 1450. Use overlay for simplicity in Swarm. Use macvlan or direct routing for latency-critical workloads. Always test MTU-sensitive applications on overlay networks before production deployment.
Multi-Host Networking Strategy
If: Docker Swarm, simple multi-host communication
Use: overlay networking — built-in, simple, supports encryption
If: Kubernetes, need advanced networking (network policies, service mesh)
Use: a CNI plugin (Calico, Cilium, Flannel) — not Docker overlay
If: Latency-critical workload across hosts
Use: macvlan or direct routing — avoid VXLAN encapsulation overhead
If: High-throughput data pipeline across hosts
Use: host networking on both ends or macvlan — maximize throughput

macvlan and host Networking — When Bridge Is Not Enough

Bridge networking is the right default for most workloads. But two scenarios require different drivers: when a container needs to appear as a physical device on the LAN (macvlan), and when latency must be minimized (host).

macvlan: Assigns a MAC address to the container, making it appear as a separate physical device on the network. The container gets an IP from the physical network's DHCP or static range. Other devices on the LAN can reach the container directly without NAT.

Use cases: legacy applications that require a unique MAC address, network appliances (firewalls, load balancers) that need to be on the physical network, IoT devices that communicate via broadcast.

Limitations: macvlan requires the host NIC to be in promiscuous mode (not allowed on some cloud providers). Containers on different macvlan subnets on the same host cannot communicate with each other (they go out to the physical switch and back, but the switch may not route between them). Requires careful VLAN configuration on the physical network.

host: Skips network namespace creation entirely. The container uses the host's network stack directly. No veth pair, no bridge, no iptables NAT. The container can bind to any host port.

Use cases: latency-critical workloads (trading, gaming), network monitoring tools that need raw socket access, containers that need to bind to privileged ports (<1024).

Limitations: No port isolation — two containers cannot bind to the same host port. No network isolation — the container can see all host network interfaces and traffic. Security risk — a compromised container has full network access to the host.

io/thecodeforge/macvlan-setup.sh · BASH
#!/bin/bash
# Set up macvlan networking

# 1. Create a macvlan network
# Replace eth0 with your host's physical interface
# Replace 192.168.1.0/24 with your physical network's subnet
# Replace 192.168.1.1 with your gateway
docker network create -d macvlan \
  --subnet=192.168.1.0/24 \
  --gateway=192.168.1.1 \
  -o parent=eth0 \
  my-macvlan

# 2. Run a container on the macvlan network
docker run -d \
  --name network-device \
  --network my-macvlan \
  --ip 192.168.1.100 \
  io.thecodeforge/network-monitor:latest

# 3. The container is now reachable at 192.168.1.100 from the LAN
# Other devices on the LAN can ping it directly:
# ping 192.168.1.100

# 4. IMPORTANT: Host-to-container communication is broken by default
# with macvlan. The host cannot reach the container on the macvlan
# network because the traffic goes out to the physical switch and
# the switch does not route it back to the same port.
# Fix: create a macvlan shim interface on the host:
sudo ip link add macvlan-shim link eth0 type macvlan mode bridge
sudo ip addr add 192.168.1.99/32 dev macvlan-shim
sudo ip link set macvlan-shim up
sudo ip route add 192.168.1.100/32 dev macvlan-shim

# ── Host networking example ──────────────────────────────────────
# Run with host networking (no network isolation)
docker run -d \
  --name latency-critical \
  --network host \
  io.thecodeforge/trading-engine:latest

# The container shares the host's network stack directly.
# No veth pair, no bridge, no NAT. Minimum latency.
# But: no port isolation, no network isolation.
▶ Output
# docker network inspect my-macvlan
[
    {
        "Name": "my-macvlan",
        "Driver": "macvlan",
        "Options": {"parent": "eth0"},
        "IPAM": {
            "Config": [
                {"Subnet": "192.168.1.0/24", "Gateway": "192.168.1.1"}
            ]
        }
    }
]
Mental Model
Network Drivers as Isolation Levels
Why can the host not reach a container on a macvlan network by default?
  • macvlan creates a new MAC address for the container on the physical interface.
  • When the host tries to reach the container, the packet goes out through the physical NIC to the switch.
  • A standard switch never forwards a frame back out the port it arrived on, so the packet is dropped instead of returning to the container.
  • The fix is a macvlan shim interface on the host that acts as a bridge for host-to-container traffic.
📊 Production Insight
macvlan is not supported on most cloud providers (AWS, GCP, Azure) because their virtual NICs do not support promiscuous mode or multiple MAC addresses. If you need containers on the physical network in a cloud environment, use a CNI plugin that provides similar functionality (Calico with BGP, Cilium with direct routing). host networking is supported everywhere but should be used sparingly due to the complete loss of network isolation.
🎯 Key Takeaway
macvlan makes containers appear as physical devices on the LAN — useful for legacy applications and network appliances but not supported on most cloud providers. host networking eliminates all network overhead but also eliminates all isolation. Use host for latency-critical workloads only. The host-to-container communication gap with macvlan requires a shim interface workaround.

iptables and Docker — The Firewall Rules You Never See

Docker manages iptables rules automatically to handle port publishing, inter-network isolation, and outbound masquerading. Understanding these rules is essential for debugging networking issues on hosts with complex firewall configurations.

Key iptables chains Docker creates:
  • DOCKER chain (nat table): DNAT rules for published ports. Maps a host port to a container IP:port.
  • DOCKER chain (filter table): Allows traffic to published ports.
  • DOCKER-ISOLATION-STAGE-1 and -STAGE-2: Prevent traffic between different Docker networks.
  • DOCKER-USER chain: User-defined rules that are evaluated before Docker's own rules. This is where you add custom firewall rules.

The conflict with ufw/firewalld: Docker bypasses ufw (Ubuntu) and firewalld (RHEL) by inserting rules directly into the iptables FORWARD chain. This means ufw may show port 8000 as blocked, but Docker's iptables rules allow it anyway. This is a common source of confusion and security gaps.

The DOCKER-USER chain: Docker never modifies this chain. It is the safe place to add custom firewall rules that apply to Docker traffic. Rules in DOCKER-USER are evaluated before Docker's own rules. Use this to restrict which external IPs can access published ports.

Performance impact: Each published port creates at least one DNAT rule and one filter rule. With 100 published ports, the iptables rule chain is evaluated for every packet. On high-throughput hosts, this adds measurable latency. Minimize published ports — use a reverse proxy (nginx, Traefik) that publishes a single port and routes internally.
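A sketch of that consolidation pattern in Compose — the service names (`proxy`, `api`, `db`) and the `example/api` image are illustrative, not from this article:

```yaml
# docker-compose.yml: one published port on the host, everything else private
services:
  proxy:
    image: nginx:alpine
    ports:
      - "443:443"               # the only DNAT rule Docker creates
    networks:
      - edge
      - internal
  api:
    image: example/api:latest   # illustrative image name
    networks:
      - internal                # reachable only through the proxy
  db:
    image: postgres:16
    networks:
      - internal                # no published port, no DNAT rule
networks:
  edge:
  internal:
```

nginx resolves the upstream by service name (http://api:...) via the embedded DNS, so adding a tenth backend service adds zero iptables rules on the host.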

io/thecodeforge/iptables-audit.sh · BASH
#!/bin/bash
# Audit and manage Docker's iptables rules

# 1. Show all Docker-related NAT rules
sudo iptables -t nat -L DOCKER -n -v --line-numbers
# Chain DOCKER (2 references)
# num  pkts bytes target   prot opt in       out     source    destination
# 1      48  2880 DNAT     tcp  --  !br-abc  *       0.0.0.0/0 0.0.0.0/0  tcp dpt:8000 to:172.18.0.2:8000
# 2      12   720 DNAT     tcp  --  !br-abc  *       0.0.0.0/0 0.0.0.0/0  tcp dpt:5432 to:172.18.0.3:5432

# 2. Show inter-network isolation rules
sudo iptables -L DOCKER-ISOLATION-STAGE-1 -n -v
# This chain prevents containers on different networks from communicating.

# 3. Show the DOCKER-USER chain (your custom rules)
sudo iptables -L DOCKER-USER -n -v --line-numbers

# 4. Add a custom rule to restrict published port access
# Only allow 10.0.0.0/8 to access port 8000
sudo iptables -I DOCKER-USER -i eth0 -p tcp --dport 8000 -j DROP
sudo iptables -I DOCKER-USER -i eth0 -s 10.0.0.0/8 -p tcp --dport 8000 -j ACCEPT
# IMPORTANT: -I inserts at the TOP of the chain, so add the DROP first;
# the ACCEPT inserted afterwards lands above it and is evaluated first.
# (Avoid -A here: DOCKER-USER ends in a RETURN rule, and rules appended
# after it are never reached.)

# 5. Show outbound masquerading (SNAT for container-to-internet)
sudo iptables -t nat -L POSTROUTING -n -v | grep -i masquerade
# MASQUERADE  all  --  172.18.0.0/16  !172.18.0.0/16
# This rule allows containers to reach the internet by masquerading
# their source IP as the host's IP.

# 6. Count total Docker-related iptables rules
sudo iptables -S | grep -i docker | wc -l
sudo iptables -t nat -S | grep -i docker | wc -l
# If total exceeds 5000, consider reducing published ports or
# using a reverse proxy.
▶ Output
# sudo iptables -t nat -L DOCKER -n -v
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
48 2880 DNAT tcp -- !br-a1b * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8000 to:172.18.0.2:8000
12 720 DNAT tcp -- !br-a1b * 0.0.0.0/0 0.0.0.0/0 tcp dpt:5432 to:172.18.0.3:5432
Mental Model
iptables as a Bouncer at Multiple Doors
Why does Docker bypass ufw/firewalld?
  • Docker inserts iptables rules directly into the FORWARD and nat PREROUTING chains.
  • ufw and firewalld manage rules in the INPUT chain, not FORWARD.
  • Traffic to published ports goes through FORWARD (not INPUT), so ufw rules do not apply.
  • This creates a security gap: ufw shows the port as blocked, but Docker's rules allow it.
  • The fix: use the DOCKER-USER chain for custom rules, or set iptables=false in daemon.json and manage rules manually.
📊 Production Insight
The iptables bypass of ufw/firewalld is a security risk that catches many teams off guard. They configure ufw to block port 5432, but Docker's DNAT rule in the nat table allows it anyway because the traffic never hits the INPUT chain. The fix is to add explicit DROP rules in the DOCKER-USER chain for ports that should not be accessible externally, or to stop publishing database ports entirely and keep them on private networks.
🎯 Key Takeaway
Docker manages iptables rules automatically for port publishing, network isolation, and outbound masquerading. These rules bypass ufw/firewalld, creating security gaps. Use the DOCKER-USER chain for custom firewall rules — Docker never modifies this chain. Minimize published ports to reduce iptables rule count and use a reverse proxy for external access.
iptables Conflict Resolution
If: ufw shows a port as blocked but the Docker container is accessible
Use: Docker's DNAT rules bypass ufw. Add DROP rules in the DOCKER-USER chain, or stop publishing the port.
If: Need to restrict published port access by source IP
Use: Add ACCEPT/DROP rules in the DOCKER-USER chain. ACCEPT for allowed IPs, DROP for all others.
If: iptables rules are accumulating and slowing down traffic
Use: Clean up unused containers/networks. Use a reverse proxy to consolidate published ports.
If: Need full control over iptables without Docker interference
Use: Set "iptables": false in /etc/docker/daemon.json and manage all rules manually. Warning: complex.
🗂 Docker Network Drivers Compared
Isolation, performance, and use cases for each driver.
| Driver | Isolation | Performance | DNS by Name | Multi-Host | Use Case |
|---|---|---|---|---|---|
| bridge (default) | Full (namespace) | ~10-50 µs overhead | No (legacy only) | No | Legacy, avoid in production |
| bridge (custom) | Full (namespace) | ~10-50 µs overhead | Yes (embedded DNS) | No | Single-host production |
| overlay | Full (namespace + tunnel) | ~100-200 µs overhead | Yes | Yes (VXLAN) | Docker Swarm, multi-host |
| macvlan | Partial (LAN visible) | Near bare-metal | No (uses LAN DNS) | Yes (physical network) | Legacy apps, network appliances |
| host | None | Bare-metal | N/A (uses host DNS) | N/A | Latency-critical workloads |
| none | Complete | N/A | No | No | Air-gapped batch jobs |

🎯 Key Takeaways

  • Docker networking is built on Linux kernel primitives: namespaces (isolation), veth pairs (virtual cables), bridges (virtual switches), and iptables (firewall/NAT). Understanding these primitives is essential for production debugging.
  • The default bridge network does not support DNS resolution by container name and lacks isolation. Always use custom bridge networks in production. This is the single most common cause of container connectivity failures.
  • Docker's embedded DNS at 127.0.0.11 resolves service names to container IPs on custom bridge networks. It is network-scoped — containers on different networks cannot resolve each other.
  • Overlay networks use VXLAN encapsulation for multi-host communication. They add ~100-200 microseconds of latency and reduce MTU from 1500 to 1450. Test MTU-sensitive applications before production deployment.
  • Docker's iptables rules bypass ufw and firewalld, creating security gaps. Use the DOCKER-USER chain for custom firewall rules. Minimize published ports and use a reverse proxy for external access.
  • macvlan makes containers appear as physical devices on the LAN. host networking eliminates all network overhead but also eliminates all isolation. Choose based on your isolation and performance requirements.

⚠ Common Mistakes to Avoid

    Using the default bridge network in production
    Symptom: Containers cannot resolve each other by name; intermittent connection failures.
    Fix: Always create custom bridge networks. The default bridge does not support Docker's embedded DNS server and lacks network isolation between containers.

    Publishing database ports to the host
    Symptom: The database is accessible from any machine that can reach the host, bypassing application-level access controls.
    Fix: Never use ports: '5432:5432' for internal services. Keep databases on private networks with no published ports. Only edge services should have published ports.

    Assuming ufw/firewalld blocks Docker-published ports
    Symptom: ufw shows the port as blocked, but the container is accessible from external networks.
    Fix: Docker's iptables rules bypass ufw. Add explicit DROP rules in the DOCKER-USER chain, or stop publishing the port.

    Not considering MTU on overlay networks
    Symptom: Intermittent packet loss, slow transfers, and TCP retransmissions on overlay networks.
    Fix: VXLAN reduces MTU from 1500 to 1450. Set MTU explicitly in the application or configure the overlay network with --opt com.docker.network.driver.mtu=1450.

    Using localhost or 127.0.0.1 in connection strings between containers
    Symptom: A container cannot reach another container using localhost.
    Fix: localhost inside a container refers to the container itself. Use the service name or container name as the hostname. The correct DATABASE_URL is postgres://user:pass@postgres_db:5432/dbname, not postgres://user:pass@localhost:5432/dbname.

    Too many published ports creating iptables bloat
    Symptom: Network latency increases as the number of containers and published ports grows.
    Fix: Use a reverse proxy (nginx, Traefik) that publishes a single port and routes internally. Minimize the number of published ports to reduce iptables rule evaluation overhead.

Interview Questions on This Topic

  • Q: Explain what happens at the Linux kernel level when two containers on the same bridge network communicate. Walk me through the packet flow from source to destination.
  • Q: What is the difference between the default bridge network and a custom bridge network? Why does the default bridge not support DNS resolution by container name?
  • Q: How does Docker's embedded DNS server work? Where does it run, and how do containers discover it?
  • Q: Explain the difference between bridge, overlay, and macvlan network drivers. When would you use each?
  • Q: Your team configured ufw to block port 5432, but the PostgreSQL container is still accessible from external networks. What is happening and how do you fix it?
  • Q: What is the performance impact of overlay networking compared to bridge networking? What causes the overhead?
  • Q: How do iptables rules interact with Docker? What is the DOCKER-USER chain and when would you use it?
  • Q: A container can reach external APIs but external services cannot reach the container on its published port. Walk me through your debugging process.

Frequently Asked Questions

Why can't my containers communicate by name on the default bridge network?

The default bridge network does not use Docker's embedded DNS server. It relies on the legacy /etc/hosts approach, which only works with the deprecated --link flag. Custom bridge networks use the embedded DNS at 127.0.0.11 and support name resolution natively. Always create custom bridge networks in production.

What is the performance difference between bridge and overlay networking?

Bridge networking adds ~10-50 microseconds per packet due to veth pair and iptables NAT overhead. Overlay networking adds ~100-200 microseconds due to VXLAN encapsulation (50 bytes per packet). For most workloads, this is negligible. For latency-critical workloads (trading, gaming), use host or macvlan networking.
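To put the 50-byte VXLAN header in throughput terms, a back-of-envelope calculation (this is the encapsulation floor for full-size packets; small packets pay proportionally more):

```shell
# Every 1450-byte overlay payload travels in a 1500-byte outer frame,
# so bulk transfers pay roughly this bandwidth tax:
OVERHEAD_PCT=$(awk 'BEGIN { printf "%.1f", 50 / 1450 * 100 }')
echo "VXLAN bandwidth overhead: ~${OVERHEAD_PCT}%"   # prints: VXLAN bandwidth overhead: ~3.4%
```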

How do I restrict access to a published Docker port by source IP?

Add iptables rules in the DOCKER-USER chain. Docker never modifies this chain, so your rules persist across container restarts. Because -I inserts at the top of the chain, add the DROP rule first and the ACCEPT second, so the ACCEPT ends up above the DROP: iptables -I DOCKER-USER -i eth0 -p tcp --dport 8000 -j DROP, then iptables -I DOCKER-USER -i eth0 -s 10.0.0.0/8 -p tcp --dport 8000 -j ACCEPT.

Can I use macvlan networking on AWS, GCP, or Azure?

Generally no. Cloud provider virtual NICs do not support promiscuous mode or multiple MAC addresses, which macvlan requires. For similar functionality in cloud environments, use a CNI plugin (Calico with BGP, Cilium with direct routing) or use the cloud provider's native networking features.

What is the DOCKER-ISOLATION chain in iptables?

DOCKER-ISOLATION-STAGE-1 and STAGE-2 are iptables chains that prevent traffic between different Docker networks. They ensure that containers on network A cannot reach containers on network B unless a container is attached to both networks. This is Docker's network-level isolation enforcement.

Naren · Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged