Advanced 11 min · March 06, 2026

Kubernetes Network Policies: Default-Deny Egress Blocks DNS

Default-deny egress blocks DNS, causing 30-second timeouts on every request.

Naren, Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Drawn from code that ran under real load.

Last updated: May 24, 2026
Quick Answer
  • Enforcement is done by the CNI plugin (Calico, Cilium), NOT the API server. Flannel ignores policies silently.
  • Default behavior: if no policy selects a Pod, ALL traffic is allowed in both directions.
  • Once any policy selects a Pod, that direction enters implicit default-deny. Only explicitly whitelisted traffic passes.
  • Policies are additive whitelists. There is no deny rule in the standard API. Multiple policies selecting the same Pod are unioned (OR).
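  • iptables-based dataplanes (Calico's default) match rules sequentially, so cost grows roughly O(n) with rule count and becomes noticeable once a node carries thousands of rules.
  • eBPF-based dataplanes (Cilium, Calico in eBPF mode) use hash-map lookups that stay close to O(1), but they need a recent kernel. Check the minimum for your CNI version; recent Cilium releases list 5.4 or newer.
  • The classic mistake is forgetting the DNS egress carve-out when applying default-deny egress. Every service discovery call then times out silently, commonly after about 30 seconds.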
What Are Kubernetes Network Policies?

Kubernetes Network Policies are a namespace-scoped firewall specification that controls pod-to-pod communication at the IP address and port level, enforced by the Container Network Interface (CNI) plugin. They solve a fundamental security problem: by default, all pods in a cluster can communicate freely with each other and the outside world, which is rarely what you want in production.

Network Policies let you define granular ingress (inbound) and egress (outbound) rules using label selectors, namespace selectors, and CIDR blocks, effectively creating micro-segmentation within your cluster. The key insight is that these are not firewall rules you write directly — they're declarative Kubernetes resources that the CNI translates into low-level packet filtering rules (iptables, eBPF, or similar) on each node.

A critical and often overlooked trap is that a default-deny egress policy will silently break DNS resolution for all pods in that namespace, because the kube-dns or CoreDNS service runs on a specific IP and port (typically port 53 UDP/TCP) that must be explicitly allowed. Without an egress rule permitting traffic to the DNS service's IP or its label selector, pods can't resolve any hostnames — including service names within the cluster.

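This is the single most common production outage caused by Network Policies, and it's why you'll see patterns like - to: [{namespaceSelector: {matchLabels: {kubernetes.io/metadata.name: kube-system}}, podSelector: {matchLabels: {k8s-app: kube-dns}}}] in every serious policy set. Note the inner braces: both selectors sit inside one peer, so they are ANDed. Without them, YAML reads the two selectors as separate peers and ORs them.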

Network Policies are not a replacement for service meshes (like Istio or Linkerd) or host-level firewalls — they operate at L3/L4 only, cannot filter on application-layer protocols, and have no concept of authentication or encryption. They're best used as a first line of defense for namespace isolation, compliance boundaries (e.g., PCI workloads), and preventing lateral movement.

Related tools include Calico's extended network policies (which add L7 filtering and deny rules), Cilium's eBPF-based policies (which offer L7 visibility), and OPA/Gatekeeper, which complements Network Policies by enforcing rules at admission time rather than replacing them. For small clusters, iptables-based enforcement is usually fine. As node and policy counts grow, eBPF-based dataplanes such as Cilium's tend to perform better in policy-heavy environments, although the size of the gain depends on the workload and is worth benchmarking on your own cluster.

Plain-English First

Imagine your apartment building has no locks on any doors — every tenant can walk into every other apartment freely. Kubernetes without Network Policies is exactly that: every Pod can talk to every other Pod by default. Network Policies are the deadbolts you install. You decide which apartments can knock on which doors, and everyone else gets turned away at the hallway.

Most teams get Kubernetes running, deploy their apps, and move on — never realizing their payment service can freely dial their logging sidecar, which can freely dial their database, which can freely reach the internet. That's not paranoia; that's the default. Kubernetes was designed for rapid connectivity, not zero-trust isolation. The moment you run multiple tenants, compliance workloads, or anything that touches PII or financial data, that open-door model becomes a liability.

Network Policies solve this by letting you express intent in YAML: only Pods with this label may reach my database on port 5432, from this namespace only, and my database can reach nothing outbound except DNS. The CNI plugin — not the Kubernetes API server — enforces those rules in the kernel using iptables, eBPF, or nftables depending on your stack. That distinction matters enormously for debugging and performance.

This is not a syntax reference. It covers how policies are evaluated and merged, how to write airtight ingress and egress rules without accidentally blackholing DNS, how to verify enforcement at the network level rather than trusting your YAML applied cleanly, and the production mistakes that silently leave clusters wide open.

Why Default-Deny Egress Breaks DNS

Kubernetes Network Policies are firewall rules that control traffic between pods at the IP address or port level (OSI layer 3 or 4). They are implemented by the CNI plugin (e.g., Calico, Cilium, Weave) and are evaluated per-pod based on label selectors. The core mechanic: a policy selects a set of pods and defines ingress and/or egress rules — if no policy selects a pod, all traffic is allowed; once any policy selects it, all traffic not explicitly permitted is denied.

In practice, applying a default-deny egress policy (a policy that selects all pods with no egress rules) immediately blocks all outbound traffic, including DNS lookups to the cluster's CoreDNS service. This happens because DNS runs on UDP/TCP port 53, and unless your egress rule explicitly allows traffic to the kube-system namespace on port 53, the pod cannot resolve any hostnames. This is a common first-day surprise for teams adopting network policies.

Use default-deny egress when you need to enforce least-privilege networking — for example, in multi-tenant clusters or PCI/HIPAA environments. But always pair it with an explicit egress rule that permits DNS traffic to the CoreDNS service IP or namespace selector. Without that, your pods will fail to resolve service names, causing cascading failures in service discovery, health checks, and external API calls.

DNS is not automatically exempt
Kubernetes Network Policies do not have a built-in exception for DNS. If you apply a default-deny egress policy, you must explicitly allow UDP/TCP 53 to CoreDNS.
Production Insight
A team applied a default-deny egress policy to a namespace running a microservice that calls an external API via hostname. The service started failing because the pod could no longer resolve the external hostname. The symptom was intermittent timeouts and 5xx errors at the API gateway, not a clear DNS failure. Rule of thumb: always add an egress rule for DNS (namespace: kube-system, port: 53, protocols: UDP and TCP) before applying any default-deny egress policy.
Key Takeaway
Default-deny egress blocks all outbound traffic, including DNS — you must explicitly allow port 53 to CoreDNS.
Network policies are additive allow-lists: once any policy selects a pod for a direction (ingress or egress), all traffic in that direction that no policy permits is denied.
Always test network policies in a non-production namespace first; a misconfigured egress rule can silently break service discovery.
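The DNS failure described above can be checked directly from inside the affected namespace. The following is a minimal sketch, not a definitive diagnostic: the function names, the busybox:1.36 image, the payments namespace, and the 5-second threshold are all illustrative assumptions. Elapsed time also includes Pod scheduling and image pull, so treat the classification as a rough signal.

```shell
#!/usr/bin/env bash
# Sketch: probe cluster DNS from inside a namespace and classify the result.
# Names, image tag, and the 5-second threshold are illustrative assumptions.

# Classify a lookup from its exit code and elapsed seconds (pure function).
classify_lookup() {
  local exit_code="$1" elapsed="$2" slow_after="${3:-5}"
  if [ "$exit_code" -eq 0 ]; then
    echo "resolved"
  elif [ "$elapsed" -ge "$slow_after" ]; then
    echo "timeout"        # consistent with port 53 traffic being dropped
  else
    echo "fast-failure"   # something answered quickly, e.g. NXDOMAIN
  fi
}

# Run nslookup in a throwaway Pod and time it.
probe_dns() {
  local namespace="$1" name="${2:-kubernetes.default.svc.cluster.local}"
  local start end rc=0
  start=$(date +%s)
  kubectl run dns-probe --namespace="$namespace" --image=busybox:1.36 \
    --restart=Never --rm --quiet -i -- nslookup "$name" >/dev/null 2>&1 || rc=$?
  end=$(date +%s)
  classify_lookup "$rc" "$((end - start))"
}

# Only touch the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  echo "DNS from payments: $(probe_dns payments)"
fi
```

A "timeout" result in a namespace with default-deny egress is a strong hint that the port 53 carve-out is missing. A "fast-failure" usually points somewhere else, such as a misspelled name.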
[Figure: Flow from default-deny egress to DNS failure and production fixes. Stages: Default-Deny Egress Policy (blocks all outbound traffic from pods) → DNS Resolution Failure (CoreDNS blocked; pod can't resolve names) → Precise Egress Rule (allow UDP 53 to kube-dns service IP) → Verify with nslookup (test from pod inside policy namespace) → Namespace Isolation Pattern (default-deny per namespace + DNS allow). Warning: default-deny egress without a DNS rule breaks all pod networking; always add an egress rule for kube-dns before applying default-deny.]

How Network Policy Enforcement Actually Works — The CNI Layer

Here's the thing most tutorials skip: the Kubernetes API server doesn't enforce Network Policies. It just stores them. The actual enforcement happens inside your CNI plugin — Calico, Cilium, Weave, Antrea — which watches the API server for NetworkPolicy objects and translates them into kernel-level firewall rules on each node.

With Calico's default dataplane, that means iptables chains per endpoint. With Cilium, it's eBPF programs attached to kernel hooks that filter packets without traversing long iptables chains, which typically means lower latency and better observability. With Flannel, enforcement is zero because Flannel doesn't implement Network Policies at all. This is one of the most common production surprises: a team applies policies and believes they're enforced, but their CNI silently ignores them.

Policy evaluation works like a firewall whitelist. If no NetworkPolicy selects a Pod, all traffic is allowed. The moment any policy selects a Pod — via podSelector — that Pod enters an implicit 'default deny' for the traffic directions that policy governs. Multiple policies selecting the same Pod are unioned together: a packet is allowed if it matches any one of them. There's no precedence, no ordering, no 'deny' rule type in the core API. You get whitelisting only, which is both a simplicity win and a constraint you need to design around.

default-deny-all.yaml (YAML)
# STEP 1: Apply a default-deny baseline to a namespace.
# This selects ALL pods in the namespace (empty podSelector matches everything)
# and specifies BOTH policyTypes — so both ingress and egress are now default-deny.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all-traffic
  namespace: payments
spec:
  podSelector: {}              # Empty selector = matches every Pod in this namespace
  policyTypes:
    - Ingress                  # Explicitly govern inbound traffic
    - Egress                   # Explicitly govern outbound traffic
  # No ingress or egress rules defined here — that's intentional.
  # The absence of rules under a governed policyType means: deny everything.
  # This is your zero-trust baseline. Now you add back only what you need.
Output
networkpolicy.networking.k8s.io/default-deny-all-traffic created
Flannel Won't Enforce Anything
  • Flannel: Provides networking only. No NetworkPolicy enforcement. Zero.
  • Calico: Full NetworkPolicy support via iptables or eBPF (with Calico CNI).
  • Cilium: Full NetworkPolicy support via eBPF. Extended CRDs for L7 policies.
  • Weave: NetworkPolicy support but less performant than Calico/Cilium.
  • Antrea: VMware's CNI with full NetworkPolicy support and traceflow debugging.
Production Insight
The most dangerous state is a cluster that has a policy-enforcing CNI whose policy components are not running. The per-node agents (Calico's calico-node, Cilium's cilium-agent) program the kernel rules, and cluster-level components such as calico-kube-controllers and the Cilium operator support them. All of them must be healthy for policies to be translated reliably. If an agent crashes, rules already in the kernel generally remain, but new policies are not applied and deleted policies are not removed on that node. Monitor the CNI's components as critical infrastructure.
Key Takeaway
The API server stores Network Policies. The CNI enforces them. Flannel ignores them entirely. Always verify your CNI supports enforcement before trusting any policy. Monitor the CNI controller as critical infrastructure.
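Before trusting any policy, it helps to confirm which CNI is actually running. The sketch below guesses the CNI from agent Pod names and maps it to the support levels listed above. The function names and the name patterns are assumptions based on common default installs; a custom install may use different Pod names, in which case the result is "unknown".

```shell
#!/usr/bin/env bash
# Sketch: guess whether the cluster's CNI enforces NetworkPolicy.
# The Pod-name patterns are assumptions based on common default installs.

# Map a CNI name to its NetworkPolicy support (pure function, no cluster access).
cni_policy_support() {
  case "$1" in
    calico|cilium|antrea|weave) echo "enforces" ;;
    flannel)                    echo "ignores" ;;
    *)                          echo "unknown" ;;
  esac
}

# Look for well-known CNI agent Pods across all namespaces.
detect_cni() {
  local pods cni
  pods=$(kubectl get pods --all-namespaces --no-headers \
    -o custom-columns=NAME:.metadata.name 2>/dev/null)
  for cni in calico cilium antrea weave flannel; do
    if grep -q "$cni" <<<"$pods"; then
      echo "$cni"
      return 0
    fi
  done
  echo "unknown"
}

# Only touch the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  cni=$(detect_cni)
  echo "Detected CNI: $cni ($(cni_policy_support "$cni"))"
fi
```

A name match only shows that the agent Pods exist. The empirical connection test is still the real proof that packets are being dropped.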

Writing Precise Ingress and Egress Rules — With the DNS Trap Explained

Once you've applied default-deny, you need to surgically re-open only the traffic paths your application legitimately needs. Ingress rules control what can reach your Pod. Egress rules control what your Pod can reach. Both use the same selector primitives: podSelector, namespaceSelector, and ipBlock, which you can combine with AND logic inside a single from/to entry, or use OR logic across multiple entries.

The subtlety that burns everyone: a from entry with both podSelector AND namespaceSelector means the source must match BOTH selectors simultaneously — it's an AND. Two separate from entries each with their own selector is an OR. The indentation in YAML is load-bearing here. Get it wrong and you either over-permit or under-permit with no error from the API server.

The DNS trap is equally nasty. When you lock down egress, your Pods immediately lose DNS resolution because they can no longer reach CoreDNS on port 53 UDP/TCP. Every connection attempt fails not with a 'connection refused' but with a timeout waiting for DNS, which can take around 30 seconds to surface depending on the resolver's timeout and retry settings. Always add an explicit egress rule for CoreDNS as part of your default-deny rollout, or you'll wonder why your app is broken when your network policy looks correct.

api-server-network-policy.yaml (YAML)
# This policy governs the 'api-server' Pods in the 'payments' namespace.
# It allows:
#   INGRESS: Only from Pods labeled 'app: frontend' in the 'web' namespace
#   EGRESS:  Only to the PostgreSQL database pods on port 5432
#            AND to CoreDNS on port 53 (critical — without this, DNS breaks)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-traffic-rules
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
    - Egress

  ingress:
    - from:
        # AND logic: source must be in namespace 'web' AND have label 'app: frontend'
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: web
          podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080

  egress:
    # Rule 1: Allow outbound to PostgreSQL pods only
    - to:
        - podSelector:
            matchLabels:
              app: postgres-primary
      ports:
        - protocol: TCP
          port: 5432

    # Rule 2: Allow DNS resolution — NEVER omit this in an egress-restricted policy
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
Output
networkpolicy.networking.k8s.io/api-server-traffic-rules created
AND vs OR is Indentation-Deep
  • Same dash entry with podSelector AND namespaceSelector: source must match BOTH (AND).
  • Separate dash entries with podSelector OR namespaceSelector: source can match EITHER (OR).
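  • No ingress (or egress) rules at all under a governed policyType: deny all for that direction.
  • A rule whose from/to is empty or missing (for example from: []): matches ALL sources or destinations, limited only by the rule's ports. This is allow-all, not deny-all.
  • ipBlock cannot be combined with podSelector or namespaceSelector in the same entry; the API rejects it. Put ipBlock in its own dash entry (OR).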
Production Insight
The ipBlock selector is the only way to restrict traffic to external IPs (non-Pod destinations). However, ipBlock rules interact poorly with NAT. If your Pods go through a NAT gateway to reach the internet, the source IP seen by the destination is the NAT gateway's IP, not the Pod's IP. Conversely, for inbound traffic from outside the cluster via LoadBalancer/NodePort, the source IP seen by the Pod is the node's IP (unless externalTrafficPolicy: Local). Always test ipBlock rules with the actual traffic path, not a theoretical one.
Key Takeaway
AND vs OR is determined by YAML indentation — same dash = AND, separate dashes = OR. The DNS trap is the most common production incident when applying default-deny egress. Always include a CoreDNS carve-out on UDP and TCP port 53.
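To make the difference concrete, here is a sketch of the OR form, for comparison with the AND form in api-server-network-policy.yaml. It writes the manifest to a file so you can diff the two. The policy name and the helper functions are hypothetical; the namespace and label values reuse the example names from this article.

```shell
#!/usr/bin/env bash
# Sketch: write the OR form of an ingress peer list to a file for review.
# The policy name and helper functions are hypothetical.

write_or_variant() {
  cat > "$1" <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-ingress-or-variant
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Peer 1: ANY Pod in the 'web' namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: web
        # Peer 2 (note the second dash): any Pod labeled app=frontend
        # in the policy's own namespace, 'payments'
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
EOF
}

# Count the peers (dash entries) directly under 'from:'.
count_peers() {
  grep -cE '^ {8}- (namespaceSelector|podSelector|ipBlock):' "$1"
}

manifest="${TMPDIR:-/tmp}/or-variant.yaml"
write_or_variant "$manifest"
echo "peers under from: $(count_peers "$manifest")"
```

The only textual difference from the AND form is one extra dash, yet this variant admits every Pod in web plus any app=frontend Pod in payments. The file can be validated without applying it by running kubectl apply --dry-run=server -f on it.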

Verifying Real Enforcement and Debugging Policy Failures in Production

Applying a NetworkPolicy and assuming it works is a mistake you only make once in production. The API server accepts any syntactically valid policy regardless of whether your CNI supports it. You need to verify enforcement at the traffic level, not the YAML level.

The gold-standard test is running a temporary Pod in the source namespace and attempting a connection directly — not through a Service mesh or load balancer that might bypass node-level rules. Use kubectl run with --rm -it to spin up a throwaway Pod, then use curl, nc, or wget to probe the target. A dropped connection times out; a policy-permitted connection either succeeds or returns an application-level error (which is actually what you want to see — it means the packet reached the target).

For Cilium clusters, cilium monitor and the Hubble UI are exceptionally powerful — they show you in real time which policies matched or dropped each flow, with source/destination Pod identity, namespace, and labels. For Calico clusters, calicoctl get networkpolicy and iptables -L -n --line-numbers on the node running your Pod reveal the actual enforced rules. Always test both directions — a policy that allows egress from Pod A to Pod B doesn't automatically allow ingress to Pod B from Pod A unless Pod B also has a matching ingress rule.

verify-network-policy-enforcement.sh (BASH)
#!/usr/bin/env bash
# Verify Network Policy enforcement empirically.
# Tests both allowed and blocked paths.

# No "set -e": failed probes are expected here and are handled explicitly.
set -uo pipefail

TARGET_NAMESPACE="payments"
TARGET_SERVICE="api-server"
TARGET_PORT="8080"
ALLOWED_SOURCE_NAMESPACE="web"
BLOCKED_SOURCE_NAMESPACE="monitoring"
CONNECT_TIMEOUT_SECONDS="3"
TARGET_URL="http://${TARGET_SERVICE}.${TARGET_NAMESPACE}.svc.cluster.local:${TARGET_PORT}/health"

# Run curl in a throwaway Pod and print "<http_code> <curl_exit_code>",
# for example "200 0" (success) or "000 28" (timed out).
# Uses -i without -t because command substitution has no TTY.
# The Pod gets the given labels so it can match the policy's podSelector.
probe() {
  local pod_name="$1" namespace="$2" labels="$3"
  kubectl run "$pod_name" \
    --namespace="$namespace" \
    --labels="$labels" \
    --image=curlimages/curl:8.5.0 \
    --restart=Never --rm --quiet -i \
    --command -- sh -c \
      "curl --silent --max-time ${CONNECT_TIMEOUT_SECONDS} --output /dev/null --write-out '%{http_code}' '${TARGET_URL}'; echo \" \$?\"" \
    2>/dev/null | tr -d '\r' | grep -E '^[0-9]{3} [0-9]+$' | tail -n 1
}

echo "=== Network Policy Enforcement Verification ==="
echo ""

# Test 1: Allowed path (the probe Pod carries app=frontend, as the ingress rule requires)
echo "[TEST 1] Allowed ingress: frontend (web) -> api-server (payments)"
read -r ALLOWED_CODE ALLOWED_EXIT <<<"$(probe policy-test-allowed "$ALLOWED_SOURCE_NAMESPACE" "app=frontend")"

if [ "${ALLOWED_CODE:-}" = "200" ]; then
  echo "  PASS: HTTP 200 received"
else
  echo "  FAIL: Expected HTTP 200, got '${ALLOWED_CODE:-none}' (curl exit ${ALLOWED_EXIT:-unknown})"
fi

echo ""

# Test 2: Blocked path
echo "[TEST 2] Blocked ingress: prometheus (monitoring) -> api-server (payments)"
read -r BLOCKED_CODE BLOCKED_EXIT <<<"$(probe policy-test-blocked "$BLOCKED_SOURCE_NAMESPACE" "app=policy-test-blocked")"

# curl exit codes: 28 = timed out (dropped), 6 = DNS failure, 7 = connection refused
case "${BLOCKED_EXIT:-}" in
  28) echo "  PASS: Connection timed out; blocked source is correctly dropped" ;;
  6)  echo "  INCONCLUSIVE: DNS lookup failed; check DNS egress in the source namespace" ;;
  *)  echo "  FAIL: Expected timeout, got HTTP '${BLOCKED_CODE:-none}' (curl exit ${BLOCKED_EXIT:-unknown}); policy NOT enforced!" ;;
esac

echo ""
echo "=== Verification Complete ==="
Output
=== Network Policy Enforcement Verification ===
[TEST 1] Allowed ingress: frontend (web) -> api-server (payments)
PASS: HTTP 200 received
[TEST 2] Blocked ingress: prometheus (monitoring) -> api-server (payments)
PASS: Connection timed out; blocked source is correctly dropped
=== Verification Complete ===
Timeout vs Refused
  • Timeout (after 3-30s): Packet was dropped by the CNI. NetworkPolicy is enforcing correctly.
  • Connection refused (immediate): Packet reached the target Pod's network stack, where nothing accepted the connection. NetworkPolicy is NOT blocking this path.
  • HTTP 200: Packet reached the application and got a valid response. Policy allows this traffic.
  • HTTP 5xx: Packet reached the application but the app returned an error. Policy allows, app has issues.
  • DNS timeout (30s): UDP 53 to CoreDNS is blocked. Check egress rules for DNS carve-out.
Production Insight
Testing Network Policies in CI/CD is essential. Create a pipeline stage that deploys the NetworkPolicy, runs the verification script, and fails the pipeline if enforcement is incorrect. Use a dedicated test namespace with known source and destination Pods. This catches policy regressions before they reach production. For Cilium, integrate Hubble flow logs into your CI to verify policy verdicts programmatically.
Key Takeaway
Never trust that a NetworkPolicy is enforced without empirical testing. Timeout = CNI drop (the policy is blocking). Refused = the packet reached the target (the policy is not blocking that path). Test both allowed and blocked paths. Automate verification in CI/CD.

Production Patterns: Namespace Isolation, Monitoring Carve-outs and Label Hygiene

In a real multi-tenant cluster, you can't write policies Pod-by-Pod. You need namespace-scoped baselines combined with additive per-workload rules. The pattern that works at scale is: one default-deny policy per namespace applied by your CD pipeline at namespace creation, then application-specific policies delivered alongside each Helm chart or Kustomize overlay.

Monitoring is the most common carve-out needed. Prometheus needs to scrape metrics from every namespace, but you don't want to globally allow all ingress. The clean solution is a Pod label like monitoring.io/expose-metrics: 'true' and a policy in each target namespace that allows ingress to those Pods from the monitoring namespace on your metrics port (8080 in the example that follows). This keeps control local to the target namespace.

Label hygiene is non-negotiable. Network Policies match on whatever labels your Pods have. If a developer changes a label during a refactor, the policy selector silently stops matching and the Pod loses its allow rules with no warning event: it falls back to the namespace's default-deny, or to allow-all if no other policy selects it. Use labels you treat as immutable, like app: payment-api, for security selectors, and mutable labels like version: v2 only for routing. Audit your selectors in CI with kubectl get pods -l app=api-server -n payments and fail the pipeline if the expected count is zero.

prometheus-scrape-carveout.yaml (YAML)
# Prometheus scrape carve-out for the 'payments' namespace.
# Complements the default-deny-all policy already in place.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: payments
  labels:
    policy-type: monitoring-carveout
    managed-by: platform-team
spec:
  podSelector:
    matchLabels:
      monitoring.io/expose-metrics: "true"
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 8080
Output
networkpolicy.networking.k8s.io/allow-prometheus-scrape created
Audit Policy Coverage with a One-Liner
  • Security labels (app, tier, team) should be immutable. Enforce with admission webhooks.
  • Routing labels (version, canary, blue-green) should NOT be used in NetworkPolicy selectors.
  • CI check: fail the pipeline if kubectl get pods -l app=<name> returns zero Pods.
  • Namespace labels (kubernetes.io/metadata.name) are auto-applied in Kubernetes 1.21+. Use them for namespaceSelector.
  • Adopt a naming convention: all NetworkPolicy names should include the namespace and workload they govern.
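The CI check from the list above could look like the following sketch. The function names are hypothetical, and the namespace and selector in the usage comment are this article's example values. The cluster call is left as a commented usage line so the helpers can be sourced and tested on their own.

```shell
#!/usr/bin/env bash
# Sketch: fail a CI stage when a NetworkPolicy selector matches no Pods.
# Function names are hypothetical helpers, not part of kubectl.

# Return non-zero when the count is zero (pure function, no cluster access).
assert_selector_matches() {
  local count="$1" description="$2"
  if [ "$count" -eq 0 ]; then
    echo "FAIL: ${description} matched 0 Pods" >&2
    return 1
  fi
  echo "OK: ${description} matched ${count} Pods"
}

# Count Pods matching a label selector in a namespace.
count_pods() {
  local namespace="$1" selector="$2"
  kubectl get pods -n "$namespace" -l "$selector" --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Usage in a CI job:
#   assert_selector_matches "$(count_pods payments app=api-server)" \
#     "app=api-server in payments" || exit 1
```

Running one assertion per policy selector keeps the failure message specific about which label drifted.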
Production Insight
Namespace isolation at scale requires a namespace provisioning pipeline that automatically applies default-deny, DNS carve-out, and monitoring carve-out as non-removable base policies. Use Kyverno or OPA Gatekeeper to enforce that every namespace has a default-deny policy and that no Pod exists without the required security labels. This shifts security left and prevents configuration drift.
Key Takeaway
Namespace-scoped default-deny plus per-workload additive rules is the production pattern. Monitoring carve-outs keep Prometheus working without broad ingress. Label hygiene is non-negotiable — enforce immutable security labels with admission webhooks.

Network Policy Performance: iptables vs eBPF at Scale

The CNI enforcement mechanism directly impacts network latency and control plane load. Understanding the performance characteristics of your CNI is critical for capacity planning and troubleshooting latency issues that appear only at scale.

network-policy-performance-check.sh (BASH)
#!/usr/bin/env bash
# Check NetworkPolicy rule count and CNI performance characteristics.
# Run on a node with calicoctl or the Cilium agent CLI installed.
# iptables-save needs root.

set -euo pipefail

# For Calico: count policies and the iptables rules on this node
echo "=== Calico iptables Rule Count ==="
if command -v calicoctl &> /dev/null; then
  # tail -n +2 skips the header row
  echo "Calico network policies (all namespaces): $(calicoctl get networkpolicy -A -o wide | tail -n +2 | wc -l)"
  # iptables-save covers every table: chain declarations start with ':' and rules with '-A'.
  # "|| true" keeps set -e from aborting when grep counts zero matches.
  echo "Total iptables chains on this node: $(iptables-save | grep -c '^:' || true)"
  echo "Total iptables rules on this node: $(iptables-save | grep -c '^-A' || true)"
else
  echo "calicoctl not found, skipping Calico check"
fi

echo ""

# For Cilium: check eBPF policy programs
echo "=== Cilium eBPF Policy Status ==="
if command -v cilium &> /dev/null; then
  cilium status
  echo ""
  echo "Policy verdicts (10-second sample, last 20 events):"
  # "cilium monitor" is the agent CLI and streams until interrupted, so bound it
  # with timeout. Run it inside the cilium-agent Pod; newer releases may name
  # the binary cilium-dbg.
  timeout 10 cilium monitor --type policy-verdict | tail -20 || true
else
  echo "cilium CLI not found, skipping Cilium check"
fi

echo ""

# General: check for NetworkPolicy count per namespace
echo "=== NetworkPolicy Distribution ==="
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  count=$(kubectl get networkpolicy -n "$ns" --no-headers 2>/dev/null | wc -l)
  if [ "$count" -gt 0 ]; then
    echo "  $ns: $count policies"
  fi
done
Output
=== Calico iptables Rule Count ===
Total iptables chains on this node: 847
Total iptables rules on this node: 12403
=== NetworkPolicy Distribution ===
payments: 12 policies
web: 8 policies
monitoring: 3 policies
iptables O(n) vs eBPF O(1)
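  • iptables (Calico default): Sequential rule matching. Cost grows with the number of rules on the node, so large policy sets add latency and CPU overhead.
  • eBPF (Cilium, Calico with eBPF dataplane): Hash map lookups. Per-packet cost stays roughly constant as the rule count grows.
  • iptables rule churn: Policy changes are applied with iptables-restore on the affected nodes, which gets slower as the rule set grows. Brief packet drops are possible during a restore.
  • eBPF program updates: Programs and maps are replaced atomically, which avoids the restore window.
  • Kernel requirement: eBPF dataplanes need a recent kernel, and newer features need newer kernels. Check the system requirements for your CNI version.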
Production Insight
At scale (nodes carrying thousands of iptables rules), iptables-based enforcement can cause measurable latency increases and CPU overhead from rule traversal. The iptables-restore operation during policy updates can also cause brief packet drops. For latency-sensitive workloads, consider an eBPF dataplane such as Cilium's. Monitor the metrics your CNI exposes, for example cilium_datapath_conntrack_gc_entries on Cilium or Felix's iptables rule and chain counts on Calico, to detect enforcement bottlenecks.
Key Takeaway
iptables enforcement scales O(n) with rule count. eBPF enforcement scales O(1) with hash maps. On nodes with large rule sets, the difference is measurable. For latency-sensitive workloads, consider an eBPF dataplane. Monitor enforcement overhead as part of capacity planning.

The Two Kinds of Pod Isolation (And Why Default-Deny Is a Lie)

Network policies don't add security; they remove connectivity. That sounds backwards until you understand how Kubernetes handles pod isolation. A Pod starts out non-isolated: all traffic is allowed. When a NetworkPolicy selects the Pod, the Pod becomes 'isolated' for each direction listed in that policy's policyTypes, and the CNI drops traffic in that direction unless some policy explicitly allows it.

The two sorts of isolation are isolation for ingress and isolation for egress, and they are declared independently. A Pod selected only by an Ingress policy is still wide open for egress. For a connection between two Pods to work, the source's egress policy and the destination's ingress policy must both allow it. Whether a policy selects every Pod in the namespace (empty podSelector) or only specific labels changes which Pods are isolated, not how isolation works, so you can layer a namespace-wide baseline with per-workload rules.

This is where teams get burned: isolation covers the pod network only. The Kubernetes documentation describes NetworkPolicy behavior for hostNetwork Pods as undefined, and most CNIs cannot tell a hostNetwork Pod's traffic apart from the node's own, so podSelector rules do not apply to it. I call it a footgun. If you have a Pod running with hostNetwork: true, assume your 'default deny' policy does not touch its traffic. You need node-level firewall rules (iptables, nftables) or host-level policies from your CNI, where it offers them, to cover that path. Never assume a default-deny policy covers the entire attack surface.

isolation-footgun.yml (YAML)
# io.thecodeforge - devops tutorial

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments-api
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# This pod is NOT protected by the above policy
apiVersion: v1
kind: Pod
metadata:
  name: vault-agent
  namespace: payments-api
spec:
  hostNetwork: true  # <-- bypasses all network policies
  containers:
  - name: vault
    image: hashicorp/vault:1.15
Output
Pod vault-agent created (network policy does not apply)
Traffic to/from hostNetwork pods is invisible to CNI layer
Production Trap:
Never run critical workloads with hostNetwork: true unless you have node-level firewall rules. Network policies will give you false confidence. Add a Kyverno policy to reject hostNetwork pods in sensitive namespaces.
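Before adding a guardrail, it is useful to know which Pods already run with hostNetwork: true. The sketch below lists them as namespace/name. The function names are hypothetical, and it relies on kubectl's custom-columns output printing true for a set field and a placeholder when the field is absent.

```shell
#!/usr/bin/env bash
# Sketch: list Pods that run with hostNetwork: true.
# Function names are hypothetical helpers.

# Keep rows whose third column is "true" and print them as namespace/name.
# Input format: "<namespace> <name> <hostNetwork>" per line.
filter_host_network() {
  awk '$3 == "true" { print $1 "/" $2 }'
}

list_host_network_pods() {
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,HOSTNET:.spec.hostNetwork \
    | filter_host_network
}

# Only touch the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  list_host_network_pods
fi
```

Expect system components such as kube-proxy and the CNI agents themselves in the output. The entries worth reviewing are application Pods in namespaces you consider isolated.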
Key Takeaway
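Network policies only isolate traffic on the Pod network. hostNetwork Pods generally sit outside them, and traffic arriving through NodePorts or external load balancers may reach Pods with a rewritten source IP, so ipBlock rules may not match what you expect.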

Default Policies: The Three You Must Write Before Breakfast

Teams that skip default policies end up debugging why their payments namespace can talk to prod databases at 3 AM. Kubernetes applies no default isolation. A pod can reach any other pod until you write a policy. There are three defaults you need in every production cluster: default-deny-all-ingress, default-deny-all-egress, and an allow-dns-egress for kube-dns.

The first two are simple: an empty podSelector with no rules under the listed policyType blocks everything in that direction. The third is where teams most often fail. They write a default-deny egress, deploy it, and their pods can't resolve service names. DNS uses port 53 over UDP, with TCP as the fallback for large responses, and pods send queries to the kube-dns Service IP (10.96.0.10 on many clusters; confirm yours with kubectl get svc -n kube-system kube-dns). If your egress default-deny doesn't include an allow rule for DNS, you get to learn what a CoreDNS timeout looks like in production. Be aware that whether an ipBlock matches a Service ClusterIP depends on the CNI, because the Service address may be rewritten to a Pod IP before or after policy is evaluated. Selecting the DNS Pods by namespace and label is the more portable form.

The pattern is: deny-all-ingress, deny-all-egress, then allow-DNS-egress. Everything else graduates to explicit allow rules. This is zero trust for the pod network, and it's the baseline that security audits ask about. Apply these default policies to every namespace via a cluster-wide controller or a policy engine such as Kyverno or Open Policy Agent, because humans forget to write them for new namespaces.

golden-defaults.yml (YAML)
// io.thecodeforge — devops tutorial

# Default-deny-all ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments-api
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Default-deny-all egress + DNS allow
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress-plus-dns
  namespace: payments-api
spec:
  podSelector: {}
  policyTypes:
  - Egress
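  egress:
  - to:
    # Select the DNS pods rather than the Service ClusterIP: many CNIs
    # evaluate policy after DNAT, so an ipBlock on 10.96.0.10/32 may never match.
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns  # common CoreDNS label; verify in your cluster
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP  # large responses fall back to TCP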
Output
networkpolicy.networking.k8s.io/default-deny-ingress created
networkpolicy.networking.k8s.io/default-deny-egress-plus-dns created
(All traffic to and from pods in payments-api is now blocked, except DNS lookups)
Senior Shortcut:
Use Kyverno generate rules or a mutating admission webhook to auto-apply these default policies to every namespace on creation. The CNI (Calico, Cilium) handles enforcement, but writing the policy definitions is still your job.
Key Takeaway
A production-ready cluster needs exactly three default policies: deny-all-ingress, deny-all-egress, and allow-DNS-egress. Apply them to every namespace automatically.
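One way to automate the per-namespace baseline is a Kyverno generate rule that fires whenever a Namespace is created. The following is a minimal sketch, not a vetted production policy: the policy and rule names are hypothetical, the k8s-app: kube-dns label is an assumption about how your DNS pods are labeled, and recent Kyverno releases may need extra RBAC before they are allowed to create NetworkPolicy objects.

```yaml
# Hypothetical Kyverno policy: stamp a default-deny + DNS baseline
# into every newly created namespace.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-netpol
spec:
  rules:
  - name: default-deny-with-dns
    match:
      any:
      - resources:
          kinds:
          - Namespace
    generate:
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: default-deny-plus-dns
      namespace: "{{request.object.metadata.name}}"
      synchronize: true          # re-create the policy if someone deletes it
      data:
        spec:
          podSelector: {}        # every pod in the new namespace
          policyTypes:
          - Ingress
          - Egress
          egress:
          - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system
              podSelector:
                matchLabels:
                  k8s-app: kube-dns   # assumed DNS pod label
            ports:
            - port: 53
              protocol: UDP
            - port: 53
              protocol: TCP
```

In practice you would probably add an exclude block for system namespaces such as kube-system, since a blanket deny there can break cluster components.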

Stop Guessing: How NetworkPolicy Treats Already-Open Connections

You apply a deny-all NetworkPolicy and expect instant silence. What happens to TCP sockets that were established before the policy landed? With many CNI configurations, they stay open. Kubernetes leaves the treatment of existing connections to the implementation, and enforcement is stateful: the CNI plugin (Calico, Cilium, etc.) keeps connection-tracking state, in kernel conntrack for iptables dataplanes or in eBPF maps. Packets that belong to a flow already in ESTABLISHED state can be accepted on the strength of that entry, before the new ingress/egress rules are ever consulted.

This is a production trap. If you roll out a strict default-deny policy during business hours, pre-existing SSH sessions, database connections, or monitoring scrapes won't be cut off immediately. They linger until the connection times out or the application tears it down. The fix: drain traffic or restart pods after policy changes. Don't trust a policy audit that shows blocked packets while live connections still flow. Conntrack doesn't lie, but it has memory.

existing-connections-bypass.yml (YAML)
# io.thecodeforge - devops tutorial

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  # Existing ESTABLISHED connections survive this.
  # Check conntrack on nodes:
  # $ conntrack -L -p tcp --state ESTABLISHED
  # Or for Cilium (run inside the agent pod; newer releases name the binary cilium-dbg):
  # $ cilium bpf ct list global | grep TCP
Output
No direct output. Inspect conntrack flow:
$ conntrack -L -p tcp --state ESTABLISHED | head -5
tcp 6 432000 ESTABLISHED src=10.0.0.1 dst=10.0.0.2 sport=443 dport=8443
Production Trap: Conntrack State Hides Policy Gaps
Existing connections can survive a policy change. Restart workloads after applying restrictive NetworkPolicies; otherwise your validation only exercises new connections and misses live sessions.
Key Takeaway
NetworkPolicy doesn't kill existing TCP flows — conntrack keeps them alive until timeout. Restart pods after policy changes.

Port Range Targeting: Why Your Micro-Service Needs It and Your Devs Don't

The ports field in NetworkPolicy accepts a single numeric port or a named port on a pod. But some applications, like gRPC servers on ephemeral high ports or monitoring agents scanning port ranges, require targeting a contiguous range. The endPort field covers that case: it arrived as alpha in Kubernetes 1.21 and has been stable since 1.25. It requires a numeric port (not a named one) and a value greater than or equal to port, and your CNI has to support it. Without it, you'd write five separate port entries for ports 30000-30004. With endPort, you write one.

The WHY: Port ranges reduce policy bloat and misconfiguration risk. If your service listens on ports 8080-8090, a single rule with port: 8080, endPort: 8090 covers all. But be brutal about scope — don't open a range because you're lazy. Define named ports on pods and reference those instead. Named ports are self-documenting and survive port renumbering. Reserve endPort for cases where you truly cannot predict the exact port (e.g., sidecar injection, dynamic service mesh ports). If your devs ask for a port range 'just in case', push back. They're asking for a security hole.

port-range-policy.yml (YAML)
# io.thecodeforge - devops tutorial

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-range
spec:
  podSelector:
    matchLabels:
      app: prometheus
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
    ports:
    - protocol: TCP
      port: 9090
      endPort: 9100  # covers 9090, 9091, ..., 9100 (11 ports)
Output
After applying:
$ kubectl get networkpolicy allow-metrics-range -o yaml | grep -A3 "ports:"
    ports:
    - endPort: 9100
      port: 9090
      protocol: TCP
Senior Shortcut: Prefer Named Ports Over Ranges
Named ports (containerPort with a name) make policies human-readable. Use endPort only for dynamic ranges like service mesh envoy admin endpoints or ephemeral debug ports.
Key Takeaway
Use endPort to collapse port-range rules into one. But name your ports first — ranges breed chaos.
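To make the named-port advice concrete, here is a small sketch of a workload that names its container port and a policy that references the name instead of the number. The Deployment, the image reference, and the http-api port name are all invented for illustration; only the mechanism (a policy port given as a string resolves against the pod's named containerPort) is standard NetworkPolicy behavior.

```yaml
# Hypothetical workload: the container port gets a name.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: registry.example.com/api:1.0   # placeholder image
        ports:
        - name: http-api                      # at most 15 chars, lowercase
          containerPort: 8080
---
# The policy refers to the port by name, so renumbering the
# container port does not require touching the policy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api-named
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: http-api   # named port; cannot be combined with endPort
```

The trade-off is that a named port only matches pods that actually declare that name, so a pod that forgets the name is silently left out of the allow rule.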

Explicitly Allow Necessary Pod-to-Pod Communications

Default-deny policies are the safe foundation, but they break your application unless you explicitly permit the exact pod-to-pod traffic your services depend on. The mistake is writing permissive catch-all rules like 'allow all from namespace X'. Instead, define precise policies that match pod labels, ports, and protocols your microservices actually use. For example, an API pod should only accept connections from a frontend pod on port 8080, not from a database pod. To discover these requirements, audit your application's connection map: list every pod that talks to every other pod, then encode each edge as a distinct network policy. This prevents accidental exposure and ensures that when a new pod is deployed, it cannot communicate unless a matching policy exists. The rule is simple: every allowed conversation must be intentional and visible in your YAML files, not implicitly inherited from namespace membership.

explicit-allow-pod-pod.yml (YAML)
# io.thecodeforge - devops tutorial
# Explicit allow: frontend → API on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: 8080
Output
Frontend → API traffic allowed. All other ingress to API pods is denied.
Production Trap:
A common fix is adding a single 'allow-all-inside-namespace' rule. This bypasses fine-grained control and creates a backdoor for every new pod, defeating default-deny isolation.
Key Takeaway
Every allowed pod-to-pod communication must be explicitly described by a pod selector and port rule — no implicit trust.
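When a namespace also carries a default-deny egress policy, each allowed edge needs a rule on the sending side as well as the receiving side. The sketch below is the egress half of the frontend to API edge, assuming the same tier: frontend and app: api labels as the ingress example and that both workloads live in the same namespace; the policy name is made up.

```yaml
# Egress half of the frontend -> API edge (hypothetical name).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-egress-to-api
spec:
  podSelector:
    matchLabels:
      tier: frontend        # the sending pods
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: api          # the receiving pods, same namespace
    ports:
    - protocol: TCP
      port: 8080
```

Keeping the two halves as separate, similarly named policies makes it easier to review one edge of the connection map at a time.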

Summary: Network Policies Are Contracts, Not Filters

Kubernetes Network Policies enforce identity-based, least-privilege networking between pods. They are less a list of firewall ACLs than a set of declarative contracts enforced by the CNI plugin. The core pattern is: default-deny all traffic, then explicitly allow only the minimal paths your application requires. You must handle the DNS trap by allowing egress to CoreDNS, and remember that already-established connections are not necessarily cut when a policy lands. For production, adopt namespace isolation, label hygiene, and eBPF-based CNIs for scale. Prefer fine-grained pod selectors to namespace-wide allow rules. Port ranges with endPort cut rule bloat for services with dynamic ports, but keep them narrow. Finally, always test enforcement with real probes: deploy a debug pod and confirm that denied packets are actually dropped. Network policies are the single source of truth for pod communication; if your app breaks after adding them, the policy is incomplete, not the feature broken.

summary-default-deny.yml (YAML)
# io.thecodeforge - devops tutorial
# Minimal default-deny baseline
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
Output
All ingress and egress blocked. Add explicit allow policies per service conversation.
Key Insight:
Network policies are your safe default. If a pod cannot reach another, your policy is missing a rule — this is correct behavior, not a bug.
Key Takeaway
Start with default-deny, then write explicit allow rules for each required communication path — no shortcuts, no namespace-wide permits.
Production incident · Post-mortem · Severity: high

Default-Deny Egress Without DNS Carve-Out: Cluster-Wide Service Discovery Failure

Symptom
All services returned HTTP 503 or connection timeout errors. Readiness probes passed (they used localhost or IP addresses). Application logs showed 'could not resolve host' and 'DNS resolution failed' errors with 30-second delays per request. Prometheus targets all showed DOWN. No Kubernetes events were generated.
Assumption
A bad deployment was pushed that broke the application code. Or CoreDNS was down.
Root cause
The team applied a default-deny NetworkPolicy with both Ingress and Egress in policyTypes. The policy had no egress rules defined, which meant all outbound traffic was blocked, including UDP and TCP port 53 to CoreDNS. Because NetworkPolicy is namespaced, the damage followed the policy: in every namespace that carried it, no Pod could resolve DNS names any longer. Services that connected to other services by DNS name (e.g., postgres.payments.svc.cluster.local) failed immediately. Services that connected by IP address continued to work, which made the failure pattern appear random.
Fix
1. Immediately patched every default-deny policy to include a DNS egress carve-out to CoreDNS on UDP/TCP port 53.
2. Created a namespace bootstrap template that always includes the DNS carve-out as a non-removable base rule.
3. Added a CI check that validates every NetworkPolicy with Egress in policyTypes has at least one egress rule targeting kube-system/kube-dns on port 53.
4. Documented the DNS trap in the team's runbook with a bold warning.
Key lesson
  • Default-deny egress blocks DNS by default. Always add a carve-out for CoreDNS on UDP and TCP port 53.
  • DNS failure manifests as 30-second timeouts, not immediate errors. This makes it look like a latency problem, not a connectivity problem.
  • Readiness probes that use localhost or IP addresses pass even when DNS is broken. Use DNS-based probes to catch this.
  • Test default-deny egress in staging with a curl-based smoke test before applying to production.
  • CI validation of NetworkPolicy completeness prevents this class of incident entirely.
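A staging smoke test for the DNS carve-out can be as small as a one-shot Job that tries to resolve a cluster name from inside the locked-down namespace. This is a sketch only: the Job name, the payments-api namespace, and the busybox image tag are assumptions, and it checks DNS resolution only, not the rest of your egress rules.

```yaml
# Hypothetical smoke test: fails if DNS is blocked by an egress policy.
apiVersion: batch/v1
kind: Job
metadata:
  name: dns-smoke-test
  namespace: payments-api          # namespace under test (assumed)
spec:
  backoffLimit: 0                  # do not retry; a failure is the signal
  activeDeadlineSeconds: 60        # give up instead of hanging on timeouts
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: probe
        image: busybox:1.36        # any image that ships nslookup would do
        command:
        - sh
        - -c
        - nslookup kubernetes.default.svc.cluster.local
```

If the Job ends up Failed or hits its deadline right after a policy rollout, the egress policy is the first thing to inspect.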
Production debug guide
Symptom-first investigation path for Network Policy issues in production.
Symptom · 01
Connection timeout between two Pods that should be able to communicate.
Fix
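Check if a NetworkPolicy selects the destination Pod. If yes, verify the source Pod's labels match a podSelector in one of the policy's ingress from entries (the top-level spec.podSelector picks the destination, not the source). Check that the source namespace's labels match any namespaceSelector. Test with a throwaway curl Pod from the source namespace.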
Symptom · 02
All Pods in a namespace have 30-second DNS resolution delays.
Fix
Check for a default-deny egress policy without a DNS carve-out. Verify CoreDNS is reachable from the Pod: kubectl exec into the Pod and run 'nslookup kubernetes.default.svc.cluster.local'. If it times out, the egress policy is blocking UDP/TCP 53.
Symptom · 03
NetworkPolicy applied but traffic still flows freely.
Fix
Verify the CNI supports NetworkPolicy. Run 'kubectl get pods -n kube-system | grep -E "calico|cilium|weave|antrea"' (the pattern must be quoted, or the shell treats each | as a pipe). If only Flannel is present, policies are silently ignored. If a policy-enforcing CNI is present, check that the policy's podSelector actually matches running Pods.
Symptom · 04
Traffic blocked from a source that should be allowed.
Fix
Check AND vs OR logic in the from/to selectors. Fields within the same list entry are ANDed. Separate entries are ORed. A common mistake is putting podSelector and namespaceSelector in the same entry (AND) when they should be separate entries (OR).
Symptom · 05
Intermittent connectivity between Pods after applying NetworkPolicy.
Fix
Check if Pods have the expected labels. A deployment rollout may have created new Pods with different labels. Run 'kubectl get pods -n <ns> --show-labels' and compare against the policy's podSelector. Label mismatches cause silent default-deny.
Network Policy Triage Commands
Rapid commands to isolate Network Policy enforcement issues.
Connection timeout between Pods.
Immediate action
Test connectivity directly from source Pod to destination Pod IP.
Commands
kubectl exec -n <src-ns> <src-pod> -- curl -s --max-time 3 http://<dst-pod-ip>:<port>/health
kubectl get networkpolicy -n <dst-ns> -o yaml | grep -A 20 podSelector
Fix now
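If curl times out and a NetworkPolicy selects the destination Pod, the policy is the likely cause: check that the source Pod's labels match the policy's from selector. If no policy selects the Pod, the timeout is not a policy problem; check the Service endpoints, the application's listening port, and node-to-node routing.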
DNS resolution failing (30-second timeouts).
Immediate action
Check for egress policies blocking port 53.
Commands
kubectl exec -n <ns> <pod> -- nslookup kubernetes.default.svc.cluster.local
kubectl get networkpolicy -n <ns> -o json | jq '.items[] | select(.spec.policyTypes[]=="Egress") | {name: .metadata.name, egressRules: .spec.egress}'
Fix now
If DNS fails and an egress policy exists without a port 53 rule, add a DNS carve-out targeting kube-system/kube-dns on UDP/TCP 53.
Policy applied but not enforced.
Immediate action
Verify CNI supports NetworkPolicy.
Commands
kubectl get pods -n kube-system | grep -E 'calico|cilium|weave|antrea'
kubectl get networkpolicy -n <ns> -o json | jq '.items[].spec.podSelector'
Fix now
If no policy-enforcing CNI is present, install Calico or Cilium. If CNI is present, check that podSelector matches actual Pod labels.
Traffic from unexpected source reaching Pod.
Immediate action
Check for missing default-deny baseline.
Commands
kubectl get networkpolicy -n <ns> -o json | jq '.items[] | select(.spec.podSelector=={})'
kubectl get networkpolicy -n <ns> -o json | jq '.items[] | select(.spec.podSelector.matchLabels.app=="<target-app>") | .metadata.name'
Fix now
If no default-deny policy exists, apply one. If a policy exists but the Pod's labels don't match any policy's podSelector, the Pod is ungoverned and all traffic is allowed.
AND vs OR logic confusion in policy.
Immediate action
Inspect the from/to entries and their indentation.
Commands
kubectl get networkpolicy <name> -n <ns> -o yaml | grep -A 30 'ingress\|egress'
kubectl exec -n <wrong-ns> <wrong-pod> -- curl -s --max-time 3 http://<target>:<port>/health
Fix now
Same dash (-) entry with multiple fields = AND. Separate dash entries = OR. Restructure the YAML to match intended access matrix.
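The AND versus OR distinction is easiest to see side by side. The two policies below differ by a single dash; the app: api, app: prometheus, and monitoring names are illustrative placeholders, not taken from a real cluster.

```yaml
# AND: one from entry holding both selectors.
# Allows only pods labeled app=prometheus that live in the monitoring namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-and
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
      podSelector:            # no leading dash: same entry, so ANDed
        matchLabels:
          app: prometheus
---
# OR: two from entries.
# Allows every pod in the monitoring namespace, plus any pod labeled
# app=prometheus in the policy's own namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-or
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
    - podSelector:            # leading dash: separate entry, so ORed
        matchLabels:
          app: prometheus
```

Both manifests are valid and apply without warnings, which is why a curl from a Pod that should be rejected is the practical way to confirm which form you actually deployed.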
CNI Plugin Network Policy Enforcement Comparison

| Aspect | Calico (iptables mode) | Calico (eBPF mode) | Cilium (eBPF mode) | Flannel |
| --- | --- | --- | --- | --- |
| Policy enforcement layer | iptables chains per endpoint | eBPF programs at TC layer | eBPF programs at socket/TC layer | None: ignores all policies |
| Observability | iptables rule counters, calicoctl | calicoctl, BPF map inspection | Hubble UI, per-flow policy verdict logging | N/A |
| Performance at scale | O(n) rule matching; degrades at 1000+ Pods | O(1) hash-map lookups; per-packet cost stays roughly constant | O(1) hash-map lookups; per-packet cost stays roughly constant | N/A |
| Layer 7 policies | Not supported in core API | Not supported in core API | Supported natively (HTTP method, path, gRPC) | N/A |
| DNS-based egress | Requires GlobalNetworkPolicy (proprietary) | Requires GlobalNetworkPolicy (proprietary) | Built-in DNS-aware egress | N/A |
| Kernel requirement | Any Linux kernel | Kernel 5.10+ | Kernel 4.9+ (5.10+ for full features) | Any Linux kernel |
| NetworkPolicy API support | Full compliance | Full compliance | Full compliance + extended CRDs | No support |
| Packet drop behavior | iptables DROP: silent timeout | eBPF DROP: silent timeout | eBPF DROP: silent timeout, Hubble shows it | N/A: packets always pass |
| Policy update mechanism | iptables-restore; brief packet drops possible | Atomic BPF program replacement | Atomic BPF program replacement | N/A |
| Production maturity | Battle-tested since 2016 | Maturing; GA in Calico v3.13+ | Rapidly maturing; preferred for new clusters 2022+ | Legacy; not recommended for production with security requirements |

Key takeaways

1. The Kubernetes API server stores Network Policies but never enforces them: enforcement lives entirely in your CNI plugin. If your CNI doesn't support policies (Flannel), they are silently ignored.
2. An empty podSelector in a NetworkPolicy matches ALL Pods in the namespace, not zero Pods. This is how you write a namespace-wide default-deny baseline.
3. Multiple from/to entries are ORed together. Fields within a single entry are ANDed. This YAML indentation distinction determines your actual security posture and produces no errors when wrong.
4. Always include a DNS egress carve-out (UDP+TCP port 53 to kube-dns) before rolling out default-deny egress, or every service discovery call will silently time out after 30 seconds.
5. iptables enforcement scales O(n) with rule count. eBPF enforcement scales O(1). In clusters with 1000+ Pods, the difference is measurable. Choose your CNI accordingly.
6. Label hygiene is non-negotiable. Use immutable labels for security selectors. Enforce with admission webhooks. Audit in CI.
7. Test Network Policies empirically with curl-based smoke tests. Timeout = CNI drop (the policy is working). Refused = the packet reached the Pod and the application rejected it (the policy is not blocking).
8. Namespace-scoped default-deny plus per-workload additive rules is the production pattern. Automate namespace provisioning with base policies.
Frequently Asked Questions

1. Does a Kubernetes Network Policy affect traffic between Pods in the same namespace?
2. Can Kubernetes Network Policies block traffic from outside the cluster?
3. What happens if two Network Policies select the same Pod with conflicting rules?
4. How do I verify that my Network Policies are actually being enforced?
5. What is the DNS trap and how do I avoid it?
6. Should I use Calico or Cilium for Network Policy enforcement?