Kubernetes Network Policies: Zero-Trust Segmentation in Production
- Enforcement is done by the CNI plugin (Calico, Cilium), NOT the API server. Flannel ignores policies silently.
- Default behavior: if no policy selects a Pod, ALL traffic is allowed in both directions.
- Once any policy selects a Pod, that direction enters implicit default-deny. Only explicitly whitelisted traffic passes.
- Policies are additive whitelists. There is no deny rule in the standard API. Multiple policies selecting the same Pod are unioned (OR).
- iptables-based CNIs (Calico) scale O(n) with rule count. Performance degrades at 1000+ Pods.
- eBPF-based CNIs (Cilium) scale O(1) with hash maps. Better performance but requires kernel 4.9+.
- Forgetting DNS egress carve-out when applying default-deny egress. Every service discovery call silently times out after 30 seconds.
Production Debug Guide
Symptom-first investigation path for Network Policy issues in production.

Symptom: Connection timeout between Pods.
```
kubectl exec -n <src-ns> <src-pod> -- curl -s --max-time 3 http://<dst-pod-ip>:<port>/health
kubectl get networkpolicy -n <dst-ns> -o yaml | grep -A 20 podSelector
```

Symptom: DNS resolution failing (30-second timeouts).
```
kubectl exec -n <ns> <pod> -- nslookup kubernetes.default.svc.cluster.local
kubectl get networkpolicy -n <ns> -o json | jq '.items[] | select(.spec.policyTypes[]=="Egress") | {name: .metadata.name, egressRules: .spec.egress}'
```

Symptom: Policy applied but not enforced.
```
kubectl get pods -n kube-system | grep -E 'calico|cilium|weave|antrea'
kubectl get networkpolicy -n <ns> -o json | jq '.items[].spec.podSelector'
```

Symptom: Traffic from unexpected source reaching Pod.
```
kubectl get networkpolicy -n <ns> -o json | jq '.items[] | select(.spec.podSelector=={})'
kubectl get networkpolicy -n <ns> -o json | jq '.items[] | select(.spec.podSelector.matchLabels.app=="<target-app>") | .metadata.name'
```

Symptom: AND vs OR logic confusion in policy.
```
kubectl get networkpolicy <name> -n <ns> -o yaml | grep -A 30 'ingress\|egress'
kubectl exec -n <wrong-ns> <wrong-pod> -- curl -s --max-time 3 http://<target>:<port>/health
```
Most teams get Kubernetes running, deploy their apps, and move on — never realizing their payment service can freely dial their logging sidecar, which can freely dial their database, which can freely reach the internet. That's not paranoia; that's the default. Kubernetes was designed for rapid connectivity, not zero-trust isolation. The moment you run multiple tenants, compliance workloads, or anything that touches PII or financial data, that open-door model becomes a liability.
Network Policies solve this by letting you express intent in YAML: only Pods with this label may reach my database on port 5432, from this namespace only, and my database can reach nothing outbound except DNS. The CNI plugin — not the Kubernetes API server — enforces those rules in the kernel using iptables, eBPF, or nftables depending on your stack. That distinction matters enormously for debugging and performance.
This is not a syntax reference. It covers how policies are evaluated and merged, how to write airtight ingress and egress rules without accidentally blackholing DNS, how to verify enforcement at the network level rather than trusting your YAML applied cleanly, and the production mistakes that silently leave clusters wide open.
How Network Policy Enforcement Actually Works — The CNI Layer
Here's the thing most tutorials skip: the Kubernetes API server doesn't enforce Network Policies. It just stores them. The actual enforcement happens inside your CNI plugin — Calico, Cilium, Weave, Antrea — which watches the API server for NetworkPolicy objects and translates them into kernel-level firewall rules on each node.
With Calico on older kernels, that means iptables chains per endpoint. With Cilium, it's eBPF programs loaded into the kernel that intercept packets at the socket layer before they ever hit iptables — significantly lower latency and dramatically better observability. With Flannel, enforcement is zero because Flannel doesn't implement Network Policies at all. This is one of the most common production surprises: a team applies policies and believes they're enforced, but their CNI silently ignores them.
Policy evaluation works like a firewall whitelist. If no NetworkPolicy selects a Pod, all traffic is allowed. The moment any policy selects a Pod — via podSelector — that Pod enters an implicit 'default deny' for the traffic directions that policy governs. Multiple policies selecting the same Pod are unioned together: a packet is allowed if it matches any one of them. There's no precedence, no ordering, no 'deny' rule type in the core API. You get whitelisting only, which is both a simplicity win and a constraint you need to design around.
```yaml
# STEP 1: Apply a default-deny baseline to a namespace.
# This selects ALL pods in the namespace (empty podSelector matches everything)
# and specifies BOTH policyTypes — so both ingress and egress are now default-deny.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all-traffic
  namespace: payments
spec:
  podSelector: {}  # Empty selector = matches every Pod in this namespace
  policyTypes:
  - Ingress  # Explicitly govern inbound traffic
  - Egress   # Explicitly govern outbound traffic
# No ingress or egress rules defined here — that's intentional.
# The absence of rules under a governed policyType means: deny everything.
# This is your zero-trust baseline. Now you add back only what you need.
```
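Because policies are additive, you never edit the baseline to re-open traffic; you layer new policies on top. A hypothetical second policy like the one below is simply unioned with the deny-all baseline rather than conflicting with it (the app label and port here are illustrative):

```yaml
# STEP 2 (illustrative): layer an allow rule on top of the baseline.
# Pods labeled 'app: api-server' now accept TCP 8080 from within the
# namespace; everything else in 'payments' stays default-deny.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-server-ingress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}   # any Pod in the 'payments' namespace
    ports:
    - protocol: TCP
      port: 8080
```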
- Flannel: Provides networking only. No NetworkPolicy enforcement. Zero.
- Calico: Full NetworkPolicy support via either the iptables or eBPF dataplane.
- Cilium: Full NetworkPolicy support via eBPF. Extended CRDs for L7 policies.
- Weave: NetworkPolicy support but less performant than Calico/Cilium.
- Antrea: VMware's CNI with full NetworkPolicy support and traceflow debugging.
Writing Precise Ingress and Egress Rules — With the DNS Trap Explained
Once you've applied default-deny, you need to surgically re-open only the traffic paths your application legitimately needs. Ingress rules control what can reach your Pod. Egress rules control what your Pod can reach. Both use the same selector primitives: podSelector, namespaceSelector, and ipBlock, which you can combine with AND logic inside a single from/to entry, or use OR logic across multiple entries.
The subtlety that burns everyone: a from entry with both podSelector AND namespaceSelector means the source must match BOTH selectors simultaneously — it's an AND. Two separate from entries each with their own selector is an OR. The indentation in YAML is load-bearing here. Get it wrong and you either over-permit or under-permit with no error from the API server.
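A minimal side-by-side sketch of the two forms (namespace and label names are illustrative). Note that the only structural difference is a single leading dash:

```yaml
# Form 1 — AND: one 'from' entry containing two selectors.
# Source must BOTH be in namespace 'web' AND carry label 'app: frontend'.
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: web
    podSelector:
      matchLabels:
        app: frontend

# Form 2 — OR: two 'from' entries, one selector each.
# Source may be ANY Pod in namespace 'web', OR any Pod labeled
# 'app: frontend' in THIS policy's own namespace.
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: web
  - podSelector:
      matchLabels:
        app: frontend
```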
The DNS trap is equally nasty. When you lock down egress, your Pods immediately lose DNS resolution because they can no longer reach CoreDNS on port 53 UDP/TCP. Every connection attempt fails not with a 'connection refused' but with a timeout waiting for DNS — which takes 30 seconds to surface. Always add an explicit egress rule for CoreDNS as part of your default-deny rollout, or you'll wonder why your app is broken when your network policy looks correct.
```yaml
# This policy governs the 'api-server' Pods in the 'payments' namespace.
# It allows:
#   INGRESS: only from Pods labeled 'app: frontend' in the 'web' namespace
#   EGRESS:  only to the PostgreSQL database Pods on port 5432
#            AND to CoreDNS on port 53 (critical — without this, DNS breaks)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-traffic-rules
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    # AND logic: source must be in namespace 'web' AND have label 'app: frontend'
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: web
      podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Rule 1: Allow outbound to PostgreSQL pods only
  - to:
    - podSelector:
        matchLabels:
          app: postgres-primary
    ports:
    - protocol: TCP
      port: 5432
  # Rule 2: Allow DNS resolution — NEVER omit this in an egress-restricted policy
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```
- Same dash entry with podSelector AND namespaceSelector: source must match BOTH (AND).
- Separate dash entries with podSelector OR namespaceSelector: source can match EITHER (OR).
- No ingress/egress rules at all under a governed policyType: deny all for that direction.
- Careful: a rule with an empty or missing from/to matches ALL sources/destinations. This means ingress: [] (no rules) is deny-everything, while ingress: [{}] (one rule with no from) is allow-everything. The API server accepts both without complaint.
- ipBlock can be combined with podSelector/namespaceSelector in the same entry (AND).
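Since ipBlock is the only primitive that can name CIDRs outside the cluster, here is a hedged sketch of an egress fragment using it (the CIDRs are illustrative, not a recommendation):

```yaml
# Egress rule fragment: allow outbound HTTPS to a corporate network range,
# but carve out one sensitive subnet with 'except'.
egress:
- to:
  - ipBlock:
      cidr: 10.0.0.0/8
      except:
      - 10.20.0.0/16   # e.g. a restricted admin subnet (illustrative)
  ports:
  - protocol: TCP
    port: 443
```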
Verifying Real Enforcement and Debugging Policy Failures in Production
Applying a NetworkPolicy and assuming it works is a mistake you only make once in production. The API server accepts any syntactically valid policy regardless of whether your CNI supports it. You need to verify enforcement at the traffic level, not the YAML level.
The gold-standard test is running a temporary Pod in the source namespace and attempting a connection directly — not through a Service mesh or load balancer that might bypass node-level rules. Use kubectl run with --rm -it to spin up a throwaway Pod, then use curl, nc, or wget to probe the target. A dropped connection times out; a policy-permitted connection either succeeds or returns an application-level error (which is actually what you want to see — it means the packet reached the target).
For Cilium clusters, cilium monitor and the Hubble UI are exceptionally powerful — they show you in real time which policies matched or dropped each flow, with source/destination Pod identity, namespace, and labels. For Calico clusters, calicoctl get networkpolicy and iptables -L -n --line-numbers on the node running your Pod reveal the actual enforced rules. Always test both directions — a policy that allows egress from Pod A to Pod B doesn't automatically allow ingress to Pod B from Pod A unless Pod B also has a matching ingress rule.
```bash
#!/usr/bin/env bash
# Verify Network Policy enforcement empirically.
# Tests both allowed and blocked paths.
set -euo pipefail

TARGET_NAMESPACE="payments"
TARGET_SERVICE="api-server"
TARGET_PORT="8080"
ALLOWED_SOURCE_NAMESPACE="web"
BLOCKED_SOURCE_NAMESPACE="monitoring"
CONNECT_TIMEOUT_SECONDS="3"

echo "=== Network Policy Enforcement Verification ==="
echo ""

# Test 1: Allowed path
echo "[TEST 1] Allowed ingress: frontend (web) -> api-server (payments)"
ALLOWED_RESULT=$(kubectl run policy-test-allowed \
  --namespace="$ALLOWED_SOURCE_NAMESPACE" \
  --image=curlimages/curl:8.5.0 \
  --restart=Never --rm --quiet -it \
  -- curl --silent --max-time "$CONNECT_TIMEOUT_SECONDS" \
  --output /dev/null --write-out "%{http_code}" \
  "http://${TARGET_SERVICE}.${TARGET_NAMESPACE}.svc.cluster.local:${TARGET_PORT}/health" \
  2>/dev/null || echo "FAILED")
if [ "$ALLOWED_RESULT" = "200" ]; then
  echo "  PASS: HTTP 200 received"
else
  echo "  FAIL: Expected HTTP 200, got '$ALLOWED_RESULT'"
fi
echo ""

# Test 2: Blocked path
echo "[TEST 2] Blocked ingress: prometheus (monitoring) -> api-server (payments)"
BLOCKED_RESULT=$(kubectl run policy-test-blocked \
  --namespace="$BLOCKED_SOURCE_NAMESPACE" \
  --image=curlimages/curl:8.5.0 \
  --restart=Never --rm --quiet -it \
  -- curl --silent --max-time "$CONNECT_TIMEOUT_SECONDS" \
  --output /dev/null --write-out "%{http_code}" \
  "http://${TARGET_SERVICE}.${TARGET_NAMESPACE}.svc.cluster.local:${TARGET_PORT}/health" \
  2>/dev/null || echo "TIMEOUT")
if [ "$BLOCKED_RESULT" = "TIMEOUT" ] || [ "$BLOCKED_RESULT" = "000" ]; then
  echo "  PASS: Connection timed out — blocked source is correctly dropped"
else
  echo "  FAIL: Expected timeout, got '$BLOCKED_RESULT' — policy NOT enforced!"
fi
echo ""
echo "=== Verification Complete ==="
```
```
=== Network Policy Enforcement Verification ===

[TEST 1] Allowed ingress: frontend (web) -> api-server (payments)
  PASS: HTTP 200 received

[TEST 2] Blocked ingress: prometheus (monitoring) -> api-server (payments)
  PASS: Connection timed out — blocked source is correctly dropped

=== Verification Complete ===
```
- Timeout (after 3-30s): Packet was dropped by the CNI. NetworkPolicy is enforcing correctly.
- Connection refused (immediate): Packet reached the target process. NetworkPolicy is NOT blocking this path.
- HTTP 200: Packet reached the application and got a valid response. Policy allows this traffic.
- HTTP 5xx: Packet reached the application but the app returned an error. Policy allows, app has issues.
- DNS timeout (30s): UDP 53 to CoreDNS is blocked. Check egress rules for DNS carve-out.
Production Patterns: Namespace Isolation, Monitoring Carve-outs and Label Hygiene
In a real multi-tenant cluster, you can't write policies Pod-by-Pod. You need namespace-scoped baselines combined with additive per-workload rules. The pattern that works at scale is: one default-deny policy per namespace applied by your CD pipeline at namespace creation, then application-specific policies delivered alongside each Helm chart or Kustomize overlay.
Monitoring is the most common carve-out needed. Prometheus needs to scrape metrics from every namespace, but you don't want to globally allow all ingress. The clean solution is a namespace label like monitoring.io/allow-scrape: 'true' and a policy in each target namespace that allows ingress from the monitoring namespace on port 9090 or whatever your metrics port is. This keeps control local to the target namespace.
Label hygiene is non-negotiable. Network Policies inherit whatever labels your Pods have — if a developer changes a label during a refactor, the policy selector silently stops matching and the Pod falls back to default-deny behavior with no warning event. Use immutable labels like app: payment-api for security selectors and mutable labels like version: v2 only for routing. Audit your selectors in CI with kubectl get pods -l app=api-server -n payments and fail the pipeline if the expected count is zero.
```yaml
# Prometheus scrape carve-out for the 'payments' namespace.
# Complements the default-deny-all policy already in place.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: payments
  labels:
    policy-type: monitoring-carveout
    managed-by: platform-team
spec:
  podSelector:
    matchLabels:
      monitoring.io/expose-metrics: "true"
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
      podSelector:
        matchLabels:
          app: prometheus
    ports:
    - protocol: TCP
      port: 8080
```
- Security labels (app, tier, team) should be immutable. Enforce with admission webhooks.
- Routing labels (version, canary, blue-green) should NOT be used in NetworkPolicy selectors.
- CI check: fail the pipeline if kubectl get pods -l app=<name> returns zero Pods.
- Namespace labels (kubernetes.io/metadata.name) are auto-applied in Kubernetes 1.21+. Use them for namespaceSelector.
- Adopt a naming convention: all NetworkPolicy names should include the namespace and workload they govern.
Network Policy Performance: iptables vs eBPF at Scale
The CNI enforcement mechanism directly impacts network latency and control plane load. Understanding the performance characteristics of your CNI is critical for capacity planning and troubleshooting latency issues that appear only at scale.
```bash
#!/usr/bin/env bash
# Check NetworkPolicy rule count and CNI performance characteristics.
# Run on a node with calicoctl or cilium CLI installed.
set -euo pipefail

# For Calico: count iptables rules per endpoint
echo "=== Calico iptables Rule Count ==="
if command -v calicoctl &> /dev/null; then
  calicoctl get networkpolicy -A -o wide | wc -l
  echo "Total iptables chains on this node:"
  iptables -L -n | grep -c '^Chain'
  echo "Total iptables rules on this node:"
  iptables -L -n | grep -c '^[0-9]'
else
  echo "calicoctl not found — skipping Calico check"
fi
echo ""

# For Cilium: check eBPF policy programs
echo "=== Cilium eBPF Policy Status ==="
if command -v cilium &> /dev/null; then
  cilium status
  echo ""
  echo "Policy verdicts (last 100 flows):"
  cilium monitor --type policy-verdict -n payments | tail -20
else
  echo "cilium CLI not found — skipping Cilium check"
fi
echo ""

# General: check NetworkPolicy count per namespace
echo "=== NetworkPolicy Distribution ==="
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  count=$(kubectl get networkpolicy -n "$ns" --no-headers 2>/dev/null | wc -l)
  if [ "$count" -gt 0 ]; then
    echo "  $ns: $count policies"
  fi
done
```
```
=== Calico iptables Rule Count ===
Total iptables chains on this node: 847
Total iptables rules on this node: 12403

=== NetworkPolicy Distribution ===
  payments: 12 policies
  web: 8 policies
  monitoring: 3 policies
```
- iptables (Calico default): Sequential rule matching. Degrades at 1000+ Pods per node.
- eBPF (Cilium, Calico with eBPF dataplane): Hash-map lookups. Per-packet cost stays roughly constant as rule count grows.
- iptables rule churn: Every policy change triggers iptables-restore on all nodes. Brief packet drops possible during restore.
- eBPF program updates: Atomic program replacement. No packet drops during policy updates.
- Kernel requirement: eBPF requires kernel 4.9+ minimum. Full features require 5.10+.
- Monitor metrics such as cilium_datapath_conntrack_gc_entries and iptables_restore_duration_seconds to detect enforcement bottlenecks.

| Aspect | Calico (iptables mode) | Calico (eBPF mode) | Cilium (eBPF mode) | Flannel |
|---|---|---|---|---|
| Policy enforcement layer | iptables chains per endpoint | eBPF programs at TC layer | eBPF programs at socket/TC layer | None — ignores all policies |
| Observability | iptables rule counters, calicoctl | calicoctl, BPF map inspection | Hubble UI, per-flow policy verdict logging | N/A |
| Performance at scale | O(n) rule matching — degrades at 1000+ Pods | O(1) hash-map lookups — cost stays flat as rules grow | O(1) hash-map lookups — cost stays flat as rules grow | N/A |
| Layer 7 policies | Not supported in core API | Not supported in core API | Supported natively (HTTP method, path, gRPC) | N/A |
| DNS-based egress | Requires GlobalNetworkPolicy (proprietary) | Requires GlobalNetworkPolicy (proprietary) | Built-in DNS-aware egress | N/A |
| Kernel requirement | Any Linux kernel | Kernel 5.10+ | Kernel 4.9+ (5.10+ for full features) | Any Linux kernel |
| NetworkPolicy API support | Full compliance | Full compliance | Full compliance + extended CRDs | No support |
| Packet drop behavior | iptables DROP — silent timeout | eBPF DROP — silent timeout | eBPF DROP — silent timeout, Hubble shows it | N/A — packets always pass |
| Policy update mechanism | iptables-restore — brief packet drops possible | Atomic BPF program replacement | Atomic BPF program replacement | N/A |
| Production maturity | Battle-tested since 2016 | Maturing — GA in Calico v3.13+ | Rapidly maturing — preferred for new clusters 2022+ | Legacy — not recommended for production with security requirements |
🎯 Key Takeaways
- The Kubernetes API server stores Network Policies but never enforces them — enforcement lives entirely in your CNI plugin. If your CNI doesn't support policies (Flannel), they are silently ignored.
- An empty podSelector in a NetworkPolicy matches ALL Pods in the namespace — not zero Pods. This is how you write a namespace-wide default-deny baseline.
- Multiple from/to entries are ORed together. Fields within a single entry are ANDed. This YAML indentation distinction determines your actual security posture and produces no errors when wrong.
- Always include a DNS egress carve-out (UDP+TCP port 53 to kube-dns) before rolling out default-deny egress, or every service discovery call will silently time out after 30 seconds.
- iptables enforcement scales O(n) with rule count. eBPF enforcement scales O(1). At 500+ Pods per node, the difference is measurable. Choose your CNI accordingly.
- Label hygiene is non-negotiable. Use immutable labels for security selectors. Enforce with admission webhooks. Audit in CI.
- Test Network Policies empirically with curl-based smoke tests. Timeout = CNI drop (policy enforcing correctly). Connection refused = the packet reached the target, meaning the policy is NOT blocking that path.
- Namespace-scoped default-deny plus per-workload additive rules is the production pattern. Automate namespace provisioning with base policies.
⚠ Common Mistakes to Avoid
- Applying policies on a CNI that doesn't enforce them (Flannel) and assuming they work.
- Forgetting the DNS egress carve-out when rolling out default-deny egress: every service discovery call then times out after 30 seconds.
- Confusing AND vs OR in from/to entries: one entry with two selectors is AND; two separate entries are OR.
- Letting a label refactor silently break a podSelector, so the policy stops matching with no warning event.
- Trusting that the YAML applied cleanly instead of verifying enforcement with real traffic.
Interview Questions on This Topic
- Q: A NetworkPolicy is applied to a namespace with default-deny-all. A developer reports their Pod can reach the database but can't resolve any hostnames. What's wrong and how do you fix it without relaxing security?
- Q: Explain the difference between a podSelector and a namespaceSelector in a 'from' clause. What does it mean when both appear in the same list entry versus as separate entries? Give an example of when you'd need each.
- Q: Your team applies a NetworkPolicy that looks correct but packets are still flowing between Pods that should be blocked. Walk me through how you'd diagnose whether this is a CNI issue, a policy syntax issue, or a label mismatch.
- Q: Explain how Network Policy enforcement differs between Calico (iptables) and Cilium (eBPF). When would you choose one over the other?
- Q: What happens if two Network Policies select the same Pod? Can one policy override another? How do you achieve deny semantics in the standard Kubernetes API?
- Q: A Pod is receiving traffic from an unexpected namespace. The NetworkPolicy looks correct. What are the three most likely causes?
- Q: How would you design a Network Policy strategy for a multi-tenant cluster with 50 namespaces? What automation would you put in place?
- Q: Explain the difference between timeout and connection refused in the context of Network Policy enforcement. Why does this distinction matter for debugging?
- Q: How do ipBlock rules interact with NAT gateways and externalTrafficPolicy? When might ipBlock rules not work as expected?
- Q: Describe how you would test Network Policies in CI/CD. What would your verification script check?
Frequently Asked Questions
Does a Kubernetes Network Policy affect traffic between Pods in the same namespace?
Yes, absolutely. Network Policies apply to all Pod-to-Pod traffic regardless of whether the source and destination are in the same namespace or different ones. A default-deny policy with an empty podSelector will block same-namespace traffic too, and you will need explicit ingress rules to re-permit it.
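A minimal sketch of that pattern, assuming a default-deny baseline is already in place (namespace name illustrative): a policy that re-permits all same-namespace ingress while still blocking cross-namespace traffic.

```yaml
# Allow any Pod in 'payments' to reach any other Pod in 'payments'.
# Cross-namespace ingress remains denied by the baseline.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: payments
spec:
  podSelector: {}        # applies to every Pod in this namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}    # matches every Pod in this namespace only
```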
Can Kubernetes Network Policies block traffic from outside the cluster?
For traffic entering via a Service of type LoadBalancer or NodePort, the source IP seen by the Pod is typically the node's IP due to SNAT — not the original client IP. This means ipBlock rules targeting external IPs may not work as expected unless you set externalTrafficPolicy: Local on the Service. Network Policies work best for Pod-to-Pod east-west traffic. Perimeter security for north-south traffic belongs in a separate ingress controller or cloud firewall.
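If preserving the client source IP matters for an ipBlock rule, the Service itself must opt out of SNAT. A sketch, with illustrative names and ports:

```yaml
# Service that preserves the original client IP for ingress traffic,
# at the cost of only routing to Pods on the node that received the packet.
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: payments
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # skip SNAT so Pods see the real client IP
  selector:
    app: api-server
  ports:
  - port: 80
    targetPort: 8080
```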
What happens if two Network Policies select the same Pod with conflicting rules?
There are no conflicts because Network Policies are purely additive whitelists — there is no deny rule in the standard API. If two policies both select the same Pod, their rules are unioned: a packet is allowed if it satisfies any matching rule from any policy. You can never use a second policy to override and deny something a first policy allows. For deny semantics, you need CNI-specific extensions like Calico's GlobalNetworkPolicy or Cilium's CiliumNetworkPolicy.
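For illustration only, a sketch of explicit deny using Calico's proprietary GlobalNetworkPolicy CRD. This is not the core Kubernetes API: the selector string uses Calico's own expression syntax, and the names and CIDR here are illustrative:

```yaml
# Calico-specific: explicitly deny all egress from payment-api Pods
# to any destination. 'order' controls precedence (lower = evaluated first).
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-payment-api-egress
spec:
  order: 10
  selector: app == 'payment-api'
  types:
  - Egress
  egress:
  - action: Deny
    destination:
      nets:
      - 0.0.0.0/0
```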
How do I verify that my Network Policies are actually being enforced?
Use empirical testing: run a throwaway Pod in the source namespace with kubectl run --rm -it and attempt a connection to the target. A timeout means the CNI is dropping packets (policy enforced). A connection refused means the packet reached the target (policy not enforced). For Cilium, use cilium monitor --type policy-verdict for real-time policy decision logging. For Calico, use iptables -L -n on the node to inspect enforced rules.
What is the DNS trap and how do I avoid it?
When you apply default-deny egress, all outbound traffic is blocked — including DNS resolution on UDP/TCP port 53 to CoreDNS. Every DNS lookup times out after 30 seconds, making services appear broken. The fix: always include an egress rule in your default-deny policy that allows traffic to kube-system Pods labeled k8s-app:kube-dns on both UDP and TCP port 53. Template this into your namespace provisioning pipeline.
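A minimal carve-out policy of the kind described, suitable for templating (namespace name illustrative). Note that because it selects every Pod and governs Egress, applying it on its own also places all Pods into egress default-deny with only DNS allowed, so roll it out deliberately alongside your baseline:

```yaml
# DNS carve-out: every Pod in the namespace may reach CoreDNS on port 53.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments      # templated per namespace by the provisioning pipeline
spec:
  podSelector: {}          # every Pod in this namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```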
Should I use Calico or Cilium for Network Policy enforcement?
Calico with iptables is battle-tested and works on any kernel but scales O(n) with rule count. Cilium with eBPF scales O(1) and supports L7 policies but requires kernel 4.9+. For new clusters in 2024+, Cilium is the preferred choice for its performance, observability (Hubble), and extended policy capabilities. For existing clusters on older kernels, Calico with the eBPF dataplane is a good middle ground.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.