Advanced 4 min · June 21, 2026

Jenkins Kubernetes Deployment: Build Agents That Survive Production Chaos

Q: How do I deploy Jenkins on Kubernetes for production?

Use the official Helm chart with a custom values.yaml that configures a persistent volume for /var/jenkins_home, resource limits, and JCasC configuration. Build a custom controller image with pinned plugins. Never run on default settings.

Q: What's the difference between the Jenkins Kubernetes plugin and static agents?

The Kubernetes plugin spins up agent pods on demand, scales to zero when idle, and uses cluster resources efficiently. Static agents are pre-provisioned VMs that are always running. Use Kubernetes agents for dynamic workloads and static agents for predictable, long-running builds.

Q: How do I configure Jenkins agent pod templates for different build tools?

Define multiple pod templates in JCasC or Jenkinsfile, each with different container images (e.g., maven, node, python). Use labels to match pipelines to templates. Set resource requests per template to match tool requirements.
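A sketch of that pattern in JCasC follows. It assumes the Kubernetes cloud and the jenkins-agents namespace used later in this article. The template names, labels, node image tag, and resource numbers are illustrative, not recommendations. Only the tool container is declared, because the plugin adds the jnlp agent container itself when a template omits it. This is a fragment of jenkins.yaml, not a complete cloud definition.

```yaml
# Fragment of jenkins.yaml (JCasC): one pod template per build tool.
jenkins:
  clouds:
  - kubernetes:
      name: "kubernetes"
      namespace: "jenkins-agents"
      templates:
      - name: "maven-agent"
        label: "maven"                 # selected with agent { label 'maven' }
        yaml: |
          apiVersion: v1
          kind: Pod
          spec:
            containers:
            - name: maven
              image: maven:3.8.6-openjdk-11-slim
              command: ["cat"]         # keep the container alive for build steps
              tty: true
              resources:
                requests:
                  memory: "1Gi"
                  cpu: "500m"
                limits:
                  memory: "2Gi"
                  cpu: "1"
      - name: "node-agent"
        label: "node"                  # selected with agent { label 'node' }
        yaml: |
          apiVersion: v1
          kind: Pod
          spec:
            containers:
            - name: node
              image: node:18-slim      # assumption: pick the Node.js version your builds need
              command: ["cat"]
              tty: true
              resources:
                requests:
                  memory: "512Mi"
                  cpu: "250m"
                limits:
                  memory: "1Gi"
                  cpu: "500m"
```

A pipeline then picks a template by label and wraps its steps in container('maven') { ... } or container('node') { ... }. The default container is jnlp, which has none of these tools.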

Q: What happens to running builds if the Jenkins controller pod restarts?

Agent pods lose their connection and builds fail. To mitigate, set 'Pod Retention' to 'On Failure' so failed agent pods, and their logs, stick around for debugging. For critical builds, implement build checkpointing or use an external build cache. Consider using a StatefulSet for the controller to preserve pod identity.

Deploy Jenkins on Kubernetes with battle-tested patterns: avoid pod eviction, config drift, and plugin hell.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.

✓ Production

production tested

June 21, 2026

last updated

1,577

articles · all by Naren


⚡Quick Answer

Deploy Jenkins on Kubernetes using the official Helm chart with persistent volumes for /var/jenkins_home, configure the Kubernetes plugin to spin up agent pods from custom images, and set resource limits to prevent noisy neighbor builds from starving the cluster.
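Here is a minimal values.yaml sketch for the official jenkinsci Helm chart. The key names (controller.image, controller.installPlugins, controller.javaOpts, controller.JCasC.configScripts, persistence.*) should be checked against the values.yaml of your chart version. The registry, repository, and tag are hypothetical placeholders for your own controller image with pinned plugins.

```yaml
# values.yaml for the jenkinsci "jenkins" Helm chart (sketch; verify keys for your chart version)
controller:
  image:
    registry: "registry.example.com"      # hypothetical registry
    repository: "ci/jenkins-controller"   # hypothetical custom image with pinned plugins
    tag: "1.0.0"                          # hypothetical tag
  installPlugins: false                   # plugins are baked into the image, not installed at startup
  javaOpts: "-Xms2g -Xmx2g"
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "2"
      memory: "3Gi"
  JCasC:
    configScripts:
      system-message: |
        jenkins:
          systemMessage: "Jenkins configured by JCasC - do not edit via UI"
persistence:
  enabled: true
  size: "10Gi"
  storageClass: "standard"                # assumption: replace with your cluster's StorageClass
```

Install it with helm repo add jenkins https://charts.jenkins.io and then helm install jenkins jenkins/jenkins -f values.yaml.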

✦ Definition · ~90s read

What is Jenkins Kubernetes Deployment?

Jenkins Kubernetes Deployment means running Jenkins controllers and agents as containers in a Kubernetes cluster, using the Kubernetes plugin to dynamically provision build agents as pods. It's not just containerizing Jenkins—it's about making CI/CD elastic, resilient, and cloud-native.


Plain-English First

Imagine you run a food truck (Jenkins) that needs a kitchen to cook orders (build jobs). Instead of owning a fixed kitchen, you rent space in a shared commercial kitchen (Kubernetes). When an order comes in, you grab a prep station (agent pod), cook, clean up, and leave. If the kitchen gets busy, you can rent more stations automatically. But if you leave dirty dishes (build artifacts) or take too long, the kitchen manager (Kubernetes scheduler) kicks you out.

You've containerized everything except your CI/CD pipeline. That's like building a race car and parking it in a horse stable. Running Jenkins on bare metal or VMs today is a self-inflicted wound: you're managing pets when you should be managing cattle. The Kubernetes plugin promises dynamic agents, but the default setup will burn you with pod evictions, config drift, and plugin incompatibilities at 3 AM. I've seen a payment service go down because a Jenkins agent pod consumed all node memory during a parallel test run. By the end of this, you'll deploy Jenkins on Kubernetes with production-hardened configurations that survive node failures, resource contention, and plugin upgrades without waking you up.

Why Not Just Run Jenkins on a VM? The Hidden Costs

Before Kubernetes, Jenkins ran on a single VM. That VM was a pet—you patched it, backed it up, and prayed it didn't die. Scaling meant cloning the VM and manually configuring agents. The real cost wasn't the VM; it was the operational overhead of maintaining a stateful CI server. With Kubernetes, you get self-healing, auto-scaling, and declarative config. But the trade-off is complexity: you now manage persistent volumes, pod networking, and plugin compatibility with container images. If your team has fewer than 10 developers, a simple Docker Compose setup might be faster. But for any serious CI/CD, Kubernetes is the only sane choice.

jenkins-controller-deployment.yaml · YAML

# io.thecodeforge - DevOps tutorial
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins-controller
spec:
  replicas: 1
  strategy:
    type: Recreate  # stop the old pod first so two controllers never share jenkins_home
  selector:
    matchLabels:
      app: jenkins-controller
  template:
    metadata:
      labels:
        app: jenkins-controller
    spec:
      containers:
      - name: jenkins
        image: jenkins/jenkins:lts-jdk11
        ports:
        - containerPort: 8080
        - containerPort: 50000  # agent listener
        env:
        - name: JENKINS_OPTS
          value: "--handlerCountMax=100"  # prevent connection overload
        - name: JAVA_OPTS  # read by the official image's startup script
          value: "-Xmx2g -Xms2g -Djenkins.install.runSetupWizard=false"
        volumeMounts:
        - name: jenkins-home
          mountPath: /var/jenkins_home
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "3Gi"
            cpu: "2"
      volumes:
      - name: jenkins-home
        persistentVolumeClaim:
          claimName: jenkins-home-pvc

Output

Deployment 'jenkins-controller' created. Pod starts with 2GB heap, listens on 8080 and 50000.

Production Trap: Single Replica Controller

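The Jenkins controller is stateful, so don't run more than one replica. Open-source Jenkins has no supported active/active mode, and two controllers pointed at the same home will give you split-brain builds. The official Helm chart defaults to 1 replica. If you need HA, aim for fast active/passive recovery: one replica that Kubernetes reschedules onto a healthy node with the same persistent volume. Alternatively, evaluate a commercial distribution such as CloudBees CI.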

The Kubernetes Plugin: Dynamic Agents Done Right (and Wrong)

The Kubernetes plugin lets Jenkins spin up agent pods on demand. The default config is a trap. The generic 'jnlp' agent image is large and slow to start, and depending on your retention and idle settings, finished pods can linger, so you pay for resources you don't use. The fix: use a custom agent image with only the tools you need (e.g., maven, docker, kubectl), and set 'Pod Retention' to 'Never' to delete pods after each build. Always set resource requests and limits, too. Without them, your builds can starve the cluster. I've seen a team's entire dev namespace grind to a halt because a Jenkins agent pod with no limits consumed all the CPU on its node.

kubernetes-agent-pod-template.yaml · YAML

# io.thecodeforge - DevOps tutorial
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:4.13-1-jdk11
    args: ["$(JENKINS_SECRET)", "$(JENKINS_NAME)"]
  - name: maven
    image: maven:3.8.6-openjdk-11-slim
    command:
    - cat
    tty: true
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1"
  - name: docker
    image: docker:20.10-dind
    securityContext:
      privileged: true
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  nodeSelector:
    ci-workload: "true"  # pin agents to dedicated nodes

Output

Pod template with 3 containers: jnlp (Jenkins agent), maven (build tool), docker (Docker-in-Docker). Resource requests prevent overcommit.

Senior Shortcut: Use Pod Templates with Multiple Containers

Instead of one container with all tools, split into sidecars. This speeds up image pulls—only changed containers are re-pulled. Also, you can update tool versions independently without rebuilding the agent image.
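The pod template above pins agents with nodeSelector ci-workload: "true". Dedicated nodes are often also tainted to keep other workloads off them. In that case, the agent pod needs a matching toleration or it stays Pending. This sketch assumes a taint whose key and value mirror the label; that taint is hypothetical, so use whatever your nodes actually carry.

```yaml
# Sketch: agent pod template for tainted, dedicated CI nodes.
# Assumes the nodes carry both the label and a matching (hypothetical) taint:
#   kubectl label nodes <node-name> ci-workload=true
#   kubectl taint nodes <node-name> ci-workload=true:NoSchedule
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    ci-workload: "true"          # schedule only onto CI nodes
  tolerations:
  - key: "ci-workload"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"         # permit scheduling onto nodes with the matching taint
  containers:
  - name: maven
    image: maven:3.8.6-openjdk-11-slim
    command: ["cat"]
    tty: true
```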

Persistent Volumes: Don't Lose Your Jenkins Home

Jenkins stores everything in /var/jenkins_home: jobs, configs, build logs, plugin binaries. If that disappears, you lose your CI history. On Kubernetes, you must attach a PersistentVolumeClaim (PVC) to the controller pod. Use a ReadWriteOnce volume with a backup strategy. I've seen teams use emptyDir for 'ephemeral' Jenkins, then wonder why all jobs vanished after the pod was replaced. The gotcha: if your PVC is deleted and the PV's reclaim policy is Delete (the default for dynamically provisioned volumes), the data is gone. Always set persistentVolumeReclaimPolicy: Retain on the PV, or reclaimPolicy: Retain on the StorageClass. And for god's sake, back up /var/jenkins_home to S3 or GCS daily.

jenkins-home-pvc.yaml · YAML

# io.thecodeforge - DevOps tutorial
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: manual  # must match the PV below for static binding
  volumeName: jenkins-home-pv  # bind to this specific PV
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-home-pv
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain  # don't delete data if PVC is removed
  hostPath:
    path: /data/jenkins-home  # for local testing; use cloud disk in production

Output

PVC 'jenkins-home-pvc' created, bound to PV 'jenkins-home-pv' with Retain policy.
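A VolumeSnapshot of the PVC is one option for the daily backup on clusters with a CSI driver that supports snapshots. This sketch assumes the CSI snapshot CRDs and controller are installed and that a VolumeSnapshotClass named csi-snapclass exists (a hypothetical name). It does not work for the hostPath PV above. Snapshots live in the storage provider, so copying data to an S3 or GCS bucket you control may still be a separate step. A scheduled job or backup tool would create one of these each day. Because the snapshot captures a running controller's disk as-is, test a restore before you rely on it.

```yaml
# Sketch: one point-in-time snapshot of the Jenkins home PVC.
# Requires the CSI snapshot CRDs/controller and a CSI-backed volume (not hostPath).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: jenkins-home-20260621                    # hypothetical naming: one snapshot per day
spec:
  volumeSnapshotClassName: csi-snapclass         # hypothetical; list yours with: kubectl get volumesnapshotclass
  source:
    persistentVolumeClaimName: jenkins-home-pvc  # the PVC defined above
```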

Never Do This: Use hostPath in Production

hostPath ties the pod to a specific node. If that node dies, Jenkins is down until the pod is rescheduled—and data may be lost. Use cloud provider disks (EBS, GCE PD) or NFS-backed storage for production.
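Here is a sketch of the cloud-disk route with dynamic provisioning. A StorageClass with reclaimPolicy: Retain plus a PVC that uses it replaces the hand-made PV/PVC pair above. The provisioner assumes the AWS EBS CSI driver (ebs.csi.aws.com with gp3 volumes); on GKE it would be pd.csi.storage.gke.io. The class name jenkins-retain is hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: jenkins-retain                   # hypothetical name
provisioner: ebs.csi.aws.com             # assumption: AWS EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Retain                    # keep the disk if the PVC is deleted
volumeBindingMode: WaitForFirstConsumer  # create the disk in the zone where the pod is scheduled
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: jenkins-retain       # dynamically provisioned; no PV manifest needed
  resources:
    requests:
      storage: 10Gi
```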

Resource Limits: Stop Noisy Neighbor Builds

Without resource limits, a single build can consume all node memory and trigger OOM kills of other pods. The Kubernetes plugin lets you set requests and limits per agent pod template. But here's the catch: set limits too low, and builds die when the container is OOM-killed (pod status OOMKilled, exit code 137). Set requests too high, and you waste cluster capacity, because the scheduler reserves whatever you request. The rule of thumb: set requests to the average usage and limits to the peak. Monitor with kubectl top pods. I once debugged a build that failed intermittently. It turned out the agent pod had no memory limits and was sharing a node with a memory-hungry database. Set limits, and always test with your heaviest build.

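Jenkinsfile-resource-limits · GROOVY

// io.thecodeforge - DevOps tutorial
pipeline {
    agent {
        kubernetes {
            yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:4.13-1-jdk11
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  - name: maven  # the jnlp image has no Maven, so the build runs here
    image: maven:3.8.6-openjdk-11-slim
    command:
    - cat
    tty: true
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1"
'''
        }
    }
    stages {
        stage('Build') {
            steps {
                // run inside the maven container, not the default jnlp container
                container('maven') {
                    sh 'mvn clean package -DskipTests'
                }
            }
        }
    }
}

Output

The pipeline runs inside a pod where the jnlp container requests 256Mi/250m (limits 512Mi/500m) and the maven container requests 512Mi/500m (limits 1Gi/1).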
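A pod template that forgets to set resources gets none, so a namespace-level safety net helps. This sketch assumes agents run in the jenkins-agents namespace, as in the JCasC config later in this article. The numbers are placeholders to tune against kubectl top pods output for your heaviest build. The LimitRange fills in defaults for containers that omit requests or limits. The ResourceQuota caps what the whole agent namespace can claim, and pods that would exceed it are rejected at creation time rather than landing on an already-full node.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: agent-defaults
  namespace: jenkins-agents
spec:
  limits:
  - type: Container
    defaultRequest:          # used when a container sets no requests
      cpu: "250m"
      memory: "256Mi"
    default:                 # used when a container sets no limits
      cpu: "500m"
      memory: "512Mi"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-quota
  namespace: jenkins-agents
spec:
  hard:                      # placeholder totals across all agent pods
    requests.cpu: "8"
    requests.memory: "16Gi"
    limits.cpu: "16"
    limits.memory: "32Gi"
```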

Interview Gold: How Does Jenkins Handle Resource Contention?

Jenkins doesn't; Kubernetes does. The scheduler places pods based on their requests. When a node runs short of memory or disk, the kubelet evicts pods, looking first at pods using more than they requested and then at priority. Set priorityClassName on agent pods for critical builds so they are among the last to be evicted.
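Here is a sketch of that setup: a PriorityClass plus a pod template that references it. The class name ci-critical and its value are hypothetical. preemptionPolicy: Never lets these pods queue ahead of lower-priority pods without evicting running workloads just to get scheduled. Drop that line if you do want preemption.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ci-critical              # hypothetical name
value: 100000                    # higher value = higher priority; user classes max out at 1,000,000,000
globalDefault: false             # only pods that ask for it get this priority
preemptionPolicy: Never          # schedule ahead of lower-priority pods, but never preempt them
description: "Agent pods for release and hotfix builds"
---
# Pod template for critical builds (Kubernetes plugin yaml)
apiVersion: v1
kind: Pod
spec:
  priorityClassName: ci-critical
  containers:
  - name: maven
    image: maven:3.8.6-openjdk-11-slim
    command: ["cat"]
    tty: true
```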

Plugin Management: The Dependency Hell You Inherit

Jenkins plugins are a mess. They depend on each other, and version mismatches cause startup failures. On Kubernetes, you can't just SSH in and fix plugins; you rebuild the image. The solution: use a custom controller image with pinned plugin versions. The official Jenkins image lets you install plugins at build time with jenkins-plugin-cli, the replacement for the deprecated install-plugins.sh. Never install plugins at runtime via the UI, because that creates a snowflake container. I've seen a team's Jenkins fail to start because a plugin update pulled in a newer version of a dependency that broke another plugin. Pin everything.

Dockerfile.jenkins-controller · DOCKERFILE

# io.thecodeforge - DevOps tutorial
FROM jenkins/jenkins:lts-jdk11

# Install plugins at build time - pin versions
RUN jenkins-plugin-cli --plugins \
    kubernetes:1.31.3 \
    workflow-aggregator:2.6 \
    git:4.11.0 \
    configuration-as-code:1.55 \
    blueocean:1.25.3

# Cap HTTP handler threads; the system property disables the setup wizard
ENV JENKINS_OPTS="--handlerCountMax=100"
ENV JAVA_OPTS="-Xmx2g -Xms2g -Djenkins.install.runSetupWizard=false"

# Copy custom config outside /var/jenkins_home: the PVC mounted there would hide it
COPY jenkins.yaml /var/jenkins_casc/jenkins.yaml
ENV CASC_JENKINS_CONFIG=/var/jenkins_casc/jenkins.yaml

Output

Image built with pinned plugins. No runtime plugin installation.

The Classic Bug: Plugin Version Conflicts

Fix: Rebuild the image with compatible versions. Use jenkins-plugin-cli --list to check dependency tree. Always test plugin upgrades in a staging environment.
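One way to keep the pins reviewable in pull requests is a plugins.yaml file, which jenkins-plugin-cli accepts through --plugin-file. This sketch reuses the versions from the Dockerfile above. Check the file format against the plugin-installation-manager-tool README for the version bundled in your base image. Adding --list to the same command prints the resolved plugin set, dependencies included.

```yaml
# plugins.yaml (sketch). Assumed Dockerfile usage:
#   COPY plugins.yaml /usr/share/jenkins/ref/plugins.yaml
#   RUN jenkins-plugin-cli --plugin-file /usr/share/jenkins/ref/plugins.yaml
plugins:
  - artifactId: kubernetes
    source:
      version: "1.31.3"
  - artifactId: workflow-aggregator
    source:
      version: "2.6"
  - artifactId: git
    source:
      version: "4.11.0"
  - artifactId: configuration-as-code
    source:
      version: "1.55"
  - artifactId: blueocean
    source:
      version: "1.25.3"
```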

Configuration as Code: Stop Clicking Around the UI

Jenkins UI configuration is not reproducible. One wrong click and your CI is broken. The Configuration as Code (JCasC) plugin lets you define everything in YAML. On Kubernetes, mount the YAML as a ConfigMap. This way, you can version control your Jenkins config and rebuild from scratch in minutes. The gotcha: some plugins don't support JCasC, so for those you fall back to Groovy init scripts (init.groovy.d). I've seen a team lose a day because they forgot to configure the Kubernetes plugin credentials in JCasC, and agents couldn't connect.

jenkins.yaml · YAML

# io.thecodeforge - DevOps tutorial
jenkins:
  systemMessage: "Jenkins configured by JCasC - do not edit via UI"
  numExecutors: 0  # disable built-in executor
  clouds:
  - kubernetes:
      name: "kubernetes"
      serverUrl: "https://kubernetes.default.svc.cluster.local"
      skipTlsVerify: false
      namespace: "jenkins-agents"
      jenkinsUrl: "http://jenkins-controller.jenkins.svc.cluster.local:8080"  # FQDN: agents run in another namespace
      containerCap: 10  # max concurrent agents
      templates:
      - name: "default-agent"
        label: "k8s-agent"
        nodeUsageMode: NORMAL
        idleMinutes: 5  # terminate after 5 min idle
        yaml: |
          apiVersion: v1
          kind: Pod
          spec:
            containers:
            - name: jnlp
              image: jenkins/inbound-agent:4.13-1-jdk11
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "250m"
                limits:
                  memory: "512Mi"
                  cpu: "500m"
credentials:
  system:
    domainCredentials:
    - credentials:
      - kubernetesServiceAccount:
          scope: GLOBAL
          id: "k8s-sa"
          description: "Service account for Kubernetes plugin"

Output

Jenkins configured with Kubernetes cloud, agent template, and service account credentials.

Senior Shortcut: Validate JCasC Before Applying

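Validate the YAML against a running controller before rolling it out by POSTing it to the JCasC check endpoint with admin credentials, for example curl -X POST -u admin:API_TOKEN --data-binary @jenkins.yaml http://jenkins:8080/configuration-as-code/check. This catches syntax errors before they break the controller.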
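Here is a sketch of mounting jenkins.yaml from a ConfigMap. The ConfigMap name jenkins-casc and the /var/jenkins_casc mount path are hypothetical. CASC_JENKINS_CONFIG is the environment variable the JCasC plugin reads to locate its config. The mount sits outside /var/jenkins_home so the persistent volume mounted there cannot hide it. The second document is a fragment to merge into the controller Deployment, not a standalone manifest.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: jenkins-casc                      # hypothetical name
data:
  jenkins.yaml: |
    jenkins:
      systemMessage: "Jenkins configured by JCasC - do not edit via UI"
      numExecutors: 0
---
# Fragment: merge into the controller Deployment's pod template
spec:
  template:
    spec:
      containers:
      - name: jenkins
        env:
        - name: CASC_JENKINS_CONFIG       # where JCasC looks for its YAML
          value: /var/jenkins_casc/jenkins.yaml
        volumeMounts:
        - name: casc-config
          mountPath: /var/jenkins_casc    # outside /var/jenkins_home
          readOnly: true
      volumes:
      - name: casc-config
        configMap:
          name: jenkins-casc
```

Changing the ConfigMap does not reconfigure a running controller by itself. Restart the pod, or reload the configuration from the Configuration as Code page in Manage Jenkins.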

Security: Don't Leave the Back Door Open

Jenkins on Kubernetes exposes two ports: 8080 (UI/API) and 50000 (agent listener). The agent listener should never be exposed externally, because agents connect from inside the cluster. Use a ClusterIP service for 50000. For 8080, use an Ingress with TLS, and turn on authentication in Jenkins itself. Watch out with the setup shown here. With the setup wizard disabled (runSetupWizard=false), Jenkins never creates the initial admin user, so unless JCasC configures a security realm, anyone who can reach the UI can use it. I've seen a company's Jenkins exposed to the internet with no auth, allowing anyone to run arbitrary builds. Always enable security via JCasC and use Kubernetes secrets for credentials.

jenkins-service.yaml · YAML

# io.thecodeforge - DevOps tutorial
apiVersion: v1
kind: Service
metadata:
  name: jenkins-controller
  labels:
    app: jenkins-controller  # lets the ServiceMonitor select this Service
spec:
  type: ClusterIP  # not LoadBalancer - use Ingress for external access
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  - name: agent
    port: 50000
    targetPort: 50000
  selector:
    app: jenkins-controller
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jenkins-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx  # assumes ingress-nginx, matching the annotation above
  tls:
  - hosts:
    - jenkins.example.com
    secretName: jenkins-tls
  rules:
  - host: jenkins.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: jenkins-controller
            port:
              number: 8080

Output

Service exposes Jenkins internally; Ingress provides TLS termination and external access.

Production Trap: Exposing Port 50000 Externally

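Port 50000 carries the inbound agent protocol. Current Jenkins versions encrypt it with TLS (JNLP4-connect) and require each agent's secret. Exposing it still gives attackers a direct network path to the controller, though, and anyone holding a leaked agent secret can connect as that agent and receive its builds. Always keep it ClusterIP. If agents run outside the cluster, connect them over WebSocket through the HTTPS Ingress, or use SSH agents.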
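Here is a minimal sketch of enabling security through JCasC: a local user database with sign-up disabled, and an authorization strategy that requires login. The Secret name jenkins-admin and the variable JENKINS_ADMIN_PASSWORD are hypothetical. JCasC resolves ${...} placeholders from environment variables, so the password never lands in the YAML. loggedInUsersCanDoAnything is coarse, and teams with several roles usually move to matrix-based or role-based authorization. Both documents are fragments to merge into existing files.

```yaml
# Fragment: merge under the existing top-level "jenkins:" key in jenkins.yaml
jenkins:
  securityRealm:
    local:
      allowsSignup: false
      users:
      - id: "admin"
        password: "${JENKINS_ADMIN_PASSWORD}"   # resolved from the container environment
  authorizationStrategy:
    loggedInUsersCanDoAnything:
      allowAnonymousRead: false
---
# Fragment: env entry for the jenkins container in the controller Deployment
env:
- name: JENKINS_ADMIN_PASSWORD
  valueFrom:
    secretKeyRef:
      name: jenkins-admin    # hypothetical Secret
      key: password
```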

Monitoring: Know When Your CI Is Dying

Jenkins on Kubernetes can fail silently: pods restart, builds hang, agents fail to connect. You need monitoring. Use Prometheus to scrape Jenkins metrics (via the Prometheus plugin) and set up alerts for queue length > 10, build failure rate > 5%, and agent connection errors. Also monitor Kubernetes pod status, because agent pods in CrashLoopBackOff mean you have a problem. I once debugged a case where Jenkins was 'running' but no builds executed. The Kubernetes plugin had lost connectivity to the API server due to an expired service account token.

prometheus-service-monitor.yaml · YAML

# io.thecodeforge - DevOps tutorial
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jenkins-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: jenkins-controller
  endpoints:
  - port: http
    path: /prometheus  # Prometheus plugin endpoint
    interval: 30s
  namespaceSelector:
    matchNames:
    - jenkins

Output

Prometheus scrapes Jenkins metrics every 30 seconds.

Interview Gold: How Do You Detect a Dead Jenkins Agent?

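With scale-to-zero agents, having zero online agents is normal whenever the queue is empty, so don't alert on the agent count alone. Alert when builds stay queued and no agents or executors come online for a sustained period. The Prometheus plugin exposes node and executor gauges; check /prometheus for the exact metric names in your version. Also check Kubernetes pod status: if agent pods are terminating with 'Error' or 'OOMKilled', investigate resource limits.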
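The sketch below expresses these alerts as a PrometheusRule for the Prometheus Operator. The metric names jenkins_queue_size_value and jenkins_executor_count_value are assumptions based on the Metrics plugin gauges that the Prometheus plugin exports, so confirm them against your /prometheus output. The release: prometheus label mirrors the ServiceMonitor above and must match your Prometheus ruleSelector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: jenkins-alerts
  namespace: jenkins
  labels:
    release: prometheus
spec:
  groups:
  - name: jenkins
    rules:
    - alert: JenkinsQueueBacklog
      expr: jenkins_queue_size_value > 10          # assumed metric name
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Jenkins build queue has been above 10 for 10 minutes"
    - alert: JenkinsQueuedWithNoExecutors
      # Builds are waiting but no agent executors exist: agents are failing to come online.
      expr: jenkins_queue_size_value > 0 and jenkins_executor_count_value == 0   # assumed metric names
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "Builds queued for 10 minutes with no executors online"
```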

● Production incidentPOST-MORTEMseverity: high

The 4GB Container That Kept Dying

Symptom

Jenkins controller pod OOM-killed every 6 hours during peak build times. Builds failed with 'Connection refused' from the controller.

Assumption

Assumed memory leak in a plugin. Restarted Jenkins daily as a workaround.

Root cause

The default JVM heap size (-Xmx) was set to 512MB via the official Helm chart. The controller was processing 200+ concurrent agent connections and running pipeline scripts that consumed heap for Groovy closures. The pod's memory limit was 4GB, but JVM never used more than 512MB before hitting GC overhead limit and crashing.

Fix

Set JAVA_OPTS="-Xmx2g -Xms2g" in the controller deployment. Also set --handlerCountMax=100 via JENKINS_OPTS to cap concurrent request handler threads. Pod memory limit reduced to 3GB.

Key lesson

Never trust default JVM settings in containerized Jenkins.
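Always set -Xmx to roughly 50 to 75% of the pod memory limit (2 GB of 3 GB in this fix), leaving headroom for metaspace, thread stacks, and other off-heap memory.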
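An alternative to a hard-coded -Xmx is to let the JVM size the heap from the container memory limit with -XX:MaxRAMPercentage, which the JDK 11 image used throughout supports. If you resize the pod later, the heap follows. Don't combine it with -Xmx, which takes precedence. This is a fragment of the controller pod spec, not a full manifest.

```yaml
# Fragment: the jenkins container in the controller Deployment's pod spec
containers:
- name: jenkins
  image: jenkins/jenkins:lts-jdk11
  env:
  - name: JAVA_OPTS            # read by the official image's startup script
    # Heap is sized from the container memory limit instead of a fixed -Xmx:
    # 66% of the 3Gi limit below is roughly a 2Gi heap, leaving headroom
    # for metaspace, thread stacks, and other off-heap memory.
    value: "-XX:MaxRAMPercentage=66.0 -Djenkins.install.runSetupWizard=false"
  resources:
    requests:
      memory: "2Gi"
    limits:
      memory: "3Gi"
```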

Production debug guide: systematic recovery paths for the failure modes engineers actually hit.

Symptom · 01

Agent pods stuck in 'Pending' for >2 minutes


Fix

1. Run kubectl describe pod <agent-pod-name> to check events. 2. Look for 'Insufficient memory' or 'Insufficient cpu'. 3. Increase cluster node count or reduce agent resource requests. 4. If using nodeSelector, verify node labels match.

Symptom · 02

Jenkins UI returns 503 or connection refused


Fix

1. Check controller pod status: kubectl get pods -l app=jenkins-controller. 2. If the pod is running, check logs with kubectl logs <pod-name>; if it keeps restarting, add --previous to see the crashed container's logs. 3. Look for 'OutOfMemoryError' or 'Address already in use'. 4. If OOM, increase memory limits and JVM heap. 5. If there's a port conflict, ensure no other container in the pod binds 8080.

Symptom · 03

Builds fail with 'Error: Connection pool exhausted'


Fix

1. Check Jenkins system log for 'Unable to connect to agent'. 2. Verify agent pod is running: kubectl get pods -n jenkins-agents. 3. Check network policies: agent pods must reach the controller on port 8080 (the jenkinsUrl they contact first) and on port 50000 (the inbound agent port). 4. Increase 'Wait for pod to be running' timeout in Kubernetes plugin config to 300s. 5. If using custom agent image, ensure JNLP agent is correctly configured.
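For step 3, here is a sketch of a NetworkPolicy that lets agents through. It assumes the controller runs in the jenkins namespace and agents in jenkins-agents, as elsewhere in this article, and that your CNI plugin enforces NetworkPolicy. Once any policy selects the controller pod, all other inbound traffic to it is dropped. The sketch therefore also admits the Ingress controller and Prometheus on 8080, and the ingress-nginx and monitoring namespace names are assumptions. The kubernetes.io/metadata.name label is set automatically on namespaces in current Kubernetes versions.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: jenkins-controller-ingress
  namespace: jenkins
spec:
  podSelector:
    matchLabels:
      app: jenkins-controller
  policyTypes:
  - Ingress
  ingress:
  # Agents: jenkinsUrl on 8080 plus the inbound agent port 50000
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: jenkins-agents
    ports:
    - protocol: TCP
      port: 8080
    - protocol: TCP
      port: 50000
  # Ingress controller and Prometheus: UI/API and /prometheus on 8080 only
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx   # assumption
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring      # assumption
    ports:
    - protocol: TCP
      port: 8080
```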

Feature / Aspect	VM-Based Jenkins	Kubernetes Jenkins
Scaling	Manual, add agents via SSH	Automatic, pods spin up on demand
Resource Utilization	Static, often over-provisioned	Dynamic, pay-per-build
Disaster Recovery	Full VM backup, slow restore	PVC backup, fast pod restart
Plugin Management	Manual updates via UI	Immutable image, rebuild on change
Learning Curve	Low	Moderate (Kubernetes concepts)
Cost for Small Teams	Lower (single VM)	Higher (cluster overhead)

Key takeaways

Always set the JVM heap size (-Xmx) to roughly 50 to 75% of the pod memory limit; defaults will OOM you.

Pin plugin versions in a custom Docker image—never install plugins at runtime via the UI.

Use JCasC to define Jenkins configuration in YAML, mounted as a ConfigMap—reproducible and version-controlled.

The Kubernetes plugin is powerful, but its defaults are dangerous: set resource requests, pod retention to 'Never', and a container cap.

