Advanced 3 min · June 21, 2026

Jenkins Distributed Builds and Agents: Scale CI/CD Without Losing Your Sanity

Jenkins distributed builds and agents explained with production patterns, failure modes, and scaling strategies.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.

Last updated: June 21, 2026
Quick Answer

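To set up a Jenkins agent, create a node in Jenkins, connect the worker machine over SSH (Jenkins copies agent.jar to it for you) or as an inbound JNLP/WebSocket agent (you run agent.jar on the worker yourself), and give the node a label. Use agents to run builds in parallel, on different platforms, or in isolated environments.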

What is Jenkins Distributed Builds and Agents?

Jenkins distributed builds offload jobs from the master node (called the controller in current Jenkins documentation) to agent machines. Agents execute build steps, freeing the master to orchestrate. This architecture scales CI/CD horizontally and isolates workloads.

Plain-English First

Think of the Jenkins master as a restaurant manager who takes orders and coordinates the kitchen. Agents are the line cooks — they do the actual cooking. Without agents, the manager has to cook too, slowing everything down. With agents, you can add more cooks (agents) to handle more orders (builds) simultaneously.

Your Jenkins master is a single point of failure. I've seen a monolithic Jenkins instance choke during a code freeze — 200 developers pushing builds, the master's executor pool exhausted, builds queued for hours. The fix wasn't more RAM. It was distributed agents. Jenkins distributed builds let you scale horizontally: add agent machines to handle the load, run builds on different OSes, and isolate resource-hungry jobs. After this article, you'll be able to design a secure, resilient agent fleet, debug connection failures, and avoid the rookie mistakes that take down production pipelines.

Why You Need Distributed Builds: The Master's Breaking Point

As a rough rule of thumb, a single Jenkins master with 10 executors can serve about 50 developers. Beyond that, builds queue, the UI lags, and the master's JVM can run out of Metaspace. Distributed builds solve this by offloading execution to agents. The master only schedules jobs and serves the UI. Agents do the heavy lifting: compiling, testing, packaging. This separation also lets you run builds on different platforms (Linux, Windows, macOS) without polluting the master's environment. Without agents, you're one runaway build away from taking down the entire CI system.

agent-connection-check.devops

```shell
# Check agent status from the master through the REST API.
# -g (--globoff) stops curl from treating the [ ] in the tree parameter as a glob range.
curl -sg "http://jenkins-master:8080/computer/api/json?tree=computer[displayName,numExecutors,offline]" \
  --user "admin:$API_TOKEN" \
  | jq -r '.computer[] | "\(.displayName) (executors: \(.numExecutors), offline: \(.offline))"'
```

Output (illustrative; node names depend on your setup)

```text
master (executors: 2, offline: false)
linux-agent-1 (executors: 4, offline: false)
windows-agent-1 (executors: 2, offline: false)
```
Production Trap:
Never run resource-intensive builds (e.g., Docker builds, large test suites) on the master. If the master's executor pool is exhausted, the UI becomes unresponsive and you can't even restart builds. Always use agents for heavy lifting.

Agent Connection Protocols: SSH vs JNLP vs WebSocket

Jenkins supports three ways to connect an agent. SSH agents are the most reliable choice for permanent machines: the master opens the SSH connection, so the agent only needs port 22 reachable from the master and the master needs no extra inbound port. JNLP agents (named after the legacy Java Web Start launcher, and called inbound TCP agents today) dial in to a dedicated TCP port on the master, which is one more port to open, firewall, and audit. WebSocket agents are the modern alternative for inbound connections: they reuse the master's HTTP(S) port. I recommend SSH for permanent agents and WebSocket for ephemeral agents (e.g., Kubernetes pods). Avoid JNLP unless you're stuck on an ancient Jenkins version.
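To make the WebSocket option concrete, here is a minimal sketch of the launch command for an inbound agent. The flags (`-url`, `-name`, `-secret`, `-webSocket`, `-workDir`) are the ones recent Jenkins releases show on a node's status page; the helper function, URL, agent name, and paths are hypothetical placeholders.

```shell
# Sketch: print the command that starts an inbound agent over WebSocket.
# The function name and every argument value are illustrative placeholders.
build_websocket_agent_cmd() {
  local jenkins_url="$1"
  local agent_name="$2"
  local secret_file="$3"
  local work_dir="$4"
  # "@file" makes agent.jar read the secret from a file instead of taking it
  # on the command line, where `ps` would expose it.
  printf 'java -jar agent.jar -url "%s" -name "%s" -secret "@%s" -webSocket -workDir "%s"\n' \
    "$jenkins_url" "$agent_name" "$secret_file" "$work_dir"
}

build_websocket_agent_cmd "https://jenkins.example.com/" "linux-agent-2" \
  "/home/jenkins/secret-file" "/home/jenkins/agent"
```

Because the agent dials out to the master's existing HTTP(S) port, no additional inbound port has to be opened on either side.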

ssh-agent-setup.sh

```shell
# On the agent machine, create a jenkins user
sudo useradd -m -s /bin/bash jenkins
sudo mkdir -p /home/jenkins/.ssh
sudo chmod 700 /home/jenkins/.ssh

# Copy the master's public key to the agent's authorized_keys
echo "ssh-rsa AAAAB3NzaC1yc2E..." | sudo tee /home/jenkins/.ssh/authorized_keys > /dev/null
sudo chmod 600 /home/jenkins/.ssh/authorized_keys
sudo chown -R jenkins:jenkins /home/jenkins/.ssh

# Test SSH from the master (the agent needs a Java runtime supported by your Jenkins version)
ssh -i /var/lib/jenkins/.ssh/id_rsa jenkins@agent-hostname 'java -version'

# In the Jenkins UI: Manage Jenkins > Nodes > New Node
# Name: linux-agent-1
# Remote root directory: /home/jenkins/agent
# Launch method: Launch agents via SSH
# Host: agent-hostname
# Credentials: SSH key (private key from master)
# Host Key Verification Strategy: Known hosts file Verification Strategy
#   (use Non verifying Verification Strategy only for throwaway test setups)
```

Output

```text
openjdk version "11.0.18" 2023-01-17 LTS
OpenJDK Runtime Environment (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM (build 11.0.18+10-LTS, mixed mode, sharing)
```
Security Alert:
Using 'Non verifying Verification Strategy' is convenient but insecure. In production, use 'Known Hosts File Verification Strategy' and pre-populate the known_hosts file. Otherwise, a man-in-the-middle attack can compromise your agent.
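Pre-populating the known_hosts file can be sketched as follows. This assumes OpenSSH's `ssh-keyscan` and `ssh-keygen` are installed on the master; the function name is hypothetical, and the example path reuses the `/var/lib/jenkins` home directory from the setup script.

```shell
# Sketch: record an agent's SSH host key so Jenkins can verify it on connect.
# Function name, hostnames, and paths are placeholders.
populate_known_hosts() {
  local host="$1"
  local known_hosts="$2"
  if [ -z "$host" ] || [ -z "$known_hosts" ]; then
    echo "usage: populate_known_hosts HOST KNOWN_HOSTS_FILE" >&2
    return 2
  fi
  # Fetch the host's ed25519 key and append it in hashed form (-H).
  ssh-keyscan -H -t ed25519 "$host" >> "$known_hosts" || return 1
  # Print the stored fingerprints so a human can compare them with the agent's.
  ssh-keygen -lf "$known_hosts"
}

# Example (run on the master as the user Jenkins runs as):
# populate_known_hosts agent-hostname /var/lib/jenkins/.ssh/known_hosts
```

`ssh-keyscan` does not authenticate the host it scans, so compare the printed fingerprint with one read directly on the agent (for example with `ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub`) before trusting the entry.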

Agent Labels: The Key to Build Routing

Labels are tags you assign to agents. They let you route specific jobs to specific agents based on requirements like OS, architecture, or installed tools. For example, label a Windows agent with 'windows' and a Linux agent with 'linux'. Then in your pipeline, use agent { label 'linux' } to ensure the build runs on Linux. Without labels, Jenkins picks any available agent, which can cause builds to fail due to missing dependencies. Labels also enable parallelism: you can have multiple agents with the same label and Jenkins will distribute jobs among them.

Jenkinsfile

```groovy
pipeline {
    // No global agent: every stage picks its own by label
    agent none
    stages {
        stage('Build on Linux') {
            agent { label 'linux' }
            steps {
                sh 'make build'
            }
        }
        stage('Test on Windows') {
            agent { label 'windows' }
            steps {
                bat 'msbuild.exe /t:Rebuild'
            }
        }
        stage('Deploy') {
            // Label expression: the agent must carry both labels
            agent { label 'linux && production' }
            steps {
                // ./ prefix so the script is found in the workspace, not on PATH
                sh './deploy.sh'
            }
        }
    }
}
```

Output
Pipeline runs: Build on a linux agent, Test on a windows agent, Deploy on an agent with both 'linux' and 'production' labels.
Senior Shortcut:
Use label expressions like 'linux && highmem' to target agents with multiple attributes. Avoid spaces in label names — they're treated as separate labels. Use underscores or hyphens instead.
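To illustrate how a label expression selects agents, here is a small sketch of the matching rule for the `&&` operator only. Jenkins' real expression language also supports operators such as `||`, `!`, and parentheses; the function below is a hypothetical simplification, not Jenkins code.

```shell
# Sketch: does an agent's label set satisfy an "a && b" expression?
# Handles only the && operator.
agent_matches() {
  local labels=" $1 "
  local expr="$2"
  local term
  # Replace each && with a space, then require every remaining term.
  for term in $(printf '%s' "$expr" | sed 's/&&/ /g'); do
    case "$labels" in
      *" $term "*) ;;      # label present, keep checking
      *) return 1 ;;       # a required label is missing
    esac
  done
  return 0
}

agent_matches "linux production docker" "linux && production" && echo "match"
agent_matches "linux docker" "linux && production" || echo "no match"
```

The first call prints `match` and the second prints `no match`. Labels are compared as whole words, so an agent labeled `linux-arm` does not satisfy `linux`.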

Securing Agent-Master Communication

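The agent-master channel carries sensitive data: credentials, source code, build artifacts. If an attacker compromises an agent, they can exfiltrate secrets. Mitigations: use SSH agents (encrypted channel), keep agent-to-controller access control on (formerly "agent-to-master security"; it limits what an agent can ask the master to do and is always enabled from Jenkins 2.326 onward), and restrict what agents can do. Keep CSRF protection enabled too; it is a separate control that protects the web UI and API, not the agent channel. In Jenkins, enable 'Disable remember me' and treat each inbound agent's secret like a password. For Kubernetes agents, use ServiceAccounts with minimal RBAC. Never run agents as root; use a dedicated user with least privilege.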

jenkins-security-config.groovy

```groovy
// Run in the Jenkins script console or drop into init.groovy.d
import jenkins.model.Jenkins
import hudson.security.FullControlOnceLoggedInAuthorizationStrategy
import hudson.security.csrf.DefaultCrumbIssuer

Jenkins jenkins = Jenkins.get()

// Disable the TCP port for inbound (JNLP) agents; -1 means disabled.
// SSH and WebSocket agents keep working.
jenkins.setSlaveAgentPort(-1)

// Agent-to-controller access control needs no code on Jenkins 2.326+:
// it is always on and can no longer be switched off.

// CSRF protection; 'true' excludes the client IP from the crumb (proxy compatibility)
jenkins.setCrumbIssuer(new DefaultCrumbIssuer(true))

// Require login for everything and block anonymous read.
// Caution: this replaces whatever authorization strategy is already configured.
def strategy = new FullControlOnceLoggedInAuthorizationStrategy()
strategy.setAllowAnonymousRead(false)
jenkins.setAuthorizationStrategy(strategy)

jenkins.save()
```

Output
Jenkins configured with the inbound TCP (JNLP) port disabled, CSRF protection on, and anonymous access blocked.
Never Do This:
Don't expose the JNLP agent port (50000 by convention, as in the official Docker image) to the internet. If you must use JNLP, restrict inbound traffic to known agent IPs. Better yet, switch to SSH or WebSocket agents.
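One way to confirm the port really is closed is to probe it from a machine outside your trusted network. The sketch below assumes `bash` (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command; the function names and hostname are hypothetical.

```shell
# Sketch: check whether a TCP port answers, e.g. to confirm that the inbound
# agent port is NOT reachable from an untrusted network.
port_open() {
  local host="$1"
  local port="$2"
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

report_agent_port() {
  local host="$1"
  local port="$2"
  if port_open "$host" "$port"; then
    echo "WARNING: $host:$port is reachable from this machine"
  else
    echo "OK: $host:$port is not reachable from this machine"
  fi
}

# Example, run from outside your trusted network (hostname is a placeholder):
# report_agent_port jenkins.example.com 50000
```

A failed probe only shows that this one vantage point cannot connect, so run it from wherever an attacker could plausibly sit.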

Scaling Agents with Kubernetes Plugin

The Kubernetes plugin spins up ephemeral agent pods on demand. This is the holy grail for elastic CI/CD: no idle agents, no manual provisioning. Each build gets a fresh, isolated environment. Configuration involves a Jenkins URL, Kubernetes cluster credentials, and a pod template. The pod template defines containers (e.g., jnlp, maven, docker) and resource limits. I've seen teams cut agent costs by 70% using this approach. But it introduces complexity: pod startup latency, image pull times, and network egress costs.

kubernetes-pod-template.yaml

```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
  # Jenkins agent container; pin a specific image tag in production
  - name: jnlp
    image: jenkins/inbound-agent:latest
    args: ["$(JENKINS_SECRET)", "$(JENKINS_AGENT_NAME)"]
    resources:
      requests:
        memory: "256Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  # Build tool container, kept alive with `cat` so steps can exec into it
  - name: maven
    image: maven:3.8-openjdk-11
    command:
    - cat
    tty: true
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "2Gi"
        cpu: "1"
  # Docker-in-Docker for container builds; privileged mode is required
  - name: docker
    image: docker:20.10-dind
    securityContext:
      privileged: true
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  serviceAccountName: jenkins-agent
```

Output
A Kubernetes pod with three containers: jnlp (Jenkins agent), maven (build tool), and docker (Docker-in-Docker for container builds).
Interview Gold:
The Kubernetes plugin uses the JNLP protocol internally. That's fine because the connection stays within the cluster. But if your Jenkins master is outside the cluster, you'll need to expose the JNLP port — which is a security risk. Consider using the inbound-agent image with WebSocket mode instead.
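For capacity planning, the pod template's resource requests tell you how many agent pods fit on one node, since the Kubernetes scheduler places pods by requests rather than limits. Here is a rough sketch of that arithmetic; the node size in the example is a hypothetical value, and the per-pod totals are summed from the template above.

```shell
# Sketch: how many agent pods fit on one node, based on resource requests.
# Requests per pod, summed from the pod template above:
#   memory: 256Mi + 1024Mi + 512Mi = 1792Mi
#   cpu:    200m  + 500m   + 200m  = 900m
POD_MEM_REQUEST_MI=1792
POD_CPU_REQUEST_M=900

pods_per_node() {
  local alloc_mem_mi="$1"
  local alloc_cpu_m="$2"
  local by_mem=$(( alloc_mem_mi / POD_MEM_REQUEST_MI ))
  local by_cpu=$(( alloc_cpu_m / POD_CPU_REQUEST_M ))
  # The scarcer resource decides.
  if [ "$by_mem" -lt "$by_cpu" ]; then
    echo "$by_mem"
  else
    echo "$by_cpu"
  fi
}

# Hypothetical node with 16384Mi memory and 4000m CPU allocatable
pods_per_node 16384 4000   # prints 4 (memory allows 9, CPU allows 4)
```

In this example the node is CPU-bound, so lowering CPU requests (if builds tolerate it) would raise pod density more than adding memory.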

Monitoring Agent Health and Performance

Agents die silently. A disconnected agent doesn't show up in build failures — it just causes builds to queue indefinitely. Monitor agent status using Jenkins API, Prometheus exporter, or custom scripts. Key metrics: executor count, queue length, agent response time. Set up alerts for agents that go offline. Also monitor disk space on agents — a full disk causes mysterious build failures. I've seen a build fail because /tmp filled up with Docker layers. Add a cron job to clean up old workspaces.

agent-monitor.sh

```shell
#!/bin/bash
# Monitor agent status via the Jenkins REST API
JENKINS_URL="http://jenkins-master:8080"
API_TOKEN="your-api-token"   # better: read this from the environment or a secrets store

# Fetch the node list once. -g (--globoff) stops curl from treating
# the [ ] in the tree parameter as a URL glob range.
nodes=$(curl -sg "$JENKINS_URL/computer/api/json?tree=computer[displayName,offline]" \
  --user "admin:$API_TOKEN")

# Print each node and whether it is offline
echo "$nodes" | jq '.computer[] | {name: .displayName, offline: .offline}'

# Alert if any agent is offline (jq -e exits non-zero when nothing matches)
if echo "$nodes" | jq -e '.computer[] | select(.offline == true)' > /dev/null; then
  echo "Some agents are offline!"
  # Send alert via Slack, email, etc.
fi
```

Output

```text
{
  "name": "master",
  "offline": false
}
{
  "name": "linux-agent-1",
  "offline": true
}
Some agents are offline!
```
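The workspace cleanup and disk check mentioned in this section can be sketched as two small functions. This assumes GNU or BSD `find` and a POSIX `df`; the function names and the 7-day threshold are hypothetical, and the example path builds on the `/home/jenkins/agent` remote root used earlier.

```shell
# Sketch: delete top-level workspace directories not modified for more than N days.
# Caution: this does not check whether a running build still uses a directory.
clean_old_workspaces() {
  local root="$1"
  local days="$2"
  [ -d "$root" ] || return 1
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -exec rm -rf {} +
}

# Print the usage percentage (without the % sign) of the filesystem holding a path.
disk_used_percent() {
  df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# Example calls, e.g. from a nightly cron job on the agent:
# clean_old_workspaces /home/jenkins/agent/workspace 7
# [ "$(disk_used_percent /home/jenkins/agent)" -lt 85 ] || echo "Disk almost full"
```

Scheduling the cleanup for a quiet window, or taking the agent temporarily offline first, reduces the risk of deleting a workspace that a build is still using.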

Troubleshooting Agent Connection Issues

Agent disconnections are the most common production issue. Symptoms: builds stuck in queue, 'Agent is offline' errors, or 'Connection was broken' in logs. First, check the agent's log (on the agent machine, look at jenkins-agent.log). Common causes: network timeout, JVM crash, or credential expiry. For SSH agents, verify the SSH key is still valid. For JNLP agents, check the TCP port is reachable. I once spent hours debugging an agent that disconnected every 30 minutes — turned out the agent's network had a firewall that closed idle connections after 5 minutes. The fix: set the SSH ClientAliveInterval to 60 seconds.

ssh-keepalive-config

```
# On the Jenkins master, for OpenSSH clients (~/.ssh/config).
# This covers manual ssh sessions and scripts. The Jenkins SSH launcher uses
# its own Java SSH client and may not read this file, so do not rely on it alone.
Host agent-hostname
  ServerAliveInterval 60
  ServerAliveCountMax 3
  TCPKeepAlive yes

# On the agent's SSH server (/etc/ssh/sshd_config).
# Server-side probes apply to every client, including the Jenkins launcher.
ClientAliveInterval 60
ClientAliveCountMax 3

# Restart SSH on the agent (the unit is named ssh on Debian/Ubuntu)
sudo systemctl restart sshd
```

Output
SSH keepalive configured. The agent's sshd probes the connection every 60 seconds. After 3 consecutive unanswered probes it drops the connection, and Jenkins can then relaunch the agent.
The Classic Bug:
If you use a load balancer in front of Jenkins agents, ensure it doesn't have a shorter idle timeout than your SSH keepalive interval. Otherwise, the load balancer will drop the connection first, causing random disconnections.
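The rule behind both the firewall story and the load balancer warning is simple arithmetic: the keepalive interval must be shorter than the shortest idle timeout on the path. A minimal sketch of that check follows; the function name is hypothetical, and the 300-second timeout stands in for the 5-minute firewall from the anecdote.

```shell
# Sketch: sanity-check SSH keepalive settings against an idle timeout.
# All values are in seconds.
check_keepalive() {
  local interval="$1"
  local count_max="$2"
  local idle_timeout="$3"
  if [ "$interval" -ge "$idle_timeout" ]; then
    echo "UNSAFE: keepalive every ${interval}s cannot beat a ${idle_timeout}s idle timeout"
    return 1
  fi
  # A dead peer is noticed after count_max unanswered probes.
  echo "OK: probes every ${interval}s; dead connection detected after about $(( interval * count_max ))s"
}

check_keepalive 60 3 300            # settings from this section vs. a 5-minute timeout
check_keepalive 600 3 300 || true   # an interval longer than the timeout fails the check
```

With the values from this section, probes go out every 60 seconds and a dead connection is detected after roughly 180 seconds, comfortably inside a 5-minute idle window.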

When Not to Use Distributed Builds

Distributed builds add complexity. If you have fewer than 10 developers and builds complete in under 5 minutes, a single master with 4 executors is fine. Also, if your builds are I/O-bound (e.g., large file transfers), adding agents won't help — the bottleneck is the network or storage. In that case, optimize the build process first. Finally, if your team lacks DevOps support, the overhead of managing agents (updates, security, monitoring) might outweigh the benefits. Start simple, scale when you feel the pain.

Interview Gold:
Interviewers love asking: 'When would you NOT use distributed builds?' The answer: when build times are acceptable, team size is small, or the overhead of agent management exceeds the benefit. Always measure before scaling.
Production incident · Post-mortem · Severity: high

The 4GB Container That Kept Dying

Symptom
A Docker-based agent kept crashing with OOMKilled during Maven builds. Builds failed randomly, no pattern.
Assumption
Team assumed the container needed more memory. Increased from 4GB to 8GB. Still crashed.
Root cause
The agent was running with default JVM heap settings. Maven spawned a forked compiler process that, combined with the agent JVM, exceeded the container's memory limit. The OOM killer targeted the container, not the JVM.
Fix
Set JVM heap limits on the agent: -Xmx512m -Xms256m. Also set MAVEN_OPTS=-Xmx512m. This kept total memory under 2GB.
Key lesson
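  • Always constrain JVM heap inside containers.
  • A container memory limit caps the whole container, not each JVM inside it. Container-aware JVMs size their default heap from that limit independently (25% of it by default), so several JVMs plus their non-heap memory can still exceed it. Set -Xmx (or -XX:MaxRAMPercentage) explicitly.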
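A quick memory budget makes the lesson concrete. The sketch below adds up the heap settings from the fix plus a per-JVM allowance for non-heap memory; the 256 MB overhead figure is an assumption for illustration, not a measured value, and the function name is hypothetical.

```shell
# Sketch: estimate memory headroom in a container that runs several JVMs.
# Usage: container_headroom LIMIT_MB OVERHEAD_MB HEAP_MB [HEAP_MB...]
# OVERHEAD_MB is an assumed per-JVM allowance for non-heap memory
# (metaspace, thread stacks, code cache); measure your own workloads.
container_headroom() {
  local limit_mb="$1"
  local overhead_mb="$2"
  local total=0
  local heap
  shift 2
  for heap in "$@"; do
    total=$(( total + heap + overhead_mb ))
  done
  echo $(( limit_mb - total ))
}

# 4 GB container, agent JVM at -Xmx512m, Maven at -Xmx512m, 256 MB assumed overhead each
container_headroom 4096 256 512 512   # prints 2560
```

A negative result means the configured heaps alone can push the container past its limit. Any forked process, such as the compiler in this incident, needs its own entry in the list.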
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit.
Symptom · 01
Agent shows as offline in Jenkins UI
Fix
1. Check that the agent machine is running and reachable over the network.
2. Check the agent log at /var/log/jenkins/jenkins-agent.log.
3. Verify the SSH key or JNLP token hasn't expired.
4. Restart the agent service: sudo systemctl restart jenkins-agent.
5. If using SSH, test the connection from the master: ssh -i key jenkins@agent-hostname.
Symptom · 02
Builds stuck in queue, no agents available
Fix
1. Check agent count and executor usage: curl -s http://jenkins:8080/computer/api/json.
2. Look for agents that are offline or busy.
3. Check whether labels match the job's requirements.
4. Increase the executor count on existing agents or add new agents.
5. Check whether the agent is in 'temporarily offline' mode.
Symptom · 03
Agent disconnects randomly during builds
Fix
1. Check network stability between master and agent.
2. Enable SSH keepalive (ServerAliveInterval=60).
3. Check agent JVM heap usage: jstat -gc <pid>.
4. Increase JVM heap if garbage collection is excessive.
5. Check for firewall or load balancer idle timeouts.
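| Feature / Aspect | SSH Agent | JNLP Agent | WebSocket Agent |
| --- | --- | --- | --- |
| Connection direction | Master -> Agent (outbound from master) | Agent -> Master (inbound to master) | Agent -> Master (inbound to master) |
| Port required | 22 (SSH) on the agent | 50000 (JNLP) on the master | 8080 (HTTP) or 443 (HTTPS) on the master |
| Security | High (encrypted, key-based) | Lower (JNLP4 is TLS-encrypted and secret-based, but it exposes an extra port) | Medium (encrypted via HTTPS) |
| Firewall friendly | Yes (outbound only from master) | No (extra inbound port needed) | Yes (uses same port as UI) |
| Ephemeral support | No (persistent connection) | Yes (can reconnect) | Yes (built for ephemeral) |
| Recommended use | Permanent agents | Legacy / avoid | Kubernetes / cloud agents |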

Key takeaways

1. Distributed builds scale CI/CD horizontally and isolate workloads from the master.
2. Always use SSH agents for permanent agents and WebSocket for ephemeral ones; avoid JNLP.
3. Labels are your routing mechanism; use them to match builds to the right environment.
4. Monitor agent health and disk space proactively: silent agent failures waste developer time.
5. Start simple: a single master with a few agents. Scale when you measure the pain.