Beginner 6 min · June 21, 2026

Jenkins Controller and Agent: Stop Running Everything on One Machine

Jenkins controller and agent architecture explained.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

Last updated: June 21, 2026
Quick Answer

The Jenkins controller is the central server that schedules jobs and stores configuration. Agents (formerly called slaves) are remote machines that execute those jobs. You connect an agent either over SSH, where the controller opens the connection, or as an inbound agent (historically called JNLP), where the agent connects to the controller. This lets you run builds on different OSes, scale horizontally, and isolate workloads.

What is Jenkins Architecture?

Jenkins uses a controller-agent architecture where the controller manages job scheduling, configuration, and UI, while agents execute build tasks on separate machines. This decouples orchestration from execution, enabling parallel builds across diverse environments.

Plain-English First

Think of the controller as the head chef in a busy kitchen. They plan the menu, take orders, and decide which dishes go to which station. The agents are the line cooks — each specializes in a different task (grill, pastry, etc.) and works independently. If you only had the head chef cooking, they'd be overwhelmed and everything would slow down. By delegating to multiple line cooks, the kitchen serves more orders faster.

Most Jenkins setups I've seen in production start the same way: someone installs Jenkins on a single VM, runs everything there, and it works fine — until it doesn't. The controller gets overloaded, builds queue up, and a single rogue job can take down the entire CI/CD pipeline. I've watched a payments team lose an entire release day because a memory leak in a test suite killed the Jenkins process. Don't be that team.

The controller-agent pattern solves this by separating the brain from the brawn. The controller handles lightweight orchestration — scheduling, authentication, UI — while agents do the heavy lifting: compiling, testing, packaging. This isn't just about scaling; it's about survival. Without it, you can't run builds on different platforms, you can't isolate untrusted jobs, and you can't recover from a single point of failure.

By the end of this article, you'll be able to set up a Jenkins controller with multiple agents, configure agent security, and diagnose the most common production failures — including the one that cost a fintech startup 6 hours of downtime last year.

Why You Need a Separate Controller and Agent

Running everything on the controller is like using your laptop as a production server. It works until you need to deploy at 3 AM and your laptop runs out of battery. The controller is the nervous system — it should be lean, responsive, and always available. Agents are the muscles — they do the heavy lifting and can be swapped out when they fail.

Without agents, every build competes for the same CPU, memory, and disk I/O. A single memory-intensive test suite can starve the controller's web UI, making it impossible to cancel the build or check logs. I've seen this bring down a CI pipeline for a 200-person engineering org because no one could access the Jenkins UI to kill the runaway job.

Agents also let you run builds on different platforms. Want to test on Windows, Linux, and macOS? Spin up an agent for each. Want to isolate untrusted pipeline code? Run it on a disposable agent in a Docker container. The controller never executes arbitrary code — it only orchestrates.

controller-agent-setup.sh

# Step 1: On the controller, generate agent connection details
# Go to Manage Jenkins > Nodes > New Node
# Name: linux-agent-01
# Type: Permanent Agent
# Remote root directory: /home/jenkins/agent
# Labels: linux docker
# Launch method: Launch agent via SSH
# Host: 192.168.1.101
# Credentials: SSH key (jenkins-agent-key)
# Host Key Verification Strategy: Known hosts file verification

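# Step 2: On the agent machine, install Java (required to run the agent process)
# Match the controller's Java version; recent Jenkins releases require Java 17 or newer
sudo apt-get update
sudo apt-get install -y openjdk-17-jre-headless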

# Step 3: Create the agent user and workspace directory
sudo useradd -m -s /bin/bash jenkins
sudo mkdir -p /home/jenkins/agent
sudo chown jenkins:jenkins /home/jenkins/agent

# Step 4: The controller copies the agent JAR over SSH and launches it automatically
# Verify the agent is connected via the REST API (USER and API_TOKEN are placeholders):
# curl -sg -u USER:API_TOKEN 'http://controller:8080/computer/api/json?tree=computer[displayName,offline]'
# 'linux-agent-01' should appear with "offline":false
Output
Agent linux-agent-01 connected successfully.
Number of executors: 2
Mode: Normal
Labels: linux docker
Production Trap: Running executors on the controller
Never set the built-in node's executors to anything above 0 in production. I've seen a controller crash because a build consumed all available file handles. Set 'Number of executors' to 0 on the built-in node and use dedicated agents exclusively.

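Agent Connection Methods: SSH vs Inbound (JNLP)

You have two practical ways to connect an agent to the controller, and each has trade-offs. SSH is the most common for permanent agents: the controller copies the agent JAR to the machine and manages the connection. Inbound agents, still widely called JNLP agents, are for machines that can't accept inbound SSH connections, like Windows machines behind a firewall. Launching an inbound agent through Java Web Start (opening the .jnlp file from a browser) is the legacy variant. Java Web Start was removed from the JDK in Java 11, so you will only see it in old setups; today you start an inbound agent by running agent.jar from a command line or a service.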

SSH is preferred because it's encrypted by default, supports key-based auth, and the controller handles reconnection automatically. JNLP requires the agent to initiate the connection, which is useful when the agent is in a different network segment. The catch is that nothing supervises the agent process: it retries the connection while it is running, but if the process exits (crash, reboot, out-of-memory kill), the agent stays offline until something starts it again.

I've seen teams use JNLP because 'it's easier' — then they wonder why agents go offline after a network blip. Use SSH unless you have a specific reason not to. If you must use JNLP, wrap the agent launch in a systemd service with Restart=always.

jnlp-agent.service

# systemd service for JNLP agent with auto-reconnect
# Save as /etc/systemd/system/jenkins-agent.service

[Unit]
Description=Jenkins Agent (JNLP)
After=network.target

[Service]
User=jenkins
Group=jenkins
WorkingDirectory=/home/jenkins/agent
ExecStart=/usr/bin/java -jar /home/jenkins/agent/agent.jar -jnlpUrl http://controller:8080/computer/linux-agent-01/jenkins-agent.jnlp -secret @/home/jenkins/agent/secret.txt -workDir /home/jenkins/agent
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Enable and start:
# sudo systemctl daemon-reload
# sudo systemctl enable jenkins-agent
# sudo systemctl start jenkins-agent
Output
● jenkins-agent.service - Jenkins Agent (JNLP)
Loaded: loaded (/etc/systemd/system/jenkins-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2024-01-15 10:23:45 UTC; 2h 30min ago
Main PID: 12345 (java)
Tasks: 15 (limit: 4915)
Memory: 256.0M
CGroup: /system.slice/jenkins-agent.service
└─12345 java -jar /home/jenkins/agent/agent.jar -jnlpUrl ...
Senior Shortcut: Use SSH agents with Docker
For ephemeral build environments, use the Docker plugin with SSH agents. Spin up a container, run the build, then destroy it. This avoids agent drift and ensures clean state every time. Set 'Disconnect after idle time' to 10 minutes to free resources.

Configuring Agent Labels for Targeted Builds

Labels are how you tell Jenkins which agent should run a specific job. Without labels, Jenkins picks any available agent — which is fine for simple setups, but dangerous when you need specific tools or environments. For example, a Docker build must run on an agent with Docker installed. A Windows build needs a Windows agent.

Labels are free-form strings. You can assign multiple labels to an agent (e.g., 'linux docker high-mem'). In your pipeline, use the label directive to constrain where the job runs. If no agent matches the label, the job waits indefinitely — which is why you should always have a fallback or timeout.

I've seen a team label their agents 'production' and 'testing', then accidentally run a destructive database migration on the production agent because the pipeline didn't specify a label. Always label explicitly, and never rely on the default 'any' agent for sensitive jobs.
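
To make the && semantics concrete, here is a minimal shell sketch of the matching rule: an agent qualifies only if it carries every requested label. The has_labels function is a hypothetical helper written for illustration, not part of Jenkins, and it covers only the all-of case rather than the full label expression syntax.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the agent's label list contains
# every requested label (the "a && b" case of a Jenkins label expression).
has_labels() {
    local agent_labels=" $1 "   # pad with spaces so only whole labels match
    shift
    local wanted
    for wanted in "$@"; do
        case "$agent_labels" in
            *" $wanted "*) ;;    # label present, keep checking
            *) return 1 ;;       # one missing label disqualifies the agent
        esac
    done
    return 0
}

if has_labels "linux docker high-mem" linux docker; then
    echo "agent qualifies for 'linux && docker'"
fi
```

An agent labelled only 'linux docker' would fail a 'linux && high-mem' request under this rule, which is why a stage with that expression sits in the queue until a matching agent exists.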

Jenkinsfile

pipeline {
    agent none  // Don't run on any agent by default
    
    stages {
        stage('Build') {
            agent { label 'linux && docker' }  // Must match both labels
            steps {
                sh 'docker build -t myapp:latest .'
            }
        }
        stage('Test') {
            agent { label 'linux && high-mem' }  // Requires high memory agent
            environment {
                MAVEN_OPTS = '-Xmx4g'  // JVM heap goes in MAVEN_OPTS; mvn itself has no -Xmx option
            }
            steps {
                sh 'mvn test'
            }
        }
        stage('Deploy') {
            agent { label 'production' }  // Only runs on production-tagged agent
            steps {
                sh 'ansible-playbook deploy.yml'
            }
        }
    }
}
Output
Running on linux-agent-01 in /home/jenkins/agent/workspace/myapp
[Pipeline] stage
[Pipeline] { (Build)
[Pipeline] sh
+ docker build -t myapp:latest .
...
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Test)
[Pipeline] sh
+ mvn test
...
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Deploy)
[Pipeline] sh
+ ansible-playbook deploy.yml
...
[Pipeline] }
[Pipeline] // stage
[Pipeline] End of Pipeline
Finished: SUCCESS
Never Do This: Using 'any' agent for production deployments
If your pipeline doesn't specify a label for the deploy stage, Jenkins might run it on a random agent — possibly one without the right credentials or network access. Always pin deployment stages to a specific agent label. I've seen a deploy run on a Windows agent that didn't have SSH keys — the job failed silently and the team thought the deployment succeeded.

Scaling Agents Horizontally with Cloud Plugins

When your build demand spikes — say, during a release day — you don't want to manually spin up agents. Cloud plugins (EC2, Kubernetes, Azure VM) let Jenkins provision agents on demand. The controller detects a queued job, launches a new agent, runs the job, then terminates the agent after a timeout.

This is the gold standard for CI/CD at scale. You pay only for what you use, and you never have idle agents wasting resources. The Kubernetes plugin is especially popular: each build runs in a pod with ephemeral storage, so no workspace cleanup needed.

But cloud agents introduce latency. Spinning up a VM takes 30-60 seconds. For short jobs, that overhead might exceed the build time. Use a hybrid approach: keep a pool of warm agents (e.g., 2-3 always-on) for quick jobs, and use cloud agents for spikes.
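
A quick back-of-the-envelope calculation shows where the break-even sits. This is a minimal sketch: startup_overhead_pct is a hypothetical helper, and the 45-second startup is an assumed midpoint of the 30-60 second range above, not a measurement.

```shell
#!/bin/bash
# Share of total wall-clock time spent waiting for a cold agent,
# as an integer percentage (rounded down).
startup_overhead_pct() {
    local startup_s=$1 build_s=$2
    echo $(( startup_s * 100 / (startup_s + build_s) ))
}

startup_overhead_pct 45 30    # 30-second lint job on a cold agent: prints 60
startup_overhead_pct 45 600   # 10-minute build on a cold agent: prints 6
```

When most of a job's wall-clock time is provisioning, it belongs on a warm agent. When the overhead is a few percent, an on-demand agent costs almost nothing in latency.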

kubernetes-agent.yaml

# Pod template for the Jenkins Kubernetes plugin (raw pod YAML, as used in a pod template's YAML field)
# This defines a pod template for Maven builds

apiVersion: v1
kind: Pod
spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:latest
    args: ['$(JENKINS_SECRET)', '$(JENKINS_NAME)']
    resources:
      requests:
        memory: "256Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  - name: maven
    image: maven:3.8-openjdk-11
    command:
    - cat
    tty: true
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "2Gi"
        cpu: "1"
    env:
    - name: MAVEN_OPTS
      value: "-Xmx1536m"
  nodeSelector:
    kubernetes.io/os: linux
  serviceAccountName: jenkins-agent
Output
Pod 'jenkins-agent-abc123' created.
Container 'jnlp' started.
Container 'maven' started.
Job 'myapp-build #42' running on pod 'jenkins-agent-abc123'.
Production Trap: Cloud agent startup timeout

Securing the Controller-Agent Connection

The controller-agent channel carries sensitive data: source code, credentials, deployment keys. If an attacker compromises an agent, they can exfiltrate secrets or inject malicious builds. You must secure the connection.

SSH agents use the controller's SSH key to authenticate. Protect that key with a passphrase and store it in Jenkins credentials with restricted scope. For JNLP agents, use a secret token that's unique per agent. Never reuse secrets across agents.

Beyond authentication, encrypt the traffic. SSH is encrypted by default. For JNLP, use HTTPS for the controller URL. If agents connect over the TCP agent port, make sure they use the JNLP4-connect protocol, which is TLS-encrypted, and that any older inbound protocols are disabled. Also, run agents in isolated environments: don't give them access to production networks unless necessary.

I've seen a company where an agent had access to the production database because it was on the same VLAN. A compromised build script dumped the entire user table. Isolate agents in a separate subnet with strict firewall rules.

agent-security-checklist.sh

# Security checklist for Jenkins agents

# 1. Use SSH keys with a passphrase (not password auth)
# ssh-keygen prompts for the passphrase, which keeps it out of shell history
ssh-keygen -t ed25519 -f /var/lib/jenkins/.ssh/agent-key

# 2. Restrict the agent user to a single command
# This line goes in ~jenkins/.ssh/authorized_keys, not in a shell script:
# command="/usr/local/bin/jenkins-agent-wrapper.sh",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAA...

# 3. Run agent in a container with read-only root filesystem
docker run --read-only --tmpfs /tmp --tmpfs /home/jenkins/agent ...

# 4. Use Jenkins credentials binding for secrets (never in plaintext)
# This part belongs in a Jenkinsfile (Groovy), not in a shell script:
# withCredentials([string(credentialsId: 'deploy-key', variable: 'DEPLOY_KEY')]) {
#     sh 'echo "$DEPLOY_KEY" | ssh-add -'
# }

# 5. Pin the inbound agent port: Manage Jenkins > Security (Configure Global Security in older versions) > Agents > TCP port for inbound agents: Fixed (50000)
# Use firewall to allow only agent IPs to port 50000
Output
Agent security configuration applied.
SSH key pair generated.
Authorized_keys restricted.
Container running with read-only filesystem.
The Classic Bug: Agent secret exposed in logs
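If you use JNLP and pass the secret as a command-line argument, it appears in ps aux output. Use @secret.txt to read it from a file instead, and keep that file readable only by the agent user. Also, never echo the secret in build logs. Values bound through withCredentials are masked in the console output, so bind secrets there instead of printing them.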
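
Moving the secret into a file only helps if other users on the machine can't read that file. Below is a minimal sketch of a pre-flight check you could run before starting the agent. The secret_is_private function is a hypothetical helper, and the path in the usage lines is the one from the systemd unit earlier in this article.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the file exists, is a regular file,
# and grants no permissions to group or others (mode 600 or 400).
secret_is_private() {
    local perms
    # First 10 characters of `ls -ld` are the type and permission bits
    perms=$(ls -ld -- "$1" 2>/dev/null | cut -c1-10)
    case "$perms" in
        -rw-------|-r--------) return 0 ;;
        *) return 1 ;;   # missing file, directory, or readable by others
    esac
}

secret_file=/home/jenkins/agent/secret.txt
if ! secret_is_private "$secret_file"; then
    echo "refusing to start: fix with chmod 600 $secret_file" >&2
fi
```

A check like this could run from ExecStartPre in the systemd unit, so the agent never starts with a world-readable secret.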

Monitoring Agent Health and Performance

Agents die. Networks blip. Disks fill up. You need to know when an agent goes offline before a developer complains. Jenkins provides monitoring plugins (Monitoring, Metrics) that expose agent status via API and UI.

Set up alerts for agent disconnection. Use the Jenkins REST API to check agent status periodically. For example, a cron job can poll /computer/api/json on the controller and alert if any agent has been offline for more than 5 minutes.
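
The "offline for more than 5 minutes" part needs state between polls, otherwise every network blip pages someone. Here is a minimal sketch of that bookkeeping, assuming a polling script that already knows which agents are offline. The function names and the state directory layout are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for a polling script. State lives in one small file
# per agent holding the epoch second at which it was first seen offline.

# Prints how many seconds the agent has been offline as of NOW.
offline_seconds() {
    local state_dir=$1 agent=$2 now=$3
    local marker="$state_dir/$agent.offline_since"
    # First sighting: remember when the outage started
    [ -f "$marker" ] || printf '%s\n' "$now" > "$marker"
    echo $(( now - $(cat "$marker") ))
}

# Call this when the agent is seen online again to reset the timer.
mark_online() {
    rm -f "$1/$2.offline_since"
}

state_dir=$(mktemp -d)
offline_seconds "$state_dir" linux-agent-01 1000   # first poll: prints 0
offline_seconds "$state_dir" linux-agent-01 1360   # six minutes later: prints 360
```

In a real cron job you would pass $(date +%s) as the timestamp, keep the state directory somewhere persistent, and send the alert once the returned value reaches 300.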

Also monitor agent resource usage. A build that consumes 100% CPU for an hour might indicate an infinite loop. Use the 'Monitoring' plugin to track CPU, memory, and disk on each agent. Set thresholds and trigger notifications.

I've seen a build that wrote gigabytes of logs to the workspace, filling the agent's disk and causing all subsequent builds to fail with 'No space left on device'. Set workspace cleanup policies and disk usage alerts.

monitor-agents.sh

#!/bin/bash
# Cron job to check agent status and alert if offline.
# Assumes curl, jq, and mail are installed, and that JENKINS_USER and
# JENKINS_API_TOKEN (placeholder names) are set in the environment.

CONTROLLER_URL="http://jenkins-controller:8080"
ALERT_EMAIL="devops@company.com"

# Ask the controller's REST API for every node and its offline flag
# (-g stops curl from treating the square brackets as a glob)
curl -sfg -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
    "$CONTROLLER_URL/computer/api/json?tree=computer[displayName,offline]" |
    jq -r '.computer[] | select(.offline) | .displayName' |
    while read -r agent; do
        echo "Agent $agent is Offline" | mail -s "Jenkins Agent Offline" "$ALERT_EMAIL"
    done

# Check disk usage on each agent via SSH (requires key-based auth)
for agent in agent1 agent2; do
    # df -P guarantees one line per filesystem; column 5 is the Use% value
    usage=$(ssh "jenkins@$agent" 'df -P /home/jenkins/agent' | awk 'NR==2 {gsub("%", "", $5); print $5}')
    if [ "${usage:-0}" -gt 90 ]; then
        echo "Agent $agent disk usage at ${usage}%" | mail -s "Jenkins Agent Disk Full" "$ALERT_EMAIL"
    fi
done
Output
Agent agent1 is Online
Agent agent2 is Offline
Alert sent: Agent agent2 is Offline
Agent agent1 disk usage at 45%
Agent agent2 disk usage at 92%
Alert sent: Agent agent2 disk usage at 92%
Senior Shortcut: Expose Jenkins metrics to Prometheus
Install the 'Prometheus metrics' plugin, which exposes a /prometheus endpoint on the controller. Configure Prometheus to scrape it, and set up Grafana dashboards for agent uptime, executor count, and queue length. Trend dashboards like these let you spot problems before they become incidents.

Troubleshooting Common Agent Failures

Agents fail in predictable ways. Here are the top four I've seen in production:

  1. Connection refused: The agent machine is down, or the SSH port is blocked. Check network connectivity and firewall rules. Use telnet agent-ip 22 (or nc -zv agent-ip 22) to test SSH.
  2. Authentication failure: The SSH key changed, or the agent secret no longer matches, for example after the node was deleted and recreated. Regenerate the key/secret and update the agent configuration. For SSH, verify the public key is in ~jenkins/.ssh/authorized_keys.
  3. Out of disk space: Builds accumulate workspace files. Set up a cron job to clean workspaces older than 7 days. Use the Workspace Cleanup plugin (the cleanWs step) to delete the workspace after each build.
  4. Java version mismatch: The agent's Java runtime is older than the controller supports. Recent Jenkins releases require Java 17 or newer on both sides. Check the agent's Java version with java -version and use the same major version as the controller.
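
Comparing Java versions by eye is error-prone because Java 8 reports itself as 1.8 while later releases report 11, 17, and so on. The sketch below normalizes the output of java -version to a major version number. The java_major function is a hypothetical helper, and it assumes the usual first-line format with the version in double quotes.

```shell
#!/bin/bash
# Hypothetical helper: reads `java -version` output on stdin and prints
# the major version (8, 11, 17, ...). Prints an empty line if no version is found.
java_major() {
    local ver
    # Pull the quoted version string from the first line that has one
    ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;   # legacy scheme: 1.8.0_292 -> 8
        *)   echo "${ver%%[.-]*}" ;;               # modern scheme: 17.0.9 -> 17, 21-ea -> 21
    esac
}

echo 'openjdk version "11.0.21" 2023-10-17 LTS' | java_major   # prints 11
echo 'java version "1.8.0_292"' | java_major                   # prints 8
```

On a live machine, java -version writes to stderr, so the call would be java -version 2>&1 | java_major. Run it on the controller and on each agent and compare the numbers.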

I've debugged an agent that disconnected every 30 minutes. Turned out the agent's JVM was running out of memory because the -Xmx was set too low for the agent process itself. Increased it from 64m to 256m and the disconnects stopped.

troubleshoot-agent.sh

# Diagnostic script for agent issues

# 1. Check if agent process is running
# (the [a] bracket stops grep from matching its own command line)
ps aux | grep '[a]gent\.jar'

# 2. Check agent logs (usually in agent workspace)
tail -100 /home/jenkins/agent/remoting.log

# 3. Test connectivity from controller to agent
ssh -i /var/lib/jenkins/.ssh/agent-key jenkins@agent-ip 'echo OK'

# 4. Check Java version on agent
java -version 2>&1 | grep version

# 5. Check disk space on agent
ssh jenkins@agent-ip 'df -h /home/jenkins/agent'

# 6. Restart agent service (if using systemd)
sudo systemctl restart jenkins-agent
sudo systemctl status jenkins-agent

# 7. Force reconnect from controller UI
# Manage Jenkins > Nodes > [Agent] > Disconnect, then Launch agent
# (for inbound agents, restart the agent service on the agent machine instead)
Output
jenkins 12345 0.5 2.3 /usr/bin/java -jar /home/jenkins/agent/agent.jar ...
...
INFO: Connected to controller via SSH
openjdk version "11.0.21" 2023-10-17 LTS
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 45G 5G 90% /
● jenkins-agent.service - Jenkins Agent
Active: active (running) since ...
Interview Gold: What happens when an agent disconnects mid-build?
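The agent process and anything it started keep running locally but are cut off from the controller and can end up orphaned. What the controller does next depends on the job type: a freestyle build typically fails once the channel closes and the executor is freed, while a Pipeline sh step is durable, so the controller waits for the agent to reconnect and picks the step back up if it returns in time. Either way, the workspace is not cleaned up automatically. Use the 'Checkout into a subdirectory' option to isolate workspaces, and set up a cron job to kill orphaned processes older than 1 hour.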
Production incident (post-mortem, severity: high)

The 4GB Container That Kept Dying

Symptom
A Java microservice build consistently failed with 'OutOfMemoryError: Java heap space' after 15 minutes, even though the agent had 4GB RAM.
Assumption
The team assumed the build needed more memory and tried increasing heap limits to 3GB.
Root cause
The agent was running on a container with 4GB total RAM, but the Jenkins agent process itself consumed ~1.5GB for its own overhead (JVM + workspace cache). The build job's Maven process was limited to 2GB heap, but the OS killed it when total memory exceeded the container limit. The real issue: the agent JVM had no memory limits set, so it competed with the build.
Fix
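Set JVM heap limits on the agent launch command: -Xmx512m -Xms256m for the agent process. Then set build tool memory limits explicitly (e.g., MAVEN_OPTS=-Xmx2g). Also confirm that container support (-XX:+UseContainerSupport, enabled by default since JDK 10) has not been switched off, so each JVM sizes itself against the container limit rather than the host's memory.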
Key lesson
  • Always cap the agent JVM memory.
  • The agent doesn't need gigabytes — it's just a relay.
  • Starve the agent, feed the build.
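
The arithmetic behind this incident is worth writing down. The sketch below treats the figures from the post-mortem as a simple budget. The container_headroom_mb function is a hypothetical helper, and counting the agent's ~1.5GB footprint as if it were all heap is a simplification.

```shell
#!/bin/bash
# Memory left in the container after both JVM heaps are reserved (all values in MB).
# This headroom has to cover metaspace, thread stacks, and any non-JVM processes.
container_headroom_mb() {
    local container_mb=$1 agent_mb=$2 build_heap_mb=$3
    echo $(( container_mb - agent_mb - build_heap_mb ))
}

container_headroom_mb 4096 1536 2048   # uncapped agent (~1.5GB) + 2GB Maven heap: prints 512
container_headroom_mb 4096 512 2048    # agent capped at 512MB: prints 1536
```

With the agent uncapped, only about 512MB remained for everything outside the two heaps, which is easy to exceed during a build. Capping the agent triples that margin without touching the build's own limit.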
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit.
Symptom · 01
Agent shows 'Offline' in UI, but machine is running
Fix
  1. SSH to the agent and check the agent process: ps aux | grep agent.jar.
  2. If it is not running, restart the agent service: sudo systemctl restart jenkins-agent.
  3. Check agent logs: tail -100 /home/jenkins/agent/remoting.log.
  4. Verify network connectivity from controller to agent: ssh -i /var/lib/jenkins/.ssh/agent-key jenkins@agent-ip 'echo OK'.
  5. If SSH fails, check firewall rules and SSH key permissions.
Symptom · 02
Builds stuck in queue, no agent available
Fix
  1. Check agent labels on each node's page under Manage Jenkins > Nodes, or via the REST API (USER and API_TOKEN are placeholders): curl -sg -u USER:API_TOKEN 'http://controller:8080/computer/api/json?tree=computer[displayName,assignedLabels[name]]'.
  2. Ensure at least one agent has the required label.
  3. If using cloud agents, check cloud plugin logs for provisioning errors.
  4. Increase agent count or add more executors.
  5. As a temporary workaround, add a label to an existing agent that matches the job requirement.
Symptom · 03
Agent disconnects every few minutes
Fix
  1. Check agent JVM memory: ps aux | grep agent.jar and look for -Xmx.
  2. Increase agent heap: add -Xmx512m to the launch command.
  3. Check network stability: ping controller-ip for packet loss.
  4. If using JNLP, wrap the agent in systemd with Restart=always.
  5. Enable remoting logging: add -Djava.util.logging.config.file=/path/to/logging.properties to the agent JVM args.
| Feature / Aspect | SSH Agent | JNLP Agent |
| --- | --- | --- |
| Connection direction | Controller initiates | Agent initiates |
| Encryption | SSH (built-in) | HTTPS (if controller uses HTTPS) |
| Auto-reconnect | Yes (controller retries) | Only while the agent process is alive (needs a service manager to restart it) |
| Firewall friendly | Requires inbound SSH (port 22) | Works with outbound-only (port 443) |
| Secret management | SSH key | Secret token (file or arg) |
| Best for | Permanent agents on trusted network | Ephemeral agents behind NAT/firewall |

Key takeaways

  1. Always set the built-in node's executors to 0: the controller should never run builds.
  2. Cap agent JVM memory aggressively (256MB-512MB): the agent is a relay, not a build tool.
  3. Use labels to pin jobs to specific environments: never rely on the default 'any' agent for production deployments.
  4. Cloud agents are great for spikes, but keep a pool of warm agents for quick jobs to avoid startup latency.
Frequently Asked Questions

  1. What is the difference between Jenkins controller and agent?
  2. How do I add a new agent to Jenkins?
  3. Why is my Jenkins agent showing as offline?
  4. Can I run Jenkins without agents?