Advanced 5 min · June 21, 2026

Jenkins Backup and Disaster Recovery: The Only Guide That Won't Fail You at 3 AM

Jenkins backup and disaster recovery: stop losing jobs and configs.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.

Last updated: June 21, 2026
Quick Answer

Back up JENKINS_HOME: the XML config files for jobs, nodes, and credentials, plus the plugins and the encryption keys under secrets/. Use the ThinBackup plugin for scheduled backups to a mounted remote filesystem, and sync that directory to S3 for an offsite copy. For disaster recovery, restore JENKINS_HOME on a fresh Jenkins instance with matching plugin versions, then start Jenkins (or reload configuration from disk if it is already running).

Definition · ~90s read
What is Jenkins Backup and Disaster Recovery?

Jenkins backup and disaster recovery is the process of preserving your Jenkins master's configuration, job definitions, build history, and credentials so you can restore a working instance after hardware failure, data corruption, or accidental deletion. It's not just a cron job — it's a strategy that accounts for incremental vs. full backups, plugin compatibility, and restore order.

Plain-English First

Think of Jenkins as a chef's kitchen. Your JENKINS_HOME is the pantry, fridge, and recipe book all in one. If the kitchen burns down, you need a copy of the recipe book (job configs), the pantry inventory (plugin list), and the chef's notes (build history). Backing up only the recipes is useless if you don't know which spices you had. A good backup strategy is like having a second kitchen ready to go — same layout, same ingredients, same recipes.

Your Jenkins master just died. The disk is corrupted. The backup you thought was running? It wasn't. Now you're rebuilding 400 jobs from memory, and the CTO is asking why the deployment pipeline is down. I've seen this exact scene play out at three different companies. The problem isn't that Jenkins is hard to back up — it's that everyone assumes the default setup is enough. It's not.

Jenkins stores everything in JENKINS_HOME: job configs, plugin binaries, credentials, build logs, and the massive XML files that define your entire CI/CD. Lose that directory, and you lose weeks of work. The standard advice — 'just copy the folder' — misses the critical details: file locks, incremental changes, and the order of restoration that prevents plugin version mismatches from breaking your pipelines.

By the end of this guide, you'll have a battle-tested backup and disaster recovery plan that handles partial failures, scales to hundreds of jobs, and can be restored in under 30 minutes. You'll know exactly which files to back up, which to exclude, and how to test your recovery without causing a production incident.

What Exactly Lives in JENKINS_HOME — and What You Actually Need

JENKINS_HOME is the single source of truth for your Jenkins instance. It's a directory (default ~/.jenkins) containing everything: job configs as XML, plugin binaries (.jpi files), credentials encrypted with a master key, build logs, and the massive config.xml that defines system settings. The rookie mistake is backing up the whole directory blindly. That includes caches, workspace files, and temp data that bloat the backup and slow restoration.
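
Before deciding what to exclude, it helps to see where the space actually goes. The sketch below is one rough way to do that; `jenkins_home_usage` is a hypothetical helper name, and the default path is an assumption about a package-based install.

```shell
#!/bin/bash
# Sketch: list the largest top-level entries in a Jenkins home directory.
# jenkins_home_usage is a hypothetical helper, not a Jenkins tool.

jenkins_home_usage() {
  local home="${1:-/var/lib/jenkins}"
  local top="${2:-10}"
  # du -sk prints the size in KiB for each entry; sort puts the largest first.
  # The glob skips dotfiles, which is fine for a rough estimate.
  du -sk "$home"/* 2>/dev/null | sort -rn | head -n "$top"
}

# Example: jenkins_home_usage /var/lib/jenkins 5
```

If workspace/ or a single job's build history dominates the list, that is the first candidate for an exclusion rule.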

Here's what you must back up: jobs/ (each subfolder's config.xml), plugins/ (the .jpi files, which carry their own version metadata), secrets/ (the master key and the other key files that decrypt stored credentials), users/ (user configs), nodes/ (agent definitions), and the root-level XML files, including config.xml and credentials.xml. What you can skip: workspace/ (checked out source code, rebuild from SCM), each job's builds/ folder (if you don't need history, but most teams do), and logs/ (regeneratable).

Production trap: If you use the Jenkins database plugin for external job storage, back up that database separately. JENKINS_HOME won't have the job data.

jenkins_backup_critical_files.sh · BASH
#!/bin/bash
# io.thecodeforge - DevOps tutorial
# Backup only critical JENKINS_HOME files; workspace/ and caches are left out
set -euo pipefail

JENKINS_HOME="/var/lib/jenkins"
BACKUP_DIR="/backup/jenkins/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Critical directories: these define your instance.
# jobs/ carries each job's builds/ folder, so build records are included.
# To skip build history, add --exclude='builds/' to the rsync call.
for dir in jobs plugins secrets users nodes; do
  if [ -d "$JENKINS_HOME/$dir" ]; then
    rsync -a --exclude='workspace/' "$JENKINS_HOME/$dir" "$BACKUP_DIR/"
  fi
done

# Root config files: config.xml, credentials.xml, and per-plugin settings
cp -p "$JENKINS_HOME"/*.xml "$BACKUP_DIR/"

# Key files; names vary by Jenkins version, so copy only the ones present
for key in secret.key secret.key.not-so-secret identity.key.enc; do
  if [ -f "$JENKINS_HOME/$key" ]; then
    cp -p "$JENKINS_HOME/$key" "$BACKUP_DIR/"
  fi
done

echo "Backup completed to $BACKUP_DIR"
Output
Backup completed to /backup/jenkins/20250315_143022
Never Do This:
Don't back up workspace/ — it contains SCM checkouts that can be hundreds of GB. Restoring them is pointless because Jenkins will re-checkout on next build. You'll waste storage and restore time.

ThinBackup Plugin: The Only Backup Plugin You Should Trust

The ThinBackup plugin is the de facto standard for Jenkins backups. It's lightweight, supports differential backups, and writes to any directory the Jenkins process can reach, so a mounted network share gets the data off the server. But it has sharp edges. Depending on the options you enable, a backup can include far more than you expect unless you set exclusions explicitly. I've seen a 10GB backup balloon to 200GB because someone forgot to add a workspace exclusion.

Here's the configuration that works in production: set backup directory to an external mount (NFS or EFS). Enable 'Backup build results' if you need history. Exclude 'workspace' and 'builds' if you're short on space. Set a cron schedule: I recommend daily full backups with hourly differentials if your jobs change frequently. The plugin uses a simple file copy, so it can run while Jenkins is live, but avoid running during peak build hours to prevent I/O contention.

Production gotcha: ThinBackup doesn't handle file locks well. If a job is running when the backup starts, its config.xml might be in an inconsistent state. Solution: enable the option that waits until Jenkins is idle before backing up (labeled 'Wait until Jenkins is idle to perform a backup' in recent plugin versions). Jenkins enters quiet mode, stops scheduling new builds, and the copy starts once the running builds have finished.
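
For backups taken outside the plugin (rsync or snapshots), the same quiesce step can be scripted against Jenkins' quiet-mode endpoints. This is a minimal sketch, not ThinBackup's own mechanism: `JENKINS_URL`, `JENKINS_USER`, and `JENKINS_TOKEN` are assumed environment values, and `with_quiet_mode` is a hypothetical wrapper name.

```shell
#!/bin/bash
# Sketch: put Jenkins into quiet mode around a file-level backup.
# JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are assumed environment values.

JENKINS_URL="${JENKINS_URL:-http://localhost:8080}"

jenkins_post() {
  # POST to a Jenkins endpoint, authenticating with a user API token.
  curl -fsS -X POST -u "${JENKINS_USER:-}:${JENKINS_TOKEN:-}" "${JENKINS_URL}$1"
}

with_quiet_mode() {
  # Stop new builds from starting, run the given command, then always resume.
  jenkins_post /quietDown || return 1
  "$@"
  local status=$?
  jenkins_post /cancelQuietDown
  return "$status"
}

# Example: with_quiet_mode /usr/local/bin/jenkins_backup_critical_files.sh
```

Quiet mode only blocks new builds; builds that are already running keep going, so a stricter version would also wait for the executors to drain before starting the copy.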

thinbackup_config.xml · XML
<!-- io.thecodeforge - DevOps tutorial -->
<!-- ThinBackup settings expressed as XML for readability. -->
<!-- Element names are illustrative and may not match the plugin's on-disk file; set the equivalent options in the ThinBackup settings page. -->

<backupConfig>
  <backupDir>/mnt/nfs/jenkins-backups</backupDir>  <!-- External mount, not local -->
  <fullBackupSchedule>H 2 * * *</fullBackupSchedule>  <!-- Daily full backup at 2 AM -->
  <diffBackupSchedule>H * * * *</diffBackupSchedule>  <!-- Hourly differential -->
  <backupBuildResults>true</backupBuildResults>  <!-- Include build records -->
  <backupPluginArchives>true</backupPluginArchives>  <!-- Include .jpi files -->
  <excludePatterns>
    <pattern>workspace/**</pattern>  <!-- Exclude SCM workspaces -->
    <pattern>builds/**</pattern>     <!-- Exclude build logs if not needed -->
  </excludePatterns>
  <pauseJenkinsBeforeBackup>true</pauseJenkinsBeforeBackup>  <!-- Critical: quiesce first -->
  <maxNumberOfBackups>14</maxNumberOfBackups>  <!-- Keep 14 full backups -->
</backupConfig>
Output
No direct output — config is applied via Jenkins UI or XML file upload.
Senior Shortcut:
Test your backup by restoring it to a staging Jenkins instance. Don't wait for a disaster. Use the 'Restore' button in ThinBackup — it will ask you to restart Jenkins after restore. Always verify that all jobs appear and credentials work.
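
The "verify that all jobs appear" step can be made mechanical. The sketch below compares job config paths between a backup set and a restored Jenkins home; `missing_jobs` is a hypothetical helper, and both arguments are assumed to be directories that contain a jobs/ folder.

```shell
#!/bin/bash
# Sketch: report jobs present in a backup but missing from a restored Jenkins home.
# missing_jobs is a hypothetical helper; both arguments must contain jobs/.

list_job_configs() {
  # Print each job config path relative to the given directory, sorted.
  (cd "$1" && find jobs -name config.xml -type f 2>/dev/null | LC_ALL=C sort)
}

missing_jobs() {
  local backup="$1" restored="$2"
  # comm -23 keeps lines that appear only in the first (backup) list.
  LC_ALL=C comm -23 <(list_job_configs "$backup") <(list_job_configs "$restored")
}

# Example: missing_jobs /backup/jenkins/20250315_143022 /var/lib/jenkins
```

Empty output means every job config in the backup also exists on the staging instance. It says nothing about credentials, which still need a manual check.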

S3 and Cloud Storage: Offsite Backups That Survive a Datacenter Fire

Local backups are useless if the server room floods. You need offsite backups. ThinBackup writes to a filesystem path, so getting the data into S3 takes either a separate S3 plugin or a cron job running aws s3 sync. I prefer the latter because it gives you full control over the sync logic and doesn't depend on Jenkins being up.

Here's the pattern: use ThinBackup for local incremental backups, then a separate cron job that syncs the backup directory to S3 with versioning enabled. This gives you two layers: fast local restore for small failures, and durable remote backup for catastrophic events. Set S3 lifecycle rules to move old backups to Glacier after 30 days and delete after 90.
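
The lifecycle rule described above (Glacier after 30 days, delete after 90) could be expressed roughly as follows. This is a sketch under assumptions: the rule ID and the production/ prefix are illustrative, and `lifecycle_json` is a hypothetical helper that only prints the policy document.

```shell
#!/bin/bash
# Sketch: generate an S3 lifecycle policy (Glacier at 30 days, delete at 90).
# The rule ID and the production/ prefix are illustrative assumptions.

lifecycle_json() {
  local prefix="${1:-production/}"
  cat <<EOF
{
  "Rules": [
    {
      "ID": "jenkins-backup-tiering",
      "Status": "Enabled",
      "Filter": { "Prefix": "${prefix}" },
      "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": 90 }
    }
  ]
}
EOF
}

# Example (bucket name matches the sync script in this guide):
# lifecycle_json production/ > lifecycle.json
# aws s3api put-bucket-versioning --bucket my-jenkins-backups \
#   --versioning-configuration Status=Enabled
# aws s3api put-bucket-lifecycle-configuration --bucket my-jenkins-backups \
#   --lifecycle-configuration file://lifecycle.json
```

On a versioned bucket, Expiration applies to current object versions; older versions need their own noncurrent-version rule if you want them cleaned up too.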

Production trap: S3 reads are strongly consistent these days, but a sync that starts while a backup is still being written uploads a half-finished backup set. Schedule the sync after the backup window closes, and use S3 Object Lock or versioning to prevent overwrites. And never use the same bucket for multiple Jenkins instances: you'll get cross-contamination.

sync_to_s3.sh · BASH
#!/bin/bash
# io.thecodeforge - DevOps tutorial
# Sync local ThinBackup directory to S3 (enable versioning on the bucket first)

LOCAL_BACKUP_DIR="/mnt/nfs/jenkins-backups"
S3_BUCKET="s3://my-jenkins-backups/production/"

# Sync with delete flag to mirror local state; the if tests the exit code directly
if aws s3 sync "$LOCAL_BACKUP_DIR" "$S3_BUCKET" \
  --delete \
  --storage-class STANDARD_IA \
  --exclude "*.tmp" \
  --exclude "workspace/*"; then
  echo "$(date): Sync to S3 completed successfully"
else
  echo "$(date): Sync failed, check AWS credentials and permissions"
  exit 1
fi
Output
Sat Mar 15 02:00:01 UTC 2025: Sync to S3 completed successfully
Production Trap:
S3 sync with --delete will remove files from S3 that no longer exist locally. If your local backup directory gets corrupted or accidentally deleted, the next sync will delete your remote backups too. Mitigation: enable S3 versioning, so a delete only adds a delete marker, and set a lifecycle rule that keeps noncurrent versions for at least 30 days.
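
A cheap extra guard is to refuse the mirroring sync when the local directory looks wrong. The sketch below assumes each backup set is a top-level entry in the backup directory; `safe_to_sync` and its minimum-count threshold are hypothetical.

```shell
#!/bin/bash
# Sketch: refuse a mirroring sync when the local backup directory looks wrong.
# safe_to_sync and the minimum-count threshold are hypothetical.

safe_to_sync() {
  local dir="$1" min_sets="${2:-1}"
  [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
  # Count top-level entries (backup sets) in the directory.
  local count
  count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)
  if [ "$count" -lt "$min_sets" ]; then
    echo "only $count backup sets in $dir, expected at least $min_sets" >&2
    return 1
  fi
}

# Example:
# safe_to_sync /mnt/nfs/jenkins-backups 3 && \
#   aws s3 sync /mnt/nfs/jenkins-backups s3://my-jenkins-backups/production/ --delete
```

This does not replace versioning. It only stops the most obvious failure, an empty or unmounted directory, from propagating to the bucket.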

Disaster Recovery: The Exact Steps to Restore a Dead Jenkins Master

When disaster strikes, panic is your enemy. Follow this exact sequence. First, provision a new Jenkins server with the same OS and Jenkins version as the original. Install the same plugins, using the plugin archives in your backup's plugins/ directory to get the exact versions. Do NOT start Jenkins yet. Copy your backup JENKINS_HOME over the fresh installation. Ensure file permissions match (jenkins user). Start Jenkins. It will detect the existing config and skip the setup wizard.

If you get errors about plugin versions, you likely restored plugins that don't match the Jenkins core version. Fix: move the plugins directory aside and reinstall the plugins from the update center, but this may change behavior because you get the latest versions. Better: maintain a plugin version manifest and pin versions in your backup.
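
One way to produce such a manifest is to read the metadata Jenkins leaves behind when it unpacks each plugin. The sketch below assumes every plugin has been exploded to plugins/<name>/META-INF/MANIFEST.MF with Short-Name and Plugin-Version entries; `plugin_manifest` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: write a name:version list from the exploded plugin directories.
# Assumes Jenkins has unpacked each plugin to plugins/<name>/META-INF/MANIFEST.MF.

plugin_manifest() {
  local plugins_dir="${1:-/var/lib/jenkins/plugins}"
  local mf name version
  for mf in "$plugins_dir"/*/META-INF/MANIFEST.MF; do
    [ -f "$mf" ] || continue
    # Manifests use CRLF line endings, so strip carriage returns first.
    name=$(tr -d '\r' < "$mf" | sed -n 's/^Short-Name: //p' | head -n 1)
    version=$(tr -d '\r' < "$mf" | sed -n 's/^Plugin-Version: //p' | head -n 1)
    if [ -n "$name" ] && [ -n "$version" ]; then
      echo "${name}:${version}"
    fi
  done | LC_ALL=C sort
}

# Example: plugin_manifest /var/lib/jenkins/plugins > /backup/jenkins/plugins.txt
```

Storing this text file next to each backup makes it easy to diff plugin versions between the old server and the new one before the first start.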

Production gotcha: The key material must match the original master: secret.key plus everything under secrets/, in particular master.key and hudson.util.Secret. If you lose it, all credentials become undecryptable. Store a copy of these files in a secure vault (e.g., HashiCorp Vault) separate from the backup. If you don't have them, you'll need to manually re-enter all credentials, a nightmare for 50+ jobs.
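
To hand the key material to a vault as a single object, it can be bundled first. This is a sketch only: `bundle_jenkins_keys` is a hypothetical helper, the file list is an assumption based on a typical Jenkins home, and the upload to the vault itself is left out.

```shell
#!/bin/bash
# Sketch: bundle the Jenkins encryption key files into one archive for a vault.
# bundle_jenkins_keys is hypothetical; the file list is an assumption and may
# differ between Jenkins versions.

bundle_jenkins_keys() {
  local home="$1" out="$2"
  local candidates="secret.key secrets/master.key secrets/hudson.util.Secret"
  local found=() f
  for f in $candidates; do
    if [ -f "$home/$f" ]; then
      found+=("$f")
    fi
  done
  [ "${#found[@]}" -gt 0 ] || { echo "no key files under $home" >&2; return 1; }
  # umask 077 in a subshell so the archive is readable by its owner only.
  (umask 077 && tar -czf "$out" -C "$home" "${found[@]}")
}

# Example: bundle_jenkins_keys /var/lib/jenkins /root/jenkins-keys.tar.gz
```

The archive is not encrypted, so treat it like the keys themselves: move it into the vault and delete the local copy.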

restore_jenkins.sh · BASH
#!/bin/bash
# io.thecodeforge - DevOps tutorial
# Restore Jenkins from backup; run as root on a fresh Jenkins server
set -euo pipefail

JENKINS_HOME="/var/lib/jenkins"
BACKUP_PATH="/backup/jenkins/20250315_143022"

# Refuse to continue if the backup is missing
[ -d "$BACKUP_PATH" ] || { echo "Backup not found: $BACKUP_PATH" >&2; exit 1; }

# Stop Jenkins if running
systemctl stop jenkins

# Keep the fresh JENKINS_HOME (just in case); the timestamp lets reruns coexist
mv "$JENKINS_HOME" "${JENKINS_HOME}.orig.$(date +%Y%m%d_%H%M%S)"

# Restore from backup
cp -a "$BACKUP_PATH" "$JENKINS_HOME"

# Fix permissions
chown -R jenkins:jenkins "$JENKINS_HOME"

# Start Jenkins
systemctl start jenkins

# Wait for startup: poll the login page for up to 5 minutes.
# curl -f fails on the HTTP 503 page Jenkins serves while it is still loading.
echo "Waiting for Jenkins to start..."
for attempt in $(seq 1 30); do
  if curl -fs -o /dev/null http://localhost:8080/login; then
    echo "Restore successful"
    exit 0
  fi
  sleep 10
done
echo "Jenkins did not respond; check 'journalctl -u jenkins' or /var/log/jenkins/jenkins.log" >&2
exit 1
Output
Waiting for Jenkins to start...
Restore successful

Incremental vs Full Backups: When to Use Each

Full backups copy everything every time. They're simple but wasteful — most of JENKINS_HOME doesn't change daily. Incremental backups only copy changed files, saving time and storage. ThinBackup supports differential backups (changes since last full) — I recommend this for daily use.

But here's the catch: incremental backups are only as good as the last full backup. If your full backup is corrupted, all incrementals are useless. Always run a weekly full backup and verify its integrity. I've seen teams lose weeks of job configs because their monthly full backup failed silently and they only noticed when the server died.

Production rule: Full backup weekly, incremental daily. Store full backups for 4 weeks, incrementals for 2 weeks. Test restore from a full+incremental chain quarterly.
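
The retention rule can be enforced with a small pruning helper. The sketch below assumes ThinBackup-style directory names (FULL-* and DIFF-*) directly under the backup directory; `expired_backups` is a hypothetical name, and it only lists candidates so the delete stays a deliberate second step.

```shell
#!/bin/bash
# Sketch: list backup sets that have aged out of the retention window.
# Assumes ThinBackup-style directory names (FULL-* and DIFF-*); adjust the
# patterns if your backup sets are named differently.

expired_backups() {
  local dir="$1" full_days="${2:-28}" diff_days="${3:-14}"
  # -mtime +N matches entries last modified more than N days ago.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'FULL-*' -mtime +"$full_days"
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'DIFF-*' -mtime +"$diff_days"
}

# Dry run first, then delete once the list looks right:
# expired_backups /mnt/nfs/jenkins-backups
# expired_backups /mnt/nfs/jenkins-backups | xargs -r rm -rf
```

Because differentials expire two weeks before their full backup does, pruning in this order never leaves a differential without the full set it depends on.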

verify_backup_integrity.sh · BASH
#!/bin/bash
# io.thecodeforge - DevOps tutorial
# Verify backup integrity: critical files, job count, size, and XML validity

BACKUP_DIR="/backup/jenkins/20250315_143022"
FAILED=0

# Check critical files exist
for file in config.xml secret.key secrets/master.key; do
  if [ ! -f "$BACKUP_DIR/$file" ]; then
    echo "ERROR: Missing $file"
    exit 1
  fi
done

# Count job configs
JOB_COUNT=$(find "$BACKUP_DIR/jobs" -name "config.xml" | wc -l)
echo "Job configs: $JOB_COUNT"

# Check backup size
SIZE=$(du -sh "$BACKUP_DIR" | cut -f1)
echo "Backup size: $SIZE"

# Verify XML validity (quick check); xmllint ships with libxml2
command -v xmllint >/dev/null || { echo "ERROR: xmllint not installed"; exit 1; }
# Null-delimited read so job names with spaces are handled correctly
while IFS= read -r -d '' xml; do
  if ! xmllint --noout "$xml" 2>/dev/null; then
    echo "WARNING: Invalid XML in $xml"
    FAILED=1
  fi
done < <(find "$BACKUP_DIR" -name "*.xml" -print0)

# Fail loudly instead of reporting success after warnings
if [ "$FAILED" -ne 0 ]; then
  echo "Backup integrity check FAILED"
  exit 1
fi
echo "Backup integrity check passed"
Output
Job configs: 127
Backup size: 2.3G
Backup integrity check passed
Senior Shortcut:
Automate backup verification with a cron job that runs after each backup. If the check fails, send an alert to PagerDuty. Don't wait for a restore to discover your backup is broken.
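
A thin wrapper is enough to wire verification to alerting. In this sketch `run_and_alert` and `send_alert` are hypothetical names, `PAGERDUTY_ROUTING_KEY` is an assumed environment variable, and the alert body is a minimal trigger event for the PagerDuty Events API v2; any other notifier could be swapped in.

```shell
#!/bin/bash
# Sketch: run a verification command and raise an alert if it fails.
# PAGERDUTY_ROUTING_KEY is an assumed environment variable; send_alert can be
# replaced with any notifier.

send_alert() {
  local summary="$1"
  # The summary is inserted as-is, so keep it free of double quotes.
  curl -fsS -X POST https://events.pagerduty.com/v2/enqueue \
    -H 'Content-Type: application/json' \
    -d "{\"routing_key\":\"${PAGERDUTY_ROUTING_KEY:-}\",\"event_action\":\"trigger\",\"payload\":{\"summary\":\"${summary}\",\"source\":\"$(hostname)\",\"severity\":\"error\"}}"
}

run_and_alert() {
  # Run the given command; on a non-zero exit, alert and pass the status on.
  "$@" && return 0
  local status=$?
  send_alert "Jenkins backup verification failed (exit ${status}): $*"
  return "$status"
}

# Example (cron, after the backup window):
# run_and_alert /usr/local/bin/verify_backup_integrity.sh
```

Passing the original exit status through keeps cron mail and any outer monitoring consistent with what the check actually returned.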

When Not to Use ThinBackup — and What to Use Instead

ThinBackup is great for small to medium instances (< 500 jobs). But for large-scale Jenkins deployments with thousands of jobs or heavy build history, it becomes slow and resource-intensive. The backup process can take hours, and the restore process is all-or-nothing — you can't selectively restore a single job.

Alternatives: Use filesystem snapshots (LVM or ZFS) for near-instant backups. They're atomic and don't require Jenkins to pause. For cloud instances, use EBS snapshots. For job-level granularity, use the Jenkins Job Export plugin or store job configs in version control (Jenkins Job DSL). I've seen teams manage 2000+ jobs with Job DSL and never need a full backup — they just regenerate everything from code.

When to avoid ThinBackup: if your backup window exceeds 30 minutes, if you need per-job restore, or if you're running Jenkins in a container with ephemeral storage. In those cases, invest in infrastructure-as-code for Jenkins config.
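
Short of a full Job DSL migration, job definitions can still be kept in version control by copying only the config.xml files into a git working tree. This is a sketch: `export_job_configs` and the target repository path are hypothetical, and build and workspace folders are skipped on the assumption that only job definitions belong in the repository.

```shell
#!/bin/bash
# Sketch: copy only job config.xml files into a directory tracked by git.
# export_job_configs and the target layout are hypothetical.

export_job_configs() {
  local home="$1" dest="$2"
  local rel
  # Walk jobs/ and mirror each config.xml at the same relative path.
  (cd "$home" && find jobs -name config.xml -type f \
      -not -path '*/builds/*' -not -path '*/workspace/*') |
  while IFS= read -r rel; do
    mkdir -p "$dest/$(dirname "$rel")"
    cp -p "$home/$rel" "$dest/$rel"
  done
}

# Example:
# export_job_configs /var/lib/jenkins /srv/jenkins-config-repo
# git -C /srv/jenkins-config-repo add -A
# git -C /srv/jenkins-config-repo commit -m "Snapshot job configs"
```

Restoring a single job then becomes a matter of copying one file back and reloading configuration, which is exactly the per-job granularity a full backup set lacks.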

lvm_snapshot_backup.sh · BASH
#!/bin/bash
# io.thecodeforge - DevOps tutorial
# LVM snapshot backup for Jenkins: atomic and fast (run as root)
set -euo pipefail

JENKINS_LV="/dev/vg_jenkins/lv_jenkins_home"
SNAPSHOT_NAME="jenkins_snap_$(date +%Y%m%d_%H%M%S)"
SNAPSHOT_SIZE="10G"  # Adjust based on change rate
MOUNT_POINT="/mnt/jenkins_snap"
DEST="/backup/jenkins/$(date +%Y%m%d)"

# Always unmount and drop the snapshot, even if a later step fails
cleanup() {
  umount "$MOUNT_POINT" 2>/dev/null || true
  lvremove -f "/dev/vg_jenkins/$SNAPSHOT_NAME" >/dev/null 2>&1 || true
}
trap cleanup EXIT

# Create snapshot (takes seconds)
lvcreate -L "$SNAPSHOT_SIZE" -s -n "$SNAPSHOT_NAME" "$JENKINS_LV"

# Mount snapshot read-only (XFS volumes also need -o ro,nouuid)
mkdir -p "$MOUNT_POINT" "$DEST"
mount -o ro "/dev/vg_jenkins/$SNAPSHOT_NAME" "$MOUNT_POINT"

# Backup from snapshot
rsync -a "$MOUNT_POINT/" "$DEST/"

echo "Snapshot backup completed"
Output
Snapshot backup completed
When to Use This:
Use LVM snapshots when your backup window is tight (< 5 minutes) and you need crash-consistency. Works best with Jenkins paused briefly during snapshot creation (seconds).
Production incident · POST-MORTEM · severity: high

The 4GB Container That Kept Dying

Symptom
Jenkins master container OOM-killed every 48 hours. Restart fixed it temporarily. No obvious memory leak in jobs.
Assumption
Assumed a plugin memory leak. Upgraded all plugins — no change.
Root cause
The ThinBackup plugin was configured to keep 30 daily backups inside the container. Each backup was ~150MB. After 30 days, the container's 4GB memory limit was exceeded because the backup process loaded the entire JENKINS_HOME into memory before writing. The OOM killer struck during the backup window.
Fix
Changed backup retention to 7 days. Moved backup target to an external NFS mount. Added a cron job to rsync backups off-container. Raised the container memory limit to 8GB and capped the JVM heap with -Xmx4g.
Key lesson
  • Never store backups inside the Jenkins container.
  • Always back up to external storage, and set retention to match your recovery time objective, not your disk space.
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit.
Symptom · 01
Jenkins fails to start after restore: 'java.lang.ClassNotFoundException: hudson.plugins.git.GitSCM'
Fix
1. Check that the Git plugin is installed. 2. Restore plugins/ directory from backup. 3. If missing, install the same version from update center. 4. Restart Jenkins.
Symptom · 02
Backup process hangs or times out
Fix
1. Check disk I/O on JENKINS_HOME. 2. Exclude workspace/ and builds/ from backup. 3. Reduce backup retention to 7 days. 4. Move backup target to faster storage (SSD or NFS).
Symptom · 03
S3 sync fails with 'Access Denied'
Fix
1. Verify IAM role has s3:PutObject, s3:ListBucket, and (because the sync uses --delete) s3:DeleteObject permissions. 2. Check bucket policy doesn't deny access. 3. Ensure the instance profile is attached to the EC2 instance. 4. Test with aws s3 ls s3://bucket-name.
| Feature / Aspect | ThinBackup Plugin | Filesystem Snapshot (LVM/ZFS) |
|---|---|---|
| Backup speed | Minutes to hours (file copy) | Seconds (snapshot creation) |
| Restore granularity | All or nothing | All or nothing (but can mount snapshot) |
| Jenkins downtime required | Optional pause (recommended) | Pause for snapshot only (seconds) |
| Storage efficiency | Incremental support | Full snapshot each time (use with rsync) |
| Best for | Small to medium instances (<500 jobs) | Large instances or tight backup windows |
| Cost | Free plugin | Requires LVM/ZFS setup |

Key takeaways

1. Back up JENKINS_HOME minus workspace and caches: use ThinBackup or LVM snapshots for consistency.
2. Always store secret.key and the secrets/ directory separately: without them, credentials are unrecoverable.
3. Test your restore process quarterly on a staging instance: a backup you can't restore is worthless.
4. For large instances, skip file-level backup and use infrastructure-as-code with Job DSL instead.
Frequently Asked Questions

1. How do I back up Jenkins jobs individually?
2. What's the difference between ThinBackup and the S3 Backup plugin?
3. How do I restore a single job from a full backup?
4. What happens if I lose the secret.key file?