Jenkins Backup and Disaster Recovery: The Only Guide That Won't Fail You at 3 AM
Jenkins backup and disaster recovery: stop losing jobs and configs.
20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.
Back up JENKINS_HOME, which holds the XML configs for jobs, nodes, and credentials along with your plugins and the keys in secrets/, and leave out workspaces and caches. Use the ThinBackup plugin for scheduled backups to a mounted remote filesystem, then sync that directory to S3. For disaster recovery, restore JENKINS_HOME onto a fresh Jenkins install with the same core and plugin versions, then start Jenkins so it loads the restored configuration.
Think of Jenkins as a chef's kitchen. Your JENKINS_HOME is the pantry, fridge, and recipe book all in one. If the kitchen burns down, you need a copy of the recipe book (job configs), the pantry inventory (plugin list), and the chef's notes (build history). Backing up only the recipes is useless if you don't know which spices you had. A good backup strategy is like having a second kitchen ready to go — same layout, same ingredients, same recipes.
Your Jenkins master just died. The disk is corrupted. The backup you thought was running? It wasn't. Now you're rebuilding 400 jobs from memory, and the CTO is asking why the deployment pipeline is down. I've seen this exact scene play out at three different companies. The problem isn't that Jenkins is hard to back up — it's that everyone assumes the default setup is enough. It's not.
Jenkins stores everything in JENKINS_HOME: job configs, plugin binaries, credentials, build logs, and the XML files that define your entire CI/CD setup. Lose that directory and you lose weeks of work. The standard advice, 'just copy the folder', misses the critical details: files that change while you copy them, incremental changes, and the restore order that keeps plugin version mismatches from breaking your pipelines.
By the end of this guide, you'll have a battle-tested backup and disaster recovery plan that handles partial failures, scales to hundreds of jobs, and can be restored in under 30 minutes. You'll know exactly which files to back up, which to exclude, and how to test your recovery without causing a production incident.
What Exactly Lives in JENKINS_HOME — and What You Actually Need
JENKINS_HOME is the single source of truth for your Jenkins instance. It's a directory (~/.jenkins by default when you run the WAR directly, /var/lib/jenkins for most Linux packages) containing everything: job configs as XML, plugin binaries (.jpi or .hpi files), credentials encrypted with keys kept in secrets/, build logs, and the root config.xml that defines system settings. The rookie mistake is backing up the whole directory blindly. That sweeps in caches, workspace files, and temp data that bloat the backup and slow restoration.
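Here's what you must back up: jobs/ (each job's config.xml, plus build records if you keep history), plugins/ (the .jpi/.hpi files, plus a manifest of plugin names and versions), secrets/ (master.key and hudson.util.Secret, which decrypt your credentials), credentials.xml, users/ (user configs), nodes/ (agent definitions), and the root config.xml along with the other root-level *.xml files where plugins keep their global settings. What you can skip: workspace/ (checked-out source code; rebuild it from SCM), the builds/ directories under jobs/ (if you don't need history, though most teams do), logs/ (regenerable), and the unpacked plugin directories under plugins/, which Jenkins re-extracts from the .jpi files on startup.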
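The include/exclude rules above translate into a path filter you can feed to rsync, tar, or your own copy job. This is a minimal Python sketch, not part of ThinBackup: `should_back_up` and `files_to_back_up` are hypothetical names, and the root key files it keeps (secret.key, identity.key.enc) plus the choice to drop unpacked plugin directories are assumptions to check against your own JENKINS_HOME.

```python
import os
from pathlib import PurePosixPath

# Top-level directories worth keeping, relative to JENKINS_HOME.
KEEP_DIRS = {"jobs", "plugins", "secrets", "users", "nodes"}
# Root-level key files that are not XML (assumed names; verify on your instance).
ROOT_KEY_FILES = {"secret.key", "identity.key.enc"}


def _under_job(parts: tuple, child: str) -> bool:
    """True if the path looks like jobs/<name>/<child>/..., at any folder depth."""
    return any(parts[i] == child and parts[i - 2] == "jobs" for i in range(2, len(parts)))


def should_back_up(rel_path: str, keep_build_history: bool = True) -> bool:
    """Return True if a file path relative to JENKINS_HOME belongs in the backup."""
    parts = PurePosixPath(rel_path).parts
    if not parts:
        return False
    top = parts[0]
    if len(parts) == 1:
        # Root config.xml, credentials.xml and plugin global settings are all *.xml.
        return top.endswith(".xml") or top in ROOT_KEY_FILES
    if top not in KEEP_DIRS:
        # workspace/, logs/, caches and other regenerable data.
        return False
    if top == "plugins" and len(parts) > 2:
        # Unpacked plugin directories; Jenkins re-extracts them from the archives.
        return False
    if top == "jobs":
        if _under_job(parts, "workspace"):
            return False
        if _under_job(parts, "builds") and not keep_build_history:
            return False
    return True


def files_to_back_up(jenkins_home: str, keep_build_history: bool = True) -> list:
    """Walk JENKINS_HOME and return the sorted relative paths the filter keeps."""
    selected = []
    for root, _dirs, files in os.walk(jenkins_home):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), jenkins_home)
            rel = rel.replace(os.sep, "/")
            if should_back_up(rel, keep_build_history):
                selected.append(rel)
    return sorted(selected)
```

Passing `keep_build_history=False` drops every builds/ directory under jobs/, including those nested inside folders. The check anchors on the `jobs/<name>/builds` shape, so a job that happens to be named "builds" is still kept.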
Production trap: If a plugin stores job or build data in an external database, back up that database separately. JENKINS_HOME won't have that data.
ThinBackup Plugin: The Only Backup Plugin You Should Trust
The ThinBackup plugin is the de facto standard for Jenkins backups. It's lightweight and supports full and differential backups, but it writes only to a directory path; getting the backup offsite means pointing that path at a network mount or syncing it elsewhere. It also has sharp edges: switch on build results and archives without an exclusion pattern and the backup grows fast. I've seen a 10GB backup balloon to 200GB because someone forgot to add an exclusion.
Here's the configuration that works in production: set the backup directory to an external mount (NFS or EFS). Enable 'Backup build results' if you need history, and leave it off if you're short on space. Keep workspaces out of the backup. Set cron schedules for both backup types: a weekly full backup with daily differentials is a sensible baseline, with hourly differentials if your jobs change frequently. The plugin copies files while Jenkins is running, so avoid scheduling it during peak build hours to prevent I/O contention.
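Production gotcha: ThinBackup copies files that Jenkins may be writing at the same moment. If builds are running or someone saves a job mid-backup, files like build.xml or config.xml can be captured half-written. Solution: enable ThinBackup's option to wait until Jenkins is idle, together with its option to force quiet mode after a set number of minutes. Quiet mode stops new builds from starting, running builds drain, and the copy begins once the executors are free.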
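ThinBackup's idle and quiet-mode options handle this for you. If you copy JENKINS_HOME some other way (a snapshot or an rsync job), the same quiet-then-copy sequence can be driven over the REST API. A minimal Python sketch, assuming API-token basic auth (which current Jenkins releases exempt from CSRF crumbs) and the standard `/quietDown`, `/cancelQuietDown`, and `/computer/api/json` endpoints; the helper names are hypothetical.

```python
import base64
import json
import time
import urllib.request
from typing import Callable


def _call(base_url: str, path: str, user: str, token: str, method: str = "GET") -> bytes:
    """Send an authenticated request to the Jenkins REST API (API-token basic auth)."""
    data = b"" if method == "POST" else None  # empty body so Content-Length is sent
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    auth = base64.b64encode(f"{user}:{token}".encode()).decode()
    req.add_header("Authorization", f"Basic {auth}")
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def busy_executors(base_url: str, user: str, token: str) -> int:
    """Number of executors currently running builds, across the master and agents."""
    body = _call(base_url, "/computer/api/json?tree=busyExecutors", user, token)
    return int(json.loads(body)["busyExecutors"])


def quiet_down(base_url: str, user: str, token: str) -> None:
    """Enter quiet mode: no new builds start, running builds continue."""
    _call(base_url, "/quietDown", user, token, method="POST")


def cancel_quiet_down(base_url: str, user: str, token: str) -> None:
    """Leave quiet mode so queued builds can start again."""
    _call(base_url, "/cancelQuietDown", user, token, method="POST")


def wait_until_idle(get_busy: Callable[[], int], timeout_s: float, poll_s: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until no executor is busy; return False if the timeout expires first."""
    deadline = clock() + timeout_s
    while True:
        if get_busy() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A typical run calls `quiet_down`, then `wait_until_idle(lambda: busy_executors(url, user, token), timeout_s=1800)`, copies the files, and calls `cancel_quiet_down` in a `finally` block so a failed copy never leaves Jenkins refusing builds. `sleep` and `clock` are injectable so the polling logic can be tested without a live server.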
S3 and Cloud Storage: Offsite Backups That Survive a Datacenter Fire
Local backups are useless if the server room floods. You need offsite backups. ThinBackup writes only to a filesystem path, so S3 needs an extra step: a separate upload plugin, or a cron job with aws s3 sync. I prefer the cron job because it gives you full control over the sync logic and doesn't depend on Jenkins being up.
Here's the pattern: use ThinBackup for local incremental backups, then a separate cron job that syncs the backup directory to S3 with versioning enabled. This gives you two layers: fast local restore for small failures, and durable remote backup for catastrophic events. Set S3 lifecycle rules to move old backups to Glacier after 30 days and delete after 90.
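The 30/90-day rule above can be written as an S3 lifecycle configuration. A minimal Python sketch that only builds the dictionary in the shape boto3's `put_bucket_lifecycle_configuration` expects; the rule ID, the prefix, and the bucket name in the comment are hypothetical. Because the bucket is versioned, it also expires noncurrent versions, otherwise overwritten backups would linger indefinitely.

```python
from typing import Any, Dict


def lifecycle_rules(prefix: str, glacier_after_days: int = 30,
                    expire_after_days: int = 90) -> Dict[str, Any]:
    """Build a LifecycleConfiguration: Glacier after N days, delete after M days."""
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiry must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "jenkins-backup-retention",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": expire_after_days},
                # In a versioned bucket, overwritten or deleted objects become
                # noncurrent versions; expire those too or they are kept forever.
                "NoncurrentVersionExpiration": {"NoncurrentDays": expire_after_days},
            }
        ]
    }


# Applying it (requires boto3 and AWS credentials; bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-jenkins-backups",
#     LifecycleConfiguration=lifecycle_rules("jenkins/"),
# )
```

One cost wrinkle: Glacier Flexible Retrieval (the `GLACIER` storage class) bills a 90-day minimum, so objects deleted 60 days after transitioning incur an early-deletion charge. Push the expiry out or skip the transition if that matters to you.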
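Production trap: S3 has been strongly read-after-write consistent since December 2020, so consistency isn't the risk; timing is. A sync that starts while ThinBackup is still writing uploads a partial backup set, so schedule the sync after the backup window ends. Enable versioning or S3 Object Lock so an overwrite or delete can't destroy the previous good copy. And never use the same bucket for multiple Jenkins instances; you'll get cross-contamination.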
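One way to make sure the sync never uploads a half-written set is to wait until nothing in the backup directory has changed for a few minutes. This Python sketch assumes that settle heuristic is good enough (it does not rely on any completion marker from ThinBackup); the function names are hypothetical and the bucket and prefix are placeholders.

```python
import os
import time
from typing import List, Optional


def newest_file_mtime(path: str) -> Optional[float]:
    """Latest modification time of any file under path, or None if there are no files."""
    mtimes = [
        os.path.getmtime(os.path.join(root, name))
        for root, _dirs, files in os.walk(path)
        for name in files
    ]
    return max(mtimes) if mtimes else None


def safe_to_sync(path: str, settle_seconds: float = 300.0, now: Optional[float] = None) -> bool:
    """Heuristic: treat the backup as finished once nothing has changed for settle_seconds."""
    newest = newest_file_mtime(path)
    if newest is None:
        return False  # nothing written yet
    current = time.time() if now is None else now
    return current - newest >= settle_seconds


def sync_command(path: str, bucket: str, prefix: str) -> List[str]:
    """Argument list for the AWS CLI; run it with subprocess.run(..., check=True)."""
    return ["aws", "s3", "sync", path, f"s3://{bucket}/{prefix}"]
```

Run it from cron a while after the ThinBackup window: `if safe_to_sync(path): subprocess.run(sync_command(path, bucket, prefix), check=True)`. Without `--delete`, `aws s3 sync` never removes objects from the bucket, so pruning stays with the lifecycle rules.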
Disaster Recovery: The Exact Steps to Restore a Dead Jenkins Master
When disaster strikes, panic is your enemy. Follow this exact sequence. First, provision a new Jenkins server with the same OS and Jenkins core version as the original. Do NOT start Jenkins yet. Copy your backup JENKINS_HOME over the fresh installation; plugins/ comes with it, so you get the exact plugin versions. If your backup excluded plugin binaries, install them from your plugin manifest at the pinned versions before the first start. Ensure ownership matches the service account (chown -R jenkins:jenkins on a typical Linux install). Start Jenkins. It will detect the existing config and skip the setup wizard.
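The copy-and-chown steps can be scripted so that a restore refuses to proceed when the backup lacks the pieces it can't work without. A minimal Python sketch: the `REQUIRED` list is an assumption (add credentials.xml if you store credentials), the helper names are hypothetical, and it expects Jenkins to be stopped and its service user to have a group of the same name.

```python
import os
import shutil
from typing import List, Optional, Sequence

# Entries a restore cannot do without (assumed list; adjust to your setup).
REQUIRED = (
    "config.xml",
    "jobs",
    "plugins",
    "secrets/master.key",
    "secrets/hudson.util.Secret",
)


def missing_from_backup(backup_home: str, required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required entries that are absent from the backup copy of JENKINS_HOME."""
    return [
        rel for rel in required
        if not os.path.exists(os.path.join(backup_home, *rel.split("/")))
    ]


def restore(backup_home: str, jenkins_home: str, owner: Optional[str] = None) -> None:
    """Copy a backup over a stopped, freshly installed JENKINS_HOME and fix ownership."""
    missing = missing_from_backup(backup_home)
    if missing:
        raise FileNotFoundError("backup is missing: " + ", ".join(missing))
    # symlinks=True keeps build-record links as links instead of copying their targets.
    shutil.copytree(backup_home, jenkins_home, symlinks=True, dirs_exist_ok=True)
    if owner:
        # Assumes user and group share a name, e.g. jenkins:jenkins.
        shutil.chown(jenkins_home, user=owner, group=owner)
        for root, dirs, files in os.walk(jenkins_home):
            for name in dirs + files:
                path = os.path.join(root, name)
                if not os.path.islink(path):
                    shutil.chown(path, user=owner, group=owner)
```

Run it as root with `owner="jenkins"` on the real host; leaving `owner` unset skips the ownership step, which is handy for a dry run into a scratch directory.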
If you get errors about plugin versions, you likely restored plugins that don't match the Jenkins core version. Quick fix: delete the plugins directory and reinstall from the update center, but that pulls current versions and may change behavior. Better: keep a plugin version manifest (one short-name:version per line) in your backup and pin to it; the Plugin Installation Manager tool (jenkins-plugin-cli --plugin-file) installs exactly those versions.
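A manifest can be rebuilt straight from the plugin archives, since every .jpi/.hpi is a zip whose META-INF/MANIFEST.MF carries `Short-Name` and `Plugin-Version`. A minimal Python sketch; the function names are hypothetical, and the `name:version` output is the plain-text format jenkins-plugin-cli accepts through `--plugin-file`.

```python
import os
import zipfile
from typing import Dict, List


def read_manifest(plugin_path: str) -> Dict[str, str]:
    """Parse META-INF/MANIFEST.MF from a .jpi/.hpi archive into a dict."""
    with zipfile.ZipFile(plugin_path) as archive:
        text = archive.read("META-INF/MANIFEST.MF").decode("utf-8")
    fields: Dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if line.startswith(" ") and last_key is not None:
            # Manifest lines wrap at 72 bytes; continuation lines start with one space.
            fields[last_key] += line[1:]
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            fields[last_key] = value[1:] if value.startswith(" ") else value
    return fields


def plugin_manifest(plugins_dir: str) -> List[str]:
    """Return sorted 'short-name:version' lines for every plugin archive in the directory."""
    lines = []
    for name in os.listdir(plugins_dir):
        if name.endswith((".jpi", ".hpi")):
            fields = read_manifest(os.path.join(plugins_dir, name))
            lines.append(f"{fields['Short-Name']}:{fields['Plugin-Version']}")
    return sorted(lines)


def write_manifest(plugins_dir: str, out_path: str) -> None:
    """Write the manifest as a plugins.txt-style file."""
    with open(out_path, "w") as fh:
        fh.write("\n".join(plugin_manifest(plugins_dir)) + "\n")
```

Write it into every backup, for example `write_manifest("/var/lib/jenkins/plugins", "plugins.txt")`, and the restore can pin versions even if the binaries are lost.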
Production gotcha: secrets/master.key and secrets/hudson.util.Secret must match the original master's copies, because together they decrypt every stored credential. Lose them and all credentials become undecryptable. Store copies in a secure vault (e.g., HashiCorp Vault) separate from the backup. If you don't have them, you'll need to re-enter every credential by hand, a nightmare for 50+ jobs.
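To check that the vaulted keys really match what's in the backup without ever printing them, compare SHA-256 fingerprints. A minimal Python sketch; `vault_copy_dir` stands for a hypothetical local directory where you have temporarily pulled the vaulted files in the same secrets/ layout, and the function names are illustrative.

```python
import hashlib
import hmac
import os
from typing import List

# The two files that decrypt stored credentials.
SECRET_FILES = ("secrets/master.key", "secrets/hudson.util.Secret")


def fingerprint(path: str) -> str:
    """SHA-256 of a file; safe to log or store next to the backup instead of the key."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_secrets(backup_home: str, vault_copy_dir: str) -> List[str]:
    """List secret files that are missing from either copy or whose contents differ."""
    bad = []
    for rel in SECRET_FILES:
        in_backup = os.path.join(backup_home, *rel.split("/"))
        in_vault = os.path.join(vault_copy_dir, *rel.split("/"))
        if not (os.path.isfile(in_backup) and os.path.isfile(in_vault)):
            bad.append(rel)
        elif not hmac.compare_digest(fingerprint(in_backup), fingerprint(in_vault)):
            bad.append(rel)
    return bad
```

Storing only the fingerprint next to each backup also lets you catch a key rotation or a wrong vault entry long before the night you need it.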
Incremental vs Full Backups: When to Use Each
Full backups copy everything every time. They're simple but wasteful — most of JENKINS_HOME doesn't change daily. Incremental backups only copy changed files, saving time and storage. ThinBackup supports differential backups (changes since last full) — I recommend this for daily use.
But here's the catch: incremental backups are only as good as the last full backup. If your full backup is corrupted, all incrementals are useless. Always run a weekly full backup and verify its integrity. I've seen teams lose weeks of job configs because their monthly full backup failed silently and they only noticed when the server died.
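With differential backups, a restore needs exactly two pieces: the newest full backup at or before your target time, plus the newest differential taken after that full. A minimal Python sketch of that selection; it assumes directory names of the form FULL-YYYY-MM-DD_HH-MM and DIFF-YYYY-MM-DD_HH-MM (check what your ThinBackup version actually writes), and the function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def parse_backup(name: str) -> Optional[Tuple[str, datetime]]:
    """Split 'FULL-2024-03-03_02-00' into ('FULL', datetime); None for anything else."""
    kind, sep, stamp = name.partition("-")
    if not sep or kind not in ("FULL", "DIFF"):
        return None
    try:
        return kind, datetime.strptime(stamp, "%Y-%m-%d_%H-%M")
    except ValueError:
        return None


def restore_chain(names: Iterable[str], target: datetime) -> List[str]:
    """Newest FULL at or before target, plus the newest DIFF between that FULL and target."""
    backups = []
    for name in names:
        parsed = parse_backup(name)
        if parsed is not None and parsed[1] <= target:
            backups.append((parsed[1], parsed[0], name))
    backups.sort()
    fulls = [b for b in backups if b[1] == "FULL"]
    if not fulls:
        raise LookupError("no full backup at or before the target time")
    full_time, _kind, full_name = fulls[-1]
    diffs = [b for b in backups if b[1] == "DIFF" and b[0] > full_time]
    return [full_name] + ([diffs[-1][2]] if diffs else [])
```

If that full backup is corrupt, the differential alone is useless, which is why the chain has to be restore-tested rather than just listed.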
Production rule: Full backup weekly, incremental daily. Store full backups for 4 weeks, incrementals for 2 weeks. Test restore from a full+incremental chain quarterly.
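The retention rule can be automated with a small pruning helper. A Python sketch under a stated naming assumption (FULL- or DIFF- followed by YYYY-MM-DD_HH-MM; the function names are hypothetical): it only returns the names to delete and always keeps the newest full backup, so a backup job that silently stopped can't trick the pruner into deleting your last good copy.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Fulls kept 4 weeks, differentials 2 weeks.
RETENTION = {"FULL": timedelta(weeks=4), "DIFF": timedelta(weeks=2)}


def _parse(name: str) -> Optional[Tuple[str, datetime]]:
    # Assumed naming: FULL-YYYY-MM-DD_HH-MM or DIFF-YYYY-MM-DD_HH-MM.
    kind, sep, stamp = name.partition("-")
    if not sep or kind not in RETENTION:
        return None
    try:
        return kind, datetime.strptime(stamp, "%Y-%m-%d_%H-%M")
    except ValueError:
        return None


def expired_backups(names: Iterable[str], now: datetime) -> List[str]:
    """Names past their retention window; the newest FULL is never returned."""
    parsed = []
    for name in names:
        p = _parse(name)
        if p is not None:
            parsed.append((name, p[0], p[1]))
    fulls = [entry for entry in parsed if entry[1] == "FULL"]
    newest_full = max(fulls, key=lambda e: e[2])[0] if fulls else None
    return sorted(
        name for name, kind, when in parsed
        if name != newest_full and now - when > RETENTION[kind]
    )
```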
When Not to Use ThinBackup — and What to Use Instead
ThinBackup is great for small to medium instances (< 500 jobs). But for large-scale Jenkins deployments with thousands of jobs or heavy build history, it becomes slow and resource-intensive. The backup process can take hours, and the restore process is all-or-nothing — you can't selectively restore a single job.
Alternatives: Use filesystem snapshots (LVM or ZFS) for near-instant backups. They're atomic and don't require Jenkins to pause, though the image is crash-consistent: the same state as if the power had been cut at that instant. For cloud instances, use EBS snapshots. For job-level granularity, export each job's config.xml (the REST API serves it at /job/<name>/config.xml) or generate jobs from code with the Job DSL plugin and keep that code in version control. I've seen teams manage 2000+ jobs with Job DSL and never need a full backup; they just regenerate everything from code.
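Exporting every job's config.xml over the REST API gives you per-job restore without any extra plugin. A minimal Python sketch: it walks `api/json?tree=jobs[name,url]`, descends into folders, and saves each item's config.xml; the fetch helper assumes API-token basic auth, the base URL must end with a slash, and the function names are hypothetical.

```python
import base64
import json
import os
import urllib.request
from typing import Callable, Iterator, Tuple

# tree=jobs[name,url], with the brackets URL-encoded.
JOBS_TREE = "api/json?tree=jobs%5Bname,url%5D"


def make_fetch(user: str, token: str) -> Callable[[str], bytes]:
    """Return a GET helper that authenticates with a Jenkins API token."""
    auth = base64.b64encode(f"{user}:{token}".encode()).decode()

    def fetch(url: str) -> bytes:
        req = urllib.request.Request(url, headers={"Authorization": f"Basic {auth}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    return fetch


def walk_jobs(url: str, fetch: Callable[[str], bytes], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (relative path, item URL) for every item, descending into folders."""
    data = json.loads(fetch(url + JOBS_TREE))
    for job in data.get("jobs", []):
        path = prefix + job["name"]
        yield path, job["url"]
        # Folders and multibranch projects list their children under "jobs".
        yield from walk_jobs(job["url"], fetch, path + "/")


def export_configs(base_url: str, fetch: Callable[[str], bytes], out_dir: str) -> int:
    """Save each item's config.xml under out_dir, mirroring the folder layout."""
    count = 0
    for path, job_url in walk_jobs(base_url, fetch):
        target = os.path.join(out_dir, *path.split("/"), "config.xml")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as fh:
            fh.write(fetch(job_url + "config.xml"))
        count += 1
    return count
```

It makes one JSON request and one config request per item, so 2000 jobs means roughly 4000 calls; run it off-peak. Restoring a single job is then a matter of posting that file back to the job's config.xml URL.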
When to avoid ThinBackup: if your backup window exceeds 30 minutes, if you need per-job restore, or if you're running Jenkins in a container with ephemeral storage. In those cases, invest in infrastructure-as-code for Jenkins config.
The 4GB Container That Kept Dying
- Never store backups inside the Jenkins container.
- Always back up to external storage, and set retention to cover how far back you may need to restore, not to fit your disk space.