Intermediate 10 min · 2026-06-21

Ansible Error Handling: Production Patterns from a 3AM PagerDuty

Q: Can I use ignore_errors and failed_when together?

Yes, but ignore_errors only allows the play to continue; the task still shows as failed. Use register and when for conditional logic instead.

Q: Does rescue block catch all types of failures?

No, rescue only catches task execution failures. Failures due to unreachable hosts, privilege escalation, or syntax errors are not caught.

Q: What happens if a rescue task fails?

The play will fail, potentially masking the original error. Add ignore_errors: yes to non-critical rescue tasks.

Q: How do I skip a host on failure but continue with others?

Use block/rescue/always. In rescue, use meta: end_host to end the host's tasks, or simply skip further tasks. The play will continue with other hosts.

Q: What is the difference between any_errors_fatal and max_fail_percentage: 0?

any_errors_fatal stops the play immediately on any failure, regardless of percentage. max_fail_percentage: 0 also stops on any failure, but it's calculated per batch; if you don't use serial, it behaves similarly.

Q: Can I use changed_when with block/rescue?

changed_when is a task-level directive; it can be used inside block, but not on the block itself. The block's changed status is derived from its tasks.

Q: How do I re-raise a failure after cleanup in rescue?

Use ansible.builtin.fail module at the end of the rescue block with a message. This will cause the play to fail after cleanup.

Q: Is there a way to globally set error handling for all tasks?

No, error handling is per-task or per-play. You can use a custom plugin or include tasks with common error handling, but there's no global directive.

Master Ansible error handling with ignore_errors, failed_when, block/rescue/always, any_errors_fatal, and max_fail_percentage.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Notes here come from systems that actually shipped.

✓ Production

production tested

June 21, 2026

last updated

1,596

articles · all by Naren


⚡Quick Answer

Use ignore_errors: yes only when you truly don't care about a task's exit status; the task is still reported as failed in the output, with '...ignoring' appended. Override task failure with failed_when to define custom failure conditions (e.g., failed_when: result.rc != 0 or 'ERROR' in result.stderr). Use changed_when to prevent false 'changed' status (e.g., changed_when: 'updated' in result.stdout for a script that would otherwise always report changed). Wrap critical sequences in block/rescue/always for try-catch-finally behavior; rescue runs on failure, always runs regardless. Set any_errors_fatal: true on a play to stop execution for all hosts when any task fails in that play. Use max_fail_percentage in rolling updates to abort if more than N% of the hosts in a batch fail; e.g., serial: 5, max_fail_percentage: 20. In rescue blocks, use ansible.builtin.include_tasks to run cleanup or notify handlers. Always test error handling paths in CI; a misconfigured failed_when can silently swallow real failures.

✦ Definition · ~90s read

What is Ansible Error Handling?

Ansible error handling refers to the mechanisms that control how Ansible responds when a task fails or returns an unexpected status. By default, when a task fails on a host, Ansible stops running further tasks on that host and continues with the remaining hosts (any_errors_fatal is off at the play level unless you enable it).


However, production playbooks need fine-grained control: you might want to ignore certain failures, define custom failure conditions based on command output, or override the 'changed' status. The core directives are ignore_errors, failed_when, changed_when, and the block/rescue/always pattern.

Additionally, any_errors_fatal and max_fail_percentage control play-level failure propagation. These tools allow you to build resilient automation that handles edge cases gracefully, without masking real issues.

Plain-English First

Imagine you're a chef cooking a complex multi-course meal. Your recipe (the Ansible playbook) has steps like 'chop onions' and 'sear steak.' If you burn the onions (a task fails), you have a few options: you can ignore it and move on (ignore_errors), or you can decide that burnt onions are actually a failure only if they're black (failed_when). You might also want to know if the steak is 'changed' only when it's actually cooked differently (changed_when). For risky sequences, like reducing a sauce, you might use a 'try-catch' approach: try the reduction, if it fails, rescue by adding a thickener, and always clean up the pan (block/rescue/always). In a busy kitchen, if one station fails, you might want to stop the whole service (any_errors_fatal) or only if too many stations fail (max_fail_percentage). This article teaches you these patterns so your automated kitchen runs smoothly.

It was 3 AM, and my phone was buzzing with PagerDuty alerts. Our Ansible-driven deployment had taken down half the production fleet. The playbook had a task that checked for a lock file; if it existed, the task failed, and Ansible stopped the entire play. The problem? A stale lock file from a previous deployment that should have been ignored. We had ignore_errors set, but a junior engineer had commented it out during a code review, thinking it was dead code. The result: every host that had that lock file failed, and our rolling update aborted after the first batch. We lost 30 minutes of uptime. That night, I learned that error handling in Ansible isn't just about preventing failures—it's about defining what failure means for your system.

ignore_errors: When to Use and When to Avoid

The ignore_errors directive tells Ansible to continue executing tasks on a host even if the current task fails. It's a blunt instrument. Use it for non-critical steps where you have a fallback, such as stopping a service that may already be stopped. Never use it to hide real failures: the task is still reported as failed in the output (with ...ignoring), but the play carries on as if nothing happened. Silencing a task wholesale with failed_when: false is the same anti-pattern under a different name. A tempting but misguided example is a lock-file check before deployment:

```yaml
- name: Check for deployment lock file
  ansible.builtin.stat:
    path: /var/lock/deploy.lock
  register: lock_check
  ignore_errors: yes  # redundant: stat succeeds even when the file is missing
```

ansible.builtin.stat does not fail when the file is missing; it returns stat.exists: false. So ignore_errors is redundant here, and the right move is to branch on lock_check.stat.exists in a later task. The point: only use ignore_errors when the task's failure is acceptable and you have subsequent logic to handle it. In production, we once had a task that stopped a service that might already be stopped; we used ignore_errors. But then a real failure (e.g., service not found) was ignored, causing a cascading issue. We switched to failed_when: result.rc != 0 and 'not running' not in result.stderr.
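As a minimal sketch of handling the lock explicitly rather than ignoring it, the tasks below branch on the registered stat result. The fail task and its message are illustrative assumptions, not part of the original playbook.

```yaml
- name: Check for deployment lock file
  ansible.builtin.stat:
    path: /var/lock/deploy.lock
  register: lock_check

# stat never fails on a missing file, so the decision is made here instead
- name: Abort if another deployment holds the lock
  ansible.builtin.fail:
    msg: "Lock file present on {{ inventory_hostname }}"
  when: lock_check.stat.exists
```

The intent is now visible in the playbook: the check itself cannot be "commented out as dead code" without also removing the task that acts on it.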

ignore_errors does not suppress error output

The task will still show as 'failed' in output, just with 'ignoring' appended. It can confuse operators. Use sparingly.

Production Insight

We had a task that checked if a mount point existed using mount command. It failed when the mount wasn't present. We used ignore_errors, but then the mount actually failed due to a bad filesystem, and we didn't notice. We changed to register the output and use failed_when.

Key Takeaway

Use ignore_errors only for truly optional checks; prefer failed_when for granular control.

failed_when: Defining Custom Failure Conditions

failed_when overrides Ansible's default failure detection. You provide a Jinja2 expression that evaluates to true when the task should be considered failed. This is essential for commands that return non-zero on success (e.g., grep returning 1 for no match) or for complex checks based on stdout/stderr. Syntax:

```yaml
- name: Run custom script
  ansible.builtin.shell: /usr/local/bin/check_health.sh
  register: health_result
  # Fail on a non-zero exit code or when the script reports CRITICAL
  failed_when: health_result.rc != 0 or 'CRITICAL' in health_result.stdout
```

Common gotcha: failed_when is evaluated after the task runs. If the task never executes (e.g., invalid parameters), failed_when is not evaluated. failed_when and ignore_errors also interact: failed_when decides whether the task counts as failed, and ignore_errors then decides whether that failure stops the host. With both set, a task whose failed_when is true is reported as failed and then ignored. Production tip: always test your failed_when condition with a known failure case. We once had failed_when: result.rc == 1 but the command returned 2 for a different error; we missed a failure.
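The grep case mentioned above can be sketched as follows. grep exits 0 on a match, 1 on no match, and 2 or higher on a real error such as a missing file; the search pattern and file path are hypothetical.

```yaml
- name: Look for a deprecated setting
  ansible.builtin.command: grep -q 'legacy_mode' /etc/myapp/app.conf
  register: grep_result
  # rc 0 = match, rc 1 = no match (both fine); rc >= 2 = genuine error
  failed_when: grep_result.rc >= 2
  changed_when: false  # read-only check

- name: Warn when the deprecated setting is still present
  ansible.builtin.debug:
    msg: "legacy_mode is still configured on {{ inventory_hostname }}"
  when: grep_result.rc == 0
```

Testing for rc >= 2 rather than rc == 2 avoids the trap described above, where an unexpected exit code slips past an equality check.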

failed_when with multiple conditions

Use parentheses and and/or to combine conditions. Example: failed_when: (result.rc != 0) or ('ERROR' in result.stderr).
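Two ways of writing combined conditions are sketched here: a YAML list, which Ansible joins with an implicit and, and a folded scalar for an or expression. The script paths and the 'already applied' message are assumptions for illustration.

```yaml
- name: Run migration
  ansible.builtin.command: /usr/local/bin/migrate.sh   # hypothetical script
  register: result
  # List items are ANDed: fail only on a non-zero exit that is not the
  # benign "already applied" case (message text is an assumption).
  failed_when:
    - result.rc != 0
    - "'already applied' not in result.stderr"

- name: Run health check
  ansible.builtin.command: /usr/local/bin/check_health.sh
  register: health
  # A folded scalar keeps a longer or-expression readable
  failed_when: >
    (health.rc != 0) or
    ('ERROR' in health.stderr)
```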

Production Insight

During a database migration, a script returned exit code 0 but printed 'FAIL' to stdout. Our failed_when caught it with 'FAIL' in result.stdout. Saved us from a corrupted database.

Key Takeaway

failed_when is your scalpel for defining exactly what constitutes a failure; always test both success and failure paths.

changed_when: Preventing False Changes

changed_when controls whether a task reports 'changed' or 'ok'. By default, each module decides this itself: file reports 'changed' only when it modifies state, while command and shell always report 'changed' because Ansible cannot tell what the command did. For those always-changed commands, you can override the result:

```yaml
- name: Run idempotent script
  ansible.builtin.shell: /usr/local/bin/update_cache.sh
  register: cache_update
  # Report changed only when the script says it updated something
  changed_when: cache_update.rc == 0 and 'updated' in cache_update.stdout
```

If you want a task to never report changed, use changed_when: false. This is common for read-only checks. However, be careful: if a task that should change things never reports changed, you lose the audit trail. In production, we had a task that restarted a service only if a config file changed; we used changed_when: config_changed.changed, where config_changed was the variable registered by the config task. This gave accurate change tracking.

changed_when and handlers

Handlers are notified only if a task reports 'changed'. If you override changed_when to false, handlers won't fire. When a handler must still run on real changes, give changed_when an actual condition instead of a blanket false.
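A minimal sketch of tying a handler to a real change signal. It assumes a hypothetical sync script that prints "updated" when it modified anything; the group, script, and service names are placeholders.

```yaml
- name: Sync users and restart only on real changes
  hosts: appservers                     # hypothetical group name
  tasks:
    - name: Sync users
      ansible.builtin.command: /usr/local/bin/sync_users.sh   # hypothetical script
      register: sync_result
      # Assumes the script prints "updated" when it modified anything
      changed_when: "'updated' in sync_result.stdout"
      notify: Restart myapp

  handlers:
    - name: Restart myapp
      ansible.builtin.service:
        name: myapp
        state: restarted
```

With changed_when: false on the same task, the notify line would be dead weight: the handler would never run.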

Production Insight

We had a task that ran a script to sync users. The script always returned 'changed' because it logged something. We added changed_when: false and then used a separate task to detect actual changes. This reduced noise in our deployment logs.

Key Takeaway

Use changed_when to align Ansible's change detection with your actual state changes; avoid false positives that trigger unnecessary handlers.

block/rescue/always: The Try-Catch-Finally of Ansible

The block/rescue/always pattern provides structured error handling for a group of tasks. block contains the main tasks. If any task in the block fails, the rescue block executes. The always block runs regardless of success or failure. This is perfect for cleanup operations:

```yaml
- name: Deploy application
  block:
    # Docker modules live in the community.docker collection
    - name: Pull latest image
      community.docker.docker_image:
        name: myapp:latest
        source: pull

    - name: Start container
      community.docker.docker_container:
        name: myapp
        image: myapp:latest
        state: started
  rescue:
    - name: Rollback to previous image
      community.docker.docker_image:
        name: myapp:previous
        source: pull

    - name: Notify team
      ansible.builtin.uri:
        url: https://hooks.slack.com/services/...
        method: POST
        body: '{"text":"Deployment failed, rolled back"}'
  always:
    - name: Clean up temp files
      ansible.builtin.file:
        path: /tmp/deploy_temp
        state: absent
```

Important: variables set in block are available in rescue and always. However, if a task in rescue fails, the entire play fails (unless you handle it). Use ignore_errors in rescue if needed. Also, rescue does not run if a task in block fails due to syntax error or unreachable host—only task execution failures.
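Inside rescue, Ansible exposes ansible_failed_task and ansible_failed_result, which describe the task that triggered the rescue. The sketch below logs them, then re-raises with ansible.builtin.fail so the host still ends up failed after cleanup. The script and temp path are hypothetical.

```yaml
- name: Migrate with cleanup, then re-raise
  block:
    - name: Apply migration
      ansible.builtin.command: /usr/local/bin/migrate.sh   # hypothetical script
  rescue:
    - name: Record what failed
      ansible.builtin.debug:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed:
          {{ ansible_failed_result.msg | default('no message') }}

    - name: Re-raise so the host is still marked failed
      ansible.builtin.fail:
        msg: "Migration failed on {{ inventory_hostname }}; cleanup done"
  always:
    - name: Remove temporary SQL files
      ansible.builtin.file:
        path: /tmp/migration_sql        # hypothetical path
        state: absent
```

Without the final fail task, a rescue that completes successfully clears the failure and the host continues as if the block had succeeded.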

Rescue does not catch all failures

Failures like 'unreachable host', 'invalid privilege escalation', or 'syntax error' are not caught by rescue. Only task-level failures are caught.

Production Insight

We used block/rescue to wrap a multi-step database migration. When a step failed, rescue rolled back the schema and notified the team. The always block cleaned up temporary SQL files. This pattern saved us from manual intervention multiple times.

Key Takeaway

Use block/rescue/always for atomic operations that need cleanup or rollback; it's the closest thing to try-catch-finally in Ansible.

any_errors_fatal: Stop the Play on Any Failure

By default, if a task fails on a host, Ansible stops executing further tasks on that host but continues on other hosts. Setting any_errors_fatal: true changes this: if any task fails on any host, Ansible finishes that task on the other hosts in the current batch and then stops the play for all hosts. This is useful when a failure on one host indicates a systemic issue that should halt the entire deployment. Use it sparingly, as it can cause unnecessary downtime.

```yaml
- name: Deploy critical update
  hosts: all
  any_errors_fatal: true
  tasks:
    - name: Validate config
      ansible.builtin.shell: /usr/local/bin/validate_config.sh
```

In production, we used this for a security patch that had to be applied consistently across all hosts. If one host failed validation, we wanted to stop and investigate. However, we combined it with serial: 1 to limit blast radius. A common mistake is setting any_errors_fatal: true without serial, causing all hosts to fail if one has a transient issue.

Combine with serial for controlled rollout

Use serial: 1 or a small batch size with any_errors_fatal: true to avoid taking down the entire fleet on a single failure.
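A sketch of that combination, assuming a hypothetical patch script alongside the validation script used earlier:

```yaml
- name: Apply security patch one host at a time
  hosts: all
  serial: 1                 # one host per batch limits the blast radius
  any_errors_fatal: true    # first failure halts the whole rollout
  tasks:
    - name: Validate config
      ansible.builtin.command: /usr/local/bin/validate_config.sh
      changed_when: false   # validation is read-only

    - name: Apply patch
      ansible.builtin.command: /usr/local/bin/apply_patch.sh   # hypothetical script
```

With serial: 1, a failure touches at most one host before the play stops, at the cost of a slower rollout.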

Production Insight

We had a playbook that deployed a new SSL certificate. One host had a misconfigured nginx, causing the task to fail. With any_errors_fatal, the entire deployment stopped, preventing the bad config from spreading. We then fixed the host and re-ran.

Key Takeaway

any_errors_fatal is a nuclear option; use it only when a failure on one host means the entire deployment is compromised.

max_fail_percentage: Graceful Degradation in Rolling Updates

max_fail_percentage is a play-level directive that sets the maximum percentage of hosts that can fail before Ansible aborts the entire play. It's typically used with serial for rolling updates. For example:

```yaml
- name: Rolling update
  hosts: webservers
  serial: 5
  max_fail_percentage: 20
  tasks:
    - name: Update app
      ansible.builtin.yum:
        name: myapp
        state: latest
```

If more than 20% of the hosts in a batch fail, the play stops. This prevents a bad deployment from taking down too many hosts. The percentage is calculated per batch, not globally, and it must be exceeded rather than merely reached. With 5 hosts per batch, 1 failure (20%) does not exceed the threshold, so the play continues; 2 failures (40%) do, so the play stops. If max_fail_percentage is not set, there is no percentage threshold: Ansible keeps running the play as long as some hosts in the batch have not failed. Setting it to 0 aborts on any failure; to allow some failures, set a positive integer. In production, we use 20% for rolling updates to tolerate transient issues.
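The per-batch arithmetic can be worked through in comments. This sketch assumes batches of 10 and a threshold of 30; the key point is that the failure percentage must be greater than the setting, not equal to it.

```yaml
- name: Rolling update with a per-batch failure budget
  hosts: webservers
  serial: 10
  # The threshold must be exceeded, not merely reached:
  #   2 of 10 fail -> 20% -> continue
  #   3 of 10 fail -> 30% -> continue (30 is not greater than 30)
  #   4 of 10 fail -> 40% -> abort the play
  max_fail_percentage: 30
  tasks:
    - name: Update app
      ansible.builtin.yum:
        name: myapp
        state: latest
```

To abort as soon as 3 of 10 hosts fail, the setting would need to sit below 30, for example 29.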

max_fail_percentage vs any_errors_fatal

any_errors_fatal stops the play on any failure regardless of percentage. max_fail_percentage allows a certain percentage before aborting. They can be combined; any_errors_fatal overrides.

Production Insight

During a rolling update of 100 web servers with serial: 10, we had a batch where 3 hosts failed due to a transient network issue. With max_fail_percentage: 20, the play stopped because 3/10 = 30% exceeds 20%. We had to increase it to 30% to tolerate the flaky network.

Key Takeaway

Set max_fail_percentage based on your tolerance for failure; remember it's per-batch, not global.

Using Rescue to Notify and Clean Up After Failures

The rescue block is not just for rollback; it's also for notification and cleanup. You can use ansible.builtin.uri to call webhooks, community.general.mail to send emails, or community.general.slack to notify teams. For cleanup, use ansible.builtin.file to remove temporary files, or ansible.builtin.service to stop services. Example:

```yaml
- name: Deploy with notification
  block:
    - name: Deploy app
      ansible.builtin.copy:
        src: /tmp/app.war
        dest: /opt/tomcat/webapps/

    - name: Restart tomcat
      ansible.builtin.service:
        name: tomcat
        state: restarted
  rescue:
    - name: Notify failure
      ansible.builtin.uri:
        url: "https://hooks.slack.com/services/T00/B00/xxx"
        method: POST
        body_format: json
        body:
          text: "Deployment failed on {{ inventory_hostname }}"
      ignore_errors: yes

    - name: Clean up deployed file
      ansible.builtin.file:
        path: /opt/tomcat/webapps/app.war
        state: absent
      ignore_errors: yes
  always:
    - name: Remove temp files
      ansible.builtin.file:
        path: /tmp/deploy_temp
        state: absent
```

Note the `ignore_errors: yes` on rescue tasks: if the notification fails, you don't want that to compound the failure. Also, the always block runs even if rescue fails. This pattern is essential for maintaining observability and cleanliness in production.

Use ignore_errors in rescue tasks

If a rescue task fails, the play will fail, potentially masking the original error. Add ignore_errors: yes to non-critical rescue tasks like notifications.

Production Insight

We had a deployment that failed because of a missing dependency. The rescue block not only rolled back the deployment but also sent a message to our incident channel with the exact error. This allowed the on-call engineer to quickly diagnose and fix.

Key Takeaway

Rescue blocks should both remediate and notify; always include ignore_errors on notification tasks to avoid secondary failures.

Combining Error Handling Directives: A Production Pattern

In real playbooks, you'll combine multiple directives. Here's a pattern for a rolling update with error handling:

```yaml
- name: Rolling update with error handling
  hosts: webservers
  serial: 10
  max_fail_percentage: 20
  any_errors_fatal: false
  tasks:
    - name: Pre-check
      block:
        - name: Check disk space
          ansible.builtin.shell: df / | awk 'NR==2 {print $5}' | sed 's/%//'
          register: disk_usage
          changed_when: false  # read-only check
          failed_when: disk_usage.stdout | int > 90

        - name: Check service health
          ansible.builtin.uri:
            url: http://localhost:80/health
            status_code: 200
          register: health
          ignore_errors: yes  # advisory only; an ignored failure does not trigger rescue
      rescue:
        - name: Report failed pre-check
          ansible.builtin.debug:
            msg: "Host {{ inventory_hostname }} failed pre-check, skipping"

        - name: Notify
          ansible.builtin.uri:
            url: https://hooks.slack.com/...
            method: POST
            body: '{"text":"Pre-check failed on {{ inventory_hostname }}"}'
          ignore_errors: yes
      always:
        - name: Log check result
          ansible.builtin.copy:
            # default() guards against disk_usage having no stdout
            content: "{{ disk_usage.stdout | default('unknown') }}"
            dest: /var/log/precheck.log
          ignore_errors: yes
```

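This pattern checks prerequisites, reports and notifies when they fail, and logs the result. Note that a rescue that completes successfully clears the failure: the host carries on with any later tasks, and the rescued failure does not count toward max_fail_percentage. To actually drop the host from the rest of the play, end the rescue with ansible.builtin.meta: end_host. The play continues with other hosts, and max_fail_percentage still aborts the rollout if too many hosts hit failures that are not rescued. This is a robust pattern for large fleets.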
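Skipping a host after a failed pre-check can be made explicit with meta: end_host in the rescue section. This is a trimmed-down sketch of the play above; the final deploy task is a placeholder.

```yaml
- name: Pre-check and drop hosts that fail it
  hosts: webservers
  tasks:
    - name: Pre-check
      block:
        - name: Check disk space
          ansible.builtin.shell: df / | awk 'NR==2 {print $5}' | sed 's/%//'
          register: disk_usage
          changed_when: false
          failed_when: disk_usage.stdout | int > 90
      rescue:
        - name: Report the skip
          ansible.builtin.debug:
            msg: "Host {{ inventory_hostname }} failed pre-check, skipping"

        # Ends the play for this host only; other hosts keep going
        - name: End the play for this host
          ansible.builtin.meta: end_host

    - name: Deploy (placeholder; runs only on hosts that passed the pre-check)
      ansible.builtin.debug:
        msg: "Deploying to {{ inventory_hostname }}"
```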

Order of precedence

ignore_errors overrides failed_when. rescue runs only if a task in block fails. always runs regardless. any_errors_fatal overrides max_fail_percentage.

Production Insight

We used this pattern to deploy a new version of our API server. The pre-check verified that the database was reachable. If not, the host was skipped and we got a notification. This prevented a full rollout to a broken state.

Key Takeaway

Combine block/rescue/always with play-level directives for a comprehensive error handling strategy that scales.

Testing Error Handling: CI/CD Patterns

Error handling code is only as good as its test coverage. In CI, create test playbooks that intentionally fail to verify your error paths. Use ansible-playbook --syntax-check to catch syntax errors. For logic testing, use ansible-playbook --check --diff to see what would change. But for error handling, you need to actually trigger failures. We use molecule with scenarios that simulate failures:

```yaml
# molecule/default/verify.yml
- name: Verify error handling
  hosts: all
  tasks:
    - name: Trigger failure
      ansible.builtin.command: /bin/false
      register: result
      failed_when: result.rc != 0
```

Then assert that the rescue block ran. Another pattern: use ansible.builtin.fail module in test plays. For example, to test max_fail_percentage, run a playbook with multiple hosts and force failures on some. Use ansible-playbook --limit to target specific hosts. Also, use the -v flags to inspect error handling: -v prints each task's full result, including the failed_when_result field when failed_when is set. In production, we have a CI pipeline that runs a dedicated 'chaos' playbook that injects failures to validate our error handling.
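One way to assert that a rescue path actually ran is to set a fact inside rescue and check it afterwards. This is a sketch with an illustrative fact name (rescue_ran), not a fixed Molecule convention.

```yaml
- name: Verify that rescue runs on failure
  hosts: all
  gather_facts: false
  tasks:
    - name: Exercise the error path
      block:
        - name: Trigger failure
          ansible.builtin.command: /bin/false
      rescue:
        - name: Record that rescue ran
          ansible.builtin.set_fact:
            rescue_ran: true

    - name: Assert the rescue path was taken
      ansible.builtin.assert:
        that:
          - rescue_ran | default(false)
        fail_msg: "rescue block did not run"
```

If someone later removes the rescue section or changes the failure condition, the forced failure is no longer rescued and the test play fails immediately.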

Don't test error handling in production

Always test in a staging environment that mirrors production. Use dedicated test hosts or containers.

Production Insight

Our CI pipeline once passed despite a misconfigured failed_when because the test never triggered the failure condition. We added a step that explicitly forces the failure condition to validate the error path.

Key Takeaway

Intentionally trigger failures in CI to validate your error handling; don't assume it works because the happy path passes.

Common Pitfalls with ignore_errors and failed_when Interactions

The interaction between ignore_errors and failed_when can be confusing. Key rule: ignore_errors is evaluated after failed_when. So if both are set, the task is first evaluated for failure using failed_when. If failed_when returns true, the task is marked failed, but then ignore_errors causes the play to continue. The task output still shows 'failed' with 'ignoring'. This can mislead operators. A common pitfall is setting ignore_errors: yes on a task with failed_when thinking it will suppress the failure display. It doesn't. To truly suppress, use failed_when: false and no ignore_errors. But that's an anti-pattern. Better: use register and conditionals on subsequent tasks. Example:

```yaml
- name: Attempt to stop service
  ansible.builtin.service:
    name: myapp
    state: stopped
  register: stop_result
  ignore_errors: yes

- name: Handle failure
  ansible.builtin.debug:
    msg: "Service stop failed, continuing"
  when: stop_result is failed
```

This pattern is clearer than relying on ignore_errors alone. In production, we avoid ignore_errors on critical tasks; we use register and when to handle failures explicitly.

ignore_errors does not change the failed status

The task's 'failed' status remains true; ignore_errors only allows the play to continue. Use register and when for conditional logic.

Production Insight

We had a task that checked for a file using stat and used ignore_errors. The file didn't exist, so stat returned 'exists: false', but the task didn't fail. So ignore_errors was unnecessary. We removed it.

Key Takeaway

Prefer register + when over ignore_errors for conditional logic; use ignore_errors only when you truly want to ignore any failure and continue.

Error Handling in Loops: With_items and Failed Items

When using loops (e.g., with_items, loop), Ansible runs every iteration even if one fails, but a single failed item marks the whole task as failed and stops that host. To handle per-item failures, use ignore_errors: yes on the task, register the results, and then check them for failures. Example:

```yaml
- name: Install packages
  ansible.builtin.yum:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - bad-package
    - mysql
  ignore_errors: yes
  register: install_results

- name: Report failed packages
  ansible.builtin.debug:
    msg: "Package {{ item.item }} failed to install"
  loop: "{{ install_results.results | selectattr('failed', 'equalto', true) | list }}"
```

This pattern allows the play to continue and then process failures. In production, we use this for package installations where some packages might be unavailable. We then send a report of failed packages to a monitoring system.

Use loop_control to limit failures

You can use loop_control with pause to throttle, but for error handling, register the results and filter.

Production Insight

During a mass package update, one package had a dependency conflict. With ignore_errors on the loop, the play continued and we captured the failure. We then fixed the dependency and re-ran only the failed packages.

Key Takeaway

For loops, use ignore_errors at the task level and then inspect results for per-item failures; this gives you granular control without failing the host on a single bad item.

Error Handling Best Practices for Production Playbooks

Always register results for tasks that can fail, even if you use ignore_errors. This allows debugging later.
Use failed_when instead of ignore_errors when you have specific failure criteria.
Limit any_errors_fatal to critical deployments; use max_fail_percentage for rolling updates.
Test error paths in CI by forcing failures.
Document error handling decisions in comments, especially why a task is ignored.
Use block/rescue/always for any multi-step operation that needs cleanup.
Avoid nested blocks; they complicate error handling.
Set changed_when: false on read-only tasks to avoid false change notifications.
Use ansible.builtin.fail in rescue blocks to re-raise failures after cleanup if needed.
Monitor for ignored failures; they can hide real issues. Use a post-play hook to check for ignored tasks.

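Ansible does not expose a built-in variable that lists ignored failures; the play recap only shows an ignored count per host. One workable post-play check is to register each task that uses ignore_errors and fail at the end of the play if any of them actually failed (stop_result here stands for whatever name you registered):

```yaml
- name: Check for ignored failures
  ansible.builtin.fail:
    msg: "Ignored failure on {{ inventory_hostname }}: {{ stop_result.msg | default('no message') }}"
  when: stop_result is defined and stop_result is failed
```

This is a simplified example; in practice, you'd need to aggregate across tasks and hosts.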

Ansible lint can catch some issues

Run ansible-lint on your playbooks; rules such as ignore-errors and no-changed-when flag blanket ignore_errors and commands that lack a changed_when.

Production Insight

We adopted a policy that every task that uses ignore_errors must have a comment explaining why. This reduced accidental masking of failures by 80%.

Key Takeaway

Discipline in error handling is a force multiplier; document, test, and monitor your error handling logic.

● Production incident · Post-mortem · Severity: high

The Stale Lock File Incident

Symptom

Ansible playbook failed on multiple hosts with 'lock file exists' error, and the play stopped after the first batch of 5 hosts.

Assumption

The engineer assumed the lock file would be cleaned up by a previous run, and that failure would just skip the host.

Root cause

The task that checked for the lock file had ignore_errors: yes removed during a refactor, and any_errors_fatal was set to true on the play. The lock file was stale but harmless.

Fix

Re-added ignore_errors: yes to the lock file check task, and changed any_errors_fatal to false. Also added a rescue block to delete the lock file if the deployment failed.

Key lesson

Never assume a failure is safe; explicitly declare error handling intent.
Use ignore_errors for non-critical checks, and always test with a stale state.

Production debug guide: Symptom → Root cause → Fix (4 entries)

Symptom · 01

Task fails but play continues, and you see '...ignoring' in output

→

Fix

Check if ignore_errors: yes is set. If not intended, remove it. If intended, verify the task's failure condition is correct.

Symptom · 02

Task succeeds but play reports 'failed'

→

Fix

Check failed_when condition. It might be evaluating to true on success. Example: failed_when: result.rc != 0 but command returns 1 on success. Fix by adjusting condition.

Symptom · 03

Task reports 'changed' when nothing changed

→

Fix

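Add changed_when: false for read-only tasks, or set a condition tied to what the command actually did, like changed_when: 'updated' in result.stdout. A condition such as result.rc == 0 does not help if the command always returns 0.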

Symptom · 04

Play stops on first host failure despite serial being set

→

Fix

Check if any_errors_fatal: true is set on the play. Set it to false or remove it. Also check for max_fail_percentage: 0 which acts similarly.

★ Ansible Error Handling Quick Reference (print this for your desk)

Task fails but should be ignored

Immediate action

Add ignore_errors: yes to the task

Commands

ansible-playbook playbook.yml --check

Fix now

ignore_errors: yes


Error Handling Directives Comparison

Directive	Scope	Effect on Failure	Use Case
ignore_errors	Task	Continues play, marks task as 'failed...ignoring'	Non-critical checks, e.g., optional service stop
failed_when	Task	Overrides failure condition	Commands with non-standard exit codes
changed_when	Task	Overrides changed status	Commands and scripts that would otherwise always report 'changed'
block/rescue/always	Block	Try-catch-finally for task group	Multi-step operations needing rollback/cleanup
any_errors_fatal	Play	Stops entire play on any failure	Critical deployments where consistency is mandatory
max_fail_percentage	Play	Aborts if failure % exceeds threshold	Rolling updates with tolerance for transient failures

Key takeaways

Use ignore_errors sparingly; prefer failed_when for custom failure conditions.

block/rescue/always is the only way to implement try-catch-finally in Ansible.

any_errors_fatal stops the entire play; combine with serial to limit blast radius.

max_fail_percentage is per-batch, not global; set based on your failure tolerance.

Always register results from tasks that might fail for later inspection.

Test error handling paths in CI by intentionally triggering failures.

Symptom

Rescue task fails, causing the play to fail, masking the original error

Fix

Add ignore_errors: yes to non-critical rescue tasks like notifications

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): What is the difference between ignore_errors and failed_when?

Q02 (Senior): How does block/rescue/always work in Ansible? Provide an example.

Q03 (Senior): What is the purpose of any_errors_fatal and max_fail_percentage?

Q04 (Senior): How can you handle failures in a loop (with_items) without stopping the ...

Q05 (Senior): What is a common pitfall when using rescue blocks for cleanup?

Q06 (Senior): Explain the interaction between ignore_errors and failed_when.

Q07 (Senior): How can you test error handling in Ansible playbooks?

Q08 (Senior): What is the default value of max_fail_percentage and how does it behave?

Answer to Q01:

ignore_errors tells Ansible to continue executing subsequent tasks even if the current task fails; the task is still marked as failed in output. failed_when overrides the condition that determines failure, allowing you to define custom failure criteria based on return code, stdout, etc. They can be used together, but ignore_errors takes precedence after failed_when evaluation.

