Ansible Command vs Shell: The Gotcha That Broke Our Prod Deploy
Master Ansible command, shell, raw, and script modules.
20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.
Use command for simple executables — no shell expansion, no pipes, no env vars.
Use shell only when you need pipes, redirects, or $HOME — but never with user input to avoid injection.
Use raw to bootstrap hosts without Python (e.g., initial Python install via raw: yum install -y python3).
Use script to run a local script on the remote host — ideal for complex logic that would be messy inline.
Always use creates or removes flags to make command/shell tasks idempotent — without them, they run every playbook execution.
Avoid shell for package installation, file operations, or service management — use yum, copy, service modules instead for idempotency and error handling.
The command module does NOT expand $HOME or ~ — use shell or pass $HOME explicitly via environment.
For raw output debugging, use register: result and debug: var=result.stdout_lines.
Imagine you're a chef in a busy kitchen. The command module is like giving a direct order to a line cook: "Chop these onions." The cook uses a knife (the executable) and does exactly what you say — no shortcuts, no interpretation. The shell module is like telling a sous chef: "Prepare the mise en place for the onion soup." The sous chef decides to use a food processor, a mandoline, or even delegate to a junior cook (piping, redirection, variable expansion). That flexibility is powerful but dangerous — if you say "use the sharpest knife" and the sous chef interprets it as "use the cleaver," you might get chunks instead of dice. The raw module is like shouting an order through a kitchen window when the line cook doesn't speak your language — it bypasses all normal communication channels (no Python on the host). And the script module is like handing the cook a printed recipe card from your own notebook — you wrote it locally, but they execute it remotely. The key lesson: use the simplest tool that gets the job done, and always add a "check if the onions are already chopped" (creates/removes flags) to avoid chopping them twice.
I still remember the 2 AM call. Our deployment pipeline had been running smoothly for weeks, but that night, a new Ansible playbook — written by a well-meaning junior engineer — reformatted all our application servers. The culprit? A shell task that ran mkfs.ext4 /dev/xvdf without checking if the filesystem already existed. The creates flag was missing. The result: 12 production servers wiped clean. That incident taught me the hard way that Ansible's command and shell modules are powerful but unforgiving. They're the sledgehammers of automation: effective but easy to cause collateral damage.
This article is about understanding the subtle but critical differences between Ansible's command, shell, raw, and script modules. These modules are often the first tools beginners reach for, but they're also the most misused. I'll show you exactly when to use each, how to make them idempotent with creates and removes flags, and — most importantly — when to avoid them entirely and use purpose-built modules like yum, copy, or service.
We'll cover real production incidents, debugging guides, and a quick-reference cheat sheet you can print and tape to your monitor. By the end, you'll never accidentally wipe a filesystem again — and you'll know exactly why command doesn't expand $HOME and why that matters.
Command vs Shell: The Fundamental Difference
The command module runs a command directly, without a shell. This means no shell expansion — no $HOME, no ~, no wildcards, no pipes. It's the safest option because it avoids shell injection vulnerabilities and is more predictable. The shell module, on the other hand, runs the command through /bin/sh (or the configured shell via executable parameter), so it supports all shell features.
When to use command:
- Running a binary with arguments: `command: /usr/bin/curl -s http://example.com`
- Creating a directory: `command: mkdir -p /data`
- Checking a file: `command: test -f /etc/passwd`
When to use shell:
- Piping output: `shell: cat /var/log/messages | grep error`
- Using environment variables: `shell: echo $HOME`
- Redirection: `shell: echo hello > /tmp/hello.txt`
Production gotcha: Never use shell with user input. If you must, sanitize with | quote filter. Example: shell: grep {{ pattern | quote }} /var/log/syslog.
Code example:
```yaml
- name: Create directory using command (safe)
  ansible.builtin.command:
    cmd: mkdir -p /opt/myapp/data
    creates: /opt/myapp/data

- name: Get current user's home using shell (needs expansion)
  ansible.builtin.shell:
    cmd: echo $HOME
  register: home

- name: Debug home
  ansible.builtin.debug:
    var: home.stdout
```
If you run `command: echo $HOME`, it will literally print $HOME because there's no shell to expand it. Use shell or pass the environment variable explicitly via the environment parameter. Newer ansible-core releases document an expand_argument_vars option on command that affects this, so confirm the behavior on the version you run.
One playbook ran `command: cp /tmp/{{ item }} /opt/` inside a loop. The files were not being copied because * wasn't expanded. We switched to shell with `cmd: cp /tmp/{{ item }} /opt/` and it worked. But then we had injection risks. The proper fix was to use the copy module.
Use command for simple executables; use shell only when you need shell features, and never with untrusted input.
The Raw Module: Bootstrapping Python-Less Hosts
The raw module is the escape hatch. It sends the command over SSH and executes it directly, without requiring Python on the target. This is critical for initial provisioning of minimal systems (e.g., Docker containers, embedded devices, or freshly installed OS images that lack Python).
When to use raw:
- Installing Python on a host that has no Python: `raw: yum install -y python3`
- Running a one-off command on a constrained system: `raw: uptime`
- Debugging when Python modules fail: `raw: ls -la /tmp`
Limitations:
- Not idempotent: raw has no creates/removes parameters and reports 'changed' on every run
- No fact gathering (you can't use setup without Python)
- No error handling beyond return codes
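Production pattern:
```yaml
# Run this in a play with gather_facts: false, since fact gathering needs Python
- name: Bootstrap Python on minimal CentOS
  # raw takes the command as a free-form string rather than a cmd parameter
  ansible.builtin.raw: yum install -y python3
  become: yes
  changed_when: false  # raw doesn't track changes

- name: Now gather facts
  ansible.builtin.setup:
  # This will work because Python is now installed
```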
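Gotcha: raw does not use the Ansible module system, so it doesn't support args like creates. You have to implement idempotency manually, either inside the command itself (a shell test that skips the install when Python is already present) or with a when condition on a registered raw result. A stat-based check is not an option at this stage because stat requires Python. For bootstrapping, this is usually acceptable because the task is needed only once per host.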
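The manual idempotency described above can live in the command line itself. The following is a minimal sketch, assuming a yum-based host whose remote shell supports `command -v`. The inventory group `new_hosts` and the "Complete!" marker in yum's output are assumptions for illustration, not something the raw module provides.

```yaml
- name: Bootstrap Python without relying on Python
  hosts: new_hosts          # hypothetical inventory group
  gather_facts: false       # fact gathering needs Python, which is not there yet
  become: true
  tasks:
    - name: Install python3 only when it is missing
      # The shell test short-circuits, so yum runs only if python3 is absent
      ansible.builtin.raw: command -v python3 >/dev/null 2>&1 || yum install -y python3
      register: bootstrap
      # Assumption: yum prints "Complete!" after a successful install
      changed_when: "'Complete!' in bootstrap.stdout"

    - name: Gather facts now that Python is available
      ansible.builtin.setup:
```

Because the guard runs on the remote side, a second run should skip the install and report 'ok' rather than 'changed'.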
The raw module (along with script, which also does not require Python on the target) is one of the few ways to execute commands on a host without Python. However, it also means no regular Ansible modules are available. Use it sparingly and only for bootstrapping.
On devices that shipped without Python, we used a raw task to install Python, then ran the rest of the playbook. The raw task was the only way to get Ansible working on those devices.
Use raw only to bootstrap Python on hosts that lack it; then switch to normal modules.
The Script Module: Running Local Scripts Remotely
The script module copies a local script (from the control node) to the remote host and executes it. It's ideal for complex logic that would be messy as an inline command, or when you need to reuse a script across multiple tasks.
When to use script:
- Running a multi-line bash script that's too long for shell
- Executing a Python script that needs to run on the remote host
- When you want to version-control the script alongside your playbook
How it works:
1. Ansible transfers the script to a temporary directory on the remote host.
2. The script is run there as an executable file, so its shebang line (or the executable parameter, if set) selects the interpreter.
3. The temporary files are cleaned up after execution.
Idempotency with creates:
```yaml
- name: Run setup script only if flag file missing
  ansible.builtin.script:
    cmd: /opt/ansible/scripts/setup.sh
    creates: /opt/app/.setup_done
```
Passing arguments:
```yaml
- name: Run script with arguments
  ansible.builtin.script:
    cmd: /opt/ansible/scripts/deploy.sh --env production --version 1.2.3
```
Production gotcha: The script must exist on the control node. If you use a relative path, it's relative to the playbook directory. Also, the script is executed with the remote user's shell, so ensure it has proper shebang (#!/bin/bash).
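If the script should not depend on its shebang, the script module's executable parameter can name the interpreter, and chdir sets the working directory on the remote host. A minimal sketch, assuming a hypothetical local script `scripts/report.py` and made-up remote paths:

```yaml
- name: Run a local Python script on the remote host
  ansible.builtin.script:
    cmd: scripts/report.py --format json    # hypothetical script, path relative to the playbook
    executable: /usr/bin/python3            # interpreter to use instead of the shebang
    chdir: /opt/app                         # hypothetical remote working directory
    creates: /opt/app/report.json           # skip when the output already exists
```

The creates guard here only helps if the script really writes that file, so keep the two in sync.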
For logic longer than a few lines, move it into a file and run it with the script module. It's easier to test, maintain, and version control.
Once we moved our inline logic into a file run by the script module, the playbook became clean, and we could test the script independently.
Use script to run local scripts remotely; it's cleaner than inline shell and supports idempotency via creates.
Idempotency with Creates and Removes Flags
The command, shell, and script modules are not idempotent by default — they run every time the playbook executes. To make them idempotent, use the creates and removes flags. These flags tell Ansible to skip the task if a specified file or directory exists (or does not exist).
creates: Skip if the file/directory exists.
```yaml
- name: Create data directory only if it doesn't exist
  ansible.builtin.command:
    cmd: mkdir -p /data
    creates: /data
```
removes: Skip if the file/directory does NOT exist.
```yaml
- name: Remove temp file only if it exists
  ansible.builtin.command:
    cmd: rm /tmp/tempfile
    removes: /tmp/tempfile
```
Combined with changed_when: Sometimes the command itself has a different success indicator. For example, grep returns 0 if found, 1 if not found. You can combine creates with failed_when or changed_when.
Production pattern:
```yaml
- name: Download archive if not already present
  ansible.builtin.shell:
    cmd: curl -o /tmp/package.tar.gz https://example.com/package.tar.gz
    creates: /tmp/package.tar.gz
  # Older playbooks added warn: false here to silence the curl warning;
  # that option was removed in ansible-core 2.14, so leave it out on current versions
```
Gotcha: creates and removes only work with files/directories, not with command output. If you need to check command output, use register and when.
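When the guard has to come from command output rather than a file, the register-and-when approach can be sketched as a read-only probe followed by a conditional task. The tool `/opt/app/bin/appctl`, its exit codes, and the "not initialized" text are hypothetical and only illustrate the shape:

```yaml
- name: Probe current state without changing anything
  ansible.builtin.command:
    cmd: /opt/app/bin/appctl status          # hypothetical tool
  register: app_status
  changed_when: false                        # a read-only probe never counts as a change
  failed_when: app_status.rc not in [0, 1]   # assumption: rc 1 means "not initialized"

- name: Initialize only when the probe says it is needed
  ansible.builtin.command:
    cmd: /opt/app/bin/appctl init
  when: "'not initialized' in app_status.stdout"
```

The second task shows as 'skipped' on hosts that are already initialized, which keeps repeat runs quiet.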
Why this matters: Without these flags, every task reports 'changed' on every run, making it impossible to know what actually changed. Worse, destructive commands (like rm -rf) run every time, potentially causing data loss.
Always add creates or removes to prevent accidental re-execution. Otherwise, your playbook is not idempotent.
One of our playbooks ran useradd without creates. Every run attempted to create the user, and Ansible reported 'changed' each time. Worse, if the user already existed, useradd would return non-zero and fail the playbook. The fix was to use the user module instead, which is idempotent by default.
Always add creates or removes to command/shell/script tasks to make them idempotent and safe for repeated runs.
When to Avoid Shell and Use Purpose-Built Modules
One of the most common mistakes beginners make is using shell for everything. But Ansible has a rich set of modules that are idempotent, handle edge cases, and provide clear error messages. Using shell for tasks that have a dedicated module is like using a sledgehammer to hang a picture — it works, but it's risky and ugly.
Examples of purpose-built modules to use instead of shell:
- Package management: yum, apt, dnf, pip instead of `shell: yum install -y nginx`
- File operations: copy, template, file, lineinfile instead of `shell: cp /src /dst` or `shell: echo 'line' >> /etc/config`
- Service management: service, systemd instead of `shell: systemctl restart nginx`
- User management: user, group instead of `shell: useradd bob`
- Command execution with output: command with register instead of `shell: echo $(command)`
Why purpose-built modules are better:
- Idempotent by design (e.g., yum only installs if not already installed)
- Better error handling (e.g., copy checks checksums)
- Accurate change reporting (changed vs ok)
- No shell injection risk
- Easier to read and maintain
Production example:
```yaml
# Bad: shell for package install
- name: Install nginx
  ansible.builtin.shell:
    cmd: yum install -y nginx  # Not idempotent: reports 'changed' on every run

# Good: use yum module
- name: Install nginx
  ansible.builtin.yum:
    name: nginx
    state: present
```
When is shell acceptable?
- When no dedicated module exists (e.g., running a custom tool)
- When you need pipes, redirects, or complex shell features
- When you're prototyping and plan to refactor later
Rule of thumb: If you can find an Ansible module for the task, use it. If not, consider writing a custom module or using command with creates.
Run `ansible-doc -l | grep <topic>` to find out whether a module exists for your task. For example, `ansible-doc -l | grep user` shows the user module.
One of our playbooks was dominated by shell tasks, many of which were doing simple file operations. After refactoring to use copy, template, and lineinfile, the playbook became idempotent, faster, and much easier to debug. The number of 'changed' tasks dropped from 50 to 5 on subsequent runs.
Environment Variables and Working Directory
Both command and shell modules support environment and chdir parameters. environment sets environment variables for the command, and chdir changes the working directory before execution.
Setting environment variables:
```yaml
- name: Run command with custom environment
  ansible.builtin.command:
    cmd: /usr/bin/env
  environment:
    MY_VAR: my_value
    PATH: "/usr/local/bin:{{ ansible_env.PATH }}"
```
Changing working directory:
```yaml
- name: Run script from specific directory
  ansible.builtin.shell:
    cmd: ./deploy.sh
    chdir: /opt/myapp
```
Production gotcha: chdir applies only to the task it is set on; the next task starts from the default directory again. If you need to run multiple commands in the same directory, consider using shell with cd /path && command1 && command2, or use the script module with a script that changes directory.
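One way to keep several commands in the same directory is a single shell task with a multi-line command and chdir. This is a sketch only; the source directory and the installed binary path are hypothetical:

```yaml
- name: Build and install from one directory
  ansible.builtin.shell:
    cmd: |
      set -e          # stop at the first failing command
      ./configure
      make
      make install
    chdir: /opt/src/mytool                 # hypothetical source directory
    creates: /usr/local/bin/mytool         # hypothetical installed binary
```

All four lines run in one shell process, so they share the working directory set by chdir.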
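Using environment with creates:
```yaml
- name: Download file with env var
  ansible.builtin.command:
    cmd: curl -O https://example.com/file
    chdir: /tmp            # curl -O writes into the working directory
    creates: /tmp/file
  environment:
    http_proxy: http://proxy:8080
    https_proxy: http://proxy:8080   # curl reads this one for https:// URLs
```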
Note on shell and environment: When using shell, environment variables set via environment are available to the shell. However, the shell may also inherit environment variables from the remote user's profile. Use environment to override or add variables.
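To check what the remote shell actually sees, a small probe task can help. The variable name `APP_ENV` and its value are made up for this sketch:

```yaml
- name: Show that the shell sees a task-level variable
  ansible.builtin.shell:
    cmd: echo "$APP_ENV"          # expanded by the remote shell
  environment:
    APP_ENV: staging              # hypothetical variable and value
  register: env_check
  changed_when: false             # read-only probe

- name: Print the value the shell saw
  ansible.builtin.debug:
    var: env_check.stdout
```

If the printed value differs from what the task sets, something in the remote environment is overriding it, and that is worth finding before relying on the variable.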
With command, environment variables are passed directly to the executable. In shell, they are available to the shell and can be used in the command string (e.g., echo $MY_VAR).
One of our tasks needed a specific JAVA_HOME. Using environment with command worked perfectly. But when we tried to use shell with echo $JAVA_HOME, it printed the remote user's default Java home because the shell expanded it before our env var took effect. We had to set JAVA_HOME in the shell command itself: `shell: JAVA_HOME=/usr/lib/jvm/java-11; echo $JAVA_HOME`.
Use chdir and environment to control the execution context of command and shell tasks.
Error Handling and Return Codes
By default, Ansible considers a command successful if it returns exit code 0. Any non-zero exit code causes the task to fail. You can override this with failed_when and changed_when.
Ignoring errors:
```yaml
- name: Try to stop service (might not be running)
  ansible.builtin.command:
    cmd: systemctl stop myapp
  ignore_errors: yes
  register: result
```
Custom failure condition:
```yaml
- name: Check if file contains pattern
  ansible.builtin.shell:
    cmd: grep 'pattern' /var/log/app.log
  register: grep_result
  failed_when: grep_result.rc not in [0, 1]
  changed_when: false
```
Using stdout and stderr:
```yaml
- name: Run script and capture output
  ansible.builtin.shell:
    cmd: /opt/scripts/deploy.sh
  register: deploy_result
  failed_when: deploy_result.rc != 0 or 'ERROR' in deploy_result.stderr
```
Production pattern for idempotent command with custom check:
```yaml
- name: Initialize database if not already initialized
  ansible.builtin.shell:
    cmd: /opt/app/bin/init_db.sh
  register: init_result
  changed_when: '"already initialized" not in init_result.stdout'
  failed_when: init_result.rc != 0
```
Gotcha: ignore_errors still marks the task as 'failed' in the output but continues the playbook. Use failed_when for more control.
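As a contrast with ignore_errors, a failed_when condition can accept specific exit codes while still failing on anything unexpected. The tool and its exit codes below are assumptions for illustration:

```yaml
- name: Stop the app, tolerating "already stopped"
  ansible.builtin.command:
    cmd: /opt/app/bin/appctl stop        # hypothetical tool
  register: stop_result
  # Assumption: rc 0 means it was stopped now, rc 3 means it was not running
  failed_when: stop_result.rc not in [0, 3]
  changed_when: stop_result.rc == 0      # only a real stop counts as a change
```

With this shape the task shows 'ok' rather than a red 'failed ... ignoring' line when the app was already down.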
Register the result and use debug: var=result.stdout_lines to see output. This is invaluable for troubleshooting.
For commands such as grep that return 1 when nothing matches, we set failed_when: result.rc not in [0, 1] to allow exit code 1 as a valid success indicator.
Use failed_when and changed_when to handle non-standard exit codes and output-based success conditions.
Security: Shell Injection and Avoiding Unsafe Practices
Using shell with user input is a recipe for shell injection. If you must use shell with dynamic values, sanitize them with the quote filter.
Unsafe:
```yaml
- name: Dangerous shell injection
  ansible.builtin.shell:
    cmd: grep {{ user_input }} /var/log/syslog
```
Safe with quote filter:
```yaml
- name: Safe grep
  ansible.builtin.shell:
    cmd: grep {{ user_input | quote }} /var/log/syslog
```
Better: use command with argv (cmd and argv cannot be combined, so pass the whole command as the list):
```yaml
- name: Safe grep with command
  ansible.builtin.command:
    argv:
      - grep
      - "--"                  # end of options, so a pattern starting with - is not read as a flag
      - "{{ user_input }}"
      - /var/log/syslog
```
Using argv with a full path (avoids shell entirely):
```yaml
- name: Execute with no shell
  ansible.builtin.command:
    argv:
      - /usr/bin/curl
      - "-s"
      - "http://example.com/{{ endpoint | urlencode }}"
```
Production rule: Never use shell with variables that come from user input, facts, or external sources. Use command with argv or the quote filter.
Additional security:
- Use become only when necessary
- Avoid using shell to run commands that change system state
- Use no_log: true if the command contains sensitive data (e.g., passwords)
- On older Ansible versions, set warn: false only if you understand the risk; the option was removed in ansible-core 2.14
If you use shell with {{ user_input }} and that input comes from a web form or API, an attacker could inject arbitrary commands. Always sanitize with | quote or use command with argv.
One of our playbooks passed a variable straight into a shell task to ping the host. One day, the variable was set to ; rm -rf /. Luckily, the playbook ran on a non-production host, but it was a wake-up call. We switched to command with argv and never looked back.
Prevent shell injection by using command with argv or sanitizing with the quote filter when you must use shell.
Performance and Best Practices for Large-Scale Deployments
In large environments with hundreds of hosts, the overhead of running shell commands can add up. Here are some performance tips:
1. Use pipelining to reduce SSH connections. Set pipelining: True in ansible.cfg to combine multiple module operations into fewer SSH connections. This speeds up command execution significantly.
2. Avoid shell for simple commands. command is faster because it doesn't invoke a shell. For example, command: /usr/bin/mkdir -p /tmp is faster than shell: mkdir -p /tmp.
3. Use async for long-running commands.
```yaml
- name: Long running script
  ansible.builtin.shell:
    cmd: /opt/long_script.sh
  async: 3600
  poll: 30
  register: long_result
```
4. Set changed_when: false to reduce noise. If a command is not idempotent but you only care about failures, set changed_when: false to avoid cluttering the output.
5. Use run_once to run a command on a single host instead of all (add delegate_to if it must be a specific host).
```yaml
- name: Run once on first host
  ansible.builtin.command:
    cmd: /usr/bin/update_global_config
  run_once: true
```
6. Combine with serial for rolling updates.
```yaml
- name: Rolling restart
  hosts: webservers
  serial: 5
  tasks:
    - name: Restart nginx
      ansible.builtin.command:
        cmd: systemctl restart nginx
```
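For the async tip above, a fire-and-forget variant starts the job with poll: 0 and checks on it later with async_status. This is a sketch; the retry count and delay are arbitrary values chosen to roughly match the one-hour async window:

```yaml
- name: Start the long job in the background
  ansible.builtin.command:
    cmd: /opt/long_script.sh
  async: 3600        # allow up to one hour
  poll: 0            # do not wait here
  register: long_job

- name: Wait for the background job to finish
  ansible.builtin.async_status:
    jid: "{{ long_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 120       # arbitrary: 120 checks x 30 seconds
  delay: 30
```

Other tasks can be placed between the two so the host does useful work while the job runs.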
Production insight: In a 500-host environment, switching from shell to command for simple tasks reduced playbook runtime by 15% because we avoided shell overhead. Combined with pipelining, we saw a 30% improvement.
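Pipelining can also be switched on through inventory variables rather than ansible.cfg. A sketch of a group variables file, assuming the SSH connection plugin and a hypothetical group named webservers:

```yaml
# group_vars/webservers.yml (hypothetical file)
ansible_ssh_pipelining: true   # connection variable read by the ssh connection plugin
```

Scoping it to a group lets you enable pipelining only where the sudoers configuration is known to allow it.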
Pipelining requires that requiretty is disabled for the remote user in /etc/sudoers. Otherwise, you'll get 'sudo: sorry, you must have a tty to run sudo' errors.
By using command instead of shell where possible, and adding async for long tasks, we reduced runtime to 45 minutes.
Improve performance by preferring command over shell, enabling pipelining, and using async for long-running tasks.
Debugging Common Errors with Command and Shell
Here are the most common errors and how to fix them:
Error: 'MODULE FAILURE' or 'Timeout (12s)'
- Root cause: Command took too long. Increase timeout or use async.
- Fix: Set the task-level timeout keyword, e.g. timeout: 300 on the `command: /usr/bin/long_running` task.
Error: 'rc: 127' command not found
- Root cause: The executable is not in the PATH or not installed.
- Fix: Use full path or install the package first.
Error: 'Syntax error: Unterminated quoted string'
- Root cause: Mismatched quotes in shell command.
- Fix: Use YAML multiline syntax or escape quotes properly.
Error: 'The module command is not available'
- Root cause: Using shell module on a host without Python (should use raw).
- Fix: Use raw module for bootstrap.
Error: 'changed: [host] => {'changed': True}' when nothing changed
- Root cause: No creates/removes flag.
- Fix: Add creates or removes.
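Two of the fixes above, YAML multiline syntax for quoting problems and a task-level timeout, can be sketched in one task. The log path and search string are made up for the example:

```yaml
- name: Count error lines without fighting YAML quoting
  ansible.builtin.shell:
    # A literal block (|) passes the text to the shell as written,
    # so inner quotes need no extra escaping
    cmd: |
      grep -c "connection 'reset'" /var/log/app.log
  register: error_count
  changed_when: false
  failed_when: error_count.rc not in [0, 1]   # grep returns 1 when nothing matches
  timeout: 300    # task keyword: fail the task if it runs longer than 300 seconds
```

The literal block is usually the simplest way to avoid the unterminated-quote error, because YAML never sees the quotes as its own.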
Debugging steps:
1. Add -vvv to ansible-playbook command to see SSH output.
2. Use register and debug to see stdout/stderr.
3. Run the command manually on the target host to verify.
4. Check if the command requires a shell feature (pipes, etc.) and switch to shell if needed.
Example debugging playbook:
```yaml
- name: Debug command
  hosts: all
  tasks:
    - name: Run command
      ansible.builtin.command:
        cmd: /usr/bin/curl -I http://localhost
      register: result
      ignore_errors: yes

    - name: Print stdout
      ansible.builtin.debug:
        var: result.stdout_lines

    - name: Print stderr
      ansible.builtin.debug:
        var: result.stderr_lines

    - name: Print rc
      ansible.builtin.debug:
        var: result.rc
```
Running ansible-playbook -vvv playbook.yml shows the exact SSH commands being executed, including the command string. This is invaluable for diagnosing syntax errors or path issues.
Advanced Patterns: Combining Modules for Complex Workflows
In real production, you often need to combine command/shell with other modules. Here are some advanced patterns:
1. Conditional execution based on previous command output:
```yaml
- name: Check if service is running
  ansible.builtin.command:
    cmd: systemctl is-active myapp
  register: service_status
  changed_when: false
  failed_when: service_status.rc not in [0, 3]  # 0=active, 3=inactive

- name: Restart service if not running
  ansible.builtin.service:
    name: myapp
    state: restarted
  when: service_status.rc != 0
```
2. Using loop with command:
```yaml
- name: Create multiple directories
  ansible.builtin.command:
    cmd: mkdir -p /data/{{ item }}
    creates: /data/{{ item }}
  loop:
    - app1
    - app2
    - app3
```
3. Combining script with creates and register:
```yaml
- name: Run migration script
  ansible.builtin.script:
    cmd: /opt/ansible/scripts/migrate.sh --version {{ version }}
    creates: /opt/app/.migrated-{{ version }}
  register: migrate_result
  failed_when: migrate_result.rc != 0 or 'ERROR' in migrate_result.stderr
```
4. Using environment with command for proxy settings:
```yaml
- name: Download package behind proxy
  ansible.builtin.command:
    cmd: curl -O http://example.com/package.rpm
    chdir: /tmp            # curl -O writes into the working directory
    creates: /tmp/package.rpm
  environment:
    http_proxy: http://proxy:8080
    https_proxy: http://proxy:8080
```
5. Idempotent database migration using shell with conditional:
```yaml
- name: Run database migration if not already applied
  ansible.builtin.shell:
    cmd: /opt/app/bin/migrate.sh
  register: migration
  changed_when: '"Migration applied" in migration.stdout'
  failed_when: migration.rc != 0
```
When you loop with creates, each iteration checks for the file independently. This ensures idempotency per item.
Migrating from Shell to Purpose-Built Modules: A Refactoring Guide
If you have a legacy playbook full of shell tasks, here's a systematic approach to refactor:
Step 1: Identify the task's goal.
- Installing a package? → yum/apt/pip
- Copying a file? → copy/template
- Managing a service? → service/systemd
- Creating a user? → user
- Running a command that doesn't fit any module? → Keep as command or shell
Step 2: Replace with module.
```yaml
# Before
- name: Install nginx
  ansible.builtin.shell:
    cmd: yum install -y nginx

# After
- name: Install nginx
  ansible.builtin.yum:
    name: nginx
    state: present
```
Step 3: Test idempotency. Run the playbook twice. The second run should show no 'changed' tasks for the refactored tasks.
Step 4: Remove creates/removes if they were used. Purpose-built modules handle idempotency internally.
Step 5: Update error handling. Remove custom failed_when if the module handles errors better.
Common refactoring patterns:
- `shell: echo 'line' >> /etc/config` → `lineinfile: path=/etc/config line='line'`
- `shell: cp /src /dst` → `copy: src=/src dest=/dst`
- `shell: chown user:group /file` → `file: path=/file owner=user group=group`
- `shell: systemctl enable nginx` → `service: name=nginx enabled=yes`
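Written out as full tasks, the patterns above might look like the sketch below. The paths and names are the placeholders from the list, and the remote_src setting is an assumption that the source file already lives on the managed host, as it would for cp:

```yaml
- name: Ensure a line is present (replaces echo 'line' >> /etc/config)
  ansible.builtin.lineinfile:
    path: /etc/config
    line: line

- name: Copy a file (replaces cp /src /dst)
  ansible.builtin.copy:
    src: /src
    dest: /dst
    remote_src: true   # assumption: /src is on the managed host, not the control node

- name: Set ownership (replaces chown user:group /file)
  ansible.builtin.file:
    path: /file
    owner: user
    group: group

- name: Enable a service at boot (replaces systemctl enable nginx)
  ansible.builtin.service:
    name: nginx
    enabled: true
```

Unlike the echo redirect, lineinfile adds the line only when it is missing, so repeated runs do not append duplicates.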
Production insight: We refactored a 2000-line playbook that was 80% shell tasks. After refactoring, the playbook was 1200 lines, ran 40% faster, and had zero 'changed' tasks on subsequent runs (down from 150).
Use ansible-doc -s yum to see a summary of the yum module, or ansible-doc yum for full documentation. This helps you find the right module for your task.
One of our shell tasks ran sed -i 's/old/new/g' /etc/config. We replaced it with the replace module: replace: path=/etc/config regexp='old' replace='new'. It was safer, idempotent, and easier to read.
The Great Filesystem Wipe of 2023
Assumption: the mkfs command would fail if the filesystem already existed, because the disk was already formatted.
Root cause: the shell task ran mkfs.ext4 /dev/xvdf without creates=/dev/xvdf or creates=/mnt/data. mkfs does not check for existing filesystems by default; it overwrites them.
Fix: we added creates=/mnt/data to the task, and changed from shell to the filesystem module: filesystem: fstype=ext4 dev=/dev/xvdf. We also added a pre-check using stat and register.
Lessons:
- Never use `shell` or `command` for destructive operations without an idempotency guard.
- Use purpose-built modules (like `filesystem`) that are idempotent by default.
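For reference, the module-based version of that fix can be sketched as follows. It assumes the community.general and ansible.posix collections are installed, and it reuses the device and mount point from the incident:

```yaml
- name: Create ext4 on the data volume only if it has no filesystem
  community.general.filesystem:   # assumption: community.general collection is installed
    fstype: ext4
    dev: /dev/xvdf
    # force defaults to false, so an existing filesystem is left alone

- name: Mount the data volume
  ansible.posix.mount:            # assumption: ansible.posix collection is installed
    path: /mnt/data
    src: /dev/xvdf
    fstype: ext4
    state: mounted
```

The important property is that nothing here formats a device that already holds a filesystem unless someone explicitly sets force.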
Task reports 'changed' on every run: missing creates/removes flag. Fix: Add creates=/path/to/file if the command creates a file, or removes=/path/to/file if it removes one. Example: `- command: /usr/bin/touch /tmp/lock creates=/tmp/lock`.
Path with ~ or $HOME not found: the command module does not expand ~ or $HOME. Fix: Use shell module or pass the full path. Example: `- command: cat /home/user/file` instead of `cat ~/file`.
Pipes or redirects not working: Fix: Use the shell module and ensure the command is a single line or use | for multiline. Example: `- shell: cat /var/log/messages | grep error`.
Modules fail on a host without Python: Fix: Use raw to install Python first. Example: `- raw: yum install -y python3` then `- setup:` (gather facts).
Quick checks:
- `ansible-doc -t module command | grep -A5 creates`
- `grep -r 'creates:' site.yml`
- Add creates=/path/to/output to the task
Key takeaways
- Use command for simple executables; use shell only when you need shell features like pipes, redirects, or environment variable expansion.
- Always add creates or removes flags to command/shell/script tasks to make them idempotent and safe for repeated runs.
- Use raw only to bootstrap Python on hosts that lack it; then switch to normal modules for idempotency and error handling.
- Use script to run local scripts remotely, with idempotency via creates.
- Avoid shell for package installation, file operations, or service management; use yum, copy, or service instead.
- Sanitize input with | quote or use command with argv to prevent shell injection.
- Use register and debug to capture and inspect command output for troubleshooting.
- Prefer command over shell for performance in large deployments.
Common mistakes to avoid
6 patterns
- Using `shell` for simple commands that don't need shell features. Fix: use the command module instead.
- Forgetting `creates` or `removes` flags. Fix: add creates=/path or removes=/path to make it idempotent.
- Using `command` with environment variable expansion (e.g., `$HOME`). Symptom: the literal $HOME instead of the value. Fix: use the shell module or pass the environment variable via the environment parameter.
- Using `shell` with user input without sanitization. Fix: use command with argv or sanitize with the | quote filter.
- Using `raw` for regular tasks when Python is available. Fix: use command or shell instead.
- Using `shell` for package installation, file copy, etc., instead of purpose-built modules. Fix: use yum, copy, service, etc.
Interview Questions on This Topic
What is the difference between Ansible's `command` and `shell` modules?
The command module runs a command directly without a shell, so it does not support shell expansions like $HOME, pipes, or redirects. It's safer and faster. The shell module runs the command through /bin/sh, supporting all shell features. Use command for simple executables and shell only when you need shell features.