Ansible Network Automation: Surviving Idempotency with network_cli & NAPALM
Master Ansible network automation with network_cli, ios_config, cli_command, NAPALM, and ansible-vault.
20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.
Use connection: ansible.netcommon.network_cli for SSH-based network modules; never use local connection.
For Cisco IOS, use cisco.ios.ios_config for idempotent config pushes and cisco.ios.ios_command for show commands.
Network-agnostic modules ansible.netcommon.cli_config and ansible.netcommon.cli_command work across vendors when used with appropriate platform defaults.
NAPALM integration via community.network.napalm_* modules requires napalm Python library and connection: community.network.napalm.
Idempotency fails if you use ios_command to push config; always use ios_config with lines and parents.
Store device credentials in ansible-vault-encrypted group_vars; never in plaintext inventory.
For Junos/EOS/NXOS, use vendor-specific modules (junipernetworks.junos.junos_config, arista.eos.eos_config, cisco.nxos.nxos_config).
Always set ansible_network_os per host to avoid module selection errors.
Imagine you're a manager of a huge office building with hundreds of light switches. You want to automate turning them all off at night. Ansible is like a robot that can flip switches for you. But here's the catch: some switches are old and wonky—they don't always respond correctly. The robot needs a special way to talk to each type of switch (that's the connection plugin). And you don't want the robot to flip a switch that's already off (that's idempotency). NAPALM is like a universal remote that works with many brands of switches. And ansible-vault is a locked safe where you keep the master key to the switch room. This article teaches you how to program your robot to handle all these quirks without causing an electrical fire.
It's 2 AM on a Saturday. I'm on-call, and our network automation pipeline just pushed a bad VLAN config to 200 Cisco switches. The symptom: all trunk ports flapped, causing a 15-minute outage for the entire east coast data center. The root cause? A playbook using ios_command with commands: 'vlan 100' instead of ios_config with lines: 'vlan 100'. The ios_command module doesn't enforce idempotency—it blindly sends the command, and the switch accepted it even though the VLAN existed, causing a momentary interface reset. That night, I learned the hard way that network automation is not server automation. You can't just SSH in and run commands; you need structured modules that understand the device's state.
Historically, Ansible started as a server automation tool. The command and shell modules worked fine for Linux, but network devices are state machines with fragile CLI parsers. Early adopters used connection: local and raw SSH, which led to brittle playbooks. The Ansible network team responded with connection plugins like network_cli and vendor-specific modules. But even today, many engineers fall into the trap of treating network devices like Linux boxes.
This article covers the production patterns I've developed over 5 years of automating Cisco, Juniper, Arista, and Nexus gear. We'll dive into the network_cli connection plugin, the difference between ios_command and ios_config, network-agnostic modules, NAPALM integration, and how to avoid the idempotency landmines that will take down your network. I'll also show you how to secure credentials with ansible-vault and debug when things go wrong.
By the end, you'll know exactly which module to use for which task, how to structure your playbooks for reliability, and what to do when a switch doesn't respond as expected. Let's fix your automation before it fixes you.
1. The network_cli Connection Plugin: Why local is Dead
The network_cli connection plugin (ansible.netcommon.network_cli) is the standard for SSH-based network automation in Ansible 2.9+. It replaces the deprecated local connection and provides persistent SSH sessions, privilege escalation (enable mode), and automatic prompt handling.
Key differences from local:
- network_cli maintains a single SSH connection for the entire playbook, reducing overhead.
- It handles enable mode automatically when ansible_become: yes and ansible_become_method: enable are set.
- It parses device prompts and waits for the correct prompt before sending commands.

Production setup:
```yaml
# group_vars/network.yml
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: cisco.ios.ios
ansible_user: admin
ansible_password: "{{ vault_ansible_password }}"
ansible_become: yes
ansible_become_method: enable
ansible_become_password: "{{ vault_enable_password }}"
ansible_command_timeout: 30
```
Gotcha: If you set ansible_connection: ssh (the default), network modules will fail because they expect network_cli. Always explicitly set it.
Verification:
```bash
ansible-inventory --host switch1 --yaml | grep ansible_connection
# Should output: ansible_connection: ansible.netcommon.network_cli
```
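The same check can run inside a play so that a misconfigured host fails before any device is touched. This is a minimal sketch, assuming a hypothetical inventory group named network (chosen to match group_vars/network.yml); it asserts that the host will not fall back to the default ssh connection and that ansible_network_os is defined.

```yaml
# Pre-flight sketch: the group name "network" is an assumption for illustration.
- name: Pre-flight check for network connection settings
  hosts: network
  gather_facts: false
  tasks:
    - name: Fail early if connection settings are missing or wrong
      ansible.builtin.assert:
        that:
          # ansible_connection may be undefined when nothing sets it, so default to ssh
          - (ansible_connection | default('ssh')) == 'ansible.netcommon.network_cli'
          - ansible_network_os is defined
        fail_msg: >-
          {{ inventory_hostname }} must set ansible_connection to
          ansible.netcommon.network_cli and define ansible_network_os.
```

The assert action runs on the control node, so this play needs no working session to the device.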
connection: local is deprecated for network modules. Using it produces a warning in Ansible 2.9, and support will be removed in a future release. Migrate all playbooks to network_cli.

In one case, a playbook used connection: local and delegate_to: localhost. It worked fine for 50 switches but failed with 200 because it opened 200 SSH connections simultaneously, overwhelming the control node. Switching to network_cli solved it.

Use ansible.netcommon.network_cli for network modules; never use local or default ssh.

2. Cisco IOS: ios_command vs ios_config
The two most common Cisco IOS modules are cisco.ios.ios_command and cisco.ios.ios_config. They serve different purposes:
- ios_command: Sends arbitrary CLI commands and returns output. Not idempotent. Use for show commands only.
- ios_config: Manages configuration sections idempotently. It reads the running config, compares it with the desired state, and applies only the necessary changes.
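Example: Correct usage
```yaml
- name: Assign VLAN 100 to the access port
  cisco.ios.ios_config:
    lines:
      - switchport access vlan 100
    parents:
      - interface GigabitEthernet0/1
```
This will only apply switchport access vlan 100 if it's not already present under the interface. (A bare vlan 100 line is a global command on IOS, not an interface subcommand, so the interface-level form is used here.)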
Example: Wrong usage (causes flap)
```yaml
- name: BAD - using ios_command for config
  cisco.ios.ios_command:
    commands:
      - interface GigabitEthernet0/1
      - vlan 100
```
This sends the commands blindly, causing the switch to re-apply VLAN 100, which resets the interface.

Idempotency check: Use the --diff flag to see what changes Ansible will make:
```bash
ansible-playbook -i inventory playbook.yml --diff --check
```
The --check mode with ios_config will simulate the change without applying it.
A playbook used ios_command with commands: 'no vlan 100' to remove a VLAN. That command fails if the VLAN doesn't exist, causing the playbook to abort. ios_config with lines: 'no vlan 100' handles this gracefully.

Use ios_command only for show commands; use ios_config for all configuration changes to ensure idempotency.

3. Network-Agnostic Modules: cli_config and cli_command
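For multi-vendor environments, Ansible provides network-agnostic modules: ansible.netcommon.cli_config and ansible.netcommon.cli_command. These modules rely on the ansible_network_os variable to select the right platform plugin for talking to the device. They do not translate command syntax between vendors; the text you supply is sent as written.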
Example: cli_config
```yaml
- name: Set hostname using agnostic module
  ansible.netcommon.cli_config:
    config: "hostname {{ inventory_hostname }}"
```
This works on Cisco IOS, EOS, and NXOS, which share the hostname command, as long as ansible_network_os is set correctly. Junos uses different syntax (set system host-name), so the same string will not apply there.

Example: cli_command
```yaml
- name: Show version
  ansible.netcommon.cli_command:
    command: show version
  register: version_output
```

Limitations:
- cli_config does not support structured config; it sends raw text. For idempotent structured config, use vendor-specific modules.
- cli_command does not handle privilege escalation automatically; you may need to include enable in the command string.

When to use:
- Quick ad-hoc commands across vendors.
- When you don't have vendor-specific collections installed.
- For read-only operations (show commands).
Gotcha: The cli_config module uses the configure terminal command on Cisco devices. If you send a command that requires enable mode, you must set ansible_become: yes.
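Because cli_config sends raw text, a task that spans several platforms has to supply each platform's own syntax. One way to do that is a lookup table keyed on ansible_network_os. This is a sketch only: hostname_cmd is a hypothetical variable name, and the Junos entry assumes set-style configuration.

```yaml
# Sketch: hostname_cmd is a hypothetical lookup table, not part of any collection.
- name: Set hostname with per-platform syntax
  ansible.netcommon.cli_config:
    config: "{{ hostname_cmd[ansible_network_os] }}"
  vars:
    hostname_cmd:
      cisco.ios.ios: "hostname {{ inventory_hostname }}"
      arista.eos.eos: "hostname {{ inventory_hostname }}"
      cisco.nxos.nxos: "hostname {{ inventory_hostname }}"
      # Junos uses set-style syntax and a different keyword
      junipernetworks.junos.junos: "set system host-name {{ inventory_hostname }}"
```

A host whose ansible_network_os is missing from the table fails with an undefined-variable error, which is usually preferable to pushing the wrong syntax.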
The ansible.netcommon collection must be installed: ansible-galaxy collection install ansible.netcommon.

We used cli_config to push a banner message across 500 switches of different vendors. It worked perfectly until we hit an old IOS that required banner motd ^ instead of banner motd #. We had to fall back to vendor-specific modules.

4. NAPALM Integration via community.network
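NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support) provides a unified API for network devices. The community.network.napalm_* modules (e.g., napalm_cli, napalm_config, napalm_get_facts) use NAPALM under the hood. Module and collection names differ between releases (the upstream napalm-ansible project publishes its modules as the napalm.napalm collection), so check the names and parameters against what is installed on your control node.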
Setup:
```bash
pip install napalm==4.1.0
ansible-galaxy collection install community.network
```

Inventory variables:
```yaml
ansible_connection: community.network.napalm
ansible_network_os: cisco.ios.ios  # or junipernetworks.junos.junos, etc.
napalm_platform: ios  # must match napalm's platform name
```

Example: Get facts
```yaml
- name: Gather facts via NAPALM
  community.network.napalm_get_facts:
    filter: ['facts', 'interfaces']
  register: napalm_facts
```

Example: Configure using NAPALM
```yaml
- name: Configure using NAPALM
  community.network.napalm_config:
    replace: yes
    config: |
      interface GigabitEthernet0/1
       description NAPALM managed
```

Advantages:
- Works with devices that have API access (NETCONF, RESTCONF), which is more reliable than CLI scraping.
- napalm_get_facts returns structured data (JSON) instead of CLI text.
- napalm_config supports replace and commit operations (important for Junos).

Disadvantages:
- Requires NAPALM on the control node, not on the device.
- Some platforms require additional libraries (e.g., junos-eznc for Junos).
- Not all NAPALM methods are idempotent; napalm_config with replace is, but napalm_cli is not.
Run napalm --help for supported platforms. For unsupported devices, fall back to network_cli and vendor modules.

junos_config sometimes got out of sync with the candidate config. NAPALM's commit method with confirm saved us from a bad commit that would have locked us out.

5. Managing Junos: junos_config and junos_command
Juniper Junos devices use a different paradigm: candidate config and commit. The junipernetworks.junos.junos_config module handles this.
Example: Configure an interface
```yaml
- name: Configure interface ge-0/0/0
  junipernetworks.junos.junos_config:
    lines:
      - set interfaces ge-0/0/0 description "Ansible managed"
      - set interfaces ge-0/0/0 unit 0 family inet address 10.0.0.1/24
    comment: "Updated by Ansible"
```
Commit options:
```yaml
- name: Commit with confirm
  junipernetworks.junos.junos_config:
    lines:
      - set system host-name new-hostname
    confirm: 5  # rolled back automatically unless confirmed within 5 minutes
```
Rollback:
```yaml
- name: Rollback to previous config
  junipernetworks.junos.junos_config:
    rollback: 1
```
Gotcha: Junos modules require ansible_network_os: junipernetworks.junos.junos and the junipernetworks.junos collection. Also, you need junos-eznc Python library on the control node: pip install junos-eznc.
Command module:
```yaml
- name: Show interface status
  junipernetworks.junos.junos_command:
    commands:
      - show interfaces terse
  register: output
```
Idempotency: junos_config only applies changes that differ from the candidate config. Use --diff to see what will change.
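A confirmed commit is most useful as a three-step sequence: commit with a timer, verify reachability, then confirm. The sketch below is one way to arrange that, assuming the device is managed over NETCONF on the default port 830 (an assumption about your environment) and using a placeholder hostname.

```yaml
# Sketch: port 830 and the hostname value are assumptions for illustration.
- name: Apply change with a 5-minute confirmed commit
  junipernetworks.junos.junos_config:
    lines:
      - set system host-name edge-router-1
    confirm: 5
    comment: "Confirmed commit from Ansible"

- name: Check that the device still accepts NETCONF sessions
  ansible.builtin.wait_for:
    host: "{{ ansible_host }}"
    port: 830
    timeout: 30
  delegate_to: localhost

- name: Confirm the commit so the change becomes permanent
  junipernetworks.junos.junos_config:
    confirm_commit: true
```

If the reachability check fails, the play stops before the final task and Junos rolls the change back on its own when the timer expires.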
Use junos_config for config changes; never use junos_command to send set commands directly. The config module properly handles the commit model.

We used junos_command to send commit after a series of set commands. If the commit failed, we had no rollback. Now we always use junos_config, which rolls back automatically on error.

6. Managing Arista EOS: eos_config and eos_command
Arista EOS is similar to Cisco IOS but with some differences. The arista.eos.eos_config module is the primary config module.
Example: Configure VLAN
```yaml
- name: Configure VLAN 200
  arista.eos.eos_config:
    lines:
      - name WEB_VLAN
    parents:
      - vlan 200
```
Example: Configure interface
```yaml
- name: Set interface description
  arista.eos.eos_config:
    lines:
      - description "Ansible managed"
    parents:
      - interface Ethernet1
```

Command module:
```yaml
- name: Show running config
  arista.eos.eos_command:
    commands:
      - show running-config
  register: run_cfg
```
Gotcha: Arista EOS uses enable mode like Cisco. Set ansible_become: yes and ansible_become_method: enable.
Idempotency: eos_config is idempotent. It compares the desired config with the running config. Use --diff to verify.
Collection: Install with ansible-galaxy collection install arista.eos.

EOS uses > for user mode and # for privileged mode. The module handles this automatically.

We used eos_command to send configure terminal and then vlan 200. It worked but was not idempotent. Switching to eos_config reduced run time by 50% because it only applied changes when needed.

Use eos_config for configuration; it's idempotent and handles privilege escalation.

7. Managing Cisco NXOS: nxos_config and nxos_command
Cisco NX-OS (Nexus) uses a syntax similar to IOS but with differences. The cisco.nxos.nxos_config module is the config module.
Example: Configure VLAN
```yaml
- name: Configure VLAN 300
  cisco.nxos.nxos_config:
    lines:
      - name PROD_VLAN
    parents:
      - vlan 300
```
Example: Configure interface
```yaml
- name: Set interface description
  cisco.nxos.nxos_config:
    lines:
      - description "Ansible managed"
    parents:
      - interface Ethernet1/1
```

Command module:
```yaml
- name: Show version
  cisco.nxos.nxos_command:
    commands:
      - show version
  register: version
```
Gotcha: NX-OS requires feature commands for certain features (e.g., feature interface-vlan). You must ensure these are enabled before configuring related features.
Idempotency: nxos_config is idempotent. Use --diff to see changes.
Collection: ansible-galaxy collection install cisco.nxos.
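The feature prerequisite can be expressed as an ordered pair of tasks. This is a minimal sketch, assuming the cisco.nxos collection's nxos_feature module is available and using a hypothetical SVI for VLAN 300.

```yaml
# Sketch: the SVI name and description are illustrative values.
- name: Enable the interface-vlan feature before configuring SVIs
  cisco.nxos.nxos_feature:
    feature: interface-vlan
    state: enabled

- name: Configure the SVI for VLAN 300
  cisco.nxos.nxos_config:
    lines:
      - description Ansible managed
      - no shutdown
    parents:
      - interface Vlan300
```

Keeping the feature task in the same play, ahead of the config task, means a fresh switch and an already-prepared switch follow the same path.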
If a task depends on feature interface-vlan and the feature is not enabled, the module will fail. Add a task to enable the feature first.

We once ran a playbook against a switch that did not have feature interface-vlan enabled. The playbook failed with a cryptic error. Now we always have a prerequisite playbook that enables required features.

8. Idempotency Challenges in Network Automation
Idempotency is the property that running a playbook multiple times produces the same result. In network automation, achieving idempotency is harder than on servers because:
- Config modules compare text, not structured data. A slight difference in whitespace or ordering can cause false positives.
- Stateful devices: Some commands have side effects (e.g., no shutdown on an already up interface).
- Commit models: Junos requires explicit commit; if the module doesn't commit, the config is not applied.
- CLI drift: If someone manually changes the config, Ansible may not detect it if the module uses a cached version.
Best practices:
- Always use config modules (*_config) not command modules.
- Use --diff to verify what will change.
- Run playbooks with --check first.
- For critical changes, use --diff and manual review.
- Use ansible_network_os correctly to ensure the right module is used.
Example: Idempotent VLAN config
```yaml
- name: Ensure VLAN 100 exists
  cisco.ios.ios_config:
    lines:
      - name TEST_VLAN
    parents:
      - vlan 100
```
This creates VLAN 100 and sets its name only if they are not already present. If they exist, no change is made.
Non-idempotent example:
```yaml
- name: BAD - always sets hostname
  cisco.ios.ios_command:
    commands:
      - configure terminal
      - hostname {{ inventory_hostname }}
```
This sets the hostname every run, even if it's already correct.
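Idempotency can also be checked inside a playbook by applying the same config twice and asserting that the second pass reports no change. This is a sketch of that pattern, reusing the VLAN 100 values from this section; the register name second_pass is arbitrary.

```yaml
- name: Apply the VLAN name (first pass)
  cisco.ios.ios_config:
    lines:
      - name TEST_VLAN
    parents:
      - vlan 100

- name: Apply the same config again (second pass)
  cisco.ios.ios_config:
    lines:
      - name TEST_VLAN
    parents:
      - vlan 100
  register: second_pass

- name: Fail if the second pass reported a change
  ansible.builtin.assert:
    that:
      - second_pass is not changed
    fail_msg: "Task is not idempotent on {{ inventory_hostname }}"
```

This kind of check fits best in a lab or CI run against test devices rather than in a production change.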
Run the playbook twice. The second run should report changed=0 for all tasks. If not, your module is not idempotent.

Use parents to specify the exact hierarchy.

Always test with --check and --diff.

9. Using ansible-vault for Device Credentials
Storing device credentials in plaintext is a security risk. ansible-vault encrypts sensitive data so it can be safely stored in version control.
Setup:
```bash
ansible-vault create group_vars/all/vault.yml
```
Inside the vault file:
```yaml
vault_ansible_password: cisco123
vault_enable_password: cisco
```

Reference in group_vars:
```yaml
# group_vars/all/main.yml
ansible_user: admin
ansible_password: "{{ vault_ansible_password }}"
ansible_become: yes
ansible_become_method: enable
ansible_become_password: "{{ vault_enable_password }}"
```

Running playbook:
```bash
ansible-playbook -i inventory playbook.yml --ask-vault-pass
```
Or use a vault password file:
```bash
echo 'my_vault_pass' > .vault_pass
ansible-playbook -i inventory playbook.yml --vault-password-file .vault_pass
```

Best practices:
- Use separate vault files for different environments (dev, prod).
- Never commit the vault password to version control.
- Use ansible-vault rekey to change passwords.
- For automation (CI/CD), use a vault password file stored in a secrets manager (e.g., HashiCorp Vault, AWS Secrets Manager).
Gotcha: If you use ansible-vault encrypt on a file that already contains variables, ensure the variables are referenced correctly. Use {{ }} syntax.
Vault files are conventionally named vault.yml and placed in the same directory as the vars file. Ansible loads them automatically.

Encrypt credentials with ansible-vault and never store the vault password in the repository.

10. Multi-Vendor Playbook Structure
Managing different vendors in one playbook requires careful structuring. Use group_vars per vendor type.
Inventory structure:
```
production/
  hosts.yml
  group_vars/
    all.yml
    cisco_ios.yml
    juniper_junos.yml
    arista_eos.yml
    cisco_nxos.yml
```

hosts.yml example:
```yaml
all:
  children:
    cisco_ios:
      hosts:
        switch1:
          ansible_host: 10.0.0.1
    juniper_junos:
      hosts:
        switch2:
          ansible_host: 10.0.0.2
    arista_eos:
      hosts:
        switch3:
          ansible_host: 10.0.0.3
    cisco_nxos:
      hosts:
        switch4:
          ansible_host: 10.0.0.4
```

group_vars/cisco_ios.yml:
```yaml
ansible_network_os: cisco.ios.ios
ansible_connection: ansible.netcommon.network_cli
```

Playbook example:
```yaml
- name: Configure NTP on all devices
  hosts: all
  gather_facts: no
  tasks:
    - name: Set NTP server
      ansible.builtin.include_role:
        name: ntp_config
```
The role ntp_config would have tasks for each vendor using when: ansible_network_os == 'cisco.ios.ios' etc.
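The role's task file could look roughly like the sketch below. The variable ntp_server is hypothetical (define it in group_vars), and the Junos task assumes the juniper_junos group connects over NETCONF, which is the connection junos_config is documented to prefer.

```yaml
# roles/ntp_config/tasks/main.yml (sketch; ntp_server is a hypothetical variable)
- name: Set NTP server on Cisco IOS
  cisco.ios.ios_config:
    lines:
      - ntp server {{ ntp_server }}
  when: ansible_network_os == 'cisco.ios.ios'

- name: Set NTP server on Arista EOS
  arista.eos.eos_config:
    lines:
      - ntp server {{ ntp_server }}
  when: ansible_network_os == 'arista.eos.eos'

- name: Set NTP server on Cisco NX-OS
  cisco.nxos.nxos_config:
    lines:
      - ntp server {{ ntp_server }}
  when: ansible_network_os == 'cisco.nxos.nxos'

- name: Set NTP server on Junos
  junipernetworks.junos.junos_config:
    lines:
      - set system ntp server {{ ntp_server }}
  when: ansible_network_os == 'junipernetworks.junos.junos'
```

Each host runs exactly one of these tasks and skips the rest, so the calling playbook stays vendor-neutral.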
Use when: ansible_network_os == 'cisco.ios.ios' to run vendor-specific tasks.

We tried cli_config for NTP across vendors, but the syntax differed. We now use vendor-specific tasks with when conditions.

Use ansible_network_os to conditionally run vendor-specific tasks.

11. Error Handling and Rollback Strategies
Network changes can break connectivity. Implement rollback strategies.
Using any_errors_fatal:
```yaml
- hosts: all
  any_errors_fatal: true
  tasks:
    - name: Critical change
      cisco.ios.ios_config:
        lines:
          - shutdown
        parents:
          - interface GigabitEthernet0/1
```
If this task fails on any host, the play stops for all hosts immediately.
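Using block and rescue:
```yaml
- name: Change with rollback
  block:
    - name: Apply new config
      cisco.ios.ios_config:
        lines:
          - shutdown
        parents:
          - interface GigabitEthernet0/1
        backup: yes
        # Fixed filename so the rescue task can find the backup
        backup_options:
          filename: "{{ inventory_hostname }}_config.cfg"
          dir_path: "{{ playbook_dir }}/backup"
    - name: Test connectivity
      ansible.builtin.wait_for:
        host: "{{ ansible_host }}"
        port: 22
        timeout: 10
      delegate_to: localhost  # probe from the control node, not the device
      register: result
  rescue:
    - name: Rollback
      cisco.ios.ios_config:
        src: "{{ playbook_dir }}/backup/{{ inventory_hostname }}_config.cfg"
```
Note that src merges the backup into the running config: it adds missing lines but does not remove lines that are absent from the backup.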
Using NAPALM rollback:
```yaml
- name: Apply config with rollback
  community.network.napalm_config:
    replace: yes
    config: "{{ desired_config }}"
    commit_changes: yes
    commit_confirm: 5  # confirm in 5 minutes
  register: napalm_result

- name: If connectivity lost, rollback
  community.network.napalm_config:
    rollback: "{{ napalm_result.candidate_cfg }}"
  when: connectivity_lost
```

Backup config:
```yaml
- name: Backup running config
  cisco.ios.ios_config:
    backup: yes
    backup_options:
      filename: "{{ inventory_hostname }}_config.cfg"
      dir_path: /backup/
```
Gotcha: The backup option only saves the config before changes; it doesn't automatically rollback on failure.
Use wait_for to test connectivity after changes and have a rollback plan.

12. Debugging and Logging Best Practices
When things go wrong, you need detailed logs.
Enable verbose output:
```bash
ansible-playbook -i inventory playbook.yml -vvv
```
- -v: basic info
- -vv: more details
- -vvv: connection details
- -vvvv: SSH debugging

Enable logging: In ansible.cfg:
```ini
[defaults]
log_path = /var/log/ansible/network.log
```

Use register and debug:
```yaml
- name: Show version
  cisco.ios.ios_command:
    commands:
      - show version
  register: version_output

- name: Print version
  ansible.builtin.debug:
    var: version_output.stdout_lines
```

Check module return values:
```yaml
- name: Configure VLAN
  cisco.ios.ios_config:
    lines:
      - vlan 100
  register: config_result

- name: Debug config result
  ansible.builtin.debug:
    var: config_result
```

Use --syntax-check:
```bash
ansible-playbook -i inventory playbook.yml --syntax-check
```

Test with --check and --diff:
```bash
ansible-playbook -i inventory playbook.yml --check --diff
```
Gotcha: --check may not work with all network modules; some will still apply changes. Test in a lab first.
Logs are written to /var/log/ansible/network.log.

With -vvvv we saw that the SSH connection was being closed due to a timeout. We increased ansible_command_timeout to 60 seconds.

Use register to capture module output; always test with --check in a lab.

The VLAN Flap That Took Down a Data Center
ios_command does not check current state; it sends the command verbatim. The switch interprets vlan 100 as a create command even if VLAN 100 exists, causing a momentary interface reset.

Replace ios_command with the ios_config module using lines: 'vlan 100' and parents: 'interface GigabitEthernet0/1'.

- Never use ios_command (or any *_command module) for configuration changes.
- Use dedicated config modules that implement idempotency checks.
- Test SSH manually with ssh -vvv user@device. If SSH works, ensure ansible_connection: ansible.netcommon.network_cli and ansible_network_os are set. Also check that ansible_user and ansible_password are correct.
- The device requires enable mode. Set ansible_become: yes, ansible_become_method: enable, and ansible_become_password: <enable_secret> in group_vars.
- Use config modules (e.g., ios_config) not command modules. Check diff output: if it says 'after' matches 'before', the module thinks the config is already present. Use the --diff flag to see what Ansible thinks is the current state.
- Install NAPALM with pip install napalm==4.1.0. Also set connection: community.network.napalm and provide napalm_platform (e.g., ios, junos).

ssh -vvv -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null ansible@switch1.example.com
ansible-inventory --host switch1.example.com --yaml | grep ansible_connection
Confirm ansible_connection: ansible.netcommon.network_cli and ansible_network_os are set.

Key takeaways
- Use ansible.netcommon.network_cli for SSH-based network automation; never use local.
- Use ios_config (and vendor-specific config modules) for configuration; never use ios_command for config changes.
- Network-agnostic modules (cli_config, cli_command) are useful for simple tasks but lack idempotency for complex configs.
- Preview changes with --check and --diff.
- Encrypt credentials with ansible-vault and never store passwords in plaintext.
- Use ansible_network_os for conditional tasks.
- Use verbose output (-vvv) and register to debug module behavior.
- Test in a lab before trusting --check.

Common mistakes to avoid
- Using ios_command for configuration changes. Fix: use ios_config (or vendor-specific config module) with lines and parents.
- Not setting ansible_connection: ansible.netcommon.network_cli. Fix: set ansible_connection: ansible.netcommon.network_cli in group_vars.
- Omitting ansible_become: yes for enable mode. Fix: set ansible_become: yes, ansible_become_method: enable, and provide ansible_become_password.
- Using --check with network modules assuming no changes. Fix: verify --check behavior in a lab first, because some modules may still apply changes.
- Storing credentials in plaintext inventory. Fix: use ansible-vault to encrypt credentials and reference them via variables.
- Not using parents in ios_config for interface config. Fix: use parents to navigate to the correct configuration hierarchy.

Interview Questions on This Topic
What is the difference between `ios_command` and `ios_config`?
ios_command sends arbitrary CLI commands and returns output; it is not idempotent and should only be used for show commands. ios_config manages configuration sections idempotently by comparing the desired state with the running config and applying only the necessary changes.