Ansible Inventory Management: Static, Dynamic, and Variable Precedence Pitfalls
Master Ansible inventory management: static/dynamic files, INI vs YAML, groups, host_vars, group_vars, inventory plugins, and variable precedence.
20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.
Use ansible-inventory --list to verify inventory structure before running playbooks.
Static inventory files (INI or YAML) are fine for small environments; use dynamic inventory plugins for cloud or CMDB sources.
YAML format is preferred over INI for complex inventories due to better readability and nesting support.
Host variables in host_vars/ override group variables from group_vars/.
Group variable precedence, lowest to highest: group_vars/all → parent groups → child groups; groups at the same level are applied alphabetically, so the last one wins.
Dynamic inventory scripts must output JSON in the specific format expected by Ansible (e.g., _meta key for hostvars).
Variable precedence from inventory: host_vars > group_vars of host's groups (last group wins) > group_vars/all.
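Common gotcha: ansible-playbook does not show which source a variable's final value came from, and --diff only reports changes to managed files; inspect resolved values with ansible-inventory --host <hostname> or a debug task.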
Imagine you're planning a large family reunion. You have a list of all relatives (the inventory), and you need to send each person a personalized invitation. Some details apply to everyone (like the date and location), some apply to entire families (like dietary preferences for the Smiths), and some are specific to individuals (like a special note for Aunt Carol). Ansible inventory is that list: you define hosts (people), groups (families), and variables (details). Static inventory is like a paper address book — you write everything down manually. Dynamic inventory is like a live phonebook that automatically updates when someone moves. Variable precedence is the rule that decides which detail wins if there's a conflict: a note written directly on a person's page beats a note on the family page, which beats a general note on the front cover.
I'll never forget the Tuesday morning when our production deployment started failing with cryptic SSH key errors. Our playbooks had run fine for months, but suddenly Ansible couldn't connect to a batch of new servers. The error was 'Permission denied (publickey)'. My first assumption was a key rotation gone wrong. But after an hour of head-scratching, I discovered the real culprit: a misconfigured dynamic inventory script that was returning an empty ansible_user for those hosts, causing Ansible to fall back to the current user (which had no access). That incident taught me the hard way how inventory management — especially variable precedence and inventory sources — can silently break deployments.
Ansible inventory is the backbone of configuration management. It defines which hosts to manage, how to group them, and what variables apply. Without a solid understanding of inventory, you'll run into mysterious failures, inconsistent variable resolution, and maintenance nightmares. This article covers everything a beginner needs: static and dynamic inventory files, INI vs YAML formats, groups and children, host_vars and group_vars directories, inventory plugins, and the critical rules of variable precedence across inventory.
By the end, you'll know how to structure your inventory for production, debug common issues, and avoid the pitfalls that tripped me up. We'll use real commands and production-grade examples throughout.
Static Inventory Files: INI vs YAML Format
Static inventory files are the simplest way to define your infrastructure. Ansible supports two formats: INI and YAML. INI is the legacy format, but YAML is now recommended for complex inventories due to its nesting capabilities and readability.
INI Format Example (inventory.ini):
```ini
[webservers]
web1.example.com ansible_user=deploy
web2.example.com

[dbservers]
db1.example.com

[production:children]
webservers
dbservers

[webservers:vars]
http_port=80
```
YAML Format Example (inventory.yaml):
```yaml
all:
  children:
    webservers:
      hosts:
        web1.example.com:
          ansible_user: deploy
        web2.example.com:
      vars:
        http_port: 80
    dbservers:
      hosts:
        db1.example.com:
    production:
      children:
        webservers:
        dbservers:
```
Production Insight: INI format can lead to subtle bugs when the :vars and :children suffixes are used incorrectly. For example, if you forget the :children suffix on a group that contains other groups, Ansible treats the listed group names as hostnames instead of nesting the groups. YAML avoids this ambiguity because hosts: and children: are separate keys. In a recent migration, I found a stale INI inventory where a group was misspelled in :children, causing 20 servers to be excluded from deployments for months.
Key Takeaway: For any inventory with more than 10 hosts or nested groups, use YAML format to avoid parsing errors and improve maintainability.
Make sure the names listed under :children exactly match existing group names, and verify the result yourself rather than relying on Ansible to flag every mistake.
In one case, a parent group was declared as [newgroup] instead of [newgroup:children]. The hosts were listed but never included in the parent. The fix was to switch to YAML and add a CI linting step with ansible-inventory --list.
Groups, Children, and Group Hierarchy
Groups are the primary way to organize hosts in Ansible. A group can contain hosts, other groups (via children), or both. Group hierarchy allows you to apply variables to a set of hosts efficiently.
Defining Groups and Children:
```yaml
all:
  children:
    us_east:
      children:
        webservers:
          hosts:
            web-east-1:
            web-east-2:
        dbservers:
          hosts:
            db-east-1:
    us_west:
      children:
        webservers:
          hosts:
            web-west-1:
        dbservers:
          hosts:
            db-west-1:
```
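Group Resolution: Ansible flattens the group hierarchy at runtime. A host belongs to all groups it is directly or indirectly a member of. For example, web-east-1 belongs to webservers, us_east, and implicitly all. Keep in mind that group names are global: because webservers is listed under both us_east and us_west above, Ansible treats it as one group with two parents, so its hosts also become members of us_west. Use distinct names per region (for example webservers_east and webservers_west) when the tiers must stay separate.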
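The flattening described above can be sketched in a few lines of Python. This is an illustrative model only, not Ansible's implementation; the `children` and `hosts` dictionaries and the region-specific group names are hypothetical stand-ins for a parsed inventory.

```python
# Illustrative model of group flattening; not Ansible's actual code.
# 'children' maps a group to its child groups; 'hosts' maps a group to its direct hosts.
children = {
    "all": ["us_east", "us_west"],
    "us_east": ["webservers_east"],
    "us_west": ["webservers_west"],
}
hosts = {
    "webservers_east": ["web-east-1", "web-east-2"],
    "webservers_west": ["web-west-1"],
}


def groups_for(host):
    """Return every group the host belongs to, directly or through ancestors."""
    # Invert the parent -> children mapping so we can walk upwards.
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(parent)

    found = set()
    stack = [group for group, members in hosts.items() if host in members]
    while stack:
        group = stack.pop()
        if group not in found:
            found.add(group)
            stack.extend(parents.get(group, ()))
    return found


print(sorted(groups_for("web-east-1")))  # ['all', 'us_east', 'webservers_east']
```

If the same group name appeared under two parents in `children`, the walk would return both parents, which is the behaviour to expect from a group with more than one parent.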
Variable Inheritance: Variables defined on a group apply to all hosts in that group and its children. If a variable is defined on several groups in the hierarchy, the child group's value overrides the parent's; groups at the same depth are merged in alphabetical order, so the last one wins for that host. This is often confusing; see the variable precedence section.
Production Insight: In a multi-region deployment, we used nested groups for regions and tiers. The problem was that we defined ntp_server on both us_east and webservers groups. Hosts in us_east got the us_east value, but hosts in us_west got the webservers value because us_west didn't define it. The fix was to define region-specific variables only on region groups and tier-specific variables only on tier groups, avoiding overlap.
Key Takeaway: Design group hierarchy to minimize variable conflicts. Use group_vars/all for truly global defaults, and override only at the most specific level needed.
Use ansible-inventory -i inventory --graph to visualize the group hierarchy. This helps debug unexpected group membership.
We had an inventory with dc1 and dc2 as children of all. A host was moved from dc1 to dc2 but the old group_vars file for dc1 was not cleaned up. The host still picked up variables from dc1 because it was still listed in that group. The fix was to remove the host from the old group in the inventory file.
Run ansible-inventory --graph to verify group membership after any inventory change.
host_vars and group_vars Directories
Ansible automatically loads variables from host_vars/ and group_vars/ directories located relative to the inventory file or playbook directory. These directories contain YAML files named after the host or group.
Directory Structure Example:
```
production/
  inventory.yaml
  host_vars/
    web1.example.com.yaml
    db1.example.com.yaml
  group_vars/
    all.yaml
    webservers.yaml
    dbservers.yaml
    us_east.yaml
```
File Naming: The file name must match the hostname (for host_vars) or group name (for group_vars) exactly, including domain suffix. For example, web1.example.com.yaml for host web1.example.com.
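Because a misnamed file is simply never loaded, a small check can catch the mismatch early. The following is a minimal sketch, assuming you already have the list of inventory hostnames (for example from `ansible-inventory --list`); the function name `unmatched_host_vars` and the sample hostnames are hypothetical.

```python
import tempfile
from pathlib import Path

SUFFIXES = (".yaml", ".yml", ".json")  # extensions Ansible recognises for vars files


def unmatched_host_vars(host_vars_dir, inventory_hosts):
    """Return entries in host_vars/ whose name matches no inventory host."""
    known = set(inventory_hosts)
    orphans = []
    for entry in sorted(Path(host_vars_dir).iterdir()):
        name = entry.name
        if entry.is_file():
            # Strip only a known extension; hostnames themselves contain dots.
            for suffix in SUFFIXES:
                if name.endswith(suffix):
                    name = name[: -len(suffix)]
                    break
        if name not in known:
            orphans.append(entry.name)
    return orphans


# Demo against a throwaway directory with one correct and one misnamed file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "web1.example.com.yaml").write_text("http_port: 3000\n")
    Path(tmp, "web01.yaml").write_text("ansible_port: 2222\n")
    print(unmatched_host_vars(tmp, ["web1.example.com", "web-01"]))  # ['web01.yaml']
```

The same idea applies to group_vars/ by passing group names instead of hostnames.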
Variable Loading Order: For a given host, Ansible loads variables in this order:
1. group_vars/all
2. group_vars of parent groups (alphabetically)
3. group_vars of the host's immediate groups (alphabetically)
4. host_vars/<hostname>
Later files override earlier ones. This means host_vars always wins over any group_vars, and among group_vars at the same level of the hierarchy, the last group alphabetically wins.
Production Insight: We once had a variable app_port defined in group_vars/all.yaml as 8080, in group_vars/webservers.yaml as 80, and in host_vars/web1.example.com.yaml as 3000. The host web1.example.com got 3000, as expected, and web2.example.com (no host_vars) got 80. We also had a group_vars/production.yaml that defined app_port: 9090, and production was a parent group of webservers. It is tempting to compare the two group names alphabetically, but depth is considered first: production is a parent, so its variables are loaded before those of webservers, and webservers wins. web2.example.com therefore still got 80. This is a common point of confusion.
Key Takeaway: To avoid confusion, use group_vars/all for defaults, and override in specific group_vars or host_vars. Avoid defining the same variable in multiple group_vars at the same level.
A host was picking up an unexpected ansible_host IP. The IP was defined in group_vars/all but overridden in group_vars/datacenter_a. However, the host belonged to both datacenter_a and datacenter_b groups. Because datacenter_b came alphabetically after datacenter_a, its group_vars loaded last and overrode the IP. The fix was to ensure only one group defined that variable for the host.
Dynamic Inventory: Scripts and Plugins
Dynamic inventory sources generate host lists at runtime from external systems like cloud providers, CMDBs, or custom databases. Ansible supports two approaches: inventory scripts (executable files that output JSON) and inventory plugins (Python modules). Plugins are the modern, preferred method.
Inventory Scripts: A script must be executable and accept --list (return all hosts) and --host <hostname> (return variables for a specific host). The output JSON must include a _meta key with hostvars for all hosts to avoid multiple script calls.
Example minimal script output:
```json
{
  "_meta": {
    "hostvars": {
      "web1": {
        "ansible_host": "10.0.0.1",
        "ansible_user": "deploy"
      }
    }
  },
  "webservers": {
    "hosts": ["web1"]
  }
}
```
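A script that produces output of this shape might look like the sketch below. It is a minimal illustration with hard-coded data, not a production script: the `HOSTS` table and the `build_inventory` helper are hypothetical, and a real script would query a cloud API or CMDB instead.

```python
#!/usr/bin/env python3
"""Minimal dynamic inventory script sketch (hard-coded data for illustration)."""
import argparse
import json

# Hypothetical data source; a real script would query an external system here.
HOSTS = {
    "web1": {"ansible_host": "10.0.0.1", "ansible_user": "deploy", "group": "webservers"},
}


def build_inventory(hosts):
    """Build the --list structure, including _meta.hostvars for every host."""
    inventory = {"_meta": {"hostvars": {}}}
    for name, data in hosts.items():
        group = inventory.setdefault(data["group"], {"hosts": []})
        group["hosts"].append(name)
        # Everything except the grouping key becomes a host variable.
        inventory["_meta"]["hostvars"][name] = {
            key: value for key, value in data.items() if key != "group"
        }
    return inventory


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    if args.host:
        # With _meta present Ansible should not need --host, but answer it anyway.
        hostvars = build_inventory(HOSTS)["_meta"]["hostvars"]
        print(json.dumps(hostvars.get(args.host, {})))
    else:
        print(json.dumps(build_inventory(HOSTS), indent=2))


if __name__ == "__main__":
    main()
```

Saved as an executable file, a script like this can be passed to `-i` in the same way as a static inventory file.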
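Inventory Plugins: Plugins exist for AWS (aws_ec2), GCP (gcp_compute), Azure (azure_rm), and many more; the cloud plugins ship in collections such as amazon.aws, google.cloud, and azure.azcollection rather than in ansible-core itself. They are configured in YAML files and are more efficient than scripts.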
Example `aws_ec2.yaml`:
```yaml
plugin: aws_ec2
regions:
  - us-east-1
filters:
  tag:Environment: production
hostnames:
  - tag:Name
keyed_groups:
  - key: tags.Environment
    prefix: env
  - key: tags.Role
    prefix: role
compose:
  ansible_host: public_ip_address
```
Production Insight: In the incident I mentioned earlier, our custom Python script had a bug: it didn't include ansible_user for hosts in a specific AZ. The script passed --list validation but returned empty strings. The fix was to add a validation step in CI that runs ansible-inventory --list and checks for required variables using jq.
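The CI check mentioned above used jq; the same idea can be sketched in Python. This is an assumption-laden illustration rather than the check from the incident: `REQUIRED_VARS`, `missing_required_vars`, and the sample hosts are hypothetical, and the input is assumed to be the parsed JSON from `ansible-inventory --list`.

```python
import json

REQUIRED_VARS = ("ansible_user",)  # assumption: adjust to your own policy


def missing_required_vars(inventory, required=REQUIRED_VARS):
    """Map each host to the required variables that are absent or empty."""
    hostvars = inventory.get("_meta", {}).get("hostvars", {})
    problems = {}
    for host, variables in hostvars.items():
        missing = [name for name in required if variables.get(name) in (None, "")]
        if missing:
            problems[host] = missing
    return problems


# Sample shaped like `ansible-inventory --list` output (hypothetical hosts).
sample = json.loads("""
{
  "_meta": {"hostvars": {
    "web1": {"ansible_host": "10.0.0.1", "ansible_user": "deploy"},
    "web2": {"ansible_host": "10.0.0.2", "ansible_user": ""}
  }},
  "webservers": {"hosts": ["web1", "web2"]}
}
""")
print(missing_required_vars(sample))  # {'web2': ['ansible_user']}
```

In a pipeline, the function would be fed the command's JSON output and the job would fail when the returned mapping is non-empty. One caveat: hosts with no variables at all may not appear under `_meta.hostvars`, so a stricter check would also walk the host lists of each group.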
Key Takeaway: Prefer inventory plugins over custom scripts for cloud providers. They are maintained by Ansible and include built-in caching and error handling.
Enable caching with cache: yes and cache_plugin: jsonfile in the plugin configuration, and configure fact_caching_timeout in ansible.cfg.
We combined the constructed plugin with a static inventory file that was periodically refreshed via a cron job, avoiding real-time API calls.
Run ansible-inventory --list and validate the output format, especially the _meta key.
Variable Precedence Across Inventory
Understanding variable precedence is critical to avoid surprises. Ansible merges variables from multiple sources in a specific order. For inventory-related sources, the order from lowest to highest priority is:
1. group_vars/all
2. group_vars of parent groups (alphabetically)
3. group_vars of child groups (alphabetically)
4. host_vars/<hostname>
5. Extra vars (-e)
group_vars/ and host_vars/ directories placed next to the playbook are loaded automatically (no vars_files or include_vars needed); at each of the levels above, the playbook-adjacent file overrides the matching file from the inventory directory.
Key Rule: Host variables override group variables. Among group variables, Ansible sorts the host's groups by depth first, so parent groups (from root toward leaf) are loaded before their children and the children win. Groups at the same depth are ordered by ansible_group_priority (default 1, set in the inventory source rather than in group_vars/) and then alphabetically, so with default priorities the last group alphabetically wins.
Example: Host web1 belongs to groups webservers and us_east. us_east is a parent of webservers. The order of loading:
- group_vars/all
- group_vars/us_east (parent)
- group_vars/webservers (child)
- host_vars/web1
If group_vars/us_east and group_vars/webservers both define the same variable, webservers wins because it's a child and loaded later.
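The loading order in this example can be modelled as a sequence of dictionary updates. The sketch below is a simplification under stated assumptions: it orders groups by depth and then name, ignores ansible_group_priority, and uses made-up variable values; `merge_group_vars` is a hypothetical helper, not an Ansible API.

```python
def merge_group_vars(groups, host_vars):
    """Merge variables the way the ordering above describes.

    groups: list of dicts with 'name', 'depth', and 'vars' keys.
    """
    merged = {}
    # Parents (lower depth) first; same-depth groups in alphabetical order.
    for group in sorted(groups, key=lambda g: (g["depth"], g["name"])):
        merged.update(group["vars"])  # later groups overwrite earlier ones
    merged.update(host_vars)  # host_vars win over every group
    return merged


groups = [
    {"name": "webservers", "depth": 2, "vars": {"http_port": 80, "ntp_server": "ntp.web"}},
    {"name": "all", "depth": 0, "vars": {"http_port": 8080, "ansible_user": "centos"}},
    {"name": "us_east", "depth": 1, "vars": {"ntp_server": "ntp.east"}},
]
result = merge_group_vars(groups, {"http_port": 3000})
print(result)
# {'http_port': 3000, 'ansible_user': 'centos', 'ntp_server': 'ntp.web'}
```

Here ntp_server comes from webservers because the child is applied after its parent us_east, and http_port comes from host_vars because it is applied last.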
Production Insight: I once saw a team define ansible_user in group_vars/all as centos, and also in group_vars/webservers as ec2-user. The webservers value overrode all, as intended, because any named group outranks all. However, a host that belonged to both webservers and dbservers, two groups at the same level, took its ansible_user from whichever of the two sorts last alphabetically (webservers, since d sorts before w) whenever both defined it. This caused confusion. The fix was to use host_vars for hosts that needed a specific user.
Key Takeaway: To predict variable values, trace the group hierarchy and alphabetical order. Use ansible-inventory --host <hostname> to see the final resolved variables.
Remember that group_vars/all is lower than group_vars of any named group.
Consider app_version defined in group_vars/all as 1.0, in group_vars/production as 2.0, and in group_vars/canary as 3.0. For a host in both production and canary, two groups at the same level, canary sorts before production alphabetically, so production is applied last and the host gets 2.0. If the canary value is meant to win, as it would be for canary testing, give canary a higher ansible_group_priority in the inventory or set the value in host_vars. An engineer who assumes the other order will be confused.
Inventory Plugins: Built-in and Custom
Ansible provides over 20 inventory plugins for various sources: cloud providers, databases, file systems, and more. They are configured in YAML files and are more reliable than custom scripts.
Common Built-in Plugins:
- aws_ec2: AWS EC2 instances
- azure_rm: Azure VMs
- gcp_compute: Google Compute Engine
- vmware_vm_inventory: VMware VMs
- constructed: Build groups and variables from existing inventory
- ini: Parse INI format (static)
- yaml: Parse YAML format (static)
Using a Plugin: Create a YAML file (e.g., aws_ec2.yaml) with the plugin configuration and use it as the inventory source:
```bash
ansible-inventory -i aws_ec2.yaml --list
```
Custom Inventory Plugins: You can write your own plugin by subclassing BaseInventoryPlugin and implementing parse(). This is advanced but gives full control.
Production Insight: We migrated from a custom script to the aws_ec2 plugin and saw immediate benefits: built-in caching, better error messages, and automatic handling of pagination. However, we had to adjust our group naming because the plugin uses keyed_groups which creates groups with prefixes like env_production. Our playbooks expected group names like production. We added a custom keyed_groups mapping to match.
Key Takeaway: Use inventory plugins for cloud sources. They are well-tested and reduce maintenance burden.
List available inventory plugins: ansible-doc -t inventory -l. Get plugin-specific help: ansible-doc -t inventory aws_ec2.
We used the constructed plugin to create groups based on hostname patterns. For example, all hosts starting with 'web' were added to a webservers group. This eliminated the need to maintain a static list, and the configuration was simple.
The constructed plugin is powerful for dynamically assigning groups based on host variables or names.
Best Practices for Inventory File Organization
Organizing inventory files for production is about scalability, security, and maintainability. Here are patterns I've used in large deployments.
1. Separate Environments: Use different inventory directories for dev, staging, production.
```
inventories/
  production/
    inventory.yaml
    host_vars/
    group_vars/
  staging/
    inventory.yaml
    host_vars/
    group_vars/
```
2. Use group_vars/all for Global Defaults: Put common variables like ntp_server, dns_server, and ansible_user here. Override in specific groups only when necessary.
3. Keep Secrets Out of Inventory: Use Ansible Vault for sensitive variables. Store encrypted files in group_vars/all/vault.yaml or use ansible-vault encrypt on individual files.
4. Use Dynamic Inventory for Cloud: For cloud environments, use the appropriate plugin. For on-prem, use static YAML inventory with version control.
5. Validate Inventory in CI: Add a CI step that runs ansible-inventory --list and checks for required variables using a script or jq.
Production Insight: In one project, we had a single monolithic inventory file with hundreds of hosts. It became unmanageable. We split it into environment-specific directories and used ansible-inventory --list --export to generate a combined view for debugging. The change reduced deployment errors by 40%.
Key Takeaway: Organize inventory by environment and use group_vars/all for defaults. Validate inventory in CI.
Use ansible-vault to encrypt any sensitive variables in inventory files. Store the vault password securely (e.g., in a password manager or CI secret).
In one case, a secret had been committed in plaintext in group_vars/all. The fix was to use Vault and add a pre-commit hook that scans for potential secrets using git-secrets.
Using ansible-inventory Command for Debugging
The ansible-inventory command is your best friend for debugging inventory issues. It can list, graph, export, and validate inventory.
Common Usage:
```bash
# List all hosts with groups
ansible-inventory -i inventory.yaml --list

# Graph group hierarchy
ansible-inventory -i inventory.yaml --graph

# Show resolved variables for a specific host
# (--export is a --list option, so it is not needed here)
ansible-inventory -i inventory.yaml --host web1.example.com

# Validate inventory (check for syntax errors)
ansible-inventory -i inventory.yaml --list > /dev/null
```
Example Output of --graph:
```
@all:
  |--@ungrouped:
  |--@us_east:
  |  |--@webservers:
  |  |  |--web-east-1
  |  |  |--web-east-2
  |  |--@dbservers:
  |  |  |--db-east-1
  |--@us_west:
  |  |--@webservers:
  |  |  |--web-west-1
  |  |--@dbservers:
  |  |  |--db-west-1
```
Production Insight: When debugging the SSH user incident, I ran ansible-inventory --host newserver --export and saw ansible_user: "". That immediately pointed to the dynamic inventory script. Without this command, I would have wasted hours checking SSH keys.
Key Takeaway: Use ansible-inventory --list and --host to inspect resolved inventory before running playbooks.
The --list output includes _meta with hostvars. Use jq to filter: ansible-inventory -i inventory.yaml --list | jq '._meta.hostvars'.
We pipe ansible-inventory --list to jq to quickly check variables across hosts. For example: ansible-inventory -i prod --list | jq '._meta.hostvars | to_entries[] | {host: .key, user: .value.ansible_user}'.
Learn the ansible-inventory commands; they are essential for troubleshooting.
Common Inventory Gotchas with INI Format
Despite being simpler, INI format has several pitfalls that can cause silent failures.
Gotcha 1: Missing :children suffix
```ini
[production]
webservers
dbservers
```
This creates a group production with hosts named webservers and dbservers, not the groups themselves. To include groups, you need:
```ini
[production:children]
webservers
dbservers
```
Gotcha 2: Whitespace in hostnames. Trailing spaces after hostnames can cause Ansible to fail to connect. Always trim whitespace.
Gotcha 3: Case sensitivity. Group names are case-sensitive. [Webservers] and [webservers] are different groups.
Gotcha 4: Variables with spaces. In INI, variable values cannot contain spaces unless they are quoted. Writing ansible_user=my user fails; write ansible_user="my user" instead.
Production Insight: We once had a host that was intermittently unreachable. The cause was a trailing space after the hostname in the INI file. Ansible parsed the hostname as web1 (with space) and tried to resolve that hostname, which failed. The fix was to add a linting rule to check for trailing whitespace.
Key Takeaway: If you must use INI, validate with ansible-inventory --list and use a linter like ansible-lint.
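Two of the gotchas above (group names listed as hosts, and trailing whitespace) are easy to detect mechanically. The following is a rough sketch of such a check, not a replacement for ansible-lint or for Ansible's own parser; `lint_ini_inventory` and the sample inventory text are hypothetical.

```python
import re

# Matches [group] and [group:suffix] section headers.
SECTION = re.compile(r"^\[([^:\]\s]+)(?::(\w+))?\]\s*$")


def lint_ini_inventory(text):
    """Return warnings for group names listed as hosts and for trailing spaces."""
    host_sections = {}  # group name -> [(line_no, first token)] for plain [group] sections
    group_names = set()
    warnings = []
    current = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        if raw != raw.rstrip():
            warnings.append(f"line {line_no}: trailing whitespace")
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        match = SECTION.match(line)
        if match:
            name, kind = match.groups()
            group_names.add(name)
            # Only plain [group] sections list hosts; skip :children and :vars bodies.
            current = name if kind is None else None
            continue
        if current is not None:
            host_sections.setdefault(current, []).append((line_no, line.split()[0]))
    for group, entries in host_sections.items():
        for line_no, entry in entries:
            if entry in group_names:
                warnings.append(
                    f"line {line_no}: '{entry}' under [{group}] is a group name; "
                    f"did you mean [{group}:children]?"
                )
    return warnings


sample = "[webservers]\nweb1\n\n[dbservers]\ndb1 \n\n[production]\nwebservers\ndbservers\n"
for warning in lint_ini_inventory(sample):
    print(warning)
```

A check like this could run as a pre-commit hook or CI step alongside ansible-inventory --list.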
The ini plugin may be removed in a future version of Ansible.
A colleague could not work out why [db:children] didn't work. He had written [db:child] (singular). Ansible silently ignored it. The fix was to use YAML where such errors are caught at parse time.
Variable Precedence: Inventory vs Other Sources
Inventory variables are just one source. The full precedence from lowest to highest includes:
1. Role defaults (roles/role/defaults/main.yml)
2. Inventory vars (group_vars/all, group_vars/, host_vars/)
3. Playbook vars (vars: in play)
4. vars_files
5. Role vars (roles/role/vars/main.yml)
6. Block vars (only for tasks in block)
7. Task vars (only for that task)
8. include_vars
9. register variables and set_fact
10. Extra vars (-e)
Key Point: Inventory variables are relatively low in precedence. They can be overridden by playbook vars and extra vars. This is intentional: playbooks should be able to override inventory defaults for a specific run.
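To make the layering concrete, here is a small sketch that resolves variables across a simplified subset of the sources listed above and records which layer supplied each value. It is a model for reasoning, not how Ansible is implemented; `LAYERS`, `resolve`, and the sample values are hypothetical.

```python
# Layers ordered from lowest to highest precedence (a simplified subset).
LAYERS = ["role_defaults", "inventory", "play_vars", "role_vars", "task_vars", "extra_vars"]


def resolve(sources):
    """Return {variable: (value, winning_layer)} for a dict of layer -> variables."""
    resolved = {}
    for layer in LAYERS:
        for name, value in sources.get(layer, {}).items():
            resolved[name] = (value, layer)  # higher layers overwrite lower ones
    return resolved


sources = {
    "role_defaults": {"app_port": 8000, "ansible_user": "root"},
    "inventory": {"app_port": 3000, "ansible_user": "deploy"},
    "play_vars": {"app_port": 8080},
    "extra_vars": {"ansible_user": "tempuser"},
}
for name, (value, layer) in resolve(sources).items():
    print(f"{name} = {value!r} (from {layer})")
# app_port = 8080 (from play_vars)
# ansible_user = 'tempuser' (from extra_vars)
```

The inventory values for both variables lose here: one to a play-level var and one to an extra var, which is the kind of override that is easy to miss when only the inventory is inspected.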
Production Insight: We had a scenario where we wanted to run a playbook with a different ansible_user temporarily. Using -e ansible_user=tempuser worked because extra vars have highest precedence. However, we forgot to remove the -e flag in a CI pipeline, and it overrode the inventory variable for all subsequent runs. The fix was to use --extra-vars only in ad-hoc commands, not in CI.
Key Takeaway: Be aware that extra vars override inventory vars. Use them sparingly in automation.
A playbook defined app_port: 8080 in its vars section. The inventory had app_port: 3000 for that host. The playbook's var took precedence, causing the application to start on the wrong port. The fix was to remove the var from the playbook and rely on inventory.
Using Host Variables for Overrides
Host variables are the most specific inventory-level variables. They are defined in host_vars/<hostname>.yaml and override any group variables.
When to Use Host Variables:
- Per-host secrets (e.g., ansible_become_password)
- Unique configuration (e.g., a specific http_port for a load balancer)
- Overriding group defaults for a specific host
Example host_vars/web1.example.com.yaml:
```yaml
ansible_host: 10.0.0.1
ansible_user: deploy
http_port: 3000
```
Best Practices:
- Keep host variables minimal. Prefer group variables for shared config.
- Use host variables only when a host truly differs from its group.
- Consider using dictionary variables to organize multiple overrides.
Production Insight: We had a host that needed a different SSH port (2222) because of firewall restrictions. We set ansible_port: 2222 in host_vars. However, we also had a group variable ansible_port: 22 in group_vars/all. The host variable correctly overrode it. The mistake was that we forgot to update the host variable when the firewall changed, causing a deployment failure. We now use a dynamic inventory plugin that reads the port from a CMDB.
Key Takeaway: Use host variables sparingly and document why each override exists.
Name each file exactly after its host, for example web1.example.com.yaml for host web1.example.com.
One host was named web-01 in inventory but the file was named web01.yaml. Ansible silently ignored it. The fix was to ensure file names match exactly, and we added a CI check that compares hostnames in inventory to files in host_vars/.
Common Mistakes and How to Avoid Them
Based on years of production experience, here are the most common inventory mistakes.
1. Forgetting to update inventory after infrastructure changes: When servers are added or removed, the inventory becomes stale. Use dynamic inventory to avoid this.
2. Overusing group_vars/all: Putting all variables there defeats the purpose of group-specific overrides. Use it only for true global defaults.
3. Not using ansible-inventory to validate: Always run ansible-inventory --list after changes.
4. Mixing INI and YAML in the same inventory: This can cause parse errors. Stick to one format per inventory directory.
5. Ignoring variable precedence: Assuming that inventory variables always take effect without considering playbook vars or extra vars.
6. Hardcoding credentials in inventory: Use Ansible Vault or environment variables.
Key Takeaway: Validate your inventory with ansible-inventory and follow the principle of least privilege for variables.
Use a dynamic source such as the aws_ec2 plugin, which automatically reflects the current state.
The Case of the Missing SSH User
The dynamic inventory script did not set ansible_user for hosts in a specific availability zone. The script output JSON with an empty string for ansible_user for those hosts. Ansible's variable precedence then used the empty string, overriding the default ansible_user from group_vars/all.
We fixed the script to return ansible_user for all hosts. Added validation in CI to run ansible-inventory --list and check for missing required variables. Also added a default ansible_user in group_vars/all as a fallback.
- Always validate dynamic inventory output before using it.
- Use ansible-inventory --list to inspect the resolved inventory, and set sensible defaults in group_vars/all for critical variables like ansible_user.
- Run ansible-inventory --graph to visualize group hierarchy. Check inventory file paths and ensure dynamic scripts are executable. Use -i with explicit path.
- Run ansible-inventory --host <hostname> to see resolved variables for that host. Check variable precedence: host_vars > group_vars (last group wins) > group_vars/all. Use debug module in playbook to print variable.
- Check that the script output includes the _meta key for hostvars. Use ansible-inventory -i script.py --list to see parsing errors.
- Check the --limit flag and inventory group membership. Run ansible all -i inventory --list-hosts to see which hosts are matched. Verify group names in inventory file.
Quick checks: ansible-inventory -i /path/to/inventory --list and ls -la /path/to/inventory
Key takeaways
Use ansible-inventory --list and --graph to validate inventory before running playbooks.
Common mistakes to avoid
Using INI format without :children suffix for group membership
Putting all variables in group_vars/all
Not validating inventory with ansible-inventory
Hardcoding secrets in inventory files
Assuming alphabetical order of groups doesn't matter
Using a dynamic inventory script that doesn't output _meta key
Interview Questions on This Topic
What is the difference between INI and YAML inventory formats? When would you use each?