Advanced 12 min · 2026-06-21

Ansible Dynamic Inventory: AWS EC2 Plugin Gotchas and Production Patterns

Master Ansible dynamic inventory with AWS EC2 plugin.

Naren, Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

Last updated: June 21, 2026
Quick Answer

Use inventory plugins (like amazon.aws.aws_ec2) over legacy scripts; they are cache-aware, configured in YAML, and natively integrated. Configure the plugin with plugin: amazon.aws.aws_ec2 in a YAML file whose name ends in aws_ec2.yml or aws_ec2.yaml. Tagging strategy: use keyed_groups to create groups from tags (e.g., keyed_groups: [{prefix: tag, key: tags.Name}]). Enable caching with cache: true and cache_plugin: jsonfile, and set cache_timeout to match how quickly your fleet changes (this article uses 300 seconds) to avoid API rate limits. Test inventory with ansible-inventory -i aws_ec2.yml --list to verify groups and variables. For GCP, use the gcp_compute plugin; for Azure, use the azure_rm plugin. Both support keyed_groups. Use ansible-inventory --graph to check the group hierarchy and --list to inspect host variables. Set compose to add custom variables, e.g., compose: { ansible_host: public_ip_address }.

Definition
What is Ansible Dynamic Inventory?

Ansible dynamic inventory is a system that automatically discovers and organizes hosts (servers, containers, network devices) from external sources like cloud providers, CMDBs, or databases. Instead of a static file listing hosts, you provide a script or plugin that returns a JSON representation of the inventory at runtime.

Inventory plugins are the modern way. They are Python modules that ship with Ansible or from collections (like amazon.aws). You configure them in a YAML file with a plugin key. The plugin fetches data from the source (e.g., AWS API), applies transformations (like grouping by tags), and returns a structured inventory.

Plugins support caching, which is critical for production to avoid API rate limits and speed up execution.

The key difference from scripts: plugins are first-class citizens in Ansible. They work with ansible-inventory and ansible-playbook, and are enabled through the enable_plugins setting in the [inventory] section of ansible.cfg. They also support keyed_groups, compose, and strict mode for validation. Scripts are legacy and should be avoided for new projects.

Plain-English First

Imagine you have a huge box of Lego bricks, and you need to build a specific model. Instead of digging through the box every time, you create a map that tells you exactly where each brick is and which set it belongs to. Ansible dynamic inventory is that map for your cloud servers. It automatically finds all your servers (like EC2 instances) and organizes them into groups based on tags (like 'web' or 'database'). This way, you can run commands on all web servers at once without manually listing them. The inventory plugin is like a smart assistant that updates the map every time you ask, but it can also remember the map for a while to save time.

I still remember the Monday morning when our deployment pipeline failed for the second week in a row. The playbook ran fine on Friday, but on Monday it couldn't find any EC2 instances. The error was cryptic: SKIPPED: No hosts matched. After an hour of head-scratching, I discovered that the legacy dynamic inventory script we were using had a hardcoded AWS region that no longer existed. That was the day I decided to switch to inventory plugins.

Historically, Ansible dynamic inventory was done via executable scripts (Python, shell) that output JSON. They worked, but were brittle: no caching, no native error handling, and each script was a snowflake. The Ansible team introduced inventory plugins in Ansible 2.4, and they became the recommended approach by 2.9. Plugins are faster, support caching out of the box, and integrate deeply with Ansible's group and variable logic.

This article covers everything you need to use dynamic inventory in production: from the AWS EC2 plugin configuration to tagging strategies, caching, and debugging. We'll also touch on GCP and Azure plugins. I'll share real incidents and patterns from running Ansible at scale across hundreds of instances.

Inventory Plugins vs Scripts: Why Plugins Win in Production

Legacy dynamic inventory scripts are standalone executables (Python, shell, etc.) that output JSON to stdout. They work, but they have no standard way to handle caching, errors, or configuration. Ansible 2.4 introduced inventory plugins, which are Python modules that integrate with Ansible's internals. They support caching, configuration via YAML, and advanced features like keyed_groups and compose.
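To make the script contract concrete, here is a minimal sketch of the JSON shape a legacy script prints when Ansible calls it with --list: one key per group, plus _meta.hostvars holding per-host variables. The sample data, the field names (name, role, ip), and the build_inventory helper are hypothetical stand-ins for a real cloud API response.

```python
import json


def build_inventory(instances):
    """Build the dict a legacy inventory script would print for --list."""
    inventory = {"_meta": {"hostvars": {}}}
    for inst in instances:
        # One top-level key per group, each with a "hosts" list.
        group = inventory.setdefault(inst["role"], {"hosts": []})
        group["hosts"].append(inst["name"])
        # Per-host variables live under _meta.hostvars.
        inventory["_meta"]["hostvars"][inst["name"]] = {"ansible_host": inst["ip"]}
    return inventory


# Hypothetical sample data standing in for an API response.
SAMPLE = [
    {"name": "web1", "role": "web", "ip": "10.0.0.1"},
    {"name": "db1", "role": "database", "ip": "10.0.0.2"},
]

if __name__ == "__main__":
    print(json.dumps(build_inventory(SAMPLE), indent=2))
```

Everything beyond this shape (caching, retries, configuration) is what each script author had to build by hand, which is the gap plugins close.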

In production, plugins are superior:
  • Caching: Built-in support via cache_plugin (e.g., jsonfile, redis). Scripts require manual caching.
  • Error handling: Plugins raise Ansible errors with clear messages. Scripts often fail silently.
  • Performance: Plugins such as aws_ec2 fetch instances in paginated bulk calls per region. Hand-written scripts often make one API call per instance.
  • Maintainability: Configuration is YAML, not code. No need to manage dependencies for each script.

To use a plugin, create a YAML file (e.g., aws_ec2.yml) with:

```yaml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - us-west-2
filters:
  instance-state-name: running
keyed_groups:
  - prefix: tag
    key: tags.Name
compose:
  ansible_host: public_ip_address
```

Then reference it with -i aws_ec2.yml.

Migration: Existing scripts keep running through the built-in script inventory plugin, which executes them and parses their JSON, so you can migrate one source at a time. For new work, do not wrap old logic. Use the cloud-specific plugin, add groups with the constructed plugin, and write a custom plugin only when no existing one fits.

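Legacy scripts are on the way out
The bundled contrib scripts (such as ec2.py) were moved out of the main Ansible repository with the 2.10 release. Custom scripts still run through the script plugin, but the Ansible documentation recommends inventory plugins. Always use plugins for new projects.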
Production Insight
We had a custom script that took 45 seconds to run because it made individual API calls for each of 200 instances. Switching to the aws_ec2 plugin with caching reduced this to 2 seconds. The script also had a bug that missed instances in certain regions because it hardcoded the endpoint URL.
Key Takeaway
Use inventory plugins over scripts: they are faster, cache-aware, and easier to maintain.

Configuring amazon.aws.aws_ec2 Plugin for Production

The amazon.aws.aws_ec2 plugin is the gold standard for AWS dynamic inventory. Install the collection: ansible-galaxy collection install amazon.aws. Then create a YAML file, typically named aws_ec2.yml.

Key configuration options
  • plugin: amazon.aws.aws_ec2 (required)
  • regions: list of AWS regions. Leave it unset to query every region your credentials can reach (but be careful with API limits).
  • filters: dict of EC2 filters (e.g., {"instance-state-name": "running"}). Supports all EC2 API filters.
  • hostnames: list of hostname sources (e.g., ["dns-name", "private-dns-name", "ip-address"]). The first match becomes the inventory hostname.
  • keyed_groups: create groups from tags or attributes.
  • compose: set variables like ansible_host.
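  • cache: set to true to turn caching on (default false).
  • cache_plugin: which cache backend to use (e.g., jsonfile).
  • cache_timeout: seconds to cache (e.g., 300).
  • strict: boolean. When true, errors in compose, groups, or keyed_groups expressions (such as an undefined variable) are fatal instead of skipped (default false).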

Production example:

```yaml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - eu-west-1
filters:
  instance-state-name: running
  tag:Environment: production
hostnames:
  - dns-name
  - private-dns-name
keyed_groups:
  - prefix: tag
    key: tags.Name
  - prefix: env
    key: tags.Environment
compose:
  ansible_host: public_ip_address
cache: true            # caching stays off unless this is set
cache_plugin: jsonfile
cache_timeout: 300
strict: false
```

Gotcha: The hostnames list order matters. If you use dns-name first, but the instance has no public DNS, it falls back to the next. Always include private-dns-name for VPC instances.
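The fallback behaviour described above can be modelled in a few lines. This is a simplified sketch of the idea (first non-empty source wins), not the plugin's actual code; pick_hostname and the sample instance dict are hypothetical.

```python
def pick_hostname(instance, preferences):
    """Return the first non-empty attribute named in preferences, else None."""
    for key in preferences:
        value = instance.get(key)
        if value:  # skips missing keys and empty strings
            return value
    return None


# A private-subnet instance: the public DNS name is empty.
private_only = {"dns-name": "", "private-dns-name": "ip-10-0-1-5.ec2.internal"}

if __name__ == "__main__":
    print(pick_hostname(private_only, ["dns-name", "private-dns-name"]))
    print(pick_hostname(private_only, ["dns-name"]))
```

With only dns-name in the list, the private instance has nothing usable to be named by, which is why a private fallback belongs in every hostnames list.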

Authentication: The plugin uses boto3. Ensure AWS credentials are available via environment variables, IAM role, or ~/.aws/credentials. Use aws sts get-caller-identity to verify.

Use strict mode in development
Set strict: true to catch misconfigured keyed_groups or compose expressions early, for example a reference to a tag that does not exist. In production, set it to false so that one host with unexpected data does not abort the whole inventory.
Production Insight
Once, we set hostnames: ["dns-name"] but our instances were in a private subnet with no public DNS. Ansible created hosts with empty names, causing failures. We fixed it by adding private-dns-name as a fallback.
Key Takeaway
Always configure hostnames with fallbacks, and use compose to set ansible_host explicitly.

Tagging Strategy for Dynamic Groups with keyed_groups

Tags are the backbone of dynamic grouping. With keyed_groups, you can automatically create Ansible groups based on EC2 tags, instance attributes, or any metadata.

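Syntax:

```yaml
keyed_groups:
  - prefix: tag
    key: tags.Environment
```

This creates groups like tag_production and tag_staging. The group name is the prefix, then the separator (an underscore by default), then the tag value. Set separator: '' together with prefix: '' when you want the bare value as the group name; with an empty prefix and the default separator, the name starts with an underscore.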

Production tagging strategy: Use consistent tag keys across all instances. Common tags:
  • Name: instance name (often unique)
  • Environment: production, staging, development
  • Role: web, database, cache
  • Tier: frontend, backend
  • Project: project name

Then configure groups:

```yaml
keyed_groups:
  - prefix: env
    key: tags.Environment
  - prefix: role
    key: tags.Role
  - prefix: tag
    key: tags.Name
    parent_group: all
```

With the default separator this yields names such as env_production and role_web.

Nested groups: Use parent_group in a keyed_groups entry to place the generated groups under a parent. An instance with tags.Environment=prod and tags.Role=web lands in both env_prod and role_web; target their intersection in a play with the host pattern env_prod:&role_web. To get a single combined group such as prod_web, compose a variable from both tags and key a group on it, as the next section shows.

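Gotcha: Characters other than letters, digits, and underscores are not valid in group names. keyed_groups replaces them with underscores, so a tag value like us-east-1 produces a group ending in us_east_1 rather than the literal value. Reference the converted name in playbooks and group_vars, or normalize tag values at the source.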

Best practice: Use lowercase tag values and avoid spaces. E.g., Environment: production not Environment: Production.
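To predict what group a tag value will land in, it helps to model the naming rule. The sketch below approximates it: join prefix, separator, and value, then replace invalid characters with underscores. The regular expression mirrors the pattern Ansible uses for invalid names as far as I know, but treat keyed_group_name as an illustration rather than the real implementation.

```python
import re

# Approximation: a leading digit or non-word character, or any non-word
# character elsewhere, is treated as invalid and replaced.
_INVALID = re.compile(r"^[\d\W]|[^\w]")


def keyed_group_name(value, prefix="", separator="_"):
    """Approximate the group name keyed_groups derives from a value."""
    raw = f"{prefix}{separator}{value}"
    return _INVALID.sub("_", raw)


if __name__ == "__main__":
    for value, prefix in [("production", "env"), ("us-east-1", "region"), ("Production (US)", "tag")]:
        print(value, "->", keyed_group_name(value, prefix))
```

Running candidate tag values through a helper like this before rollout shows which ones will produce surprising group names.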

Group name limitations
Ansible group names should match [a-zA-Z_][a-zA-Z0-9_]*. Hyphens, dots, and spaces in tag values are converted to underscores in the generated group name, so group_vars files and host patterns must use the converted name.
Production Insight
We had a tag Environment: Production (US) with parentheses. This created a group tag_Production (US) which was invalid. Ansible failed silently, and the instances were not grouped. We fixed it by enforcing tag values to be alphanumeric only.
Key Takeaway
Enforce consistent, alphanumeric tag values to avoid invalid group names.

Advanced Grouping with keyed_groups and compose

The keyed_groups directive can create groups from any key returned by the plugin, not just tags. For example, you can group by instance type, region, or VPC ID.

```yaml
keyed_groups:
  - prefix: instance_type
    key: instance_type
  - prefix: region
    key: placement.region
```

But keyed_groups only creates groups based on exact values. For more complex logic, use compose to create custom variables, then group on those.

Example: Group instances by whether they have a public IP.

```yaml
compose:
  has_public_ip: public_ip_address is defined
keyed_groups:
  - key: has_public_ip
    prefix: public
```

This creates groups public_True and public_False.

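Another pattern: Create groups based on multiple tags. compose values are bare Jinja2 expressions, so do not wrap them in {{ }}; Ansible adds the braces itself.

```yaml
compose:
  env_role: "(tags.Environment | default('unknown')) ~ '_' ~ (tags.Role | default('unknown'))"
keyed_groups:
  - key: env_role
    prefix: ''
    separator: ''
```

This creates groups like production_web. The empty separator matters: with an empty prefix and the default separator, the group would be named _production_web.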

Gotcha: compose is evaluated before keyed_groups, so composed variables can be used as keyed_groups keys. The order is fixed: compose sets host variables first, then groups and keyed_groups read them.
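The two-step order can be sketched in plain Python. This is a model of the evaluation order only: real compose values are Jinja2 expressions, whereas here they are hypothetical Python callables, and build_groups is not part of Ansible.

```python
from collections import defaultdict


def build_groups(hosts, compose, group_key, prefix="", separator=""):
    """Apply compose to each host, then group hosts by a composed variable."""
    groups = defaultdict(list)
    for name, original_vars in hosts.items():
        hostvars = dict(original_vars)  # work on a copy
        # Step 1: compose adds derived variables to the host.
        for var, fn in compose.items():
            hostvars[var] = fn(hostvars)
        # Step 2: keyed_groups reads the variable compose just set.
        if group_key in hostvars:
            groups[f"{prefix}{separator}{hostvars[group_key]}"].append(name)
    return dict(groups)


HOSTS = {
    "web1": {"tags": {"Environment": "production", "Role": "web"}},
    "db1": {"tags": {"Environment": "production"}},  # Role tag missing
}
COMPOSE = {
    "env_role": lambda hv: hv["tags"].get("Environment", "unknown")
    + "_"
    + hv["tags"].get("Role", "unknown"),
}

if __name__ == "__main__":
    print(build_groups(HOSTS, COMPOSE, "env_role"))
```

Note how the host without a Role tag ends up in production_unknown: defaults keep the inventory building, at the cost of hiding the missing tag.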

Performance: Complex compose expressions are evaluated for each host. For large inventories (thousands of hosts), keep expressions simple. Avoid expensive filters like regex_replace.

Jinja2 in compose is powerful but dangerous
Expressions like tags.Environment | default('unknown') can mask missing tags. Use | mandatory in development to catch missing tags.
Production Insight
We used compose to set ansible_user based on AMI owner: ansible_user: "'ec2-user' if 'amazon' in image_id else 'ubuntu'". This worked but was slow for 500 instances. We moved this logic to a playbook task instead.
Key Takeaway
Use compose for simple variable mappings; for complex logic, prefer playbook tasks to keep inventory fast.

Caching Inventory to Avoid API Rate Limits

Without caching, each ansible-inventory call hits the AWS API. For large environments, this can trigger rate limits (e.g., RequestLimitExceeded). Caching stores the inventory locally and refreshes it periodically.

Configure caching in the plugin YAML:

```yaml
cache: true
cache_plugin: jsonfile
cache_timeout: 300
cache_connection: ~/.ansible/tmp/inventory_cache
```

Or set it globally in ansible.cfg:

```ini
[inventory]
cache = True
cache_plugin = jsonfile
cache_timeout = 300
cache_connection = ~/.ansible/tmp/inventory_cache
```

The cache option is the switch. Setting only cache_plugin and cache_timeout leaves caching off.

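Cache plugins: ansible-core ships memory (the default, which does not persist between runs) and jsonfile. Collections add more, such as redis, memcached, pickle, and yaml in community.general. For single-controller setups, jsonfile is fine. For multi-controller, use redis to share cache.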

Cache invalidation: The cache is invalidated after cache_timeout seconds. To force a refresh, delete the cache directory: rm -rf ~/.ansible/tmp/inventory_cache/. Or run with --flush-cache:

```bash
ansible-inventory -i aws_ec2.yml --list --flush-cache
```

Production pattern: Set cache_timeout based on how often your infrastructure changes. For auto-scaling groups, set to 60 seconds. For static environments, 300 seconds is fine.

Gotcha: If using cache_plugin: jsonfile, ensure the cache directory is writable. Also, the cache file can become corrupt if multiple processes write simultaneously. Use redis for concurrent access.
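The timeout rule is simple enough to check yourself when you suspect stale data. The sketch below models it as "missing or older than the timeout means refresh". It makes no assumption about the jsonfile plugin's file naming; cache_is_stale is a hypothetical helper you would point at a file you found in the cache directory.

```python
import os
import time


def cache_is_stale(path, timeout_seconds, now=None):
    """Return True if the file is missing or older than timeout_seconds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return True  # a missing or unreadable file counts as stale
    return age > timeout_seconds


if __name__ == "__main__":
    # Hypothetical path; list your cache directory to find real file names.
    print(cache_is_stale(os.path.expanduser("~/.ansible/tmp/inventory_cache/example"), 300))
```

A check like this in a pre-deploy step makes the cache_timeout trade-off visible: a long timeout is fast, but a stale answer is reported as fresh for that whole window.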

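Debugging cache: Cache file names are generated per plugin and source, so list the directory first, then inspect the file you find:

```bash
ls -l ~/.ansible/tmp/inventory_cache/
CACHE_FILE=~/.ansible/tmp/inventory_cache/NAME_FROM_LS   # placeholder: substitute a name from ls
jq 'keys' "$CACHE_FILE"
```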

Use --flush-cache in CI/CD
In CI/CD pipelines, always use --flush-cache to ensure fresh inventory. Otherwise, stale cache can cause deployments to fail.
Production Insight
We had a Jenkins pipeline that ran Ansible every 5 minutes. Without caching, we hit AWS API rate limits and got throttled. Setting cache_timeout: 120 (2 minutes) solved it. We also moved to redis cache to handle parallel jobs.
Key Takeaway
Always enable caching with a reasonable timeout to avoid API rate limits and speed up inventory generation.

Testing Dynamic Inventory with ansible-inventory --list

The ansible-inventory command is your best friend for debugging. Use --list to output the full inventory as JSON.

```bash
# prefix: tag with key: tags.Name puts an instance named "web" in group tag_web
ansible-inventory -i aws_ec2.yml --list | jq '.tag_web'
```

Useful flags
  • --list : output all hosts and groups.
  • --graph : output ASCII graph of group hierarchy (less detailed).
  • --export : with --list, optimize the output for export; group variables stay on their groups instead of being merged into each host.
  • --flush-cache : ignore cache and refresh.
  • -vvv : verbose logging (use with 2>&1 | grep -i error).

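Testing specific groups:

```bash
ansible-inventory -i aws_ec2.yml --list | jq '.env_production'
```

Testing host variables (the key is the inventory hostname; i-12345 is a placeholder):

```bash
ansible-inventory -i aws_ec2.yml --list | jq '._meta.hostvars["i-12345"]'
```

Using with a playbook:

```bash
ansible-playbook -i aws_ec2.yml site.yml --list-hosts
```

Gotcha: The --list output includes _meta with hostvars. Add --export when the output feeds another tool: it represents the inventory for export rather than as Ansible processed it, keeping group variables on their groups.

Automated testing: In CI, run:

```bash
ansible-inventory -i aws_ec2.yml --list > /dev/null && echo "Inventory valid"
```

A zero exit status is a weak signal: by default ansible-inventory also exits successfully when the inventory is empty, so assert on the content as well.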
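A content check for CI can be a short script over the JSON that --list produces. The sketch below is one way to do it; validate_inventory and the group names passed to it are hypothetical, and it only inspects direct hosts, so a group that contains only child groups would be reported as empty.

```python
import json


def validate_inventory(inventory, required_groups=()):
    """Return a list of problems found in `ansible-inventory --list` JSON."""
    problems = []
    hostvars = inventory.get("_meta", {}).get("hostvars", {})
    if not hostvars:
        problems.append("inventory contains no hosts")
    for group in required_groups:
        # Only direct members are checked, not hosts in child groups.
        if not inventory.get(group, {}).get("hosts"):
            problems.append(f"group '{group}' is missing or empty")
    return problems


if __name__ == "__main__":
    sample = json.loads('{"_meta": {"hostvars": {}}, "all": {"children": ["ungrouped"]}}')
    for problem in validate_inventory(sample, ["env_production"]):
        print("FAIL:", problem)
```

In a pipeline you would write the --list output to a file, load it with json.load, and exit non-zero when the returned list is not empty.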

Common issues
  • Empty output: check credentials and filters.
  • Missing groups: check keyed_groups syntax.
  • Wrong hostnames: check hostnames list.

Pro tip: Pipe through jq to extract specific fields. For example, list all hostnames:

```bash
ansible-inventory -i aws_ec2.yml --list | jq '._meta.hostvars | keys'
```

ansible-inventory vs ansible-playbook --list-hosts
Use ansible-inventory for debugging, ansible-playbook --list-hosts to verify playbook targeting. The latter respects playbook host patterns.
Production Insight
I once spent an hour debugging why a playbook skipped all hosts. It turned out I had a typo in the filters section: instance-state-name: runnning (three n's). ansible-inventory --list returned an empty inventory, and nothing pointed at the cause. Adding -vvv for verbose output led me to the invalid filter.
Key Takeaway
Always run ansible-inventory --list after changing inventory configuration to catch errors early.

GCP Inventory Plugin: gcp_compute

For Google Cloud Platform, use the gcp_compute plugin from the google.cloud collection. Install: ansible-galaxy collection install google.cloud.

Configuration example (`gcp_compute.yml`):

```yaml
plugin: google.cloud.gcp_compute
auth_kind: application   # or serviceaccount together with service_account_file
projects:
  - my-project
zones:
  - us-central1-a
  - us-east1-b
filters:
  - status = RUNNING
keyed_groups:
  - prefix: gcp
    key: labels.environment
compose:
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP
hostnames:
  - name
  - private_ip
cache: true
cache_plugin: jsonfile
cache_timeout: 300
```

Differences from AWS
  • Uses labels instead of tags.
  • zones instead of regions.
  • filters use GCE filter syntax (e.g., status = RUNNING).
  • hostnames accepts a fixed set of values such as name, public_ip, and private_ip.

Authentication: Use application default credentials or service account JSON file via GCP_SERVICE_ACCOUNT_FILE environment variable.

Gotcha: hostnames only decides the inventory name; it is an ordered preference list, and the first value that resolves wins. The connection address comes from compose. If the instance has no public IP, natIP is undefined, so with strict: false ansible_host is never set and Ansible tries to connect to the inventory name instead.

Production pattern: Use compose to set ansible_host to the internal IP if the public IP is missing. The value is a bare Jinja2 expression, without {{ }}:

```yaml
compose:
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP | default(networkInterfaces[0].networkIP)
```

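GCP labels are lowercase only
GCP label keys and values may contain only lowercase letters, digits, underscores, and hyphens, while AWS tag keys and values are case-sensitive and allow mixed case. A key written as labels.Environment never matches; use labels.environment.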
Production Insight
We had a GCP project with 1000+ instances. The gcp_compute plugin was slow because it queried all zones. We added zones: [us-central1-a] to limit scope and enabled caching. The inventory generation time dropped from 30s to 3s.
Key Takeaway
Limit zones and projects to only what you need, and always enable caching for GCP inventory.

Azure Inventory Plugin: azure_rm

For Microsoft Azure, use the azure_rm plugin from the azure.azcollection collection. Install: ansible-galaxy collection install azure.azcollection.

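Configuration example (`azure_rm.yml`):

```yaml
plugin: azure.azcollection.azure_rm
include_vm_resource_groups:
  - my-resource-group
  - another-rg
auth_source: auto
keyed_groups:
  - prefix: azure
    key: tags.environment
  - prefix: location
    key: location
compose:
  # Host variable names differ between collection versions; confirm them
  # with ansible-inventory --list before relying on this expression.
  ansible_host: (public_ipv4_addresses + private_ipv4_addresses) | first
hostnames:
  - name
  - private_ipv4_addresses | first
# Cache settings, if your collection version supports inventory caching
cache_plugin: jsonfile
cache_timeout: 300
```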

Differences
  • Uses include_vm_resource_groups or include_vmss_resource_groups to scope.
  • auth_source: auto uses Azure CLI or environment variables.
  • tags are key-value pairs (like AWS).
  • location is the Azure region.

Authentication: Use az login, or a service principal via the AZURE_SUBSCRIPTION_ID, AZURE_CLIENT_ID, AZURE_SECRET, and AZURE_TENANT environment variables.

Gotcha: The azure_rm plugin can be slow for large subscriptions. Use include_vm_resource_groups to limit scope. Also, the plugin does not support hostnames fallback as elegantly; you may need to use compose.

Production pattern: Use tags to organize VMs. Example:

```yaml
keyed_groups:
  - prefix: environment
    key: tags.Environment
  - prefix: role
    key: tags.Role
```

Performance: For large Azure environments, consider using azure_rm with cache_plugin: redis to share cache across controllers.

Azure inventory can be slow without scoping
Without include_vm_resource_groups, the plugin queries all resource groups in the subscription, which can take minutes. Always scope to specific resource groups.
Production Insight
We had an Azure subscription with 2000 VMs across 50 resource groups. The inventory took 4 minutes to generate. We scoped to only the resource groups containing production VMs (5 groups), reducing time to 20 seconds.
Key Takeaway
Scope Azure inventory to specific resource groups to avoid performance issues.

Combining Multiple Inventory Sources

In production, you often need hosts from multiple clouds or sources. Ansible supports multiple inventory sources by specifying a directory with multiple files, or by listing multiple -i flags.

Using a directory: Place all inventory YAML files in a directory (e.g., inventory/). Then run:

```bash
ansible-playbook -i inventory/ site.yml
```

Ansible will merge all inventories. Hosts with the same name are merged (variables from later sources override earlier ones).

Using multiple -i flags:

```bash
ansible-playbook -i aws_ec2.yml -i gcp_compute.yml -i azure_rm.yml site.yml
```

Merging logic: Groups from all sources are combined. If a host appears in multiple sources, its variables are merged (last source wins). To avoid conflicts, ensure hostnames are unique across clouds (e.g., use instance ID or FQDN).
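The last-source-wins rule is easy to underestimate, so here is a small model of it. merge_sources and the sample host data are hypothetical; the sketch only illustrates how variables for a shared hostname combine, not Ansible's full variable precedence.

```python
def merge_sources(sources):
    """Merge per-host variables from several sources; later sources win."""
    merged = {}
    for hostvars_by_host in sources:
        for host, hostvars in hostvars_by_host.items():
            merged.setdefault(host, {}).update(hostvars)
    return merged


# Two clouds that both happen to expose a host called "web".
AWS = {"web": {"ansible_host": "54.0.0.10", "cloud": "aws"}}
GCP = {"web": {"ansible_host": "35.0.0.20", "zone": "us-central1-a"}}

if __name__ == "__main__":
    print(merge_sources([AWS, GCP]))
```

The merged "web" host connects to the GCP address while still carrying the AWS-only cloud variable, which is the kind of hybrid record that unique hostnames prevent.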

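Using constructed plugin: The constructed plugin can add groups and variables based on inventory data from sources parsed before it. Example:

```yaml
plugin: constructed
strict: false
compose:
  cloud_type: "'aws' if 'aws_ec2' in group_names else 'gcp'"
keyed_groups:
  - prefix: cloud
    key: cloud_type
```

compose runs before keyed_groups, so cloud_type is available as a key. The constructed source must be parsed after the cloud sources; in an inventory directory, give it a name that sorts last (for example zz_constructed.yml).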

Production pattern: Use separate inventory files per cloud, and a directory to combine them. Then use a playbook-level hosts: all to target all.

Hostname conflicts across clouds
If two clouds have a host with the same name (e.g., both have a host named 'web'), the later inventory source overwrites the earlier. Use unique naming like aws_web and gcp_web or use instance IDs.
Production Insight
We had a hybrid cloud environment with AWS and GCP. Both had a host named 'web'. The inventory merged them, and playbooks targeting 'web' ran on both, causing chaos. We renamed GCP hosts to include a 'gcp-' prefix via hostnames: ["gcp-{{ name }}"].
Key Takeaway
Ensure hostnames are unique across inventory sources to avoid unintended merges.

Custom Inventory Plugins: When and How

Sometimes the built-in plugins don't meet your needs (e.g., custom API, on-premise servers). You can write a custom inventory plugin. This is advanced but powerful.

Structure: A plugin is a Python module in a collection. It must inherit from ansible.plugins.inventory.BaseInventoryPlugin and implement verify_file() and parse(). Options declared in DOCUMENTATION are read with the inherited get_option().

Minimal example:

```python
from ansible.plugins.inventory import BaseInventoryPlugin

DOCUMENTATION = '''
    name: my_custom
    plugin_type: inventory
    options:
      plugin:
        description: Name of the plugin; must be my_custom.
        required: true
        choices: ['my_custom']
      my_option:
        description: Example option
        required: true
        type: str
'''


class InventoryModule(BaseInventoryPlugin):
    NAME = 'my_custom'

    def verify_file(self, path):
        # Accept only existing files that follow the naming convention
        return super().verify_file(path) and path.endswith('.my.yml')

    def parse(self, inventory, loader, path, cache=True):
        super().parse(inventory, loader, path, cache)
        # Load the YAML config file and populate the plugin options
        self._read_config_data(path)

        # Stand-in for data fetched from an external source
        hosts = [{'name': 'host1', 'groups': ['web'], 'vars': {'ansible_host': '10.0.0.1'}}]
        for host in hosts:
            inventory.add_host(host['name'])
            for group in host['groups']:
                inventory.add_group(group)
                inventory.add_child(group, host['name'])
            for k, v in host['vars'].items():
                inventory.set_variable(host['name'], k, v)
```

Production considerations
  • Implement caching similar to built-in plugins.
  • Use self.display.vvv() for debug logging.
  • Use self.get_option() to read configuration.
  • Test with ansible-inventory.
When to write a custom plugin
  • You have a custom CMDB.
  • You need to query an API that doesn't have a plugin.
  • You need complex logic not supported by compose.

Otherwise, stick with existing plugins.

Custom plugins require maintenance
Writing a plugin is fun, but you now own its bugs and compatibility. Use built-in plugins whenever possible.
Production Insight
We wrote a custom plugin for an on-premise virtualization platform. It worked, but when we upgraded Ansible from 2.9 to 2.12, the plugin broke due to internal API changes. We ended up migrating to the constructed plugin with a script that fed data via a static source.
Key Takeaway
Only write a custom plugin as a last resort; prefer built-in plugins with compose and keyed_groups.

Common Pitfalls and How to Avoid Them

Here are the most common mistakes I've seen with dynamic inventory:

1. Not setting ansible_host: Without compose: { ansible_host: public_ip_address }, Ansible uses the hostname as the connection address. If the hostname is not resolvable, SSH fails.

2. Over-filtering: Filters that are too restrictive (e.g., tag:Environment: production) silently drop instances that were never tagged, so they never reach the inventory. Audit for untagged instances regularly.

3. Ignoring cache: Not enabling caching leads to slow runs and API rate limits. Always set cache: true, cache_plugin, and cache_timeout.

4. Relying on legacy inventory scripts: They still run, but they lack built-in caching and error handling. Migrate to plugins.

5. Not testing with --list: Skipping ansible-inventory --list leads to surprises in production. Always test after changes.

6. Group name collisions: If two inventory sources create groups with the same name, they merge. This can cause unexpected host membership.

7. Missing dependencies: For AWS, install the amazon.aws collection plus boto3 and botocore. For GCP, install the google.cloud collection plus google-auth and requests. For Azure, install azure.azcollection and the Python packages listed in the requirements file that ships with the collection; azure-cli is only needed for CLI-based authentication.

8. Not handling missing tags: If a tag is missing, keyed_groups will fail (if strict: true) or skip the host for that group. Use the default filter in compose to provide defaults.

Automate testing with CI
Add a CI step that runs ansible-inventory --list --flush-cache on every change to catch issues early.
Production Insight
A colleague once set ansible_host: private_ip_address but forgot that the controller was outside the VPC. All playbooks failed with timeouts. We added a fallback: ansible_host: public_ip_address | default(private_ip_address).
Key Takeaway
Always test inventory generation and connectivity with a simple ping module after configuration changes.

Production Deployment: Putting It All Together

Directory structure:

```
inventory/
  aws_ec2.yml
  gcp_compute.yml
  azure_rm.yml
  group_vars/
    all.yml
    env_production.yml
    role_web.yml
```

ansible.cfg:

```ini
[defaults]
inventory = inventory/
host_key_checking = False

[inventory]
cache = True
cache_plugin = jsonfile
cache_timeout = 300
cache_connection = ~/.ansible/tmp/inventory_cache
```

CI/CD pipeline:

```bash
# Install collections
ansible-galaxy collection install amazon.aws google.cloud azure.azcollection

# Clear cache and test
ansible-inventory -i inventory/ --list --flush-cache > /dev/null

# Run playbook
ansible-playbook -i inventory/ site.yml
```

Monitoring: Use ansible-inventory --list to export the inventory for a monitoring system. For example, dump it to JSON and convert it into target files for Prometheus file-based service discovery.
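One way to feed a monitoring system is to convert the --list JSON into the target-group format that Prometheus file-based service discovery reads. The sketch below assumes every host runs an exporter on port 9100 (the node_exporter default) and that ansible_host is the address to scrape; to_file_sd and the label name are hypothetical choices.

```python
import json


def to_file_sd(inventory, port=9100):
    """Convert `ansible-inventory --list` JSON into file_sd target groups."""
    target_groups = []
    hostvars = inventory.get("_meta", {}).get("hostvars", {})
    for host, variables in sorted(hostvars.items()):
        # Fall back to the inventory name when ansible_host is not set.
        address = variables.get("ansible_host", host)
        target_groups.append(
            {"targets": [f"{address}:{port}"], "labels": {"inventory_hostname": host}}
        )
    return target_groups


if __name__ == "__main__":
    sample = {"_meta": {"hostvars": {"web1": {"ansible_host": "10.0.0.1"}, "db1": {}}}}
    print(json.dumps(to_file_sd(sample), indent=2))
```

Writing the result to a file referenced by a file_sd_configs entry lets Prometheus pick up inventory changes on its next reload of that file.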

Security
  • Store credentials in a vault (e.g., Ansible Vault, HashiCorp Vault).
  • Use IAM roles for AWS, service accounts for GCP, and managed identities for Azure.
  • Never commit credentials to version control.
Scaling
  • For 1000+ hosts, use redis cache to share across controllers.
  • Use --forks in playbooks to parallelize.
  • Consider using ansible-pull for agent-based models.
Disaster recovery
  • Keep a static inventory as a fallback (e.g., inventory/static.yml) with critical hosts.
  • Test disaster recovery by running playbooks against static inventory.
Use ansible-inventory --export for external tools
The --export flag optimizes --list output for export: group variables stay on their groups instead of being merged into each host, which suits monitoring tools or CMDBs.
Production Insight
In a large deployment, we had multiple Ansible controllers sharing the same cache via NFS. This caused cache corruption. We switched to redis cache, which solved the issue and improved performance.
Key Takeaway
Design your inventory setup with caching, security, and monitoring in mind from day one.
Production incident post-mortem (severity: high)

The Missing EC2 Instance: A Caching Disaster

Symptom
Ansible showed hosts that were terminated hours ago. Running ansible-inventory --list returned old data.
Assumption
The engineer assumed the AWS API was slow to reflect changes.
Root cause
The inventory plugin had cache_timeout: 86400 (24 hours) and the cache file was never invalidated. The cache plugin was jsonfile with a persistent directory.
Fix
Set cache_timeout: 300 (5 minutes) and cleared the cache: rm -rf ~/.ansible/tmp/inventory_cache/.
Key lesson
  • Always set a reasonable cache timeout for dynamic inventories.
  • Cloud resources change frequently; treat cache as a speed optimization, not a source of truth.
Production debug guide: Symptom → Root cause → Fix
Symptom · 01
ansible-inventory --list returns empty or no hosts
Fix
Check AWS credentials: aws sts get-caller-identity. Verify the plugin YAML file has correct regions and filters. Raise verbosity for details: ansible-inventory -i aws_ec2.yml --list -vvv 2>&1 | grep -i error.
Symptom · 02
Hosts appear but are missing expected groups
Fix
Review keyed_groups configuration. Ensure tag keys match exactly (case-sensitive). Test with ansible-inventory -i aws_ec2.yml --list | jq '._meta.hostvars'.
Symptom · 03
Playbook runs but says 'No hosts matched'
Fix
Check if the inventory source is correctly referenced in ansible.cfg or -i flag. Run ansible-inventory --graph to see group hierarchy. Ensure ansible_host is set via compose.
Symptom · 04
API rate limiting errors (e.g., RequestLimitExceeded)
Fix
Enable caching: set cache: true, cache_plugin: jsonfile, and cache_timeout: 300. Raise the boto3 retry budget with max_attempts in the AWS config file or with export AWS_MAX_ATTEMPTS=10.
Ansible Dynamic Inventory Quick Reference
No hosts found
Immediate action
Check AWS credentials and region
Commands
aws sts get-caller-identity
ansible-inventory -i aws_ec2.yml --list -vvv 2>&1 | grep -i error
Fix now
Set correct regions in plugin YAML
Stale hosts in inventory
Immediate action
Clear cache
Commands
rm -rf ~/.ansible/tmp/inventory_cache/
ansible-inventory -i aws_ec2.yml --list | jq '._meta.hostvars'
Fix now
Reduce cache_timeout to 300
Missing groups from tags
Immediate action
Verify tag names and keyed_groups syntax
Commands
ansible-inventory -i aws_ec2.yml --list | jq '. | keys'
ansible-inventory -i aws_ec2.yml --list | jq '._meta.hostvars | to_entries[] | {host: .key, tags: .value.tags}'
Fix now
Use keyed_groups: [{prefix: tag, key: tags.Name}]
API rate limiting
Immediate action
Enable caching
Commands
grep cache_plugin ~/.ansible.cfg
ansible-config dump --only-changed | grep -i cache
Fix now
Set cache_plugin: jsonfile and cache_timeout: 300
ansible_host not set
Immediate action
Check compose section
Commands
ansible-inventory -i aws_ec2.yml --list | jq '._meta.hostvars | .[] | {host: .ansible_host}'
ansible -i aws_ec2.yml all -m ping
Fix now
Add compose: { ansible_host: public_ip_address }
Cloud Inventory Plugin Comparison
| Feature | AWS (amazon.aws.aws_ec2) | GCP (gcp_compute) | Azure (azure_rm) |
| --- | --- | --- | --- |
| Collection | amazon.aws | google.cloud | azure.azcollection |
| Hostname sources | dns-name, private-dns-name, ip-address, instance-id | name, public_ip, private_ip | Jinja2 expressions over host variables (e.g., name) |
| Grouping key | tags | labels | tags |
| Filter syntax | EC2 filters (dict) | GCE filter strings (list) | Resource group inclusion |
| Caching support | Yes (any cache_plugin) | Yes | Yes |
| Authentication | boto3 (env, IAM, file) | Application default credentials or service account | Azure CLI or service principal |
| Performance for large environments | Good with caching | Moderate; slow without zone scoping | Slow without resource group scoping |
| Example keyed_groups prefix | prefix: tag | prefix: gcp | prefix: azure |
| Compose support | Yes | Yes | Yes |

Key takeaways

1. Use inventory plugins (not scripts) for dynamic inventory; they are faster, cache-aware, and easier to maintain.
2. Always configure ansible_host via compose to ensure connectivity.
3. Enable caching with `cache: true`, `cache_plugin: jsonfile`, and `cache_timeout: 300` to avoid API rate limits.
4. Use keyed_groups to automatically create groups from tags or labels.
5. Test inventory with ansible-inventory --list after every change.
6. Scope cloud inventory to specific regions, zones, or resource groups for performance.
7. Enforce consistent, alphanumeric tag values to avoid invalid group names.
8. Use `strict: false` in production to gracefully handle unexpected data.
9. Combine multiple inventory sources via a directory for multi-cloud environments.
10. Only write custom plugins as a last resort; prefer built-in plugins.
11. Automate inventory testing in CI/CD with ansible-inventory --list.
12. Monitor inventory generation time and cache hit rate.

Common mistakes to avoid

1. Not setting ansible_host in compose
   Symptom: SSH connection failures (timeout or host unreachable)
   Fix: Add compose: { ansible_host: public_ip_address | default(private_ip_address) }

2. Using legacy script instead of plugin
   Symptom: Slow inventory generation, no caching
   Fix: Replace with the appropriate plugin YAML file

3. Not enabling caching
   Symptom: API rate limiting errors, slow playbook runs
   Fix: Set cache: true, cache_plugin: jsonfile, and cache_timeout: 300

4. Overly broad scope (e.g., no region filter)
   Symptom: Inventory includes instances from all regions, slow generation
   Fix: Specify regions: [us-east-1, eu-west-1] or use filters

5. Using invalid group names from tags
   Symptom: Expected groups missing from inventory, or present under a converted name
   Fix: Enforce alphanumeric tag values

6. Forgetting to install required collections
   Symptom: Plugin not found error
   Fix: Run ansible-galaxy collection install amazon.aws (or google.cloud, azure.azcollection)
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): What is the difference between an inventory plugin and an inventory script?
Q02 (Junior): How do you configure the AWS EC2 inventory plugin to group instances by ...
Q03 (Senior): What is the purpose of the `compose` directive in an inventory plugin?
Q04 (Junior): How do you test an inventory plugin configuration without running a play...
Q05 (Senior): What caching options are available for dynamic inventory, and why would ...
Q06 (Senior): How do you handle hosts that appear in multiple inventory sources?
Q07 (Senior): What are the key differences between AWS and GCP inventory plugins?
Q08 (Senior): How do you write a custom inventory plugin?

Q01 (Senior): What is the difference between an inventory plugin and an inventory script?

Answer: Inventory plugins are Python modules integrated into Ansible, supporting caching, configuration via YAML, and features like keyed_groups and compose. They are faster and more maintainable. Inventory scripts are standalone executables that output JSON; they lack built-in caching and structured error handling. Scripts still run through the script plugin, but plugins are the recommended approach.
FAQ · 8 QUESTIONS

Frequently Asked Questions

1. What is Ansible dynamic inventory?
2. How do I install the AWS EC2 inventory plugin?
3. Why is my inventory empty?
4. How do I force a cache refresh?
5. Can I use multiple inventory sources?
6. How do I set the SSH user dynamically?
7. What is the `constructed` plugin?
8. How do I debug group membership?