Ansible Galaxy & Collections: Pinning Versions to Avoid Production Surprises
Master ansible-galaxy role install vs collection install, requirements.yml, FQCN modules, private Automation Hub, and version pinning for reproducible builds in production.
20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.
Use ansible-galaxy collection install for collections; ansible-galaxy role install for roles. They install to different paths.
Always pin collection versions in requirements.yml using version: to avoid unexpected updates.
Collections follow namespace.name format; access modules via FQCN (e.g., cisco.ios.ios_config).
For private Automation Hub, set server_list in ansible.cfg and use token for authentication.
When using collections in playbooks, always use FQCN to avoid ambiguity with built-in modules.
Use ansible-galaxy collection list to verify installed versions; ansible-galaxy collection install -r requirements.yml -f to force reinstall.
A single requirements.yml can list both roles (src, name, version keys) and collections (name, version, source keys).
Never rely on latest; always pin to a specific version in CI/CD pipelines.
Imagine you're building a house from a kit. Ansible Galaxy is like the hardware store where you buy pre-made components (roles and collections). Roles are like pre-assembled wall panels—you drop them in and they work. Collections are like toolkits containing multiple specialized tools (modules, plugins, roles) that belong to a brand (namespace). You wouldn't grab a random screwdriver from the pile; you'd pick a specific brand's set. Pinning versions is like buying the exact same screwdriver set version every time—if you don't, next month's set might have a different handle that breaks your project. Private Automation Hub is your own locked warehouse where you store approved components, so your team doesn't accidentally use a bad part from the public store.
It was 3 AM on a Sunday. Our CI pipeline had been green for weeks, but suddenly a deployment failed with ERROR! couldn't resolve module 'aws_s3'. The playbook had been using aws_s3 without a FQCN, and a new collection version had renamed the module. We had no version pinning, no requirements.yml—just ansible-galaxy collection install amazon.aws in a script. That night, I learned the hard way that Ansible Galaxy is not a 'set it and forget it' tool.
Ansible Galaxy has evolved significantly. Originally, it was just a repository for roles—shareable, reusable automation units. Then came Collections (introduced in Ansible 2.9), which bundle modules, plugins, roles, and playbooks under a namespace. This solved the 'module explosion' problem but introduced new complexity: namespace management, dependency hell, and version conflicts.
In this article, I'll cover exactly how to use ansible-galaxy for both roles and collections, write a bulletproof requirements.yml, understand namespace conventions, use FQCN in playbooks, connect to private Automation Hub, and pin versions for reproducible builds. Every example is production-tested, and I'll share the exact commands that fixed my 3 AM outage.
We'll also dive into a real production incident where a missing version pin caused a cascading failure across 200 servers, and how we recovered with a one-line fix.
ansible-galaxy Role Install vs Collection Install
The ansible-galaxy command has two subcommands: role and collection. They are not interchangeable. Roles install to ~/.ansible/roles/ (or a custom roles_path). Collections install to ~/.ansible/collections/ansible_collections/<namespace>/<name>/. This means you cannot use ansible-galaxy collection install to install a role, and vice versa.
Role install:

```bash
ansible-galaxy role install geerlingguy.nginx,1.0.0
```

This downloads the role from Galaxy and places it in ~/.ansible/roles/geerlingguy.nginx/. You can also install from git:

```bash
ansible-galaxy role install git+https://github.com/geerlingguy/ansible-role-nginx.git,1.0.0
```

Collection install:

```bash
ansible-galaxy collection install community.general:6.0.0
```

This installs to ~/.ansible/collections/ansible_collections/community/general/. Note the colon (:) between namespace.name and version.
Key gotcha: If you accidentally run ansible-galaxy install (without subcommand), it defaults to role install. Always specify role or collection.
Production pattern: Use requirements.yml to manage both. Example:

```yaml
---
roles:
  - name: geerlingguy.nginx
    version: 1.0.0
collections:
  - name: community.general
    version: 6.0.0
```

Then run:

```bash
ansible-galaxy install -r requirements.yml
```

On Ansible 2.10 and later, this single command installs all roles and collections listed. On 2.9 it installs only the roles, so also run ansible-galaxy collection install -r requirements.yml.
Avoid running ansible-galaxy install without a subcommand for collections. It will fail or install to the wrong path.
In one incident, ansible-galaxy install community.general was run without a subcommand. It silently installed an empty role directory. Our playbook failed with 'role not found' because it was looking for a collection. We spent hours debugging before noticing the missing subcommand.
Use ansible-galaxy role install for roles and ansible-galaxy collection install for collections. Use requirements.yml to manage both in one file.
Writing a Bulletproof requirements.yml for Roles and Collections
A requirements.yml file is the cornerstone of reproducible Ansible environments. It should contain both roles and collections with explicit versions. Here's a production-tested template:
```yaml
---
roles:
  # Roles take src (a Galaxy name, git URL, or tarball URL); source applies only to collections.
  - name: geerlingguy.nginx
    src: geerlingguy.nginx
    version: 1.0.0
  # Internal role: set src to the git or tarball URL that hosts it.
  - name: mycompany.custom_role
    version: 2.1.0

collections:
  - name: community.general
    version: 6.0.0
    source: https://galaxy.ansible.com
  - name: mycompany.operations
    version: 1.2.3
    source: https://automationhub.mycompany.com
```
Important fields:
- name: For roles, the role name (e.g., geerlingguy.nginx). For collections, namespace.name.
- version: An exact version (e.g., 6.0.0) or, for collections, a version range (e.g., >=6.0.0,<7.0.0). For production, always use an exact version.
- source: Optional, collections only. If omitted, ansible-galaxy uses the servers configured in ansible.cfg. Specify it for private hubs. Roles use src instead.
Gotcha: When using source, ensure it matches the server list in ansible.cfg. Otherwise, you may get authentication errors.
Install command:

```bash
ansible-galaxy install -r requirements.yml -f
```

The -f flag forces reinstallation even if already present. Use it in CI to ensure a clean state.
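Validating:

```bash
ansible-galaxy collection verify -r requirements.yml
```

This compares the collections already installed against the checksums on the server without changing anything. ansible-galaxy collection list does not take a requirements file, so run verify after installing.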
We once used version: '*' for a collection. One day, the collection maintainer released a breaking change. Our CI failed because a module was renamed. We now enforce that all requirements.yml files must have exact versions, checked by a pre-commit hook.
Use the -f flag in CI to force reinstall.
Understanding Collections Namespace and Name Convention
Collections are identified by a fully qualified collection name (FQCN) in the format namespace.name. The namespace is a logical grouping (e.g., cisco, amazon, community), and the name is the specific collection (e.g., ios, aws, general). This convention prevents naming conflicts and allows multiple collections from the same vendor.
Examples:
- cisco.ios: Cisco IOS modules and roles
- amazon.aws: AWS modules
- community.general: general-purpose community modules (still maintained; core modules such as copy and file live in ansible.builtin)
Installation path: When you install community.general, it goes to:

```
~/.ansible/collections/ansible_collections/community/general/
```

Note the directory structure mirrors the namespace and name.
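To make the naming convention concrete, here is a minimal Python sketch that splits a fully qualified module name into its parts and derives the install directory. The helper names split_fqcn and install_path are hypothetical, and the root path is the per-user default used throughout this article; a custom collections_path would change it.

```python
from pathlib import PurePosixPath
from typing import Tuple

# Per-user default root, as used in this article's examples (an assumption
# if your ansible.cfg sets collections_path to something else).
DEFAULT_ROOT = PurePosixPath("~/.ansible/collections/ansible_collections")


def split_fqcn(fqcn: str) -> Tuple[str, str, str]:
    """Split 'namespace.name.module' into (namespace, name, module)."""
    parts = fqcn.split(".")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"not a fully qualified name: {fqcn!r}")
    # Anything after the second dot is treated as the plugin name.
    return parts[0], parts[1], ".".join(parts[2:])


def install_path(namespace: str, name: str,
                 root: PurePosixPath = DEFAULT_ROOT) -> str:
    """Return the directory where namespace.name lands under root."""
    return str(root / namespace / name)


namespace, name, module = split_fqcn("cisco.ios.ios_config")
print(namespace, name, module)
print(install_path("community", "general"))
```

A short name such as copy raises ValueError here, which is a cheap way to spot tasks that are not using FQCN.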
Listing installed collections:

```bash
ansible-galaxy collection list | grep community.general
```
Importance of namespace: Namespaces are registered on Galaxy. You cannot publish a collection under a namespace you don't own. For private hubs, you can define custom namespaces.
Gotcha: Some collections have similar names (e.g., community.kubernetes vs kubernetes.core). Always verify the correct one from the official documentation.
A short module name (e.g., k8s) may resolve to different modules depending on collection order.
One playbook used k8s (short name). After installing kubernetes.core, the module resolved to kubernetes.core.k8s instead of the expected community.kubernetes.k8s. Using FQCN community.kubernetes.k8s fixed it.
Collections follow the namespace.name convention. Use FQCN in playbooks to prevent module resolution conflicts.
Using Collection Modules in Playbooks with FQCN
To use a module from a collection in a playbook, you must use the Fully Qualified Collection Name (FQCN). The format is namespace.name.module_name. For example, to use the ios_config module from cisco.ios:
```yaml
- name: Configure interface
  cisco.ios.ios_config:
    lines:
      - interface GigabitEthernet0/1
      - ip address 192.0.2.1 255.255.255.0
```
Why FQCN? Without FQCN, Ansible has to resolve a short name through its search path: ansible.builtin and its redirects, plus any collections listed in the play's collections keyword. If more than one source provides a module with that name, the first match wins. This can lead to unexpected behavior. FQCN makes the source explicit.
Using filters and plugins: Filters from collections also require FQCN:

```yaml
- name: Use ipaddr filter
  debug:
    msg: '{{ "192.0.2.1" | ansible.utils.ipaddr }}'
```
Legacy short names: Ansible still supports short names for some built-in modules (e.g., copy, file). But for collection modules, always use FQCN.
Verifying module availability:

```bash
ansible-doc -t module cisco.ios.ios_config
```

This shows documentation if the collection is installed.
Production pattern: Set the collections keyword in a playbook to avoid typing FQCN repeatedly:

```yaml
---
- hosts: all
  collections:
    - cisco.ios
  tasks:
    - name: Configure interface
      ios_config:
        lines:
          - interface GigabitEthernet0/1
```

But be careful: this applies to all tasks in the play. Use with caution in large playbooks.
Collection versions use a colon: ansible-galaxy collection install namespace.name:version. Not a comma or equals.
A playbook used the short name ec2_instance from amazon.aws. After installing amazon.aws version 5.0.0, the module behavior changed. Switching to FQCN amazon.aws.ec2_instance made the version dependency explicit.
Use the collections keyword for brevity but with caution.
Connecting to Private Automation Hub
Many organizations run a private Automation Hub (part of Red Hat Ansible Automation Platform) to host internal collections and roles. To use it, configure ansible.cfg:
```ini
[galaxy]
server_list = automation_hub, galaxy

[galaxy_server.automation_hub]
url=https://automationhub.example.com/api/galaxy/
auth_url=https://automationhub.example.com/auth/realms/automation-hub/protocol/openid-connect/token
token=your_token_here

[galaxy_server.galaxy]
url=https://galaxy.ansible.com
```
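A frequent cause of authentication errors is a server_list entry with no matching [galaxy_server.<id>] section. The sketch below checks for that using Python's standard configparser. The function name check_galaxy_servers is hypothetical, and the check only covers the section and url keys shown in the config above; it does not contact any server.

```python
import configparser


def check_galaxy_servers(cfg_text: str) -> list:
    """Return a list of problems found in the [galaxy] server settings."""
    # interpolation=None so that '%' characters in tokens are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    problems = []
    raw_list = parser.get("galaxy", "server_list", fallback="")
    server_ids = [s.strip() for s in raw_list.split(",") if s.strip()]
    if not server_ids:
        problems.append("server_list is empty or missing")

    for server_id in server_ids:
        section = f"galaxy_server.{server_id}"
        if not parser.has_section(section):
            problems.append(f"missing section [{section}]")
        elif not parser.get(section, "url", fallback="").strip():
            problems.append(f"[{section}] has no url")
    return problems


EXAMPLE = """
[galaxy]
server_list = automation_hub, galaxy

[galaxy_server.galaxy]
url=https://galaxy.ansible.com
"""

# automation_hub is listed but has no section, so one problem is reported.
print(check_galaxy_servers(EXAMPLE))
```

Running a check like this in CI before ansible-galaxy install turns a confusing 401 or 404 into a direct configuration message.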
Getting a token: Generate an API token from the Automation Hub UI. ansible-galaxy has no login subcommand for this, so supply the token through ansible.cfg, the --token option, or an environment variable.
Installing from private hub:

```bash
ansible-galaxy collection install mycompany.operations --server automation_hub
```

Or specify in requirements.yml:

```yaml
collections:
  - name: mycompany.operations
    version: 1.0.0
    source: https://automationhub.example.com/api/galaxy/
```

Troubleshooting: If you get 404 or 401, check:
- Token validity (regenerate if expired)
- URL path (must end with /api/galaxy/ for Automation Hub)
- User permissions for the namespace
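Production pattern: Store the token in a secrets manager or environment variable, not in ansible.cfg. The variable name includes the server ID from server_list, so for the automation_hub entry above it is ANSIBLE_GALAXY_SERVER_AUTOMATION_HUB_TOKEN.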
Pinning Versions for Reproducible Builds
Reproducibility is the holy grail of infrastructure as code. Without version pinning, ansible-galaxy install gives you whatever is latest at that moment. This is a disaster for CI/CD.
How to pin: In requirements.yml:

```yaml
collections:
  - name: community.general
    version: 6.0.0
```

Version formats:
- Exact: 6.0.0
- Range: >=6.0.0,<7.0.0
- Latest: * (avoid)
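The exact-versus-range distinction is easy to enforce mechanically, for example in a pre-commit hook. Below is a minimal sketch, assuming the requirements file has already been loaded into a Python dict (for instance with PyYAML's safe_load). The names is_exact_pin and unpinned are hypothetical, and "exact" is taken to mean a plain MAJOR.MINOR.PATCH version, optionally prefixed with ==.

```python
import re

# Assumption: an exact pin looks like 6.0.0 or ==6.0.0, optionally with a
# pre-release or build suffix such as 6.0.0-beta.1.
_EXACT = re.compile(r"^(==)?\d+\.\d+\.\d+([-+][0-9A-Za-z.-]+)?$")


def is_exact_pin(version) -> bool:
    """True only for a single concrete version, never a range or '*'."""
    if version is None:
        return False
    return bool(_EXACT.match(str(version).strip()))


def unpinned(requirements: dict) -> list:
    """Return the collections whose version is missing, a range, or '*'."""
    names = []
    for entry in requirements.get("collections", []):
        if isinstance(entry, str):
            # A bare "namespace.name" string carries no version at all.
            names.append(entry)
        elif not is_exact_pin(entry.get("version")):
            names.append(entry.get("name", "<unnamed>"))
    return names


requirements = {
    "collections": [
        {"name": "community.general", "version": "6.0.0"},
        {"name": "amazon.aws", "version": ">=6.0.0,<7.0.0"},
        {"name": "cisco.ios", "version": "*"},
    ]
}
print(unpinned(requirements))
```

A hook can exit non-zero whenever unpinned returns a non-empty list, which blocks the commit before an unpinned entry reaches CI.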
Lock file approach: ansible-galaxy has no built-in lock file and no --lock flag. A common workaround is to install once, record the resolved version of every collection (including transitive dependencies) from ansible-galaxy collection list, and commit the result as a fully pinned requirements file, called requirements.yml.lock in this article. Then use it for installs:

```bash
ansible-galaxy collection install -r requirements.yml.lock
```

Verifying pinned versions:

```bash
ansible-galaxy collection list | grep community.general
```

This should show exactly 6.0.0.
CI/CD integration: In your pipeline:

```bash
ansible-galaxy collection install -r requirements.yml -f
ansible-galaxy collection list > installed_versions.txt
```

Then diff against expected versions.
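The comparison step can also be done structurally instead of with a text diff. The sketch below parses the output of ansible-galaxy collection list --format json and reports pins that are not met. It assumes an ansible-core release that supports --format json and that the output maps each collections path to a dict of collection names with a version field; installed_versions and mismatches are hypothetical helper names.

```python
import json


def installed_versions(list_json: str) -> dict:
    """Flatten the JSON listing into {"namespace.name": "version"}."""
    found = {}
    # Assumed shape: {"<path>": {"<namespace.name>": {"version": "x.y.z"}}}
    for collections in json.loads(list_json).values():
        for name, info in collections.items():
            # Keep the first occurrence, assuming earlier paths win.
            found.setdefault(name, info["version"])
    return found


def mismatches(installed: dict, expected: dict) -> dict:
    """Return {name: (expected, installed_or_None)} for unmet pins."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


sample = json.dumps({
    "/home/ci/.ansible/collections/ansible_collections": {
        "community.general": {"version": "6.0.0"},
        "kubernetes.core": {"version": "2.0.0"},
    }
})
installed = installed_versions(sample)
print(mismatches(installed, {"community.general": "6.0.0",
                             "kubernetes.core": "1.2.0"}))
```

An empty result means every expected pin is installed at the right version; anything else is a reason to fail the build.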
Gotcha: Pinning roles is similar: version: 1.0.0 in requirements.yml. But roles from git must use git+ syntax and pin to a commit or tag.
Never use version: '*' in production. It will install the latest version every time, breaking reproducibility.
community.general released a new version that deprecated a module we used. The pipeline failed. We now generate a lock file and commit it, and the pipeline fails if the lock file doesn't match.
Managing Dependencies Between Collections
Collections can depend on other collections. For example, community.kubernetes depends on kubernetes.core. When you install a collection, ansible-galaxy automatically installs its dependencies. This can lead to unexpected version conflicts.
How dependencies work: Each collection has a galaxy.yml file that lists dependencies. When you install community.kubernetes, it might install kubernetes.core>=1.0.0.
Problem: If your requirements.yml pins kubernetes.core to 2.0.0, but community.kubernetes requires <2.0.0, you get a conflict. ansible-galaxy will fail with a message like:

```
ERROR! Cannot satisfy requirements for 'kubernetes.core' as it conflicts with version '2.0.0' required by 'community.kubernetes'
```
Solution:
- Pin every transitive dependency explicitly in a committed, fully pinned requirements file.
- Manually specify compatible versions in requirements.yml.
- Use --force with caution; it may install conflicting versions.
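To see why the pin and the dependency range above cannot both hold, it helps to evaluate the range by hand. This is a simplified Python sketch of that check, not the resolver ansible-galaxy actually uses: it handles only plain MAJOR.MINOR.PATCH versions and the comparison operators shown in this article, and satisfies is a hypothetical helper name.

```python
import operator
import re

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
_CLAUSE = re.compile(r"^(==|!=|>=|<=|>|<)?\s*(\d+\.\d+\.\d+)$")


def _parse(version: str) -> tuple:
    # Simplification: three numeric parts only, no pre-release tags.
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """True if version meets every comma-separated clause in spec."""
    for clause in spec.split(","):
        clause = clause.strip()
        if clause in ("", "*"):
            continue  # '*' accepts any version
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        compare = _OPS[match.group(1) or "=="]
        if not compare(_parse(version), _parse(match.group(2))):
            return False
    return True


# The pin from requirements.yml against the dependency's declared range.
print(satisfies("2.0.0", ">=1.0.0,<2.0.0"))
print(satisfies("1.2.0", ">=1.0.0,<2.0.0"))
```

Checking each pin against the ranges declared by the collections that depend on it shows which pin has to move before ansible-galaxy reports the conflict.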
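Best practice: Resolve the full set once, record the exact version of every collection that ansible-galaxy collection list reports, and commit that fully pinned file. ansible-galaxy does not provide a --lock flag to do this for you.
Checking dependencies:

```bash
ansible-galaxy collection download community.kubernetes -p ./collections-preview
```

This downloads the collection and its dependencies as tarballs into a scratch directory (the name collections-preview is arbitrary) without installing them, so you can see exactly what would be pulled in. ansible-galaxy collection install has no --dry-run option.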
We hit a conflict between community.kubernetes and kubernetes.core versions. The lock file had pinned kubernetes.core to 2.0.0, but community.kubernetes required <2.0.0. We had to downgrade kubernetes.core to 1.2.0 to resolve it.
Keep a fully pinned lock file that covers all transitive dependencies. Test changes in a staging environment before production.
Using ansible-galaxy with Git Repositories and Tarballs
Not all collections and roles are on Galaxy. You can install from git repositories or local tarballs. This is common for internal tools.
Installing a collection from git:

```bash
ansible-galaxy collection install git+https://github.com/mycompany/mycollection.git,main
```

This clones the repo and installs from the main branch. You can also pin to a tag or commit:

```bash
ansible-galaxy collection install git+https://github.com/mycompany/mycollection.git,v1.0.0
```

Installing from a tarball:

```bash
ansible-galaxy collection install ./mycollection-1.0.0.tar.gz
```

In requirements.yml:

```yaml
collections:
  - name: https://github.com/mycompany/mycollection.git
    version: v1.0.0
    type: git
```

Or for tarball:

```yaml
collections:
  - name: ./mycollection-1.0.0.tar.gz
    type: file
```

Roles from git:

```bash
ansible-galaxy role install git+https://github.com/geerlingguy/ansible-role-nginx.git,1.0.0
```
Production pattern: For internal collections, host them on a private git server or tarball repository. Use version tags to pin.
Gotcha: Git installs can be slow and require network access. Consider packaging as tarballs for faster CI.
A git requirement once pointed at the branch develop. One day, a force push broke our CI because the branch history changed. We now always pin to a tag.
Troubleshooting Collection Installation Failures
Collection installation can fail for many reasons. Here's a systematic approach:
1. Check connectivity:

```bash
ansible-galaxy collection install community.general --server https://galaxy.ansible.com -vvv
```

The -vvv flag shows detailed HTTP requests. Look for 200 OK or error codes.

2. Authentication issues: For private hubs, ensure the token is valid. Test with:

```bash
curl -H "Authorization: Bearer YOUR_TOKEN" https://automationhub.example.com/api/galaxy/content/
```

3. Version conflicts: If you get dependency conflict errors, download the collection and its dependencies into a scratch directory to see what gets resolved (ansible-galaxy collection install has no --dry-run option):

```bash
ansible-galaxy collection download community.kubernetes -p ./collections-preview
```

4. Corrupted cache: Clear the galaxy cache:

```bash
rm -rf ~/.ansible/galaxy_cache/
```

5. Wrong server list: Verify ansible.cfg:

```bash
ansible-config dump | grep GALAXY_SERVER
```

6. Permission issues: Ensure the install directory is writable. For system-wide installs, use sudo or set collections_path in ansible.cfg.

7. YAML syntax errors in requirements.yml: Validate with:

```bash
python -c "import yaml; yaml.safe_load(open('requirements.yml'))"
```
Always use -vvv when debugging collection installation. It reveals HTTP errors, authentication issues, and dependency resolution.
In one case, -vvv showed the exact URL being requested.
Use -vvv for verbose output, clear cache, validate YAML, and check authentication when installation fails.
Best Practices for ansible-galaxy in CI/CD Pipelines
In CI/CD, every build must be deterministic. Here's how to integrate ansible-galaxy:
1. Use a lock file: ansible-galaxy cannot generate one, so maintain a fully pinned requirements.yml.lock that lists every collection and dependency at an exact version, and commit it. Then in CI:

```bash
ansible-galaxy collection install -r requirements.yml.lock -f
```
2. Cache collections: In CI, cache the ~/.ansible/collections/ directory to speed up builds. But invalidate cache when lock file changes.
3. Validate requirements.yml: Add a step to check YAML syntax and that all versions are pinned (no *).
4. Test with multiple Ansible versions: Different Ansible versions may have different collection compatibility. Test with the version you use in production.
5. Use a dedicated CI user: Create a service account on Automation Hub with minimal permissions.
6. Monitor for deprecations: Regularly run:

```bash
ansible-galaxy collection list --format json | jq -r '.[] | to_entries[] | "\(.key) \(.value.version)"'
```

And compare with latest available versions. The JSON output is keyed by collections path and then by collection name, so the filter walks both levels.
Example CI script:

```bash
#!/bin/bash
set -euo pipefail

ansible-galaxy collection install -r requirements.yml.lock -f
ansible-galaxy collection list > /tmp/installed.txt
diff <(sort /tmp/installed.txt) <(sort expected_versions.txt) || { echo "Version mismatch"; exit 1; }
```
Migrating from Roles to Collections: A Practical Guide
If you have a legacy Ansible codebase using roles exclusively, you might want to migrate to collections for better organization. Here's how to do it safely.
1. Create a collection scaffold:

```bash
ansible-galaxy collection init mycompany.infrastructure
```

This creates the directory structure.
2. Move roles into the collection: Place roles in mycompany/infrastructure/roles/. Then in playbooks, reference them as mycompany.infrastructure.rolename.
3. Create a requirements.yml for the collection. A local tarball goes in name with type: file; source is only for Galaxy server URLs:

```yaml
collections:
  - name: ./mycompany-infrastructure-1.0.0.tar.gz
    type: file
```

4. Update playbooks: Replace role references with FQCN. For example:

```yaml
- hosts: all
  roles:
    - mycompany.infrastructure.nginx
```
5. Test incrementally: Start with one collection and migrate roles one by one.
6. Use ansible-lint to enforce FQCN: Enable the fqcn rule (called fqcn-builtins in older ansible-lint releases) to catch non-FQCN module usage.
Production pattern: Keep roles and collections side by side during migration. Use a requirements.yml that includes both.
Gotcha: Roles inside collections cannot use import_role with short name. Always use FQCN.
Security Considerations for Ansible Galaxy Usage
Using public Galaxy content introduces security risks. Here's how to mitigate:
1. Use private Automation Hub: Host a curated set of approved collections. This prevents developers from accidentally using malicious content.
2. Pin versions: Always pin to exact versions. A new version of a collection could contain malware.
3. Verify checksums: For critical collections, download the tarball and verify its SHA256 checksum:

```bash
ansible-galaxy collection download community.general:6.0.0 -p ./downloads
sha256sum ./downloads/community-general-6.0.0.tar.gz
```

Compare with the checksum published on Galaxy.

4. Audit installed collections: Regularly run:

```bash
ansible-galaxy collection list --format json | jq -r '.[] | keys[]'
```

And compare with an approved list.

5. Use signed collections: Ansible supports GPG signing of collections. Pass --keyring at install time to check signatures, and use verify to compare an installed collection against the server's checksums:

```bash
ansible-galaxy collection verify community.general:6.0.0
```
6. Limit network access: In CI, only allow connections to approved Galaxy servers.
7. Scan for vulnerabilities: Use tools like safety or trivy to scan collection dependencies.
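The checksum comparison from step 3 can be scripted so that CI fails on a mismatch instead of relying on a manual look. This is a minimal sketch using only Python's standard library; sha256_of and matches_expected are hypothetical names, and the expected digest is assumed to come from a trusted source such as the value published on Galaxy.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the published one, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, matches_expected("./downloads/community-general-6.0.0.tar.gz", published_digest) returns False if the tarball differs from what was published, and the pipeline can stop before installing it.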
Production pattern: Maintain an allowlist of namespaces and versions. Block all others at the network level.
The Unpinned Collection That Broke 200 Servers
Symptom: ERROR! couldn't resolve module 'community.kubernetes.k8s'.
Root cause: community.kubernetes had been deprecated and replaced by kubernetes.core. The module k8s was moved to kubernetes.core.k8s. Our requirements.yml had community.kubernetes without a version, so it installed the latest (which was the deprecated redirect package) that no longer contained the module.
Fix: Updated requirements.yml to use the kubernetes.core collection and pinned the version to 2.0.0. Ran ansible-galaxy collection install -r requirements.yml -f.
Lessons:
- Always pin collection versions.
- Never rely on 'latest'.
- Use FQCN in playbooks to make module sources explicit.
- Regularly audit collections for deprecations.
- ERROR! couldn't resolve module/action 'namespace.name.module_name': Run ansible-galaxy collection list | grep namespace.name. If missing, install with version: ansible-galaxy collection install namespace.name:version. If present, verify FQCN spelling and module existence in collection docs.
- ERROR! the role 'role_name' was not found: Run ansible-galaxy role list | grep role_name. If missing, install with ansible-galaxy role install role_name,version.
- ERROR! 'ansible.utils' is not a valid collection for the 'ipaddr' filter: Ensure ansible.utils is installed. Install with version pin: ansible-galaxy collection install ansible.utils:2.9.0. Verify with ansible-galaxy collection list | grep ansible.utils.
- Failed to download collection: HTTP Error 404: Not Found from private Automation Hub: Check server_list in ansible.cfg and token validity. Run ansible-galaxy collection install namespace.name --server hub.example.com to test connectivity. Verify token has access to the namespace.
Key takeaways
- Use ansible-galaxy role install for roles and ansible-galaxy collection install for collections; they are not interchangeable.
- Use the -vvv flag when debugging collection installation failures.
Common mistakes to avoid
- Using `ansible-galaxy install` without subcommand for collections. Fix: ansible-galaxy collection install
- Using short module names instead of FQCN. Fix: namespace.name.module_name
- Not pinning versions in requirements.yml. Fix: version: 6.0.0
- Using comma instead of colon for collection version. Fix: namespace.name:version
- Committing API tokens in ansible.cfg
- Not using lock files for dependency resolution. Fix: commit a fully pinned requirements file (ansible-galaxy has no --lock flag)
Interview Questions on This Topic
What is the difference between `ansible-galaxy role install` and `ansible-galaxy collection install`?
ansible-galaxy role install installs roles to ~/.ansible/roles/, while ansible-galaxy collection install installs collections to ~/.ansible/collections/ansible_collections/. They use different version separators: comma for roles, colon for collections. Roles are single units of automation; collections bundle modules, plugins, roles, and playbooks.