Mid-level 6 min · March 09, 2026

Ansible Roles — Empty defaults/main.yml Prevents Reuse

Hardcoded paths in tasks/main.

Naren · Founder
Quick Answer
  • Ansible Role = packaged unit of automation with standard directory structure (tasks, handlers, defaults, vars, templates, files, meta)
  • Convention over configuration: Ansible auto-loads main.yml from each directory when the role is called — missing that file means the directory silently does nothing
  • defaults/ for overridable variables (lowest precedence in all of Ansible), vars/ for internal constants (higher precedence — overrides from inventory won't reach them)
  • Performance: role lookups add overhead on every role call (on the order of 100ms; measure on your own control node), so flatten deeply nested role dependency chains for large inventories running under time pressure
  • Production trap: hardcoding paths in tasks/ instead of using defaults/ — the role works for one team and is useless for everyone else without forking it
  • Biggest mistake: creating 'God roles' that configure databases AND web servers AND monitoring — break into separate focused roles, compose them in the playbook
Plain-English First

Think of Ansible Roles as a professional toolbox with dedicated, labeled drawers. Instead of throwing every tool — hammers, screwdrivers, drills — into one big pile (a single massive playbook), you organize them. One drawer holds Web Server tools. Another holds Database tools. A third holds Monitoring tools. When you need to build a new system, you grab exactly the drawers you need and leave the rest on the shelf.

The labels on the drawers matter too. Some tools have adjustable settings — the drill's speed, the torque on the wrench. Those settings go on a sticky note on the outside of the drawer so whoever borrows it can change them without opening the drawer and modifying the tool itself. That's what defaults/ is in an Ansible role: the sticky note that says 'this is what we assume, but you can change it.' vars/ is the weld that holds the drawer together — it should not be touched.

Ansible Roles are how you turn automation from scripting into software engineering. A single playbook that works for one team and one environment is a script. A role that any team can pull from Galaxy, override with their own variable values, and deploy to any environment without touching a line of task code — that's reusable infrastructure.

Most tutorials show you how to initialize a role and move on. What they skip is the operational detail that determines whether a role becomes an asset or a liability six months after it's written. The defaults-versus-vars distinction trips up engineers who understand the concept but haven't felt the pain of getting it wrong in production. God roles are written by people who know roles exist but haven't internalized why the single-responsibility principle applies to infrastructure automation as much as it does to application code. And role dependency chains in meta/main.yml can fail in ways that produce no errors and leave your fleet silently misconfigured.

By the end of this article you'll know how to structure roles that teams outside your own can actually use, how to test them with Molecule so regressions surface in CI rather than production, why the variable precedence hierarchy determines whether your role is overridable or effectively hardcoded, and how to recognize the God role pattern early enough to fix it before it becomes entrenched technical debt.

The Architecture of a Role: Convention Over Configuration

Ansible Roles exist to move infrastructure automation from scripting to software engineering. The distinction matters operationally: a script works for its author in their environment. A role works for any team, in any environment, without modification to its internals — only variable overrides at the boundary.

The mechanism that makes this possible is convention over configuration. Every role follows the same directory structure. When Ansible calls a role, it knows exactly where to look for each type of content without being told: tasks/main.yml for the primary logic, handlers/main.yml for service restart definitions, defaults/main.yml for overridable variables, vars/main.yml for internal constants, templates/ for Jinja2 config files, files/ for static assets, and meta/main.yml for dependencies and Galaxy metadata. None of these require explicit loading in your tasks — Ansible finds and loads them automatically based on their location.
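
As a minimal sketch of the convention at work, assume a hypothetical nginx role with a template at templates/nginx.conf.j2 and two variables, nginx_package and nginx_config_dir, defined in its defaults/main.yml. These names are illustrative and not taken from a published role. The task file names neither the templates/ directory nor the handlers file; Ansible resolves both by location.

```yaml
---
# roles/nginx/tasks/main.yml (hypothetical role; loaded automatically when the role is called)
- name: Install nginx
  ansible.builtin.package:
    name: "{{ nginx_package }}"        # assumed to be defined in defaults/main.yml
    state: present

- name: Render the main nginx config
  ansible.builtin.template:
    src: nginx.conf.j2                 # bare filename: looked up in the role's templates/
    dest: "{{ nginx_config_dir }}/nginx.conf"
    owner: root
    group: root
    mode: "0644"
  notify: Reload nginx                 # must match a handler name in handlers/main.yml

# roles/nginx/handlers/main.yml is a separate file, shown here as a comment for context:
# - name: Reload nginx
#   ansible.builtin.service:
#     name: nginx
#     state: reloaded
```

If templates/nginx.conf.j2 is missing, the template task fails at runtime; if handlers/main.yml lacks a handler with that exact name, the notify fails when the task reports a change.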

This predictability is the entire point. When a new engineer opens a role they've never seen before, they know immediately where the task logic lives, where the variables are defined, and where the templates are. That shared mental model is what allows roles to be shared across teams and organizations via Ansible Galaxy.

The structure isn't optional and it isn't decoration. Missing tasks/main.yml means the role does nothing and produces no error. A template referenced in tasks that doesn't exist in templates/ fails at runtime with a file not found error that points to a path that looks correct. The ansible-galaxy role init command generates the full structure in one command — use it every time rather than creating directories manually and risking missing one.

One aspect of the structure that teams often underuse is testing. ansible-galaxy role init generates a tests/ directory that contains only a stub inventory and a stub test.yml playbook. Molecule scenarios live in a separate molecule/ directory at the role root, one subdirectory per scenario. Those scenarios verify that the role works with default variables, that it works with non-default variables, and that it is idempotent on a second run. A role without test scenarios is a role that breaks silently and gets discovered in production.

io/thecodeforge/ansible/init_role.sh · BASH
#!/usr/bin/env bash
# io.thecodeforge: Initialize a new role with the full standard structure
# Always use ansible-galaxy role init — never create directories manually.
# The tool generates every required directory and file, including stubs
# for meta/main.yml with Galaxy metadata fields that must be present
# for publishing to Galaxy or an internal Automation Hub.

ansible-galaxy role init io.thecodeforge.webserver

# Generated structure:
# io.thecodeforge.webserver/
# ├── defaults/
# │   └── main.yml   <- LOWEST precedence. Everything here is overridable.
# │                     Use for: ports, paths, package versions, feature flags.
# │                     If a value might differ between teams or environments, it goes here.
# ├── files/
# │                  <- Static assets. No variable substitution.
# │                     Use for: scripts, static certs, binary configs, SSH keys.
# │                     Loaded by: ansible.builtin.copy with src: filename.ext
# ├── handlers/
# │   └── main.yml   <- Service lifecycle tasks. Only run when notified.
# │                     Use for: reload, restart, enable. Never unconditional tasks.
# ├── meta/
# │   └── main.yml   <- Role dependencies, Galaxy metadata, supported platforms.
# │                     Dependencies here run before this role, automatically.
# ├── tasks/
# │   └── main.yml   <- Primary execution logic. Loaded first automatically.
# │                     Should contain NO hardcoded values — only variable references.
# ├── templates/
# │                  <- Jinja2 templates. Rendered at runtime with variable substitution.
# │                     Use for: nginx.conf, postgresql.conf, systemd unit files.
# │                     Loaded by: ansible.builtin.template with src: filename.j2
# ├── vars/
# │   └── main.yml   <- HIGH precedence. Inventory and group_vars cannot override these.
# │                     Use for: internal package names, service names, OS-specific constants.
# │                     NOT for values users should change — use defaults/ for those.
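# └── tests/
#     ├── inventory  <- Stub inventory generated by ansible-galaxy.
#     └── test.yml   <- Stub playbook generated by ansible-galaxy.

# After init, immediately set up Molecule. Scenarios are created under
# molecule/ at the role root, not under tests/.
# The docker driver is provided by the molecule-plugins[docker] package.
cd io.thecodeforge.webserver
molecule init scenario default --driver-name docker
molecule init scenario custom_paths --driver-name docker
# molecule/default/       tests with default variable values
# molecule/custom_paths/  tests with non-default values (catches hardcoding)
# Never ship a production role without both scenarios.
# Write converge.yml and verify.yml for both scenarios before writing a single task.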
The Empty defaults/main.yml Is a Red Flag
When reviewing a role, the first file to open is defaults/main.yml. If it's empty — or doesn't exist — the role almost certainly has hardcoded values in tasks/main.yml that make it single-use. Every path, port, package version, username, and config option that could reasonably differ between environments should appear in defaults/main.yml with a sensible value. If you can read through tasks/main.yml and find a literal string where a variable should be, that's a bug in the role's design, not just a style issue.
Production Insight
The standard directory structure is how Ansible finds your files — it is not optional formatting.
Missing tasks/main.yml means the role silently does nothing. Missing templates/ causes runtime failures that point to a path that looks correct and takes time to diagnose.
Rule: run ansible-galaxy role init every time. The directory structure costs nothing to generate and is expensive to debug when wrong. Check the generated meta/main.yml and fill in the galaxy_info block — a role without author and description metadata becomes unmaintainable in a shared Galaxy namespace.
Key Takeaway
Roles are packaged automation. The directory structure is the loading contract — Ansible finds files by convention, not configuration.
tasks/ does the work. defaults/ holds what users can change. vars/ holds what they should not. templates/ holds configs that need variables. files/ holds everything static.
Convention over configuration means every role looks the same from the outside. That shared structure is what makes roles shareable across teams.
Where Does This Value Belong?
If: The value might differ between environments, teams, or use cases (port numbers, paths, package versions, usernames)
Use: defaults/main.yml (lowest precedence, fully overridable by inventory, playbook, or command line)
If: The value is an internal constant the role needs to function correctly and users should never change (internal service name, OS package name, fixed file permission)
Use: vars/main.yml (higher precedence, protected from inventory overrides)
If: The value is a config file that needs variable substitution (nginx.conf, postgresql.conf, systemd unit)
Use: templates/ as a Jinja2 .j2 file (rendered at runtime, references variables from defaults/ or the calling playbook)
If: The value is a static file that never changes (a shell script, a static binary, a fixed certificate)
Use: files/ (served verbatim with no variable substitution)
If: The value is a secret (password, API token, private key)
Use: Neither defaults/ nor vars/. Store it as an Ansible Vault encrypted variable in group_vars/production/vault.yml and pass it to the role via the calling playbook
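
The table above can be sketched as three small files for a PostgreSQL role. The default for postgres_data_dir and the vault_postgres_app_password name follow the playbook shown later in this article; every other name and value here (postgres_service_name, postgres_packages, the default version) is an assumption made for illustration.

```yaml
---
# roles/postgresql/defaults/main.yml: values callers are expected to override
postgres_version: 16                     # assumed default
postgres_data_dir: /var/lib/postgresql
postgres_max_connections: 100            # assumed default

---
# roles/postgresql/vars/main.yml: internal constants, not part of the role's interface
postgres_service_name: postgresql        # hypothetical constant
postgres_packages:                       # hypothetical constant
  - "postgresql-{{ postgres_version }}"
  - "postgresql-client-{{ postgres_version }}"

---
# group_vars/production/vault.yml: shown in plaintext here only for illustration.
# Encrypt the real file with: ansible-vault encrypt group_vars/production/vault.yml
vault_postgres_app_password: "replace-me"
```

The role itself never stores the secret. The calling playbook maps vault_postgres_app_password onto the role's postgres_app_password variable, so the same role works with any team's vault file.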

Production Patterns: Reusability, Composition, and the God Role Problem

The single most important design principle for Ansible roles is the same one that applies to microservices, library functions, and Unix commands: do one thing well. A role that installs and configures Nginx is useful to every team that runs Nginx. A role that installs Nginx, PostgreSQL, Redis, and a monitoring agent is useful to exactly one team — the team that chose that exact combination — and becomes a maintenance burden the moment any team's requirements diverge.

This is the God role problem. It emerges gradually. Someone writes a server_setup role that installs the web server and the database because both are needed on the first server they're automating. A few weeks later they add log rotation. A few weeks after that, monitoring. By the time the role has 600 lines across tasks/main.yml, it's impossible to use partially. A team that only needs the web server configuration must accept the database configuration too, or fork the role.

The fix is decomposition: one role per service, composed in the playbook. A playbook that calls roles: [common, nginx, postgresql, prometheus_node_exporter, log_rotation] is instantly readable. You know exactly what the role list configures. You can remove prometheus_node_exporter from the list for a server where you don't want monitoring. You can test each role independently with Molecule. You can update the nginx role without touching the postgresql role.

The second pattern that determines whether roles scale is the variable boundary. The calling playbook is where environment-specific values should live — not inside the role. A role's defaults/main.yml provides the fallback values that work for the most common case. The playbook's vars block or the inventory's group_vars override those defaults for specific environments. This separation means the role itself is environment-agnostic — it works for dev, staging, and production, with the differences expressed entirely in the calling context.

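When a value is passed in the vars: block at role call time in a playbook, it has higher precedence than defaults/ and higher than inventory variables, including group_vars and host_vars. Extra vars (-e) still override it. This is the right level for environment-specific overrides when you want the role to receive a value without the calling team having to set it in inventory. The flip side is that a value set here cannot be adjusted per host from host_vars, so keep genuinely host-specific settings in inventory and leave them out of the vars: block. It's also where you declare which environment-specific values are expected: a well-documented vars block at the role call site is self-documenting infrastructure.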
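
A minimal sketch of how one variable resolves when it is set at several levels, assuming the documented Ansible precedence order. The inventory file path and the 5000 and 20000 values are hypothetical; the 2000 default and the 10000 override come from the site.yml listing in this article.

```yaml
---
# roles/haproxy/defaults/main.yml
haproxy_max_connections: 2000            # fallback when nothing else sets it

---
# inventory/group_vars/load_balancers.yml (hypothetical path)
haproxy_max_connections: 5000            # beats the role default

---
# site.yml
- name: Configure Load Balancer Tier
  hosts: load_balancers
  become: true
  roles:
    - role: io.thecodeforge.haproxy
      vars:
        haproxy_max_connections: 10000   # beats defaults/ and the inventory value

# Extra vars still win over all of the above. The JSON form keeps the value an integer:
#   ansible-playbook site.yml -e '{"haproxy_max_connections": 20000}'
```

With these three files and no -e flag, the role sees 10000. Removing the vars: entry drops it to the inventory value, and removing that as well drops it to the role default.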

io/thecodeforge/ansible/site.yml · YAML
---
# io.thecodeforge: Production orchestration playbook
# This file composes roles. It contains no task logic of its own.
# Each role does one thing. This playbook decides which things to do and in what order.
#
# Variable precedence at role call time (from the vars: block):
# Higher than: defaults/main.yml and inventory variables (group_vars and host_vars)
# Lower than: extra vars (-e)
# Use the vars: block for environment-specific overrides you want visible in the playbook.
# Use group_vars for overrides that should apply to all plays targeting that group.
# Keep per-host differences in host_vars and out of the vars: block, which would mask them.

# ── Play 1: Load Balancer configuration ───────────────────────────────────────
- name: Configure Load Balancer Tier
  hosts: load_balancers
  become: true

  roles:
    # Common role runs first on every host — sets up SSH hardening, NTP, logging standards
    - role: io.thecodeforge.common
      vars:
        common_ntp_servers:
          - 169.254.169.123  # AWS time sync service — lower latency than pool.ntp.org
        common_ssh_allow_groups: ['deploy', 'sre']

    # HAProxy role — focused exclusively on load balancer configuration
    - role: io.thecodeforge.haproxy
      vars:
        haproxy_max_connections: 10000     # Overrides defaults/main.yml value of 2000
        haproxy_timeout_connect: '5s'
        haproxy_timeout_client: '30s'
        haproxy_timeout_server: '30s'
        haproxy_backend_servers:            # Dynamically populated from inventory
          - { name: 'web01', addr: '{{ hostvars["web-01"]["ansible_host"] }}', port: 8080 }
          - { name: 'web02', addr: '{{ hostvars["web-02"]["ansible_host"] }}', port: 8080 }

    # TLS termination role — manages certificates and nginx-based TLS offloading
    - role: io.thecodeforge.tls_termination
      vars:
        tls_domain: 'api.thecodeforge.io'
        tls_cert_source: 'acme'            # 'acme', 'vault', or 'file'
        tls_acme_email: 'ops@thecodeforge.io'

# ── Play 2: Application Server configuration ──────────────────────────────────
- name: Configure Application Server Tier
  hosts: web_servers
  become: true

  roles:
    - role: io.thecodeforge.common
      # Same role, same defaults — common runs identically on all tiers

    - role: io.thecodeforge.nginx
      vars:
        nginx_worker_processes: auto
        nginx_worker_connections: 4096
        nginx_vhosts:
          - server_name: 'api.thecodeforge.io'
            listen_port: 8080
            root: '/var/www/api'
            access_log: '/var/log/nginx/api_access.log'

    - role: io.thecodeforge.app_deploy
      vars:
        app_repo: 'https://github.com/thecodeforge/api.git'
        app_version: '{{ release_version | default("main") }}'
        app_user: 'www-data'
        app_env: 'production'

# ── Play 3: Database configuration ────────────────────────────────────────────
- name: Configure Database Tier
  hosts: database_servers
  become: true
  serial: 1               # One database server at a time — never parallel for Postgres
  max_fail_percentage: 0  # Any database failure stops the entire play

  roles:
    - role: io.thecodeforge.common

    - role: io.thecodeforge.postgresql
      vars:
        postgres_version: 16
        postgres_data_dir: '/data/pg_production'  # Overrides default /var/lib/postgresql
        postgres_max_connections: 200
        postgres_shared_buffers: '4GB'
        postgres_effective_cache_size: '12GB'
        # Passwords come from Vault — never hardcoded here
        postgres_app_password: '{{ vault_postgres_app_password }}'

    - role: io.thecodeforge.prometheus_node_exporter
      # No vars override — defaults work for all servers
      # Port 9100, /metrics endpoint, standard collectors
The God Role Tells You It's a God Role
If your role's tasks/main.yml exceeds 150 lines, or if the role name contains 'and' (postgres_and_redis, webserver_and_monitoring), or if you find yourself writing when: conditions inside the role to skip tasks for certain use cases — these are the three warning signs. A role that needs when: conditions to skip parts of itself for different callers is really two roles sharing one directory. The fix is always decomposition: split, compose in the playbook, test each piece independently.
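
A short sketch of the third warning sign and its fix. The server_setup role and its server_setup_install_web and server_setup_install_db switches are hypothetical names; the role names in the second half follow the playbook shown above.

```yaml
---
# Before: roles/server_setup/tasks/main.yml skips parts of itself per caller
- name: Install nginx
  ansible.builtin.package:
    name: nginx
    state: present
  when: server_setup_install_web | bool

- name: Install PostgreSQL
  ansible.builtin.package:
    name: postgresql
    state: present
  when: server_setup_install_db | bool

---
# After: each conditional branch is its own role, and the playbook does the choosing
- name: Configure web servers
  hosts: web_servers
  become: true
  roles:
    - role: io.thecodeforge.common
    - role: io.thecodeforge.nginx

- name: Configure database servers
  hosts: database_servers
  become: true
  roles:
    - role: io.thecodeforge.common
    - role: io.thecodeforge.postgresql
```

The selection logic moves from boolean switches inside the role to the role list in each play, which is where a reader looks first.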
Production Insight
Role variables passed in the vars: block at call time have higher precedence than defaults/ and than inventory group_vars and host_vars, and extra vars (-e) still override them. This is the right level for environment-specific overrides you want visible in the playbook itself.
Never put environment-specific values in vars/main.yml inside the role. vars/ is for internal role constants, not for configuration that differs by environment. An engineer reading the role has no way to know that vars/ is being overridden externally — it looks hardcoded.
Rule: the calling playbook should be readable as documentation. A vars: block at each role call site that lists the non-default values is self-documenting infrastructure. Someone reading site.yml should understand the entire system's configuration without opening any role file.
Key Takeaway
One role per service. Compose multiple roles in the playbook. This is the pattern that scales from 5 servers to 5000.
A reusable role has no environment-specific assumptions. Everything that could differ goes in defaults/. The calling playbook provides the environment-specific values.
If two teams are maintaining separate forks of the same role, the role has a hardcoded value where a defaults/ variable should be. Find it and fix it.
One Role or Multiple Roles?
If: The automation installs and configures a single service (Nginx, PostgreSQL, Redis, Prometheus)
Use: One role, named after the service. This is the right granularity for Galaxy and for reuse.
If: The automation configures two related services that are always deployed together on the same host
Use: Still two roles, one per service. Compose them in the playbook. The coupling is at the playbook level where it belongs, not inside a role.
If: The automation has a 'base' configuration that applies to every server regardless of role (SSH hardening, NTP, logging standards, security patches)
Use: One 'common' role that runs first in every play. This is explicitly the right use case for a shared foundational role.
If: The role's tasks/main.yml has grown past 150 lines or contains when: conditions to skip sections for different callers
Use: Split it. The when: conditions are telling you where the split lines are. Each conditional branch is a candidate for its own role.
If: Two teams need the same service configured differently and one is forking the role to make their changes
Use: The role needs better defaults/ coverage. A fork means a variable that should be in defaults/ is hardcoded in tasks/. Find it, move it, eliminate the fork.

Testing Roles with Molecule — The Practice That Separates Good Roles from Great Ones

A role without tests is a role that breaks silently in production. You find out when a deployment fails, when a new engineer makes a change that looked harmless, or when a Galaxy role dependency updates and changes behavior. Molecule gives you a way to find out in CI instead.

Molecule is the standard testing framework for Ansible roles. It spins up disposable infrastructure (Docker containers for most roles, cloud instances for roles that need real hardware), runs your role against that infrastructure, runs the role a second time to verify idempotency, verifies the resulting state with a verify playbook or Testinfra tests, and then tears everything down. The entire cycle takes 2-5 minutes for a Docker-based test.

The most valuable test you can write is the idempotency check: run the role twice and assert that the second run shows zero 'changed' tasks. You do not have to write it yourself, because the default molecule test sequence includes an idempotence step that re-runs converge.yml and fails if the second run reports any changes. If your role isn't idempotent, Molecule tells you which task is the problem.

The second most valuable test is the non-default variable scenario: create a Molecule scenario that sets every variable in defaults/main.yml to a non-default value and runs the role. If any task contains a hardcoded value instead of a variable reference, this test surfaces it. The production incident in this article would have been caught by this test on the first day the role was written.

For roles that will be shared via Galaxy or an internal Automation Hub, add platform-specific scenarios: test on Ubuntu 22.04 LTS, on Ubuntu 24.04, and on RHEL 9 if your organization runs Red Hat. Platform divergence in package names, service names, and file paths is a major source of 'works on my machine' failures in shared roles.
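
One way to cover several platforms, sketched under assumptions, is a platforms list in a scenario's molecule.yml; the same entries could instead be split into one scenario per platform. The image names below are community-maintained systemd-enabled images and are an assumption, as is the use of Rocky Linux 9 as a stand-in for RHEL 9. Running systemd inside a container typically needs the privileged and cgroup settings shown, which is a trade-off to accept only on disposable CI runners.

```yaml
---
# molecule/default/molecule.yml (sketch; requires the molecule-plugins[docker] package)
driver:
  name: docker
platforms:
  - name: ubuntu2204
    image: geerlingguy/docker-ubuntu2204-ansible:latest
    pre_build_image: true
    privileged: true
    cgroupns_mode: host
    command: ""                          # empty: keep the image's own init command
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
  - name: ubuntu2404
    image: geerlingguy/docker-ubuntu2404-ansible:latest
    pre_build_image: true
    privileged: true
    cgroupns_mode: host
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
  - name: rockylinux9                    # stand-in for RHEL 9
    image: geerlingguy/docker-rockylinux9-ansible:latest
    pre_build_image: true
    privileged: true
    cgroupns_mode: host
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
provisioner:
  name: ansible
verifier:
  name: ansible                          # runs verify.yml as an Ansible playbook
```

A verify playbook that pins package_facts to manager: apt will fail on the Rocky Linux instance; leaving manager at its default of auto lets the same assertions run on both OS families.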

io/thecodeforge/ansible/roles/nginx/molecule/default/converge.yml · YAML
---
# io.thecodeforge: Molecule converge playbook — default scenario
# This runs the role with default variable values.
# `molecule test` re-runs this playbook in its idempotence step and fails if any task reports 'changed'.

- name: Converge — test nginx role with default variables
  hosts: all
  become: true

  roles:
    - role: io.thecodeforge.nginx
      # No vars: block here — testing that defaults/main.yml values work correctly

---
# io.thecodeforge: Molecule verify playbook (Ansible verifier assertions, not Testinfra)
# File: molecule/default/verify.yml
# These assertions run after converge and confirm the role achieved its intended state.

- name: Verify — confirm nginx role achieved correct state
  hosts: all
  gather_facts: false

  tasks:
    - name: Confirm Nginx package is installed
      ansible.builtin.package_facts:
        manager: apt

    - name: Assert the Nginx package is installed
      ansible.builtin.assert:
        that:
          - "'nginx' in ansible_facts.packages"
        fail_msg: "Nginx is not installed — role task failed silently"

    - name: Confirm Nginx service is running and enabled
      ansible.builtin.service_facts:

    - name: Assert Nginx service state
      ansible.builtin.assert:
        that:
          - "ansible_facts.services['nginx.service'].state == 'running'"
          - "ansible_facts.services['nginx.service'].status == 'enabled'"
        fail_msg: "Nginx is not running or not enabled — handler or service task failed"

    - name: Confirm Nginx is listening on the default port
      ansible.builtin.wait_for:
        port: "{{ nginx_port | default(80) }}"
        timeout: 5
        msg: "Nginx is not listening on port {{ nginx_port | default(80) }}"

    - name: Verify Nginx config is valid
      ansible.builtin.command: nginx -t
      register: nginx_test
      changed_when: false
      failed_when: nginx_test.rc != 0

---
# io.thecodeforge: Molecule converge playbook — custom_paths scenario
# File: molecule/custom_paths/converge.yml
# This scenario runs the role with NON-DEFAULT variable values.
# It catches hardcoded paths and values in tasks/main.yml.
# If this scenario fails where default/ passes, a path is hardcoded.

- name: Converge — test nginx role with non-default variable values
  hosts: all
  become: true

  roles:
    - role: io.thecodeforge.nginx
      vars:
        nginx_port: 8080                         # Non-default: catches port hardcoding
        nginx_worker_processes: 2                 # Non-default: catches proc count hardcoding
        nginx_log_dir: /var/log/nginx_custom      # Non-default: catches path hardcoding
        nginx_config_dir: /etc/nginx_custom       # Non-default: catches config path hardcoding
        # Every variable in defaults/main.yml should appear here with a non-default value.
        # If the role fails this scenario, find the hardcoded value and move it to defaults/.

---
# io.thecodeforge: CI pipeline configuration for Molecule testing
# File: .gitlab-ci.yml excerpt
# This runs both Molecule scenarios on every merge request.

# molecule_test:
#   stage: test
#   image: quay.io/ansible/community-ansible-dev-tools:latest
#   before_script:
#     - pip install molecule molecule-plugins[docker] ansible-lint
#   script:
#     - cd roles/io.thecodeforge.nginx
#     - ansible-lint .                              # Lint first — fast failure
#     - molecule test --scenario-name default       # Default values + idempotency check
#     - molecule test --scenario-name custom_paths  # Non-default values + hardcoding check
#   rules:
#     - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
#   artifacts:
#     when: on_failure
#     paths:
#       - roles/io.thecodeforge.nginx/.molecule/
#     expire_in: 3 days
The Two Molecule Scenarios Every Role Needs
Every role needs at minimum two Molecule scenarios. The default scenario tests with the values in defaults/main.yml and verifies idempotency on a second run, which the molecule test sequence does automatically. The custom_paths scenario tests with non-default values for every variable in defaults/main.yml and verifies the role still works. If default passes but custom_paths fails, there's a hardcoded value somewhere in tasks/main.yml. The production incident in this article would have been caught by the custom_paths scenario on day one. Neither scenario is optional for a role that will be shared.
Production Insight
Molecule's idempotency check (running the role twice and failing on any 'changed' in the second run) is the single most valuable automated test for an Ansible role. It catches shell and command tasks that report 'changed' on every run, timestamps in templates, and non-deterministic file generation before they reach production.
The custom_paths scenario has saved the io.thecodeforge team from three separate hardcoding incidents in shared roles. Each one was caught in CI on a merge request, not in a production deployment.
Rule: no role change merges without Molecule passing both scenarios. This is the hard gate. A role that breaks its test scenarios is not ready for production regardless of how confident the author is.
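
A minimal sketch of the kind of task the idempotence step flags, and one way to fix it. The DH-parameter task and its path are hypothetical examples, not taken from the roles in this article.

```yaml
---
# Not idempotent: command tasks report 'changed' on every run by default,
# so the second run in the idempotence step fails.
- name: Generate DH parameters
  ansible.builtin.command: openssl dhparam -out /etc/nginx/dhparam.pem 2048

---
# Idempotent: the task is skipped once the file named in creates: exists.
- name: Generate DH parameters
  ansible.builtin.command:
    cmd: openssl dhparam -out /etc/nginx/dhparam.pem 2048
    creates: /etc/nginx/dhparam.pem

# The template equivalent of the same bug is a render-time timestamp:
#   not idempotent:  # Generated at {{ ansible_date_time.iso8601 }}
#   idempotent:      # {{ ansible_managed }}
```

For read-only commands that change nothing, changed_when: false is the usual alternative to creates:, as the nginx -t check in the verify playbook above shows.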
Key Takeaway
A role without Molecule tests is a role you discover is broken in production.
The default scenario catches idempotency bugs. The custom_paths scenario catches hardcoded values. Both are required for any role that leaves its author's laptop.
Molecule in CI is the hard gate between 'I think this works' and 'I have verified this works.'
Molecule Test Scenario Design
If: Role is used by exactly one team in one environment
Use: Minimum: default scenario with idempotency check. The custom_paths scenario is still strongly recommended because it catches issues before the role grows.
If: Role is shared across multiple teams or published to Galaxy
Use: Required: default scenario, custom_paths scenario with every defaults/ variable at a non-default value, and platform scenarios for each supported OS
If: Role manages a stateful service (database, message queue, persistent storage)
Use: Add a separate scenario that tests idempotency with data present. The second run should make zero changes even with real data in place
If: Role uses conditional logic based on OS family or distribution
Use: Add separate scenarios per platform (ubuntu_22, rhel_9) to verify the OS-specific code paths independently
● Production incident · POST-MORTEM · severity: high

The Role That Couldn't Be Reused

Symptom
The second team spent three days adapting the role to their environment. Every environment-specific change — dev versus prod data directories, different log paths for compliance, a different PostgreSQL major version — required editing tasks/main.yml directly. When the original team patched a security configuration bug six weeks later, the fork never received the fix. Eight months in, the two versions had diverged enough that merging them was estimated at two weeks of work. The team chose to continue maintaining both.
Assumption
The original team assumed every PostgreSQL instance in the organization would use /var/lib/postgresql/14/main. It seemed like a reasonable default at the time — it was the standard Debian package layout, and they'd never needed to deviate. They put the path literal directly in tasks/main.yml because it was faster and they weren't thinking about reuse. There was no defaults/main.yml at all when the second team first opened the role.
Root cause
The role's tasks/main.yml contained literal path strings throughout: dest: /var/lib/postgresql/14/main/postgresql.conf, src: /var/lib/postgresql/14/main/pg_hba.conf. The second team needed /data/pg_production for performance reasons — their storage was mounted separately for I/O isolation. Ansible's variable precedence system would have allowed complete override, but there was no variable to override. The path was a literal string, not a reference to a variable. The defaults/ directory existed but was empty. Every 'configuration' was actually a hardcoded implementation detail.
Fix
Refactored every hardcoded path and version-specific value into defaults/main.yml: postgres_data_dir: /var/lib/postgresql/14/main, postgres_major_version: 14, postgres_config_dir: /etc/postgresql/14/main, postgres_log_dir: /var/log/postgresql. Updated every task reference to use the variable, quoted so YAML does not misparse the leading braces: dest: "{{ postgres_data_dir }}/postgresql.conf". The second team set postgres_data_dir: /data/pg_production in their playbook's vars block and the role worked immediately without modification. Added a Molecule test scenario named custom_paths that runs the role with non-default values for every variable in defaults/main.yml. If a future change hardcodes a path, the custom_paths scenario fails.
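
The refactor can be sketched as follows. The variable names and values are the ones listed in the fix; the module (ansible.builtin.template), the template filename, and the ownership and mode settings are assumptions, since the incident report does not say which module the original tasks used.

```yaml
---
# Before: tasks/main.yml with the path as a literal
- name: Deploy postgresql.conf
  ansible.builtin.template:
    src: postgresql.conf.j2
    dest: /var/lib/postgresql/14/main/postgresql.conf

---
# After: defaults/main.yml carries every value that can vary
postgres_major_version: 14
postgres_data_dir: /var/lib/postgresql/14/main
postgres_config_dir: /etc/postgresql/14/main
postgres_log_dir: /var/log/postgresql

---
# After: tasks/main.yml references the variable (quoted because the value starts with braces)
- name: Deploy postgresql.conf
  ansible.builtin.template:
    src: postgresql.conf.j2
    dest: "{{ postgres_data_dir }}/postgresql.conf"
    owner: postgres
    group: postgres
    mode: "0640"

---
# The second team's playbook: one override, no fork
- name: Configure Database Tier
  hosts: database_servers
  become: true
  roles:
    - role: io.thecodeforge.postgresql
      vars:
        postgres_data_dir: /data/pg_production
```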
Key lesson
  • If a value could possibly differ between environments, teams, or PostgreSQL versions, it belongs in defaults/main.yml — not as a literal string in tasks/main.yml. When in doubt, make it a variable.
  • A role with zero entries in defaults/main.yml is almost certainly hiding hardcoded site-specific assumptions somewhere in its tasks. Treat an empty defaults/main.yml as a code smell during role review.
  • A reusable role has no hardcoded site-specific values anywhere in its tasks. Everything that varies belongs in defaults/ with a sensible value that works for the most common case.
  • Test roles with non-default variable values using a dedicated Molecule scenario. If making the test pass requires editing tasks/ rather than just setting different variables, the role isn't reusable yet.
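One way the custom_paths scenario could look, sketched under assumptions: the role resolves as postgres inside Molecule, the scenario has its own molecule.yml (not shown), and the non-default values below are arbitrary test paths rather than anything from the incident.

```yaml
# roles/postgres/molecule/custom_paths/converge.yml
---
- name: Converge with non-default values for every default
  hosts: all
  become: true
  vars:
    postgres_major_version: 15
    postgres_data_dir: /data/pg_custom
    postgres_config_dir: /opt/pg_custom/conf
    postgres_log_dir: /data/pg_custom/log
  roles:
    - role: postgres

# roles/postgres/molecule/custom_paths/verify.yml
---
- name: Verify
  hosts: all
  become: true
  tasks:
    - name: Look for the config file in the custom data directory
      ansible.builtin.stat:
        path: /data/pg_custom/postgresql.conf
      register: custom_conf

    - name: Fail if the role wrote to a hardcoded path instead
      ansible.builtin.assert:
        that:
          - custom_conf.stat.exists
```

If this scenario only passes after editing tasks/, the role still contains a literal that should be a variable.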
Production debug guide
Three failure patterns specific to Ansible roles, with exact diagnostics and fixes for each one.
3 entries
Symptom · 01
Overrides for a role variable aren't taking effect: values set in inventory or the playbook appear to be ignored and the role keeps using its own value
Fix
This is almost always a precedence inversion: the value you're trying to override lives in vars/main.yml, not defaults/main.yml. vars/ has much higher precedence than group_vars or host_vars. Run ansible-inventory --host $HOST | grep variable_name to confirm what the inventory supplies for that host; this output covers inventory sources only and does not include role defaults/ or vars/. Then run grep -rn 'variable_name' roles/role_name/vars/ to check whether the role pins the same name. If the value from vars/ is winning over your group_vars, move it to defaults/ or remove it from vars/ entirely. Only constants the role cannot function without belong in vars/.
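To see the value a play actually resolves, role variables included, a throwaway playbook like the sketch below can help. It assumes a role named nginx, a variable named nginx_port, and a webservers group, all illustrative. It relies on the documented behavior that variables of roles listed under roles: are exposed to the play at parse time, so pre_tasks can read them before the role runs.

```yaml
# debug_precedence.yml (hypothetical file name)
---
- name: Show what a role variable resolves to without running the role
  hosts: webservers
  gather_facts: false
  pre_tasks:
    - name: Print the resolved value
      ansible.builtin.debug:
        var: nginx_port

    - name: Stop here so the role's tasks never execute
      ansible.builtin.meta: end_play
  roles:
    - role: nginx
```

If the printed value matches vars/main.yml rather than your group_vars, the precedence inversion is confirmed.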
Symptom · 02
Role works correctly in isolation but fails when combined with other roles in the same playbook
Fix
Two roles are using the same variable name and one is overwriting the other's value. Roles applied with roles: or import_role share the play's variable namespace; there is no automatic role-level scoping. Run ansible-playbook playbook.yml --list-tasks to see the execution order, then add a temporary ansible.builtin.debug task with var: port to the play and run it against a failing host to see which value actually resolves (an ad-hoc ansible -m debug call only sees inventory variables, not role defaults/ or vars/). The fix is prefixing all role variables with the role name: nginx_port not port, postgres_port not port. Audit every variable in defaults/main.yml and vars/main.yml for both roles and rename any collisions.
Symptom · 03
Role dependency declared in meta/main.yml isn't running before the dependent role, or appears to be skipped entirely
Fix
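Role dependencies are deduplicated per play: Ansible runs a given role once per play when it is requested again with the same parameters. If the dependency already ran earlier in the same play, for example because it is also listed under roles: or is a dependency of another role, Ansible considers it satisfied and skips the repeat. This is expected behavior, but it produces surprising results when the dependency needs to run again with different variables for different contexts; pass different parameters on each declaration or set allow_duplicates: true in the dependency's meta/main.yml. Run ansible-galaxy role list --roles-path ./roles to verify the dependency is installed locally. If it's missing, run ansible-galaxy install -r requirements.yml. Do not count on Ansible to untangle circular dependencies: a role that depends back on its parent typically fails at load time with a recursion error, so remove the cycle. ansible-playbook -vvv prints the task path for every task, which shows which role ran and in what order.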
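A sketch of a dependency that is meant to run more than once in a play. The role names (app, common_security) and the variable common_security_open_port are hypothetical; allow_duplicates and the vars key on a dependency are standard role metadata.

```yaml
# roles/app/meta/main.yml
---
dependencies:
  - role: common_security
    vars:
      common_security_open_port: 8080   # hypothetical parameter

# roles/common_security/meta/main.yml
---
# Opt out of per-play deduplication so each caller gets its own run.
allow_duplicates: true
dependencies: []
```

Without allow_duplicates, a second role that depends on common_security with identical parameters would not trigger another run in the same play.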
★ Ansible Role Debug Cheat Sheet
Five commands that diagnose 80% of role-related failures. Run these before refactoring anything.
Role variable not being overridden as expected from inventory or playbook
Immediate action
Dump the fully resolved variable set for the specific failing host
Commands
ansible-inventory -i inventory.ini --host $HOST | jq '.'
ansible $HOST -i inventory.ini -m debug -a 'var=postgres_data_dir'
Fix now
defaults/ has the lowest precedence of any variable source in Ansible. group_vars overrides it. host_vars overrides group_vars. vars/ in the role overrides both, meaning inventory cannot override vars/. Both commands above show inventory-supplied values only; if the variable prints as undefined there, nothing in inventory is setting it. If your variable is in vars/ and you need it to be overridable, move it to defaults/. If it's already in defaults/ and still not being overridden, check for a conflicting host_vars file.
Role dependency declared in meta/main.yml is not running
Immediate action
Verify the dependency is installed locally and that the declaration syntax is correct
Commands
grep -A 10 'dependencies:' roles/role_name/meta/main.yml
ansible-galaxy role list --roles-path ./roles
Fix now
Dependencies must be installed in the roles/ directory before the playbook runs. Run ansible-galaxy install -r requirements.yml to fetch missing dependencies. If the dependency is installed but still not running, check whether it already ran earlier in the same play with the same parameters, because Ansible deduplicates role runs within a play. Circular dependencies are not resolved for you and typically abort role loading with a recursion error; remove the cycle rather than working around it.
Role tasks failing with 'undefined variable' for a variable that exists in defaults/main.yml
Immediate action
Confirm the variable is actually in scope for the task that's failing
Commands
ansible-playbook playbook.yml --check -v | grep -B5 'undefined variable'
grep -rn 'variable_name' roles/role_name/ --include='*.yml'
Fix now
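Role defaults are only in scope where the role has been loaded. Roles listed under roles: or pulled in with import_role expose their defaults/ and vars/ to the whole play, but a role loaded with include_role keeps them private unless the task sets public: true, and a role applied in a different play does not share them at all. If the failing task sits outside the role that owns the variable, either set the value in inventory group_vars (which all roles can read), or use set_fact in a playbook task to make it a host-level fact for the rest of the run. Cross-role variable sharing via group_vars is the cleanest pattern.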
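A short sketch of the include_role case, assuming a role named postgres that defines postgres_data_dir in its defaults and a databases inventory group (illustrative names). The public parameter is part of ansible.builtin.include_role and defaults to false.

```yaml
---
- name: Use a role default from a task outside the role
  hosts: databases
  tasks:
    - name: Load the role dynamically and expose its defaults and vars
      ansible.builtin.include_role:
        name: postgres
        public: true   # without this, the next task fails as undefined

    - name: Read the role default from a later task
      ansible.builtin.debug:
        msg: "Backups read from {{ postgres_data_dir }}"
```

With public: true the variables are only available to tasks that run after the include, not to earlier ones.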
Role runs successfully but idempotency is broken: always shows 'changed' on every run
Immediate action
Identify exactly which task reports changed and what content is changing
Commands
ansible-playbook playbook.yml --check --diff > /tmp/role_diff.txt
grep -B 3 -A 15 'changed:' /tmp/role_diff.txt
Fix now
The most common cause is a Jinja2 template that includes dynamic content — timestamps, randomly generated values, or a fact that changes between runs. Remove dynamic content from templates used for config files. If the template is correct and the file still shows changed, verify that line endings and file encoding are consistent. For files that should only be written once (certificates, initialization tokens), use ansible.builtin.copy with force: no.
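For the write-once case, a sketch along these lines is one option. The variable app_init_token and the destination path are hypothetical; force: false is the boolean spelling of the force: no setting mentioned above.

```yaml
- name: Write the initialization token once and never overwrite it
  ansible.builtin.copy:
    content: "{{ app_init_token }}"   # hypothetical variable
    dest: /etc/app/init.token         # hypothetical path
    owner: root
    group: root
    mode: "0600"
    force: false   # only write when the destination does not exist yet
  no_log: true     # keep the token out of task output
```

On the second run the file already exists, so the task reports ok instead of changed even if the token value differs.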
Role fails with file not found for a template or static file
Immediate action
Verify the role directory structure and the src path in the failing task
Commands
ls -la roles/role_name/{files,templates}/
grep -n 'src:' roles/role_name/tasks/main.yml
Fix now
The template module resolves src: relative to the role's templates/ directory. The copy module resolves src: relative to the role's files/ directory. These paths are case-sensitive. If the file exists in the right directory but the task still fails, check that the filename in the src: field matches exactly — including case and extension. Run ls -la on both directories and compare output against the src: value character by character.
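The lookup rules above can be sketched as follows, assuming a role named nginx with a hypothetical nginx_config_dir default and two illustrative file names. Neither src: value carries a directory prefix, because each module searches its own role subdirectory.

```yaml
# roles/nginx/tasks/main.yml (excerpt)
---
- name: Render roles/nginx/templates/nginx.conf.j2
  ansible.builtin.template:
    src: nginx.conf.j2      # looked up in templates/
    dest: "{{ nginx_config_dir }}/nginx.conf"
    mode: "0644"

- name: Copy roles/nginx/files/dhparam.pem
  ansible.builtin.copy:
    src: dhparam.pem        # looked up in files/
    dest: "{{ nginx_config_dir }}/dhparam.pem"
    mode: "0600"
```

A file named Nginx.conf.j2 or nginx.conf.J2 would not match the first task on a case-sensitive filesystem.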
Single Playbook vs Ansible Roles — The Operational Difference
Aspect: Appropriate scale
Single Playbook: 1-5 tasks on a single host group, one-off operations, scripts you run once. A single playbook is the right tool for small, focused, non-repeating automation.
Ansible Roles: Multi-tier infrastructure, automation shared across teams, anything that runs on a schedule or in CI/CD. Roles pay for their structure the moment a second team needs the same automation.

Aspect: Reusability
Single Playbook: None. Reusing a playbook requires copy-pasting blocks of YAML and maintaining multiple copies. Any bug fix must be applied to every copy manually.
Ansible Roles: First-class. Roles are versioned units with defined interfaces (defaults/). Published to Galaxy or internal Automation Hub. Bug fixes propagate to all consumers via requirements.yml version bumps.

Aspect: Variable management
Single Playbook: Global namespace. All variables are visible to all tasks. Name collisions between sections are invisible until they cause wrong behavior at runtime.
Ansible Roles: Structured and separated. defaults/ for overridable config, vars/ for internal constants, clear precedence hierarchy. Prefix variables with role name to prevent global namespace collisions.

Aspect: Testing
Single Playbook: Manual. Run the playbook in staging and verify by hand. Regressions are caught by the next human who notices something is wrong.
Ansible Roles: Automated with Molecule. Idempotency check, non-default variable scenario, platform scenarios. Regressions are caught in CI on the merge request.

Aspect: Team collaboration
Single Playbook: Difficult at scale. Multiple people editing one file creates merge conflicts and unclear ownership. Who is responsible for which section?
Ansible Roles: Parallel ownership. Each role has a clear owner and a clear boundary. The nginx team owns the nginx role. The postgres team owns the postgres role. Changes don't conflict.

Aspect: Maintenance over time
Single Playbook: Degrades. A 600-line playbook becomes impossible to read or modify without risk. Engineers avoid changing it, leading to workarounds layered on top of workarounds.
Ansible Roles: Stable. Each role stays focused on one service. A postgres role doesn't grow because someone added monitoring. Roles evolve independently at their own pace.

Key takeaways

1
Roles are packaged automation units: the standard directory structure is how Ansible finds tasks, handlers, templates, and variables automatically. Deviating from convention means silent failures, not helpful errors.
2
defaults/main.yml has the lowest precedence of any variable source: inventory, group_vars, and the calling playbook's vars block all override it. vars/main.yml has much higher precedence, and inventory cannot override it. Put overridable values in defaults/, internal constants in vars/. Confusing these two is the most common reason roles are inflexible.
3
One role per service. A role named after two services, or a tasks/main.yml with when: conditions to skip sections for different callers, is really two roles in one directory. Decompose and compose in the playbook.
4
A reusable role has zero hardcoded site-specific values. Every path, port, version, username, and configurable option must be a variable with a sensible default. An empty defaults/main.yml is a red flag: it almost always means the role has hidden hardcoded assumptions.
5
Molecule is the hard gate between 'I think this works' and 'I have verified this works.' Two required scenarios: default (idempotency check with default values) and custom_paths (verifies the role works with non-default values for every variable). No role change merges without both passing.
6
Role dependencies declared in meta/main.yml run automatically before the dependent role regardless of playbook order. Circular dependencies are not resolved for you: a role that depends back on its parent typically aborts role loading with a recursion error, so design dependency chains that only point one way.
7
Prefix every role variable with the role name: nginx_port, postgres_port, haproxy_timeout. Roles share a global variable namespace, so unprefixed variable names collide silently and produce wrong behavior with no error message.
8
import_role is static (parsed at playbook load time: handlers work, tags propagate into the role). include_role is dynamic (evaluated at runtime: use for conditional or looped role application, but handlers from included roles may not be in the handler registry).
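The difference in the last takeaway, sketched with hypothetical role names (nginx, app_instance), a hypothetical app_instance_name variable, and an assumed webservers group:

```yaml
---
- name: Compare static and dynamic role loading
  hosts: webservers
  tasks:
    - name: Static import, resolved when the playbook is parsed
      ansible.builtin.import_role:
        name: nginx
      tags: [web]   # inherited by every task inside the role

    - name: Dynamic include, resolved at runtime so it can loop
      ansible.builtin.include_role:
        name: app_instance
      vars:
        app_instance_name: "{{ item }}"
      loop:
        - api
        - worker
```

A loop on import_role is rejected, because the import has already been expanded into its tasks before any loop could be evaluated. Tags placed on an include_role task apply to the include itself, not to the tasks it pulls in.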

Common mistakes to avoid

6 patterns

Creating God Roles that configure multiple unrelated services

Symptom
Role named server_setup has 600 lines covering Docker installation, PostgreSQL configuration, Nginx setup, Prometheus node exporter, SSH hardening, and log rotation. Teams can't use just the PostgreSQL portion. Every change risks breaking the Nginx section. Testing requires a full-stack container. New engineers avoid modifying it entirely.
Fix
Decompose by service: docker, postgresql, nginx, prometheus_node_exporter, ssh_hardening, log_rotation — each as a separate role with its own defaults/, handlers/, templates/, and Molecule tests. Compose them in site.yml. Each role is independently testable, independently versioned, and independently useful to any team that needs that one service.

Hardcoding environment-specific values in tasks/main.yml instead of defaults/main.yml

Symptom
Role contains literal strings: dest: /var/lib/postgresql/14/main/postgresql.conf. A second team needs /data/pg_production. They fork the role. Two versions now diverge. Upstream bug fixes never reach the fork. Six months later merging them is estimated at two weeks.
Fix
Move every configurable value to defaults/main.yml: postgres_data_dir: /var/lib/postgresql/14/main. Reference it in tasks with the value quoted, because a YAML value cannot start with a bare {{: dest: "{{ postgres_data_dir }}/postgresql.conf". Add a Molecule custom_paths scenario that sets every defaults/ variable to a non-default value. If that scenario fails, a path is hardcoded somewhere.

Confusing defaults/ with vars/ and wondering why inventory overrides have no effect

Symptom
Team sets nginx_port: 8080 in inventory group_vars/webservers.yml. Role still uses port 80. No error message. The port value in vars/main.yml is silently winning over the inventory value because vars/ has higher precedence than group_vars.
Fix
defaults/ has the lowest precedence of any variable source: inventory, group_vars, host_vars, and -e all override it. vars/ has much higher precedence: neither group_vars nor host_vars can override it. Only sources that rank above role vars, such as set_fact, role parameters, and -e, can. The rule: if a value is meant to be overridden by callers, it belongs in defaults/. If it's an internal constant the role needs to function and users should never touch, it belongs in vars/. If you're ever tempted to put an overridable value in vars/, put it in defaults/ instead.
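A sketch of how the split might look for the nginx example in the symptom. nginx_package_name and nginx_service_name are illustrative constants, not names taken from a real role.

```yaml
# roles/nginx/defaults/main.yml: the role's public interface
---
nginx_port: 80

# roles/nginx/vars/main.yml: internal constants, not reachable from inventory
---
nginx_package_name: nginx
nginx_service_name: nginx

# group_vars/webservers.yml: wins over defaults/, loses to vars/
---
nginx_port: 8080
```

With this layout the webservers group listens on 8080. Had nginx_port been declared in vars/main.yml instead, the group_vars value would be ignored without any warning.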

Missing role dependencies in meta/main.yml and relying on playbook ordering

Symptom
Role assumes common_security role has already run and that firewall rules allow its service's port. On a fresh host where the playbook order changed, the role fails because the port is blocked. On hosts where the common_security role ran in an earlier play, it works. Failures are intermittent and hard to trace.
Fix
Declare the dependency explicitly in meta/main.yml: dependencies: [{role: common_security}]. Dependencies run automatically before the dependent role regardless of playbook order. Run ansible-galaxy role list --roles-path ./roles to verify the dependency is installed. For conditional dependencies (only on RedHat), use the when key in the dependency declaration.
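A sketch of the declaration, including a conditional dependency. The role names app and selinux_policy are hypothetical; the condition assumes facts have been gathered for the host.

```yaml
# roles/app/meta/main.yml
---
dependencies:
  # Always runs before the app role's own tasks.
  - role: common_security

  # The when condition is applied to every task in the dependency.
  - role: selinux_policy
    when: ansible_facts['os_family'] == 'RedHat'
```

On non-RedHat hosts the selinux_policy tasks still appear in the output, each reported as skipped.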

Not namespacing role variables — using generic names that collide across roles

Symptom
Both the nginx role and the haproxy role define a variable named port. When both roles run in the same play, the last-loaded role's value overwrites the first's. Nginx listens on HAProxy's port or vice versa. No error is produced — Ansible uses whichever value happens to be resolved last.
Fix
Prefix every role variable with the role name: nginx_port: 80, haproxy_port: 443, postgres_port: 5432. This is more than a naming convention: it's a namespace collision prevention mechanism. Audit every variable in every role's defaults/ and vars/ for generic names: port, user, version, path, log_dir. Rename any that aren't prefixed. Then run multiple roles in the same play with a temporary ansible.builtin.debug task for nginx_port placed after them to verify the expected value resolves correctly; an ad-hoc ansible -m debug call does not load role variables, so it cannot show this.
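A sketch of the prefixed layout and a quick check, assuming roles named nginx and haproxy and an inventory group called edge (the group name is an assumption):

```yaml
# roles/nginx/defaults/main.yml
---
nginx_port: 80

# roles/haproxy/defaults/main.yml
---
haproxy_port: 443

# site.yml: both roles in one play, no shared names to collide
---
- name: Edge tier
  hosts: edge
  roles:
    - role: nginx
    - role: haproxy
  post_tasks:
    - name: Confirm each role kept its own value
      ansible.builtin.debug:
        msg: "nginx={{ nginx_port }} haproxy={{ haproxy_port }}"
```

Because the two names differ, the order in which the roles load no longer affects either value.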

Skipping Molecule tests because 'the role is simple' or 'it works in staging'

Symptom
A new engineer makes what looks like a minor change — adds a line to the template, adjusts a variable default, adds a new task. The role was never tested with Molecule. Three weeks later, a production deploy fails because the template now renders different content on the second run, breaking idempotency on a cron job that runs every 30 minutes.
Fix
Add Molecule testing before the role is used in production, not after. The investment is two hours to write the default and custom_paths scenarios. The payoff is every future change gets an idempotency check and a non-default variable check automatically. Enforce Molecule as a CI gate: no merge without Molecule passing. 'Simple' roles grow, and Molecule catches the moment they become not simple.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
Describe the Ansible variable precedence hierarchy. If a variable is def...
Q02 · SENIOR
What is the specific use case for meta/main.yml in an Ansible Role? Prov...
Q03 · SENIOR
How does import_role differ from include_role? Describe a production bug...
Q04 · SENIOR
Explain the DRY principle in the context of Ansible. How do roles facili...
Q05 · SENIOR
How would you design a CI/CD pipeline to test an Ansible role independen...
Q06 · SENIOR
When would you use vars_prompt in a playbook instead of defining variabl...
Q01 of 06 · SENIOR

Describe the Ansible variable precedence hierarchy. If a variable is defined in both defaults/main.yml and vars/main.yml within a role, which one wins? What about group_vars?

ANSWER
vars/main.yml wins over defaults/main.yml. The full hierarchy from lowest to highest precedence, focusing on the levels that matter most in practice:
1. Role defaults (defaults/main.yml): lowest of all
2. Inventory file variables
3. group_vars/all
4. group_vars/groupname
5. host_vars/hostname
6. Playbook vars block
7. Role vars (vars/main.yml): much higher than group_vars
8. set_fact / registered variables
9. Extra vars (-e): highest, overrides everything
The production implication: group_vars cannot override vars/main.yml. If an operator sets nginx_port: 8080 in group_vars/webservers.yml and the role has nginx_port: 80 in vars/main.yml, the role wins. No error. Wrong port. This is the most common precedence bug in shared roles.
The rule: defaults/ for values users should be able to override. vars/ only for internal constants the role cannot function without, such as package names, service names, and fixed permissions. Every value that might legitimately differ between environments or teams must be in defaults/.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What's the difference between defaults/main.yml and vars/main.yml in an Ansible role?
02
How do Ansible role dependencies work, and what happens with circular dependencies?
03
How do you share roles across multiple projects and teams?
04
What is a God role and why is it a problem?
05
How does Molecule verify that a role is idempotent?