
Ansible Basics for Cloud Automation — Playbooks, Inventory and Roles Explained

In Plain English 🔥
Imagine you're a school principal and you need to give the same set of instructions to 500 students across 10 classrooms. Instead of walking into each room yourself, you write a single instruction sheet and hand it to the teachers — they deliver the message simultaneously. Ansible works exactly like that. You write your instructions once (in a 'playbook'), tell Ansible which servers to target (your 'inventory'), and it SSHes into all of them at the same time and executes everything in order — no agent software needed on any server.

Every cloud infrastructure beyond a certain size hits the same wall: someone on the team spends their Friday manually SSH-ing into 30 servers, running the same five commands, and praying they didn't typo on server 27. It's slow, error-prone, and completely unauditable. Scale that to hundreds of EC2 instances or GCP VMs and the problem stops being annoying — it becomes a business risk. That's the exact world Ansible was built to fix.

Ansible solves the configuration drift problem. Configuration drift is what happens when servers that were identical at launch slowly become subtly different because one was patched manually, another got a dependency installed ad-hoc, and a third had a config file tweaked at 2am during an incident. Ansible enforces a declared, version-controlled state across every machine in your fleet simultaneously, making your infrastructure reproducible and auditable from a single YAML file checked into Git.

By the end of this article you'll understand not just how to write Ansible playbooks, but why they're structured the way they are. You'll know how inventory files map to real cloud environments, how roles let you package reusable automation, and how handlers let you respond intelligently to changes — like restarting Nginx only when its config actually changed. You'll walk away ready to automate a real multi-server cloud deployment from scratch.

How Ansible Connects to Your Cloud Servers — Inventory Files Demystified

Before Ansible can do anything, it needs to know what to automate. That's the job of the inventory file. Think of it as your contacts list — it maps human-readable group names like 'webservers' or 'databases' to actual IP addresses or DNS hostnames.

In cloud environments, hard-coding IPs is a bad idea. Instances get recycled, autoscaling groups spawn new ones, and elastic IPs get reassigned. That's why Ansible supports dynamic inventory — scripts or plugins that query AWS, GCP, or Azure APIs at runtime to fetch the current list of live instances. For now we'll start with static inventory because it teaches the mental model, and then we'll show the dynamic plugin pattern.
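
As a sketch of that dynamic plugin pattern, an aws_ec2.yml config for the amazon.aws.aws_ec2 inventory plugin might look like this (the region, tag names and host mapping are illustrative assumptions you'd adapt to your own account):

```yaml
# aws_ec2.yml — dynamic inventory via the amazon.aws.aws_ec2 plugin
# (minimal sketch; region, tags and compose mapping are examples)
plugin: amazon.aws.aws_ec2
regions:
  - eu-west-1
filters:
  instance-state-name: running        # only query live instances
keyed_groups:
  # Build groups from EC2 tags, e.g. Environment=production -> group env_production
  - key: tags.Environment
    prefix: env
  - key: tags.Role                    # e.g. Role=web -> group role_web
    prefix: role
compose:
  # Connect on the instance's private IP (adjust for your network topology)
  ansible_host: private_ip_address
```

Run 'ansible-inventory -i aws_ec2.yml --graph' to inspect the generated groups; note the config filename must end in aws_ec2.yml (or .yaml) for the plugin to recognise it.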

Groups matter because they let you target subsets of your fleet precisely. You might run a security-patching playbook against only the 'production' group, or a database migration only against 'db_primary'. The colon-bracket syntax for group inheritance lets you build a clean hierarchy: 'eu_west' can be a child of 'production' which is a child of 'all'. Ansible resolves that tree at runtime — you don't have to repeat hostnames.
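
That hierarchy reads like this in INI syntax (the hostname and IP here are illustrative):

```ini
# eu_west is a child of production, which is a child of the implicit 'all' group
[eu_west]
web-eu-01 ansible_host=10.1.0.10

[production:children]
eu_west

# Targeting 'production' now reaches web-eu-01 without repeating its hostname
```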

Variables declared in inventory travel with the host or group they're attached to, which is cleaner than duplicating them inside every playbook. Host variables override group variables, and group variables override the global 'all' group — this precedence order saves you from a lot of conditional logic inside playbooks.

cloud_inventory.ini · INI
# cloud_inventory.ini
# Static inventory for a three-tier cloud application
# In production you'd replace this with a dynamic inventory plugin
# (e.g. amazon.aws.aws_ec2) that queries the AWS API directly

[load_balancers]
# Format: alias ansible_host=<IP> ansible_user=<SSH user>
nginx-lb-01 ansible_host=54.210.100.11 ansible_user=ubuntu

[web_servers]
# Three app servers behind the load balancer
app-server-01 ansible_host=10.0.1.10 ansible_user=ubuntu
app-server-02 ansible_host=10.0.1.11 ansible_user=ubuntu
app-server-03 ansible_host=10.0.1.12 ansible_user=ubuntu

[database_servers]
# Primary and replica — we'll target them separately in playbooks
db-primary-01 ansible_host=10.0.2.10 ansible_user=ubuntu
db-replica-01 ansible_host=10.0.2.11 ansible_user=ubuntu

# Group of groups — 'production' contains all three tiers
[production:children]
load_balancers
web_servers
database_servers

# Group-level variables applied to every host in web_servers
[web_servers:vars]
# These become available as variables inside any play targeting web_servers
app_port=8080
deploy_user=apprunner
max_connections=1000

# Host-level variables override group-level variables.
# Repeating the [web_servers] header merges with the section above —
# app-server-03 is a canary, so its host line overrides app_port to 8090
[web_servers]
app-server-03 ansible_host=10.0.1.12 ansible_user=ubuntu app_port=8090
▶ Output
# Run: ansible-inventory -i cloud_inventory.ini --list
# Ansible parses the file and outputs a JSON representation (abridged):
{
    "_meta": {
        "hostvars": {
            "app-server-01": { "ansible_host": "10.0.1.10", "ansible_user": "ubuntu", "app_port": 8080, "deploy_user": "apprunner", "max_connections": 1000 },
            "app-server-03": { "ansible_host": "10.0.1.12", "ansible_user": "ubuntu", "app_port": 8090, "deploy_user": "apprunner", "max_connections": 1000 },
            "db-primary-01": { "ansible_host": "10.0.2.10", "ansible_user": "ubuntu" }
        }
    },
    "production": {
        "children": ["load_balancers", "web_servers", "database_servers"]
    },
    "web_servers": {
        "hosts": ["app-server-01", "app-server-02", "app-server-03"]
    }
}
⚠️
Pro Tip: Use Dynamic Inventory for AWS. Install the 'amazon.aws' collection (ansible-galaxy collection install amazon.aws), create an aws_ec2.yml plugin config file, and Ansible will query EC2 at runtime and group instances by tags like 'Environment: production' automatically. You never touch the inventory file when instances are added or terminated.

Writing Playbooks That Actually Reflect Production Reality

A playbook is where your intent lives. It answers three questions: which servers (hosts), as which user (become/remote_user), and what should be true (tasks). The word 'should be true' is important — Ansible tasks are declarative. You're not writing a script that says 'install Nginx'. You're asserting 'Nginx must be installed and at version 1.24'. Ansible figures out whether it needs to do anything.

This idempotency is the killer feature. Run the same playbook ten times — the first run installs and configures everything. Runs two through ten do nothing because the desired state already exists. This means you can safely schedule playbooks as a drift-correction cron job without fear of them breaking things.
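
You can see this with a plain ad-hoc command (the output shape shown is typical, trimmed for brevity):

```shell
# Assert the same desired state twice against the web_servers group
ansible web_servers -i cloud_inventory.ini --become \
  -m ansible.builtin.apt -a "name=nginx state=present"
# First run : hosts report CHANGED (nginx gets installed)
# Second run: hosts report SUCCESS with "changed": false — nothing left to do
```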

Handlers are Ansible's 'only react to real changes' mechanism. Instead of always restarting Nginx after a config task, you notify a handler. The handler only fires if that task reported a change. If the config file was already correct, the handler never runs — no unnecessary service restart, no downtime spike.

Variables from the inventory we defined in section one flow directly into playbook tasks through Jinja2 template syntax — double curly braces. This is how one playbook serves multiple environments: 'app_port' on staging might be 8080 and on production 443. The playbook doesn't change. The inventory does.
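
The playbook below deploys a template named nginx_webapp.conf.j2; a minimal sketch of what that file could contain (the proxy settings are illustrative, not the article's actual template):

```nginx
# templates/nginx_webapp.conf.j2 — rendered per host with that host's variables
server {
    listen 80;
    server_name _;

    location / {
        # {{ app_port }} resolves from inventory: 8080 normally, 8090 on the canary
        proxy_pass http://127.0.0.1:{{ app_port }};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```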

deploy_web_app.yml · YAML
---
# deploy_web_app.yml
# Deploys and configures our Node.js app behind Nginx on all web_servers
# Run with: ansible-playbook -i cloud_inventory.ini deploy_web_app.yml

- name: Configure Nginx reverse proxy and deploy application
  hosts: web_servers          # Matches the group name from our inventory file
  become: true                # Escalate to sudo — most package installs require it
  gather_facts: true          # Collect OS info (distro, arch) — used in conditionals below

  # Play-level vars override inventory vars (both group AND host vars); only
  # higher-precedence sources like task vars, include params and -e extra-vars beat them
  vars:
    node_version: "20"        # LTS version of Node.js to install
    app_directory: "/opt/webapp"
    nginx_config_path: "/etc/nginx/sites-available/webapp.conf"

  # Handlers only execute when notified AND only once per play, even if notified 10 times
  handlers:
    - name: Restart Nginx
      ansible.builtin.service:
        name: nginx
        state: restarted      # 'restarted' always restarts; use 'reloaded' for config-only changes

    - name: Restart application service
      ansible.builtin.service:
        name: webapp
        state: restarted

  tasks:
    # TASK 1: Ensure Nginx is installed — idempotent, does nothing if already present
    - name: Install Nginx web server
      ansible.builtin.apt:
        name: nginx
        state: present         # 'present' = install if missing; 'latest' = always upgrade
        update_cache: true     # Equivalent to running apt-get update first

    # TASK 2: Deploy our Nginx config using a Jinja2 template
    # If the rendered template differs from what's on disk, this reports 'changed'
    - name: Deploy Nginx reverse proxy configuration
      ansible.builtin.template:
        src: templates/nginx_webapp.conf.j2   # Local template file on your Ansible control node
        dest: "{{ nginx_config_path }}"
        owner: root
        group: root
        mode: '0644'
      notify: Restart Nginx    # Only restarts Nginx if this task actually changed the file

    # TASK 3: Create app directory — 'state: directory' is idempotent
    - name: Ensure application directory exists
      ansible.builtin.file:
        path: "{{ app_directory }}"
        state: directory
        owner: "{{ deploy_user }}"    # 'deploy_user' comes from inventory group vars
        mode: '0755'

    # TASK 4: Conditional — only run on Debian/Ubuntu hosts
    # ansible_facts['os_family'] is populated by gather_facts: true above
    - name: Install Node.js on Debian-based hosts
      ansible.builtin.shell: |
        curl -fsSL https://deb.nodesource.com/setup_{{ node_version }}.x | bash -
        apt-get install -y nodejs
      when: ansible_facts['os_family'] == 'Debian'
      args:
        creates: /usr/bin/node   # Skip this task entirely if /usr/bin/node already exists

    # TASK 5: Ensure the webapp systemd service is enabled and running.
    # No 'notify' here — 'state: started' already guarantees the service is up,
    # and restarting a service we just started would be redundant. The
    # 'Restart application service' handler is for tasks that change app code or config.
    - name: Enable and start webapp service
      ansible.builtin.service:
        name: webapp
        state: started
        enabled: true           # Ensures it survives a server reboot
▶ Output
PLAY [Configure Nginx reverse proxy and deploy application] ********************

TASK [Gathering Facts] *********************************************************
ok: [app-server-01]
ok: [app-server-02]
ok: [app-server-03]

TASK [Install Nginx web server] ************************************************
ok: [app-server-01] <- already installed, no change
changed: [app-server-02] <- was missing, just installed
ok: [app-server-03]

TASK [Deploy Nginx reverse proxy configuration] ********************************
changed: [app-server-01] <- config file differed, updated
changed: [app-server-02]
ok: [app-server-03] <- config was already correct, handler won't fire

RUNNING HANDLER [Restart Nginx] ************************************************
changed: [app-server-01] <- handler fired because task above reported changed
changed: [app-server-02] <- same
# app-server-03 did NOT restart Nginx — zero unnecessary downtime

PLAY RECAP *********************************************************************
app-server-01 : ok=5 changed=2 unreachable=0 failed=0
app-server-02 : ok=5 changed=3 unreachable=0 failed=0
app-server-03 : ok=4 changed=0 unreachable=0 failed=0
⚠️
Watch Out: 'state: latest' in Production. Using 'state: latest' for package installs looks convenient but means a Monday morning playbook run can silently upgrade Nginx from 1.24 to 1.26 across production. Pin your versions with 'state: present' plus an explicit version string (e.g. 'name: nginx=1.24.*') so upgrades are intentional, not accidental.
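
In task form, that pinning advice looks like this (the version string is an example):

```yaml
- name: Install a pinned Nginx version
  ansible.builtin.apt:
    name: "nginx=1.24.*"      # apt wildcard pin — patch releases allowed, minor bumps are not
    state: present
    update_cache: true
```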

Roles — The Way You Make Ansible Scale Across a Real Organisation

Once your playbook file hits 200 lines, you feel the pain. Tasks for installing Python, configuring PostgreSQL, hardening SSH, and deploying your app are all tangled together. Copy-pasting chunks of it into a second playbook for a different project means you now have two things to maintain. This is exactly the problem Ansible roles solve.

A role is a standardised directory structure that packages tasks, variables, templates, handlers and files into a self-contained unit. You write a 'nodejs_app' role once and drop it into any playbook with two lines. The role can even declare its own dependencies on other roles — install the 'common_security' role before the 'nodejs_app' role, every time, automatically.
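
Role dependencies are declared in the role's meta/main.yml; a sketch (the 'common_security' role name is hypothetical):

```yaml
# roles/nodejs_app/meta/main.yml
dependencies:
  # Applied automatically before this role's own tasks, every time
  - role: common_security
```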

Roles also enforce a convention that makes onboarding new team members faster. Anyone who knows Ansible knows exactly where to look: tasks are in tasks/main.yml, default variable values are in defaults/main.yml (overridable), and static files to copy are in files/. There's no scavenger hunt.

For cloud teams, the practical pattern is to host roles in a private Git repository or Ansible Galaxy, version-tag them, and pull specific versions in a requirements.yml file. This gives you the same dependency pinning discipline you'd have in a package.json or requirements.txt.
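
A requirements.yml pinning role versions from Git might look like this (the repository URL and tag are placeholders):

```yaml
# requirements.yml — version-pinned roles, the package.json of your automation
roles:
  - name: nginx_proxy
    src: https://git.example.com/ansible/nginx_proxy.git
    scm: git
    version: "v1.4.2"         # a tagged release — upgrading is a deliberate diff

# Install with: ansible-galaxy install -r requirements.yml
```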

roles/nginx_proxy/tasks/main.yml · YAML
# Directory structure of the nginx_proxy role:
# roles/
# └── nginx_proxy/
#     ├── tasks/
#     │   └── main.yml        <- you are here
#     ├── handlers/
#     │   └── main.yml        <- handlers scoped to this role
#     ├── templates/
#     │   └── nginx.conf.j2   <- Jinja2 config template
#     ├── defaults/
#     │   └── main.yml        <- lowest-priority variables (safe to override)
#     ├── vars/
#     │   └── main.yml        <- higher-priority vars (internal role constants)
#     └── meta/
#         └── main.yml        <- role metadata and dependencies

# roles/nginx_proxy/defaults/main.yml — overridable defaults
# nginx_worker_processes: auto
# nginx_client_max_body_size: "10m"
# backend_port: "{{ app_port }}"   <- inherits from inventory group vars

---
# roles/nginx_proxy/tasks/main.yml

- name: Install Nginx
  ansible.builtin.apt:
    name: "nginx"
    state: present
    update_cache: true

- name: Remove default Nginx site to avoid port conflicts
  ansible.builtin.file:
    path: /etc/nginx/sites-enabled/default
    state: absent             # Deletes the file if it exists — idempotent
  notify: Reload Nginx        # Reload is gentler than restart — no dropped connections

- name: Deploy reverse proxy config from template
  ansible.builtin.template:
    src: nginx.conf.j2        # Ansible looks in roles/nginx_proxy/templates/ automatically
    dest: /etc/nginx/sites-available/webapp
    # 'validate' runs against the temp file BEFORE it replaces the destination.
    # Caveat: '-c' treats the file as a complete config, so this only passes for
    # a standalone config; site fragments need a full 'nginx -t' check after deploy.
    validate: 'nginx -t -c %s'
  notify: Reload Nginx

- name: Enable webapp site by symlinking into sites-enabled
  ansible.builtin.file:
    src: /etc/nginx/sites-available/webapp
    dest: /etc/nginx/sites-enabled/webapp
    state: link               # Creates a symlink — Nginx's standard activation mechanism
  notify: Reload Nginx

- name: Ensure Nginx is started and enabled on boot
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true

---
# roles/nginx_proxy/handlers/main.yml

- name: Reload Nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded           # Graceful config reload — no dropped connections

---
# site_playbook.yml — the top-level playbook that consumes roles
# This is now just 12 lines instead of 150

- name: Deploy full application stack to production web servers
  hosts: web_servers
  become: true

  roles:
    - role: nginx_proxy
      vars:
        nginx_client_max_body_size: "25m"   # Override the default for this specific play
    - role: nodejs_app                       # A second role installed from requirements.yml
▶ Output
PLAY [Deploy full application stack to production web servers] *****************

TASK [nginx_proxy : Install Nginx] *********************************************
ok: [app-server-01]
ok: [app-server-02]
ok: [app-server-03]

TASK [nginx_proxy : Remove default Nginx site to avoid port conflicts] *********
changed: [app-server-01]
changed: [app-server-02]
ok: [app-server-03]

TASK [nginx_proxy : Deploy reverse proxy config from template] *****************
changed: [app-server-01]
ok: [app-server-02]
ok: [app-server-03]

TASK [nginx_proxy : Enable webapp site by symlinking into sites-enabled] *******
ok: [app-server-01]
ok: [app-server-02]
ok: [app-server-03]

RUNNING HANDLER [nginx_proxy : Reload Nginx] ***********************************
changed: [app-server-01] <- only servers with config changes got reloaded
changed: [app-server-02]

PLAY RECAP *********************************************************************
app-server-01 : ok=5 changed=3 unreachable=0 failed=0
app-server-02 : ok=5 changed=2 unreachable=0 failed=0
app-server-03 : ok=4 changed=0 unreachable=0 failed=0
🔥
Interview Gold: defaults vs vars in a Role. Variables in 'defaults/main.yml' have the lowest priority in Ansible's 22-level variable precedence chain — any inventory variable, playbook variable or extra-var will override them. Variables in 'vars/main.yml' have much higher priority and are harder to override. Use defaults for things you *want* callers to customise, and vars for internal role constants that should never be changed from outside.
Aspect | Ansible (Agentless) | Puppet / Chef (Agent-based)
Setup complexity | Low — only SSH and Python needed on targets | High — agent must be installed and registered on every node
Communication model | Push — control node initiates SSH connection | Pull — agents periodically check in with a master server
Language | YAML playbooks, Jinja2 templates | Puppet DSL / Ruby DSL respectively
Execution model | Runs tasks sequentially per host by default | Agents run on their own schedule, eventual consistency
Cloud autoscaling fit | Excellent — new instances need only SSH access | Requires agent bootstrap step during instance launch
Idempotency | Built in when using proper modules | Built in by design across both tools
Windows support | Via WinRM — functional but less mature | Strong native Windows support in both tools
Best for | Cloud provisioning, ad-hoc ops, CI/CD pipelines | Large-scale configuration management with long-lived servers

🎯 Key Takeaways

  • Ansible's agentless architecture means your only prerequisite on cloud target hosts is SSH access and Python — this makes it a natural fit for ephemeral cloud instances that come and go via autoscaling.
  • Idempotency is not automatic — it depends on using proper Ansible modules instead of raw shell commands. A playbook built on 'apt', 'template', and 'service' modules is safe to run repeatedly; one built on 'shell' commands is not.
  • Handlers are Ansible's built-in 'only act on real change' mechanism — they prevent unnecessary service restarts and are deduplicated automatically within a play, so notifying the same handler ten times still only restarts the service once.
  • Roles are the unit of reuse in Ansible — the split between 'defaults/' (overridable by callers) and 'vars/' (internal constants) is the key design decision that determines how flexible or locked-down your role is for other teams.

⚠ Common Mistakes to Avoid

  • Mistake 1: Using the 'shell' or 'command' module when a dedicated module exists — Symptom: playbook is not idempotent, shell tasks always report 'changed' even when nothing changed — Fix: replace 'shell: apt-get install nginx' with 'ansible.builtin.apt: name=nginx state=present'. The apt module checks current state before acting; the shell module just executes blindly every time. Only fall back to 'shell' when no module covers your use case, and always pair it with 'args: creates:' or 'changed_when: false' to restore idempotency.
  • Mistake 2: Storing secrets (passwords, API keys) in plaintext inventory or variable files committed to Git — Symptom: credentials exposed in source control, security audit failure — Fix: use Ansible Vault. Run 'ansible-vault encrypt_string "mysecretpassword" --name db_password' and paste the encrypted block directly into your vars file. Commit the encrypted value safely, and pass the vault password via --vault-password-file or a CI/CD secret at runtime. Never store a plaintext vault password file in the repo.
  • Mistake 3: Targeting 'hosts: all' in a playbook without an explicit inventory file flag — Symptom: playbook accidentally runs against production when you intended staging, or fails with 'Could not match supplied host pattern' — Fix: always run playbooks with an explicit '-i' flag ('ansible-playbook -i staging_inventory.ini site.yml') and add a play-level assertion at the top of critical playbooks: 'assert: that: inventory_hostname in groups["staging"]' to hard-fail if the wrong inventory was passed.
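
The guard assertion from Mistake 3 can be written as a first task (a sketch; 'staging' is whatever group name your inventory uses):

```yaml
- name: Refuse to run outside the staging inventory
  ansible.builtin.assert:
    that:
      - "'staging' in group_names"   # every targeted host must belong to 'staging'
    fail_msg: "Wrong inventory passed — this playbook is staging-only"
```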

Interview Questions on This Topic

  • Q: What is idempotency in the context of Ansible, and how does it affect which modules you choose for a task?
  • Q: Walk me through the variable precedence order in Ansible — specifically, what wins between a role default, a group variable in inventory, and an extra-var passed on the command line?
  • Q: A handler is notified by three different tasks in a single play, but you only want it to run once at the end. Does Ansible already handle this, and can you explain how — and what happens if you use 'listen' on a handler instead of a direct name reference?

Frequently Asked Questions

Do I need to install anything on the servers Ansible manages?

For Linux targets, you only need SSH enabled and Python installed — Python 3.x is present by default on virtually all modern cloud images (Ubuntu, Amazon Linux, RHEL). Ansible connects over SSH from your control node and transfers small Python scripts to execute tasks. No agent, no daemon, nothing listening on a port.

What is the difference between an Ansible playbook and an Ansible role?

A playbook is a complete, runnable automation file that ties together hosts, variables, and tasks for a specific job — like 'deploy the web app to production'. A role is a reusable, packaged unit of automation (tasks, templates, handlers, defaults) with no host targeting of its own. Playbooks consume roles. Think of a role as a library function and a playbook as the main program that calls it.

How does Ansible handle secrets like database passwords or API keys?

Ansible Vault is the built-in solution. You can encrypt an entire variable file ('ansible-vault encrypt secrets.yml') or inline individual values using 'ansible-vault encrypt_string'. The encrypted ciphertext is safe to commit to Git. At runtime you provide the vault password via a prompt, a file, or an environment variable. For CI/CD pipelines, store the vault password as a protected CI secret and pass it with '--vault-password-file'.

TheCodeForge Editorial Team (Verified Author)

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
