Ansible Basics for Cloud Automation — Playbooks, Inventory and Roles Explained
Every cloud infrastructure beyond a certain size hits the same wall: someone on the team spends their Friday manually SSH-ing into 30 servers, running the same five commands, and praying they didn't typo on server 27. It's slow, error-prone, and completely unauditable. Scale that to hundreds of EC2 instances or GCP VMs and the problem stops being annoying — it becomes a business risk. That's the exact world Ansible was built to fix.
Ansible solves the configuration drift problem. Configuration drift is what happens when servers that were identical at launch slowly become subtly different because one was patched manually, another got a dependency installed ad-hoc, and a third had a config file tweaked at 2am during an incident. Ansible enforces a declared, version-controlled state across every machine in your fleet simultaneously, making your infrastructure reproducible and auditable from a single YAML file checked into Git.
By the end of this article you'll understand not just how to write Ansible playbooks, but why they're structured the way they are. You'll know how inventory files map to real cloud environments, how roles let you package reusable automation, and how handlers let you respond intelligently to changes — like restarting Nginx only when its config actually changed. You'll walk away ready to automate a real multi-server cloud deployment from scratch.
How Ansible Connects to Your Cloud Servers — Inventory Files Demystified
Before Ansible can do anything, it needs to know what to automate. That's the job of the inventory file. Think of it as your contacts list — it maps human-readable group names like 'webservers' or 'databases' to actual IP addresses or DNS hostnames.
In cloud environments, hard-coding IPs is a bad idea. Instances get recycled, autoscaling groups spawn new ones, and elastic IPs get reassigned. That's why Ansible supports dynamic inventory — scripts or plugins that query AWS, GCP, or Azure APIs at runtime to fetch the current list of live instances. For now we'll start with static inventory because it teaches the mental model, and then we'll show the dynamic plugin pattern.
Groups matter because they let you target subsets of your fleet precisely. You might run a security-patching playbook against only the 'production' group, or a database migration only against 'db_primary'. The colon-bracket syntax for group inheritance lets you build a clean hierarchy: 'eu_west' can be a child of 'production' which is a child of 'all'. Ansible resolves that tree at runtime — you don't have to repeat hostnames.
Variables declared in inventory travel with the host or group they're attached to, which is cleaner than duplicating them inside every playbook. Host variables override group variables, and group variables override the global 'all' group — this precedence order saves you from a lot of conditional logic inside playbooks.
# cloud_inventory.ini # Static inventory for a three-tier cloud application # In production you'd replace this with a dynamic inventory plugin # (e.g. amazon.aws.aws_ec2) that queries the AWS API directly [load_balancers] # Format: alias ansible_host=<IP> ansible_user=<SSH user> nginx-lb-01 ansible_host=54.210.100.11 ansible_user=ubuntu [web_servers] # Three app servers behind the load balancer app-server-01 ansible_host=10.0.1.10 ansible_user=ubuntu app-server-02 ansible_host=10.0.1.11 ansible_user=ubuntu app-server-03 ansible_host=10.0.1.12 ansible_user=ubuntu [database_servers] # Primary and replica — we'll target them separately in playbooks db-primary-01 ansible_host=10.0.2.10 ansible_user=ubuntu db-replica-01 ansible_host=10.0.2.11 ansible_user=ubuntu # Group of groups — 'production' contains all three tiers [production:children] load_balancers web_servers database_servers # Group-level variables applied to every host in web_servers [web_servers:vars] # These become available as variables inside any play targeting web_servers app_port=8080 deploy_user=apprunner max_connections=1000 # Host-level variable overrides group-level variables # app-server-03 is a canary so it runs on a different port [web_servers] app-server-03 ansible_host=10.0.1.12 ansible_user=ubuntu app_port=8090
# Ansible parses the file and outputs a JSON representation:
{
"_meta": {
"hostvars": {
"app-server-01": { "ansible_host": "10.0.1.10", "ansible_user": "ubuntu", "app_port": "8080", "deploy_user": "apprunner" },
"app-server-03": { "ansible_host": "10.0.1.12", "ansible_user": "ubuntu", "app_port": "8090" },
"db-primary-01": { "ansible_host": "10.0.2.10", "ansible_user": "ubuntu" }
}
},
"production": {
"children": ["load_balancers", "web_servers", "database_servers"]
},
"web_servers": {
"hosts": ["app-server-01", "app-server-02", "app-server-03"]
}
}
Writing Playbooks That Actually Reflect Production Reality
A playbook is where your intent lives. It answers three questions: which servers (hosts), as which user (become/remote_user), and what should be true (tasks). The word 'should be true' is important — Ansible tasks are declarative. You're not writing a script that says 'install Nginx'. You're asserting 'Nginx must be installed and at version 1.24'. Ansible figures out whether it needs to do anything.
This idempotency is the killer feature. Run the same playbook ten times — the first run installs and configures everything. Runs two through ten do nothing because the desired state already exists. This means you can safely schedule playbooks as a drift-correction cron job without fear of them breaking things.
Handlers are Ansible's 'only react to real changes' mechanism. Instead of always restarting Nginx after a config task, you notify a handler. The handler only fires if that task reported a change. If the config file was already correct, the handler never runs — no unnecessary service restart, no downtime spike.
Variables from the inventory we defined in section one flow directly into playbook tasks through Jinja2 template syntax — double curly braces. This is how one playbook serves multiple environments: 'app_port' on staging might be 8080 and on production 443. The playbook doesn't change. The inventory does.
--- # deploy_web_app.yml # Deploys and configures our Node.js app behind Nginx on all web_servers # Run with: ansible-playbook -i cloud_inventory.ini deploy_web_app.yml - name: Configure Nginx reverse proxy and deploy application hosts: web_servers # Matches the group name from our inventory file become: true # Escalate to sudo — most package installs require it gather_facts: true # Collect OS info (distro, arch) — used in conditionals below # Variables defined here override group vars but lose to host vars vars: node_version: "20" # LTS version of Node.js to install app_directory: "/opt/webapp" nginx_config_path: "/etc/nginx/sites-available/webapp.conf" # Handlers only execute when notified AND only once per play, even if notified 10 times handlers: - name: Restart Nginx ansible.builtin.service: name: nginx state: restarted # 'restarted' always restarts; use 'reloaded' for config-only changes - name: Restart application service ansible.builtin.service: name: webapp state: restarted tasks: # TASK 1: Ensure Nginx is installed — idempotent, does nothing if already present - name: Install Nginx web server ansible.builtin.apt: name: nginx state: present # 'present' = install if missing; 'latest' = always upgrade update_cache: true # Equivalent to running apt-get update first # TASK 2: Deploy our Nginx config using a Jinja2 template # If the rendered template differs from what's on disk, this reports 'changed' - name: Deploy Nginx reverse proxy configuration ansible.builtin.template: src: templates/nginx_webapp.conf.j2 # Local template file on your Ansible control node dest: "{{ nginx_config_path }}" owner: root group: root mode: '0644' notify: Restart Nginx # Only restarts Nginx if this task actually changed the file # TASK 3: Create app directory — 'state: directory' is idempotent - name: Ensure application directory exists ansible.builtin.file: path: "{{ app_directory }}" state: directory owner: "{{ deploy_user }}" # 'deploy_user' comes from inventory group vars mode: '0755' # TASK 4: Conditional — only run on Debian/Ubuntu hosts # ansible_facts['os_family'] is populated by gather_facts: true above - name: Install Node.js on Debian-based hosts ansible.builtin.shell: | curl -fsSL https://deb.nodesource.com/setup_{{ node_version }}.x | bash - apt-get install -y nodejs when: ansible_facts['os_family'] == 'Debian' args: creates: /usr/bin/node # Skip this task entirely if /usr/bin/node already exists # TASK 5: Ensure the webapp systemd service is enabled and running - name: Enable and start webapp service ansible.builtin.service: name: webapp state: started enabled: true # Ensures it survives a server reboot notify: Restart application service
TASK [Gathering Facts] *********************************************************
ok: [app-server-01]
ok: [app-server-02]
ok: [app-server-03]
TASK [Install Nginx web server] ************************************************
ok: [app-server-01] <- already installed, no change
changed: [app-server-02] <- was missing, just installed
ok: [app-server-03]
TASK [Deploy Nginx reverse proxy configuration] ********************************
changed: [app-server-01] <- config file differed, updated
changed: [app-server-02]
ok: [app-server-03] <- config was already correct, handler won't fire
RUNNING HANDLER [Restart Nginx] ************************************************
changed: [app-server-01] <- handler fired because task above reported changed
changed: [app-server-02] <- same
# app-server-03 did NOT restart Nginx — zero unnecessary downtime
PLAY RECAP *********************************************************************
app-server-01 : ok=5 changed=2 unreachable=0 failed=0
app-server-02 : ok=5 changed=3 unreachable=0 failed=0
app-server-03 : ok=4 changed=0 unreachable=0 failed=0
Roles — The Way You Make Ansible Scale Across a Real Organisation
Once your playbook file hits 200 lines, you feel the pain. Tasks for installing Python, configuring PostgreSQL, hardening SSH, and deploying your app are all tangled together. Copy-pasting chunks of it into a second playbook for a different project means you now have two things to maintain. This is exactly the problem Ansible roles solve.
A role is a standardised directory structure that packages tasks, variables, templates, handlers and files into a self-contained unit. You write a 'nodejs_app' role once and drop it into any playbook with two lines. The role can even declare its own dependencies on other roles — install the 'common_security' role before the 'nodejs_app' role, every time, automatically.
Roles also enforce a convention that makes onboarding new team members faster. Anyone who knows Ansible knows exactly where to look: tasks are in tasks/main.yml, default variable values are in defaults/main.yml (overridable), and static files to copy are in files/. There's no scavenger hunt.
For cloud teams, the practical pattern is to host roles in a private Git repository or Ansible Galaxy, version-tag them, and pull specific versions in a requirements.yml file. This gives you the same dependency pinning discipline you'd have in a package.json or requirements.txt.
# Directory structure of the nginx_proxy role: # roles/ # └── nginx_proxy/ # ├── tasks/ # │ └── main.yml <- you are here # ├── handlers/ # │ └── main.yml <- handlers scoped to this role # ├── templates/ # │ └── nginx.conf.j2 <- Jinja2 config template # ├── defaults/ # │ └── main.yml <- lowest-priority variables (safe to override) # ├── vars/ # │ └── main.yml <- higher-priority vars (internal role constants) # └── meta/ # └── main.yml <- role metadata and dependencies # roles/nginx_proxy/defaults/main.yml — overridable defaults # nginx_worker_processes: auto # nginx_client_max_body_size: "10m" # backend_port: "{{ app_port }}" <- inherits from inventory group vars --- # roles/nginx_proxy/tasks/main.yml - name: Install Nginx ansible.builtin.apt: name: "nginx" state: present update_cache: true - name: Remove default Nginx site to avoid port conflicts ansible.builtin.file: path: /etc/nginx/sites-enabled/default state: absent # Deletes the file if it exists — idempotent notify: Reload Nginx # Reload is gentler than restart — no dropped connections - name: Deploy reverse proxy config from template ansible.builtin.template: src: nginx.conf.j2 # Ansible looks in roles/nginx_proxy/templates/ automatically dest: /etc/nginx/sites-available/webapp validate: 'nginx -t -c %s' # Runs nginx syntax check BEFORE writing the file to disk notify: Reload Nginx - name: Enable webapp site by symlinking into sites-enabled ansible.builtin.file: src: /etc/nginx/sites-available/webapp dest: /etc/nginx/sites-enabled/webapp state: link # Creates a symlink — Nginx's standard activation mechanism notify: Reload Nginx - name: Ensure Nginx is started and enabled on boot ansible.builtin.service: name: nginx state: started enabled: true --- # roles/nginx_proxy/handlers/main.yml - name: Reload Nginx ansible.builtin.service: name: nginx state: reloaded # Graceful config reload — no dropped connections --- # site_playbook.yml — the top-level playbook that consumes roles # This is now just 12 lines instead of 150 - name: Deploy full application stack to production web servers hosts: web_servers become: true roles: - role: nginx_proxy vars: nginx_client_max_body_size: "25m" # Override the default for this specific play - role: nodejs_app # A second role installed from requirements.yml
TASK [nginx_proxy : Install Nginx] *********************************************
ok: [app-server-01]
ok: [app-server-02]
ok: [app-server-03]
TASK [nginx_proxy : Remove default Nginx site to avoid port conflicts] *********
changed: [app-server-01]
changed: [app-server-02]
ok: [app-server-03]
TASK [nginx_proxy : Deploy reverse proxy config from template] *****************
changed: [app-server-01]
ok: [app-server-02]
ok: [app-server-03]
TASK [nginx_proxy : Enable webapp site by symlinking into sites-enabled] *******
ok: [app-server-01]
ok: [app-server-02]
ok: [app-server-03]
RUNNING HANDLER [nginx_proxy : Reload Nginx] ***********************************
changed: [app-server-01] <- only servers with config changes got reloaded
changed: [app-server-02]
PLAY RECAP *********************************************************************
app-server-01 : ok=5 changed=3 unreachable=0 failed=0
app-server-02 : ok=5 changed=2 unreachable=0 failed=0
app-server-03 : ok=4 changed=0 unreachable=0 failed=0
| Aspect | Ansible (Agentless) | Puppet / Chef (Agent-based) |
|---|---|---|
| Setup complexity | Low — only SSH and Python needed on targets | High — agent must be installed and registered on every node |
| Communication model | Push — control node initiates SSH connection | Pull — agents periodically check in with a master server |
| Language | YAML playbooks, Jinja2 templates | Puppet DSL / Ruby DSL respectively |
| Execution model | Runs tasks sequentially per host by default | Agents run on their own schedule, eventual consistency |
| Cloud autoscaling fit | Excellent — new instances need only SSH access | Requires agent bootstrap step during instance launch |
| Idempotency | Built in when using proper modules | Built in by design across both tools |
| Windows support | Via WinRM — functional but less mature | Strong native Windows support in both tools |
| Best for | Cloud provisioning, ad-hoc ops, CI/CD pipelines | Large-scale configuration management with long-lived servers |
🎯 Key Takeaways
- Ansible's agentless architecture means your only prerequisite on cloud target hosts is SSH access and Python — this makes it a natural fit for ephemeral cloud instances that come and go via autoscaling.
- Idempotency is not automatic — it depends on using proper Ansible modules instead of raw shell commands. A playbook built on 'apt', 'template', and 'service' modules is safe to run repeatedly; one built on 'shell' commands is not.
- Handlers are Ansible's built-in 'only act on real change' mechanism — they prevent unnecessary service restarts and are deduplicated automatically within a play, so notifying the same handler ten times still only restarts the service once.
- Roles are the unit of reuse in Ansible — the split between 'defaults/' (overridable by callers) and 'vars/' (internal constants) is the key design decision that determines how flexible or locked-down your role is for other teams.
⚠ Common Mistakes to Avoid
- ✕Mistake 1: Using the 'shell' or 'command' module when a dedicated module exists — Symptom: playbook is not idempotent, shell tasks always report 'changed' even when nothing changed — Fix: replace 'shell: apt-get install nginx' with 'ansible.builtin.apt: name=nginx state=present'. The apt module checks current state before acting; the shell module just executes blindly every time. Only fall back to 'shell' when no module covers your use case, and always pair it with 'args: creates:' or 'changed_when: false' to restore idempotency.
- ✕Mistake 2: Storing secrets (passwords, API keys) in plaintext inventory or variable files committed to Git — Symptom: credentials exposed in source control, security audit failure — Fix: use Ansible Vault. Run 'ansible-vault encrypt_string "mysecretpassword" --name db_password' and paste the encrypted block directly into your vars file. Commit the encrypted value safely, and pass the vault password via --vault-password-file or a CI/CD secret at runtime. Never store a plaintext vault password file in the repo.
- ✕Mistake 3: Targeting 'hosts: all' in a playbook without an explicit inventory file flag — Symptom: playbook accidentally runs against production when you intended staging, or fails with 'Could not match supplied host pattern' — Fix: always run playbooks with an explicit '-i' flag ('ansible-playbook -i staging_inventory.ini site.yml') and add a play-level assertion at the top of critical playbooks: 'assert: that: inventory_hostname in groups["staging"]' to hard-fail if the wrong inventory was passed.
Interview Questions on This Topic
- QWhat is idempotency in the context of Ansible, and how does it affect which modules you choose for a task?
- QWalk me through the variable precedence order in Ansible — specifically, what wins between a role default, a group variable in inventory, and an extra-var passed on the command line?
- QA handler is notified by three different tasks in a single play, but you only want it to run once at the end. Does Ansible already handle this, and can you explain how — and what happens if you use 'listen' on a handler instead of a direct name reference?
Frequently Asked Questions
Do I need to install anything on the servers Ansible manages?
For Linux targets, you only need SSH enabled and Python installed — Python 3.x is present by default on virtually all modern cloud images (Ubuntu, Amazon Linux, RHEL). Ansible connects over SSH from your control node and transfers small Python scripts to execute tasks. No agent, no daemon, nothing listening on a port.
What is the difference between an Ansible playbook and an Ansible role?
A playbook is a complete, runnable automation file that ties together hosts, variables, and tasks for a specific job — like 'deploy the web app to production'. A role is a reusable, packaged unit of automation (tasks, templates, handlers, defaults) with no host targeting of its own. Playbooks consume roles. Think of a role as a library function and a playbook as the main program that calls it.
How does Ansible handle secrets like database passwords or API keys?
Ansible Vault is the built-in solution. You can encrypt an entire variable file ('ansible-vault encrypt secrets.yml') or inline individual values using 'ansible-vault encrypt_string'. The encrypted ciphertext is safe to commit to Git. At runtime you provide the vault password via a prompt, a file, or an environment variable. For CI/CD pipelines, store the vault password as a protected CI secret and pass it with '--vault-password-file'.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.