Ansible AWS Automation: Production Patterns & Gotchas with amazon.aws Collection
Master Ansible AWS automation with the amazon.aws collection.
20+ years shipping production infrastructure and CI/CD at scale. Drawn from code that ran under real load.
Use the amazon.aws collection (>=5.0.0) for core AWS modules; many modules that used to live in community.aws, including ec2_instance, have been migrated to it.
Always set state: present and required parameters for idempotency; cloud modules are stateful.
For EC2 instances, use ec2_instance with exact_count and instance_role for production.
S3 bucket creation is idempotent but requires permission: private to avoid public access defaults.
IAM role creation is idempotent but assume role policy document must be exact JSON string.
RDS instances: set skip_final_snapshot: true only for testing; production must handle snapshots.
VPC subnets: use ec2_vpc_net and ec2_vpc_subnet with tags for idempotent lookup.
Dynamic inventory with aws_ec2 plugin: use cache: yes and cache_plugin: jsonfile to avoid API rate limits.
Handle eventual consistency with retries and delay on ec2_instance_info after creation.
Store secrets in SSM Parameter Store with aws_ssm_parameter and no_log: true in playbooks.
Think of Ansible for AWS like a smart remote control for your cloud infrastructure. Instead of clicking buttons in the AWS console, you write a recipe (playbook) that says 'I want exactly 3 servers of this type, with these security settings, and this S3 bucket for logs.' Ansible talks to AWS APIs to make it happen, and if you run the recipe again, it checks what's already there and only changes what's needed — that's idempotency. But AWS is a distributed system, so sometimes when you create a server, it takes a moment for the list of servers to update. Ansible has a 'wait and retry' feature to handle that. And for secrets like database passwords, you store them in AWS SSM Parameter Store, a secure vault, and Ansible pulls them at runtime without exposing them in your code.
I still remember the 3 AM wake-up call. Our production deployment had been running smoothly for months, but that night, a seemingly innocuous change to an Ansible playbook caused a 45-minute outage. The root cause? I had used the deprecated ec2 module instead of ec2_instance, and the module didn't handle idempotency correctly — it terminated all existing instances and created new ones, thinking they were 'extra'. That incident taught me the hard way that Ansible AWS automation requires deep understanding of module behavior, API consistency, and cloud state management.
Historically, Ansible's AWS support started with basic modules like ec2 and s3, which were monolithic and often inconsistent. The community developed workarounds, but the real game-changer was the amazon.aws collection (introduced in Ansible 2.9, now the standard). This collection provides focused, idempotent modules like ec2_instance, s3_bucket, iam_role, and rds_instance, designed to work with the AWS API's eventual consistency model.
In this article, I'll share production patterns I've developed over years of managing thousands of AWS resources with Ansible. We'll cover the essential modules from the amazon.aws collection, dynamic inventory with the aws_ec2 plugin, handling idempotency and eventual consistency, and securing secrets with AWS SSM Parameter Store. Every code example is battle-tested in production environments.
By the end, you'll have a practical playbook (pun intended) for building robust, scalable AWS automation with Ansible that won't cause 3 AM phone calls.
Setting Up the amazon.aws Collection for Production
The first step is ensuring you have the correct collection version. Many modules that once lived in community.aws (including ec2_instance) have been migrated to amazon.aws, so reference them by their amazon.aws names. Install it with:
```bash
ansible-galaxy collection install amazon.aws:==5.0.0
```
Pin the version in requirements.yml:
```yaml
---
collections:
  - name: amazon.aws
    version: '>=5.0.0,<6.0.0'
```
Then run ansible-galaxy collection install -r requirements.yml. The collection requires boto3 and botocore >= 1.21.0. On the control node:
```bash
pip install 'boto3>=1.21.0' 'botocore>=1.24.0'
```
For authentication, use IAM instance profiles on EC2 or environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY). In production, avoid hardcoding secrets; use Ansible Vault or SSM Parameter Store (covered later).
A typical group_vars/all.yml:
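```yaml
---
ansible_aws_region: us-east-1
ansible_aws_retry_max_attempts: 10
ansible_aws_retry_delay: 5
```
These are ordinary user-defined variables, not settings the collection reads on its own, so tasks must reference them explicitly (for example, retries: "{{ ansible_aws_retry_max_attempts }}" and delay: "{{ ansible_aws_retry_delay }}"). The default retries are often insufficient for eventual consistency.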
Production insight: We once had tasks calling community.aws.ec2_instance, which was actually a redirect to amazon.aws.ec2_instance. After upgrading the collection, the redirect broke and tasks failed with 'module not found'. We fixed it by explicitly using amazon.aws.ec2_instance and removing community.aws from requirements.
Key takeaway: Keep the amazon.aws collection pinned to a major version, and ensure boto3/botocore are up to date on the control node.
Managing EC2 Instances with ec2_instance Module
The ec2_instance module is the modern way to manage EC2 instances. It supports exact_count for idempotent instance management. Here's a production playbook snippet:
```yaml
- name: Launch web servers
  amazon.aws.ec2_instance:
    # One shared Name tag. Looping over names together with exact_count
    # would ask for 3 instances per loop item instead of 3 in total.
    name: web
    instance_type: t3.medium
    image_id: ami-0abcdef1234567890
    key_name: my-key
    security_group: "sg-xxxx"
    vpc_subnet_id: "subnet-xxxx"
    exact_count: 3
    instance_role: my-instance-profile
    network:
      assign_public_ip: true
    tags:
      Environment: production
      Role: web
    wait: yes
    wait_timeout: 600
```
- exact_count: Ensures exactly that many matching instances exist. If fewer, it creates; if more, it terminates extras.
- instance_role: Attaches an IAM instance profile, given here by profile name.
- wait: yes and wait_timeout: Crucial for production; the task waits for the instance to reach the running state.
- network: Allows specifying network interfaces. For multiple ENIs, use network_interfaces.
Gotcha: The exact_count parameter works by filtering instances based on name tag and other filters you provide. If you don't set name, it may count unrelated instances. Always set name and tags to scope the count.
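To make the scoping gotcha concrete, here is a minimal Python sketch of count reconciliation. It is not the module's actual implementation; the Instance record, the matches helper, and plan_exact_count are hypothetical names used only to illustrate why an unscoped count can sweep up unrelated instances.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Minimal stand-in for an EC2 instance record (hypothetical shape)."""
    instance_id: str
    state: str
    tags: dict = field(default_factory=dict)


def matches(instance, required_tags):
    """True if the instance is running and carries every required tag."""
    return instance.state == "running" and all(
        instance.tags.get(key) == value for key, value in required_tags.items()
    )


def plan_exact_count(instances, required_tags, exact_count):
    """Return (number_to_launch, ids_to_terminate) for the scoped instances."""
    scoped = [i for i in instances if matches(i, required_tags)]
    if len(scoped) < exact_count:
        return exact_count - len(scoped), []
    surplus = scoped[exact_count:]
    return 0, [i.instance_id for i in surplus]


fleet = [
    Instance("i-001", "running", {"Name": "web", "Role": "web"}),
    Instance("i-002", "running", {"Name": "web", "Role": "web"}),
    Instance("i-003", "running", {"Name": "batch", "Role": "worker"}),
]

# Scoped by Name: two matching instances, so one more is needed.
print(plan_exact_count(fleet, {"Name": "web"}, 3))
# No scoping at all: the unrelated batch instance is counted as surplus.
print(plan_exact_count(fleet, {}, 2))
```

With an empty filter, the batch worker lands in the termination list even though it has nothing to do with the web tier, which is the failure mode the Name tag and tags are meant to prevent.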
For updating instances (e.g., change instance type), use state: running and modify parameters. However, not all attributes are updatable in place; some require replacement. Use instance_ids to target specific instances for operations like stop/start.
Tip: Use exact_count when you want a fixed number of instances. Without it, the module will create a new instance every run, leading to drift and cost overruns.
Production insight: We once ran a launch task without exact_count. After a few runs, we had 50 instances instead of 3. Adding exact_count: 3 with proper name and tags filters immediately terminated the extras and prevented future drift.
Key takeaway: Use ec2_instance with exact_count, name, and tags for idempotent EC2 management. Always set wait: yes.
Creating S3 Buckets and Objects with Idempotency
The s3_bucket and s3_object modules manage S3 resources. s3_bucket is idempotent by default: if the bucket exists and is owned by you, it reports ok. If it exists but is owned by another account, it fails with BucketAlreadyExists. For production:
```yaml
- name: Create application bucket
  amazon.aws.s3_bucket:
    name: my-app-bucket
    state: present
    region: "{{ ansible_aws_region }}"
    permission: private
    versioning: yes
    tags:
      Environment: production
```
- permission: Default is private. Avoid public-read or public-read-write without explicit need.
- versioning: Enable for critical data.
For s3_object:
```yaml
- name: Upload configuration file
  amazon.aws.s3_object:
    bucket: my-app-bucket
    object: /config/app.conf
    src: /local/path/app.conf
    mode: put
    permission: bucket-owner-full-control
```
Gotcha: The s3_bucket module does not manage bucket policies. Use a separate task with aws_s3_bucket_policy or iam_policy.
Idempotency for objects: The s3_object module with mode: put will upload the file every time unless you use force: false (default). To avoid unnecessary uploads, use overwrite: different (new in amazon.aws 5.0.0) which compares MD5 checksums:
```yaml
- name: Upload config only if changed
  amazon.aws.s3_object:
    bucket: my-app-bucket
    object: /config/app.conf
    src: /local/path/app.conf
    mode: put
    overwrite: different
```
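The checksum comparison behind an "upload only if different" decision can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the module's code: needs_upload and local_md5 are hypothetical helpers, and the sketch relies on the fact that S3 ETags equal the MD5 digest only for objects uploaded in a single part without KMS encryption.

```python
import hashlib


def local_md5(data: bytes) -> str:
    """Hex MD5 digest of the local file contents."""
    return hashlib.md5(data).hexdigest()


def needs_upload(local_data: bytes, remote_etag):
    """Decide whether to upload, given the remote object's ETag (or None)."""
    if remote_etag is None:
        return True  # the object does not exist yet
    etag = remote_etag.strip('"')  # S3 returns the ETag wrapped in quotes
    if "-" in etag:
        # Multipart uploads produce ETags that are not a plain MD5 digest,
        # so this sketch conservatively re-uploads.
        return True
    return etag != local_md5(local_data)


config = b"port=8080\n"
stored_etag = '"' + hashlib.md5(config).hexdigest() + '"'

print(needs_upload(config, stored_etag))          # unchanged content
print(needs_upload(b"port=9090\n", stored_etag))  # edited content
```

Treating multipart ETags as "always different" trades a few redundant uploads for never skipping a real change.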
Tip: Use a naming convention such as company-app-environment-region to avoid collisions.
Production insight: Include the account ID in the bucket name (e.g., myapp-{{ aws_account_id }}-bucket) to guarantee uniqueness.
Key takeaway: Use s3_bucket with state: present and permission: private. For objects, use overwrite: different to avoid unnecessary uploads.
Creating IAM Roles and Instance Profiles
IAM role management is critical for security. The iam_role module creates roles and attaches policies. Production example:
```yaml
- name: Create EC2 service role
  amazon.aws.iam_role:
    name: ec2-service-role
    assume_role_policy_document: "{{ lookup('file', 'assume-role-policy.json') }}"
    managed_policies:
      - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
      - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
    state: present
    create_instance_profile: yes
```
assume_role_policy_document must be a valid JSON string. Use lookup('file', ...) to load from file. Example assume-role-policy.json:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
Gotcha: The iam_role module is idempotent only if the assume_role_policy_document is exactly the same. If you change the file, the module updates the role. However, managed_policies are additive — if you remove a policy from the list, the module does NOT detach it. To manage policies precisely, use iam_policy module or set managed_policies: [] and manage separately.
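One way to check a policy file before handing it to the IAM task is to parse it and compare a canonical form, so that whitespace or key order alone never counts as a change. The sketch below is a hypothetical pre-flight helper (canonical_policy and policy_changed are illustrative names), not what iam_role does internally.

```python
import json


def canonical_policy(text: str) -> str:
    """Parse a policy document and re-serialize it deterministically."""
    document = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    return json.dumps(document, sort_keys=True, separators=(",", ":"))


def policy_changed(desired_text: str, current_text: str) -> bool:
    """True only when the two documents differ in content, not formatting."""
    return canonical_policy(desired_text) != canonical_policy(current_text)


desired = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Principal": {"Service": "ec2.amazonaws.com"},
     "Action": "sts:AssumeRole"}
  ]
}
"""
# Same content with different key order and whitespace.
current = (
    '{"Statement":[{"Action":"sts:AssumeRole","Effect":"Allow",'
    '"Principal":{"Service":"ec2.amazonaws.com"}}],"Version":"2012-10-17"}'
)

print(policy_changed(desired, current))
```

Running a check like this in CI catches malformed JSON early and makes it obvious when a template injects content that differs on every run.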
For instance profiles, create_instance_profile: yes creates a profile with the same name as the role. To attach the profile to an EC2 instance, use ec2_instance with instance_role parameter.
Tip: assume_role_policy_document must be a JSON string, not a YAML dict. Use lookup('file', ...) or lookup('template', ...) to ensure proper formatting.
Production insight: Add a validation task such as - name: Validate JSON | set_fact: policy_json={{ lookup('file', 'policy.json') | from_json }} before the IAM task.
Key takeaway: Use iam_role with assume_role_policy_document from a file, and create_instance_profile: yes. Validate JSON before applying.
Provisioning RDS Instances with rds_instance
The rds_instance module manages RDS databases. Production example:
```yaml
- name: Create PostgreSQL RDS instance
  amazon.aws.rds_instance:
    db_instance_identifier: mydb
    engine: postgres
    engine_version: "14.6"
    db_instance_class: db.t3.medium
    allocated_storage: 100
    storage_type: gp3
    master_username: "{{ db_master_username }}"
    master_user_password: "{{ db_master_password }}"
    vpc_security_group_ids:
      - sg-xxxx
    db_subnet_group_name: my-db-subnet-group
    publicly_accessible: no
    storage_encrypted: yes
    backup_retention_period: 7
    skip_final_snapshot: no
    final_snapshot_identifier: mydb-final-{{ ansible_date_time.epoch }}
    wait: yes
    wait_timeout: 1200
    state: present
```
- skip_final_snapshot: In production, set to no and provide final_snapshot_identifier to avoid data loss on deletion.
- wait_timeout: RDS creation can take 10-20 minutes. Set to 1200 seconds (20 minutes).
- master_user_password: Use Ansible Vault or SSM Parameter Store (see section on secrets).
Gotcha: The module does not support changing master_username after creation. If you need to change, you must delete and recreate.
Idempotency: The module checks for an existing instance with the same db_instance_identifier. If found, it compares parameters and updates if necessary. Not all parameters are updatable in place; some require replacement.
Tip: Never set skip_final_snapshot: yes in production. Always create a final snapshot before deletion. Use a unique identifier with timestamp to avoid conflicts.
Production insight: In one incident, state: absent was triggered accidentally. The final_snapshot_identifier allowed us to restore within minutes. Without it, we would have lost 2 TB of data.
Key takeaway: Set skip_final_snapshot: no with a unique final snapshot identifier. Use wait: yes and wait_timeout: 1200.
Building VPC Networks with ec2_vpc_net and Subnet Modules
VPC modules allow you to define network infrastructure as code. Production example:
```yaml
- name: Create VPC
  amazon.aws.ec2_vpc_net:
    name: my-vpc
    cidr_block: 10.0.0.0/16
    tags:
      Environment: production
    state: present
    region: "{{ ansible_aws_region }}"
  register: vpc_result  # later tasks read vpc_result.vpc.id

- name: Create public subnets
  amazon.aws.ec2_vpc_subnet:
    vpc_id: "{{ vpc_result.vpc.id }}"
    # idx is 0, 1, 2, giving 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24.
    # Using the zone letter in the CIDR would produce an invalid address.
    cidr: "10.0.{{ idx }}.0/24"
    az: "{{ ansible_aws_region }}{{ item }}"
    tags:
      Name: "public-{{ item }}"
      Tier: public
    state: present
  loop:
    - a
    - b
    - c
  loop_control:
    index_var: idx
  register: subnet_results
```
Idempotency: These modules use tags and cidr_block to identify existing resources. If you change the CIDR, it creates a new resource (the old one is not deleted). To delete, set state: absent.
Gotcha: The ec2_vpc_subnet module requires vpc_id or vpc_name. Using vpc_name is convenient but can be ambiguous if multiple VPCs have the same name. Prefer vpc_id.
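Subnet CIDRs are easy to get wrong when they are assembled by string interpolation. As a small planning aid, the Python standard library can carve a VPC range into equal subnets; plan_subnets is a hypothetical helper, and the region and zone letters are example values.

```python
import ipaddress


def plan_subnets(vpc_cidr, region, zone_letters, new_prefix=24):
    """Pair each availability zone with the next free block of the VPC range."""
    network = ipaddress.ip_network(vpc_cidr)
    blocks = network.subnets(new_prefix=new_prefix)  # generator of subnets
    return [
        {"az": f"{region}{letter}", "cidr": str(block)}
        for letter, block in zip(zone_letters, blocks)
    ]


for subnet in plan_subnets("10.0.0.0/16", "us-east-1", "abc"):
    print(subnet["az"], subnet["cidr"])
```

The resulting list can be pasted into a vars file and looped over, which keeps every CIDR valid and non-overlapping by construction.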
For internet gateway and route tables:
```yaml
- name: Create Internet Gateway
  amazon.aws.ec2_vpc_igw:
    vpc_id: "{{ vpc_result.vpc.id }}"
    tags:
      Name: my-igw
    state: present
  register: igw_result  # the route below reads igw_result.gateway_id

- name: Create public route table
  amazon.aws.ec2_vpc_route_table:
    vpc_id: "{{ vpc_result.vpc.id }}"
    tags:
      Name: public-rt
    subnets:
      - "{{ subnet_results.results[0].subnet.id }}"
      - "{{ subnet_results.results[1].subnet.id }}"
      - "{{ subnet_results.results[2].subnet.id }}"
    routes:
      - dest: 0.0.0.0/0
        gateway_id: "{{ igw_result.gateway_id }}"
    state: present
```
Production insight: Use ec2_vpc_route_table with subnets list to associate subnets. The module is idempotent: if routes and associations match, it does nothing.
Tip: Use vpc_id instead of vpc_name to avoid ambiguity. You can retrieve the VPC ID using ec2_vpc_net_info with filters.
Production insight: In one case, relying on vpc_name caused the playbook to modify the wrong VPC. We switched to using vpc_id obtained from ec2_vpc_net_info with region filter.
Key takeaway: Use vpc_id for idempotent operations. Tag all resources with Name and Environment for easy identification.
Using Dynamic Inventory with AWS EC2 Plugin
The aws_ec2 inventory plugin dynamically builds inventory from AWS EC2 instances. Configure it in inventory/aws_ec2.yml:
```yaml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - us-west-2
filters:
  tag:Environment: production
  instance-state-name: running
hostnames:
  - tag:Name
  - dns-name
  - private-dns-name
compose:
  ansible_host: public_ip_address
keyed_groups:
  - key: tags.Role
    prefix: role
  - key: placement.region
    prefix: aws_region
cache: yes
cache_plugin: jsonfile
# jsonfile needs a cache location unless one is already set in ansible.cfg;
# this path is only an example.
cache_connection: /tmp/aws_inventory_cache
cache_timeout: 3600
```
- cache: yes and cache_plugin: jsonfile: Avoids hitting the AWS API on every playbook run. Set cache_timeout to 3600 (1 hour) for production.
- filters: Use tags to limit scope. Avoid pulling all instances in an account.
- hostnames: Use tag:Name for human-readable hostnames. Fall back to DNS names.
- keyed_groups: Create groups based on tags or attributes.
Run the playbook with:
```bash
ansible-playbook -i inventory/aws_ec2.yml playbook.yml
```
Gotcha: The plugin caches inventory. If you create new instances, they won't appear until cache expires. Force refresh with --flush-cache.
Production pattern: Use cache: yes but set a low cache_timeout (e.g., 300) during deployments, then increase for normal operations.
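The trade-off between a long and a short cache_timeout can be illustrated with a toy model in Python. It is a simplification with hypothetical function names and ignores details of the real cache plugin; timestamps are plain seconds.

```python
import time


def cache_is_fresh(written_at, cache_timeout, now=None):
    """True while the cached inventory is younger than cache_timeout seconds."""
    current = time.time() if now is None else now
    return (current - written_at) < cache_timeout


def instance_visible(created_at, cache_written_at, cache_timeout, now):
    """An instance shows up if it predates the cache or the cache is rebuilt."""
    if cache_is_fresh(cache_written_at, cache_timeout, now):
        return created_at <= cache_written_at
    return True  # stale cache: inventory is rebuilt from the API


# Cache written at t=0, instance launched at t=120, playbook runs at t=600.
print(instance_visible(120, 0, 3600, 600))  # one-hour timeout
print(instance_visible(120, 0, 300, 600))   # five-minute timeout
```

With the one-hour timeout the new instance stays hidden until the cache expires or is flushed; with the five-minute timeout the same run already sees it.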
Tip: Run ansible-inventory -i inventory/aws_ec2.yml --list --flush-cache to refresh the cache on demand.
Production insight: Inside a playbook, add a task such as - name: Flush cache | meta: refresh_inventory.
Key takeaway: Use cache: yes to avoid API rate limits, but flush cache during deployments with --flush-cache or meta: refresh_inventory.
Ensuring Idempotency in Cloud Modules
Idempotency means running a playbook multiple times produces the same result. AWS modules in amazon.aws are designed to be idempotent, but there are pitfalls.
Pattern 1: Use state: present with unique identifiers. For example, ec2_instance with name and tags uniquely identifies instances. Without name, the module may create duplicates.
Pattern 2: Use exact_count for EC2. This ensures exactly N instances exist. Without it, each run creates a new instance.
Pattern 3: Use force: false (default) on s3_bucket to avoid recreating.
Pattern 4: For IAM roles, the assume_role_policy_document must match exactly. If you use a template that changes every run (e.g., with timestamps), the role will be updated every time. Avoid dynamic content in policy documents.
Pattern 5: Use register and when to skip tasks if resource already exists. For example:
```yaml
- name: Check if bucket exists
  amazon.aws.aws_s3_bucket_info:
    name: my-bucket
  register: bucket_info
  ignore_errors: yes

- name: Create bucket if not exists
  amazon.aws.s3_bucket:
    name: my-bucket
    state: present
  when: bucket_info is failed
```
This pattern is useful for modules that are not fully idempotent (e.g., some community modules).
Gotcha: Some modules like ec2_instance with exact_count can be slow because they query all instances matching filters. Use specific filters to limit scope.
Tip: Always set tags with a Name key.
Production insight: In one incident, adding tags: { Name: my-sg } fixed it.
Key takeaway: Always use state: present for idempotency. For EC2, use exact_count. Avoid dynamic content in policy documents.
Handling Eventual Consistency with Retries
AWS APIs are eventually consistent: after creating a resource, it may not be immediately visible in other APIs. This causes failures in subsequent tasks. Ansible's retries, delay, and until task keywords let a task poll until the resource appears.
Global retry settings: Define shared defaults in group_vars/all.yml and reference them from each task's retries and delay keywords:
```yaml
ansible_aws_retry_max_attempts: 10
ansible_aws_retry_delay: 5
```
Per-task retries: Set the retries, delay, and until keywords on the task itself (they sit at task level, not inside the module's arguments):
```yaml
- name: Wait for EC2 instance to be ready
  amazon.aws.ec2_instance_info:
    instance_ids:
      - i-xxxx
    region: "{{ ansible_aws_region }}"
  register: instance_info
  retries: 15
  delay: 10
  until: instance_info.instances[0].state.name == "running"
```
- After creating an RDS instance, use rds_instance_info with retries until db_instance_status is available.
- After creating a security group, use ec2_security_group_info with retries before using it in other tasks.
- After creating an IAM role, wait for it to propagate before attaching policies.
Gotcha: The until condition must be a boolean expression. Use | default('') to handle missing keys.
Production pattern: Use a custom retry wrapper:
```yaml
- name: Retry until resource is found
  amazon.aws.ec2_vpc_net_info:
    filters:
      "tag:Name": my-vpc
  register: vpc_info
  retries: 10
  delay: 5
  until: vpc_info.vpcs | length > 0
```
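For readers who want to see the poll-until-visible idea outside of YAML, here is a rough Python equivalent. It is a simplified sketch, not Ansible's executor: retry_until, its max_attempts parameter, and the fake lookup are all hypothetical, and the sleep function is injectable so the example runs instantly.

```python
import time


def retry_until(fetch, condition, max_attempts=10, delay=5, sleep=time.sleep):
    """Call fetch() until condition(result) holds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if condition(result):
            return result, attempt
        if attempt < max_attempts:
            sleep(delay)  # give the API time to become consistent
    raise TimeoutError(f"condition not met after {max_attempts} attempts")


# Fake lookup: the VPC only becomes visible on the third call.
responses = iter([{"vpcs": []}, {"vpcs": []}, {"vpcs": [{"id": "vpc-123"}]}])
sleeps = []

result, attempts = retry_until(
    fetch=lambda: next(responses),
    condition=lambda r: len(r["vpcs"]) > 0,
    max_attempts=10,
    delay=5,
    sleep=sleeps.append,  # record delays instead of actually sleeping
)
print(result, attempts, sleeps)
```

The condition checks for existence rather than a specific attribute, which mirrors the vpcs | length > 0 test and avoids indexing into an empty list.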
Tip: Use retries and until with a condition that checks for existence (e.g., vpcs | length > 0).
Production insight: In one case, adding retries: 10, delay: 5 on the VPC info check before subnet creation fixed it.
Key takeaway: Use until with existence checks.
Storing Secrets with AWS SSM Parameter Store
Never hardcode secrets in playbooks. Use AWS SSM Parameter Store with the aws_ssm_parameter module to manage parameters, and lookup to retrieve them securely.
Storing a secret:
```yaml
- name: Store database password in SSM
  amazon.aws.aws_ssm_parameter:
    name: /myapp/dbpassword
    value: "{{ db_password }}"
    type: SecureString
    overwrite: yes
    region: "{{ ansible_aws_region }}"
  no_log: true
```
no_log: true prevents the value from being logged.
Retrieving a secret in a playbook:
```yaml
- name: Get database password from SSM
  set_fact:
    db_password: "{{ lookup('aws_ssm_parameter', '/myapp/dbpassword', decrypt=True, region=ansible_aws_region) }}"
  no_log: true
```
Then use {{ db_password }} in subsequent tasks.
Gotcha: The lookup plugin returns the value as a string. If the parameter is SecureString, you must set decrypt=True and have permission to decrypt.
Production pattern: Use environment-specific paths:
```yaml
- name: Get environment-specific secret
  set_fact:
    db_password: "{{ lookup('aws_ssm_parameter', '/myapp/' + env + '/dbpassword', decrypt=True, region=ansible_aws_region) }}"
  no_log: true  # keep the decrypted value out of logs
```
Permissions: The IAM role executing Ansible must have ssm:GetParameter and kms:Decrypt (if using KMS) permissions.
Alternative: Use Ansible Vault, but SSM is better for centralized secret management across teams.
Tip: Use hierarchical paths such as /app/env/parameter for organization. Ensure the IAM role has ssm:GetParametersByPath permission to list parameters.
Production insight: We once forgot no_log: true on a set_fact task. The password appeared in CI logs. We added a pre-commit hook to check for no_log: true on tasks that use aws_ssm_parameter lookup.
Key takeaway: Use aws_ssm_parameter to store secrets and lookup to retrieve them. Always use no_log: true on tasks handling secrets.
Advanced: Combining Modules for Multi-Tier Deployments
Production applications often require multiple AWS resources. Here's a playbook that creates a VPC, subnets, security groups, RDS, and EC2 instances in a coordinated way.
```yaml
---
- name: Provision multi-tier application
  hosts: localhost
  connection: local
  gather_facts: no
  # db_user and db_password are expected to come from Vault or SSM.
  vars:
    vpc_cidr: 10.0.0.0/16
    public_subnets:
      - cidr: 10.0.1.0/24
        az: "{{ ansible_aws_region }}a"
      - cidr: 10.0.2.0/24
        az: "{{ ansible_aws_region }}b"
    private_subnets:
      - cidr: 10.0.10.0/24
        az: "{{ ansible_aws_region }}a"
      - cidr: 10.0.11.0/24
        az: "{{ ansible_aws_region }}b"
  tasks:
    - name: Create VPC
      amazon.aws.ec2_vpc_net:
        name: myapp-vpc
        cidr_block: "{{ vpc_cidr }}"
        tags:
          Environment: production
        state: present
      register: vpc

    - name: Create subnets
      amazon.aws.ec2_vpc_subnet:
        vpc_id: "{{ vpc.vpc.id }}"
        cidr: "{{ item.cidr }}"
        az: "{{ item.az }}"
        tags:
          Name: "{{ item.name }}"
        state: present
      loop:
        - { cidr: "{{ public_subnets[0].cidr }}", az: "{{ public_subnets[0].az }}", name: "public-a" }
        - { cidr: "{{ public_subnets[1].cidr }}", az: "{{ public_subnets[1].az }}", name: "public-b" }
        - { cidr: "{{ private_subnets[0].cidr }}", az: "{{ private_subnets[0].az }}", name: "private-a" }
        - { cidr: "{{ private_subnets[1].cidr }}", az: "{{ private_subnets[1].az }}", name: "private-b" }
      register: subnets

    - name: Create security group for web
      amazon.aws.ec2_security_group:
        name: web-sg
        description: Security group for web servers
        vpc_id: "{{ vpc.vpc.id }}"
        rules:
          - proto: tcp
            ports: 80
            cidr_ip: 0.0.0.0/0
          - proto: tcp
            ports: 443
            cidr_ip: 0.0.0.0/0
        tags:
          Name: web-sg
        state: present
      register: sg_result  # the RDS task reads sg_result.group_id

    - name: Create RDS subnet group
      amazon.aws.rds_subnet_group:
        name: myapp-db-subnet
        description: Subnet group for RDS
        subnet_ids:
          - "{{ subnets.results[2].subnet.id }}"
          - "{{ subnets.results[3].subnet.id }}"
        state: present

    - name: Create RDS instance
      amazon.aws.rds_instance:
        db_instance_identifier: myapp-db
        engine: postgres
        engine_version: "14.6"
        db_instance_class: db.t3.medium
        allocated_storage: 100
        master_username: "{{ db_user }}"
        master_user_password: "{{ db_password }}"
        db_subnet_group_name: myapp-db-subnet
        vpc_security_group_ids:
          - "{{ sg_result.group_id }}"
        wait: yes
        wait_timeout: 1200
        state: present
      no_log: true  # the task arguments include the master password

    - name: Launch EC2 instances
      amazon.aws.ec2_instance:
        # One shared Name tag; exact_count alone sets the fleet size.
        # Combining it with a loop would request 2 instances per item.
        name: web
        instance_type: t3.medium
        image_id: ami-0abcdef1234567890
        vpc_subnet_id: "{{ subnets.results[0].subnet.id }}"
        security_group: web-sg
        exact_count: 2
        tags:
          Environment: production
          Role: web
        wait: yes
```
- Use register to capture resource IDs for later use.
- Use loop to create multiple subnets.
- Wait for RDS before proceeding to EC2.
- Use no_log: true on tasks with secrets.
Tip: Use wait: yes on RDS to ensure it's ready before EC2 tries to connect.
Production insight: We once ran the RDS and EC2 tasks in parallel with async. The EC2 instances booted before RDS was ready, causing application failures. We switched to sequential with wait: yes on RDS.
Key takeaway: Use wait: yes and register to pass IDs between tasks.
Testing and Validating AWS Playbooks Locally
Testing AWS playbooks without affecting real infrastructure is crucial. Use --check mode and --diff to preview changes.
```bash
ansible-playbook -i inventory/aws_ec2.yml playbook.yml --check --diff
```
Limitations: --check mode does not actually call AWS APIs for creation tasks; it simulates. Some modules return 'changed' even in check mode. To validate syntax:
```bash
ansible-playbook playbook.yml --syntax-check
```
Integration testing with Molecule: Use the molecule tool with the ec2 driver to spin up temporary instances for testing. Example molecule.yml:
```yaml
---
dependency:
  name: galaxy
driver:
  name: ec2
  region: us-east-1
  instance_type: t2.micro
  image_id: ami-0abcdef1234567890
  vpc_subnet_id: subnet-xxxx
  security_group: sg-xxxx
platforms:
  - name: instance
    groups:
      - web
provisioner:
  name: ansible
  inventory:
    group_vars:
      all:
        ansible_aws_region: us-east-1
verifier:
  name: ansible
```
Run molecule test to create, test, and destroy instances.
Gotcha: Molecule with EC2 driver incurs costs. Use --destroy=never during development to keep instances for debugging.
Production pattern: Use a separate AWS account for testing. Implement check_mode: yes in playbooks with conditional logic to skip destructive tasks.
```yaml
- name: Create EC2 instance (check mode safe)
  amazon.aws.ec2_instance:
    ...
  when: not ansible_check_mode
```
Tip: Some modules (e.g., iam_role) do not support check mode fully. They may report 'changed' even when no changes would occur. Always verify with a dry run in a test environment.
Production insight: One playbook passed --check but failed in production because of a missing IAM permission. We added a pre-validation task that calls aws iam simulate-principal-policy to check permissions before running the main playbook.
Key takeaway: Use --check and --diff for dry runs. Use Molecule with EC2 driver for integration testing. Always test in a separate AWS account.
The Idempotency Fail: ec2 Module vs ec2_instance
Assumption: The ec2 module was idempotent and would only create instances if the count was insufficient.
Root cause: The ec2 module does not have an exact_count parameter that properly handles existing instances. It treated all running instances as 'extra' and terminated them before creating new ones.
Fix: Replaced the ec2 module with ec2_instance using exact_count: 3 and instance_role parameters. Also added instance_ids to target specific instances for updates.
- Always use the latest module from the amazon.aws collection (e.g., ec2_instance over ec2).
- The old modules are deprecated for a reason: they lack proper idempotency and state management.
Common fixes:
- Add timeout: 120 to the module and use retries: 5, delay: 10 for eventual consistency tasks.
- Use the s3_bucket module with state: present and force: false (default). The error is benign; add ignore_errors: yes or check bucket existence with aws_s3_bucket_info first.
- Load the policy with lookup('file', 'policy.json') and validate it with the from_json filter via a set_fact before the task.
- Add wait: yes, wait_timeout: 600 to the rds_instance module, then use retries: 10, delay: 10 on subsequent tasks that query RDS.
When an instance has been created but ec2_instance_info doesn't find it for 30 seconds:
- Inspect the task output: ansible-playbook -i inventory/aws_ec2.yml playbook.yml -vvv | grep -i 'ec2_instance'
- Confirm the instance directly: aws ec2 describe-instances --instance-ids i-xxx --region us-east-1
- Add wait: yes, wait_timeout: 600 to the ec2_instance task.
Common mistakes to avoid
- Using the deprecated `ec2` module instead of `ec2_instance`. Fix: replace ec2 with amazon.aws.ec2_instance and use exact_count.
- Not setting `wait: yes` on EC2 or RDS creation. Fix: add wait: yes and wait_timeout to creation tasks.
- Hardcoding secrets in playbooks. Fix: use the aws_ssm_parameter lookup with no_log: true.
- Not using `exact_count` for EC2 instances. Fix: set exact_count with name and tags filters.
- Omitting tags on resources. Fix: add tags with a Name key on all resources.
- Not handling eventual consistency with retries. Fix: add retries, delay, and until conditions to subsequent tasks.
Interview Questions on This Topic
How do you ensure idempotency when creating EC2 instances with Ansible?
Use the amazon.aws.ec2_instance module with the exact_count parameter. Provide filters like name and tags to scope the count. Without exact_count, the module creates a new instance every run. Also set state: present and use wait: yes to ensure the instance is running before proceeding.