Ansible Docker Management: Production Patterns with community.docker 3.x
Master Ansible Docker management with the community.docker collection.
20+ years shipping production infrastructure and CI/CD at scale. Drawn from code that ran under real load.
Use community.docker 3.4.0+; install via ansible-galaxy collection install community.docker.
docker_container module: use state: started with restart_policy: unless-stopped for production.
docker_image: build with source: build and force_source: yes to rebuild on code changes.
docker_network: create with driver: overlay for Swarm; set attachable: yes so standalone containers can also join the overlay.
docker_volume: prefer driver_opts: { type: nfs, o: addr=..., device: ... } for shared storage.
docker_compose module vs raw docker-compose: use community.docker.docker_compose_v2 (collection 3.6.0+) for Ansible-native deploys; fall back to command: docker compose up -d for complex stacks.
Container lifecycle: use restart: yes only when config changes; set comparisons explicitly (it is a dict, for example { env: strict }) for predictable idempotency.
Bootstrap Docker hosts: run get_url for Docker GPG key, apt_repository for repo, then package for docker-ce.
Imagine you're the manager of a busy restaurant kitchen. You have a bunch of prep cooks (Docker containers) each making different dishes. Ansible is like your head chef who can instantly tell each cook to start, stop, or change their recipe. The community.docker collection is the set of standardized order slips that let Ansible communicate with the kitchen's computer system (Docker daemon). Instead of running around yelling, you give precise written instructions that work every time. When a cook needs a new cutting board (a Docker volume) or a new station layout (a Docker network), you just write it on the slip and it happens automatically. No more burnt dishes or missing ingredients.
It was 2 AM on a Tuesday when I got paged. Our microservices deployment had failed halfway through—half the containers were running old images, half the new ones, and the database container was missing entirely. The previous engineer had written a bash script that ran docker commands sequentially and it had a race condition. That's when I realized we needed a proper configuration management tool for Docker, not ad-hoc scripts. Ansible with the community.docker collection became our salvation. Over the next weeks, I rewrote our deployment pipeline to use Ansible's idempotent modules, and we never had a partial deploy again.
Installing the community.docker Collection
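The community.docker collection is not part of ansible-core. It ships with the full ansible community package, but on a minimal install you must add it explicitly. Use ansible-galaxy collection install community.docker to get the latest version (3.4.0 as of writing). For production, pin the version: ansible-galaxy collection install community.docker:==3.4.0.

On the target hosts, install the Docker SDK for Python (pip install docker==6.1.0) for the modules that still need it. Since collection 3.0.0, most modules, including docker_container, docker_image, docker_network, docker_volume and docker_login, talk to the daemon through vendored code and only need the requests library. The Swarm modules, such as docker_secret and docker_swarm_service, still require the SDK.

Alternatively, you can run the modules on the control node (delegate_to: localhost) and point their docker_host option at a remote daemon over TCP. However, I recommend running tasks on the target host to avoid latency and socket issues.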
Example playbook snippet:

```yaml
- hosts: docker_hosts
  tasks:
    - name: Install Docker SDK for Python
      ansible.builtin.pip:
        name: docker==6.1.0
        state: present
```

If you're using Ansible Automation Platform or AWX, add the collection to requirements.yml:

```yaml
collections:
  - name: community.docker
    version: 3.4.0
```
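If you do want the control-node approach, where modules run locally and connect to remote Docker daemons over TCP, a minimal sketch follows. It makes two assumptions:

- Each remote daemon listens on TCP port 2376 with TLS enabled.
- The certificate paths under /etc/ansible/docker-tls/ are hypothetical placeholders for your own files.

```yaml
# Runs on the control node and talks to each host's daemon over TLS.
# Port 2376 and the certificate paths are assumptions; adjust to your setup.
- name: Check a remote Docker daemon from the control node
  community.docker.docker_host_info:
    docker_host: "tcp://{{ inventory_hostname }}:2376"
    validate_certs: true
    ca_cert: /etc/ansible/docker-tls/ca.pem
    client_cert: /etc/ansible/docker-tls/cert.pem
    client_key: /etc/ansible/docker-tls/key.pem
  delegate_to: localhost
  register: remote_docker

- name: Show the remote daemon version
  ansible.builtin.debug:
    msg: "{{ remote_docker.host_info.ServerVersion }}"
```

The docker_host and TLS options are shared by the other community.docker modules, so the same pattern carries over to docker_container and docker_image.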
On nodes where the collection and SDK versions had drifted apart, the docker_container module failed with cryptic JSON decode errors. Always keep collection and SDK versions in sync across all nodes.

Managing Containers with docker_container
The docker_container module is the workhorse. Key parameters for production:
- state: started (idempotent: creates the container if missing, starts it if stopped)
- restart_policy: unless-stopped (survives daemon restarts)
- restart: yes only when you want to force a restart (use with handlers)
- comparisons: a dict such as { command: strict, env: strict, labels: strict } that controls how changes are detected
- image: myimage:{{ tag }} with a variable for versioning
Example:

```yaml
- name: Ensure web container is running
  community.docker.docker_container:
    name: web
    image: "myapp:{{ app_version }}"
    state: started
    restart_policy: unless-stopped
    ports:
      - "80:80"
    env:
      DB_HOST: "{{ db_host }}"
    # comparisons maps option names to strict, ignore or allow_more_present
    comparisons:
      command: strict
      env: strict
    restart: "{{ container_restart | default(false) }}"
```
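Because comparisons is a dictionary of option names mapped to strict, ignore, or allow_more_present, you can invert the default and react only to the options you care about. The sketch below reuses the hypothetical web/myapp names from the example above.

```yaml
- name: Recreate web only when its image or environment changes
  community.docker.docker_container:
    name: web
    image: "myapp:{{ app_version }}"
    state: started
    restart_policy: unless-stopped
    env:
      DB_HOST: "{{ db_host }}"
    comparisons:
      '*': ignore     # ignore every option not listed below
      image: strict   # a different image triggers recreation
      env: strict     # env must match exactly
```

The trade-off is that drift in anything else (ports, volumes, restart policy) goes unnoticed. Use the wildcard sparingly.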
Use detach: yes (the default) to run in the background. For one-off tasks, use detach: no and cleanup: yes. Avoid shelling out to the docker CLI through the command module; use docker_container instead.
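A one-off task with detach: no and cleanup: yes might look like the following. The container name and the migration command are hypothetical placeholders for whatever your image runs.

A task like this runs every time the play runs. Guard it with when: or trigger it from a handler if it should not.

```yaml
# Hypothetical one-off job: runs to completion, then the container is removed.
- name: Run database migrations in a throwaway container
  community.docker.docker_container:
    name: myapp-migrate
    image: "myapp:{{ app_version }}"
    command: ["./manage.py", "migrate"]   # placeholder command
    detach: no     # wait for the container to exit
    cleanup: yes   # remove the container afterwards
```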
comparisons controls how the module diffs an existing container against the task. docker_container already recreates a container when options such as image, command, or env differ. Setting comparisons explicitly documents that intent and lets you relax or tighten individual options. Always set comparisons for production.

We once passed memory: 512m and memory_reservation: 256m to the module, but the limits did not take effect because the container already existed. We had to set state: absent first, then recreate. That's why I now always include state: absent in a separate task when changing resource limits.

Key takeaway: pair comparisons with restart, or destroy and recreate the container.

Building Images with docker_image
The docker_image module can pull or build images. For building, use:
- source: build
- force_source: yes to rebuild even if the image exists (use with caution)
- build.path: /path/to/docker/context
- build.dockerfile: Dockerfile.prod if non-default
- build.args: { BUILD_VERSION: "{{ version }}" } for build args
Example:

```yaml
- name: Build production image
  community.docker.docker_image:
    name: myapp
    tag: "{{ app_version }}"
    source: build
    force_source: yes
    build:
      path: /opt/myapp
      dockerfile: Dockerfile.prod
      args:
        APP_VERSION: "{{ app_version }}"
      pull: yes   # build sub-option: pull newer base images
    state: present
```

For pulling from a registry, use source: pull with name: registry.example.com/myapp and the desired tag. To push, build first, then use push: yes with repository set to the target registry path and source: local.
Production tip: Use force_source: no in CI/CD and only rebuild when code changes (detected by git commit hash). This speeds up deployments.
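The sketch below combines the commit-hash approach with a pre-build check for missing COPY sources. It tags the image with the short git hash, so with force_source: no the module builds only when no image exists for the current commit.

It makes these assumptions:

- The build context at /opt/myapp is a git checkout.
- git is installed on the host.
- The file list in the stat loop is a hypothetical placeholder for whatever your Dockerfile copies.

```yaml
- name: Read the commit of the build context
  ansible.builtin.command: git rev-parse --short HEAD
  args:
    chdir: /opt/myapp
  register: git_rev
  changed_when: false   # read-only command

- name: Stat files the Dockerfile expects to COPY
  ansible.builtin.stat:
    path: "/opt/myapp/{{ item }}"
  loop:                 # hypothetical list; mirror your Dockerfile's COPY lines
    - requirements.txt
    - src
  register: context_files

- name: Fail before building if any of them is missing
  ansible.builtin.assert:
    that:
      - context_files.results | rejectattr('stat.exists') | list | length == 0
    fail_msg: "Build context is missing files the Dockerfile copies"

- name: Build only when no image exists for this commit
  community.docker.docker_image:
    name: myapp
    tag: "{{ git_rev.stdout }}"
    source: build
    force_source: no
    build:
      path: /opt/myapp
      dockerfile: Dockerfile.prod
    state: present
```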
To build on the control node instead of the Docker host, use delegate_to: localhost and ensure Docker is installed locally.

One build failed because the Dockerfile used COPY with files that didn't exist in the build context. I added a stat check before building to verify all required files exist.

Key takeaway: use force_source: yes in dev, but no in prod to avoid unnecessary rebuilds. Verify build context files exist.

Creating Docker Networks
For multi-container apps, you need networks. Use docker_network:
- state: present to create
- driver: bridge for standalone, overlay for Swarm
- ipam_config: [{subnet: 10.10.0.0/24, gateway: 10.10.0.1}] for a fixed IP range
- attachable: yes to allow standalone containers to attach to an overlay network (useful for debugging)

Example:

```yaml
- name: Create frontend network
  community.docker.docker_network:
    name: frontend
    driver: bridge
    ipam_config:
      - subnet: 172.20.0.0/24
    state: present
```
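A fixed range pays off when you attach a container with an address inside that subnet. Docker only accepts user-specified addresses on networks with a user-configured subnet. The sketch below assumes the frontend network from the example above and a hypothetical address, 172.20.0.10.

```yaml
- name: Run web on the frontend network with a fixed address
  community.docker.docker_container:
    name: web
    image: "myapp:{{ app_version }}"
    state: started
    networks:
      - name: frontend
        ipv4_address: 172.20.0.10   # must fall inside 172.20.0.0/24
```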
For Swarm, use driver: overlay and scope: swarm. Be careful: overlay networks require Swarm mode, and the task must run on a manager node. If you try to create an overlay network on a non-swarm node, the module will fail.
Production insight: Always specify subnet to avoid conflicts with existing networks. I once had a network that got a random subnet that overlapped with a VPN, causing connectivity issues.
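For Swarm, the same module creates an attachable overlay with a pinned subnet. This sketch assumes it runs on a Swarm manager; the network name and subnet are hypothetical.

```yaml
- name: Create an attachable overlay network on a Swarm manager
  community.docker.docker_network:
    name: backend_overlay     # hypothetical name
    driver: overlay
    scope: swarm
    attachable: true          # lets standalone containers join later
    ipam_config:
      - subnet: 10.20.0.0/24  # pick a range that avoids VPNs and other networks
    state: present
```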
We once created an overlay network with attachable: no. Later we needed to attach a standalone container for monitoring, and we had to recreate the network. Now I always set attachable: yes on overlay networks.

Key takeaway: set attachable: yes for flexibility. For Swarm, ensure Swarm mode is active.

Managing Docker Volumes
For persistent or shared data, use docker_volume:
- state: present to create
- driver: local (the default; it also handles NFS mounts through driver_opts) or a plugin driver such as rbd
- driver_opts: { type: nfs, o: "addr=192.168.1.100,rw", device: ":/exported/path" } for NFS

Example:

```yaml
- name: Create shared data volume
  community.docker.docker_volume:
    name: shared_data
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.1.100,rw"
      device: ":/mnt/nfs_share"
    state: present
```
You can attach metadata to volumes with labels. Removal is tricky: Docker refuses to remove a volume that a container is still using, so the module will fail until the dependent containers are removed.

Production tip: use recreate: always only when you need to purge data. The default, recreate: never, keeps the volume across container recreations, which is usually what you want. recreate: options-changed recreates the volume only when its options differ.
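If the NFS server address can change, recreate: options-changed lets the module replace the volume when its options drift, instead of reporting ok. In this sketch, nfs_server is a hypothetical variable.

Because Docker will not remove an in-use volume, expect the recreate to fail while a container still mounts it.

```yaml
- name: NFS-backed volume that follows option changes
  community.docker.docker_volume:
    name: shared_data
    driver: local
    driver_opts:
      type: nfs
      o: "addr={{ nfs_server }},rw,nfsvers=4"   # nfs_server is a hypothetical var
      device: ":/mnt/nfs_share"
    recreate: options-changed   # the default is never
    state: present
```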
Add nfsvers=4 to the o options for better performance. Test with docker volume create --driver local --opt type=nfs --opt o=addr=... --opt device=:... testvol.

We set the NFS server address in driver_opts, but later the NFS server IP changed. The module didn't update the volume; it just reported 'ok'. We had to remove and recreate the volume. Now I use a unique volume name per deployment to avoid this.

docker_compose Module vs Raw docker-compose
The community.docker.docker_compose_v2 module manages Docker Compose stacks natively by driving the docker compose CLI plugin. It was added in collection 3.6.0, so pin 3.6.0 or later if you use it. It is idempotent and integrates with Ansible's state model. However, it exposes only a subset of the CLI's options. For complex stacks, I fall back to the command module running docker compose up -d.

Example using docker_compose_v2:

```yaml
- name: Deploy stack with compose
  community.docker.docker_compose_v2:
    project_src: /opt/myapp
    files:
      - docker-compose.yml
      - docker-compose.prod.yml
    state: present
    recreate: auto   # 'always' would recreate every container on every run
```
Fall back to raw docker compose when you:
- Need to use --profile or --with-registry-auth
- Need to run docker compose exec for one-off tasks
- Need complex variable substitution

Example raw:

```yaml
- name: Deploy with raw docker compose
  ansible.builtin.command: docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --remove-orphans
  args:
    chdir: /opt/myapp
```
Production insight: Use docker_compose_v2 for simple stacks and raw for complex ones. The raw approach is less idempotent but more flexible.
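Instead of changed_when: false, you can sometimes derive changed from Compose's own output. This sketch relies on two assumptions; check them against your Compose version before relying on the pattern:

- Compose v2 writes per-resource progress to stderr, using words like Created, Recreated, Started, and Removed.
- Compose prints Running for containers it left alone.

```yaml
- name: Deploy with raw docker compose and report real changes
  ansible.builtin.command: >-
    docker compose -f docker-compose.yml -f docker-compose.prod.yml
    up -d --remove-orphans
  args:
    chdir: /opt/myapp
  register: compose_up
  # Assumes Compose v2 progress lines on stderr; "Running" alone means no change
  changed_when: compose_up.stderr is search('Created|Recreated|Started|Removed')
```

If the wording differs in your version, print compose_up.stderr once and adjust the pattern.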
Set changed_when: false on raw tasks if you handle restarts separately.

We used --profile monitoring to start extra containers. The docker_compose_v2 module didn't support profiles, so we switched to raw. We added changed_when: false and used a separate task to check if containers were running.

Container Lifecycle Management
Managing container lifecycle means handling creation, updates, restarts, and removal. Key patterns:
- Use state: started for normal operation.
- Use state: stopped to stop a container. With restart_policy: always it comes back when the daemon restarts; unless-stopped keeps it stopped.
- Use state: absent to remove.
- Use restart: yes only when you need to force a restart (e.g., after a config change).
Example with handlers:

```yaml
tasks:
  - name: Update web config
    ansible.builtin.template:
      src: nginx.conf.j2
      dest: /opt/nginx/nginx.conf
    notify: restart web

handlers:
  - name: restart web
    community.docker.docker_container:
      name: web
      state: started
      restart: yes
```
For zero-downtime deployments, use rolling updates with multiple containers and a load balancer. Ansible can orchestrate this by updating containers one by one.
Production insight: I once used state: started with restart: yes on every playbook run. This caused unnecessary container restarts and downtime. Now I only restart when configuration changes, using comparisons or handlers.
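docker_container also has a recreate option that forces a brand-new container in a single task. This replaces a state: absent task followed by a state: started task. The sketch below uses the same hypothetical web/myapp names.

```yaml
# recreate: true removes and re-creates the container in one task,
# even if nothing in its configuration changed.
- name: Recreate web from the current image
  community.docker.docker_container:
    name: web
    image: "myapp:{{ app_version }}"
    state: started
    restart_policy: unless-stopped
    recreate: true
```

Like the two-task approach, this still stops the old container before the new one starts, so it is not zero-downtime on its own.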
Restart vs recreate: restart stops and starts the same container, while recreate (state: absent then started) destroys it and creates a new one. Use restart for config changes, recreate for image updates.

We once did this with state: absent then state: started in two tasks, but it caused a brief downtime. We switched to a blue-green pattern with two containers and a reverse proxy.

Bootstrapping Docker Hosts with Ansible
To manage Docker with Ansible, you first need Docker installed on the target hosts. Use Ansible to bootstrap:
```yaml
- name: Install Docker on Ubuntu
  hosts: all
  become: true
  tasks:
    - name: Install prerequisites
      ansible.builtin.apt:
        name:
          - apt-transport-https
          - ca-certificates
          - curl
          - gnupg
          - lsb-release
        state: present
        update_cache: true

    - name: Ensure the APT keyrings directory exists
      ansible.builtin.file:
        path: /etc/apt/keyrings
        state: directory
        mode: "0755"

    # The key is ASCII-armored, so keep the .asc extension for signed-by
    - name: Add Docker GPG key
      ansible.builtin.get_url:
        url: https://download.docker.com/linux/ubuntu/gpg
        dest: /etc/apt/keyrings/docker.asc
        mode: "0644"

    - name: Add Docker repository
      ansible.builtin.apt_repository:
        repo: "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
        state: present

    - name: Install Docker packages
      ansible.builtin.apt:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
          - docker-compose-plugin
        state: present

    - name: Start and enable Docker
      ansible.builtin.service:
        name: docker
        state: started
        enabled: true

    - name: Install Docker SDK for Python
      ansible.builtin.pip:
        name: docker==6.1.0
        state: present
```

For CentOS/RHEL, use yum_repository and yum (dnf on current releases). For production, consider a well-maintained role from Ansible Galaxy. Docker documents its convenience install script (get.docker.com) as not recommended for production.
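A sketch of the RHEL-family equivalent follows. It makes three assumptions:

- The repository uses the CentOS layout from Docker's published docker-ce.repo file.
- The host is dnf-based.
- The inventory group name is hypothetical.

```yaml
- name: Install Docker on RHEL-family hosts
  hosts: rhel_docker_hosts   # hypothetical inventory group
  become: true
  tasks:
    - name: Add the Docker CE repository
      ansible.builtin.yum_repository:
        name: docker-ce-stable
        description: Docker CE Stable
        baseurl: https://download.docker.com/linux/centos/$releasever/$basearch/stable
        gpgcheck: true
        gpgkey: https://download.docker.com/linux/centos/gpg

    - name: Install Docker packages
      ansible.builtin.dnf:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
          - docker-compose-plugin
        state: present

    - name: Start and enable Docker
      ansible.builtin.service:
        name: docker
        state: started
        enabled: true
```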
Install docker-compose-plugin (v2) instead of standalone docker-compose (v1). The plugin is maintained by Docker and integrates with the docker compose command. The community.docker.docker_compose_v2 module requires the plugin.

For the SDK step, use when: docker_sdk_installed is not defined with a stat check to make it idempotent.

Deploying Compose Stacks with Ansible
Once Docker is installed, deploy Compose stacks. Example using docker_compose_v2:
```yaml
- name: Deploy application stack
  hosts: docker_hosts
  vars:
    app_version: "1.2.3"
  tasks:
    - name: Ensure network exists
      community.docker.docker_network:
        name: app_net
        state: present

    - name: Ensure volumes exist
      community.docker.docker_volume:
        name: app_data
        state: present

    - name: Deploy stack
      community.docker.docker_compose_v2:
        project_src: /opt/myapp
        files:
          - docker-compose.yml
        state: present
        recreate: auto   # 'always' would recreate every container on every run
      # environment is a task keyword, not a module option
      environment:
        APP_VERSION: "{{ app_version }}"
```

If using raw docker compose:

```yaml
- name: Deploy with raw compose
  ansible.builtin.command: docker compose -f docker-compose.yml up -d --remove-orphans
  args:
    chdir: /opt/myapp
  environment:
    APP_VERSION: "{{ app_version }}"
  changed_when: false
```
Production tip: Use --remove-orphans to clean up containers not defined in the Compose file. This prevents drift.
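One way to avoid runtime variable substitution altogether is to render the Compose file with Ansible's template module and then apply it. This is a sketch: docker-compose.yml.j2 is a hypothetical template that would contain a line such as image: "myapp:{{ app_version }}".

```yaml
- name: Render the Compose file with the release version baked in
  ansible.builtin.template:
    src: docker-compose.yml.j2   # hypothetical template in your playbook or role
    dest: /opt/myapp/docker-compose.yml
    mode: "0644"

- name: Apply the rendered stack
  community.docker.docker_compose_v2:
    project_src: /opt/myapp
    state: present
```

Because the rendered file changes only when app_version changes, the template task's changed status also tells you when a deploy actually happened.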
Pass variables through the environment keyword of the task, or define them in the Compose file's environment section.

We referenced ${APP_VERSION} in the Compose file. With docker_compose_v2, the variable was not substituted because the module runs in its own environment. We switched to raw docker compose with environment set, which worked. Later we moved to using Ansible templates to render the Compose file before deploying.

Advanced: Rolling Updates with Ansible
For zero-downtime deployments, implement rolling updates. Example with two containers and a load balancer:
```yaml
- name: Rolling update web service
  hosts: docker_hosts
  serial: 1
  tasks:
    - name: Pull new image
      community.docker.docker_image:
        name: myapp
        tag: "{{ new_version }}"
        source: pull
        state: present

    - name: Stop old container
      community.docker.docker_container:
        name: web
        state: stopped

    - name: Start new container with new image
      community.docker.docker_container:
        name: web
        image: "myapp:{{ new_version }}"
        state: started
        restart_policy: unless-stopped
```

But this pattern has downtime. Better: use two containers (blue/green):
- Deploy the new container under a different name
- Update the load balancer to point to the new container
- Remove the old container
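Example:

```yaml
- name: Deploy blue/green
  hosts: docker_hosts
  tasks:
    - name: Deploy green container
      community.docker.docker_container:
        name: web_green
        image: "myapp:{{ new_version }}"
        state: started
        ports:
          - "8081:80"

    - name: Switch load balancer to green
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/conf.d/app.conf
      notify: reload nginx

    # Handlers normally run at the end of the play; reload before blue disappears
    - name: Reload nginx now
      ansible.builtin.meta: flush_handlers

    - name: Remove blue container
      community.docker.docker_container:
        name: web_blue
        state: absent

  handlers:
    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```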
Production insight: We used this pattern with HAProxy. The playbook would update the HAProxy config to point to the new container and then remove the old one. This gave us zero downtime.
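A health gate can be checked from the outside, with uri polling the published port, or inside Docker with a healthcheck. The sketch below shows both, reusing the web_green name and port 8081 from the blue/green example.

It makes two assumptions:

- The app serves a /health endpoint.
- curl is present in the image.

```yaml
# Place between "Deploy green container" and "Switch load balancer to green".
- name: Wait for green to answer its health endpoint
  ansible.builtin.uri:
    url: "http://{{ inventory_hostname }}:8081/health"
    status_code: 200
  register: green_health
  until: green_health.status == 200
  retries: 10
  delay: 3

# Alternative: let Docker track health itself (assumes curl exists in the image).
- name: Deploy green container with a Docker healthcheck
  community.docker.docker_container:
    name: web_green
    image: "myapp:{{ new_version }}"
    state: started
    ports:
      - "8081:80"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 10s
      timeout: 5s
      retries: 3
```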
Before switching traffic, use the uri module to check the new container's health endpoint (url: http://{{ inventory_hostname }}:8081/health, status_code: 200). Alternatively, define docker_container's healthcheck parameter in the container definition.

Security: Managing Secrets and Registries

When pulling images from private registries, you need authentication. Use the docker_login module to log in:

```yaml
- name: Log into private registry
  community.docker.docker_login:
    registry_url: https://registry.example.com
    username: "{{ registry_user }}"
    password: "{{ registry_pass }}"
    reauthorize: yes
  no_log: yes
```

Then pull images normally. For Swarm services, you can manage secrets with docker_secret and non-sensitive configuration with docker_config; both require Swarm mode.
For container secrets (e.g., database passwords), use Ansible Vault to encrypt variables and pass them as environment variables or mounted files. Avoid hardcoding secrets in Dockerfiles.
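Example:

```yaml
- name: Run container with secrets
  community.docker.docker_container:
    name: app
    image: myapp
    env:
      DB_PASSWORD: "{{ vault_db_password }}"
    # docker_container has no secrets option (secrets are a Swarm feature);
    # mount a file rendered by Ansible read-only instead (hypothetical host path)
    volumes:
      - /etc/myapp/secrets/db_password:/run/secrets/db_password:ro
  no_log: yes
```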
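On Swarm, secrets are first-class. This sketch makes three assumptions:

- The host is a Swarm manager.
- The Docker SDK for Python is installed there, because these Swarm modules still need it.
- vault_db_password comes from Ansible Vault, as above.

```yaml
- name: Store the DB password as a Swarm secret
  community.docker.docker_secret:
    name: db_password
    data: "{{ vault_db_password }}"
    state: present
  no_log: true

- name: Run the app as a Swarm service with the secret mounted
  community.docker.docker_swarm_service:
    name: app
    image: myapp
    secrets:
      - secret_name: db_password
        filename: db_password   # appears under /run/secrets/ in the container
```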
Production insight: We used no_log: yes on the login task, but the password still appeared in the module's return output. We had to set no_log: yes on the entire playbook. Now we use Ansible Vault for all secrets.
Always set no_log: yes on tasks that handle passwords, API keys, or any sensitive data. Even with no_log, some modules may leak secrets in error messages. Test by running with -v and checking output.

A password was once exposed by a task that ran without no_log: yes. We immediately rotated the password and added a pre-commit hook to flag missing no_log on tasks with password in the name.

Key takeaway: always set no_log: yes on tasks with secrets. Use Ansible Vault for variable encryption.

Monitoring and Idempotency Checks
After deploying, verify that containers are running. Use the docker_container_info module:
```yaml
- name: Get container info
  community.docker.docker_container_info:
    name: web
  register: container_info

- name: Fail if container not running
  ansible.builtin.fail:
    msg: "Container web is not running"
  # container is null when the container does not exist, so check exists first
  when: not container_info.exists or container_info.container.State.Status != 'running'
```
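If the container defines a healthcheck, you can go a step further and wait for Docker's own health status rather than just the running state. In this sketch, State.Health only exists when a healthcheck is configured, hence the guards; the retry counts are arbitrary.

```yaml
- name: Wait until Docker reports the web container as healthy
  community.docker.docker_container_info:
    name: web
  register: web_health
  # State.Health is present only when the container has a healthcheck
  until: >-
    web_health.exists and
    web_health.container.State.Health is defined and
    web_health.container.State.Health.Status == 'healthy'
  retries: 20
  delay: 3
```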
For idempotency, use docker_container_info to decide whether to recreate:

```yaml
- name: Check if container exists
  community.docker.docker_container_info:
    name: web
  register: web_info

# container.Image holds the image ID (sha256:...); Config.Image holds name:tag
- name: Remove old container if image changed
  community.docker.docker_container:
    name: web
    state: absent
  when:
    - web_info.exists
    - web_info.container.Config.Image != 'myapp:' ~ new_version
```
Production insight: We had a playbook that always recreated containers because we didn't check the current image. Adding the docker_container_info check reduced deployment time by 50%.
Use docker_container_info to avoid unnecessary container recreations. Compare container.Config.Image (the image name and tag) or container.State.StartedAt to decide if an update is needed.

We used docker_container_info to check if a container was running the correct image tag. If not, we recreated it. This made our deployments idempotent and fast.

The Case of the Missing Container Restart
We assumed restart: yes would always restart the container. In fact, the restart: yes parameter only triggers a restart if the container already exists and is running. But the module's idempotency check didn't detect the config change because comparisons was not set.

The fix was to set comparisons: { command: strict, env: strict, labels: strict } to detect changes, and to use restart: yes only when comparisons detect a change. Alternatively, use state: started with restart: yes and a handler.

- Always use comparisons to define what constitutes a change for docker_container.
- Never rely on restart: yes alone.
- Cannot connect to the Docker daemon: set the DOCKER_HOST env var or use docker_host: unix:///var/run/docker.sock in the task. Ensure the user has permissions on the socket. Run ansible target -m ansible.builtin.shell -a 'docker ps' to test.
- Image build fails: point build.path: /path/to/dir at a directory containing the Dockerfile. Add build.dockerfile: Dockerfile.prod if non-default. Run docker build -t test . locally first.
- Network already exists: use state: present, which is idempotent. If using state: absent, set force: yes to remove networks that still have containers attached.
- Compose file not found or invalid: validate with docker compose config. Use project_src: /path and ensure files: [docker-compose.yml].

ansible target -m community.docker.docker_image -a 'name=myimage source=pull state=present'
ansible target -m community.docker.docker_container -a 'name=mycontainer state=absent'
Common mistakes to avoid
- Not setting comparisons on docker_container
- Using restart: yes on every run
- Forgetting to install Docker Python SDK
- Not pinning collection version
- Using docker_compose_v2 with V1 compose files
- Not using no_log on secret tasks
Interview Questions on This Topic
How do you ensure idempotency when using docker_container module?
Use the comparisons parameter to specify how attributes are compared (for example, { command: strict, env: strict }). Also use docker_container_info to check current state before making changes. Set restart: yes only when a change is detected, via handlers.