Ansible Windows Automation: WinRM CredSSP Pitfall with Double-Hop & Fix
Master Ansible Windows automation with WinRM configuration, Kerberos/NTLM/CredSSP auth, win_* modules, and PowerShell scripts.
20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.
- Use ansible_connection: winrm and set ansible_winrm_transport to credssp for double-hop scenarios.
- Configure WinRM on Windows hosts with Enable-PSRemoting and Set-Item WSMan:\localhost\Client\TrustedHosts.
- Prefer win_shell over win_command when you need PowerShell-specific features like pipelines or object manipulation.
- For Kerberos auth, ensure the SPN is set and clocks are synced within 5 minutes; use klist to verify tickets.
- CredSSP must be enabled on the host (Enable-WSManCredSSP or Group Policy) and needs the pywinrm[credssp] extra on the control node.
- Use ansible_winrm_server_cert_validation: ignore only for self-signed certs in labs; in production, use a certificate the control node trusts.
- win_copy transfers files over the WinRM connection, so the firewall needs to allow the WinRM port (5985 for HTTP, 5986 for HTTPS).
- Troubleshoot with ansible target -m win_ping -vvvv to see WinRM transport details.
Imagine you're a remote control operator trying to fix a computer in another building. The remote control app (Ansible) talks to a helper program (WinRM) on that computer. But if you need the helper to then access a file server in a third building, the helper can't pass your credentials — that's the 'double-hop' problem. CredSSP is like giving the helper a temporary badge that lets it go to the third building on your behalf. NTLM is like a simple password check, Kerberos is like a secure ticket system, and each has its own rules for when it can pass credentials.
I still remember the outage. We had 200 Windows servers that needed a hotfix deployed via Ansible. The playbook worked perfectly on the first 50, then suddenly failed with 'WinRM transport failed' on a cluster of servers in a different domain. The error message was cryptic: 'The WinRM client cannot process the request because the server name is not in the trusted hosts list.' We had set TrustedHosts, but the issue was deeper — it was a Kerberos double-hop problem. That night, I learned the hard way that Windows automation with Ansible is not just about installing pywinrm and running a playbook. It's about understanding WinRM authentication protocols, Group Policy settings, and the subtle differences between win_shell and win_command. This article covers everything I wish I knew then.
1. WinRM Configuration on Windows Hosts
Before Ansible can manage a Windows host, you need to configure WinRM. The quickest way is to run Enable-PSRemoting -Force in an elevated PowerShell. This starts the WinRM service, sets it to auto-start, and creates an HTTP listener with a matching firewall exception. It does not create an HTTPS listener, so for production you need to add one and secure the configuration.
HTTPS Listener with Self-Signed Cert:

```powershell
$cert = New-SelfSignedCertificate -CertStoreLocation Cert:\LocalMachine\My -DnsName $env:COMPUTERNAME -FriendlyName 'WinRM HTTPS'
New-WSManInstance -ResourceURI winrm/config/Listener -SelectorSet @{Address='*';Transport='HTTPS'} -ValueSet @{CertificateThumbprint=$cert.Thumbprint}
```

Firewall rules:

```powershell
New-NetFirewallRule -DisplayName 'WinRM HTTP' -Direction Inbound -Protocol TCP -LocalPort 5985 -Action Allow
New-NetFirewallRule -DisplayName 'WinRM HTTPS' -Direction Inbound -Protocol TCP -LocalPort 5986 -Action Allow
```

TrustedHosts: For non-domain or cross-domain, set TrustedHosts to allow connections:

```powershell
Set-Item WSMan:\localhost\Client\TrustedHosts -Value '*' -Force
```
Ansible Inventory for Windows:

```ini
[windows]
server1 ansible_host=192.168.1.10 ansible_user=Administrator ansible_password=Passw0rd ansible_connection=winrm ansible_winrm_server_cert_validation=ignore
```
Production insight: I once spent hours debugging a connection issue only to find the WinRM service was set to manual start after a Windows update. Always verify with Get-Service WinRM.
2. WinRM Authentication Protocols: NTLM, Kerberos, CredSSP
Ansible supports three main authentication transports: ntlm, kerberos, and credssp. Each has trade-offs.
NTLM: Works for workgroup and domain accounts. Simple, but no double-hop. Set it explicitly with ansible_winrm_transport: ntlm rather than relying on the plugin's automatic choice.
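Kerberos: Requires domain-joined hosts, time sync, and a proper SPN. Set ansible_winrm_transport: kerberos. You need the system Kerberos libraries and pywinrm[kerberos] on the control node. On a Linux control node, check your tickets with klist; on a Windows client, klist get HTTP/target.domain.com requests a ticket for the WinRM SPN.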
CredSSP: Enables double-hop delegation. Set ansible_winrm_transport: credssp. On the target, run:

```powershell
Enable-WSManCredSSP -Role Server -Force
```

On the control node, install the extra with pip install pywinrm[credssp].
Choosing the right one:
- Same domain, no double-hop: Kerberos (fastest)
- Workgroup or cross-domain: NTLM
- Need to access network shares: CredSSP
Production insight: We once had Kerberos fail because the DNS suffix was missing. Always use FQDN in ansible_host for Kerberos to work.
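As a rough illustration of the Kerberos setup described above, the group variables might look like the sketch below. The file name, account, realm, and vault variable are hypothetical placeholders, and the hosts in this group are assumed to be listed by FQDN in the inventory.

```yaml
# group_vars/windows.yml (hypothetical file name and values)
ansible_connection: winrm
ansible_winrm_transport: kerberos
ansible_port: 5986
# Hypothetical domain account; the realm part is conventionally upper case
ansible_user: svc_ansible@EXAMPLE.COM
# Hypothetical variable, assumed to be stored in Ansible Vault
ansible_password: "{{ vault_windows_password }}"
```

With a matching inventory entry such as server1.example.com, the hostname Ansible connects to lines up with the SPN that Kerberos looks up.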
3. The ansible_connection: winrm Plugin Deep Dive
The winrm connection plugin is what makes Ansible talk to Windows. Key variables:
- ansible_winrm_transport: ntlm, kerberos, credssp, basic
- ansible_winrm_port: 5985 (HTTP) or 5986 (HTTPS)
- ansible_winrm_server_cert_validation: ignore or validate (default: validate)
- ansible_winrm_operation_timeout_sec: default 20
- ansible_winrm_read_timeout_sec: default 30; must be larger than the operation timeout
Example inventory with advanced settings:

```ini
[windows:vars]
ansible_connection=winrm
ansible_winrm_transport=credssp
ansible_winrm_port=5986
ansible_winrm_server_cert_validation=ignore
ansible_winrm_operation_timeout_sec=120
ansible_winrm_read_timeout_sec=150
```
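Why timeouts matter: Long-running operations can hit the WinRM timeouts. Keep ansible_winrm_read_timeout_sec above ansible_winrm_operation_timeout_sec, and raise both when a task such as a win_shell step running a 5-minute script fails with a timeout.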
Production insight: We had a deployment that kept failing with 'WinRM operation timeout' because the default 60 seconds wasn't enough for a large package installation. Bumping to 300 fixed it.
Run ansible target -m win_ping -vvvv to see which transport and port are being used.

If a proxy between the control node and the target interferes with the connection, set ansible_winrm_proxy to bypass it.

4. Using win_service for Service Management
The win_service module manages Windows services. Common parameters:
- name: service name (e.g., Spooler)
- state: started, stopped, restarted
- start_mode: auto, manual, disabled

Example playbook:

```yaml
- name: Ensure print spooler is running and set to auto
  win_service:
    name: Spooler
    state: started
    start_mode: auto
```
Production gotcha: win_service expects the service name (e.g., Spooler), not the display name (Print Spooler). Matching is case-insensitive. Use Get-Service to verify.
Idempotency: win_service is idempotent; it will only change state if needed.
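Restart with dependent services (the dependencies parameter only sets the list of services a service depends on; force_dependent_services is what restarts the dependents):

```yaml
- name: Restart WAS and the services that depend on it
  win_service:
    name: WAS
    state: restarted
    force_dependent_services: yes
```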
Production insight: We once tried to stop a service that was already stopped, and win_service reported 'changed' because the start_mode was different. Always check the module's return values.
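One way to follow that advice is to register the result and print the fields you care about. This is a minimal sketch; the variable name spooler_result is an arbitrary choice, and it assumes the state and start_mode keys that win_service documents in its return values.

```yaml
- name: Stop the print spooler
  win_service:
    name: Spooler
    state: stopped
  register: spooler_result   # arbitrary variable name

- name: Show what the module actually reported
  debug:
    msg: >-
      changed={{ spooler_result.changed }}
      state={{ spooler_result.state }}
      start_mode={{ spooler_result.start_mode }}
```

Seeing changed next to the reported state and start mode makes it easier to tell which property triggered the change.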
5. Installing Software with win_package
win_package installs or uninstalls software using MSI or EXE. Key parameters:
- path: path to installer
- product_id: product GUID used for idempotency (read automatically from MSI files; set it for EXE installers)
- state: present or absent
- arguments: additional CLI arguments
Example:

```yaml
- name: Install 7-Zip
  win_package:
    path: '\\fileserver\share\7z1900-x64.msi'
    product_id: '{23170F69-40C1-2702-1900-000001000000}'
    state: present
```
Double-hop issue: If the installer is on a network share, use CredSSP or copy it locally first.
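The "copy it locally first" option could look roughly like this. The source path on the control node is a hypothetical placeholder, and C:\Windows\Temp is used only because it normally exists on a Windows host.

```yaml
- name: Copy the installer from the control node to the target
  win_copy:
    src: files/7z1900-x64.msi            # hypothetical path on the control node
    dest: C:\Windows\Temp\7z1900-x64.msi

- name: Install from the local copy, so no second hop is needed
  win_package:
    path: C:\Windows\Temp\7z1900-x64.msi
    state: present
```

Because the file travels over the existing WinRM connection, the target never has to authenticate to the file server.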
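Getting product_id:

```powershell
# Win32_Product is slow and triggers an MSI consistency check; read the uninstall registry keys instead
Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*', 'HKLM:\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*' -ErrorAction SilentlyContinue | Select-Object DisplayName, PSChildName
```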
Production insight: We once had an MSI that required a reboot. win_package doesn't handle reboots; you need to use win_reboot module after installation.
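A sketch of that pattern, assuming the reboot_required flag that win_package returns; the installer path and the variable name pkg_result are hypothetical.

```yaml
- name: Install the package
  win_package:
    path: C:\Windows\Temp\app.msi   # hypothetical installer path
    state: present
  register: pkg_result              # arbitrary variable name

- name: Reboot only when the installer asked for it
  win_reboot:
  when: pkg_result.reboot_required
```

Guarding win_reboot with the flag avoids restarting hosts where the package was already present.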
Add /L*v C:\install.log to arguments for MSI installations to debug failures.

For installers that support it, pass arguments: '/S /D=C:\Program Files\App' to get a silent install.

6. File Transfer with win_copy
win_copy copies files from the control node to the Windows host (or between paths on the Windows host itself when remote_src is set). Key parameters:
- src: local path
- dest: remote path
- remote_src: yes if src is on the remote host
Example:

```yaml
- name: Copy configuration file
  win_copy:
    src: ./app.config
    dest: C:\Program Files\App\app.config
```
Performance: win_copy sends file content over the WinRM connection, which is slow for large files. For those, consider using win_get_url or pulling from a file server.
Production gotcha: The destination folder must exist. Use win_file to create it first.
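A minimal sketch of creating the folder before copying into it, reusing the paths from the example above:

```yaml
- name: Make sure the destination folder exists
  win_file:
    path: C:\Program Files\App
    state: directory

- name: Copy the configuration file into it
  win_copy:
    src: ./app.config
    dest: C:\Program Files\App\app.config
```

Both tasks are idempotent, so rerunning the play changes nothing once the folder and file are in place.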
Copy from network share (requires CredSSP):

```yaml
- name: Copy from share
  win_copy:
    src: '\\fileserver\share\file.txt'
    dest: C:\temp\file.txt
    remote_src: yes
```
Production insight: We had a 2GB file that timed out. We split it into chunks and used multiple win_copy tasks.
Compare the checksum that win_copy returns against a known value to verify file integrity after the copy.

7. Running Commands: win_command vs win_shell
Both modules execute commands, but they differ in how the command is run.

win_command: Runs an executable directly, without going through a shell. Does not support PowerShell pipelines, redirection, or cmd.exe built-ins. Best for simple executables.

```yaml
- name: Run ipconfig
  win_command: ipconfig
```
win_shell: Runs via PowerShell. Supports pipelines, variables, and complex scripts.

```yaml
- name: Get services with status
  win_shell: Get-Service | Where-Object {$_.Status -eq 'Running'} | Select-Object Name
```
Key differences:
- Both return stdout as a string; with win_shell you can emit structured data by piping to ConvertTo-Json and parsing it in the playbook.
- win_shell has an executable parameter to choose the shell (e.g., powershell.exe, cmd.exe).
- win_command is faster for simple commands.
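To make the structured-data point concrete, here is one possible sketch that emits JSON from win_shell and parses it with the from_json filter. The variable names are arbitrary, and wrapping the pipeline in @( ) is a choice made here so the output is always a JSON array.

```yaml
- name: List running services as JSON
  win_shell: |
    $services = @(Get-Service |
      Where-Object { $_.Status -eq 'Running' } |
      Select-Object -Property Name, DisplayName)
    ConvertTo-Json -InputObject $services
  register: running_services       # arbitrary variable name
  changed_when: false              # read-only query, never a change

- name: Turn the JSON text into a list of service names
  set_fact:
    running_service_names: "{{ running_services.stdout | from_json | map(attribute='Name') | list }}"
```

After this, running_service_names is an ordinary list that later tasks can loop over or test with the in operator.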
Production insight: We used win_command to run a legacy batch file, and it failed because the script relied on shell features that win_command does not provide. Switching to win_shell with executable: cmd.exe fixed it.
If wide output is wrapped or truncated, pipe it through | Out-String -Width 4096 to avoid that.

8. PowerShell Scripts via win_shell
For complex automation, write PowerShell scripts and execute them with win_shell. Best practices:
- Use executable: powershell.exe explicitly.
- Set the WinRM timeouts (ansible_winrm_operation_timeout_sec and ansible_winrm_read_timeout_sec) high enough.
- Use $ErrorActionPreference = 'Stop' to catch errors.
- Return data as JSON for easy parsing.
Example playbook with inline script:

```yaml
- name: Run custom script
  win_shell: |
    $ErrorActionPreference = 'Stop'
    $result = Get-Service | Select-Object Name, Status
    ConvertTo-Json $result
  args:
    executable: powershell.exe
  register: script_output

- debug:
    var: script_output.stdout
```
External script file:

```yaml
- name: Run script from file
  win_shell: |
    C:\scripts\deploy.ps1 -Param1 value1
  args:
    executable: powershell.exe
```
Production insight: We had a script that took 10 minutes, and the default timeouts caused failures. Raising the WinRM timeouts to 900 seconds fixed it.
9. Configuring Windows Hosts for Ansible at Scale
For large-scale deployment, automate the WinRM configuration using Group Policy or DSC.
Group Policy settings:
- Computer Configuration -> Administrative Templates -> Windows Components -> Windows Remote Management (WinRM) -> WinRM Service
- Allow remote server management through WinRM: Enabled
- Allow unencrypted traffic: Disabled (use HTTPS)
- Windows Firewall: Allow inbound WinRM ports
PowerShell DSC for WinRM:

```powershell
Configuration WinRMConfig {
    Node localhost {
        Script EnableWinRM {
            SetScript = {
                Enable-PSRemoting -Force
                Set-Item WSMan:\localhost\Client\TrustedHosts -Value '*' -Force
            }
            TestScript = { (Get-Service WinRM).Status -eq 'Running' }
            GetScript = { @{Result = 'Check WinRM'} }
        }
    }
}
```
Ansible playbook to bootstrap Windows hosts:

```yaml
- hosts: all
  gather_facts: no
  tasks:
    - name: Enable WinRM
      win_shell: Enable-PSRemoting -Force
    - name: Set TrustedHosts
      win_shell: Set-Item WSMan:\localhost\Client\TrustedHosts -Value '*' -Force
    - name: Create HTTPS listener
      win_shell: |
        $cert = New-SelfSignedCertificate -DnsName $env:COMPUTERNAME -CertStoreLocation Cert:\LocalMachine\My
        New-WSManInstance -ResourceURI winrm/config/Listener -SelectorSet @{Address='*';Transport='HTTPS'} -ValueSet @{CertificateThumbprint=$cert.Thumbprint}
```
Production insight: We used Ansible to bootstrap itself: first run with NTLM over HTTP, then switch to Kerberos over HTTPS.
10. Common WinRM Pitfalls and How to Avoid Them
Here are the most frequent issues I've encountered:
1. WinRM service not running:

```powershell
Get-Service WinRM | Start-Service
Set-Service WinRM -StartupType Automatic
```
2. Firewall blocking ports: Check with Test-NetConnection -ComputerName target -Port 5986.
3. Certificate validation failure: Set ansible_winrm_server_cert_validation: ignore for self-signed certs.
4. Authentication failure due to time skew: Kerberos requires time within 5 minutes. Use NTP.
5. Double-hop access denied: Use CredSSP or copy files locally first.
6. Module not found (e.g., win_package): Ensure you have the latest ansible.windows collection: ansible-galaxy collection install ansible.windows.
7. PowerShell execution policy: Set to RemoteSigned:

```powershell
Set-ExecutionPolicy RemoteSigned -Scope LocalMachine -Force
```
Production insight: We once had a host where WinRM was configured but the user account was locked out. Always check account status.
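Several of these checks can be folded into a small preflight play. The sketch below assumes the hosts sit in a group called windows and listen on 5986; it tests the port from the control node first, then lets win_ping exercise authentication and module execution.

```yaml
- hosts: windows
  gather_facts: no
  tasks:
    - name: Check that the WinRM HTTPS port is reachable from the control node
      wait_for:
        # Look the address up explicitly so delegation does not change it
        host: "{{ hostvars[inventory_hostname].ansible_host | default(inventory_hostname) }}"
        port: 5986
        timeout: 10
      delegate_to: localhost
      vars:
        ansible_connection: local   # run the port check locally, not over WinRM

    - name: Confirm that authentication and module execution work
      win_ping:
```

A failure in the first task points at the network or firewall, while a failure in the second points at credentials, certificates, or the transport.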
11. Advanced: Using win_shell with PSSession for Complex Tasks
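For tasks that require multiple commands in a single session (e.g., changing directory, setting variables), you can use a PSSession within win_shell. Keep in mind that a multi-line win_shell script already runs in one PowerShell process, so variables and the working directory persist between its lines, and that a session opened from inside a WinRM connection is itself a second hop, which usually calls for CredSSP or become.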
Example:

```yaml
- name: Run multi-step task
  win_shell: |
    $session = New-PSSession -ComputerName localhost
    Invoke-Command -Session $session -ScriptBlock {
      Set-Location C:\Deploy
      .\deploy.ps1
    }
    Remove-PSSession $session
```
When to use: When you need to preserve state between commands (e.g., importing a module once, then using it).
Performance: Creating a PSSession adds overhead. Only use when necessary.
Production insight: We used this to run a script that required a specific working directory and environment variables.
12. Monitoring and Logging Ansible Windows Automation
Enable detailed logging for troubleshooting.
Ansible logging:

```bash
ANSIBLE_LOG_PATH=/var/log/ansible.log ansible-playbook playbook.yml -vvvv
```
WinRM logging on Windows:

```powershell
Set-Item WSMan:\localhost\MaxEnvelopeSizekb -Value 2048
Enable-WSManTrace
```
Collect WinRM logs:

```powershell
Get-WinEvent -LogName Microsoft-Windows-WinRM/Operational -MaxEvents 10
```
Integration with monitoring: Use win_shell to send custom metrics to your monitoring system.
Production insight: We set up a centralized log collector to capture all Ansible runs and WinRM errors. This helped us identify a pattern of failures related to memory pressure.
The Double-Hop CredSSP Nightmare
- Always test file copy from local paths first.
- If you need network shares, use CredSSP, or use win_copy with a local path, or copy from the share via PowerShell with explicit credentials (e.g., New-PSDrive with -Credential, then Copy-Item).
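The PowerShell route might be sketched as follows. The FileSystem provider accepts credentials on New-PSDrive rather than on Copy-Item, so this example maps a temporary drive first. The variables share_user and share_password, the drive name Share, and the environment variable names are all hypothetical.

```yaml
- name: Copy a file from a share using explicit credentials
  win_shell: |
    $ErrorActionPreference = 'Stop'
    # Build a credential object from values passed in through the environment
    $secure = ConvertTo-SecureString $env:SHARE_PASSWORD -AsPlainText -Force
    $cred = New-Object System.Management.Automation.PSCredential ($env:SHARE_USER, $secure)
    # Map the share for this session only, authenticating with the explicit credential
    New-PSDrive -Name Share -PSProvider FileSystem -Root '\\fileserver\share' -Credential $cred | Out-Null
    try {
      Copy-Item -Path 'Share:\file.txt' -Destination 'C:\temp\file.txt'
    } finally {
      Remove-PSDrive -Name Share
    }
  environment:
    SHARE_USER: "{{ share_user }}"           # hypothetical variable
    SHARE_PASSWORD: "{{ share_password }}"   # hypothetical variable, ideally vaulted
  no_log: true   # keep the credentials out of task output
```

Since the target authenticates to the file server itself with the supplied account, nothing has to be delegated from the control node.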
Quick checks: Test-WSMan -ComputerName target -Credential $cred tests the connection from a Windows client, and Get-ChildItem WSMan:\localhost\Listener lists the listeners configured on the host.
Common mistakes to avoid
Using win_command for PowerShell scripts
Not setting TrustedHosts
Using default timeouts for long-running tasks
Assuming win_copy works with UNC paths without CredSSP
Not enabling CredSSP on target for double-hop
Using HTTP with basic auth in production
Forgetting to install pywinrm on control node
Not using FQDN for Kerberos
Interview Questions on This Topic
What is the double-hop problem in WinRM and how do you solve it with Ansible?