Senior 4 min · April 11, 2026

Docker Daemon Socket Errors — Why Disk Full Crashes CI/CD

100% disk usage silently crashes dockerd and blocks all CI/CD builds.

Naren · Founder
Quick Answer
  • Error: 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?' means the Docker CLI cannot reach the daemon process
  • The daemon listens on a Unix socket at /var/run/docker.sock by default
  • Common causes: daemon not running, permission denied, socket missing, WSL2 issues
  • Fix: start the daemon, add user to docker group, or check socket permissions
  • On Linux, run: sudo systemctl start docker
  • Biggest mistake: running everything with sudo instead of fixing group permissions
✦ Definition~90s read
What is Docker Daemon Socket Errors?

The 'Cannot connect to the Docker daemon' error is the most common and infuriating connectivity failure in containerized development. It occurs when the Docker CLI client cannot reach the Docker daemon (dockerd) through its Unix socket, typically located at /var/run/docker.sock.

This socket is the primary IPC endpoint: a file-based communication channel that the CLI uses to send API requests to the daemon. When the disk holding the Docker data directory (/var/lib/docker by default) fills up, the daemon cannot write its internal state or create new containers, so it may stop answering on the socket or refuse new connections.

The result is a cryptic error that halts CI/CD pipelines, local builds, and any Docker operation dead in its tracks.

This error is fundamentally different from permission-denied issues (which produce 'permission denied' or 'dial unix /var/run/docker.sock: connect: permission denied'). A disk-full scenario manifests as a connection refused or timeout because the daemon has effectively stopped listening.
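
The distinction above can be checked without the Docker CLI. The sketch below is a minimal illustration, not part of Docker: the function name classify_socket_error and its return labels are hypothetical. It opens a plain Unix-socket connection and maps the operating-system error to a likely cause.

```python
import socket


def classify_socket_error(path: str = "/var/run/docker.sock",
                          timeout: float = 2.0) -> str:
    """Try to connect to a Unix socket and name the failure mode.

    The returned labels are illustrative, not Docker terminology.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "reachable"
    except FileNotFoundError:
        # No file at that path: daemon never started or socket was removed.
        return "socket_missing"
    except PermissionError:
        # Socket exists, but this user may not read/write it.
        return "permission_denied"
    except ConnectionRefusedError:
        # Socket file exists, but no process is accepting connections.
        return "nothing_listening"
    except socket.timeout:
        # Something holds the socket but does not accept in time.
        return "timeout"
    except OSError as exc:
        return f"other_errno_{exc.errno}"
    finally:
        sock.close()
```

Calling classify_socket_error() with no arguments checks the default path. A result of nothing_listening or timeout on a host where the socket file exists is the pattern worth following up with a disk-usage check.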

In production CI/CD environments like GitHub Actions, Jenkins, or GitLab CI, this is catastrophic — a full disk on the build runner means every subsequent job fails immediately, often cascading across teams. The root cause is almost always log files, dangling images, or build cache consuming the partition where /var/lib/docker or the socket directory resides.

You should not confuse this with Docker Desktop's socket behavior on macOS/Windows, where the host-side socket forwards requests to a daemon running inside a VM. On those platforms, disk-full errors manifest differently: the VM's virtual disk fills up, not the host filesystem that holds the socket.

The fix differs accordingly. In production, the only reliable prevention is aggressive log rotation, periodic docker system prune, and monitoring disk usage on the Docker data directory. Systemd-based systems add another layer of complexity: the docker.socket unit can be activated independently of docker.service, meaning the socket file exists and is listening, but the daemon itself may be dead or unresponsive due to disk pressure — a silent betrayal that makes debugging even harder.

Plain-English First

Docker has two parts: the command-line tool you type commands into, and a background service (the daemon) that actually does the work. This error means the command-line tool tried to talk to the background service, but the line was dead. Either the service is not running, your user does not have permission to pick up the phone, or the phone line (socket file) is missing entirely.

The error 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?' occurs when the Docker CLI cannot communicate with the Docker daemon process. The daemon is the background service that manages containers, images, networks, and volumes.

This error blocks all Docker operations — every docker command will fail until the connection is restored. The root cause varies across environments: the daemon may not be running, the user may lack socket permissions, or the socket file may be missing entirely. This guide covers every cause, the exact fix for each, and prevention strategies for production systems.

What Causes This Error?

The error 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?' occurs when the Docker CLI cannot establish a connection to the Docker daemon process through the Unix socket.

The Docker architecture has two components: the Docker CLI (the command you type) and the Docker daemon (the background service that manages containers). The CLI communicates with the daemon through a Unix socket at /var/run/docker.sock. When this socket is unavailable, inaccessible, or the daemon process is not running, every Docker command fails with this error.
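
To make the CLI-to-daemon channel concrete, here is a small sketch that speaks HTTP over the Unix socket and calls the Engine API's /_ping endpoint, which answers with the body OK when the daemon is healthy. The class name UnixHTTPConnection and the function ping_daemon are illustrative helpers written for this article, not part of any Docker library.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 3.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def ping_daemon(socket_path: str = "/var/run/docker.sock") -> bool:
    """Return True if something on the socket answers GET /_ping with OK."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/_ping")
        response = conn.getresponse()
        return response.status == 200 and response.read().strip() == b"OK"
    except (OSError, http.client.HTTPException):
        # Missing socket, permission denied, refused, timeout, or a
        # malformed reply all count as "not reachable" here.
        return False
    finally:
        conn.close()
```

A False result says only that the round trip failed. It does not say why, which is the same limitation the CLI error message has.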

There are five primary causes: the daemon is not running, the user lacks socket permissions, the socket file is missing, the disk is full, or the environment is misconfigured (WSL2, remote Docker hosts). Each cause requires a different fix.

io.thecodeforge.docker.daemon_diagnostics.pyPYTHON
import os
import subprocess
import json
from dataclasses import dataclass
from typing import Optional, Dict, List
from enum import Enum


class DaemonStatus(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"
    UNKNOWN = "unknown"
    NOT_INSTALLED = "not_installed"


class SocketStatus(Enum):
    EXISTS = "exists"
    MISSING = "missing"
    PERMISSION_DENIED = "permission_denied"
    WRONG_PATH = "wrong_path"


@dataclass
class DiagnosticResult:
    component: str
    status: str
    detail: str
    fix_command: Optional[str] = None
    severity: str = "info"


class DockerDaemonDiagnostics:
    """
    Diagnoses Docker daemon connection issues.
    """

    SOCKET_PATH = "/var/run/docker.sock"
    DAEMON_SERVICE = "docker"

    @staticmethod
    def check_daemon_installed() -> DiagnosticResult:
        """
        Check if Docker is installed.
        """
        try:
            result = subprocess.run(
                ["docker", "--version"],
                capture_output=True, text=True, timeout=5
            )
            if result.returncode == 0:
                return DiagnosticResult(
                    component="docker_installation",
                    status="installed",
                    detail=result.stdout.strip(),
                    severity="info",
                )
            return DiagnosticResult(
                component="docker_installation",
                status="error",
                detail="Docker command exists but returned an error",
                severity="warning",
            )
        except FileNotFoundError:
            return DiagnosticResult(
                component="docker_installation",
                status="not_installed",
                detail="Docker CLI not found in PATH",
                fix_command="Install Docker: https://docs.docker.com/engine/install/",
                severity="critical",
            )

    @staticmethod
    def check_daemon_running() -> DiagnosticResult:
        """
        Check if the Docker daemon process is running.
        """
        try:
            result = subprocess.run(
                ["systemctl", "is-active", "docker"],
                capture_output=True, text=True, timeout=5
            )
            status = result.stdout.strip()

            if status == "active":
                return DiagnosticResult(
                    component="daemon_process",
                    status="running",
                    detail="Docker daemon is active and running",
                    severity="info",
                )
            elif status == "inactive":
                return DiagnosticResult(
                    component="daemon_process",
                    status="stopped",
                    detail="Docker daemon is installed but not running",
                    fix_command="sudo systemctl start docker && sudo systemctl enable docker",
                    severity="critical",
                )
            elif status == "failed":
                return DiagnosticResult(
                    component="daemon_process",
                    status="failed",
                    detail="Docker daemon entered a failed state",
                    fix_command="sudo journalctl -u docker.service -n 50 --no-pager",
                    severity="critical",
                )
            else:
                return DiagnosticResult(
                    component="daemon_process",
                    status="unknown",
                    detail=f"Unexpected daemon status: {status}",
                    severity="warning",
                )
        except FileNotFoundError:
            return DiagnosticResult(
                component="daemon_process",
                status="unknown",
                detail="systemctl not found — possibly macOS or non-systemd Linux",
                severity="info",
            )

    @staticmethod
    def check_socket() -> DiagnosticResult:
        """
        Check if the Docker socket exists and is accessible.
        """
        socket_path = DockerDaemonDiagnostics.SOCKET_PATH

        if not os.path.exists(socket_path):
            return DiagnosticResult(
                component="socket_file",
                status="missing",
                detail=f"Socket file does not exist at {socket_path}",
                fix_command="sudo systemctl restart docker",
                severity="critical",
            )

        if not os.access(socket_path, os.R_OK | os.W_OK):
            stat_info = os.stat(socket_path)
            import grp
            try:
                group_name = grp.getgrgid(stat_info.st_gid).gr_name
            except KeyError:
                group_name = str(stat_info.st_gid)

            return DiagnosticResult(
                component="socket_permissions",
                status="permission_denied",
                detail=f"Socket exists but current user lacks read/write permissions. Socket group: {group_name}",
                fix_command=f"sudo usermod -aG {group_name} $USER && newgrp {group_name}",
                severity="critical",
            )

        return DiagnosticResult(
            component="socket_file",
            status="exists",
            detail=f"Socket file exists and is accessible at {socket_path}",
            severity="info",
        )

    @staticmethod
    def check_disk_space() -> DiagnosticResult:
        """
        Check disk space on the Docker data directory.
        """
        docker_root = "/var/lib/docker"

        if not os.path.exists(docker_root):
            return DiagnosticResult(
                component="disk_space",
                status="unknown",
                detail=f"Docker data directory not found at {docker_root}",
                severity="warning",
            )

        stat = os.statvfs(docker_root)
        free_gb = (stat.f_bavail * stat.f_frsize) / (1024 ** 3)
        total_gb = (stat.f_blocks * stat.f_frsize) / (1024 ** 3)
        used_pct = ((total_gb - free_gb) / total_gb) * 100

        if used_pct > 95:
            return DiagnosticResult(
                component="disk_space",
                status="critical",
                detail=f"Disk usage at {used_pct:.1f}% — {free_gb:.1f}GB free of {total_gb:.1f}GB",
                fix_command="docker system prune -af --volumes",
                severity="critical",
            )
        elif used_pct > 80:
            return DiagnosticResult(
                component="disk_space",
                status="warning",
                detail=f"Disk usage at {used_pct:.1f}% — {free_gb:.1f}GB free of {total_gb:.1f}GB",
                fix_command="docker system prune -af --filter 'until=168h'",
                severity="warning",
            )

        return DiagnosticResult(
            component="disk_space",
            status="healthy",
            detail=f"Disk usage at {used_pct:.1f}% — {free_gb:.1f}GB free of {total_gb:.1f}GB",
            severity="info",
        )

    @staticmethod
    def check_user_group() -> DiagnosticResult:
        """
        Check if the current user is in the docker group.
        """
        try:
            result = subprocess.run(
                ["groups"],
                capture_output=True, text=True, timeout=5
            )
            groups = result.stdout.strip().split()

            if "docker" in groups:
                return DiagnosticResult(
                    component="user_group",
                    status="member",
                    detail="Current user is in the docker group",
                    severity="info",
                )
            else:
                return DiagnosticResult(
                    component="user_group",
                    status="not_member",
                    detail=f"Current user ({os.getlogin()}) is not in the docker group. Groups: {', '.join(groups)}",
                    fix_command="sudo usermod -aG docker $USER && newgrp docker",
                    severity="critical",
                )
        except Exception as e:
            return DiagnosticResult(
                component="user_group",
                status="unknown",
                detail=f"Could not check group membership: {e}",
                severity="warning",
            )

    @staticmethod
    def run_full_diagnosis() -> List[DiagnosticResult]:
        """
        Run all diagnostic checks and return results.
        """
        checks = [
            DockerDaemonDiagnostics.check_daemon_installed(),
            DockerDaemonDiagnostics.check_daemon_running(),
            DockerDaemonDiagnostics.check_socket(),
            DockerDaemonDiagnostics.check_disk_space(),
            DockerDaemonDiagnostics.check_user_group(),
        ]
        return checks

    @staticmethod
    def print_report(results: List[DiagnosticResult]) -> None:
        """
        Print a formatted diagnostic report.
        """
        severity_icons = {
            "critical": "[FAIL]",
            "warning": "[WARN]",
            "info": "[ OK ]",
        }

        print("\nDocker Daemon Diagnostic Report")
        print("=" * 60)

        for r in results:
            icon = severity_icons.get(r.severity, "[????]")
            print(f"\n{icon} {r.component}: {r.status}")
            print(f"    {r.detail}")
            if r.fix_command:
                print(f"    Fix: {r.fix_command}")

        critical = [r for r in results if r.severity == "critical"]
        if critical:
            print(f"\n{'=' * 60}")
            print(f"ACTION REQUIRED: {len(critical)} critical issue(s) found.")
            print("Run the fix commands above to resolve.")


# Example usage
if __name__ == "__main__":
    results = DockerDaemonDiagnostics.run_full_diagnosis()
    DockerDaemonDiagnostics.print_report(results)
Docker Architecture: CLI vs Daemon
  • Docker CLI = the command you type (docker run, docker ps, etc.)
  • Docker daemon = the background service (dockerd) that does the actual work
  • Unix socket = the communication channel between CLI and daemon at /var/run/docker.sock
  • If the socket is missing, broken, or permission-denied, all docker commands fail
  • The daemon is a system service managed by systemd on Linux or Docker Desktop on macOS
Production Insight
The error message does not tell you WHY the connection failed.
It only tells you the connection failed — diagnosis requires checking five components.
Rule: run the full diagnostic (daemon, socket, permissions, disk, group) before guessing.
Key Takeaway
The error means the CLI cannot reach the daemon through the Unix socket.
Five components to check: installation, daemon status, socket, permissions, disk.
Run a full diagnosis before applying fixes — do not guess.
Docker Daemon Connection Troubleshooting
If: docker --version fails with command not found
Use: Docker is not installed. Install Docker Engine or Docker Desktop.
If: systemctl status docker shows inactive or failed
Use: Start the daemon: sudo systemctl start docker. Check logs if it fails.
If: Socket file /var/run/docker.sock does not exist
Use: Daemon is running but the socket is missing. Restart the daemon: sudo systemctl restart docker
If: Permission denied on docker.sock
Use: Add user to docker group: sudo usermod -aG docker $USER then log out and back in
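If: Disk usage above 95% on /var/lib/docker
Use: Clean up with docker system prune -af --volumes (this also deletes unused volumes), then restart the daemon. If the daemon is already down, prune cannot run: free space outside Docker first, for example old logs, then restart and prune.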

Fixing on Linux

Linux is the most common environment for this error because Docker runs as a systemd service and socket permissions depend on group membership. The fix depends on whether the daemon is running, the socket exists, and the user has the correct permissions.

The most frequent cause on Linux is that the Docker daemon service is not running — either it was never started, it crashed, or it was not enabled to start on boot. The second most common cause is permission denied on the socket, which happens when the user is not in the docker group.

io.thecodeforge.docker.fix_linux.shBASH
#!/bin/bash
# ============================================
# Docker Daemon Connection Fix for Linux
# Run these commands in order until the error is resolved
# ============================================

set -e

# Step 1: Check if Docker is installed
if ! command -v docker &> /dev/null; then
    echo "Docker is not installed."
    echo "Install with: curl -fsSL https://get.docker.com | sh"
    exit 1
fi

echo "Docker version: $(docker --version)"

# Step 2: Check daemon status
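printf '\nChecking daemon status...\n'
# "systemctl status" exits non-zero when the unit is inactive; "|| true"
# keeps "set -e" from aborting before Step 3 can start the daemon.
sudo systemctl status docker --no-pager || true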

# Step 3: Start daemon if not running
if ! systemctl is-active --quiet docker; then
    echo "Daemon is not running. Starting..."
    sudo systemctl start docker
    sudo systemctl enable docker
    echo "Daemon started and enabled on boot."
fi

# Step 4: Check socket file
if [ ! -S /var/run/docker.sock ]; then
    echo "Socket file missing. Restarting daemon..."
    sudo systemctl restart docker
    sleep 2
fi

# Step 5: Check user group membership
if ! groups | grep -qw docker; then
    echo "Current user is not in the docker group."
    echo "Adding user to docker group..."
    sudo usermod -aG docker "$USER"
    echo "Run 'newgrp docker' or log out and back in for changes to take effect."
    # newgrp is not called here: it would start a new interactive shell
    # and stall the remaining steps of this script.
fi

# Step 6: Check disk space
DOCKER_ROOT="/var/lib/docker"
if [ -d "$DOCKER_ROOT" ]; then
    USAGE=$(df "$DOCKER_ROOT" | tail -1 | awk '{print $5}' | tr -d '%')
    printf '\nDisk usage on %s: %s%%\n' "$DOCKER_ROOT" "$USAGE"
    if [ "$USAGE" -gt 90 ]; then
        echo "WARNING: Disk usage above 90%. Consider running:"
        echo "  docker system prune -af --volumes"
    fi
fi

# Step 7: Verify connection
printf '\nVerifying Docker connection...\n'
if docker info &> /dev/null; then
    echo "SUCCESS: Docker daemon is accessible."
    docker info --format 'Server Version: {{.ServerVersion}}'
else
    echo "FAILED: Still cannot connect to Docker daemon."
    echo "Check logs: sudo journalctl -u docker.service -n 50"
    exit 1
fi
Linux-Specific Pitfalls
  • Running docker with sudo every time is a security risk — fix group permissions instead
  • Group changes require a new login session — newgrp docker or log out and back in
  • SELinux or AppArmor can block socket access even with correct group membership
  • Snap-installed Docker uses a different socket path — check /var/run/snap.docker.socket
  • systemd socket activation means the daemon starts on first docker command — check socket unit
Production Insight
Production Linux servers should have Docker enabled on boot.
Without systemctl enable docker, the daemon does not start again after a reboot.
Rule: always run systemctl enable docker on production servers.
Key Takeaway
On Linux, check daemon status first, then socket, then permissions, then disk.
Add users to the docker group instead of running everything with sudo.
Enable Docker on boot with systemctl enable docker for production servers.
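
The "group changes require a new login session" pitfall is easy to miss, because the group database and the running shell can disagree. The following sketch compares the two. It is a rough illustration: the function names and the returned labels are made up for this article.

```python
import grp
import os
import pwd


def classify_group_state(in_group_db: bool, in_process_groups: bool) -> str:
    """Combine the two views of group membership into one label."""
    if in_process_groups:
        return "ok"
    if in_group_db:
        # usermod succeeded, but this session predates the change.
        return "stale_session"
    return "not_member"


def docker_group_state(group: str = "docker") -> str:
    """Check the current user against the group database and this process."""
    try:
        entry = grp.getgrnam(group)
    except KeyError:
        return "group_missing"
    account = pwd.getpwuid(os.getuid())
    in_db = account.pw_name in entry.gr_mem or account.pw_gid == entry.gr_gid
    in_process = entry.gr_gid in os.getgroups() or os.getegid() == entry.gr_gid
    return classify_group_state(in_db, in_process)
```

A stale_session result means the usermod step already worked and only a fresh login (or newgrp docker in an interactive shell) is missing.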

Fixing on macOS and Windows (WSL2)

On macOS and Windows, Docker runs inside a lightweight VM managed by Docker Desktop, so the daemon is reachable only while Docker Desktop is running. On macOS, the socket path is ~/.docker/run/docker.sock (not /var/run/docker.sock). Docker Desktop symlinks this to the standard location. On Windows with WSL2, the socket is provided by Docker Desktop's WSL integration: if Docker Desktop is not running on Windows, the WSL2 environment cannot connect.

io.thecodeforge.docker.fix_macos_wsl.shBASH
#!/bin/bash
# ============================================
# Docker Daemon Connection Fix for macOS and WSL2
# ============================================

# ---- macOS Fix ----
fix_macos() {
    echo "Checking Docker Desktop on macOS..."

    # Check if Docker Desktop is running
    if ! pgrep -x "Docker Desktop" > /dev/null; then
        echo "Docker Desktop is not running. Starting..."
        open -a "Docker Desktop"
        echo "Waiting for Docker Desktop to start (up to 60 seconds)..."

        for i in $(seq 1 60); do
            if docker info &> /dev/null; then
                echo "Docker Desktop is ready."
                return 0
            fi
            sleep 1
        done

        echo "Docker Desktop failed to start within 60 seconds."
        echo "Try: killall 'Docker Desktop' && open -a 'Docker Desktop'"
        return 1
    fi

    # Check socket symlink
    if [ ! -S /var/run/docker.sock ]; then
        echo "Socket symlink missing. Restarting Docker Desktop..."
        osascript -e 'quit app "Docker Desktop"'
        sleep 5
        open -a "Docker Desktop"
        sleep 30
    fi

    # Verify
    if docker info &> /dev/null; then
        echo "Docker connection OK."
    else
        echo "Still cannot connect. Try:"
        echo "  1. Docker Desktop > Troubleshoot > Clean / Purge data"
        echo "  2. Docker Desktop > Restart"
        echo "  3. rm -rf ~/.docker/run/docker.sock && restart Docker Desktop"
    fi
}

# ---- WSL2 Fix ----
fix_wsl2() {
    echo "Checking Docker in WSL2..."

    # Check if we are in WSL
    if ! grep -qi microsoft /proc/version 2>/dev/null; then
        echo "Not running in WSL. Use macOS fix or native Linux fix."
        return 1
    fi

    # Check if Docker Desktop is running on Windows
    echo "Ensure Docker Desktop is running on Windows."
    echo "Check: Docker Desktop > Settings > Resources > WSL Integration"
    echo "Enable integration for your WSL2 distribution."

    # Restart WSL
    printf '\nRestarting WSL to refresh connection...\n'
    echo "Run from Windows PowerShell: wsl --shutdown"
    echo "Then reopen your WSL terminal."

    # Check socket
    if [ -S /var/run/docker.sock ]; then
        echo "Socket exists."
    else
        echo "Socket missing. Docker Desktop WSL integration may be disabled."
        echo "Open Docker Desktop > Settings > Resources > WSL Integration"
        echo "Toggle your distro off and on again."
    fi
}

# Main
case "$(uname -s)" in
    Darwin*) fix_macos ;;
    Linux*)
        if grep -qi microsoft /proc/version 2>/dev/null; then
            fix_wsl2
        else
            echo "Use the Linux fix script instead."
        fi
        ;;
    *) echo "Unsupported OS: $(uname -s)" ;;
esac
macOS and WSL2 Tips
  • Docker Desktop must be running — it manages the VM that hosts the Docker daemon
  • On macOS, the socket is at ~/.docker/run/docker.sock — Docker Desktop symlinks it
  • On WSL2, Docker Desktop provides the daemon — do not install Docker Engine inside WSL2
  • If WSL integration breaks, toggle it off and on in Docker Desktop Settings
  • Docker Desktop uses significant RAM (2-4GB) — check if the VM has enough memory
Production Insight
Docker Desktop auto-start can be disabled by macOS updates or user settings.
Always verify Docker Desktop is running after OS updates or reboots.
Rule: add Docker Desktop to Login Items on macOS for reliable auto-start.
Key Takeaway
On macOS, Docker Desktop must be running — it manages the daemon VM.
On WSL2, Docker Desktop on Windows provides the daemon — do not install Docker inside WSL2.
Restart Docker Desktop or toggle WSL integration when connections break.
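
Because the socket can live in more than one place on Docker Desktop systems, it helps to list where to look before restarting anything. The sketch below is an assumption-laden illustration: it honors the DOCKER_HOST environment variable, which the Docker CLI reads, and otherwise falls back to the two paths named in this section. It ignores Docker contexts, and the search order and function names are hypothetical choices, not the CLI's own logic.

```python
import os
from typing import List, Mapping, Optional


def candidate_sockets(env: Mapping[str, str], home: str) -> List[str]:
    """Return local socket paths worth checking, most specific first."""
    host = env.get("DOCKER_HOST", "")
    if host.startswith("unix://"):
        return [host[len("unix://"):]]
    if host:
        # tcp:// or ssh:// endpoints have no local socket file to inspect.
        return []
    return [
        os.path.join(home, ".docker/run/docker.sock"),  # Docker Desktop on macOS
        "/var/run/docker.sock",                         # standard location
    ]


def first_existing_socket(paths: List[str]) -> Optional[str]:
    """Return the first path that exists on disk, or None."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None
```

Typical usage would be first_existing_socket(candidate_sockets(os.environ, os.path.expanduser("~"))). A None result on macOS usually points back to Docker Desktop not running.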

Preventing This Error in Production

Preventing Docker daemon connection errors in production requires proactive monitoring, automatic recovery, and proper system configuration. Reactive fixes are unacceptable when container orchestration depends on daemon availability.

Four prevention strategies are covered here: a systemd watchdog and restart policy for automatic daemon recovery, scheduled pruning, disk usage alerting, and socket permission enforcement across deployments. Together they let the daemon recover from crashes without manual intervention and surface disk pressure before it becomes an outage.

io.thecodeforge.docker.prevention.shBASH
#!/bin/bash
# ============================================
# Docker Daemon Prevention Strategies
# Run once on every production Docker host
# ============================================

set -e

# ---- Strategy 1: Systemd Watchdog ----
# Automatically restart Docker daemon if it becomes unresponsive

setup_watchdog() {
    echo "Setting up systemd watchdog for Docker daemon..."

    # Create override directory
    sudo mkdir -p /etc/systemd/system/docker.service.d

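    # Create watchdog override.
    # Note: WatchdogSec only helps if the service sends sd_notify keep-alives;
    # systemd terminates a service that misses them. Confirm that your dockerd
    # build does this before keeping the WatchdogSec line. Restart=always on
    # its own already covers crashes.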
    sudo tee /etc/systemd/system/docker.service.d/watchdog.conf > /dev/null << 'EOF'
[Service]
WatchdogSec=60
Restart=always
RestartSec=5
EOF

    # Create health check script (systemd does not call it automatically;
    # run it from cron or a monitoring agent)
    sudo tee /usr/local/bin/docker-healthcheck.sh > /dev/null << 'EOF'
#!/bin/bash
if docker info &> /dev/null; then
    exit 0
else
    exit 1
fi
EOF

    sudo chmod +x /usr/local/bin/docker-healthcheck.sh

    # Reload systemd
    sudo systemctl daemon-reload
    sudo systemctl restart docker

    echo "Watchdog configured. Daemon will restart if unresponsive for 60 seconds."
}

# ---- Strategy 2: Automatic Disk Pruning ----
# Prevent disk exhaustion that crashes the daemon

setup_auto_prune() {
    echo "Setting up automatic Docker disk pruning..."

    # Create pruning script
    sudo tee /usr/local/bin/docker-prune.sh > /dev/null << 'EOF'
#!/bin/bash
# Remove images older than 7 days
# Keep running containers and their images

LOGFILE="/var/log/docker-prune.log"

{
    echo "=== Docker Prune: $(date) ==="
    echo "Before:"
    docker system df

    # Prune stopped containers, unused networks, dangling images
    docker system prune -f --filter "until=168h"

    # Prune unused images (not just dangling)
    docker image prune -af --filter "until=168h"

    # Prune build cache older than 7 days
    docker builder prune -f --filter "until=168h"

    echo "After:"
    docker system df
    echo "=== Done ==="
} >> "$LOGFILE" 2>&1
EOF

    sudo chmod +x /usr/local/bin/docker-prune.sh

    # Add cron job: run every Sunday at 3 AM
    CRON_JOB="0 3 * * 0 /usr/local/bin/docker-prune.sh"
    (crontab -l 2>/dev/null | grep -v docker-prune; echo "$CRON_JOB") | crontab -

    echo "Auto-prune configured. Runs every Sunday at 3 AM."
}

# ---- Strategy 3: Disk Usage Alerting ----
# Alert before disk reaches critical levels

setup_disk_alerts() {
    echo "Setting up disk usage alerts..."

    sudo tee /usr/local/bin/docker-disk-alert.sh > /dev/null << 'EOF'
#!/bin/bash
DOCKER_ROOT="/var/lib/docker"
THRESHOLD=85

USAGE=$(df "$DOCKER_ROOT" | tail -1 | awk '{print $5}' | tr -d '%')

if [ "$USAGE" -ge "$THRESHOLD" ]; then
    echo "WARNING: Docker disk usage at ${USAGE}% (threshold: ${THRESHOLD}%)"
    echo "Running containers:"
    docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Size}}'
    printf '\nDisk breakdown:\n'
    docker system df

    # Send alert (customize for your alerting system)
    # curl -X POST https://alerts.example.com/webhook -d "{\"text\": \"Docker disk at ${USAGE}%\"}"
fi
EOF

    sudo chmod +x /usr/local/bin/docker-disk-alert.sh

    # Add cron job: check every 15 minutes
    CRON_JOB="*/15 * * * * /usr/local/bin/docker-disk-alert.sh"
    (crontab -l 2>/dev/null | grep -v docker-disk-alert; echo "$CRON_JOB") | crontab -

    echo "Disk alerting configured. Checks every 15 minutes at 85% threshold."
}

# ---- Strategy 4: Socket Permission Enforcement ----
# Ensure socket permissions survive Docker upgrades

setup_socket_permissions() {
    echo "Enforcing socket permissions..."

    # Ensure docker group exists
    sudo groupadd -f docker

    # Set socket group and log rotation.
    # Caution: this overwrites any existing /etc/docker/daemon.json.
    sudo tee /etc/docker/daemon.json > /dev/null << 'EOF'
{
  "group": "docker",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2"
}
EOF

    sudo systemctl restart docker

    echo "Socket permissions enforced via daemon.json."
}

# Main
setup_watchdog
setup_auto_prune
setup_disk_alerts
setup_socket_permissions

printf '\nAll prevention strategies configured.\n'
echo "Docker daemon will now:"
echo "  - Auto-restart if unresponsive (60s watchdog)"
echo "  - Auto-prune old resources (weekly)"
echo "  - Alert at 85% disk usage (every 15 min)"
echo "  - Maintain socket permissions across restarts"
Production Docker Daemon Reliability
  • systemd watchdog restarts the daemon automatically if it becomes unresponsive
  • Automatic pruning prevents disk exhaustion — the #1 cause of daemon crashes
  • Disk alerts give you time to act before the daemon crashes
  • Socket permissions in daemon.json survive Docker upgrades and reboots
  • For zero-downtime, run Docker across multiple hosts with orchestration (Swarm, Kubernetes)
Production Insight
Docker daemon crashes are silent — no alert, no graceful degradation.
Without watchdog, a crashed daemon requires manual detection and restart.
Rule: configure systemd watchdog and disk monitoring on every Docker host.
Key Takeaway
Prevention requires three strategies: watchdog, pruning, and alerting.
The daemon is a single point of failure — monitor it like any critical service.
Disk exhaustion is the #1 cause of daemon crashes in production.
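
The log rotation settings in daemon.json put a rough ceiling on how much disk container logs can take. As a worked example, the sketch below multiplies max-size by max-file by the number of containers. It assumes the size suffixes k, m and g are binary multiples (1024-based), and the helper names parse_size and max_log_bytes are made up for illustration.

```python
import re

# Assumption: suffixes are binary multiples (1 k = 1024 bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a value such as '10m' into a byte count."""
    match = re.fullmatch(r"(\d+)([kmg]?)b?", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def max_log_bytes(max_size: str, max_file: int, containers: int) -> int:
    """Approximate upper bound on json-file log usage across containers."""
    return parse_size(max_size) * max_file * containers


if __name__ == "__main__":
    total = max_log_bytes("10m", 3, 50)
    print(f"{total / 1024 ** 2:.0f} MiB")  # 10 MiB x 3 files x 50 containers
```

With the values from the script above and 50 containers, the bound works out to 1500 MiB. Without max-size, a single noisy container has no bound at all, which is how logs end up filling the partition.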

The Socket Showdown: Docker Group vs. Root Access

This isn't a permission bug. It's a design choice. Docker's daemon listens on a Unix socket owned by root. By default, only root or members of the docker group can talk to it. That's the entire root cause.

Most tutorials skip the WHY: Docker does this to prevent any random user from spinning up containers that can escape and pwn your host. If you can run docker run --privileged, you can effectively become root inside the container. So the group membership is a security boundary, not a chmod oversight.

On Linux, the fix is dead simple: create the group if it doesn't exist, add your user, restart the Docker daemon, then log out and back in. But here's the nuance: /var/run/docker.sock must exist and have 660 permissions. If someone manually tightened permissions or if systemd's socket activation is borked, you'll still get the error. Check with ls -la /var/run/docker.sock. The socket is the gate. Everything else is noise.

DockerSocketAudit.yml · YAML
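# io.thecodeforge - devops tutorial

# Verify socket exists and permissions are correct
- name: Check Docker socket permissions
  command: ls -la /var/run/docker.sock
  register: socket_info
  changed_when: false   # read-only check, never reports a change
  failed_when: false    # a missing socket is reported by the next task instead

# Note: on SELinux or ACL-enabled hosts, ls appends '.' or '+' to the mode
# string, so adjust the match below on those systems.
- name: Fail if socket missing or wrong ownership
  fail:
    msg: "Docker socket is absent or owned by wrong group."
  when: socket_info.stdout.find('srw-rw---- 1 root docker') == -1

# Expected output:
# srw-rw---- 1 root docker 0 Jan 15 09:42 /var/run/docker.sock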
Output
srw-rw---- 1 root docker 0 Jan 15 09:42 /var/run/docker.sock
Production Trap:
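If you hit this error in a CI/CD pipeline, never add the CI user to the docker group. Socket access is root-equivalent, so any job that runs untrusted code gets a path to privilege escalation on the build host. Instead, run Docker commands over TCP with TLS client certificates, or isolate builds in a sidecar such as Docker-in-Docker (DinD). Keep in mind that DinD normally runs as a privileged container itself, so it moves the trust boundary rather than removing it.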
Key Takeaway
The Docker socket is root-only by design; group membership is the approved backdoor, but never use it in automation.
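
For a quick check outside Ansible, the same audit can be sketched in plain shell. This is a minimal sketch, not a hardened tool: it assumes GNU stat (the -c flag), and the function names evaluate_socket_perms and check_docker_socket are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: audit the Docker socket's type, mode, and group (assumes GNU stat).

# Pure helper: judge a mode/group pair such as "660" and "docker".
evaluate_socket_perms() {
  local mode="$1" group="$2"
  if [ "$mode" != "660" ]; then
    echo "unexpected mode $mode (expected 660)"
    return 1
  fi
  if [ "$group" != "docker" ]; then
    echo "unexpected group $group (expected docker)"
    return 1
  fi
  echo "ok"
}

# Wrapper: inspect a real path; defaults to the standard socket location.
check_docker_socket() {
  local sock="${1:-/var/run/docker.sock}" mode group
  if [ ! -S "$sock" ]; then
    echo "missing: $sock is not a socket (is dockerd running?)"
    return 1
  fi
  mode="$(stat -c '%a' "$sock")" || return 1
  group="$(stat -c '%G' "$sock")" || return 1
  evaluate_socket_perms "$mode" "$group"
}

# Report on this host without aborting the script when the check fails.
check_docker_socket || true
```

Splitting the judgement from the stat call keeps the rule (660, group docker) easy to test and easy to change if your hosts use a different socket group.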

Systemd's Silent Betrayal: Service vs. Socket Units

When you run systemctl start docker, you assume the daemon starts and the socket appears. But Docker's packages ship two systemd units: docker.service and docker.socket. With the upstream unit files, dockerd is launched with -H fd://, which means systemd creates /var/run/docker.sock through docker.socket and hands it to the daemon; the socket unit can also start the service on demand. Normally docker.service pulls docker.socket in automatically. If the socket unit is masked, failed, or overridden by a custom drop-in, the daemon has no socket to listen on and you get the "Cannot connect to the Docker daemon" error even though Docker is installed. This is most often reported on Fedora, CentOS, and Arch. The fix: systemctl enable --now docker.socket.

But here's the real WTF: the two units can drift out of sync, for example after the service crashes while the socket unit stays up. Check both. Run systemctl status docker.socket and systemctl status docker.service. If the socket is active but the service is failed, stop both units, then start them in the right order: socket first, then service. If the service fails again, journalctl -u docker.service has the reason. The usual quick-start commands only mention systemctl start docker, so the socket unit is easy to overlook. Don't be that dev who spends hours debugging only to find an inactive socket unit.

SystemdDockerFix.yml · YAML
# io.thecodeforge - devops tutorial

# Socket unit first, so systemd owns /var/run/docker.sock before the daemon starts
- name: Ensure Docker socket unit is active and enabled
  systemd:
    name: docker.socket
    state: started
    enabled: yes

- name: Restart Docker service cleanly
  systemd:
    name: docker.service
    state: restarted
    daemon_reload: yes

# Fails the play if the CLI still cannot reach the daemon
- name: Test connection
  command: docker version
  changed_when: false

# Expected output (abbreviated):
# Client:
#  Version:           24.0.7
# Server:
#  Engine:
#   Version:          24.0.7
Output
Client:
 Version:           24.0.7
Server:
 Engine:
  Version:          24.0.7
Senior Shortcut:
Don't stop at systemctl start docker. Run systemctl enable --now docker.socket first, then start the service. With the socket unit enabled, systemd owns the socket from boot and can start the daemon on the first connection, which keeps the two units from drifting out of sync.
Key Takeaway
Docker has two systemd units; ensure both are active and started in the right order — socket before service.
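
The "check both units" advice can be sketched as a small shell helper. This is an illustrative sketch under assumptions: diagnose_docker_units and unit_state are hypothetical names, and the messages are suggestions rather than anything systemd prints. The only real command used is systemctl is-active, which prints a state such as active, inactive, or failed.

```shell
#!/usr/bin/env bash
# Sketch: turn the states of docker.socket and docker.service into one diagnosis.

# Pure helper: arguments are the strings printed by `systemctl is-active`.
diagnose_docker_units() {
  local socket_state="$1" service_state="$2"
  case "$socket_state/$service_state" in
    active/active)
      echo "healthy: socket and service are both active" ;;
    active/*)
      echo "socket active, service $service_state: check journalctl -u docker.service" ;;
    */active)
      echo "service active, socket $socket_state: check the daemon's -H flags or hosts in daemon.json" ;;
    *)
      echo "both down: start docker.socket, then docker.service" ;;
  esac
}

# Wrapper: ask systemd for a unit's state; "unknown" when systemctl is unavailable.
unit_state() {
  local state
  state="$(systemctl is-active "$1" 2>/dev/null)" || true
  echo "${state:-unknown}"
}

diagnose_docker_units "$(unit_state docker.socket)" "$(unit_state docker.service)"
```

The third case matters on hosts where the daemon was reconfigured to listen somewhere other than the systemd-provided socket: the service is healthy, but the CLI's default endpoint is not where it listens.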

WSL2's Dual-Headed Docker: Windows vs. Linux Context

On Windows with WSL2, you can end up with two Docker daemons. One runs inside the Docker Desktop VM. Another may be running inside your WSL2 distro if you installed Docker Engine there manually. The error "Cannot connect to the Docker daemon" appears when your WSL2 terminal talks to the wrong daemon, or to none at all. The fix: decide which one you want.

If you use Docker Desktop, it makes its daemon available at /var/run/docker.sock inside the WSL2 distro. This works out of the box, but only if Docker Desktop's WSL2 integration is enabled for your specific distro. Open Docker Desktop → Settings → Resources → WSL Integration. Toggle your distro on. If it's off, the socket is missing or points to nothing.

If you installed Docker Engine inside WSL2, you have a separate daemon that needs its own systemd or init.d management and competes for the same socket path. Real senior move: use Docker Desktop's integration and never install Docker Engine inside WSL2. It's a maintenance nightmare. But if you must run both, give the second daemon its own endpoint and point the CLI at it with DOCKER_HOST, for example tcp://localhost:2375. Port 2375 is unencrypted and unauthenticated, so bind it to localhost only. Either way, verify with docker volume ls: if it hangs or errors, your socket is dead.

WSL2DockerCheck.yml · YAML
# io.thecodeforge - devops tutorial

# Check which socket is being used
- name: Inspect Docker socket in WSL2
  command: ls -la /var/run/docker.sock
  register: socket_wsl
  changed_when: false
  failed_when: false   # a missing socket is a finding here, not a play failure

- name: Verify Docker Desktop integration
  command: docker context ls
  register: context_list
  changed_when: false

# The default context uses DOCKER_HOST, or /var/run/docker.sock when it is unset
- name: Set context to default if using Docker Desktop
  command: docker context use default
  when: context_list.stdout.find('desktop-linux') != -1

# Expected output of docker context ls:
# NAME                TYPE                DESCRIPTION
# default             moby                Current DOCKER_HOST based configuration
# desktop-linux *     moby                Docker Desktop
Output
NAME                TYPE                DESCRIPTION
default             moby                Current DOCKER_HOST based configuration
desktop-linux *     moby                Docker Desktop
WTF Moment:
If you get the error but docker context show returns default, check whether Docker Desktop is actually running. On WSL2, Docker Desktop can be installed but not started, in which case the socket file may exist with nothing listening behind it. Start Docker Desktop from the Windows Start menu and wait until the engine reports that it is running.
Key Takeaway
On WSL2, never mix Docker Desktop and manual Docker Engine installations; use one socket or the other, not both.
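
Before blaming the daemon on WSL2, it helps to confirm where the shell is running and which endpoint the CLI will use. The sketch below is a rough aid, not a complete resolver: is_wsl_kernel and effective_docker_endpoint are hypothetical helper names, the WSL test relies on the kernel version string containing "microsoft", and an active non-default docker context is deliberately not handled.

```shell
#!/usr/bin/env bash
# Sketch: report whether this shell is inside WSL and which endpoint the CLI falls back to.

# Pure helper: WSL kernels identify themselves with "microsoft" in the version string.
is_wsl_kernel() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *microsoft*) return 0 ;;
    *)           return 1 ;;
  esac
}

# DOCKER_HOST wins when set; otherwise the CLI uses the default Unix socket.
# (A non-default `docker context` can also change the endpoint; not handled here.)
effective_docker_endpoint() {
  echo "${DOCKER_HOST:-unix:///var/run/docker.sock}"
}

kernel="$(cat /proc/version 2>/dev/null || true)"
if is_wsl_kernel "$kernel"; then
  echo "WSL detected; CLI endpoint: $(effective_docker_endpoint)"
else
  echo "not WSL; CLI endpoint: $(effective_docker_endpoint)"
fi
```

If the reported endpoint is a tcp:// address you don't remember setting, look for a stale DOCKER_HOST export in your shell profile before touching Docker Desktop.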
● Production incident · POST-MORTEM · severity: high

Docker Daemon Crash During Deployment Blocked All CI/CD Pipelines for 6 Hours

Symptom
All CI/CD builds failed with 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?'. Developers could not build or push container images. Deployments to staging and production halted.
Assumption
The Docker daemon crashed due to a software bug in the latest Docker Engine update.
Root cause
The build server's root partition reached 100% disk usage due to accumulated Docker images, build caches, and dangling volumes. When the disk filled, the Docker daemon process crashed because it could not write to /var/lib/docker. The systemd service entered a failed state and did not auto-restart. No monitoring existed for Docker daemon health or disk usage on the build server.
Fix
Recovered 80GB of disk space by running docker system prune -af --volumes. Restarted the Docker daemon with sudo systemctl restart docker. Added monitoring: a cron job running docker system df alerts at 80% usage, a systemd watchdog checking docker info every 60 seconds, and a CloudWatch alarm on disk utilization. Set up automatic pruning via a weekly cron (Sundays at 03:00): 0 3 * * 0 docker system prune -af --filter 'until=168h'.
Key lesson
  • Docker daemon crashes silently when the disk fills — no graceful degradation
  • Monitor Docker daemon health with docker info and systemd watchdog
  • Set up automatic image pruning to prevent disk exhaustion
  • Build servers need disk monitoring as a first-class concern, not an afterthought
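
A disk check like the one added after this incident can be sketched in a few lines of shell. Treat it as a starting point under assumptions: the function names usage_percent_from_df and disk_verdict are invented for this example, the 80% threshold mirrors the alert level above, and real alert delivery (mail, Slack, CloudWatch) is left out.

```shell
#!/usr/bin/env bash
# Sketch: extract the usage percentage from `df -P` output and compare it with a threshold.

# Reads `df -P <path>` output on stdin; prints the Capacity column as a bare number.
usage_percent_from_df() {
  awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Prints ALERT or OK for a usage value and a threshold (both integers).
disk_verdict() {
  local used="$1" threshold="$2"
  if [ "$used" -ge "$threshold" ]; then
    echo "ALERT: ${used}% used (threshold ${threshold}%)"
  else
    echo "OK: ${used}% used"
  fi
}

# Check the filesystem holding Docker's data; fall back to / if the directory is absent.
target="/var/lib/docker"
[ -d "$target" ] || target="/"
disk_verdict "$(df -P "$target" | usage_percent_from_df)" 80
```

Using df -P keeps every filesystem on one line in the POSIX column layout, so the fifth field is reliably the capacity percentage.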
Production debug guide · Common symptoms and actions for docker.sock connection failures · 5 entries
Symptom · 01
docker: Cannot connect to the Docker daemon. Is the docker daemon running?
Fix
Check daemon status: sudo systemctl status docker. If inactive, start it: sudo systemctl start docker. If it fails to start, check logs: sudo journalctl -u docker.service --since '5 minutes ago'.
Symptom · 02
Got permission denied while trying to connect to the Docker daemon socket
Fix
Your user is not in the docker group. Run: sudo usermod -aG docker $USER. Then log out and log back in. Verify with: groups | grep docker.
Symptom · 03
docker.sock: connect: no such file or directory
Fix
The socket file is missing. The daemon is either not running or configured with a different socket path. Check: ls -la /var/run/docker.sock. If missing, restart the daemon.
Symptom · 04
Docker commands work with sudo but not without
Fix
Group membership has not taken effect. Run: newgrp docker. Or log out and log back in. Verify: id | grep docker.
Symptom · 05
WSL2: Cannot connect to Docker daemon
Fix
Ensure Docker Desktop is running on Windows. In Docker Desktop Settings > Resources > WSL Integration, enable your WSL2 distro. Restart WSL: wsl --shutdown then wsl.
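
The symptom-to-fix mapping above can be sketched as a small classifier for log triage. This is a hedged illustration: classify_docker_error is a hypothetical function, the match strings are substrings of the messages quoted in this guide, and the exact wording of Docker's errors can vary between versions.

```shell
#!/usr/bin/env bash
# Sketch: map a Docker CLI error message to the matching debug-guide entry.

classify_docker_error() {
  case "$1" in
    *"permission denied"*)
      echo "permissions: add the user to the docker group, then log out and back in" ;;
    *"no such file or directory"*)
      echo "socket-missing: check the socket path, then restart the daemon" ;;
    *"Cannot connect to the Docker daemon"*)
      echo "daemon-down: check systemctl status docker and the journal" ;;
    *)
      echo "unknown: read journalctl -u docker.service" ;;
  esac
}

# When docker is installed, classify the output of a failing CLI call.
if command -v docker >/dev/null 2>&1; then
  out="$(docker info 2>&1)" || classify_docker_error "$out"
fi
```

The permission check comes first on purpose: a permission error also mentions the daemon socket, so matching the generic "Cannot connect" text first would misfile it.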
★ Docker Daemon Quick Debug Reference · Fast commands for diagnosing and fixing Docker daemon connection issues
Daemon not running
Immediate action
Check and start the Docker service
Commands
sudo systemctl status docker
sudo systemctl start docker && sudo systemctl enable docker
Fix now
If daemon fails to start, check logs: sudo journalctl -u docker.service -n 50
Permission denied on docker.sock
Immediate action
Add your user to the docker group
Commands
sudo usermod -aG docker $USER
newgrp docker
Fix now
Log out and log back in for group membership to take full effect
Socket file missing
Immediate action
Verify socket exists and daemon is running
Commands
ls -la /var/run/docker.sock
sudo systemctl restart docker
Fix now
If socket still missing after restart, check daemon config: cat /etc/docker/daemon.json
Disk full causing daemon crash
Immediate action
Check disk usage and clean Docker resources
Commands
df -h /var/lib/docker
docker system prune -af --volumes
Fix now
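Free space, then restart: sudo systemctl restart docker. If the daemon is already down, docker system prune cannot run, so restart it first or free space outside Docker (for example, old logs), then prune.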
WSL2 Docker not connecting
Immediate action
Verify Docker Desktop integration
Commands
wsl --shutdown
wsl docker info
Fix now
Open Docker Desktop > Settings > Resources > WSL Integration and enable your distro
Docker Daemon Error Causes and Fixes by Environment
Environment | Most Common Cause | Quick Fix | Prevention
Linux (systemd) | Daemon not running | sudo systemctl start docker | systemctl enable docker
Linux (permissions) | User not in docker group | sudo usermod -aG docker $USER | Add to docker group at provision time
Linux (disk full) | Docker disk usage at 100% | docker system prune -af --volumes | Automatic weekly pruning via cron
macOS | Docker Desktop not running | open -a Docker | Add to Login Items
WSL2 | Docker Desktop WSL integration off | Toggle integration in Settings | Enable integration for all distros
Snap Docker | Snap socket path different | export DOCKER_HOST=unix:///var/run/snap.docker.socket | Set DOCKER_HOST in shell profile

Key takeaways

1. The error means the Docker CLI cannot reach the daemon through /var/run/docker.sock
2. Five components to check: installation, daemon status, socket, permissions, disk
3. On Linux, fix permissions by adding users to the docker group; never run everything with sudo
4. Enable Docker on boot with systemctl enable docker for production servers
5. Disk exhaustion is the #1 cause of daemon crashes; set up automatic pruning
6. Configure systemd watchdog to auto-restart the daemon when it becomes unresponsive

Common mistakes to avoid

6 patterns
×

Running all docker commands with sudo instead of fixing permissions

Symptom
Every docker command requires sudo — scripts break when run as non-root, security risk from running containers as root
Fix
Add your user to the docker group: sudo usermod -aG docker $USER. Log out and back in. Verify with: docker ps (no sudo needed).
×

Not enabling Docker to start on boot

Symptom
After every server reboot, Docker is not running and all containers are stopped — manual intervention required
Fix
Run: sudo systemctl enable docker. This creates the systemd symlink so Docker starts automatically on boot.
×

Ignoring disk usage until the daemon crashes

Symptom
Docker daemon crashes silently when /var/lib/docker fills up — no error message, just connection refused
Fix
Set up monitoring: df -h /var/lib/docker. Configure automatic pruning: docker system prune -af --filter 'until=168h' in a weekly cron job.
×

Installing Docker Engine inside WSL2 when Docker Desktop is already running

Symptom
Two Docker daemons conflict — socket path confusion, unexpected behavior, containers running in the wrong environment
Fix
Uninstall Docker Engine from WSL2: sudo apt remove docker-ce. Use only Docker Desktop's WSL integration for WSL2 containers.
×

Not checking daemon logs when the fix commands do not work

Symptom
Trying random fixes without understanding the actual failure — wasting time on the wrong diagnosis
Fix
Always check logs first: sudo journalctl -u docker.service -n 50 --no-pager. The logs tell you exactly why the daemon failed to start.
×

Restarting the daemon without checking why it crashed

Symptom
Daemon crashes repeatedly — same root cause (disk full, config error) triggers the same failure on each restart
Fix
Before restarting, check: journalctl -u docker.service, df -h /var/lib/docker, and cat /etc/docker/daemon.json for config errors.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
What does the error 'Cannot connect to the Docker daemon at unix:///var/...
Q02 · SENIOR
A developer reports this error on a shared development server. How do yo...
Q03 · SENIOR
How would you design a production monitoring system that prevents Docker...
Q01 of 03 · JUNIOR

What does the error 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock' mean?

ANSWER
This error means the Docker CLI cannot communicate with the Docker daemon process through the Unix socket at /var/run/docker.sock. The Docker architecture has two components: the CLI (the command-line tool) and the daemon (the background service that manages containers). The CLI sends requests to the daemon through a Unix socket. When the socket is unavailable, inaccessible, or the daemon is not running, every Docker command fails with this error. The five primary causes are:
1. The Docker daemon is not running
2. The current user does not have permission to access the socket
3. The socket file is missing
4. The disk is full, causing the daemon to crash
5. The environment is misconfigured (WSL2, Snap, remote Docker host)
FAQ · 6 QUESTIONS

Frequently Asked Questions

01. Why do I need sudo to run Docker commands on Linux?
02. Is it safe to add users to the docker group?
03. Can I change the Docker socket path?
04. What is the difference between Docker Engine and Docker Desktop?
05. How do I check if the Docker daemon is running?
06. What does 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?' mean?