Senior 3 min · March 17, 2026

CI/CD Silent Failure — Expired Docker Credentials

Docker registry credentials expired; the push step exited with code 0, but the image was never uploaded.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • CI automatically builds and tests code on every push
  • Continuous Delivery (CD) produces a deployable artifact after every successful CI run
  • Continuous Deployment auto-deploys to production
  • Pipelines catch bugs early, reduce deployment risk
  • Aim for pipeline under 10 minutes — longer loses value
  • A failing pipeline that doesn't alert is a silent time bomb
Plain-English First

CI/CD is like having an automated quality check for your code. Every time you make a change, the system automatically checks if it works and prepares it for release, so you don't have to remember all the steps.

CI/CD is the backbone of modern software delivery. It replaces the old model of big-bang releases with a continuous flow of small, validated changes. The core idea: every code change goes through an automated pipeline that builds, tests, and — optionally — deploys it. This isn't a luxury; teams that skip CI/CD spend 2x to 3x more time resolving integration conflicts and debugging production failures. Here's the catch: a poorly designed pipeline can be worse than no pipeline — it can give false confidence. This guide covers what CI/CD actually means, how to build one that works, and the production failures you'll face if you don't.

A GitHub Actions CI Pipeline

This GitHub Actions workflow runs on every push to main/develop and on pull requests. It sets up Python, caches pip, installs dependencies, runs ruff linter, runs pytest with coverage, and uploads results. The cache step is critical: without it, each pipeline run downloads all packages from scratch, turning a 2-minute install into a 10-minute one.

Example (YAML)
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Cache pip packages
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run linter
        run: ruff check .

      - name: Run tests with coverage
        run: pytest --cov=. --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v4
Output
# Pipeline runs on every push to main or develop, and on every pull request targeting main
Production Insight
If the cache key doesn't match, each pipeline run downloads all dependencies from scratch, turning a 2-minute install into a 10-minute one.
Always validate cache hit rate in pipeline metrics.
Key Takeaway
Cache dependencies carefully: a missed cache can turn a 2-minute install into a 10-minute one.
Monitor cache hit rate to know if your caching strategy works.
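
The cache key in the workflow above combines the runner OS with a hash of requirements.txt. A rough Python sketch of that idea shows why any edit to the file yields a new key and therefore a cache miss. The function name `cache_key` and the exact key layout are illustrative assumptions; this is not how actions/cache computes `hashFiles` internally.

```python
import hashlib


def cache_key(runner_os: str, requirements_text: str) -> str:
    """Build a cache key that changes whenever the dependency list changes."""
    digest = hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()
    return f"{runner_os}-pip-{digest}"


old_key = cache_key("Linux", "requests==2.32.0\n")
same_key = cache_key("Linux", "requests==2.32.0\n")
new_key = cache_key("Linux", "requests==2.32.3\n")

print(old_key == same_key)  # True: unchanged file, cache hit
print(old_key == new_key)   # False: edited file, cache miss
```

A miss after a dependency bump is expected. A miss on every run usually means the key includes something that changes per run.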

Adding CD — Automatic Deployment

The deploy job runs after the test job passes, and only on main. It builds a Docker image, tags it with the commit SHA, pushes it to a registry, then SSHes to a staging server and redeploys using docker compose. This is Continuous Deployment to staging only; a human still gates production deployments.

Example (YAML)
# Add to the same file — deploy job runs after test passes
  deploy:
    needs: test        # only run if test job passes
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'  # only deploy from main

    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: |
          docker build -t myapp:${{ github.sha }} .
          docker tag myapp:${{ github.sha }} myapp:latest

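      - name: Push to registry
        run: |
          # Quote the secrets so special characters in the password survive the shell
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          # myapp is a placeholder; a real registry needs a host/namespace prefix on the image name
          docker push myapp:${{ github.sha }}
          docker push myapp:latest

      - name: Deploy to staging
        run: |
          # SSH to the server, pull the new image, and restart the stack.
          # StrictHostKeyChecking=no skips host verification; pin the host key for anything beyond a demo.
          ssh -o StrictHostKeyChecking=no deploy@${{ secrets.STAGING_HOST }} \
            "docker pull myapp:latest && docker compose up -d"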
Output
# Deploy runs only on main branch after tests pass
Production Insight
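The deploy step can report success while the app is down: docker compose up -d returns once the containers have been started, not once the app is ready, so a container that crashes on startup still yields exit code 0.
Always add a health check after deploy to verify the new container is serving traffic.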
Key Takeaway
Deployment is not done until the service responds correctly.
Add a curl check after docker compose up -d to confirm the app started.
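
As an alternative to a one-shot curl, a health check can poll for a short window, since a container often needs a few seconds before it answers. The following is a minimal sketch using only the standard library. The names `http_status` and `wait_until_healthy`, the retry counts, and the idea of a dedicated health URL are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str,
    attempts: int = 10,
    delay_seconds: float = 3.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll url until it answers 200, giving up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

In a deploy job, a final step could call `wait_until_healthy` against the staging URL and exit non-zero when it returns False, so that a crashed container fails the pipeline instead of passing it.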

CI/CD Pipeline Stages and Their Purpose

Every CI/CD pipeline follows a set of stages that gate each other. The typical flow: lint → unit test → build → integration test → deploy → smoke test. Each stage acts as a safety net. Lint catches formatting and logic errors fast. Unit tests validate individual functions. Build produces the artifact. Integration tests confirm the artifact works in a real environment. Deploy publishes it. Smoke test verifies the service is alive.

The order matters — you want the fastest checks first so failures are caught early, without wasting time on slower stages.

Example (YAML)
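name: CI/CD Pipeline

on: [push]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # each job starts on a fresh runner with no code
      - run: npm ci                 # install dependencies before linting
      - run: npm run lint
  test:
    needs: lint                     # gated: runs only if lint passes
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myapp .
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh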
Output
Stages run sequentially: lint, test, build, deploy
The Pipeline as a Gate Chain
  • Fastest stages first — fail early, waste less compute
  • Each stage should be deterministic — same commit always produces same result
  • Stages that depend on external services (DB, API) should run integration tests, not just unit tests
  • A stage that takes more than 5 minutes is a candidate for parallelisation
Production Insight
A skipped stage (e.g., skipping lint to save time) lets bad code through.
One un-linted commit caused a production outage when a missing semicolon broke the minifier.
Never skip stages — instead, make them faster.
Key Takeaway
Pipeline stages are safety nets — order them from fastest to slowest.
Never skip a stage for speed; optimise the slow stage instead.
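
The gate chain can be modelled in a few lines of Python. This is only a sketch of the control flow, with stand-in checks in place of real lint, test, build, and deploy commands; `run_pipeline` and the `Stage` type are hypothetical names.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; stop at the first failure and report what ran."""
    log = []
    for name, check in stages:
        passed = check()
        log.append(f"{name}: {'passed' if passed else 'FAILED'}")
        if not passed:
            break  # later stages are gated and never start
    return log


# Stand-in checks; a real pipeline would shell out to the linter, tests, etc.
stages = [
    ("lint", lambda: True),
    ("test", lambda: False),
    ("build", lambda: True),
    ("deploy", lambda: True),
]
print(run_pipeline(stages))
# ['lint: passed', 'test: FAILED']
```

Because the failing test stage stops the chain, the slower build and deploy stages never consume compute, which is the reason for ordering stages from fastest to slowest.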
Deciding Which Stages to Include
If: Single developer, no external integrations
Use: Lint, test, build only; deploy manually
If: Team of 5+, production deploy to one environment
Use: Add integration tests and a deploy stage
If: Multiple environments (staging, prod)
Use: Add smoke tests after each deploy, separate deploy stages
If: Regulatory compliance or critical uptime
Use: Add security scanning, performance tests, manual approval gates

Continuous Delivery vs Continuous Deployment — The Trade-off

Continuous Delivery means every successful build produces a deployable artifact, but a human decides when to push it to production. Continuous Deployment goes all the way — each successful build is automatically deployed to production. The choice depends on your risk tolerance and deployment processes.

Continuous Delivery is safer for regulated industries or when you need a manual QA sign-off. Continuous Deployment is faster for teams with strong automated testing and rollback capability. The real question: can you detect and fix a bad deploy in under 5 minutes? If not, start with Delivery, not Deployment.

Production Insight
A team running Continuous Deployment with flaky tests had 3 bad deploys out of 100 — each took 20 minutes to roll back.
They switched to Continuous Delivery with a 10-minute manual gate, which caught 2 of the 3 bad builds before they hit production.
Lesson: automated deployment without reliable tests is just speeding up failure.
Key Takeaway
Continuous Delivery is safer; Continuous Deployment is faster.
Choose Deployment only when you trust your tests and can roll back in minutes.
Delivery vs Deployment — Which Fits Your Team?
If: Strong automated test coverage (>80%), fast rollback possible
Use: Continuous Deployment
If: Manual QA is required before production
Use: Continuous Delivery with a human approval gate
If: No rollback mechanism, tests are flaky
Use: Continuous Delivery with extra staging verification
If: High compliance (finance, healthcare)
Use: Continuous Delivery with multiple approval steps

CI/CD Best Practices for Production Teams

Over years of building and debugging pipelines, these practices separate effective CI/CD from pipelines that cause more harm than good:

  1. Fast feedback — Keep the pipeline under 10 minutes. Long pipelines discourage frequent pushes. Split long tests into separate workflows or parallelise.
  2. Idempotent steps — Every step should produce the same result given the same input. Avoid steps that depend on global state or mutable external resources.
  3. Secrets management — Never hardcode secrets. Use encrypted environment variables (GitHub Actions secrets, GitLab CI variables, etc.) and rotate them regularly.
  4. Health checks after deploy — Deploy is not complete until the new version responds correctly. Add a curl or similar check in the deploy job.
  5. Rollback capability — Every deploy should be rollback-able. Tag Docker images with commit SHA so you can redeploy a known-good version.
  6. Pipeline as code — Version your pipeline definitions alongside your code. This makes changes reviewable and traceable.
Production Insight
A team had a pipeline step that ran a database migration — it was not idempotent.
When the pipeline re-ran after a failure, the migration ran twice and corrupted the schema.
Fix: make migrations idempotent using IF NOT EXISTS checks and version tracking.
Key Takeaway
Idempotency, fast feedback, and rollback are non-negotiable.
A pipeline that can't be safely re-run is a liability.
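
The migration fix described above (IF NOT EXISTS plus version tracking) might look like the following sketch. It uses an in-memory SQLite database so it runs anywhere; the `schema_version` table, the `users` table, and the `migrate` function are hypothetical, and a real project would more likely rely on a migration tool.

```python
import sqlite3

MIGRATIONS = [
    (1, "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]


def migrate(conn: sqlite3.Connection) -> list:
    """Apply pending migrations once each; return the versions applied now."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, statement in MIGRATIONS:
        if version in applied:
            continue  # already ran on an earlier pipeline run
        conn.execute(statement)
        # Record the version only after the statement has succeeded.
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # [1, 2]
print(migrate(conn))  # []  (re-run is a no-op)
```

Migration 2 is not idempotent on its own (adding the same column twice would fail), which is why the version table matters: it lets a re-run skip the statement entirely.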

Common CI/CD Pipeline Failures and How to Debug Them

Even well-designed pipelines fail. The most common failures fall into a few categories:

  • Environment drift: Your pipeline uses a Docker image that is updated upstream, breaking your build. Pin base image versions.
  • Cache poisoning: An old cache contains corrupt or outdated dependencies. Clear the cache periodically.
  • Flaky tests: Tests that pass locally but fail in CI due to timing or order dependence. Use --reruns (from the pytest-rerunfailures plugin) or another retry strategy while you fix the cause.
  • Secret expiration: Tokens or passwords expire. Automate rotation and alert on failure.
  • Resource exhaustion: Disk space or memory runs out during the build. Add cleanup steps and monitor usage.

Watch Out: Silent Failures
The most dangerous failures are the ones that don't wake anyone up. A pipeline that passes but doesn't actually deploy is worse than one that fails loudly. Always add a notification for every stage outcome, even successes—so you notice when successes stop coming.
Production Insight
A common pattern: the deploy step succeeds but the application crashes on startup due to a missing environment variable.
The pipeline passes, no alert fires, and no one notices for 30 minutes because the deploy step returned exit code 0.
Fix: add a health check step that verifies the service is responsive after deploy.
Key Takeaway
Pipeline success only means the steps ran without error.
Always validate the actual behaviour of the deployed service.
Add a health check after every deploy.
● Production incident · POST-MORTEM · severity: high

Silent Pipeline Failure: Image Not Found in Registry

Symptom
The deploy job logs showed 'manifest for myapp:abc123 not found'. The pipeline was marked as failed after retries, but the on-call engineer was not alerted because the failure was attributed to a transient network issue.
Assumption
The team assumed that if the Docker build step succeeded, the image would always be accessible from the registry. They also assumed that a push failure would cause the pipeline to fail immediately.
Root cause
The Docker registry credentials had expired. The docker login step succeeded because it used cached credentials, but the push step failed silently (exit code 0 due to a bug in the action). The image was never uploaded, but the pipeline continued, then the deploy step failed trying to pull a non-existent image.
Fix
  1. Updated the credentials and set up a cron job to rotate them monthly.
  2. Replaced the push step with a version that exits non-zero on failure.
  3. Added a verification step after push: pull the image back and run a smoke test.
  4. Added alerting on any deploy step failure, even retryable ones.
Key lesson
  • Never assume a push succeeded — verify by pulling and running the artifact in a test container.
  • All pipeline steps should have proper exit code handling — don't rely on default behavior.
  • Rotate secrets proactively; do not wait for them to expire at 2 AM on a Sunday.
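
The "verify by pulling" lesson can be sketched as a small wrapper around the Docker CLI. This is an illustration under assumptions, not the team's actual fix: `push_and_verify` and `run_command` are hypothetical names, and the smoke test from the fix list is left out. It relies on `docker pull` contacting the registry, so a tag that was never uploaded fails the pull even when a local copy exists.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run_command(args: List[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(args, check=False).returncode


def push_and_verify(image: str, run: Runner = run_command) -> None:
    """Push image, then confirm the registry can serve it back."""
    if run(["docker", "push", image]) != 0:
        raise RuntimeError(f"docker push failed for {image}")
    # Do not trust the push exit code alone: ask the registry for the image.
    if run(["docker", "pull", image]) != 0:
        raise RuntimeError(f"{image} was pushed but cannot be pulled back")
```

Raising an exception makes the calling script exit non-zero, so the pipeline stops at the push stage instead of failing later in deploy with a misleading "manifest not found" error.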
Production debug guide
Symptom → Action guide for pipeline operators
Symptom · 01
Pipeline fails on lint step with 'error' but linter passes locally
Fix
Check if the local linter version matches CI. Pin linter version in pipeline. Run linter with same config file as CI.
Symptom · 02
Docker build fails with 'no space left on device'
Fix
Run docker system prune -af to free space. Also check if build cache is too large — consider multi-stage builds.
Symptom · 03
Deploy step hangs for more than 5 minutes then times out
Fix
Check SSH connectivity: ssh -v user@host. Verify the remote Docker daemon is running. Check if the compose file is valid.
Symptom · 04
Tests pass locally but fail intermittently in CI
Fix
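Add --reruns 2 (pytest-rerunfailures plugin) to the test command as a stopgap. For order-dependent tests, randomise the test order (for example with pytest-randomly) and replay a failing order using its reported seed. Check for timing issues with external services.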
Symptom · 05
Pipeline passes but no new version is deployed
Fix
Check the deploy job's if condition. Verify the branch name matches. Check that the artifact was actually pushed — look for registry tags.
★ CI/CD Pipeline Debugging Cheat Sheet
Immediate actions and commands for common pipeline failures.
Lint/test step fails unexpectedly
Immediate action
Read the full log output, not just the summary.
Commands
npx eslint . --format compact
pytest --lf --tb=long
Fix now
If CI uses a different OS or architecture, replicate locally with Docker: docker run --rm -v $(pwd):/repo -w /repo node:20 bash -c 'npm ci && npm run lint'
Docker build consumes all disk space
Immediate action
Stop the pipeline, free space on CI runner.
Commands
docker system prune -af --volumes
du -sh /var/lib/docker/
Fix now
Add a cleanup step before build: docker system prune -f --filter 'until=24h'. Use multi-stage builds to reduce image size.
Deploy step fails with connection refused
Immediate action
Check if the target server is reachable and SSH port is open.
Commands
ssh -o ConnectTimeout=5 -v deploy@$STAGING_HOST
curl -I --connect-timeout 3 https://app-staging.example.com
Fix now
If SSH fails, check security group rules and host key. If app health check fails, restart the old version: docker-compose up -d --no-deps app
CI vs CD vs Continuous Deployment
Dimension | Continuous Integration | Continuous Delivery | Continuous Deployment
Trigger | Every push | Every successful CI run | Every successful CI run
Artifact produced | Test results, build output | Deployable artifact (e.g., Docker image) | Same as Continuous Delivery
Production deployment | Not performed | Manual approval required | Automated, no human gate
Risk level | Low (catches bugs before release) | Medium (human review catches issues) | High (relies entirely on tests)
Feedback cycle | Minutes | Minutes to hours | Seconds to minutes
Typical use case | All projects | Projects with QA or compliance gates | Highly automated teams with fast rollback

Key takeaways

  1. CI catches integration bugs early: on every push, not at release time.
  2. Continuous Delivery: every successful build is deployable. Continuous Deployment: it deploys automatically.
  3. Pipeline stages: lint → test → build → deploy. Each stage gates the next.
  4. Secrets in CI must be stored as encrypted environment variables: never hardcode credentials.
  5. Fast feedback loop is the goal: a CI pipeline longer than 10 minutes loses its value.
  6. A CI/CD pipeline is only as good as its alerting: silent failures erode trust.

Common mistakes to avoid


Hardcoding secrets in pipeline YAML

Symptom
Secrets are visible in pipeline logs or repository. Attackers can gain access to production systems.
Fix
Use pipeline secrets variables (e.g., GitHub Actions secrets, GitLab CI variables). Never write secrets in plain text. Rotate secrets every 90 days.
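
A 90-day rotation policy is easy to state and easy to forget. As a rough sketch, a scheduled job could compare each secret's last rotation date against the policy and alert when one is due. The function name `rotation_due` is hypothetical, and where the rotation dates are stored is left open.

```python
from datetime import date


def rotation_due(last_rotated: date, today: date, max_age_days: int = 90) -> bool:
    """Return True when a secret is at least as old as the rotation policy allows."""
    return (today - last_rotated).days >= max_age_days


print(rotation_due(date(2026, 1, 1), date(2026, 3, 17)))   # False (75 days old)
print(rotation_due(date(2025, 12, 1), date(2026, 3, 17)))  # True (106 days old)
```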

Using depends_on without a healthcheck in Docker Compose for pipelines

Symptom
The dependent container starts but the service inside is not ready, causing test failures or deployment errors.
Fix
Add a healthcheck to the service and set condition: service_healthy under depends_on in the Compose file your pipeline uses.

Not pinning base image versions in Dockerfile

Symptom
A future update to python:3.12 breaks your build. The pipeline fails unpredictably.
Fix
Use a specific SHA-based digest or a versioned tag like python:3.12-slim@sha256:abc.... Update intentionally and test.
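
Pinning can be enforced in CI with a small check that fails the lint stage when a Dockerfile uses an unpinned base image. The sketch below is deliberately simple: `unpinned_base_images` is a hypothetical helper, it only looks at FROM lines, and it does not handle build arguments inside image names.

```python
import re
from typing import List

FROM_LINE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
)


def unpinned_base_images(dockerfile_text: str) -> List[str]:
    """Return base images that are not pinned to a sha256 digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        is_stage_ref = image.lower() in stage_names
        if alias:
            stage_names.add(alias.lower())
        if is_stage_ref or image.lower() == "scratch":
            continue  # earlier build stage or the empty base: nothing to pin
        if "@sha256:" not in image:
            unpinned.append(image)
    return unpinned


dockerfile = """FROM python:3.12-slim AS builder
RUN pip install --no-cache-dir build
FROM builder
"""
print(unpinned_base_images(dockerfile))  # ['python:3.12-slim']
```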

Ignoring flaky tests in CI

Symptom
Tests randomly fail and are re-run until they pass, masking real failures.
Fix
Run flaky tests separately with a retry mechanism, but also fix the underlying cause: replay the failing random seed to reproduce order-dependent failures (for example with pytest-randomly's --randomly-seed option), and add wait strategies for async code.

Manually managing pipeline deployment steps without rollback

Symptom
A bad deploy goes to production and there is no quick way to revert.
Fix
Always tag artifacts with commit SHA. Keep the last known-good image tag. Automate rollback: kubectl rollout undo or docker-compose pull && docker-compose up -d with previous version.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Junior): What is the difference between CI and CD?
Q02 (Senior): What stages should a good CI/CD pipeline have?
Q03 (Senior): What is the difference between Continuous Delivery and Continuous Deploy...
Q04 (Senior): How do you handle flaky tests in a CI pipeline?
Q05 (Senior): What is a pipeline gate and why is it important?

Q01 of 05 (Junior): What is the difference between CI and CD?

ANSWER
CI (Continuous Integration) is the practice of automatically building and testing every code change, usually on every push to a shared branch. CD (Continuous Delivery) is the practice of automatically preparing that build for deployment — producing a deployable artifact and optionally deploying to a staging environment. The key difference: CI ends when tests pass; CD starts from that point to produce a deployable artifact.

Frequently Asked Questions

  1. What is the difference between Continuous Delivery and Continuous Deployment?
  2. What should a good CI pipeline include?
  3. Why shouldn't I hardcode secrets in pipeline config?