Senior 5 min · March 17, 2026

CI/CD Silent Failure — Expired Docker Credentials

Q: What is the difference between Continuous Delivery and Continuous Deployment?

Continuous Delivery means every successful build produces an artifact that could be deployed — but a human decides when. Continuous Deployment goes all the way: every successful build is automatically deployed to production with no human approval step. Most teams practice Continuous Delivery for production (human approval gate) but Continuous Deployment to staging.

Q: What should a good CI pipeline include?

Minimum: linting, unit tests, integration tests. Better: security scanning (SAST), dependency vulnerability check, test coverage threshold enforcement, and Docker image build verification. For production services: end-to-end tests in a staging environment before deploying to production.

Q: Why shouldn't I hardcode secrets in pipeline config?

Pipeline configs are often stored in version control and may be visible to all repository collaborators. Hardcoded secrets can be exposed in logs, error messages, or through pull request review. Use encrypted pipeline variables (e.g., GitHub Actions secrets, GitLab CI variables) that are injected at runtime and never stored in the repository.

Docker registry credentials expired, push succeeded with exit code 0 but image never uploaded.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.

✓ Production

production tested

May 24, 2026

last updated

1,554

articles · all by Naren


⚡Quick Answer

CI automatically builds and tests code on every push
CD produces a deployable artifact after every successful CI run
Continuous Deployment auto-deploys to production
Pipelines catch bugs early, reduce deployment risk
Aim for pipeline under 10 minutes — longer loses value
A failing pipeline that doesn't alert is a silent time bomb

✦ Definition

What is CI/CD?

CI/CD (Continuous Integration/Continuous Delivery or Deployment) is the automated pipeline that takes code from commit to production. Continuous Integration merges developer changes into a shared branch multiple times daily, automatically building and testing each change to catch integration bugs early.


Continuous Delivery extends this by ensuring every passing build is deployable to production with a manual trigger, while Continuous Deployment removes that manual gate and pushes every passing build straight to users. The core problem CI/CD solves is the human error and delay inherent in manual integration and deployment — teams that deploy manually see failure rates 5x higher than those using automated pipelines, according to DORA metrics.

In practice, a CI/CD pipeline consists of stages: source (trigger on push), build (compile dependencies), test (unit, integration, lint), package (create Docker image or artifact), and deploy (push to staging/production). GitHub Actions, GitLab CI, and Jenkins are common orchestrators.

The silent failure this guide focuses on, expired Docker credentials, typically shows up in the package or deploy stage: the build works on your machine because your local Docker client has cached credentials, but the CI runner fails when pulling from or pushing to a private registry such as Docker Hub, ECR, or GCR. The failing command exits with a non-zero code, but the error message is often buried in logs as "unauthorized: authentication required" or "denied: requested access to the resource is denied."

CI/CD may be more overhead than it is worth for a solo prototype with no collaborators. In a regulated environment where every deployment requires manual sign-off and an audit trail, Continuous Delivery with a manual approval gate is safer than full Continuous Deployment. The trade-off is speed versus control: Continuous Deployment gives you sub-minute time-to-production but requires robust automated testing and rollback strategies; Continuous Delivery adds a human check that catches 15-20% of issues before they hit users, per industry surveys.

For production teams, the best practice is to start with Continuous Delivery, add canary deployments and feature flags, then graduate to Continuous Deployment only when your test coverage exceeds 80% and your rollback time is under 5 minutes.

Plain-English First

CI/CD is like having an automated quality check for your code. Every time you make a change, the system automatically checks if it works and prepares it for release, so you don't have to remember all the steps.

CI/CD is the backbone of modern software delivery. It replaces the old model of big-bang releases with a continuous flow of small, validated changes. The core idea: every code change goes through an automated pipeline that builds, tests, and — optionally — deploys it. This isn't a luxury; teams that skip CI/CD spend 2x to 3x more time resolving integration conflicts and debugging production failures. Here's the catch: a poorly designed pipeline can be worse than no pipeline — it can give false confidence. This guide covers what CI/CD actually means, how to build one that works, and the production failures you'll face if you don't.

Why CI/CD Pipelines Fail Silently — And How to Fix It

CI/CD (Continuous Integration/Continuous Deployment) is the automated pipeline that builds, tests, and deploys your code. The core mechanic: every commit triggers a sequence of jobs — compile, test, package, deploy — each step running in an isolated environment with its own credentials. When those credentials expire, the pipeline doesn't always fail loudly. Instead, it may silently skip deployment, push a stale artifact, or fail halfway through with a cryptic error like '401 Unauthorized' or 'Access denied'. This is the silent failure mode.

In practice, CI/CD pipelines rely on stored secrets — Docker registry tokens, cloud provider keys, API tokens — to authenticate during build and deploy steps. These secrets have lifetimes: some expire in 1 hour, others in 30 days. If your pipeline doesn't refresh them before expiry, the step that pulls the base image or pushes the final artifact fails. But because CI/CD systems often retry steps or cache layers, the failure may not surface until the next deployment attempt, wasting developer time and eroding trust in automation.

Use credential rotation with automated refresh for any CI/CD pipeline that runs longer than the credential's lifetime. For Docker registries, set up a token refresh job that runs before the build step. For cloud providers, use short-lived tokens from a secrets manager (e.g., AWS Secrets Manager, HashiCorp Vault) that are generated per pipeline run. This matters because a single expired credential can silently break your deployment pipeline, causing production drift and delayed rollouts — exactly the problems CI/CD is supposed to prevent.
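A pre-flight check along these lines could run as the first pipeline step, before anything touches the registry. It is a minimal sketch, assuming the credential's expiry time is known and stored next to the secret as an ISO 8601 timestamp with a UTC offset; the function name `check_credential_expiry` and the 24-hour safety margin are illustrative choices, not part of any registry's API.

```python
from datetime import datetime, timedelta, timezone


def check_credential_expiry(
    expires_at_iso: str,
    now: datetime,
    margin: timedelta = timedelta(hours=24),
) -> timedelta:
    """Return the remaining lifetime, or raise if it falls inside the safety margin.

    expires_at_iso is assumed to be an ISO 8601 string with a UTC offset,
    for example "2026-03-27T00:00:00+00:00".
    """
    expires_at = datetime.fromisoformat(expires_at_iso)
    remaining = expires_at - now
    if remaining <= margin:
        raise RuntimeError(
            f"credential expires in {remaining}; refresh it before building"
        )
    return remaining


if __name__ == "__main__":
    # Hypothetical expiry value; a real pipeline would read it from the secrets manager.
    expiry = (datetime.now(timezone.utc) + timedelta(days=30)).isoformat()
    left = check_credential_expiry(expiry, datetime.now(timezone.utc))
    print(f"credential valid for another {left}")
```

Raising an exception makes the step exit non-zero, so the pipeline stops before the build rather than after a rejected push.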

Expired Credentials Are Silent Killers

A pipeline that 'succeeds' but doesn't actually deploy the latest artifact is worse than a failed pipeline — it creates a false sense of safety.

Production Insight

A team's Docker Hub token expired at 2 AM. The nightly build pulled the cached base image (still valid), built the app, and attempted to push the image, but the push failed silently because the registry rejected the expired token. The deployment step saw no new image and deployed yesterday's artifact. The team only noticed 8 hours later when users reported missing features.

Symptom: CI/CD pipeline shows green, but the deployed artifact is stale. The push step logged '401 Unauthorized' but the pipeline didn't treat it as a failure because the step's exit code was not checked.

Rule of thumb: Every credential used in a pipeline must have a refresh mechanism that runs before the credential's expiry, and every step that uses credentials must fail the pipeline on non-zero exit codes.
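The rule of thumb above can be sketched as a small wrapper around each credential-dependent command. This is an illustration, not a drop-in tool: `run_step`, `StepFailed`, and the marker list are hypothetical names, and the markers are simply the error strings quoted in this article. The wrapper fails on a non-zero exit code and also on a zero exit code whose output contains an auth error.

```python
import subprocess
import sys

# Substrings that indicate a registry auth failure (error strings quoted in this article).
AUTH_ERROR_MARKERS = (
    "401 unauthorized",
    "unauthorized: authentication required",
    "denied: requested access to the resource is denied",
)


class StepFailed(RuntimeError):
    """Raised when a pipeline step must be treated as failed."""


def run_step(cmd: list[str]) -> str:
    """Run a command; fail on a non-zero exit or on auth errors in its output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    if result.returncode != 0:
        raise StepFailed(f"{cmd[0]} exited with code {result.returncode}:\n{output}")
    lowered = output.lower()
    for marker in AUTH_ERROR_MARKERS:
        if marker in lowered:
            raise StepFailed(f"{cmd[0]} exited 0 but logged an auth error: {marker!r}")
    return output


if __name__ == "__main__":
    # Stand-in for a push command that logs an auth error yet exits 0.
    fake_push = [sys.executable, "-c", "print('401 Unauthorized')"]
    try:
        run_step(fake_push)
    except StepFailed as err:
        print(f"pipeline would fail here: {err}")
```

Scanning output for error strings is a fallback for tools with unreliable exit codes; the exit-code check remains the primary signal.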

Key Takeaway

1. CI/CD pipelines are only as reliable as their credential lifecycle management — expired tokens cause silent failures.

2. Always check exit codes on credential-dependent steps; a 'success' with a 401 is a failure.

3. Use short-lived, per-run tokens from a secrets manager instead of long-lived static credentials.


A GitHub Actions CI Pipeline

This GitHub Actions workflow runs on every push to main or develop and on pull requests targeting main. It sets up Python, caches pip, installs dependencies, runs the ruff linter, runs pytest with coverage, and uploads the coverage report. The cache step is critical: without it, each pipeline run downloads all packages from scratch, turning a 2-minute install into a 10-minute one.

Example (YAML)

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Cache pip packages
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run linter
        run: ruff check .

      - name: Run tests with coverage
        run: pytest --cov=. --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v4

Output

# Pipeline runs on every push to main or develop and on every PR targeting main

Production Insight

If the cache key doesn't match, each pipeline run downloads all dependencies from scratch, turning a 2-minute install into a 10-minute one.

Always validate cache hit rate in pipeline metrics.

Key Takeaway

Cache dependencies carefully: a missed cache can multiply install time (from 2 minutes to 10 in the example above).

Monitor cache hit rate to know if your caching strategy works.
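To see why a cache key built from a file hash hits or misses, the idea behind `${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}` can be sketched as follows. This is only an illustration of content-addressed keys: `pip_cache_key` is a hypothetical helper that hashes a single string, and it does not reproduce GitHub's actual `hashFiles` implementation.

```python
import hashlib


def pip_cache_key(runner_os: str, requirements_text: str) -> str:
    """Build a cache key that changes whenever the requirements content changes."""
    digest = hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()
    return f"{runner_os}-pip-{digest}"


if __name__ == "__main__":
    before = pip_cache_key("Linux", "requests==2.31.0\n")
    after = pip_cache_key("Linux", "requests==2.32.0\n")
    # A one-character version bump produces a different key, so the cache misses once.
    print(before == after)
```

Any edit to the requirements file yields a new key, which is the intended miss; a miss on an unchanged file points to a problem in the key or the cached path.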

Adding CD — Automatic Deployment

The deploy job runs after the test job passes, only on main. It builds a Docker image, tags it with the commit SHA, pushes to a registry, then SSHes to a staging server and redeploys using docker compose. This is the core of Continuous Deployment to staging; a human still gates production deployments.

Example (YAML)

# Add to the same file, under jobs: (deploy runs after test passes)
  deploy:
    needs: test        # only run if test job passes
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'  # only deploy from main
    env:
      # A registry namespace is required; a bare "myapp" resolves to docker.io/library/myapp
      IMAGE: ${{ secrets.DOCKER_USERNAME }}/myapp

    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: |
          docker build -t "$IMAGE:${{ github.sha }}" .
          docker tag "$IMAGE:${{ github.sha }}" "$IMAGE:latest"

      - name: Push to registry
        env:
          DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        run: |
          # Pass the secret through env and quote it so special characters survive
          echo "$DOCKER_PASSWORD" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          docker push "$IMAGE:${{ github.sha }}"
          docker push "$IMAGE:latest"

      - name: Deploy to staging
        run: |
          # SSH to server and pull new image
          # (assumes an SSH key for the deploy user is already configured on the runner)
          ssh -o StrictHostKeyChecking=no deploy@${{ secrets.STAGING_HOST }} \
            "docker pull $IMAGE:latest && docker compose up -d"

Output

# Deploy runs only on main branch after tests pass

Production Insight

The deploy step can report success while the service is down: docker compose up -d returns once the containers have been started, not once the application is ready to serve requests.

Always add a health check after deploy to verify the new container is serving traffic.

Key Takeaway

Deployment is not done until the service responds correctly.

Add a curl check after docker compose up -d to confirm the app started.
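A post-deploy health check can be sketched in Python as an alternative to a curl loop. This is a minimal sketch under assumptions: `http_ok` and `wait_until_healthy` are hypothetical helper names, the `/health` URL is a placeholder, and the retry count and delay are arbitrary defaults to tune per service.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx), and timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Placeholder URL; replace with the staging service's real health endpoint.
    healthy = wait_until_healthy(lambda: http_ok("http://localhost:8000/health"), attempts=3, delay=1.0)
    raise SystemExit(0 if healthy else 1)
```

Exiting with status 1 when the service never becomes healthy is what turns a crashed container into a red pipeline.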

CI/CD Pipeline Stages and Their Purpose

Every CI/CD pipeline follows a set of stages that gate each other. The typical flow: lint → unit test → build → integration test → deploy → smoke test. Each stage acts as a safety net. Lint catches formatting and logic errors fast. Unit tests validate individual functions. Build produces the artifact. Integration tests confirm the artifact works in a real environment. Deploy publishes it. Smoke test verifies the service is alive.

The order matters — you want the fastest checks first so failures are caught early, without wasting time on slower stages.

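Example (YAML)

name: CI/CD Pipeline

on: [push]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # each job starts on a clean runner
      - run: npm ci
      - run: npm run lint
  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The image exists only on this runner; push it to a registry if later jobs need it
      - run: docker build -t myapp .
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh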

Output

Stages run sequentially: lint, test, build, deploy

The Pipeline as a Gate Chain

Fastest stages first — fail early, waste less compute
Each stage should be deterministic — same commit always produces same result
Stages that depend on external services (DB, API) should run integration tests, not just unit tests
A stage that takes more than 5 minutes is a candidate for parallelisation
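The gate-chain behaviour described above can be modelled in a few lines. This is a toy model, not how any CI system is implemented: `run_gate_chain` is a hypothetical name, and each stage is reduced to a function returning True or False.

```python
from typing import Callable, Optional, Sequence

Stage = tuple[str, Callable[[], bool]]


def run_gate_chain(stages: Sequence[Stage]) -> tuple[list[str], Optional[str]]:
    """Run stages in order and stop at the first failure.

    Returns (names of stages that passed, name of the failing stage or None).
    """
    passed: list[str] = []
    for name, check in stages:
        if not check():
            return passed, name  # later stages never run
        passed.append(name)
    return passed, None


if __name__ == "__main__":
    pipeline = [
        ("lint", lambda: True),
        ("test", lambda: False),   # simulated failing unit tests
        ("build", lambda: True),
        ("deploy", lambda: True),
    ]
    passed, failed = run_gate_chain(pipeline)
    print(f"passed={passed} failed_at={failed}")
```

Because the chain stops at the first failing gate, putting the cheapest checks first means a failure costs the least compute.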

Production Insight

A skipped stage (e.g., skipping lint to save time) lets bad code through.

One un-linted commit caused a production outage when a missing semicolon broke the minifier.

Never skip stages — instead, make them faster.

Key Takeaway

Pipeline stages are safety nets — order them from fastest to slowest.

Never skip a stage for speed; optimise the slow stage instead.

Deciding Which Stages to Include

If: Single developer, no external integrations → Use: Lint, test, build only; deploy manually
If: Team of 5+, production deploy to one environment → Use: Add integration tests and a deploy stage
If: Multiple environments (staging, prod) → Use: Add smoke tests after each deploy, separate deploy stages
If: Regulatory compliance or critical uptime → Use: Add security scanning, performance tests, manual approval gates

Continuous Delivery vs Continuous Deployment — The Trade-off

Continuous Delivery means every successful build produces a deployable artifact, but a human decides when to push it to production. Continuous Deployment goes all the way — each successful build is automatically deployed to production. The choice depends on your risk tolerance and deployment processes.

Continuous Delivery is safer for regulated industries or when you need a manual QA sign-off. Continuous Deployment is faster for teams with strong automated testing and rollback capability. The real question: can you detect and fix a bad deploy in under 5 minutes? If not, start with Delivery, not Deployment.

Production Insight

A team running Continuous Deployment with flaky tests had 3 bad deploys out of 100 — each took 20 minutes to roll back.

They switched to Continuous Delivery with a 10-minute manual gate, which caught 2 of the 3 bad builds before they hit production.

Lesson: automated deployment without reliable tests is just speeding up failure.

Key Takeaway

Continuous Delivery is safer; Continuous Deployment is faster.

Choose Deployment only when you trust your tests and can roll back in minutes.

Delivery vs Deployment — Which Fits Your Team?

If: Strong automated test coverage (>80%), fast rollback possible → Use: Continuous Deployment
If: Manual QA is required before production → Use: Continuous Delivery with a human approval gate
If: No rollback mechanism, tests are flaky → Use: Continuous Delivery with extra staging verification
If: High compliance (finance, healthcare) → Use: Continuous Delivery with multiple approval steps

CI/CD Best Practices for Production Teams

Over years of building and debugging pipelines, these practices separate effective CI/CD from pipelines that cause more harm than good:

Fast feedback — Keep the pipeline under 10 minutes. Long pipelines discourage frequent pushes. Split long tests into separate workflows or parallelise.
Idempotent steps — Every step should produce the same result given the same input. Avoid steps that depend on global state or mutable external resources.
Secrets management — Never hardcode secrets. Use encrypted environment variables (GitHub Actions secrets, GitLab CI variables, etc.) and rotate them regularly.
Health checks after deploy — Deploy is not complete until the new version responds correctly. Add a curl or similar check in the deploy job.
Rollback capability — Every deploy should be rollback-able. Tag Docker images with commit SHA so you can redeploy a known-good version.
Pipeline as code — Version your pipeline definitions alongside your code. This makes changes reviewable and traceable.
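For the secrets-management practice in the list above, a deploy script can read credentials from the environment at runtime and stop immediately when one is missing. A minimal sketch, assuming the CI system injects secrets as environment variables; `require_secret` is a hypothetical helper and `DOCKER_PASSWORD` is the variable name used in this article's workflow.

```python
import os
from typing import Mapping


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the secret stored in env[name], or raise if it is missing or blank."""
    value = env.get(name, "")
    if not value.strip():
        raise RuntimeError(f"secret {name} is not set; configure it in the CI secret store")
    return value


if __name__ == "__main__":
    try:
        require_secret("DOCKER_PASSWORD")
        print("DOCKER_PASSWORD is available")  # never print the value itself
    except RuntimeError as err:
        print(err)
```

Failing at the first lookup gives a clear error message instead of an opaque authentication failure several steps later.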

Production Insight

A team had a pipeline step that ran a database migration — it was not idempotent.

When the pipeline re-ran after a failure, the migration ran twice and corrupted the schema.

Fix: make migrations idempotent using IF NOT EXISTS checks and version tracking.
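The fix above, IF NOT EXISTS checks plus version tracking, might look like the following with SQLite from the Python standard library. It is a sketch under assumptions: the `schema_migrations` table name, the `users` table, and the two migrations are invented for illustration, and a real project would more likely use an established migration tool.

```python
import sqlite3

# Hypothetical migrations for illustration: (version, SQL statement).
MIGRATIONS = [
    (1, "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]


def migrate(conn: sqlite3.Connection) -> list[int]:
    """Apply each pending migration once; return the versions applied in this call."""
    conn.isolation_level = None  # manage transactions explicitly with BEGIN/COMMIT
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    ran: list[int] = []
    for version, statement in MIGRATIONS:
        if version in applied:
            continue  # already recorded, so a re-run skips it
        conn.execute("BEGIN")
        try:
            conn.execute(statement)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # schema change and version record roll back together
            raise
        ran.append(version)
    return ran


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(migrate(db))  # first run applies the pending migrations
    print(migrate(db))  # second run finds nothing left to apply
```

Recording the version in the same transaction as the schema change is what makes a pipeline re-run safe: a migration is either fully applied and recorded, or not applied at all.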

Key Takeaway

Idempotency, fast feedback, and rollback are non-negotiable.

A pipeline that can't be safely re-run is a liability.

Common CI/CD Pipeline Failures and How to Debug Them

Even well-designed pipelines fail. The most common failures fall into a few categories:

Environment drift: Your pipeline uses a Docker image that is updated upstream, breaking your build. Pin base image versions.
Cache poisoning: An old cache contains corrupt or outdated dependencies. Clear the cache periodically.
Flaky tests: Tests that pass locally but fail in CI due to timing or order dependence. Use a retry strategy, such as --reruns from the pytest-rerunfailures plugin, while you fix the root cause.
Secret expiration: Tokens or passwords expire. Automate rotation and alert on failure.
Resource exhaustion: Disk space or memory runs out during the build. Add cleanup steps and monitor usage.

Watch Out: Silent Failures

The most dangerous failures are the ones that don't wake anyone up. A pipeline that passes but doesn't actually deploy is worse than one that fails loudly. Always add a notification for every stage outcome, even successes—so you notice when successes stop coming.

Production Insight

A common pattern: the deploy step succeeds but the application crashes on startup due to a missing environment variable.

The pipeline passes and no alert fires, so no one notices for 30 minutes, because the deploy step returned exit code 0.

Fix: add a health check step that verifies the service is responsive after deploy.

Key Takeaway

Pipeline success only means the steps ran without error.

Always validate the actual behaviour of the deployed service.

Add a health check after every deploy.

What Breaks Before CI/CD: The Merge Hell Tax

Before automation, release day was a war room. Developers worked in isolation for weeks, merging everything into a single branch right before deployment. The result? Merge conflicts that took days to resolve. Builds that broke because someone forgot to commit a dependency. Bugs that surfaced only in production because testing happened once, at the end. This is the 'merge hell tax.' It cost teams velocity and morale. Operations teams manually deployed artifacts, often with copy-paste errors. Rollbacks meant restoring from database snapshots, hoping you didn't lose customer data. The core problem wasn't bad developers. It was a process that punished frequent changes. CI/CD flips that: small, frequent integrations mean conflicts are caught in minutes, not weeks. The WHY is simple—decrease batch size to reduce risk. The HOW is automation. If your team still has a dedicated 'release manager' who schedules deployments, you're paying the tax.

Production Trap:

If your last three production incidents involved manual steps by different engineers, your pipeline is the root cause. Automate every handoff.

Key Takeaway

Small, frequent integrations catch conflicts in minutes; large batches catch them on deployment day.

The Three Pillars of CI/CD: Pick the Right CD

CI/CD isn't one thing. It's three distinct practices with different risk profiles. Continuous Integration (CI) is non-negotiable: every commit triggers build + tests. Fail fast or don't merge. Continuous Delivery means every CI pass produces a deployable artifact, but a human decides when to push to production. This is for regulated industries where audits require approval gates. Continuous Deployment automates the release entirely—every commit that passes CI goes to production. This suits teams with robust test coverage and feature flags. The mistake? Teams jump to Continuous Deployment without investing in test reliability. Your pipeline becomes a noisy flake factory. Measure your test suite's false-positive rate. If >5% of builds fail due to flaky tests, stay on Continuous Delivery until you fix it. Your org's tolerance for risk determines which CD you adopt—not your desire for automation.

.github/workflows/deploy-choice.yml (YAML)

# io.thecodeforge
name: Deploy
on: [push]

jobs:
  decide:
    runs-on: ubuntu-latest
    outputs:
      env: ${{ steps.check-ref.outputs.env }}
    steps:
      - id: check-ref
        run: |
          # main deploys to production; every other branch deploys to staging
          if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then
            echo "env=production" >> "$GITHUB_OUTPUT"
          else
            echo "env=staging" >> "$GITHUB_OUTPUT"
          fi

  deploy:
    needs: decide
    runs-on: ubuntu-latest   # required: every job needs a runner
    environment: ${{ needs.decide.outputs.env }}
    steps:
      - run: echo "Deploying to ${{ needs.decide.outputs.env }}"

Output

Deploying to staging (on feature branch)

Deploying to production (on main branch)

Decision Tree:

Use Continuous Delivery if your production deploy requires sign-off. Use Continuous Deployment only if you trust your test suite and can roll back in under 5 minutes, the threshold used earlier in this guide.

Key Takeaway

Continuous Delivery is for compliance; Continuous Deployment is for speed. Know which risk profile your business accepts.
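The 5% flaky-build threshold mentioned above needs a concrete definition before it can be measured. One possible definition, used in this sketch, counts a commit as flaky when the same commit has both a failed and a passed pipeline run. The function name `flaky_build_rate` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def flaky_build_rate(runs: Iterable[tuple[str, bool]]) -> float:
    """Fraction of commits whose pipeline both failed and passed without a code change.

    runs: (commit_sha, passed) for every pipeline run, including re-runs.
    """
    outcomes: dict[str, set[bool]] = defaultdict(set)
    for sha, passed in runs:
        outcomes[sha].add(passed)
    if not outcomes:
        return 0.0
    flaky = sum(1 for seen in outcomes.values() if seen == {True, False})
    return flaky / len(outcomes)


if __name__ == "__main__":
    # Hypothetical run history: commit "b" failed once, then passed on re-run.
    history = [("a", True), ("b", False), ("b", True), ("c", False), ("d", True)]
    rate = flaky_build_rate(history)
    print(f"flaky rate: {rate:.0%}; stay on Continuous Delivery: {rate > 0.05}")
```

Commit "c" in the sample failed and never passed, so it counts as a genuine failure rather than a flaky one.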

● Production incident · Post-mortem · Severity: high

Silent Pipeline Failure: Image Not Found in Registry

Symptom

The deploy job logs showed 'manifest for myapp:abc123 not found'. The pipeline was marked as failed after retries, but the on-call engineer was not alerted because the failure was attributed to a transient network issue.

Assumption

The team assumed that if the Docker build step succeeded, the image would always be accessible from the registry. They also assumed that a push failure would cause the pipeline to fail immediately.

Root cause

The Docker registry credentials had expired. The docker login step succeeded because it used cached credentials, but the push step failed silently (exit code 0 due to a bug in the action). The image was never uploaded, but the pipeline continued, then the deploy step failed trying to pull a non-existent image.

Fix

1. Updated the credentials and set up a cron job to rotate them monthly.
2. Replaced the push step with a version that exits non-zero on failure.
3. Added a verification step after push: pull the image back and run a smoke test.
4. Added alerting on any deploy step failure, even retryable ones.

Key lesson

Never assume a push succeeded — verify by pulling and running the artifact in a test container.
All pipeline steps should have proper exit code handling — don't rely on default behavior.
Rotate secrets proactively; do not wait for them to expire at 2 AM on a Sunday.
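The first lesson, verify by pulling and running the artifact, can be sketched as a post-push step. This is an illustration under assumptions: `verify_push` is a hypothetical helper, the image reference and smoke command are placeholders, and the command runner is injectable so the logic can be exercised without Docker installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def verify_push(image: str, smoke_cmd: Sequence[str], run: Runner = _run) -> None:
    """Pull the image back from the registry and run a smoke command inside it."""
    if run(["docker", "pull", image]) != 0:
        raise RuntimeError(f"{image} is not in the registry; the push did not succeed")
    if run(["docker", "run", "--rm", image, *smoke_cmd]) != 0:
        raise RuntimeError(f"{image} was pulled but the smoke test failed")


if __name__ == "__main__":
    # Demonstration with a fake runner that simulates an image missing from the registry.
    def registry_is_empty(cmd: Sequence[str]) -> int:
        return 1 if cmd[1] == "pull" else 0

    try:
        verify_push("registry.example.com/team/myapp:abc123", ["myapp", "--version"], run=registry_is_empty)
    except RuntimeError as err:
        print(f"pipeline would fail here: {err}")
```

Running this on a clean runner, rather than the one that built the image, gives the strongest guarantee that the registry copy is what gets tested.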

Production debug guide: symptom → action guide for pipeline operators (5 entries)

Symptom 01: Pipeline fails on the lint step with 'error' but the linter passes locally
Fix: Check whether the local linter version matches CI. Pin the linter version in the pipeline. Run the linter with the same config file as CI.

Symptom 02: Docker build fails with 'no space left on device'
Fix: Run docker system prune -af to free space. Also check whether the build cache is too large; consider multi-stage builds.

Symptom 03: Deploy step hangs for more than 5 minutes, then times out
Fix: Check SSH connectivity: ssh -v user@host. Verify the remote Docker daemon is running. Check that the compose file is valid.

Symptom 04: Tests pass locally but fail intermittently in CI
Fix: Add --reruns=2 to the test command (requires the pytest-rerunfailures plugin). For order-dependent tests, run the suite in random order to reproduce the failure (for example with the pytest-random-order plugin). Check for timing issues with external services.

Symptom 05: Pipeline passes but no new version is deployed
Fix: Check the deploy job's if condition. Verify the branch name matches. Check that the artifact was actually pushed by looking for its tag in the registry.

★ CI/CD Pipeline Debugging Cheat Sheet

Immediate actions and commands for common pipeline failures.

Lint/test step fails unexpectedly

Immediate action

Read the full log output, not just the summary.

Commands

npx eslint . --format compact

pytest --lf --tb=long

Fix now

If CI uses a different OS or architecture, replicate locally with Docker: docker run --rm -v $(pwd):/repo -w /repo node:20 bash -c 'npm ci && npm run lint'


CI vs CD vs Continuous Deployment

Dimension	Continuous Integration	Continuous Delivery	Continuous Deployment
Trigger	Every push	Every successful CI run	Every successful CI run
Artifact produced	Test results, build output	Deployable artifact (e.g., Docker image)	Same as Continuous Delivery
Production deployment	Not performed	Manual approval required	Automated, no human gate
Risk level	Low (catches bugs before release)	Medium (human review catches issues)	High (trusts tests entirely)
Feedback cycle	Minutes	Minutes to hours	Seconds to minutes
Typical use case	All projects	Projects with QA or compliance gates	Highly automated teams with fast rollback

Key takeaways

CI catches integration bugs early: on every push, not at release time.

Continuous Delivery: every successful build is deployable. Continuous Deployment: every successful build deploys automatically.

Pipeline stages: lint → test → build → deploy. Each stage gates the next.

Secrets in CI must be stored as encrypted environment variables: never hardcode credentials.

Fast feedback loop is the goal: a CI pipeline longer than 10 minutes loses its value.

Symptom

A bad deploy goes to production and there is no quick way to revert.

Fix

Always tag artifacts with the commit SHA. Keep the last known-good image tag. Automate rollback: kubectl rollout undo on the affected deployment, or docker compose pull && docker compose up -d pinned to the previous version.
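Choosing the rollback target can also be automated if each deploy is recorded with its health-check result. A minimal sketch, assuming a deploy history kept oldest-first as (image tag, was healthy) pairs and that the current deploy is the newest entry; `pick_rollback_tag` and the sample tags are hypothetical.

```python
from typing import Optional, Sequence


def pick_rollback_tag(
    deploy_history: Sequence[tuple[str, bool]],
    current_tag: str,
) -> Optional[str]:
    """Return the most recent healthy tag other than the current one, or None.

    deploy_history: oldest-first (image_tag, was_healthy) records.
    """
    for tag, was_healthy in reversed(deploy_history):
        if tag != current_tag and was_healthy:
            return tag
    return None


if __name__ == "__main__":
    # Hypothetical history keyed by short commit SHA.
    history = [("a1b2c3", True), ("d4e5f6", False), ("0789ab", True), ("cdef01", False)]
    print(pick_rollback_tag(history, current_tag="cdef01"))
```

A result of None means there is no known-good version to return to, which is worth alerting on before it is needed.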

Interview Questions on This Topic

Q01 (Junior): What is the difference between CI and CD?
Q02 (Senior): What stages should a good CI/CD pipeline have?
Q03 (Senior): What is the difference between Continuous Delivery and Continuous Deploy...
Q04 (Senior): How do you handle flaky tests in a CI pipeline?
Q05 (Senior): What is a pipeline gate and why is it important?

Answer to Q01:

CI (Continuous Integration) is the practice of automatically building and testing every code change, usually on every push to a shared branch. CD (Continuous Delivery) is the practice of automatically preparing that build for deployment — producing a deployable artifact and optionally deploying to a staging environment. The key difference: CI ends when tests pass; CD starts from that point to produce a deployable artifact.
