CI/CD Silent Failure — Expired Docker Credentials
Docker registry credentials expired. The push step exited with code 0, but the image was never uploaded.
20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.
- Continuous Integration (CI) automatically builds and tests code on every push
- Continuous Delivery produces a deployable artifact after every successful CI run
- Continuous Deployment goes one step further and deploys to production automatically
- Pipelines catch bugs early and reduce deployment risk
- Aim for a pipeline under 10 minutes; longer pipelines lose much of their value
- A failing pipeline that doesn't alert is a silent time bomb
CI/CD is like having an automated quality check for your code. Every time you make a change, the system automatically checks if it works and prepares it for release, so you don't have to remember all the steps.
CI/CD is the backbone of modern software delivery. It replaces the old model of big-bang releases with a continuous flow of small, validated changes. The core idea: every code change goes through an automated pipeline that builds, tests, and — optionally — deploys it. This isn't a luxury; teams that skip CI/CD spend 2x to 3x more time resolving integration conflicts and debugging production failures. Here's the catch: a poorly designed pipeline can be worse than no pipeline — it can give false confidence. This guide covers what CI/CD actually means, how to build one that works, and the production failures you'll face if you don't.
Why CI/CD Pipelines Fail Silently — And How to Fix It
CI/CD (Continuous Integration and Continuous Delivery or Deployment) is the automated pipeline that builds, tests, and deploys your code. The core mechanic: every commit triggers a sequence of jobs (compile, test, package, deploy), each running in an isolated environment with its own credentials. When those credentials expire, the pipeline doesn't always fail loudly. Instead, it may silently skip deployment, push a stale artifact, or fail halfway through with a cryptic error like '401 Unauthorized' or 'Access denied'. This is the silent failure mode.
In practice, CI/CD pipelines rely on stored secrets — Docker registry tokens, cloud provider keys, API tokens — to authenticate during build and deploy steps. These secrets have lifetimes: some expire in 1 hour, others in 30 days. If your pipeline doesn't refresh them before expiry, the step that pulls the base image or pushes the final artifact fails. But because CI/CD systems often retry steps or cache layers, the failure may not surface until the next deployment attempt, wasting developer time and eroding trust in automation.
Use credential rotation with automated refresh for any CI/CD pipeline that runs longer than the credential's lifetime. For Docker registries, set up a token refresh job that runs before the build step. For cloud providers, prefer short-lived credentials issued per pipeline run, for example HashiCorp Vault dynamic secrets or OIDC federation between your CI provider and the cloud account. A secrets manager such as AWS Secrets Manager is a good home for longer-lived secrets that must be rotated on a schedule. This matters because a single expired credential can silently break your deployment pipeline, causing production drift and delayed rollouts: exactly the problems CI/CD is supposed to prevent.
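A minimal sketch of the "refresh before the build step" check, assuming the pipeline knows when its registry token expires and roughly how long a run takes. The function name `needs_refresh` and the 10-minute safety margin are hypothetical choices, not part of any CI system's API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical safety margin so a token never expires mid-push.
SAFETY_MARGIN = timedelta(minutes=10)


def needs_refresh(expires_at: datetime, expected_runtime: timedelta,
                  now: Optional[datetime] = None) -> bool:
    """Return True if the credential could expire before the pipeline finishes.

    expires_at must be timezone-aware (UTC recommended).
    """
    now = now or datetime.now(timezone.utc)
    return expires_at <= now + expected_runtime + SAFETY_MARGIN


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Token with 1 hour left, 10-minute pipeline: safe to proceed.
    print(needs_refresh(start + timedelta(hours=1), timedelta(minutes=10), now=start))
    # Token with 15 minutes left, 10-minute pipeline: inside the margin, refresh first.
    print(needs_refresh(start + timedelta(minutes=15), timedelta(minutes=10), now=start))
```

The margin matters because "valid when the job starts" is not the same as "valid when the push happens" several minutes later.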
A GitHub Actions CI Pipeline
This GitHub Actions workflow runs on every push to main/develop and on pull requests. It sets up Python, caches pip, installs dependencies, runs the ruff linter, runs pytest with coverage, and uploads the results. The cache step is critical: without it, each pipeline run downloads every package from scratch, turning a 2-minute install into a 10-minute one.
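The pip cache only helps if its key changes exactly when dependencies change. As a rough Python analogue (not the actual `hashFiles` implementation in GitHub Actions), a key can be built from the runner OS plus a SHA-256 of the requirements files; `pip_cache_key` and the key format are hypothetical.

```python
import hashlib
from pathlib import Path


def pip_cache_key(runner_os: str, requirement_files: list[Path]) -> str:
    """Build a cache key that changes whenever any requirements file changes."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on filesystem listing order.
    for path in sorted(requirement_files):
        digest.update(path.read_bytes())
    return f"{runner_os}-pip-{digest.hexdigest()}"
```

Editing a pinned version produces a new key and a fresh install, while unchanged dependencies reuse the cached packages.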
Adding CD — Automatic Deployment
The deploy job runs after the test job passes, and only on main. It builds a Docker image, tags it with the commit SHA, pushes it to a registry, then SSHes to a staging server and redeploys using docker compose. This is the core of Continuous Deployment to staging; a human still gates production deployments.
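A sketch of the build, tag, and push sequence, assuming a generic registry host and image name (both hypothetical). Tagging with the full commit SHA maps every deployed image to exactly one commit, which is what makes redeploying a known-good version possible. Running each command with `check=True` makes `subprocess.run` raise on a non-zero exit, so a failed build never reaches the push.

```python
import subprocess


def image_ref(registry: str, image: str, commit_sha: str) -> str:
    """Immutable image reference tagged with the full commit SHA."""
    if len(commit_sha) != 40 or any(c not in "0123456789abcdef" for c in commit_sha):
        raise ValueError(f"expected a 40-char lowercase hex SHA, got {commit_sha!r}")
    return f"{registry}/{image}:{commit_sha}"


def build_and_push_commands(ref: str) -> list[list[str]]:
    """Commands for the deploy job, in order."""
    return [
        ["docker", "build", "-t", ref, "."],
        ["docker", "push", ref],
    ]


def run_all(commands: list[list[str]]) -> None:
    for cmd in commands:
        # check=True raises CalledProcessError on any non-zero exit code,
        # stopping the sequence at the first failing command.
        subprocess.run(cmd, check=True)
```

Rejecting branch names like `main` as tags is deliberate: a mutable tag cannot tell you which commit is running.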
CI/CD Pipeline Stages and Their Purpose
Every CI/CD pipeline follows a set of stages that gate each other. The typical flow: lint → unit test → build → integration test → deploy → smoke test. Each stage acts as a safety net. Lint catches formatting and logic errors fast. Unit tests validate individual functions. Build produces the artifact. Integration tests confirm the artifact works in a real environment. Deploy publishes it. Smoke test verifies the service is alive.
The order matters — you want the fastest checks first so failures are caught early, without wasting time on slower stages.
- Fastest stages first — fail early, waste less compute
- Each stage should be deterministic — same commit always produces same result
- Code that depends on external services (DB, API) needs integration tests, not just unit tests
- A stage that takes more than 5 minutes is a candidate for parallelisation
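The gating order above can be sketched as an ordered list of stages where the first failure stops the run. The stage names mirror the flow described earlier; the `Stage` type and the lambdas standing in for real commands are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True on success


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order; stop at the first failure.

    Returns (succeeded, names of stages that actually ran).
    """
    ran: list[str] = []
    for stage in stages:
        ran.append(stage.name)
        if not stage.run():
            return False, ran
    return True, ran


if __name__ == "__main__":
    pipeline = [
        Stage("lint", lambda: True),
        Stage("unit test", lambda: False),  # simulated failure
        Stage("build", lambda: True),
        Stage("integration test", lambda: True),
        Stage("deploy", lambda: True),
        Stage("smoke test", lambda: True),
    ]
    print(run_pipeline(pipeline))
```

With the simulated unit-test failure, only two stages run; the slower build, deploy, and smoke-test stages never start.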
Continuous Delivery vs Continuous Deployment — The Trade-off
Continuous Delivery means every successful build produces a deployable artifact, but a human decides when to push it to production. Continuous Deployment goes all the way — each successful build is automatically deployed to production. The choice depends on your risk tolerance and deployment processes.
Continuous Delivery is safer for regulated industries or when you need a manual QA sign-off. Continuous Deployment is faster for teams with strong automated testing and rollback capability. The real question: can you detect and fix a bad deploy in under 5 minutes? If not, start with Delivery, not Deployment.
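One way to answer that question with data, assuming you record, for each past bad deploy, the minutes to detect it and the minutes from detection to a completed rollback or fix. The `Incident` record and the choice of the worst case rather than an average are illustrative; the 5-minute threshold comes from the rule of thumb above.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    minutes_to_detect: float
    minutes_to_restore: float  # from detection to a completed rollback or fix


def recommended_strategy(incidents: list[Incident], threshold_minutes: float = 5.0) -> str:
    """Suggest Continuous Deployment only if every past bad deploy was
    detected and fixed in under the threshold; otherwise Continuous Delivery."""
    if not incidents:
        # No evidence yet that recovery is fast: stay conservative.
        return "continuous delivery"
    worst = max(i.minutes_to_detect + i.minutes_to_restore for i in incidents)
    return "continuous deployment" if worst < threshold_minutes else "continuous delivery"
```

Using the worst case is deliberately strict: a single slow recovery is the one that hurts customers.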
CI/CD Best Practices for Production Teams
Over years of building and debugging pipelines, these practices separate effective CI/CD from pipelines that cause more harm than good:
- Fast feedback — Keep the pipeline under 10 minutes. Long pipelines discourage frequent pushes. Split long tests into separate workflows or parallelise.
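- Idempotent steps — Running a step twice with the same input should leave the system in the same state as running it once; re-running a deploy, for example, should not create duplicate resources. Avoid steps that depend on global state or mutable external resources.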
- Secrets management — Never hardcode secrets. Use encrypted environment variables (GitHub Actions secrets, GitLab CI variables, etc.) and rotate them regularly.
- Health checks after deploy — Deploy is not complete until the new version responds correctly. Add a curl or similar check in the deploy job.
- Rollback capability — Every deploy should be rollback-able. Tag Docker images with commit SHA so you can redeploy a known-good version.
- Pipeline as code — Version your pipeline definitions alongside your code. This makes changes reviewable and traceable.
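The "health checks after deploy" practice can be sketched in Python as a polling loop, equivalent in spirit to a retried curl. It assumes the service exposes an HTTP health endpoint that returns 200 when ready; the URL, attempt count, and delays are hypothetical parameters to tune per service.

```python
import time
import urllib.request


def wait_for_healthy(url: str, attempts: int = 10, delay_seconds: float = 3.0,
                     timeout_seconds: float = 5.0) -> bool:
    """Poll url until it returns HTTP 200 or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (non-2xx), refused connections and timeouts
            # are all OSError subclasses: treat them as "not healthy yet".
            pass
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

In a deploy job, exit non-zero when it returns False (for example `sys.exit(0 if wait_for_healthy(url) else 1)`), so the pipeline marks the deploy as failed instead of reporting green.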
Common CI/CD Pipeline Failures and How to Debug Them
Even well-designed pipelines fail. The most common failures fall into a few categories:
- Environment drift: Your pipeline uses a Docker image that is updated upstream, breaking your build. Pin base image versions.
- Cache poisoning: An old cache contains corrupt or outdated dependencies. Clear the cache periodically.
- Flaky tests: Tests pass locally but fail in CI due to timing or order dependence. Use --reruns or a retry strategy.
- Secret expiration: Tokens or passwords expire. Automate rotation and alert on failure.
- Resource exhaustion: Disk space or memory runs out during the build. Add cleanup steps and monitor usage.
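For environment drift, a pipeline can refuse unpinned base images before building. A minimal sketch, assuming a single Dockerfile and treating only `FROM` images that carry an `@sha256:` digest as pinned; `unpinned_base_images` is a hypothetical helper, and it ignores harder cases such as `ARG`-substituted image names and line continuations.

```python
import re

_FROM = re.compile(r"^\s*FROM\s+(.*)$", re.IGNORECASE)


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images in FROM lines that are not pinned by digest."""
    stage_aliases: set[str] = set()
    unpinned: list[str] = []
    for line in dockerfile_text.splitlines():
        match = _FROM.match(line)
        if not match:
            continue
        # Drop flags such as --platform=linux/amd64.
        tokens = [t for t in match.group(1).split() if not t.startswith("--")]
        if not tokens:
            continue
        image = tokens[0]
        # Earlier build stages and "scratch" are not registry images.
        is_local = image.lower() in stage_aliases or image == "scratch"
        if not is_local and "@sha256:" not in image:
            unpinned.append(image)
        # "FROM image AS name" defines an alias later FROM lines may use.
        if len(tokens) >= 3 and tokens[1].lower() == "as":
            stage_aliases.add(tokens[2].lower())
    return unpinned
```

Failing the job whenever the returned list is non-empty turns every base-image update into a deliberate, reviewed change.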
What Breaks Before CI/CD: The Merge Hell Tax
Before automation, release day was a war room. Developers worked in isolation for weeks, merging everything into a single branch right before deployment. The result? Merge conflicts that took days to resolve. Builds that broke because someone forgot to commit a dependency. Bugs that surfaced only in production because testing happened once, at the end. This is the 'merge hell tax.' It cost teams velocity and morale. Operations teams manually deployed artifacts, often with copy-paste errors. Rollbacks meant restoring from database snapshots, hoping you didn't lose customer data. The core problem wasn't bad developers. It was a process that punished frequent changes. CI/CD flips that: small, frequent integrations mean conflicts are caught in minutes, not weeks. The WHY is simple—decrease batch size to reduce risk. The HOW is automation. If your team still has a dedicated 'release manager' who schedules deployments, you're paying the tax.
The Three Pillars of CI/CD: Pick the Right CD
CI/CD isn't one thing. It's three distinct practices with different risk profiles. Continuous Integration (CI) is non-negotiable: every commit triggers build + tests. Fail fast or don't merge. Continuous Delivery means every CI pass produces a deployable artifact, but a human decides when to push to production. This is for regulated industries where audits require approval gates. Continuous Deployment automates the release entirely—every commit that passes CI goes to production. This suits teams with robust test coverage and feature flags. The mistake? Teams jump to Continuous Deployment without investing in test reliability. Your pipeline becomes a noisy flake factory. Measure your test suite's false-positive rate. If >5% of builds fail due to flaky tests, stay on Continuous Delivery until you fix it. Your org's tolerance for risk determines which CD you adopt—not your desire for automation.
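A rough way to measure that 5% threshold, assuming your CI history records, for each build, whether it failed and whether a rerun of the same commit then passed (a common proxy for a flaky failure). The `BuildRecord` shape and both functions are hypothetical; real CI systems expose this data in their own formats.

```python
from dataclasses import dataclass


@dataclass
class BuildRecord:
    commit: str
    failed: bool
    passed_on_rerun: bool = False  # same commit, no code change, then green


def flaky_failure_rate(builds: list[BuildRecord]) -> float:
    """Fraction of builds that failed but passed on a rerun of the same commit."""
    if not builds:
        return 0.0
    flaky = sum(1 for b in builds if b.failed and b.passed_on_rerun)
    return flaky / len(builds)


def ready_for_continuous_deployment(builds: list[BuildRecord],
                                    max_flaky_rate: float = 0.05) -> bool:
    # Rule of thumb from the text: above 5% flaky failures, stay on Delivery.
    return flaky_failure_rate(builds) <= max_flaky_rate
```

Failures that stay red on rerun are not counted: those are real regressions, which is exactly what the pipeline should catch.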
Silent Pipeline Failure: Image Not Found in Registry
- Never assume a push succeeded — verify by pulling and running the artifact in a test container.
- Handle exit codes explicitly in every pipeline step (in shell steps, start with set -euo pipefail) instead of relying on default behavior.
- Rotate secrets proactively; do not wait for them to expire at 2 AM on a Sunday.
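The "never assume a push succeeded" rule can be sketched as a check that runs after `docker push` and asks the registry for the tag via `docker manifest inspect`, which exits non-zero when the reference cannot be found. The runner is injectable so the logic can be tested without Docker; `push_and_verify` and the image reference are hypothetical. Pulling and running the image in a test container is a stronger check still.

```python
import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def push_and_verify(ref: str, run: Runner = subprocess.run) -> None:
    """Push an image, then confirm the registry actually has the tag.

    Raises RuntimeError if either step fails, so the CI job goes red
    instead of reporting a silent success.
    """
    push = run(["docker", "push", ref], capture_output=True, text=True)
    if push.returncode != 0:
        raise RuntimeError(f"push failed ({push.returncode}): {push.stderr.strip()}")

    # Independent check: do not trust the push exit code alone.
    inspect = run(["docker", "manifest", "inspect", ref], capture_output=True, text=True)
    if inspect.returncode != 0:
        raise RuntimeError(f"{ref} not found in registry after push: {inspect.stderr.strip()}")
```

This is the scenario from the title: the push exits 0, but the independent lookup fails and the job fails with it.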
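Quick fixes for common pipeline symptoms:
- Out of disk space during build: run docker system prune -af to free space. Also check whether the build cache is too large; consider multi-stage builds.
- SSH deploy step fails: debug with ssh -v user@host. Verify the remote Docker daemon is running and that the compose file is valid.
- Flaky tests: add --reruns=2 to the test command (the flag comes from the pytest-rerunfailures plugin). For order-dependent tests, use --shuffle to reproduce. Check for timing issues with external services.
- Deploy job skipped or image missing: check the job's if condition and verify the branch name matches. Check that the artifact was actually pushed by looking for its tag in the registry.
- Reproduce failures locally: npx eslint . --format compact for lint errors, and pytest --lf --tb=long to rerun only the last-failed tests with full tracebacks.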
Common mistakes to avoid
- Hardcoding secrets in pipeline YAML. Fix: store them as encrypted CI secrets (GitHub Actions secrets, GitLab CI variables) and rotate them regularly.
- Using depends_on without a healthcheck in Docker Compose for pipelines. Fix: define a healthcheck and use condition: service_healthy in the depends_on in your CI pipeline deployment steps.
- Not pinning base image versions in Dockerfile. An upstream update to python:3.12 breaks your build, and the pipeline fails unpredictably. Fix: pin to a digest such as python:3.12-slim@sha256:abc... Update intentionally and test.
- Ignoring flaky tests in CI. Fix: use --seeds to reproduce order-dependent failures, and add wait strategies for async code.
- Manually managing pipeline deployment steps without rollback. Fix: use kubectl rollout undo, or docker-compose pull && docker-compose up -d with the previous version.
Interview Questions on This Topic
What is the difference between CI and CD?