GitLab Pipeline — Silent Cache Poisoning from Build Caching
Deployed hash mismatch? Cache key based only on package-lock.
- You define stages (install, test, build, deploy) and jobs in a .gitlab-ci.yml file.
- A GitLab Runner executes each job in a fresh container, ensuring consistency.
- Artifacts pass files between jobs within one pipeline; cache speeds up future runs.
- Pipelines are triggered by events (push, MR, schedule).
- Jobs in the same stage run in parallel by default.
- Use rules for conditional execution (branch, file changes).
- Use needs to create a DAG and start jobs as soon as dependencies finish.
- The isolation is the source of truth; it eliminates "works on my machine."
- A failed pipeline blocks broken code from progressing, providing fast feedback.
- Protected variables and branches form the security perimeter for secrets.
Imagine you're baking a cake for a birthday party. Every time you bake, you follow the same steps: mix ingredients, bake, frost, then deliver. GitLab CI/CD is like hiring a robot baker that follows those exact steps automatically every time you change a recipe. You write the steps once in a file, and the robot handles the rest — mixing, baking, testing for taste, and delivering — without you lifting a finger. If the cake burns (your code breaks), the robot stops and tells you before the party (production) ever sees it.
Manual deployment processes are a liability. They're error-prone, undocumented, and create a feedback lag that lets bugs compound for days. GitLab CI/CD replaces tribal knowledge with a codified, automated pipeline that runs identically in an isolated environment every time.
The core value is reducing the cost of failure. A pipeline catches a regression in minutes, not days. This shifts the team's mindset from "avoid breaking things" to "ship with confidence." The .gitlab-ci.yml file becomes the single source of truth for your delivery process.
Common misconceptions include treating CI/CD as just running tests, or conflating caching with artifacts. Understanding the execution model, dependency management, and security boundaries is what separates a working pipeline from a production-grade one.
How GitLab Pipelines Actually Work (The Mental Model You Need)
Before writing a single line of YAML, you need the right mental model. A GitLab pipeline is a directed acyclic graph (DAG) of work. That's a fancy way of saying it's a series of jobs organized into stages, where each stage waits for the previous one to pass before running.
Here's the key hierarchy: a Pipeline contains Stages, and each Stage contains one or more Jobs. Jobs within the same stage run in parallel by default. Jobs in different stages run sequentially. A GitLab Runner — a separate process that can live on your own server or GitLab's shared infrastructure — picks up each job and executes it in an isolated environment, usually a Docker container.
Why does this matter? Because the isolation is what makes CI/CD trustworthy. Each job starts fresh, with no leftover state from previous jobs unless you explicitly pass artifacts or use caching. This means your tests can't accidentally pass because of something that only exists on one developer's machine. The pipeline environment is the single source of truth.
Every pipeline is triggered by an event: a git push, a merge request, a schedule, or a manual trigger. GitLab reads your .gitlab-ci.yml from the root of your repository and constructs the pipeline graph from it. If the file doesn't exist, no pipeline runs. If it has a syntax error, GitLab tells you immediately in the UI before anything executes.
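The Pipeline, Stage, and Job hierarchy described above can be sketched as a minimal .gitlab-ci.yml. This is an illustration rather than a canonical layout: the job names, the node:20-alpine image, and the npm commands are assumptions chosen for a Node.js project.

```yaml
# Illustrative sketch: job names, image, and commands are assumptions.
stages:                   # stages run in this order
  - test
  - build

default:
  image: node:20-alpine   # each job starts in a fresh container from this image

lint:                     # lint and unit_tests share a stage, so they run in parallel
  stage: test
  script:
    - npm ci
    - npm run lint

unit_tests:
  stage: test
  script:
    - npm ci
    - npm test

build_app:                # starts only after every job in the test stage has passed
  stage: build
  script:
    - npm ci
    - npm run build
```

Each job repeats npm ci because nothing carries over between jobs unless it is passed explicitly as an artifact or restored from a cache.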
The only and except keywords are still documented, but GitLab considers rules the modern replacement. rules supports complex conditional logic (if/when/changes/exists) in a single block, whereas only/except requires splitting conditions awkwardly across two keys. New pipelines should always use rules.
default and before_script blocks are crucial for DRY (Don't Repeat Yourself) configuration, reducing the surface area for drift between jobs.
The .gitlab-ci.yml file is the executable specification of your delivery process.
Caching vs Artifacts: The Distinction That Changes Pipeline Speed
This is the most misunderstood concept in GitLab CI/CD, and getting it wrong will either break your pipeline or make it painfully slow. They look similar but serve completely different purposes.
Artifacts are files that jobs pass downstream within the same pipeline. When your build job creates a /dist folder, the deploy job needs that folder. You declare it as an artifact and GitLab uploads it to its object storage, then downloads it automatically for any downstream job that needs it. Artifacts are precise, pipeline-scoped, and short-lived.
Cache is a performance optimisation that persists across multiple pipelines. Your node_modules folder takes 45 seconds to download every run. Cache it with a key tied to your package-lock.json, and subsequent pipelines skip the download entirely unless your dependencies change. Cache is best-effort — GitLab can evict it, and you should never rely on it for correctness.
The mental model: artifacts are for passing work between jobs in a pipeline (correctness), cache is for skipping repeated work across pipelines (speed). If your deploy job can't find the built files, you have an artifact problem. If your pipeline is unnecessarily slow, you have a cache problem. These are never interchangeable.
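A minimal sketch of that split, assuming a Node.js project whose build writes to dist/. The job names and paths are illustrative. One deliberate variation: the sketch caches npm's download directory (.npm/) rather than node_modules, because npm ci removes an existing node_modules folder before it installs.

```yaml
# Illustrative sketch: job names and paths are assumptions for a Node.js project.
build_app:
  stage: build
  image: node:20-alpine
  cache:
    key:
      files:
        - package-lock.json   # the key changes only when the lock file changes
    paths:
      - .npm/                 # downloaded packages only, never build output
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run build           # assumed to write its output to dist/
  artifacts:
    paths:
      - dist/                 # build output travels as an artifact
    expire_in: 1 week

deploy_app:
  stage: deploy
  script:
    - ls dist/                # artifacts from earlier stages are downloaded automatically
```

If the cache is evicted, build_app only gets slower. If the artifact were missing, deploy_app would fail, which is the behaviour you want for correctness-critical files.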
Caching /dist or build output is a trap. If the cache key doesn't change but your code did, you'll deploy stale code and spend hours debugging why your changes aren't showing up. Only cache things that are downloaded or generated from a lock file; never cache things your own code produces. Build outputs belong in artifacts, period.
The install job must always run npm ci or pip install, even on a cache hit, to verify integrity and install any missing peer dependencies.
- Files a later job in the same pipeline needs: artifacts.
- Downloaded dependencies: cache with a key based on a lock file.
- Both: cache (for cross-pipeline speed) and artifacts (for intra-pipeline correctness).
- Build output: artifacts ONLY. Never cache.
Environment-Based Deployments with Review Apps and Protected Branches
A mature CI/CD pipeline doesn't just have one deployment target. Real projects deploy to multiple environments: feature branches might spin up temporary 'review apps', merges to develop deploy to staging, and only merges to main reach production. This isn't complexity for its own sake — it's the safety net that lets teams ship fast without breaking things.
GitLab's environment keyword is what makes this elegant. When you define an environment in a job, GitLab tracks which pipeline version is running where. You can see at a glance in the GitLab UI that production is running commit f3a9c21 while staging has b7e1d04. You can also roll back to a previous deployment with one click directly from the Environments page.
Review Apps take this further. For every merge request, GitLab can automatically spin up a live, isolated environment just for that feature branch — complete with a unique URL. Product managers and designers can preview changes before they're merged. No more 'can you deploy this branch so I can see it?' conversations.
Protected branches add the security layer. When main is a protected branch, by default only Maintainers can push or merge to it, and only pipelines running on protected branches or tags can access protected CI/CD variables (like production API keys). This keeps production credentials out of pipelines for unreviewed branches and prevents a developer from accidentally deploying untested code to production.
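The environment, Review App, and protected-branch ideas above might look like this in a pipeline. It is a sketch under assumptions: the scripts under ./scripts/ and the example.com domains are hypothetical, and the production job is taken to rely on protected variables configured in the GitLab UI.

```yaml
# Illustrative sketch: script paths and domains are hypothetical.
deploy_review:
  stage: deploy
  script:
    - ./scripts/deploy-review.sh "$CI_ENVIRONMENT_SLUG"
  environment:
    name: review/$CI_COMMIT_REF_SLUG      # one environment per branch
    url: https://$CI_ENVIRONMENT_SLUG.review.example.com
    on_stop: stop_review                  # job that tears the environment down
    auto_stop_in: 1 week
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

stop_review:
  stage: deploy
  script:
    - ./scripts/teardown-review.sh "$CI_ENVIRONMENT_SLUG"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: manual
      allow_failure: true                 # keeps the pipeline from showing as blocked

deploy_production:
  stage: deploy
  script:
    - ./scripts/deploy.sh production      # assumed to read protected variables
  environment:
    name: production
    url: https://app.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual                        # deliberate human decision to ship
```

Because deploy_production only runs on main, and main is assumed to be protected, it is the only job here that would receive protected variables.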
A common question is why you would put when: manual on a production deploy if CI/CD is supposed to be automated. The answer: full automation (CD) makes sense when your test suite is comprehensive and your rollback is instant. For most teams, a human checkpoint before production is the right trade-off: you get automated testing and building, but a deliberate human decision to ship. This is Continuous Delivery (stop before prod) vs Continuous Deployment (fully automated to prod).
The environment keyword is more than a label; it's a state tracker. It enables one-click rollbacks by maintaining a deployment history. The on_stop job is critical for Review Apps to prevent resource leaks. Without it, every MR would leave behind a running container, burning infrastructure costs. Protected variables are the linchpin of security: they ensure production credentials are only exposed to pipelines that have passed the gate of a protected branch.
The environment key unlocks observability and rollback capabilities across all targets.
- Staging deploys from the develop branch.
- Production deploys from the main branch.
Pipeline Optimization: Parallelism, DAG, and Cutting Run Times in Half
Once your pipeline is working correctly, the next battle is speed. A 20-minute pipeline that runs on every commit destroys developer flow. The good news is that most slow pipelines have structural problems, not hardware problems, and they're fixable in YAML.
The first tool is the needs keyword, which unlocks GitLab's DAG (Directed Acyclic Graph) mode. By default, all jobs in stage 2 wait for ALL jobs in stage 1 to finish. With needs, a specific job can start the moment its direct dependencies finish — regardless of what stage it's in. If your build_api job doesn't depend on run_e2e_tests, why should it wait for it?
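A small sketch of the needs behaviour described above. build_api and run_e2e_tests are the names used in the text; unit_tests and all script paths are assumptions added for illustration.

```yaml
# Illustrative sketch: unit_tests and the script paths are assumptions.
stages:
  - test
  - build

unit_tests:
  stage: test
  script:
    - ./scripts/unit-tests.sh

run_e2e_tests:              # slow job in the same stage
  stage: test
  script:
    - ./scripts/e2e.sh

build_api:
  stage: build
  needs: ["unit_tests"]     # starts when unit_tests passes, without waiting for run_e2e_tests
  script:
    - ./scripts/build-api.sh
```

Without the needs line, build_api would wait for the whole test stage, including run_e2e_tests.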
The second tool is parallel:matrix, which lets you run the same job multiple times with different variables simultaneously. Instead of running tests for Node 18, then Node 20, then Node 22 sequentially, you run all three at the same time. What was a 9-minute sequential test suite becomes a 3-minute parallel one.
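The Node version fan-out described above could be written roughly as follows. The job name and the NODE_VERSION variable are assumptions. The speed-up also assumes enough runner capacity to run all three jobs at once.

```yaml
# Illustrative sketch: job name and variable name are assumptions.
unit_tests:
  stage: test
  image: node:${NODE_VERSION}-alpine
  parallel:
    matrix:
      - NODE_VERSION: ["18", "20", "22"]   # one job per value, started together
  script:
    - node --version
    - npm ci
    - npm test
```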
The third tool is job-level rules with changes. If a push only touches markdown files in /docs, there's no reason to rebuild your entire application. The changes rule checks which files changed and skips jobs that don't need to run. Used aggressively, this can skip 70% of your pipeline on documentation-only commits.
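As a sketch of the changes rule, assuming application code lives under src/ and documentation under docs/ (both paths, the job names, and the docs script are hypothetical):

```yaml
# Illustrative sketch: paths, job names, and scripts are assumptions.
build_app:
  stage: build
  script:
    - npm ci
    - npm run build
  rules:
    - changes:
        - src/**/*
        - package.json
        - package-lock.json   # dependency changes must also trigger a rebuild

build_docs:
  stage: build
  script:
    - ./scripts/build-docs.sh
  rules:
    - changes:
        - docs/**/*
```

A push that only touches docs/ would create build_docs and skip build_app. The lock file and manifest are listed explicitly because a path list that is too narrow can skip a build that was actually needed.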
For a job with no dependencies, set needs: [] (an empty array). This tells GitLab to run it immediately when the pipeline starts, without waiting for any previous stage. It's the fastest possible job scheduling and works great for fast, independent checks.
The DAG (needs) is the single biggest lever for pipeline speed, but it introduces complexity in debugging. With sequential stages, the failure point is obvious. With a DAG, you must trace the dependency graph. Use the pipeline visualization in the GitLab UI to understand the actual execution flow. The changes rule is powerful but can be dangerous if your paths are too narrow: missing a critical file pattern can lead to skipping a necessary job.
The three levers are needs (DAG), parallel:matrix (fan-out), and rules:changes (skip). The goal is to minimize wall-clock time by maximizing concurrent work and eliminating unnecessary work. Profile your pipeline's critical path before adding more hardware.
Essential .gitlab-ci.yml Keywords Reference Table
Mastering GitLab CI/CD begins with knowing the essential YAML keywords. This table provides a quick reference for the most commonly used keywords, their purpose, and usage notes. Use it as a cheat sheet when writing or reviewing your pipeline configuration.
| Keyword | Purpose | Usage Notes |
|---|---|---|
| stages | Defines the global stage order for the pipeline. Jobs are grouped by stage. | All jobs in the same stage run in parallel unless dependencies constrain them. |
| image | Specifies the Docker image for the job's execution environment. | Can be set globally via default or per job. Example: image: node:20-alpine. |
| script | The shell commands executed by the job. | This is the only required keyword for a job. Multiple commands can be listed as an array. |
| before_script | Commands executed before script in each job. | Useful for setting up environment variables, installing dependencies, or printing debug info. |
| after_script | Commands executed after script, even if the job fails. | Often used for cleanup tasks (e.g., uploading logs, stopping services). |
| cache | Reuses files across pipelines for speed. | Key should be based on a lock file (e.g., package-lock.json). Only cache downloaded dependencies, never build outputs. |
| artifacts | Passes files between jobs within the same pipeline. | Use for build outputs, test reports, or any file a downstream job needs. Always set expire_in to manage storage. |
| rules | Controls job execution based on conditions (branch, pipeline source, file changes, etc.). | Replaces the older only/except. Use if, changes, exists for flexible conditions. |
| needs | Defines job dependencies in a DAG, allowing jobs to start before their stage is complete. | Essential for optimizing pipeline speed. Use needs: [] for immediate execution. |
| parallel | Runs multiple instances of the same job concurrently. | Use with matrix to test across multiple configurations (e.g., Node versions, OS). |
| environment | Tracks deployments and enables rollback. | Use name and url; on_stop for cleanup jobs like Review Apps. |
| tags | Selects which runner executes the job. | Must match a runner's tag list. Without tags, the job runs only on runners that accept untagged jobs. |
| variables | Defines custom CI/CD variables. | Can be set globally, per job, or in the GitLab UI. Use for environment-specific config. |
| default | Sets default values for all jobs (e.g., image, before_script). | Individual jobs can override these defaults. |
| include | Imports external YAML files for modular pipelines. | Supports local files, remote URLs, and templates from GitLab. |
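
The default row in the table can be illustrated with a short sketch. The job names, images, and commands are assumptions. The point is only that shared settings are declared once and overridden where a job differs.

```yaml
# Illustrative sketch: job names, images, and commands are assumptions.
default:
  image: node:20-alpine
  before_script:
    - node --version
    - npm ci

lint:                       # inherits image and before_script from default
  script:
    - npm run lint

unit_tests:                 # inherits the same defaults
  script:
    - npm test

docs_check:
  image: alpine:3.19        # job-level keys override the default
  before_script:
    - echo "No npm install needed for this job"
  script:
    - ls docs/
```

Jobs without a stage keyword, as here, run in the built-in test stage.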
These keywords cover most of a typical .gitlab-ci.yml. The most common mistakes come from forgetting the difference between cache and artifacts, or misusing rules without testing conditions. Bookmark this section for daily use.
Use the default keyword to avoid repetition. For example, if all jobs use the same image and before_script, define them once under default. This reduces the chance of drift between jobs and makes the pipeline file easier to maintain.
GitLab Runner Registration & Tagging Guide
A GitLab Runner is the agent that executes your pipeline jobs. Without a runner, your .gitlab-ci.yml is just a file. Runners can run on any machine (bare metal, VM, container, Kubernetes) and are registered with your GitLab instance. Understanding how to register a runner and use tags is essential for controlling which jobs run where.
Registration process:
1. Install the Runner binary on your machine (or use the Docker image).
2. Run gitlab-runner register and provide your GitLab instance URL and a registration token (found in Settings > CI/CD > Runners). Newer GitLab versions deprecate registration tokens: there you create the runner in the UI first and register it with the runner authentication token it gives you.
3. Choose an executor (e.g., Docker, Shell, Kubernetes). For most projects, the Docker executor is recommended because it provides an isolated environment.
4. Specify the default Docker image and tags.
Tags are the key mechanism to route jobs to specific runners. When you add tags: ["aws", "gpu"] to a job, only runners that have all of those tags will pick up that job. If no matching runner exists, the job stays pending. Tags are also used to describe runner capabilities (e.g., docker, kubernetes, aws-prod).
Important: If a job has no tags keyword, it is only picked up by runners that are allowed to run untagged jobs. By default that means runners with no tags; a tagged runner ignores untagged jobs unless its "Run untagged jobs" setting is enabled. To avoid confusion, either tag all your jobs or leave some runners tagless for generic jobs.
Runner types:
- Shared runners: Available to all projects in the GitLab instance (commonly used on GitLab SaaS).
- Group runners: Available to all projects within a group.
- Project-specific runners: Only available to a single project.
Use project-specific runners for sensitive deployments (production) to ensure no other project's jobs can use that runner.

```bash
# runner_registration.sh

# ── 1. Install GitLab Runner (Ubuntu/Debian example) ──
# See https://docs.gitlab.com/runner/install/ for other platforms
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
sudo apt-get install gitlab-runner

# ── 2. Register the Runner ──
# You will be prompted for:
#   - GitLab instance URL (e.g., https://gitlab.com)
#   - Registration token (from GitLab UI: Settings > CI/CD > Runners)
#   - Description (e.g., "aws-prod-runner")
#   - Tags (comma-separated, e.g., "aws,production")
#   - Executor (e.g., "docker")
#   - Default Docker image (e.g., "alpine:latest")
sudo gitlab-runner register

# ── Alternative: Non-interactive registration ──
sudo gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com" \
  --registration-token "YOUR_TOKEN" \
  --description "prod-runner" \
  --tag-list "aws,production" \
  --executor "docker" \
  --docker-image "alpine:latest" \
  --docker-privileged  # Only if needed (e.g., Docker-in-Docker)

# ── 3. Verify registration ──
sudo gitlab-runner verify

# ── 4. Using Tags in .gitlab-ci.yml ──
# Now in your pipeline, you can route jobs to this runner using the tags:
#
# deploy_production:
#   stage: deploy
#   tags:
#     - aws
#     - production
#   script:
#     - ./deploy.sh

# ── 5. Checking Runner Status ──
sudo gitlab-runner status
sudo gitlab-runner list

# ── 6. Stopping and Unregistering ──
sudo gitlab-runner unregister --name "prod-runner"
```

Output:

```
Registering runner... succeeded runner=abc123
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!

Verifying runner... is alive runner=abc123
Runner "prod-runner" is alive and waiting to pick up jobs.
```

Tagging Gotcha: Untagged Jobs vs Tagged Runners
If a job has no tags defined, GitLab will only assign it to runners that also have no tags. If all your runners have tags and your jobs don't, the jobs will remain pending forever. Always either tag your jobs or leave at least one runner tagless for general work. A common pattern is to have a 'default' runner without tags and specialized runners (e.g., 'aws', 'gpu') for specific jobs.
Production insight: Tagging is also a security mechanism. For production deployments, use a runner with tags like 'production' and ensure only authorized pipelines can use it. Combine with protected branches and manual approvals for a robust deployment pipeline. Runner registration tokens should be rotated regularly; GitLab allows you to reset them in the UI.
Key takeaway: Runners are the execution engines of your pipeline. Tags are the routing keys that connect jobs to the appropriate infrastructure. Invest time in a clear tagging strategy early; it pays off when you scale to multiple environments and runner types.
Advanced Environment Deployment Strategies: Canary, Blue-Green, and Rollback
Beyond basic environments and Review Apps, production-grade pipelines implement rollout strategies to minimize risk. GitLab's environment tracking supports these patterns through environment scopes, incremental deployments, and auto-rollback.
Canary Deployments: You deploy a new version to a small subset of users (the canary) and monitor for errors before rolling out to all users. In GitLab, you can achieve this by having two deployment jobs — one for the canary environment and one for production — with the canary job running first. If the canary fails health checks, the production job is skipped.
Blue-Green Deployments: Two identical environments (blue and green) run side-by-side. At any time, only one serves live traffic. A new version is deployed to the inactive environment, tested, then traffic is switched over. GitLab's environment tracking can help by noting which environment is currently active.
Rollback: GitLab's Environments page shows a history of all deployments with commit SHAs. You can roll back to any previous deployment with a single click, which triggers a new pipeline job that re-deploys the older artifact. For automated rollback, you can use the rollback action in a script that calls the GitLab API.
Environment Scopes: Use environment scopes to limit access to CI/CD variables. For instance, a DEPLOY_API_KEY variable can be scoped to production only, so the staging job cannot accidentally use it. This is set when creating the variable in GitLab UI.
Deployment Health: The environment:auto_stop_in and environment:action (start/stop/verify) keywords help manage the environment lifecycle. You can also add a verify job that runs after deployment to check that the application responds correctly. If verification fails, a retry or rollback job can be triggered automatically using GitLab's API, or manually.

```yaml
# .gitlab-ci.yml
# ── CANARY + BLUE-GREEN DEPLOYMENT PATTERN ──

stages:
  - test
  - deploy-canary
  - verify-canary
  - deploy-production

variables:
  CURRENT_ENV: "$CI_ENVIRONMENT_SLUG"

# ── Shared test job ──
run_tests:
  stage: test
  script:
    - npm test

# ── Canary Deployment (10% of traffic) ──
deploy_canary:
  stage: deploy-canary
  script:
    - echo "Deploying canary version $CI_COMMIT_SHORT_SHA"
    - ./deploy.sh --canary   # Deploys to canary infrastructure
  environment:
    name: canary
    url: https://canary.myapp.example.com
    on_stop: stop_canary
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual   # Trigger canary deploy manually (or automatically)

# ── Canary Teardown ──
stop_canary:
  stage: deploy-canary   # Same stage so it can be triggered later
  script:
    - ./teardown-canary.sh
  environment:
    name: canary
    action: stop
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
      allow_failure: true   # Optional job: do not leave the pipeline blocked

# ── Health Verification ──
verify_canary:
  stage: verify-canary
  needs: ["deploy_canary"]
  script:
    - ./scripts/health-check.sh https://canary.myapp.example.com/health
    # If health check fails, the pipeline fails and production deploy does not run
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'   # Must match deploy_canary, or needs fails on other branches

# ── Production (full rollout) ──
deploy_production:
  stage: deploy-production
  needs: ["verify_canary"]
  script:
    - echo "Full production deploy of $CI_COMMIT_SHORT_SHA"
    - ./deploy.sh --production
  environment:
    name: production
    url: https://myapp.example.com
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual

# ── Rollback via API (example job) ──
rollback_production:
  stage: deploy-production
  script:
    # Use GitLab API to fetch the latest successful artifact of a job named "build" on this ref.
    # CI_JOB_TOKEN is sent in the JOB-TOKEN header; the command is quoted because it contains ": ".
    - 'curl --fail --location --header "JOB-TOKEN: $CI_JOB_TOKEN" "$CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/artifacts/$CI_COMMIT_REF_NAME/download?job=build" -o artifact.zip'
    - ./deploy.sh --rollback
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
      allow_failure: true   # Optional job: do not leave the pipeline blocked
```

Output:

```
Pipeline #7201 - Main branch
Stage: test               ✅ run_tests
Stage: deploy-canary      ✅ deploy_canary (manual trigger)
Stage: verify-canary      ✅ verify_canary (health check passed)
Stage: deploy-production  ✅ deploy_production (manual trigger)

→ Production is now running commit a1b2c3d4

Note: canary environment is still running until manually stopped or auto-stopped.
```

Production Insight: Automate Health Checks with Rollback
For a fully automated canary, add a script after verification that either promotes to production (if healthy) or initiates rollback (if unhealthy). You can use GitLab API to trigger the rollback job. This reduces manual toil and improves recovery time.
Production insight: Environment scopes are critical when using multiple environments. If a variable is scoped to 'production', the deploy_canary job (environment: canary) cannot accidentally read it. This prevents secrets from leaking to less secure environments. Always scope sensitive variables to the minimum environment needed.
Key takeaway: Advanced deployment strategies (canary, blue-green) are built on GitLab's environment tracking and manual approvals. Use health checks to gate full rollouts. Rollback should be a first-class action, either manual via UI or automated through the API. Environment scopes enforce separation of secrets.
The Silent Cache Poisoning: Deploying Stale Code to Production
The build job was caching the compiled output directory (/dist). The cache key was based only on package-lock.json, not on the source code. Since dependencies hadn't changed, the cache was restored, overwriting the fresh build with a stale one. The artifact uploaded was the stale, cached version.
- In .gitlab-ci.yml: move /dist from cache to artifacts only.
- Implement a post-build verification step that compares the artifact's git SHA embedded in a version file against $CI_COMMIT_SHA.
- Cache is for downloaded dependencies (node_modules, .m2), NEVER for build outputs your code generates.
- A passing pipeline doesn't guarantee a correct deployment if the artifact itself is wrong.
- Always embed and verify provenance metadata (git SHA, build timestamp) in your artifact.
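The post-build verification step mentioned above could look roughly like this. It is a sketch under assumptions: the dist/VERSION file name, the job names, and the npm commands are hypothetical. The check is only meaningful if the version file is written while producing the output, so the sketch clears dist/ before building.

```yaml
# Illustrative sketch: dist/VERSION, job names, and commands are assumptions.
build_app:
  stage: build
  image: node:20-alpine
  script:
    - rm -rf dist                              # never build on top of leftover output
    - npm ci
    - npm run build
    - echo "$CI_COMMIT_SHA" > dist/VERSION     # embed provenance in the artifact
  artifacts:
    paths:
      - dist/
    expire_in: 1 week

verify_artifact:
  stage: test
  needs: ["build_app"]                         # downloads the build_app artifacts
  script:
    - built_sha="$(cat dist/VERSION)"
    - |
      if [ "$built_sha" != "$CI_COMMIT_SHA" ]; then
        echo "Artifact was built from $built_sha, expected $CI_COMMIT_SHA"
        exit 1
      fi
    - echo "Artifact provenance verified for $CI_COMMIT_SHA"
```

A deploy job placed after verify_artifact would then refuse to ship an artifact whose embedded SHA does not match the commit being deployed.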
ERROR: Job failed: exit code 1 and a generic script error.
file not found for an expected artifact.
2. Check expire_in: the artifact may have expired.
3. If using DAG (needs), ensure the consuming job lists the producing job in its needs array. Stage order alone is insufficient with DAG.
... needs.
3. Look for jobs downloading large dependencies that should be cached.
4. Use parallel:matrix to split long test suites.
1. Check whether the variable is marked Protected. If so, it's only available on protected branches (e.g., main).
2. Verify the variable is Masked if it's a secret (it shouldn't print in logs).
3. Ensure the runner has the correct permissions (e.g., AWS IAM role, SSH key) to access the deployment target.
Change the image to one that includes your tool, or install it in before_script.