Intermediate 13 min · March 06, 2026

GitLab Pipeline — Silent Cache Poisoning from Build Caching

Deployed hash mismatch? Cache key based only on package-lock.

Naren · Founder
Quick Answer
  • You define stages (install, test, build, deploy) and jobs in a .gitlab-ci.yml file.
  • A GitLab Runner executes each job in a fresh container, ensuring consistency.
  • Artifacts pass files between jobs within one pipeline; cache speeds up future runs.
  • Pipelines are triggered by events (push, MR, schedule).
  • Jobs in the same stage run in parallel by default.
  • Use rules for conditional execution (branch, file changes).
  • Use needs to create a DAG and start jobs as soon as dependencies finish.
  • Job isolation makes the pipeline environment the source of truth; it eliminates "works on my machine."
  • A failed pipeline blocks broken code from progressing, providing fast feedback.
  • Protected variables and branches form the security perimeter for secrets.
Plain-English First

Imagine you're baking a cake for a birthday party. Every time you bake, you follow the same steps: mix ingredients, bake, frost, then deliver. GitLab CI/CD is like hiring a robot baker that follows those exact steps automatically every time you change a recipe. You write the steps once in a file, and the robot handles the rest — mixing, baking, testing for taste, and delivering — without you lifting a finger. If the cake burns (your code breaks), the robot stops and tells you before the party (production) ever sees it.

Manual deployment processes are a liability. They're error-prone, undocumented, and create a feedback lag that lets bugs compound for days. GitLab CI/CD replaces tribal knowledge with a codified, automated pipeline that runs identically in an isolated environment every time.

The core value is reducing the cost of failure. A pipeline catches a regression in minutes, not days. This shifts the team's mindset from "avoid breaking things" to "ship with confidence." The .gitlab-ci.yml file becomes the single source of truth for your delivery process.

Common misconceptions include treating CI/CD as just running tests, or conflating caching with artifacts. Understanding the execution model, dependency management, and security boundaries is what separates a working pipeline from a production-grade one.

How GitLab Pipelines Actually Work (The Mental Model You Need)

Before writing a single line of YAML, you need the right mental model. A GitLab pipeline is a directed acyclic graph (DAG) of work. In its default form, that means a series of jobs organized into stages, where each stage waits for the previous one to pass before it runs.

Here's the key hierarchy: a Pipeline contains Stages, and each Stage contains one or more Jobs. Jobs within the same stage run in parallel by default. Jobs in different stages run sequentially. A GitLab Runner — a separate process that can live on your own server or GitLab's shared infrastructure — picks up each job and executes it in an isolated environment, usually a Docker container.

Why does this matter? Because the isolation is what makes CI/CD trustworthy. Each job starts fresh, with no leftover state from previous jobs unless you explicitly pass artifacts or use caching. This means your tests can't accidentally pass because of something that only exists on one developer's machine. The pipeline environment is the single source of truth.

Every pipeline is triggered by an event: a git push, a merge request, a schedule, or a manual trigger. GitLab reads your .gitlab-ci.yml from the root of your repository and constructs the pipeline graph from it. If the file doesn't exist, no pipeline runs. If it has a syntax error, GitLab tells you immediately in the UI before anything executes.

.gitlab-ci.yml (YAML)
# Define the order of stages — jobs in the same stage run in parallel
# Jobs in later stages only run if all jobs in prior stages pass
stages:
  - install      # Stage 1: Get dependencies ready
  - test         # Stage 2: Run all automated tests
  - build        # Stage 3: Compile/bundle the application
  - deploy       # Stage 4: Ship to the target environment

# A default block applies settings to ALL jobs unless a job overrides them
default:
  image: node:20-alpine  # Every job runs inside this Docker container
  before_script:
    - echo "Pipeline started for branch: $CI_COMMIT_BRANCH"

# ── STAGE: install ──────────────────────────────────────────────────────────
install_dependencies:
  stage: install
  script:
    - npm ci  # 'ci' is stricter than 'install' — uses package-lock.json exactly
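  # Cache node_modules so FUTURE pipelines can reuse it (later stages get it via artifacts below)
  # Note: npm ci deletes any existing node_modules before installing, so this cache
  # mainly benefits jobs that restore it without running npm ci themselves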
  cache:
    key:
      files:
        - package-lock.json   # Cache is invalidated only when lock file changes
    paths:
      - node_modules/
  artifacts:
    paths:
      - node_modules/         # Pass node_modules to downstream jobs in this pipeline
    expire_in: 1 hour         # Don't keep artifacts forever — saves storage

# ── STAGE: test ─────────────────────────────────────────────────────────────
run_unit_tests:
  stage: test
  script:
    - npm run test:unit -- --coverage  # Run unit tests and generate coverage report
  coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'  # Regex to parse coverage % from output
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml  # Enables line-by-line coverage highlighting in merge request diffs

run_lint:
  stage: test   # Runs in PARALLEL with run_unit_tests — same stage
  script:
    - npm run lint  # Code style check — runs at the same time as unit tests

# ── STAGE: build ─────────────────────────────────────────────────────────────
build_production_bundle:
  stage: build
  script:
    - npm run build  # Creates optimised production assets in /dist
  artifacts:
    paths:
      - dist/         # Pass the compiled app to the deploy stage
    expire_in: 1 week

# ── STAGE: deploy ─────────────────────────────────────────────────────────────
deploy_to_production:
  stage: deploy
  script:
    - echo "Deploying commit $CI_COMMIT_SHA to production..."
    - ./scripts/deploy.sh  # Your actual deployment script
  environment:
    name: production
    url: https://myapp.example.com
  # CRITICAL: Only deploy automatically from the main branch
  # All other branches can run tests and build, but NOT deploy
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: on_success   # Deploy automatically if all prior stages passed
    - when: never        # For any other branch, skip this job entirely
Output
Pipeline #4821 triggered by push to main
Stage: install
✅ install_dependencies (42s)
Stage: test
✅ run_unit_tests (1m 12s) — Coverage: 87.4%
✅ run_lint (18s)
Stage: build
✅ build_production_bundle (34s)
Stage: deploy
✅ deploy_to_production (1m 03s)
Pipeline passed in 3m 49s
Pro Tip: Use `rules` Not `only/except`
The older only and except keywords are still documented but GitLab considers rules the modern replacement. rules supports complex conditional logic (if/when/changes/exists) in a single block, whereas only/except requires splitting conditions awkwardly across two keys. New pipelines should always use rules.
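To make the comparison concrete, here is a minimal sketch of the same intent written both ways. The job names and the `./scripts/deploy.sh` path are illustrative, and the two jobs are only roughly equivalent (`only: main` also matches a tag named main). Note that a single job cannot combine `only/except` with `rules`; these are two alternatives shown side by side.

```yaml
# Legacy style: the condition is split across two keys
deploy_legacy:
  stage: deploy
  script:
    - ./scripts/deploy.sh
  only:
    - main
  except:
    - schedules

# rules style: one ordered list, evaluated top to bottom, first match wins
deploy_with_rules:
  stage: deploy
  script:
    - ./scripts/deploy.sh
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
      when: never            # skip scheduled pipelines
    - if: '$CI_COMMIT_BRANCH == "main"'
                             # no match falls through: job is not added
```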
Production Insight
The isolation of each job is a double-edged sword. While it guarantees a clean environment, it also means any setup (tool installation, dependency download) must be repeated or explicitly cached/artifacted. The default and before_script blocks are crucial for DRY (Don't Repeat Yourself) configuration, reducing the surface area for drift between jobs.
Key Takeaway
A pipeline is a staged, event-driven sequence of isolated jobs. Trust comes from this isolation, but speed requires explicit state passing (artifacts/cache). The .gitlab-ci.yml file is the executable specification of your delivery process.

Caching vs Artifacts: The Distinction That Changes Pipeline Speed

This is the most misunderstood concept in GitLab CI/CD, and getting it wrong will either break your pipeline or make it painfully slow. They look similar but serve completely different purposes.

Artifacts are files that jobs pass downstream within the same pipeline. When your build job creates a /dist folder, the deploy job needs that folder. You declare it as an artifact and GitLab uploads it to its object storage, then downloads it automatically for any downstream job that needs it. Artifacts are precise, pipeline-scoped, and short-lived.

Cache is a performance optimisation that persists across multiple pipelines. Your node_modules folder takes 45 seconds to download every run. Cache it with a key tied to your package-lock.json, and subsequent pipelines skip the download entirely unless your dependencies change. Cache is best-effort — GitLab can evict it, and you should never rely on it for correctness.

The mental model: artifacts are for passing work between jobs in a pipeline (correctness), cache is for skipping repeated work across pipelines (speed). If your deploy job can't find the built files, you have an artifact problem. If your pipeline is unnecessarily slow, you have a cache problem. These are never interchangeable.

.gitlab-ci.yml (YAML)
# ── DEMONSTRATING THE DIFFERENCE BETWEEN CACHE AND ARTIFACTS ────────────────

stages:
  - dependencies
  - test
  - package

install_python_packages:
  stage: dependencies
  image: python:3.12-slim
  script:
    - pip install -r requirements.txt --target=.packages
  cache:
    # Cache key is a hash of requirements.txt
    # The cache is ONLY invalidated when requirements.txt changes
    # This saves ~30-60s on pipelines where nothing changed
    key:
      files:
        - requirements.txt
    paths:
      - .packages/     # Cached across pipelines for speed
  artifacts:
    paths:
      - .packages/     # Also an artifact so test stage can USE these packages
    expire_in: 2 hours

run_pytest:
  stage: test
  image: python:3.12-slim
  script:
    # PYTHONPATH tells Python where to find the packages installed above
    - export PYTHONPATH="$CI_PROJECT_DIR/.packages:$PYTHONPATH"
    - python -m pytest tests/ -v --junitxml=report.xml
  artifacts:
    # Test reports are artifacts — GitLab reads them to display pass/fail in MR UI
    reports:
      junit: report.xml   # Displays individual test results in the merge request
    when: always          # Upload report EVEN if tests fail — you need the evidence

create_deployment_package:
  stage: package
  image: python:3.12-slim
  script:
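    # python:3.12-slim does not ship the zip utility, so install it first
    - apt-get update && apt-get install -y --no-install-recommends zip
    - zip -r deployment.zip src/ .packages/ config/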
    - echo "Package size: $(du -sh deployment.zip | cut -f1)"
  artifacts:
    name: "app-package-$CI_COMMIT_SHORT_SHA"  # Dynamic name includes commit hash
    paths:
      - deployment.zip
    expire_in: 1 week   # Keep for a week so you can re-deploy without re-building
    # Note: NO cache here — the zip file is a one-off per pipeline, not reusable
Output
install_python_packages:
Checking cache... HIT (key: abc123def456)
Restoring cache from .packages/ (saved 34s)
Running: pip install -r requirements.txt --target=.packages
Requirements already satisfied (cache hit)
Uploading artifacts: .packages/ (12.4 MB)
run_pytest:
Downloading artifacts from install_python_packages...
Running: python -m pytest tests/ -v
========================= 47 passed in 8.31s =========================
Uploading test report: report.xml
create_deployment_package:
Package size: 14M
Uploading artifact: app-package-f3a9c21.zip
Watch Out: Never Cache Build Outputs
Caching your compiled /dist or build output is a trap. If the cache key doesn't change but your code did, you'll deploy stale code and spend hours debugging why your changes aren't showing up. Only cache things that are downloaded or generated from a lock file — never cache things your own code produces. Build outputs belong in artifacts, period.
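As a rough sketch of that rule for a Node project: the job below caches only npm's download directory and ships `dist/` as an artifact. The job name and the `.npm/` path are assumptions for illustration; `npm ci --cache .npm --prefer-offline` points npm's download cache at a directory inside the project so GitLab can cache it.

```yaml
build_production_bundle:
  stage: build
  image: node:20-alpine
  script:
    - npm ci --cache .npm --prefer-offline   # reuse downloaded tarballs when present
    - npm run build                          # always rebuild from the current source
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/          # downloaded packages only: safe to reuse across pipelines
  artifacts:
    paths:
      - dist/          # build output: produced fresh in every pipeline
    expire_in: 1 week
```

Because `dist/` is never restored from cache, a stale build output cannot reach the deploy job even if the cache key stays the same.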
Production Insight
Cache is a best-effort optimization, not a guarantee. GitLab may evict your cache under storage pressure, so your pipeline must still work correctly (though more slowly) on a cache miss. This is why the install job should always run npm ci or pip install, even on a cache hit: the install command, not the cache, is what guarantees that the installed packages match the lock file.
Key Takeaway
Artifacts ensure correctness within a pipeline by passing exact files between jobs. Cache improves speed across pipelines by skipping redundant work. Confusing them leads to either broken deployments or wasted time. Rule of thumb: cache downloaded dependencies, artifact your built outputs.
Choosing Between Cache and Artifact
If: A file is needed by a job in a LATER STAGE of the SAME pipeline.
Then: Use artifacts.
If: A file is expensive to create/download and can be reused ACROSS DIFFERENT pipelines.
Then: Use cache with a key based on a lock file.
If: A file is BOTH needed downstream AND reusable across pipelines (e.g., node_modules).
Then: Use BOTH cache (for cross-pipeline speed) and artifacts (for intra-pipeline correctness).
If: A file is a build output (e.g., /dist, .jar, binary).
Then: Use artifacts ONLY. Never cache.
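A key derived only from the lock file cannot tell apart caches that were built under different images or runtime versions. One possible mitigation, sketched below, is to add a `prefix` to the key so the cache is also invalidated when the toolchain changes. The prefix value `node20-alpine` is a hypothetical label that you would update by hand whenever the image changes.

```yaml
install_dependencies:
  stage: install
  image: node:20-alpine
  script:
    - npm ci --cache .npm --prefer-offline
  cache:
    key:
      prefix: node20-alpine      # hypothetical label; bump it when the image changes
      files:
        - package-lock.json      # hash of this file is appended to the prefix
    paths:
      - .npm/
    policy: pull-push            # restore at job start, upload at job end
```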

Environment-Based Deployments with Review Apps and Protected Branches

A mature CI/CD pipeline doesn't just have one deployment target. Real projects deploy to multiple environments: feature branches might spin up temporary 'review apps', merges to develop deploy to staging, and only merges to main reach production. This isn't complexity for its own sake — it's the safety net that lets teams ship fast without breaking things.

GitLab's environment keyword is what makes this elegant. When you define an environment in a job, GitLab tracks which pipeline version is running where. You can see at a glance in the GitLab UI that production is running commit f3a9c21 while staging has b7e1d04. You can also roll back to a previous deployment with one click directly from the Environments page.

Review Apps take this further. For every merge request, GitLab can automatically spin up a live, isolated environment just for that feature branch — complete with a unique URL. Product managers and designers can preview changes before they're merged. No more 'can you deploy this branch so I can see it?' conversations.

Protected branches add the security layer. When main is a protected branch, only the roles you allow (Maintainers, by default) can push to it directly, and only pipelines running on protected branches or tags can access protected CI/CD variables (like production API keys). This keeps production credentials out of pipelines for arbitrary feature branches and prevents a developer from accidentally deploying untested code to production.
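Protection itself is configured in the GitLab UI, but a job can also check it explicitly. The sketch below assumes the predefined variable `CI_COMMIT_REF_PROTECTED` (set to "true" when the pipeline runs for a protected ref) and an illustrative deploy script path; treat it as a belt-and-braces guard, not a replacement for protected variables.

```yaml
deploy_to_production:
  stage: deploy
  script:
    - ./scripts/deploy.sh production
  rules:
    # Run only on main, and only if main is actually protected
    - if: '$CI_COMMIT_BRANCH == "main" && $CI_COMMIT_REF_PROTECTED == "true"'
      when: manual
    - when: never
```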

.gitlab-ci.yml (YAML)
# ── MULTI-ENVIRONMENT DEPLOYMENT PIPELINE ────────────────────────────────────
# This demonstrates: review apps, staging, and production with proper guards

stages:
  - test
  - deploy

variables:
  # These non-sensitive defaults can live in the YAML
  STAGING_URL: "https://staging.myapp.example.com"
  PRODUCTION_URL: "https://myapp.example.com"
  # DEPLOY_SSH_KEY and PRODUCTION_API_KEY are set in
  # GitLab Settings > CI/CD > Variables (masked + protected)

# ── Shared test job (runs for ALL branches) ───────────────────────────────────
run_all_tests:
  stage: test
  image: node:20-alpine
  script:
    - npm ci
    - npm test

# ── REVIEW APP: Deploys for every Merge Request ───────────────────────────────
deploy_review_app:
  stage: deploy
  image: alpine:latest
  script:
    # CI_ENVIRONMENT_SLUG is auto-generated from the environment name
    # e.g., environment name "review/fix-login-bug" becomes slug "review-fix-login-bug"
    - echo "Deploying review app for MR: $CI_MERGE_REQUEST_IID"
    - apk add --no-cache openssh-client rsync
    - ./scripts/deploy-review.sh $CI_ENVIRONMENT_SLUG  # Your deploy script
  environment:
    name: review/$CI_COMMIT_REF_SLUG   # Creates a unique environment per branch
    url: https://$CI_ENVIRONMENT_SLUG.review.myapp.example.com
    on_stop: teardown_review_app       # Tell GitLab which job cleans this up
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'  # Only runs for MRs

# ── Teardown job: Runs when MR is closed or merged ───────────────────────────
teardown_review_app:
  stage: deploy
  image: alpine:latest
  script:
    - echo "Tearing down review app: $CI_ENVIRONMENT_SLUG"
    - ./scripts/teardown-review.sh $CI_ENVIRONMENT_SLUG
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop   # This is what links it to the on_stop in deploy_review_app
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: manual  # Triggered manually OR automatically when MR closes

# ── STAGING: Deploys automatically when develop branch is updated ─────────────
deploy_to_staging:
  stage: deploy
  image: alpine:latest
  script:
    - echo "Deploying $CI_COMMIT_SHORT_SHA to staging..."
    - ./scripts/deploy.sh staging
  environment:
    name: staging
    url: $STAGING_URL
  rules:
    - if: '$CI_COMMIT_BRANCH == "develop"'
      when: on_success

# ── PRODUCTION: Requires manual approval — never deploys automatically ─────────
deploy_to_production:
  stage: deploy
  image: alpine:latest
  script:
    - echo "Deploying $CI_COMMIT_SHORT_SHA to PRODUCTION"
    - ./scripts/deploy.sh production
  environment:
    name: production
    url: $PRODUCTION_URL
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual        # A human must click 'Run' in the GitLab UI to proceed
      allow_failure: false  # Blocking manual job: the pipeline stays "blocked" until someone runs it
Output
Pipeline #5102 — Branch: feature/login-redesign (MR !47)
Stage: test
✅ run_all_tests (55s)
Stage: deploy
✅ deploy_review_app (28s)
Environment: review/feature-login-redesign
URL: https://feature-login-redesign.review.myapp.example.com
⏸ teardown_review_app (manual — runs when MR closes)
---
Pipeline #5108 — Branch: main
Stage: test
✅ run_all_tests (55s)
Stage: deploy
⏸ deploy_to_production (manual approval required)
Click ▶ in GitLab UI to deploy to production
Interview Gold: Why Manual Production Deploys?
Interviewers love asking why you'd use when: manual on a production deploy if CI/CD is supposed to be automated. The answer: full automation (CD) makes sense when your test suite is comprehensive and your rollback is instant. For most teams, a human checkpoint before production is the right trade-off — you get automated testing and building, but a deliberate human decision to ship. This is Continuous Delivery (stop before prod) vs Continuous Deployment (fully automated to prod).
Production Insight
The environment keyword is more than a label; it's a state tracker. It enables one-click rollbacks by maintaining a deployment history. The on_stop job is critical for Review Apps to prevent resource leaks. Without it, every MR would leave behind a running container, burning infrastructure costs. Protected variables are the linchpin of security: they ensure production credentials are only exposed to pipelines that have passed the gate of a protected branch.
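A complementary safeguard against forgotten review apps is `auto_stop_in`, which asks GitLab to stop the environment after a period of inactivity. The fragment below is a sketch that assumes the `teardown_review_app` job from the pipeline above; the 3-day window is an arbitrary example value.

```yaml
deploy_review_app:
  stage: deploy
  script:
    - ./scripts/deploy-review.sh $CI_ENVIRONMENT_SLUG
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    url: https://$CI_ENVIRONMENT_SLUG.review.myapp.example.com
    on_stop: teardown_review_app   # job that performs the cleanup
    auto_stop_in: 3 days           # example value: stop automatically if nobody does
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
```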
Key Takeaway
Environment progression (review -> staging -> production) with branch-based gates is how you enable velocity without sacrificing stability. Protected branches and variables form a security boundary that is architectural, not just policy-based. The environment key unlocks observability and rollback capabilities across all targets.
Deployment Strategy by Branch
If: Feature branch pushed (no MR).
Then: Run tests only. No deployment.
If: Merge Request created/updated.
Then: Run tests + deploy an ephemeral Review App for preview.
If: Merge to develop branch.
Then: Run tests + automatically deploy to Staging.
If: Merge to main branch.
Then: Run tests + require manual approval to deploy to Production.

Pipeline Optimization: Parallelism, DAG, and Cutting Run Times in Half

Once your pipeline is working correctly, the next battle is speed. A 20-minute pipeline that runs on every commit destroys developer flow. The good news is that most slow pipelines have structural problems, not hardware problems, and they're fixable in YAML.

The first tool is the needs keyword, which unlocks GitLab's DAG (Directed Acyclic Graph) mode. By default, all jobs in stage 2 wait for ALL jobs in stage 1 to finish. With needs, a specific job can start the moment its direct dependencies finish — regardless of what stage it's in. If your build_api job doesn't depend on run_e2e_tests, why should it wait for it?

The second tool is parallel:matrix, which lets you run the same job multiple times with different variables simultaneously. Instead of running tests for Node 18, then Node 20, then Node 22 sequentially, you run all three at the same time. What was a 9-minute sequential test suite becomes a 3-minute parallel one.

The third tool is job-level rules with changes. If a push only touches markdown files in /docs, there's no reason to rebuild your entire application. The changes rule checks which files changed and skips jobs that don't need to run. Used aggressively, this can skip 70% of your pipeline on documentation-only commits.

.gitlab-ci.yml (YAML)
# ── OPTIMISED PIPELINE USING DAG + PARALLEL MATRIX + CHANGE DETECTION ────────

stages:
  - install
  - test          # In default mode, everything here waits for install to finish
  - build         # In default mode, everything here waits for ALL tests to pass
  - deploy

install_node_modules:
  stage: install
  image: node:20-alpine
  script:
    - npm ci
  cache:
    key:
      files: [package-lock.json]
    paths: [node_modules/]
  artifacts:
    paths: [node_modules/]
    expire_in: 1 hour

# ── PARALLEL MATRIX: Runs 3 simultaneous jobs instead of 3 sequential ones ───
test_across_node_versions:
  stage: test
  # 'needs' tells GitLab: start me as soon as install_node_modules passes
  # Don't wait for other jobs in the install stage that don't affect me
  needs: ["install_node_modules"]
  parallel:
    matrix:
      # GitLab spins up 3 separate jobs, one per entry — all run at the same time
      - NODE_VERSION: "18"
      - NODE_VERSION: "20"
      - NODE_VERSION: "22"
  image: node:${NODE_VERSION}-alpine  # Each job uses its own Node version
  script:
    - echo "Testing on Node $NODE_VERSION"
    - npm test
    - echo "Node $NODE_VERSION — PASSED"

# ── CHANGE-BASED SKIPPING: Only rebuild if source code actually changed ───────
build_docker_image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind  # Docker-in-Docker: lets you build Docker images inside CI
  needs:
    # DAG: start building as soon as tests pass — don't wait for other build jobs
    - job: test_across_node_versions
  script:
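    # Tag the image with the full registry path, otherwise the push below cannot find it
    - docker build -t myregistry.example.com/myapp:$CI_COMMIT_SHORT_SHA .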
    - docker push myregistry.example.com/myapp:$CI_COMMIT_SHORT_SHA
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      changes:
        # Only run this job if one of these paths changed
        # A docs-only push? This job is SKIPPED entirely
        - src/**/*
        - Dockerfile
        - package.json
        - package-lock.json

# ── VISUALISING THE EFFECT ────────────────────────────────────────────────────
# WITHOUT the matrix (a single job testing each Node version in turn):
#   install(42s) → test_node18(3m) → test_node20(3m) → test_node22(3m) → build(2m)
#   Total: ~11 minutes sequential
#
# WITH needs + parallel matrix:
#   install(42s) → [test_node18 + test_node20 + test_node22](3m parallel) → build(2m)
#   Total: ~6 minutes  ← Almost 2x faster with zero new hardware
Output
Pipeline #6033 — Optimised with DAG + Parallel Matrix
Stage: install
✅ install_node_modules (42s)
Stage: test [parallel — all started immediately after install]
✅ test_across_node_versions: Node 18 (2m 48s)
✅ test_across_node_versions: Node 20 (2m 55s)
✅ test_across_node_versions: Node 22 (3m 02s)
← All 3 ran simultaneously. Wall-clock time: 3m 02s
Stage: build
✅ build_docker_image (1m 54s)
Total pipeline time: 6m 18s
(vs ~11 minutes if the three Node versions were tested one after another)
Pro Tip: Use `needs: []` to Jump Straight to Stage 1
If a job has no dependencies at all — like a quick security scan or a markdown linter — give it needs: [] (an empty array). This tells GitLab to run it immediately when the pipeline starts, without waiting for any previous stage. It's the fastest possible job scheduling and works great for fast, independent checks.
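A minimal sketch of that pattern follows. The job names and `./scripts/lint-docs.sh` are hypothetical; the point is that `lint_docs` sits in the test stage yet starts alongside the install stage. One side effect worth knowing: a job with `needs: []` also downloads no artifacts from earlier jobs.

```yaml
stages:
  - install
  - test

install_node_modules:
  stage: install
  image: node:20-alpine
  script:
    - npm ci

lint_docs:
  stage: test
  image: alpine:latest
  needs: []                    # no dependencies: starts when the pipeline starts
  script:
    - ./scripts/lint-docs.sh   # hypothetical script that needs no node_modules
```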
Production Insight
DAG optimization (needs) is the single biggest lever for pipeline speed, but it introduces complexity in debugging. With sequential stages, the failure point is obvious. With a DAG, you must trace the dependency graph. Use the pipeline visualization in the GitLab UI to understand the actual execution flow. The changes rule is powerful but can be dangerous if your paths are too narrow—missing a critical file pattern can lead to skipping a necessary job.
Key Takeaway
Speed is a structural problem solved by needs (DAG), parallel:matrix (fan-out), and rules:changes (skip). The goal is to minimize wall-clock time by maximizing concurrent work and eliminating unnecessary work. Profile your pipeline's critical path before adding more hardware.

Essential .gitlab-ci.yml Keywords Reference Table

Mastering GitLab CI/CD begins with knowing the essential YAML keywords. This table provides a quick reference for the most commonly used keywords, their purpose, and usage notes. Use it as a cheat sheet when writing or reviewing your pipeline configuration.

Keyword | Purpose | Usage Notes
stages | Defines the global stage order for the pipeline. Jobs are grouped by stage. | All jobs in the same stage run in parallel unless dependencies constrain them.
image | Specifies the Docker image for the job's execution environment. | Can be set globally via default or per job. Example: image: node:20-alpine.
script | The shell commands executed by the job. | This is the only required keyword for a job. Multiple commands can be listed as an array.
before_script | Commands executed before script in each job. | Useful for setting up environment variables, installing dependencies, or printing debug info.
after_script | Commands executed after script, even if the job fails. | Often used for cleanup tasks (e.g., uploading logs, stopping services).
cache | Reuses files across pipelines for speed. | Key should be based on a lock file (e.g., package-lock.json). Only cache downloaded dependencies, never build outputs.
artifacts | Passes files between jobs within the same pipeline. | Use for build outputs, test reports, or any file a downstream job needs. Always set expire_in to manage storage.
rules | Controls job execution based on conditions (branch, pipeline source, file changes, etc.). | Replaces the older only/except. Use if, changes, exists for flexible conditions.
needs | Defines job dependencies in a DAG, allowing jobs to start before earlier stages are complete. | Essential for optimizing pipeline speed. Use needs: [] for immediate execution.
parallel | Runs multiple instances of the same job concurrently. | Use with matrix to test across multiple configurations (e.g., Node versions, OS).
environment | Tracks deployments and enables rollback. | Use name and url; on_stop for cleanup jobs like Review Apps.
tags | Selects which runner executes the job. | Must match a runner's tag list. Without tags, the job runs only on runners that accept untagged jobs.
variables | Defines custom CI/CD variables. | Can be set globally, per job, or in the GitLab UI. Use for environment-specific config.
default | Sets default values for all jobs (e.g., image, before_script). | Individual jobs can override these defaults.
include | Imports external YAML files for modular pipelines. | Supports local files, remote URLs, and templates from GitLab.
Use This Table as a Quick Reference
Keep this table handy when writing your .gitlab-ci.yml. The most common mistakes come from forgetting the difference between cache and artifacts, or misusing rules without testing conditions. Bookmark this section for daily use.
Production Insight
A well-structured pipeline uses the default keyword to avoid repetition. For example, if all jobs use the same image and before_script, define them once under default. This reduces the chance of drift between jobs and makes the pipeline file easier to maintain.
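The sketch below shows one way this inheritance can look in practice. Job names and the `scripts/` directory are assumptions for illustration: the first job inherits everything from `default`, while the second overrides both keys because it runs in a different image.

```yaml
default:
  image: node:20-alpine
  before_script:
    - node --version          # runs before every job that does not override it

run_lint:
  stage: test
  script:
    - npm run lint            # inherits image and before_script from default

check_python_scripts:
  stage: test
  image: python:3.12-slim     # job-level image replaces the default
  before_script:
    - python --version        # job-level before_script replaces the default one
  script:
    - python -m compileall scripts/
```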
Key Takeaway
Knowing these keywords by heart allows you to write concise, correct, and efficient CI configurations. Use this table as your cheat sheet until the syntax becomes second nature.
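Three keywords from the table that the earlier examples do not exercise are include, job-level variables, and after_script. The fragment below is a small sketch of how they fit together; the included file path, the `DEPLOY_ENV` variable, and the deploy script are hypothetical names.

```yaml
include:
  - local: /ci/test-jobs.yml          # hypothetical file in this repository

variables:
  DEPLOY_ENV: dev                     # global default for every job

deploy_to_staging:
  stage: deploy
  variables:
    DEPLOY_ENV: staging               # job-level value overrides the global one
  script:
    - ./scripts/deploy.sh "$DEPLOY_ENV"
  after_script:
    # Runs even if script fails; CI_JOB_STATUS is a predefined variable
    - echo "Deploy job finished with status $CI_JOB_STATUS"
```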

GitLab Runner Registration & Tagging Guide

A GitLab Runner is the agent that executes your pipeline jobs. Without a runner, your .gitlab-ci.yml is just a file. Runners can run on any machine (bare metal, VM, container, Kubernetes) and are registered with your GitLab instance. Understanding how to register a runner and use tags is essential for controlling which jobs run where.

Registration process:
  1. Install the Runner binary on your machine (or use the Docker image).
  2. Run gitlab-runner register and provide your GitLab instance URL and a registration token (found in Settings > CI/CD > Runners). Recent GitLab releases deprecate registration tokens: there you create the runner in the UI first and pass the resulting runner authentication token instead.
  3. Choose an executor (e.g., Docker, Shell, Kubernetes). For most projects, the Docker executor is recommended because it provides an isolated environment.
  4. Specify the default Docker image and tags.

Tags are the key mechanism for routing jobs to specific runners. When you add tags: ["aws", "gpu"] to a job, only runners that carry all of those tags will pick it up. If no matching runner exists, the job stays pending instead of running. Tags are also used to advertise runner capabilities (e.g., docker, kubernetes, aws-prod).

Important: If a job has no tags keyword, it can only be picked up by runners that accept untagged jobs. A runner with no tags always does; a tagged runner ignores untagged jobs unless its "Run untagged jobs" setting is enabled. To avoid confusion, either tag all your jobs or keep at least one runner available for untagged, generic jobs.

Runner types:
  • Shared runners: Available to all projects in the GitLab instance (commonly used on GitLab SaaS).
  • Group runners: Available to all projects within a group.
  • Project-specific runners: Only available to a single project.

Use project-specific runners for sensitive deployments (production) to ensure no other project's jobs can use that runner.", "code": { "language": "bash", "filename": "runner_registration.sh", "code": "# ── 1. Install GitLab Runner (Ubuntu/Debian example) ────────────────────────
# See https://docs.gitlab.com/runner/install/ for other platforms
curl -L \"https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh\" | sudo bash
sudo apt-get install gitlab-runner

# ── 2. Register the Runner ─────────────────────────────────────────────────────
# You will be prompted for:
#   - GitLab instance URL (e.g., https://gitlab.com)
#   - Registration token (from GitLab UI: Settings > CI/CD > Runners)
#   - Description (e.g., \"aws-prod-runner\")
#   - Tags (comma-separated, e.g., \"aws,production\")
#   - Executor (e.g., \"docker\")
#   - Default Docker image (e.g., \"alpine:latest\")
sudo gitlab-runner register

# ── Alternative: Non-interactive registration ────────────────────────────────
# Note: registration tokens are the legacy workflow and are deprecated in newer
# GitLab versions. With a runner authentication token created in the UI, pass
# --token instead and set the description and tags in the UI, not on the command line.
sudo gitlab-runner register \\
  --non-interactive \\
  --url \"https://gitlab.com\" \\
  --registration-token \"YOUR_TOKEN\" \\
  --description \"prod-runner\" \\
  --tag-list \"aws,production\" \\
  --executor \"docker\" \\
  --docker-image \"alpine:latest\" \\
  --docker-privileged  # Only if needed (e.g., Docker-in-Docker)

# ── 3. Verify registration ────────────────────────────────────────────────────
sudo gitlab-runner verify

# ── 4. Using Tags in .gitlab-ci.yml ───────────────────────────────────────────
# Now in your pipeline, you can route jobs to this runner using the tags:
#
# deploy_production:
#   stage: deploy
#   tags:
#     - aws
#     - production
#   script:
#     - ./deploy.sh

# ── 5. Checking Runner Status ─────────────────────────────────────────────────
sudo gitlab-runner status
sudo gitlab-runner list

# ── 6. Stopping and Unregistering ─────────────────────────────────────────────
sudo gitlab-runner unregister --name \"prod-runner\"", "output": "Registering runner... succeeded runner=abc123
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!

Verifying runner... is alive runner=abc123
Runner \"prod-runner\" is alive and waiting to pick up jobs." }, "callout": { "type": "warning", "title": "Tagging Gotcha: Untagged Jobs vs Tagged Runners", "text": "If a job has no tags defined, GitLab assigns it only to runners that are allowed to run untagged jobs: runners with no tags, or tagged runners with the \"Run untagged jobs\" setting enabled. If every runner is tagged with that setting off and your jobs have no tags, the jobs will sit in pending. Either tag your jobs or keep at least one runner that accepts untagged jobs for general work. A common pattern is a 'default' runner without tags plus specialized runners (e.g., 'aws', 'gpu') for specific jobs." }, "production_insight": "Tagging also supports your security model. For production deployments, use a runner with tags like 'production' and ensure only authorized pipelines can use it. Tags alone do not restrict access, since anyone who can edit .gitlab-ci.yml can add a tag, so mark the production runner as protected so that it only picks up jobs from protected branches and tags. Combine this with protected branches and manual approvals for a robust deployment pipeline. Runner registration tokens should be rotated regularly; GitLab allows you to reset them in the UI.", "key_takeaway": "Runners are the execution engines of your pipeline. Tags are the routing keys that connect jobs to the appropriate infrastructure. Invest time in a clear tagging strategy early; it pays off when you scale to multiple environments and runner types." }, { "heading": "Advanced Environment Deployment Strategies: Canary, Blue-Green, and Rollback", "content": "Beyond basic environments and Review Apps, production-grade pipelines implement rollout strategies to minimize risk. GitLab's environment tracking supports these patterns through environment scopes, incremental deployments, and auto-rollback.

Canary Deployments: You deploy a new version to a small subset of users (the canary) and monitor for errors before rolling out to all users. In GitLab, you can achieve this by having two deployment jobs — one for the canary environment and one for production — with the canary job running first. If the canary fails health checks, the production job is skipped.

Blue-Green Deployments: Two identical environments (blue and green) run side-by-side. At any time, only one serves live traffic. A new version is deployed to the inactive environment, tested, then traffic is switched over. GitLab's environment tracking can help by noting which environment is currently active.

Rollback: GitLab's Environments page shows a history of all deployments with commit SHAs. You can roll back to any previous deployment from that page; this re-runs the deployment job of the earlier pipeline, so the commit it originally deployed goes out again. For automated rollback, a script can call the GitLab API to re-run that deployment job or to trigger a dedicated rollback job.

Environment Scopes: Use environment scopes to limit access to CI/CD variables. For instance, a DEPLOY_API_KEY variable can be scoped to production only, so the staging job cannot accidentally use it. This is set when creating the variable in GitLab UI.

Deployment Health: The environment:auto_stop_in and environment:action (start/stop/verify) keywords help manage the environment lifecycle. You can also add a verify job that runs after deployment to check that the application responds correctly. If verification fails, a retry or rollback job can be triggered automatically through GitLab's API or started manually.", "code": { "language": "yaml", "filename": ".gitlab-ci.yml", "code": "# ── CANARY + BLUE-GREEN DEPLOYMENT PATTERN ───────────────────────────────────

stages:
  - test
  - deploy-canary
  - verify-canary
  - deploy-production

variables:
  CURRENT_ENV: \"$CI_ENVIRONMENT_SLUG\"

# ── Shared test job ──────────────────────────────────────────────────────────
run_tests:
  stage: test
  script:
    - npm test

# ── Canary Deployment (10% of traffic) ───────────────────────────────────────
deploy_canary:
  stage: deploy-canary
  script:
    - echo \"Deploying canary version $CI_COMMIT_SHORT_SHA\"
    - ./deploy.sh --canary  # Deploys to canary infrastructure
  environment:
    name: canary
    url: https://canary.myapp.example.com
    on_stop: stop_canary
  rules:
    - if: '$CI_COMMIT_BRANCH == \"main\"'
      when: manual  # Trigger canary deploy manually (or automatically)

# ── Canary Teardown ───────────────────────────────────────────────────────────
stop_canary:
  stage: deploy-canary  # Same stage so it can be triggered later
  script:
    - ./teardown-canary.sh
  environment:
    name: canary
    action: stop
  rules:
    - if: '$CI_COMMIT_BRANCH == \"main\"'
      when: manual

# ── Health Verification ───────────────────────────────────────────────────────
verify_canary:
  stage: verify-canary
  needs: [\"deploy_canary\"]
  script:
    # If health check fails, the pipeline fails and production deploy does not run
    - ./scripts/health-check.sh https://canary.myapp.example.com/health
  rules:
    # Same branch rule as deploy_canary: on other branches the needed job
    # does not exist and the pipeline would fail to be created.
    - if: '$CI_COMMIT_BRANCH == \"main\"'

# ── Production (full rollout) ────────────────────────────────────────────────
deploy_production:
  stage: deploy-production
  needs: [\"verify_canary\"]
  script:
    - echo \"Full production deploy of $CI_COMMIT_SHORT_SHA\"
    - ./deploy.sh --production
  environment:
    name: production
    url: https://myapp.example.com
  rules:
    - if: '$CI_COMMIT_BRANCH == \"main\"'
      when: manual

# ── Rollback via API (example job) ───────────────────────────────────────────
rollback_production:
  stage: deploy-production
  script:
    # Use GitLab API to fetch an artifact to redeploy. This endpoint returns the
    # artifacts of the job named build from the latest successful pipeline on the ref.
    # CI_JOB_TOKEN goes in the JOB-TOKEN header (PRIVATE-TOKEN is for access tokens).
    # The command is single-quoted because an unquoted \": \" would break the YAML.
    - 'curl --fail --location --header \"JOB-TOKEN: $CI_JOB_TOKEN\" \"$CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/artifacts/$CI_COMMIT_REF_NAME/download?job=build\" -o artifact.zip'
    - ./deploy.sh --rollback
  rules:
    - if: '$CI_COMMIT_BRANCH == \"main\"'
      when: manual", "output": "Pipeline #7201 (main branch)
Stage: test
  ✅ run_tests
Stage: deploy-canary
  ✅ deploy_canary (manual trigger)
Stage: verify-canary
  ✅ verify_canary (health check passed)
Stage: deploy-production
  ✅ deploy_production (manual trigger)

→ Production is now running commit a1b2c3d4

Note: canary environment is still running until manually stopped or auto-stopped." }, "callout": { "type": "tip", "title": "Production Insight: Automate Health Checks with Rollback", "text": "For a fully automated canary, add a script after verification that either promotes to production (if healthy) or initiates rollback (if unhealthy). You can use GitLab API to trigger the rollback job. This reduces manual toil and improves recovery time." }, "production_insight": "Environment scopes are critical when using multiple environments. If a variable is scoped to 'production', the deploy_canary job (environment: canary) cannot accidentally read it. This prevents secrets from leaking to less secure environments. Always scope sensitive variables to the minimum environment needed.", "key_takeaway": "Advanced deployment strategies (canary, blue-green) are built on GitLab's environment tracking and manual approvals. Use health checks to gate full rollouts. Rollback should be a first-class action, either manual via UI or automated through the API. Environment scopes enforce separation of secrets." } ]

● Production incident · POST-MORTEM · severity: high

The Silent Cache Poisoning: Deploying Stale Code to Production

Symptom
Users report that a patched security vulnerability is still exploitable after a deployment that reported success. The deployed application's hash does not match the expected build artifact.
Assumption
The deployment script is faulty, or the load balancer is pointing to an old instance.
Root cause
The build job was caching the compiled output directory (/dist). The cache key was based only on package-lock.json, not on the source code. Because the dependencies had not changed, the key still matched, so the runner restored a /dist built from an earlier commit, and that stale output, not a fresh build, ended up in the uploaded artifact.
Fix
1. Immediately invalidate the cache by changing the key or manually clearing it in GitLab UI. 2. Fix the .gitlab-ci.yml: Move /dist from cache to artifacts only. 3. Implement a post-build verification step that compares the artifact's git SHA embedded in a version file against $CI_COMMIT_SHA.
Key lesson
  • Cache is for downloaded dependencies (node_modules, .m2), NEVER for build outputs your code generates.
  • A passing pipeline doesn't guarantee a correct deployment if the artifact itself is wrong.
  • Always embed and verify provenance metadata (git SHA, build timestamp) in your artifact.
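Step 3 of the fix can be sketched as a small shell function. This is a minimal sketch under assumptions, not the check used in the incident: it assumes the build job writes the commit SHA into a version file inside the artifact, and the function name `verify_artifact_sha` and the path `dist/VERSION` used in the note below are hypothetical.

```shell
# verify_artifact_sha VERSION_FILE EXPECTED_SHA
# Returns 0 when the SHA recorded in VERSION_FILE equals EXPECTED_SHA,
# 1 when the artifact is stale, and 2 when the inputs are unusable.
verify_artifact_sha() {
  version_file="$1"
  expected_sha="$2"

  if [ -z "$expected_sha" ]; then
    echo "expected SHA is empty (is CI_COMMIT_SHA set?)" >&2
    return 2
  fi
  if [ ! -f "$version_file" ]; then
    echo "version file not found: $version_file" >&2
    return 2
  fi

  # Take the first line and strip whitespace, including a trailing CR.
  built_sha="$(head -n 1 "$version_file" | tr -d '[:space:]')"

  if [ "$built_sha" != "$expected_sha" ]; then
    echo "stale artifact: built from '$built_sha', expected '$expected_sha'" >&2
    return 1
  fi
  echo "artifact matches commit $expected_sha"
}
```

One way to wire this up, assuming those names: the build job runs `echo "$CI_COMMIT_SHA" > dist/VERSION` before uploading artifacts, and a job before deploy calls `verify_artifact_sha dist/VERSION "$CI_COMMIT_SHA"`. A nonzero exit fails the job, so a stale artifact stops the pipeline instead of reaching production.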
Production debug guide · 4 entries
A systematic approach from symptom to root cause.
Symptom · 01
Job fails with ERROR: Job failed: exit code 1 and a generic script error.
Fix
1. Expand the job log. 2. Look for the specific command that failed (usually the last one). 3. Check the command's output, not just the exit code. 4. Common causes: missing environment variable, file not found (artifact dependency missing), permission denied, or a genuine test/build failure.
Symptom · 02
Downstream job fails with file not found for an expected artifact.
Fix
1. Verify the upstream job that should produce the artifact actually passed. 2. Check the artifact's expire_in – it may have expired. 3. If using DAG (needs), ensure the consuming job lists the producing job in its needs array. Stage order alone is insufficient with DAG.
Symptom · 03
Pipeline is unexpectedly slow.
Fix
1. Check the pipeline view for the critical path (longest chain of dependent jobs). 2. Identify jobs that run sequentially but could run in parallel using needs. 3. Look for jobs downloading large dependencies that should be cached. 4. Use parallel:matrix to split long test suites.
Symptom · 04
Deploy job fails, complaining about missing secrets or permissions.
Fix
1. Check if the variable is defined as Protected. If so, it's only available to pipelines running on protected branches (e.g., main) or protected tags. 2. Verify the variable is Masked if it's a secret (it shouldn't print in logs). 3. Ensure the runner has the correct permissions (e.g., AWS IAM role, SSH key) to access the deployment target.
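For Symptom 04, a preflight check at the top of the deploy job can turn a confusing mid-deploy failure into an immediate, readable one. The function below is a sketch rather than a GitLab feature: the name `require_vars` is hypothetical, and it reports only the names of missing variables, never their values.

```shell
# require_vars NAME...
# Returns 1 and lists the names of any variables that are unset or empty.
require_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup; names are supplied by the pipeline author, not user input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  unset value

  if [ -n "$missing" ]; then
    echo "missing CI/CD variables:$missing" >&2
    echo "hint: protected variables are only injected on protected branches or tags" >&2
    return 1
  fi
}
```

A deploy job could call it as the first script line, for example `require_vars DEPLOY_API_KEY || exit 1`, so a variable that is protected or scoped to another environment is flagged before any deployment command runs.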
★ GitLab CI/CD Triage Cheat Sheet
First-response commands and checks for common pipeline issues.
Job fails immediately with `command not found`.
Immediate action
Check the job's Docker `image` – does it contain the required binary?
Commands
Look near the top of the job log for the line `Using Docker image sha256:...` to confirm which image actually ran.
Test locally: `docker run -it <image> sh -c 'which <command>'`
Fix now
Change the image to one that includes your tool, or install it in before_script.
Tests pass locally but fail in CI.
Immediate action
Environment difference. Check for missing env vars, different OS, or timing issues.
Commands
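Add `env` to your job's `script` to dump the environment variables, then remove it once you are done: masked variables are redacted in the log, but unmasked secrets are printed in plain text.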
Use `docker run` locally with the CI image to replicate the environment.
Fix now
Externalize configuration. Use CI/CD variables for environment-specific settings.
Pipeline is stuck on `pending` or `preparing`.
Immediate action
Runner capacity or configuration issue.
Commands
GitLab Admin Area → CI/CD → Runners. Check active runners and their `Tag` list.
Check runner host: `gitlab-runner verify` and `gitlab-runner --debug run`.
Fix now
Ensure your job's tags match a registered, online runner. Check concurrent job limits.