Advanced 5 min · March 06, 2026

ArgoCD GitOps — Auto-Heal Reverted Scale-Down 3× Mid-Outage

Q: Can I use ArgoCD with Helm?

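Yes. ArgoCD has native Helm support. Configure `spec.source.helm` with `valueFiles`, `parameters`, or inline `values`. The repo server runs `helm template` internally, so no Helm release is created in the cluster. You can also point to an OCI registry for Helm charts. One nuance: because ArgoCD never runs `helm install`, Helm itself does not execute chart hooks. ArgoCD instead maps Helm hook annotations onto its own hooks (for example `pre-install` to PreSync and `post-install` to PostSync); Helm hooks without an ArgoCD equivalent are ignored, so check charts that rely on them.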

Q: How do I roll back a deployment in ArgoCD?

Rollback in GitOps means reverting the Git commit (or tag) and letting ArgoCD sync: run `git revert <commit-sha>` and push, and ArgoCD syncs the previous state. This differs from traditional CD, where you roll back by redeploying a previous artifact; here Git history is the source of truth. For an emergency rollback, you can also change `spec.source.targetRevision` to a previous tag or commit SHA directly on the Application resource (this bypasses Git review and is not recommended outside emergencies).

Q: Can ArgoCD manage CRDs that are installed by another operator?

Yes, but ordering matters. Use sync waves: set wave -1 on the CRD (or -2 for core CRDs) and wave 0 on custom resources. If the CRD is installed by a separate Application (e.g., cert-manager) and both Applications are children of a parent app-of-apps Application, give the CRD-owning Application a lower `argocd.argoproj.io/sync-wave` annotation so it syncs first. Note that `Replace=true` is not a dependency mechanism; it only changes how manifests are applied (replace instead of apply). If two Applications depend on each other's CRDs, you have a circular dependency and should refactor.
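As a rough illustration of the separate-Application case, here is a sketch of a child Application that a parent app-of-apps would sync ahead of its siblings. The `platform` project and the chart version placeholder are assumptions for illustration, not values from a real setup. Depending on your ArgoCD version, the parent may only wait for this child to become healthy if a health check for the Application kind is configured, so verify that before relying on the ordering.

```yaml
# Child Application inside a parent app-of-apps (illustrative sketch)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  annotations:
    # Lower wave than sibling Applications that create cert-manager custom resources
    argocd.argoproj.io/sync-wave: "-1"
spec:
  project: platform                     # hypothetical project name
  source:
    repoURL: https://charts.jetstack.io
    chart: cert-manager
    targetRevision: "<chart-version>"   # placeholder: pin an exact version
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
```

Sibling Applications that create Certificates or Issuers would stay at the default wave 0 or higher.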

Q: How does ArgoCD compare to Flux CD?

Both are GitOps controllers with similar goals. ArgoCD has a richer UI, native support for sync waves and hooks, and a more declarative Application CRD. Flux CD is simpler, uses `kustomize-controller` and `source-controller`, and has better support for multi-tenancy out of the box. ArgoCD is more popular in enterprise teams that need a UI and dashboard. Flux CD is lighter-weight and preferred in pure CLI environments. Both integrate with Kustomize and Helm. Choose ArgoCD if you want a UI, sync dependencies, and a vibrant community. Choose Flux if you prefer minimalism and already use Kustomize heavily.

ArgoCD self-heal fought kubectl scale-down every 60s during a CPU incident.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

✓ Production tested · Last updated July 18, 2026

Before you start ⏱ 30 min

✓ Production DevOps experience
✓ Deep understanding of the tool's internals
✓ Experience debugging distributed systems

⚡Quick Answer

ArgoCD is a Kubernetes controller that continuously reconciles live cluster state against a Git repository — the source of truth
Core components: Application CRD (defines source repo + target cluster), sync waves (ordering of resources), webhooks (refresh on push instead of waiting for the next poll), ApplicationSets (dynamic Application generation)
Performance impact: Default sync interval 3 minutes — large manifests (>10K resources) can take 30-60 seconds to process
Production trap: Auto-sync enabled without validation — a bad commit rolls out instantly to production, no one reviews
Biggest mistake: Mutating resources outside Git (kubectl edit). With self-heal enabled (the `selfHeal` flag, called auto-heal throughout this article), ArgoCD overwrites the change, causing "my fix disappeared" confusion

✦ Definition~90s read

What is ArgoCD for GitOps?

ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes that implements a reconciliation loop to enforce that the live state of a cluster matches the desired state defined in a Git repository. It solves the fundamental problem of configuration drift — when manual changes, rollbacks, or emergency fixes cause a cluster to deviate from its source of truth.

Instead of relying on imperative commands or CI/CD pipelines that push changes one-way, ArgoCD continuously pulls from Git and, when automated sync with self-heal and prune is enabled, corrects any divergence itself, including reverting manual scale-downs or re-creating deleted resources mid-outage. This has made it one of the most widely adopted GitOps tools for production Kubernetes, with public adopters including Adobe, Intuit, and Ticketmaster, and it gives teams an audit trail and a rollback path through Git history.

ArgoCD sits in the GitOps ecosystem alongside alternatives like Flux CD and Jenkins X, but differentiates itself with a mature Web UI, its own RBAC model with SSO integration, and support for sync waves, hooks, and progressive delivery patterns. You should not use ArgoCD for simple single-service deployments where a basic CI/CD pipeline suffices, or in environments where Git is not the single source of truth (e.g., when you need real-time scaling based on external metrics without Git commits).

For multi-cluster and multi-tenant patterns, ArgoCD provides ApplicationSets that dynamically generate Applications per cluster or namespace, enabling hardened production setups with automated sync policies, controlled pruning, and self-healing. As the incident in this article shows, that same self-healing will also revert a deliberate scale-down, three times in a row in our case, unless you suspend it first.

Plain-English First

Imagine your Kubernetes cluster is a LEGO city, and your Git repository is the official blueprint book. ArgoCD is the obsessive city manager who constantly compares the actual city to the blueprint — and the moment someone sneaks in an extra block that isn't in the book, the manager rips it out and puts things back exactly as drawn. You never have to phone the manager; they're always watching. That's GitOps: the blueprint IS the truth, and ArgoCD enforces it automatically.

Most teams reach a point where 'kubectl apply -f' starts feeling like playing Jenga blindfolded. One engineer deploys a hotfix directly to production, another runs a Helm upgrade from their laptop, and within a week nobody actually knows what's running in the cluster. Config drift is silent, cumulative, and eventually catastrophic. ArgoCD was built to make that problem structurally impossible — not through discipline or process, but through automation backed by a source of truth that everyone can see and audit.

ArgoCD implements the GitOps operator pattern: it runs inside your cluster, watches a Git repository, and continuously reconciles the live cluster state against the desired state declared in that repo. If the live state drifts — whether from a rogue kubectl command, a failing node replacement, or a mischievous CronJob — ArgoCD detects the divergence and can automatically heal it. This isn't just CI/CD with extra steps; it's a fundamentally different mental model where deployments are a side-effect of merging a pull request, not a separate pipeline stage.

By the end you'll understand ArgoCD's reconciliation engine, how to model complex multi-service deployments with sync waves and hooks, how to harden a production installation with RBAC and SSO, and the non-obvious gotchas that only surface after six months of running it.

ArgoCD GitOps — The Reconciliation Loop That Won't Let You Drift

ArgoCD GitOps is a deployment strategy where a Kubernetes cluster's desired state is declared in a Git repository, and ArgoCD continuously reconciles the live cluster to match that declaration. The core mechanic is a pull-based controller that polls Git (or is notified via webhooks), computes the diff, and, when automated sync is enabled, applies it. With self-heal on, that includes reverting manual changes, scaling events, or configuration drift back to the committed YAML. This makes Git the single source of truth, not the cluster operator's kubectl history.

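In practice, ArgoCD re-checks Git every 3 minutes by default (configurable via timeout.reconciliation in argocd-cm) and also watches the live resources it manages, so drift in the cluster is usually noticed well before the next poll. It compares the live state against the target state in Git and corrects divergence with the equivalent of kubectl apply (client-side by default; server-side apply is opt-in through the ServerSideApply=true sync option). Three settings under syncPolicy.automated control enforcement: automated sync itself (apply new commits), selfHeal (revert changes made directly to the cluster, which this article calls auto-heal), and prune (remove resources no longer in Git). All three are opt-in, but together they are the mechanism that enforces Git as the source of truth. Without self-heal, a manual kubectl scale deployment nginx --replicas=5 shows up as OutOfSync and persists until the next sync.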

Use ArgoCD GitOps when you need auditability, rollback speed, and drift prevention in multi-cluster or regulated environments. It matters most during outages: if a team manually scales down a deployment to debug, ArgoCD will auto-heal it back to the Git-defined replica count, typically within seconds to a few minutes depending on configuration, which can either save you or surprise you. The trade-off is that any emergency override must either go through Git (PR + merge) or be preceded by suspending self-heal; a quick kubectl command on its own will not stick.

⚠ Auto-Heal Is Opt-In, but It Makes No Exceptions

Many teams enable auto-heal (selfHeal: true) thinking it's a safety net, then get bitten when a manual scale-down during an incident is reverted almost immediately, escalating the outage.

📊 Production Insight

During a production incident, a team scaled a critical service from 10 to 3 replicas to reduce load on a failing database. ArgoCD auto-healed it back to 10 within 90 seconds, overwhelming the database again and causing a cascading failure. The rule: always disable auto-heal (argocd app set <app> --self-heal=false, or suspend automated sync entirely with argocd app set <app> --sync-policy none) before making emergency manual changes, and re-enable it only after the incident is resolved.

🎯 Key Takeaway

ArgoCD enforces Git as the single source of truth — any manual change is drift and will be reverted.

Auto-heal is a double-edged sword: it prevents configuration rot but can escalate incidents during emergency operations.

Always have a documented process to temporarily disable auto-heal before making manual changes in a live incident.
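One way to make that process concrete is to keep the incident-mode sync policy written down next to the normal one. The fragment below is a minimal sketch, assuming an Application named api in the argocd namespace (a hypothetical name); it shows only the fields that change and is meant to be applied as a merge patch rather than as a complete manifest.

```yaml
# Incident-mode override (sketch): keep syncing new commits, stop reverting manual changes.
# "api" is a hypothetical Application name; project, source, and destination are unchanged
# and therefore omitted from this patch.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: false   # manual kubectl changes now persist until the next sync
```

If the Application is itself managed by a parent Application with self-heal enabled, the parent may revert this patch, so the override then has to be made in Git or on the parent first.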

thecodeforge.io

Argocd Gitops

The Reconciliation Loop — How ArgoCD Continuously Enforces Git State

ArgoCD runs as a set of Kubernetes controllers inside your cluster. The core component is the Application Controller, which runs a continuous reconciliation loop (default 3 minutes) for each Application resource. It fetches the desired state from Git (via repo server), compares it with the live state from the target cluster (using the Kubernetes API), and if they differ, it marks the Application as 'OutOfSync'. If auto-sync is enabled, it immediately applies the diff.

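The key insight is that ArgoCD doesn't just apply YAML. Like kubectl apply, it works from three inputs: live state (current cluster), desired state (Git), and last-applied state (recorded in the kubectl.kubernetes.io/last-applied-configuration annotation on each resource). This three-way comparison lets it tell the fields it owns apart from fields set by other controllers, which avoids a blunt 'last write wins' overwrite when resources are updated outside ArgoCD.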

When a sync happens, ArgoCD orders resources by phase, then by sync wave (an annotation), then by kind and name. Resources with lower wave numbers sync first. By default, all resources are wave 0. ArgoCD's kind ordering already places Custom Resource Definitions (CRDs) ahead of custom resources within a wave, but putting CRDs in wave -1 or lower makes sure they are established before anything that depends on them is applied. Hooks (PreSync, Sync, PostSync, plus SyncFail for cleanup) usually run as Jobs at specific stages, allowing you to run database migrations before updating deployments or smoke tests after.

The repository server caches Git contents. It supports Helm, Kustomize, and plain YAML. For Helm, it runs helm template internally. For Kustomize, it runs kustomize build. The repo server caches rendered manifests to speed up subsequent syncs.

io/thecodeforge/argocd/application.yaml · YAML

# io.thecodeforge/argocd/application.yaml
# Production ArgoCD Application with automated sync, self-heal, and retry
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-gateway
  namespace: argocd
  annotations:
    # Sync waves are annotations, not spec fields. On an Application this only
    # takes effect when a parent (app-of-apps) Application syncs it.
    argocd.argoproj.io/sync-wave: "-1"  # Sync before dependent apps
  finalizers:
    - resources-finalizer.argocd.argoproj.io  # Clean up resources when app is deleted
spec:
  project: production
  source:
    repoURL: https://github.com/thecodeforge/infra.git
    targetRevision: HEAD
    path: overlays/production/api-gateway
    helm:
      valueFiles:
        - values.yaml
        - secrets.yaml  # SOPS-encrypted; needs a decryption plugin (e.g. helm-secrets) on the repo server
  destination:
    server: https://kubernetes.default.svc
    namespace: api-gateway
  syncPolicy:
    automated:
      prune: true       # Delete resources not in Git
      selfHeal: true    # Revert manual changes (kubectl edit)
      allowEmpty: false # Don't sync if would wipe all resources
    syncOptions:
      - ApplyOutOfSyncOnly=true
      - PrunePropagationPolicy=foreground
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
# No status block here: status.sync and status.health are written by the controller.

Output

# Apply the Application
kubectl apply -f application.yaml

# ArgoCD picks up the new Application and, with automated sync enabled, syncs it

# Check sync status:
argocd app get api-gateway

# Output example (abridged):
Name: api-gateway
Project: production
Server: https://kubernetes.default.svc
Namespace: api-gateway
URL: https://argocd.example.com/applications/api-gateway
Repo: https://github.com/thecodeforge/infra.git
Target: HEAD
Path: overlays/production/api-gateway
Health Status: Healthy
Sync Status: Synced

GROUP  KIND        NAMESPACE    NAME         STATUS  HEALTH
apps   Deployment  api-gateway  api-gateway  Synced  Healthy
       Service     api-gateway  api          Synced  Healthy

Mental Model

The Reconciliation Loop — Git as Source of Truth

ArgoCD runs an infinite loop: fetch Git, compare to cluster, sync if different. The cluster is a cache of Git; any change outside Git is temporary drift.

Application Controller: runs loop every 3 minutes (default). Compares live with desired using 3-way diff (live, desired, last-applied).
Repo Server: fetches Git, renders Helm/Kustomize, caches results. Runs as a separate pod to isolate security.
Sync Waves: lowest wave first (wave -1, then 0, then 1, and so on). Put CRDs at wave -1 or lower so they install before custom resources.
Hooks: PreSync runs before sync, Sync runs during, PostSync after. Use for DB migrations, smoke tests, or notifications.
Auto-heal (selfHeal: true): reverts manual changes (kubectl edit) shortly after they are detected. Keep a way to switch it off per Application if you need incident overrides.

📊 Production Insight

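ArgoCD learns about new commits by polling Git: with the default 3-minute interval, a push can go unnoticed for up to 3 minutes. Changes to live resources are picked up through Kubernetes watches and are usually noticed sooner.

Set timeout.reconciliation: 60s in argocd-cm to poll more frequently, but expect Git, repo-server, and API server load to rise as the interval shrinks.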

Rule: Use webhooks for instant sync on Git push. Combine with a 1-minute poll as fallback.

🎯 Key Takeaway

ArgoCD's reconciliation loop is the heart of GitOps: compare, diff, sync, repeat.

The three-way merge (live, desired, last-applied) prevents last-write-wins conflicts when resources are updated outside ArgoCD.

Rule: For production, set selfHeal: true to enforce Git as truth, but have a runbook to disable auto-sync during incidents.

Sync Policy Decision Tree

If: Development environment, multiple PRs, rapid iteration
→ Use: Manual sync. Developer clicks 'Sync' after reviewing the diff in the UI. A webhook still refreshes the app and shows the diff, but nothing is applied until someone syncs.

If: Staging environment, automated tests pass after deploy
→ Use: Auto-sync with prune: true, selfHeal: false. Tests can manually trigger a sync via argocd app sync. Roll back by reverting Git.

If: Production environment, require Git as source of truth
→ Use: Auto-sync with prune: true, selfHeal: true. Add syncOptions: [ApplyOutOfSyncOnly=true] so each sync applies only the out-of-sync resources instead of re-applying every manifest.

If: Production with emergency overrides (incident response)
→ Use: Disable auto-sync for critical apps. Deploy from tags only (targetRevision: v1.2.3). Sync manually via argocd app sync after the tag is pushed.

If: Resources with finalizers (namespaces, custom resources)
→ Use: Set PrunePropagationPolicy=foreground in sync options. Ensure the finalizer allows deletion; some require manual removal via kubectl patch.

Sync Waves and Hooks — Ordering Complex Deployments

Kubernetes has no built-in ordering for applying resources. If you need to install a CRD before creating a custom resource (like the Prometheus Operator CRDs before a ServiceMonitor), you must order the sync. ArgoCD's sync waves solve this.

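Every resource can be annotated with argocd.argoproj.io/sync-wave. Lower numbers sync first, so wave -1 syncs before wave 0, and ArgoCD waits for a wave to be synced and healthy before starting the next. Resources in the same wave are applied together and do not wait for one another, so do not rely on ordering inside a wave.

CRDs belong in wave -1 or lower; custom resources (like a Prometheus instance or an Istio VirtualService) belong in wave 0 or higher. If a custom resource is applied before its CRD exists, the sync fails. Hooks are ordered by phase rather than by wave number: PreSync hooks finish before any Sync-phase resource is applied, whatever its wave, and PostSync hooks run only after every Sync-phase resource is synced and healthy.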

Hooks are Kubernetes Jobs or Pods that run at specific stages. Example: a database migration container must run before the new app version starts. Use a PreSync hook: the Job runs, ArgoCD waits for it to complete (successfully), then syncs the Deployment. If the hook fails, ArgoCD stops the sync.

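Hook types: PreSync (before the sync), Sync (alongside the manifests, rare), PostSync (after all resources are healthy), SyncFail (when the sync fails, for cleanup), and Skip (tells ArgoCD not to apply the manifest at all). A hook Job that ends in failure, for example because its container exits non-zero and its retries are exhausted, fails the sync.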

io/thecodeforge/argocd/sync-wave-hooks.yaml · YAML

---
# io.thecodeforge/argocd/sync-wave-hooks.yaml
# Example: CRD in wave -1, custom resource in wave 0, PreSync migration hook, PostSync smoke test

# CRDs must exist before custom resources
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: gatlingruns.gatling-operator.io
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  group: gatling-operator.io
  names:
    kind: GatlingRun
    plural: gatlingruns
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:  # apiextensions.k8s.io/v1 requires a schema for every version
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true
---
# Custom resource (depends on CRD above) — sync after wave -1
apiVersion: gatling-operator.io/v1
kind: GatlingRun
metadata:
  name: load-test-run
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  image: gatling:latest
  replicas: 3
---
# PreSync hook: database migration before app deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: migrate/migrate:v4
          command:
            - /bin/sh
            - -c
            # Illustrative only: read the connection string from a Secret instead of hard-coding credentials
            - "migrate -database 'postgres://user:pass@postgres:5432/app?sslmode=disable' -path /migrations up"
      restartPolicy: Never
  backoffLimit: 2
---
# PostSync hook: deploy smoke test runner
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: tester
          image: curlimages/curl:latest
          command:
            - /bin/sh
            - -c
            - "curl -f https://api.example.com/health || exit 1"
      restartPolicy: Never

Output

# ArgoCD processes phases first, then waves within the Sync phase:

1. PreSync hook runs (db-migration Job); ArgoCD waits for the Job to complete.

2. If the Job succeeds, the Sync phase starts: CRDs (wave -1) are applied and become available.

3. Custom resources and any remaining wave 0 resources (Deployment, Service) are applied; they can now reference the CRDs.

4. Once all resources are healthy, the PostSync hook (smoke-test) runs.

5. If any hook fails, ArgoCD marks the sync as Failed and stops.

⚠ Watch Out: Sync Waves in Parallel May Race

Resources in the same wave are applied together, and ArgoCD does not wait for one to become healthy before applying the next. If one resource needs another to be ready first (e.g., an application Deployment that crash-loops until its database StatefulSet is up), keeping both in the same wave causes temporary errors and restarts. Use separate waves for strict ordering.

📊 Production Insight

Put CRDs in wave -1 or lower and the custom resources that reference them in wave 0 or higher; otherwise a sync can fail on an unknown kind.

PreSync hooks are for migrations, schema updates, or pre-deployment checks. The sync waits for them, so bound their runtime (for example with the Job's activeDeadlineSeconds) to keep a hung migration from blocking the deployment.

Rule: Use hook-delete-policy: HookSucceeded to clean up hook Jobs after success, preventing resource buildup.

🎯 Key Takeaway

Sync waves enforce ordering for CRDs, namespaces, and dependent resources. Lower waves sync first.

PreSync hooks run before any resources; PostSync hooks run after all resources are healthy. Use them for migrations and validation.

Rule: Start CRDs at wave -2, namespaces at -1, core resources at 0, application deployments at 1.

Sync Wave Strategy

If: Installing CRDs (e.g., Istio, Prometheus, cert-manager)
→ Use: Wave -2 (earliest). ArgoCD waits for the wave to be healthy, so the CRDs are established before any custom resources are applied.

If: Creating a namespace before any resources in it
→ Use: Wave -1. The namespace must exist before the resources inside it. Annotate the Namespace with wave -1.

If: Deployment depends on a ConfigMap or Secret
→ Use: Wave 0 for the ConfigMap/Secret, wave 1 for the Deployment. The Deployment restarts automatically on ConfigMap changes only if its pod template carries a hash annotation of the ConfigMap.

If: Database migration must run before the new app version
→ Use: PreSync hook (Job). The hook runs before any Sync-phase resources, regardless of wave. The new application pods then start against the migrated schema.

If: Load balancer creation depends on an existing Deployment
→ Use: Wave 0 for the Deployment, wave 1 for the Service. The Service's endpoints are populated only after the Deployment's pods are ready. Use wave ordering to avoid temporary 503s.

Multi-Cluster and Multi-Tenant Patterns — Production Hardening

ArgoCD can manage many clusters from a single control plane. Each target cluster is represented by a Secret in the ArgoCD namespace, labelled argocd.argoproj.io/secret-type: cluster, that holds the cluster's API server endpoint and credentials such as a bearer token. The Application Controller uses these Secrets to connect and sync.
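For reference, a declaratively registered cluster looks roughly like the sketch below. The cluster name, the env label, and the bracketed credential placeholders are assumptions for illustration; in practice argocd cluster add generates an equivalent Secret for you.

```yaml
# Declarative cluster registration (sketch; credentials are placeholders)
apiVersion: v1
kind: Secret
metadata:
  name: prod-eu-1-cluster                      # hypothetical Secret name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster    # marks this Secret as a cluster definition
    env: production                            # optional label, usable by ApplicationSet selectors
type: Opaque
stringData:
  name: prod-eu-1
  server: https://prod-eu-1.k8s.example.com
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-certificate>"
      }
    }
```

Treat this Secret like any other credential: it should reach the cluster through your secrets tooling, not as plaintext in Git.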

For multi-cluster management, organise Applications by cluster and environment: clusters/prod/us-east-1/apps, clusters/staging/eu-west-1/apps. Use ApplicationSets to generate Applications dynamically for each cluster with a templated Git path.
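The ApplicationSet approach can be sketched as follows, using the cluster generator. This is an illustration under assumptions: it presumes each registered cluster Secret carries an env: production label and that the cluster's registered name matches its folder name in Git, neither of which is stated in the layout above.

```yaml
# One Application per registered production cluster (sketch)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: api-gateway-per-cluster      # hypothetical name
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production          # assumed label on the cluster Secrets
  template:
    metadata:
      name: '{{name}}-api-gateway'   # {{name}} is the registered cluster name
    spec:
      project: production
      source:
        repoURL: https://github.com/thecodeforge/infra.git
        targetRevision: HEAD
        path: 'clusters/prod/{{name}}/apps'
      destination:
        server: '{{server}}'         # API server URL from the cluster Secret
        namespace: api-gateway
```

Registering a new cluster with the matching label is then enough for its Application to appear; removing the cluster removes the generated Application.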

Multi-tenancy within a cluster: use Projects to isolate teams. Each Project defines source repositories, destination clusters/namespaces, and role-based access. For example, the 'team-a' project can only deploy to namespaces prefixed with 'team-a-' and only from Git repos under github.com/team-a.

RBAC in ArgoCD is its own model: policies are defined in argocd-rbac-cm. A policy like `p, role:admin, applications, *, */*, allow` gives full access to all applications. `p, role:viewer, applications, get, team-a/*, allow` allows read-only access to team-a apps. Configure the identity provider under oidc.config in argocd-cm, then bind OIDC groups to roles with `g, <group>, role:<name>` lines in policy.csv.
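A minimal sketch of the group-to-role binding, assuming your identity provider emits a groups claim; the group names team-a-viewers and platform-admins are hypothetical.

```yaml
# argocd-rbac-cm fragment: bind IdP groups to ArgoCD roles (sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.csv: |
    # Permission: role:viewer may read applications in project team-a
    p, role:viewer, applications, get, team-a/*, allow
    # Bindings: members of the group on the left inherit the role on the right
    g, team-a-viewers, role:viewer
    g, platform-admins, role:admin
  scopes: '[groups]'   # which token claim ArgoCD reads group membership from
```

Comments inside policy.csv sit on their own lines here, since the file is parsed as CSV rather than YAML.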

For secrets management, ArgoCD has no built-in decryption, but it works with SOPS (through a repo-server plugin), SealedSecrets, or Vault. A common pattern: commit encrypted secrets to Git and decrypt them either in the repo server, using a KMS key or an age private key mounted from a Kubernetes Secret, or in-cluster through a controller. Never put plaintext secrets in Git.

io/thecodeforge/argocd/argocd-project.yaml · YAML

# io/thecodeforge/argocd/argocd-project.yaml
# Multi-tenant RBAC with Projects
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-payments
  namespace: argocd
spec:
  description: "Payments team microservices"
  # Restrict source repos
  sourceRepos:
    - 'https://github.com/org/payments-infra.git'
  # Only allow deploying to specific clusters/namespaces
  destinations:
    - namespace: 'payments-*'
      server: https://kubernetes.default.svc
    - namespace: 'payments-*'
      server: https://prod-eu-1.k8s.example.com
  # Deny cluster-scoped resources except specific ones
  clusterResourceWhitelist:
    - group: ''
      kind: Namespace
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
  # Deny specific namespaced resource kinds
  namespaceResourceBlacklist:
    - group: ''
      kind: Secret  # Prevent teams from creating secrets outside vault
  roles:
    - name: developer
      policies:
        - p, proj:team-payments:developer, applications, sync, team-payments/*, allow
        - p, proj:team-payments:developer, applications, get, team-payments/*, allow
      groups:
        - payments-developers

---
# RBAC policy in argocd-rbac-cm (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.csv: |
    # Admin access to all projects
    p, role:admin, applications, *, */*, allow
    p, role:admin, projects, *, *, allow
    p, role:admin, clusters, *, *, allow
    
    # Team leads can sync their own apps but not change project config
    p, role:team-lead, applications, sync, team-*/*, allow
    p, role:team-lead, applications, get, team-*/*, allow
    
    # Read-only access for auditors
    p, role:auditor, applications, get, */*, allow
    
  policy.default: role:readonly  # Default role if not matched
  scopes: '[groups]'

Output

# Apply project and RBAC
kubectl apply -f argocd-project.yaml
kubectl apply -f argocd-rbac-cm.yaml

# Test access
argocd login argocd.example.com --sso

# Team payments developer (member of payments-developers group)
argocd app list # Shows only apps in team-payments project

# Attempt to create app in another project (fails due to RBAC)
argocd app create test --project other-project --repo ... # Error: permission denied

# List projects visible to user
argocd proj list
NAME           DESCRIPTION
team-payments  Payments team microservices

💡 Project RBAC Prevents 'Cluster-Wide' Accidents

Always define an AppProject before creating Applications. It limits source repos, destination namespaces, and cluster resources. Applications left in the built-in default project, which is unrestricted out of the box, can deploy to the kube-system namespace or to any registered cluster. Use clusterResourceWhitelist to restrict creation of ClusterRoles and ClusterRoleBindings.

📊 Production Insight

Without restrictive Projects, any user who can create Applications in the default project can deploy to any namespace, including kube-system, and create cluster-scoped resources.

A single sourceRepos restriction prevents deploying from a fork that contains malicious manifests.

Rule: Start with a 'restricted' project that only allows specific namespaces. Create an 'elevated' project for platform team with cluster-scoped access.

🎯 Key Takeaway

Projects isolate teams and limit what they can deploy. Always define a Project before creating Applications.

Multi-cluster management is built in: register clusters via secrets, use ApplicationSets to generate apps per cluster.

Rule: Restrict sourceRepos to your organisation's Git org. Never allow https://github.com/* — one malicious fork is all it takes.

Multi-Cluster Organization

If: 1-5 clusters, simple topology
→ Use: Single ArgoCD instance. Register each cluster via argocd cluster add. Use ApplicationSets with the cluster generator.

If: 5-50 clusters, different regions, strict compliance
→ Use: Hub-and-spoke: one ArgoCD per region/fleet, coordinated by a 'cluster-of-clusters' ArgoCD. Use an ApplicationSet with the git generator, one folder per cluster.

If: Ephemeral clusters (CI/CD, preview environments)
→ Use: ApplicationSet with the pull request generator. Each open PR gets its own Application and namespace. When the PR is merged or closed, the generator removes the Application and its resources are cleaned up.

If: Multi-tenant with isolated teams
→ Use: Projects per team. Each Project has its own sourceRepos, destinations, and RBAC roles. No sharing of Applications between Projects.

If: Secrets must not be in Git (PCI, HIPAA)
→ Use: SealedSecrets or the Vault CSI driver. ArgoCD can sync the SealedSecret resource; the controller unseals it in-cluster. Plaintext never touches Git.

GitOps vs Traditional CI/CD — Why Push-Based Pipelines Burn You

Traditional CI/CD is push-based. Your pipeline runs, authenticates to the cluster, and applies manifests. That means your CI system needs cluster credentials — a blast radius that keeps site reliability engineers up at night. One compromised build server and an attacker has kubectl access to production.

GitOps flips the model. Argo CD runs inside the cluster and pulls from Git. The cluster never exposes an API endpoint to your pipeline. Credentials stay inside the cluster boundary. Your CI system only builds and pushes an artifact to a container registry. It never touches kubectl.

There's another difference: drift detection. A push-based pipeline deploys once and assumes the cluster stays in that state. Manual edits, scaling events, or a network partition? You won't know until the pager goes off at 3 AM. Argo CD's reconciliation loop runs every 3 minutes by default, catching drift before it becomes an incident.

This isn't theoretical. A major e-commerce platform had a staging cluster diverge from Git state after a junior engineer ran kubectl scale. Push-based CI never caught it. When they promoted the same configs to prod, the autoscaler fought the manual scaling for four hours. With GitOps, the controller reverts that change in under 180 seconds.

argocd-cm.yaml · YAML

# io.thecodeforge
# ConfigMap to adjust the reconciliation loop interval
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  timeout.reconciliation: "60s"  # default 180s, lower for high-churn envs
  admin.enabled: "false"

⚠ Production Trap:

Don't set timeout.reconciliation to 10s unless you enjoy throttling the controller. Every reconcile hits the Kubernetes API and Git server. Start at 60s, measure API server CPU, then tune down.

🎯 Key Takeaway

Push CI gives you speed. Pull GitOps gives you safety. Pick safety in production.

Managing Deployments with Helm and Kustomize — Config Flexibility Without the Mess

Plain YAML works for pet projects. In production, you need parameterization. Helm and Kustomize are the two dominant tools, and Argo CD supports both natively. The choice matters for how you structure environments.

Helm uses templates injected with values files — good when you have deep configuration trees and package dependencies. Your chart lives in a repo, Argo CD points to it, and you override values per environment via an Application's source.helm.parameters field. No need to maintain three copies of the same template.

Kustomize is overlay-based. You have a base set of manifests, then per-environment overlays that add patches. Argo CD can point directly to a kustomization.yaml in your repo. This works well when your team prefers vanilla Kubernetes manifests and hates debugging template syntax at 2 AM.
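As a sketch of the Kustomize variant, the Application below points at an overlay directory; ArgoCD detects the kustomization.yaml in that path and runs kustomize build on it. The application name, overlay path, namespace, and namePrefix value are hypothetical.

```yaml
# Application backed by a Kustomize overlay (sketch)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api-staging               # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/thecodeforge/infra.git
    targetRevision: HEAD
    path: overlays/staging/payments-api    # assumed directory containing kustomization.yaml
    kustomize:
      namePrefix: staging-                 # optional override applied on top of the overlay
  destination:
    server: https://kubernetes.default.svc
    namespace: staging-payments
```

The kustomize block is optional; most teams keep every environment difference in the overlay itself so that the Application stays a thin pointer.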

The anti-pattern: using both for the same app. Pick one per repository. Helm if you publish reusable packages. Kustomize if you manage a single app across dev, staging, and prod. Argo CD handles either but mixing them in the same repo confuses new hires and your code review process.

Real example: a fintech startup used Helm for shared PostgreSQL configs but Kustomize for their microservice overlays. Clear boundary. New developers onboarded in a day.

argocd-helm-app.yaml · YAML

# io.thecodeforge
# Application using Helm chart with environment-specific overrides
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://charts.bitnami.com/bitnami'
    chart: nginx
    targetRevision: 15.x.x  # semver range shown for brevity; pin an exact chart version in production
    helm:
      values: |
        replicaCount: 3
        service:
          type: ClusterIP
      parameters:
        - name: image.tag
          value: "1.25.3"
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: prod-payments

🔥Smoke Test Commitment:

Always pin targetRevision to a specific chart version, not 'latest'. A Helm chart update can change default values and silently break your deployment.

🎯 Key Takeaway

Don't fight the tool. Helm for packaged dependencies, Kustomize for app-specific overlays. Never both in the same repo.

● Production incident · POST-MORTEM · severity: high

The Auto-Heal That Wiped Production During a PagerDuty Incident

Symptom

Deployment was consuming 100% CPU. Senior engineer ran kubectl scale deploy api --replicas=0 to stop traffic while investigating. Within 1 minute, the deployment scaled back to 3 replicas. They scaled down again; it scaled back up. The team thought the incident was automated chaos until someone noticed ArgoCD logs showing successfully synced every 60 seconds. The team spent 20 minutes fighting their own tooling.

Assumption

The team assumed auto-heal was only for 'drift like changed image tags', not for operational scaling during incidents. They didn't realise ArgoCD treats ANY deviation from Git as drift, including scaling operations. They had no temporary suspension mechanism for incidents.

Root cause

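The ArgoCD Application had syncPolicy.automated enabled with selfHeal: true, so the manual scale-down counted as drift. The controller compared live state with Git state, saw the replica count mismatch (0 vs 3), and reverted it. Self-heal reacts to changes in live state, so the revert did not wait for the default 3-minute Git polling interval, which matches the sub-minute reverts the team saw. Settings such as syncPolicy.retry or the ApplyOutOfSyncOnly=true sync option would not have helped: retry only re-runs failed syncs, and ApplyOutOfSyncOnly only narrows which resources a sync applies. The only way to pause ArgoCD for a single Application was to edit the Application resource itself, and nobody had a rehearsed way to do that while debugging under pressure.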
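The sync policy that produces this behaviour can be sketched as follows. This is a reconstruction for illustration: the team's actual manifest is not shown here, and the prune setting is an assumption.

```yaml
# Illustrative fragment; project, source, and destination are omitted.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
spec:
  syncPolicy:
    automated:
      prune: true      # assumed; not stated in the post-mortem
      selfHeal: true   # reverts manual changes such as kubectl scale
```

With selfHeal set to false, ArgoCD would still roll out new Git commits automatically but would leave a manual scale-down in place until the next sync.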

Fix

The first step was to treat auto-sync and self-heal as separate settings under syncPolicy.automated, so that self-heal can be turned off without giving up automated sync. The root fix was an incident runbook step that disables auto-sync temporarily: kubectl patch application api -n argocd --type merge -p '{"spec":{"syncPolicy":{"automated":null}}}'. Longer term, the team moved production to a manual sync policy, used feature flags to disable sync during incidents, and documented a pause-sync workflow in the team runbook (for example, argocd app set api --sync-policy none). They also installed ArgoCD Notifications to alert when a sync overrides manual changes.

Key lesson

Auto-heal without an incident suspension mechanism turns your GitOps tool into an adversary during outages. Always have a documented way to pause sync.
Production should either use manual sync, or enable automated prune: true and selfHeal: true only alongside pre-commit hooks that validate manifests.
For critical apps, leave syncPolicy.automated out of the Application so that every sync is manual, or keep automated: {} (auto-sync with prune and selfHeal off) and point targetRevision at release tags so that only tagged releases roll out.
Monitor sync activity: argocd app get <app> --show-operation shows the most recent sync operation. Alert if syncs happen outside planned deployment windows.
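One way to express the production-side lesson is an Application that tracks a release tag and leaves self-heal off. This is a minimal sketch under stated assumptions: the repository URL, path, and tag are hypothetical.

```yaml
# Sketch only: repoURL, path, and the release tag are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://github.com/example-org/api-deploy.git'
    path: overlays/prod
    targetRevision: v1.4.2   # hypothetical release tag
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: prod
  syncPolicy:
    # Remove the automated block entirely to make every sync manual.
    automated:
      prune: false
      selfHeal: false   # manual kubectl changes are reported as drift, not reverted
```

The trade-off is that drift stays visible as OutOfSync until someone syncs, which is usually what you want while an incident is in progress.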

Production debug guide · Symptom → Action mapping for common GitOps failures · 5 entries

Symptom · 01

App stays 'OutOfSync' even though the last sync reports success, and resources are not updating

→

Fix

Check whether another controller manages the resource (e.g., an HPA scales replicas and ArgoCD reverts them). Run kubectl describe app <app> -n argocd and look at the 'Conditions' section. The resource might be excluded via resource.exclusions in argocd-cm. If webhooks or controllers add or mutate fields, list those fields under spec.ignoreDifferences; the IgnoreExtraneous compare option only covers whole resources that are not in Git.
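For the HPA case, a common approach is to tell ArgoCD to ignore the replica count. The fragment below is a sketch of that idea and would need to be merged into an existing Application spec.

```yaml
# Fragment of an Application spec
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas   # owned by the HPA, not by Git
  syncPolicy:
    syncOptions:
      # Also leave the ignored fields alone when a sync is applied
      - RespectIgnoreDifferences=true
```

Removing replicas from the Deployment manifest in Git is a complementary step, since Git then no longer states a desired value for the HPA to fight.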

Symptom · 02

Sync stuck in 'Running' or 'Pending' for hours

→

Fix

Check whether a webhook is unreachable or the Git repo has locked files. Run argocd app manifests <app> to see if the manifests resolve. Check for network policies that block ArgoCD from reaching Git (port 443). To cancel the stuck operation, run argocd app terminate-op <app>. As a last resort, delete the Application Controller pod to force a requeue: kubectl delete pod -n argocd argocd-application-controller-0.

Symptom · 03

Prune failing — resources not deleted from cluster

→

Fix

Check finalizers: resources with finalizers (e.g., the kubernetes finalizer on namespaces) block deletion. Add PrunePropagationPolicy=foreground to the sync options. Check whether the resource is shared with another app (fluentd-logging might be). For resources that should persist, annotate them with argocd.argoproj.io/sync-options: Prune=false, or Delete=false if they must survive deletion of the Application.
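The prune-related settings can be sketched in two places: sync options on the Application, and an annotation on the resource that must not be pruned. The ConfigMap name is a hypothetical example.

```yaml
# Fragment of an Application spec
spec:
  syncPolicy:
    syncOptions:
      - PrunePropagationPolicy=foreground   # wait for dependents before deleting
      - PruneLast=true                      # prune only after other resources are healthy
---
# Hypothetical shared resource that a sync should never prune
apiVersion: v1
kind: ConfigMap
metadata:
  name: shared-logging-config
  annotations:
    argocd.argoproj.io/sync-options: Prune=false
data: {}
```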

Symptom · 04

Webhook not triggering sync — have to click 'Sync' manually

→

Fix

Check the GitHub/GitLab webhook delivery logs. Verify that the argocd-server service is reachable from the internet (or use a webhook proxy like Smee). Ensure the webhook secret in argocd-secret matches the one configured on the Git host. If the webhook is broken, push manually with argocd app sync <app> --force --prune.
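For GitHub, the shared secret lives under a specific key in argocd-secret. The fragment below is a sketch with a placeholder value; argocd-secret normally holds other keys too, so merge this key in rather than replacing the Secret.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: argocd-secret
  namespace: argocd
type: Opaque
stringData:
  # Must equal the secret configured on the GitHub webhook (placeholder value)
  webhook.github.secret: "<shared-secret>"
```

GitLab uses the webhook.gitlab.secret key instead, and in both cases the webhook should post to the /api/webhook path on argocd-server.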

Symptom · 05

Permissions error: 'forbidden: User "system:serviceaccount:argocd:argocd-application-controller" cannot get resource'

→

Fix

ArgoCD's service account lacks RBAC. Add the missing rules to the ClusterRole bound to argocd-application-controller. For cross-cluster deployments, check the cluster credentials, which live in Secrets labelled argocd.argoproj.io/secret-type: cluster rather than in argocd-cm. Use argocd cluster add <context> to generate the correct RBAC on the target cluster.
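ArgoCD stores each external cluster as a labelled Secret, which is roughly what argocd cluster add creates for you. The declarative form looks like the sketch below; the names, server URL, token, and CA data are placeholders.

```yaml
# Sketch only: every value below is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: staging-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: staging
  server: https://staging.example.com
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-certificate>"
      }
    }
```

The bearer token must belong to a service account on the target cluster with enough RBAC to manage the resources the Application deploys.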

★ ArgoCD Quick Debug Cheat Sheet · Fast diagnostics for production GitOps issues. Run these before changing any Git manifests.

OutOfSync: resources not matching Git

Immediate action

Check diff to see what ArgoCD thinks is different

Commands

argocd app get <app> --show-operation --refresh

argocd app diff <app> --revision HEAD

Fix now

Update Git manifest to match live state or force live state to match Git using argocd app sync <app> --force.


ArgoCD Sync Strategies Compared

Strategy	Sync Trigger	Auto-Heal	Risk	Best For
Manual	User clicks 'Sync' in UI or CLI	No (drift detected but not corrected)	Human delay — config drift accumulates	Production with change approval process
Automated with selfHeal: false	Webhook or scheduled poll (3 min)	No (drift detected, requires manual fix)	Drift can persist until manual sync. Good for audit trails.	Staging environments, highly regulated industries
Automated with selfHeal: true	Webhook or poll	Yes (any drift reverted)	Emergency overrides impossible without pausing sync	Production where Git is absolute truth
ApplicationSet + PR Generator	GitHub/GitLab webhook on PR	No (PR apps are ephemeral)	Old apps may linger if not pruned	Preview environments per pull request
Image Updater (auto-update image tags)	Image registry webhook	Yes (drift on image tag changed manually)	Unpinned tags (latest) roll out untested images	Nightly builds, canary environments

⚙ Quick Reference

5 commands from this guide

File	Command / Code	Purpose
iothecodeforgeargocdapplication.yaml	apiVersion: argoproj.io/v1alpha1	The Reconciliation Loop
iothecodeforgeargocdsync-wave-hooks.yaml	apiVersion: apiextensions.k8s.io/v1	Sync Waves and Hooks
iothecodeforgeargocdargocd-project.yaml	apiVersion: argoproj.io/v1alpha1	Multi-Cluster and Multi-Tenant Patterns
argocd-cm.yaml	apiVersion: v1	GitOps vs Traditional CI/CD
argocd-helm-app.yaml	apiVersion: argoproj.io/v1alpha1	Managing Deployments with Helm and Kustomize

Key takeaways

ArgoCD's reconciliation loop is the core of GitOps: it continuously enforces Git as the single source of truth, either by manual approval or automatic sync.

Sync waves order resource creation (CRDs first, then resources). PreSync and PostSync hooks run migrations and validation before and after critical stateful changes are synced.

Projects with RBAC and sourceRepos restrictions prevent cluster-wide accidents and multi-tenant chaos. Always define an AppProject before creating Applications.

Auto-heal is powerful but can fight operators during incidents. Disable it temporarily with kubectl patch when you need emergency overrides.

Never put plaintext secrets in Git. Use SealedSecrets or SOPS with age/KMS to encrypt, commit encrypted manifests, and decrypt in the repo server or cluster.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain the difference between `selfHeal` and `prune` in ArgoCD sync pol...

Q02 · SENIOR

How does ArgoCD handle secrets? Walk me through a secure pattern for dat...

Q03 · SENIOR

What is the difference between an Application and an ApplicationSet? Whe...

Q04 · SENIOR

What happens when you delete an ArgoCD Application but forget to set the...

Q01 of 04 · SENIOR

Explain the difference between `selfHeal` and `prune` in ArgoCD sync policy — and why you might disable selfHeal in production.

ANSWER

prune controls whether ArgoCD deletes resources that exist in the cluster but are not present in Git. If disabled, removing a Deployment from Git leaves it running in the cluster (resource leak). selfHeal controls whether ArgoCD reverts manual changes made directly to the cluster (e.g., kubectl edit deployment). If enabled, any drift detected in the next reconciliation loop is overwritten with the Git state. In production, you might disable selfHeal temporarily during incident response: if a pod is crashing and you need to scale down manually to stop the crash loop, selfHeal would immediately revert the scale down. Instead, you disable selfHeal, make your operational changes, investigate, then re-enable it. For long-running production, selfHeal is usually enabled to enforce Git as the absolute source of truth — but you need a documented process to suspend it during emergencies.


Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.


Last updated: July 18, 2026
