Senior 6 min · March 06, 2026

Cloud Run — Health Check Triggered Cascading 503 Failures

At 500+ RPS, a Cloud Run health check that runs a full DB query triggers cascading 503 failures.

Naren · Founder
Quick Answer
  • Cloud Run runs any container that listens on HTTP and is fully managed by Google
  • Scales to zero when idle – you pay only for request processing time per 100ms
  • Key components: container image, service account, Secret Manager, VPC connector
  • Cold start adds 200ms–2s on first request; fix with --min-instances 1
  • Production break: hardcoding the port causes health check failures silently
  • Biggest mistake: using the default Compute Engine service account gives blanket editor access
Plain-English First

Imagine you run a lemonade stand, but instead of renting a shop 24/7, you only pay for the exact minutes customers are buying lemonade — the stand appears instantly when someone walks up and vanishes when they leave. Google Cloud Run works exactly like that for your code: you package your app into a container, hand it to Google, and they handle spinning it up when requests arrive and shutting it down when things go quiet. You never touch a server, never patch an OS, and never pay for idle time.

Every developer eventually hits the same wall: the app works perfectly on your laptop, but getting it into production means provisioning servers, configuring load balancers, managing autoscaling groups, and babysitting infrastructure at 2am. For teams that just want their code to run reliably at scale, that overhead is expensive, slow, and frankly soul-crushing. Cloud Run was built to eliminate exactly that gap between 'it works on my machine' and 'it's live in production'.

Cloud Run is Google's fully managed serverless container platform. Unlike AWS Lambda, which forces you into specific runtimes and tiny deployment packages, Cloud Run runs any container that listens on a port and responds to HTTP. That one constraint — your app must be stateless and HTTP-driven — unlocks everything else: automatic scaling from zero to thousands of concurrent requests, per-100ms billing, global deployment, and zero server management. It bridges the gap between rigid Function-as-a-Service platforms and the full complexity of Kubernetes.

By the end of this article you'll understand exactly how Cloud Run's request-driven execution model works, how to containerize a real Node.js API and deploy it with a single command, how to wire up environment variables and secrets the right way, and how to avoid the three mistakes that burn developers on their first production deployment. You'll also walk away knowing how to answer the Cloud Run questions that actually come up in DevOps and platform engineering interviews.

How Cloud Run's Request-Driven Model Actually Works

Before writing a single line of code, you need to understand Cloud Run's mental model — because it changes how you architect your app.

When no requests are hitting your service, Cloud Run scales it to zero. There are literally no running instances. The moment a request arrives, Cloud Run starts a container instance, routes the request to it, and keeps that instance alive to handle more requests for a short idle period. If traffic spikes, it starts more instances in parallel. This is called request-driven scaling, and it's the core reason Cloud Run is cheap for low-traffic services and effortless to scale for high-traffic ones.

The critical implication: your container must be stateless. Don't store session data in memory between requests, don't write to local disk expecting it to persist, and don't open background threads that do work outside of a request lifecycle. Any state must live in an external system — Cloud SQL, Firestore, Redis, or Cloud Storage.

Cold starts are the one real trade-off. When Cloud Run starts a fresh instance, there's a brief delay (typically 200ms–2s depending on your image size and runtime) before it can serve traffic. For latency-sensitive APIs, you can set a minimum instance count of 1 to keep a warm instance always running — at the cost of paying for that idle time. For batch jobs or internal tools, cold starts usually don't matter at all.

cloud_run_deploy.shBASH
#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Build a Docker image, push it to Artifact Registry,
#       and deploy it as a Cloud Run service in one script.
#
# PREREQUISITES:
#   - gcloud CLI installed and authenticated
#   - Docker installed and running
#   - A GCP project with billing enabled
# ─────────────────────────────────────────────────────────────

# Replace these with your actual values
GCP_PROJECT_ID="my-production-project"
GCP_REGION="us-central1"
SERVICE_NAME="product-api"
IMAGE_NAME="us-central1-docker.pkg.dev/${GCP_PROJECT_ID}/cloud-run-services/${SERVICE_NAME}"

# Step 1: Authenticate Docker to push to Artifact Registry
# gcloud configures Docker credentials automatically — no manual login needed
gcloud auth configure-docker us-central1-docker.pkg.dev --quiet

# Step 2: Build the Docker image and tag it for Artifact Registry
# The tag format must match your Artifact Registry repository path exactly
docker build \
  --tag "${IMAGE_NAME}:latest" \
  --platform linux/amd64 \
  . # <── build context is the current directory (where Dockerfile lives)

# Step 3: Push the image to Artifact Registry
# Cloud Run pulls from here at deploy time — it never touches Docker Hub by default
docker push "${IMAGE_NAME}:latest"

# Step 4: Deploy to Cloud Run
gcloud run deploy "${SERVICE_NAME}" \
  --image "${IMAGE_NAME}:latest" \
  --platform managed \
  --region "${GCP_REGION}" \
  --allow-unauthenticated \
  --port 8080 \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 0 \
  --max-instances 100 \
  --concurrency 80
  # --allow-unauthenticated: makes the service publicly accessible
  # --concurrency 80: each instance handles up to 80 simultaneous requests
  # --min-instances 0: scale to zero when idle (cheapest option)
  # --max-instances 100: hard cap to prevent runaway billing
Output
Configuring Docker credentials... Done.
Building Docker image for linux/amd64...
Step 1/8 : FROM node:20-alpine
Step 2/8 : WORKDIR /app
...
Successfully built a3f8c91d2b44
Successfully tagged us-central1-docker.pkg.dev/my-production-project/cloud-run-services/product-api:latest
Pushing image to Artifact Registry...
latest: digest: sha256:7d3e... size: 1847
Deploying container to Cloud Run service [product-api] in [us-central1]
OK Deploying new service... Done.
OK Creating Revision... Revision product-api-00001-abc is active and serving 100% of traffic.
Service URL: https://product-api-abc123-uc.a.run.app
Watch Out: Platform Mismatch
Always build with --platform linux/amd64. Cloud Run expects executables built for the Linux x86_64 ABI. If you're on an Apple Silicon Mac and skip this flag, Docker builds an arm64 image by default, and the revision typically fails with 'Container failed to start' or an exec format error in the logs, neither of which points clearly at the architecture mismatch.
Production Insight
A production API that stores session in memory will lose all user sessions on every scale-down.
Solution: use Firestore or Redis for session state.
Cold start latency of 500ms caused a checkout timeout for a retail client — we added --min-instances 2 and cut P95 latency by 80%.
Key Takeaway
Cloud Run is stateless by design.
Store state externally, not in local memory.
Cold starts are real — test for your use case before blaming the platform.

Building a Real Containerized API That Cloud Run Will Love

Cloud Run's only requirement is that your container listens on the port defined by the PORT environment variable. That's it. Cloud Run injects PORT at runtime — you don't hardcode it. This one detail trips up a lot of developers who hardcode 3000 or 8080 in their app and then wonder why health checks fail.
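As a minimal sketch of that rule, the helper below resolves the port from the environment and falls back to 8080 only for local development. The function name resolvePort and the validation range are illustrative assumptions, not part of any Cloud Run SDK.

```javascript
// Resolve the listening port from the environment.
// Cloud Run sets PORT for the container; 8080 is only a local-development fallback.
function resolvePort(env = process.env, fallback = 8080) {
  const parsed = Number.parseInt(env.PORT ?? "", 10);
  // Reject NaN and out-of-range values rather than binding to something unexpected.
  return Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : fallback;
}

console.log(`would listen on port ${resolvePort()}`);
```

Passing the environment in as a parameter keeps the function easy to test without mutating process.env.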

The running example is a lightweight Node.js product API (the src/server.js entrypoint that the Dockerfile below starts). It needs to do three things: read PORT from the environment, expose a health check endpoint (which you can point Cloud Run's startup probe at to confirm the container started successfully), and shut down cleanly on SIGTERM. Cloud Run sends SIGTERM before killing an instance during scale-down, giving you a chance to finish in-flight requests.
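The article does not show src/server.js itself, so here is one possible shape for it, offered as a sketch rather than the author's actual file. The handler body, the /health path, and the function names handleRequest, installGracefulShutdown, and start are assumptions made for illustration; only the node:http calls are real APIs.

```javascript
const http = require("node:http");

// Hypothetical request handler: /health answers cheaply, everything else is a stub.
function handleRequest(req, res) {
  if (req.url === "/health") {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
    return;
  }
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ products: [] }));
}

// On SIGTERM: stop accepting new connections, let in-flight requests finish, then exit.
function installGracefulShutdown(server, exit = process.exit) {
  process.once("SIGTERM", () => {
    server.close(() => exit(0));
  });
}

// Wire everything together. Call start() from your entrypoint.
function start() {
  const port = Number.parseInt(process.env.PORT ?? "8080", 10);
  const server = http.createServer(handleRequest);
  installGracefulShutdown(server);
  server.listen(port, () => console.log(`listening on ${port}`));
  return server;
}
```

The file defines start() without calling it so the pieces can be imported and tested in isolation; a real entrypoint would end with a call to start().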

The Dockerfile matters as much as the code. Keep your image small — every extra MB adds cold start latency. Use multi-stage builds to separate build dependencies from the runtime image, use Alpine-based base images, and always run as a non-root user. Cloud Run doesn't require root, and running as root is a security smell that will get flagged in any serious security audit.

DockerfileDOCKERFILE
# ── Stage 1: Install dependencies ──────────────────────────────
# We use a full Node image here because we need npm to install packages
FROM node:20-alpine AS dependency-installer
WORKDIR /build

# Copy package files first — Docker layer caching means if these
# haven't changed, npm install is skipped entirely on rebuild
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# npm ci is stricter than npm install — it respects package-lock.json exactly
# --omit=dev drops devDependencies so they don't bloat the final image

# ── Stage 2: Runtime image ──────────────────────────────────────
# Start fresh from a minimal Alpine image — no build tools, no npm cache
FROM node:20-alpine AS runtime
WORKDIR /app

# Create a non-root user — Cloud Run doesn't need root and neither does your app
RUN addgroup --system api-group && adduser --system --ingroup api-group api-user

# Copy only what we need: the installed node_modules and the app source
COPY --from=dependency-installer /build/node_modules ./node_modules
COPY src/ ./src/
COPY package.json ./

# Switch to non-root user before the app starts
USER api-user

# Cloud Run injects PORT at runtime — we expose 8080 as a documentation hint
# but the app itself must read process.env.PORT, not hardcode this number
EXPOSE 8080

CMD ["node", "src/server.js"]
Output
# When you run: docker build --platform linux/amd64 -t product-api .
[+] Building 14.3s (12/12) FINISHED
=> [dependency-installer 1/4] FROM node:20-alpine 3.1s
=> [dependency-installer 3/4] COPY package.json ... 0.1s
=> [dependency-installer 4/4] RUN npm ci --omit=dev 8.4s
=> [runtime 3/5] COPY --from=dependency-installer ... 0.2s
=> [runtime 4/5] COPY src/ ./src/ 0.1s
=> exporting to image 0.4s
=> naming to docker.io/library/product-api:latest 0.0s
# Final image size: 98MB (vs ~1.1GB if you used a non-Alpine full Node image)
Pro Tip: Read PORT from the Environment
In your Node.js app, always start your server with: const port = parseInt(process.env.PORT, 10) || 8080; Cloud Run injects PORT automatically, and it defaults to 8080 unless the service is deployed with a different --port value. If you hardcode 3000 or 8080, your app may still work, but only by coincidence. The day someone changes the service's port setting, you'll have a production failure that is hard to trace.
Production Insight
A team pushed a 1.2GB image with full dev tools — cold starts took 8 seconds.
Multi-stage builds cut it to 98MB, cold starts dropped to 400ms.
Use docker image ls to spot bloat. Always run as non-root: security audits routinely flag containers that run as root.
Key Takeaway
The PORT env var is not optional — your app must read it.
Multi-stage builds keep the image lean: in the case above, 1.2GB shrank to 98MB and cold starts dropped from 8 seconds to 400ms.
SIGTERM handling prevents 502 errors during scale-down.

Wiring Up Secrets, Environment Variables, and Service Accounts Correctly

This is where most tutorials stop, and where real production deployments begin. Your app almost certainly needs secrets — database passwords, API keys, JWT signing keys. Hardcoding them into environment variables in your Cloud Run service definition means they show up in plaintext in your deployment history and in anyone's gcloud run describe output. That's a compliance and security problem.

The right pattern is to store secrets in Google Secret Manager and grant your Cloud Run service's service account permission to read them. Cloud Run can then mount secrets as environment variables or as files at startup — they're injected at runtime, never baked into the image or visible in the service config.
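On the application side, reading a secret that arrives either way might look like the sketch below. The function name readSecret and the idea of passing a mount path are assumptions for illustration; the actual path depends on how you configure the secret volume.

```javascript
const fs = require("node:fs");

// Read a secret that was exposed either as an environment variable
// or as a mounted file. filePath is a hypothetical mount location.
function readSecret(name, { env = process.env, filePath = null } = {}) {
  if (env[name]) return env[name];
  if (filePath && fs.existsSync(filePath)) {
    // Strip a single trailing newline that some tools add when writing the value.
    return fs.readFileSync(filePath, "utf8").replace(/\r?\n$/, "");
  }
  // Fail fast at startup instead of connecting with an empty password.
  throw new Error(`secret ${name} is not configured`);
}
```

Throwing when the secret is absent means a misconfigured revision fails at startup, where the error is visible, rather than on the first database call.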

Service accounts are equally important. By default, Cloud Run uses the Compute Engine default service account, which has editor-level access to your entire project. That's far too permissive. Create a dedicated service account for each Cloud Run service, grant it only the specific IAM roles it needs (like roles/secretmanager.secretAccessor and roles/cloudsql.client), and attach it at deploy time. This is the principle of least privilege, and it's not optional in production.

setup_secrets_and_iam.shBASH
#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Create a dedicated service account for the product-api,
#       store secrets in Secret Manager, and deploy Cloud Run
#       with proper IAM bindings — no plaintext secrets anywhere.
# ─────────────────────────────────────────────────────────────

GCP_PROJECT_ID="my-production-project"
GCP_REGION="us-central1"
SERVICE_NAME="product-api"
SERVICE_ACCOUNT_NAME="product-api-runner"
SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT_NAME}@${GCP_PROJECT_ID}.iam.gserviceaccount.com"
IMAGE_NAME="us-central1-docker.pkg.dev/${GCP_PROJECT_ID}/cloud-run-services/${SERVICE_NAME}:latest"

# Step 1: Create a dedicated service account for this Cloud Run service
# Never use the default Compute SA — it has far too many permissions
gcloud iam service-accounts create "${SERVICE_ACCOUNT_NAME}" \
  --display-name "Service Account for product-api Cloud Run service" \
  --project "${GCP_PROJECT_ID}"

# Step 2: Store the database password in Secret Manager
# Prompt for the value so the literal password never appears in shell history
read -r -s -p "Database password: " DB_PASSWORD_VALUE && echo
printf '%s' "${DB_PASSWORD_VALUE}" | \
  gcloud secrets create DB_PASSWORD \
    --data-file=- \
    --replication-policy="automatic" \
    --project "${GCP_PROJECT_ID}"
unset DB_PASSWORD_VALUE
# printf '%s' writes no trailing newline, which matters for passwords

# Step 3: Grant the service account permission to READ this specific secret
# Note: secretAccessor only allows reading — not creating or deleting secrets
gcloud secrets add-iam-policy-binding DB_PASSWORD \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/secretmanager.secretAccessor" \
  --project "${GCP_PROJECT_ID}"

# Step 4: Deploy Cloud Run with the dedicated service account
# and mount the secret as an environment variable at runtime.
# --update-secrets format: ENV_VAR_NAME=SECRET_NAME:VERSION
# Cloud Run fetches the secret at instance startup and injects it as an env var;
# your app reads it with process.env.DB_PASSWORD, same as any env var.
gcloud run deploy "${SERVICE_NAME}" \
  --image "${IMAGE_NAME}" \
  --platform managed \
  --region "${GCP_REGION}" \
  --service-account "${SERVICE_ACCOUNT_EMAIL}" \
  --update-secrets="DB_PASSWORD=DB_PASSWORD:latest" \
  --set-env-vars="NODE_ENV=production,DB_HOST=10.0.0.5,DB_NAME=products_db" \
  --allow-unauthenticated

# Step 5: Verify the deployment and check the service account is correct
gcloud run services describe "${SERVICE_NAME}" \
  --region "${GCP_REGION}" \
  --format="value(spec.template.spec.serviceAccountName)"
Output
Created service account [product-api-runner].
Created version [1] of the secret [DB_PASSWORD].
Updated IAM policy for secret [DB_PASSWORD].
bindings:
- members:
- serviceAccount:product-api-runner@my-production-project.iam.gserviceaccount.com
role: roles/secretmanager.secretAccessor
Deploying container to Cloud Run service [product-api]...
OK Deploying... Done.
OK Creating Revision... Revision product-api-00002-xyz is active.
Service URL: https://product-api-abc123-uc.a.run.app
# Output of the describe command:
product-api-runner@my-production-project.iam.gserviceaccount.com
Interview Gold: Secrets vs Environment Variables
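Interviewers love asking how you'd handle secrets in Cloud Run. The wrong answer is --set-env-vars=DB_PASSWORD=mypassword. The right answer is Secret Manager + --update-secrets, because secrets are encrypted at rest, access is audited in Cloud Audit Logs, and rotation happens in Secret Manager rather than in your deploy config: you add a new secret version. One caveat worth mentioning: a secret exposed as an environment variable is resolved when an instance starts, so running instances keep the old value until they are replaced.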
Production Insight
We once audited a service that used the default Compute SA — it could delete Cloud SQL instances. One compromised container could have taken down the entire database.
Least privilege: create one SA per service.
Use gcloud iam service-accounts list to review.
Key Takeaway
Never use the default service account.
Secrets go in Secret Manager, never in plaintext --set-env-vars values.
One service account per service: hard boundary against blast radius.

Deploying and Monitoring: From First Deploy to Production Observability

Deploying your container is just the beginning. Once it's live, you need to monitor health, latency, and errors. Cloud Run integrates directly with Cloud Monitoring and Logging, but you need to know what to look for.

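First, set up health check endpoints. Keep the endpoint that Cloud Run probes cheap: it should confirm the process is up and listening without calling the database. Validate critical dependencies (database connectivity, cache status, external API reachability) in a separate readiness-style endpoint that is polled less often. Cloud Run uses the startup probe to decide when a new instance can receive traffic, so an instance that fails it never serves requests.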

Second, enable Cloud Logging and set log-based alerts. The most common production issues are 5xx errors, latency spikes, and concurrency limit hits. Cloud Run logs every request with status code, latency, and instance id. You can create metrics from these logs.
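To make application logs as filterable as the request logs, one common approach is to write one JSON object per line to stdout. The sketch below assumes Cloud Logging's handling of the severity and message fields in structured output; the helper name logLine and the extra fields (route, latencyMs) are hypothetical.

```javascript
// Build one structured log line. `severity` and `message` are the fields
// Cloud Logging recognizes in JSON written to stdout; the rest are examples.
function logLine(severity, message, fields = {}) {
  return JSON.stringify({ severity, message, ...fields });
}

console.log(logLine("ERROR", "database query failed", { route: "/products", latencyMs: 1234 }));
```

With entries shaped like this, a log-based metric can filter on severity instead of pattern-matching free text.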

Third, understand concurrency. By default, each instance handles up to 80 concurrent requests. If your app is I/O-bound (calling a database or external API), you can raise concurrency to 250 or more. If it's CPU-bound, lower it. Monitor the container instance count metric: if it constantly sits at the max-instances limit, either raise max-instances or, for I/O-bound services with CPU headroom, raise concurrency so each instance absorbs more requests.
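A rough way to reason about these numbers is Little's law: requests in flight equal arrival rate times average latency, and dividing by the per-instance concurrency gives a lower bound on instances. The sketch below is a back-of-the-envelope estimate under that assumption, not how Cloud Run's autoscaler actually decides; the function name and the sample figures are illustrative.

```javascript
// Estimate the minimum instances needed to hold the in-flight requests.
// inFlight = rps * latency (Little's law); latency is given in milliseconds.
function estimateInstances(rps, avgLatencyMs, concurrency) {
  const inFlight = (rps * avgLatencyMs) / 1000;
  return Math.max(1, Math.ceil(inFlight / concurrency));
}

// 500 RPS at 200 ms average latency means about 100 requests in flight.
console.log(estimateInstances(500, 200, 80)); // 2
console.log(estimateInstances(500, 200, 10)); // 10
```

The same traffic needs five times as many instances at concurrency 10 as at 80, which is why lowering concurrency without raising max-instances can push a service into its instance cap.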

Finally, set up notifications for revision failures and cost anomalies. Cloud Run bills per 100ms, so a runaway instance can cause unexpected bills. Set a budget alert in Google Cloud Billing.

monitoring_setup.shBASH
#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Create a log-based metric for 5xx errors, set up a
#       Cloud Monitoring alert, and view recent logs.
# ─────────────────────────────────────────────────────────────

# Create a log-based counter metric for HTTP 5xx responses
gcloud logging metrics create product-api-5xx \
  --description "Count of 5xx responses for product-api" \
  --log-filter='resource.type="cloud_run_revision" AND resource.labels.service_name="product-api" AND httpRequest.status >= 500'

# Create an alert policy (simplified — actually done via Monitoring UI or Terraform)
# This command just outlines the concept
# gcloud alpha monitoring policies create ...

# View recent 5xx logs directly
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="product-api" AND httpRequest.status >= 500' \
  --limit 10 \
  --format="table(timestamp, httpRequest.status, httpRequest.latency)"
Output
Created metric [product-api-5xx].
# Sample log output
2025-12-01T12:34:56Z 500 1.234s
2025-12-01T12:35:10Z 502 2.001s
2025-12-01T12:35:15Z 500 0.890s
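Pro Tip: Know When to Disable CPU Throttling
By default, Cloud Run throttles CPU when an instance is not handling a request. If your service does work outside of requests (background tasks, flushing buffers, keeping connection pools warm), disable throttling with --no-cpu-throttling so that work always has CPU available. The trade-off is cost: with CPU always allocated, you are billed for the instance's full lifetime, not just request time.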
Production Insight
We had no log-based alert for 5xx errors. A database migration caused a 10% error rate for 4 hours before anyone noticed. The fix was a 20-line Cloud Monitoring alert policy that now fires within 60 seconds of elevated error rates. Set alerts on day one, not after the incident.
Key Takeaway
Keep the probed health check cheap; validate dependencies in a separate readiness check.
Log-based metrics catch silent failures.
Budgets prevent billing surprises from runaway scaling.

Advanced: VPC Connectors, Custom Domains, and Traffic Splitting

Once you're comfortable with basic deployments, you'll hit the advanced scenarios: connecting to private resources (like Cloud SQL with private IP), using a custom domain with SSL, and rolling out canary revisions.

VPC Connector: To reach resources inside your VPC (e.g., a private Cloud SQL instance), you need a Serverless VPC Access connector. Create it in your VPC, then attach it to your Cloud Run service with --vpc-connector. Without it, your container can access the internet but not your private resources. This is a common cause of 'connection refused' that's hard to debug.

Custom domains: By default, you get a .run.app URL. For production, map a custom domain using gcloud beta run domain-mappings create. Cloud Run auto-provisions an SSL certificate via Google-managed certificates. The gotcha: DNS propagation can take 10–30 minutes, and the domain must be verified (you need access to manage DNS records).

Traffic splitting: You can send a percentage of traffic to a specific revision. This powers canary deployments and A/B testing. Use gcloud run services update-traffic to split e.g., 95% to stable, 5% to new-revision. Rollback is instant: set 100% back to the old revision.

All three features require the managed platform (not Cloud Run on GKE).

advanced_setup.shBASH
#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Set up VPC connector, map custom domain, split traffic.
# ─────────────────────────────────────────────────────────────

VPC_CONNECTOR_NAME="my-connector"
VPC_NETWORK="default"
REGION="us-central1"

# Step 1: Create a Serverless VPC Access connector (requires VPC Access API)
# Note: this takes about 2-3 minutes to provision
gcloud compute networks vpc-access connectors create ${VPC_CONNECTOR_NAME} \
  --region ${REGION} \
  --network ${VPC_NETWORK} \
  --range 10.8.0.0/28

# Step 2: Attach the VPC connector to the existing service
# private-ranges-only routes just RFC 1918 traffic through the connector
gcloud run services update product-api \
  --region ${REGION} \
  --vpc-connector ${VPC_CONNECTOR_NAME} \
  --vpc-egress=private-ranges-only

# Step 3: Map custom domain
# First verify ownership: create a TXT record as instructed by the command
# gcloud beta run domain-mappings create --service product-api --domain api.mycompany.com

# Step 4: Traffic splitting: 5% canary
gcloud run services update-traffic product-api \
  --region ${REGION} \
  --to-revisions=product-api-00001-wib=95,product-api-00002-xyz=5

# Rollback: send all traffic back to the old revision
gcloud run services update-traffic product-api \
  --region ${REGION} \
  --to-revisions=product-api-00001-wib=100
Output
Created VPC Access Connector [my-connector].
OK Deploying revision... Done.
Service URL: https://product-api-abc123-uc.a.run.app
# Domain mapping output (truncated)
Domain: api.mycompany.com
Status: PENDING_VERIFICATION
Please configure your DNS by adding a TXT record with the following value...
# Traffic split output
Current traffic allocation:
product-api-00001-wib: 95%
product-api-00002-xyz: 5%
OK, traffic updated.
Traffic Splitting Gotcha
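By default, a newly deployed revision takes 100% of traffic as soon as it is ready. Once you pin traffic to named revisions with --to-revisions, later deployments start at 0% and you must shift traffic to them explicitly. To stage a canary safely, deploy with the --no-traffic flag so the new revision is created without receiving any traffic, then move a small percentage to it.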
Production Insight
A team tried to reach Cloud SQL via its private IP but forgot the VPC connector. The app timed out on every database call. After an hour of debugging, they added the connector and it worked instantly. Also: custom domain mapping failed because the TXT record was typo'd — always double-check DNS values.
Key Takeaway
VPC connector is mandatory for private resource access.
Custom domains need DNS verification – allow 30 min propagation.
Traffic splitting lets you canary with near-instant rollback.
Production incident · Post-mortem · Severity: high

The Silent 503 Spike: How a Heavy Health Check Took Down a Cloud Run Service

Symptom
Service returned 503 errors during peak traffic (500+ RPS). Errors vanished after scaling down instance count. No database or infrastructure alerts.
Assumption
Team assumed the 503s were due to database connection pool exhaustion or Cloud SQL CPU limits. They scaled up connection pools and added more CPU — no improvement.
Root cause
The health check endpoint was performing a full database query on every call. During high concurrency, many instances were started and each one immediately ran a health check. The database was overwhelmed not by application traffic but by health check queries, causing new instances to fail their health checks and be killed before serving traffic. Cloud Run then started even more instances, creating a cascading failure.
Fix
Changed the health check endpoint to a lightweight 'alive' check that only confirms the process is running (e.g., return 200 without touching the database). Moved database connectivity checks to a separate /ready endpoint with a longer interval. Enabled startup CPU boost (--cpu-boost) to give instances extra CPU during cold start.
Key lesson
  • Health check endpoints must be cheap and independent of backend dependencies.
  • Separate liveness (is process running?) from readiness (can we serve traffic?).
  • Use Cloud Logging to correlate 503 spikes with health check latency.
  • Enable startup CPU boost (--cpu-boost) for cold-start heavy operations.
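The liveness/readiness split from this incident can be sketched as follows. This is one possible implementation under stated assumptions, not the team's actual fix: the names liveness and createReadiness, the 10-second cache window, and the injected checkDependency function (standing in for a real database ping) are all hypothetical.

```javascript
// Liveness: answers from memory only, never touches the database.
function liveness() {
  return { status: 200, body: "alive" };
}

// Readiness: runs the (possibly expensive) dependency check at most once per
// ttlMs and shares the result in between, so probes cannot stampede the database.
function createReadiness(checkDependency, ttlMs = 10_000, now = Date.now) {
  let pending = null;
  let checkedAt = -Infinity;
  return function readiness() {
    if (now() - checkedAt >= ttlMs) {
      checkedAt = now();
      // Cache the promise itself so concurrent callers share one check.
      pending = (async () => {
        try {
          await checkDependency();
          return { status: 200, body: "ready" };
        } catch {
          return { status: 503, body: "not ready" };
        }
      })();
    }
    return pending;
  };
}
```

Caching the promise rather than the resolved value matters during a scale-out: many simultaneous probes of one instance trigger a single dependency check instead of one query each.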
Production debug guide
Symptom → Action guide for production issues · 5 entries
Symptom · 01
Container fails to start – health check fails immediately
Fix
Check logs: gcloud logging read 'resource.labels.service_name="YOUR_SERVICE" AND severity>=ERROR' --limit 10. Verify PORT env var is read correctly in your app. Run the container locally with docker run -e PORT=8080 -p 8080:8080 your-image and test the health endpoint.
Symptom · 02
Intermittent 502/503 errors under load
Fix
Check concurrency: gcloud run services describe YOUR_SERVICE --format='value(spec.template.spec.containerConcurrency)'. Reduce concurrency if CPU-bound. Check the database connection pool: total connections scale with pool size times instance count, so a scale-out can exhaust the database's connection limit. Look for health check cascading (see incident above).
Symptom · 03
Service unreachable via custom domain
Fix
Verify DNS: dig YOUR_DOMAIN CNAME should point to ghs.googlehosted.com. Check domain mapping status: gcloud beta run domain-mappings list. Ensure you own the domain and the TXT verification record is published.
Symptom · 04
Slow requests on first call (cold start)
Fix
Set --min-instances 1 or higher for latency-sensitive services. Reduce image size: use multi-stage builds and Alpine base. Enable startup CPU boost: gcloud run deploy --cpu-boost. Monitor cold start time in Cloud Monitoring.
Symptom · 05
Cannot connect to Cloud SQL or other VPC resources
Fix
Check if a VPC connector exists: gcloud compute networks vpc-access connectors list. Verify the connector is in the same region as your service. Check firewall rules: allow ingress from the connector's IP range (10.8.0.0/28) to your resource. Test connectivity from a Cloud Run job with same connector.
★ Quick Debug Cheat Sheet for Cloud Run
Top 5 symptoms with immediate commands and fixes
Service fails to deploy with 'Container failed to start'
Immediate action
Check logs immediately
Commands
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="YOUR_SERVICE" AND severity>=ERROR' --limit 5 --format='value(textPayload)'
docker run -d -e PORT=8080 -p 8080:8080 YOUR_IMAGE && sleep 2 && curl http://localhost:8080/health
Fix now
Ensure your app binds to the PORT env var. Add health check endpoint that returns 200 within 10 seconds.
High 5xx error rate during traffic spikes
Immediate action
Check concurrency and cold start settings
Commands
gcloud run services describe YOUR_SERVICE --format='value(spec.template.spec.containerConcurrency)'
gcloud run services describe YOUR_SERVICE --format='value(spec.template.metadata.annotations.autoscaling\.knative\.dev/maxScale)'
Fix now
Lower concurrency to 10–20 or add --min-instances 1. Ensure health check is light.
Private resource unreachable (e.g., Cloud SQL)
Immediate action
Verify VPC connector exists and is attached
Commands
gcloud compute networks vpc-access connectors list --region YOUR_REGION
gcloud run services describe YOUR_SERVICE --format='value(spec.template.metadata.annotations.run\.googleapis\.com/vpc-access-connector)'
Fix now
If missing: create a VPC connector and attach it to the service: gcloud run services update YOUR_SERVICE --vpc-connector CONNECTOR_NAME
Custom domain not loading, shows 'site cannot be reached'
Immediate action
Check DNS and mapping status
Commands
dig YOUR_DOMAIN CNAME +short
gcloud beta run domain-mappings describe --domain YOUR_DOMAIN --region YOUR_REGION
Fix now
Ensure CNAME points to ghs.googlehosted.com. Complete domain ownership verification via TXT record.
Cost spikes – unexpected billing increase
Immediate action
Check instance count and max-instances
Commands
gcloud run services describe YOUR_SERVICE --format='value(spec.template.metadata.annotations.autoscaling\.knative\.dev/maxScale)'
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="YOUR_SERVICE"' --freshness 1h --limit 1000 --format='value(labels.instanceId)' | sort -u | wc -l
Fix now
Reduce max-instances (e.g., 10). Add budget alerts in Google Billing. Review logs for repeated 429s causing instance over-provisioning.
Cloud Run vs Cloud Functions (Gen 2)
Feature / Aspect | Google Cloud Run | Google Cloud Functions (Gen 2)
Deployment unit | Any Docker container | Source code in supported runtime
Runtime flexibility | Any language, any version | Limited to supported runtimes (Node, Python, Go, Java, etc.)
Max request timeout | 60 minutes | 60 minutes (Gen 2)
Cold start control | Min instances setting | Min instances setting
Concurrency per instance | Up to 1000 simultaneous requests | 1 request per instance (default)
Binary/system dependencies | Yes: install anything in Dockerfile | Very limited
Stateful background tasks | Not supported (request-scoped) | Not supported
Best for | APIs, web apps, microservices, ML inference | Event-driven triggers (Pub/Sub, Storage events)
Billing granularity | Per 100ms of CPU+memory usage | Per 100ms of CPU+memory usage
VPC connectivity | Yes, via Serverless VPC Access connector | Yes, via Serverless VPC Access connector

Key takeaways

1. Cloud Run runs any container that listens on an HTTP port, not just specific runtimes, which makes it dramatically more flexible than traditional FaaS platforms.
2. Your app must read the port from process.env.PORT (or the equivalent in your language), not hardcode it. Cloud Run injects this at runtime, and the container will fail its startup check if it listens on the wrong port.
3. Secrets belong in Google Secret Manager mounted via --update-secrets, not in --set-env-vars. The difference is encryption at rest, audit logging, and rotation through new secret versions instead of edits to your deploy config.
4. Cold starts are real but controllable: use --min-instances 1 for latency-sensitive services, keep your Docker image small with multi-stage builds, and always handle SIGTERM for graceful shutdown.
5. Health check endpoints must be cheap and independent: a heavy health check that queries a database can cause cascading 503 failures during scaling events.

Common mistakes to avoid

Hardcoding the PORT number in the app

Symptom
Cloud Run health checks fail immediately after deploy with 'Container failed to start' even though the container runs fine locally
Fix
Always bind your server to parseInt(process.env.PORT, 10) || 8080. Cloud Run injects PORT at runtime; it defaults to 8080 but follows the service's configured container port, so the value is not guaranteed to be 8080.

Using the default Compute Engine service account

Symptom
The service works, but your Cloud Run service has editor-level access to your entire GCP project, meaning a compromised container can read, write, or delete anything
Fix
Create a dedicated service account per service with gcloud iam service-accounts create, and pass it via --service-account at deploy time. Grant only the specific roles that service actually needs.

Not handling SIGTERM for graceful shutdown

Symptom
During scale-down or redeployment, in-flight requests are abruptly cut off, causing 502 errors for users
Fix
Listen for the SIGTERM signal and stop accepting new requests while finishing active ones. In Node.js: process.on('SIGTERM', () => { server.close(() => process.exit(0)); }); Cloud Run waits up to 10 seconds after SIGTERM before force-killing the instance.

Building image for wrong platform (arm64 on Apple Silicon)

Symptom
Deploy fails or the revision never becomes ready, with 'Container failed to start' or an exec format error in the logs, because Cloud Run expects linux/amd64 executables
Fix
Always build with --platform linux/amd64. Verify the platform in Docker Desktop settings, or set the DOCKER_DEFAULT_PLATFORM=linux/amd64 environment variable.

Setting concurrency too high for CPU-bound workloads

Symptom
Latency spikes as instances become overloaded; error rate increases under moderate traffic
Fix
Monitor CPU usage per instance. For CPU-bound apps, reduce concurrency to 10–20. For I/O-bound apps (most APIs), 80–250 is fine. Test with actual traffic pattern.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
Cloud Run scales to zero by default — what's the trade-off and when woul...
Q02SENIOR
How does Cloud Run's concurrency model differ from AWS Lambda, and why d...
Q03SENIOR
A Cloud Run service is failing health checks immediately after deploymen...
Q04SENIOR
Explain how you'd implement a canary deployment strategy using Cloud Run...
Q01 of 04SENIOR

Cloud Run scales to zero by default — what's the trade-off and when would you explicitly set --min-instances to 1 or higher?

ANSWER
The trade-off is cold start latency. When scaling from zero, the first request has to wait for a container to start (typically 200ms–2s). For latency-sensitive APIs (sub-500ms P95) or user-facing endpoints, set --min-instances to 1 or higher to keep a warm instance always available. For batch jobs, internal tools, or development environments, scale-to-zero is fine and saves money. Also consider enabling startup CPU boost (--cpu-boost) to speed up cold starts if you must keep min-instances low. The cost of min-instances is paying for idle CPU and memory 24/7, so calculate the trade-off: e.g., $0.09/hour per instance in us-central1.
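The cost side of that trade-off is simple arithmetic, sketched below. The $0.09/hour figure is the illustrative rate from the answer above, not a verified price, and the 730-hour month and the function name monthlyIdleCost are assumptions; substitute the rate for your own CPU, memory, and region.

```javascript
// Monthly cost of keeping warm instances, given an hourly rate you look up yourself.
// 730 is the average number of hours in a month (8760 / 12).
function monthlyIdleCost(minInstances, hourlyRateUsd, hoursPerMonth = 730) {
  return minInstances * hourlyRateUsd * hoursPerMonth;
}

console.log(monthlyIdleCost(1, 0.09).toFixed(2)); // "65.70"
```

Comparing that figure with the revenue or user impact of a slow first request is usually enough to decide whether a warm instance is worth it.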
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
Does Google Cloud Run support WebSockets or long-lived connections?
02
How much does Google Cloud Run actually cost for a low-traffic API?
03
What's the difference between Cloud Run (fully managed) and Cloud Run for Anthos?
04
Can I use Cloud Run for background jobs or task queues?
05
How do I set up CI/CD for Cloud Run?