Intermediate 6 min · March 06, 2026

Google Cloud Run Basics

Cloud Run — Health Check Triggered Cascading 503 Failures

Q: Does Google Cloud Run support WebSockets or long-lived connections?

Yes. Cloud Run supports WebSockets, server-sent events, and other streaming responses without any special configuration. A WebSocket connection is treated as a long-running HTTP request, so enabling HTTP/2 end-to-end is not required. Keep in mind that every connection is still bound by the request timeout, which defaults to 5 minutes and caps at 60 minutes. Truly persistent connections (like real-time gaming) still aren't a great fit, so use a service like Firebase Realtime Database or a dedicated WebSocket server on GKE instead.

Q: How much does Google Cloud Run actually cost for a low-traffic API?

For a low-traffic API that scales to zero, costs can be near zero. Cloud Run bills per 100ms of CPU and memory usage only while handling requests, plus a small per-request charge. Google also provides a generous free tier: 2 million requests/month, 360,000 GB-seconds of memory, and 180,000 vCPU-seconds free every month. A side project or internal tool that gets a few thousand requests a day will typically cost pennies or nothing.
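To check the "pennies or nothing" claim against your own traffic, here is a rough estimator that compares a month of usage with the free-tier figures quoted above. It is a sketch under simplifying assumptions. Every request is billed on its own, which is true at concurrency 1 and overestimates when instances serve requests in parallel. Each request is rounded up to the next 100 ms. The function names are hypothetical. Treat Google's current pricing pages as the authority.

```javascript
"use strict";

// Monthly free-tier allowances as quoted in the answer above.
const FREE_TIER = { requests: 2_000_000, vcpuSeconds: 180_000, gbSeconds: 360_000 };

// Worst-case estimate: each request billed separately, rounded up to 100 ms.
function estimateMonthlyUsage({ requestsPerDay, avgRequestMs, vcpu, memoryGb, days = 30 }) {
  const requests = requestsPerDay * days;
  const billedMsPerRequest = Math.ceil(avgRequestMs / 100) * 100;
  const totalBilledMs = requests * billedMsPerRequest; // integer ms avoids float drift
  return {
    requests,
    vcpuSeconds: (totalBilledMs * vcpu) / 1000,
    gbSeconds: (totalBilledMs * memoryGb) / 1000,
  };
}

function fitsInFreeTier(usage) {
  return (
    usage.requests <= FREE_TIER.requests &&
    usage.vcpuSeconds <= FREE_TIER.vcpuSeconds &&
    usage.gbSeconds <= FREE_TIER.gbSeconds
  );
}

// Example: a side project serving 5,000 requests/day at ~120 ms each.
const usage = estimateMonthlyUsage({ requestsPerDay: 5000, avgRequestMs: 120, vcpu: 1, memoryGb: 0.5 });
console.log(usage, fitsInFreeTier(usage));
```

Raising concurrency only lowers the real bill, because several requests then share the same billed instance time.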

Q: What's the difference between Cloud Run (fully managed) and Cloud Run for Anthos?

Cloud Run fully managed runs on Google's infrastructure and requires zero cluster management. You deploy a container and Google handles everything else. Cloud Run for Anthos was originally launched as Cloud Run on GKE and has since been renamed Knative serving. It runs the same Cloud Run developer experience on top of a Kubernetes cluster you control, either on GCP or on-premises. Choose fully managed unless you have a specific reason to run on your own infrastructure, like regulatory requirements or needing to co-locate with an existing GKE workload.

Q: Can I use Cloud Run for background jobs or task queues?

Yes, but with caveats. Cloud Run services can handle long-running HTTP requests of up to 60 minutes. However, they scale down when idle, so they're not suitable for long-running daemons. For background work that doesn't need a synchronous response, deploy a Cloud Run job (fully managed batch compute) instead of a service. Jobs run to completion and then shut down, so you pay only for execution time. For message-driven tasks, use Cloud Tasks or Pub/Sub to trigger Cloud Run services.
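When a Cloud Run job runs several tasks in parallel, each task learns which share of the work it owns from two environment variables that Cloud Run sets: CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT. Below is a minimal sketch of a task selecting its slice. The function name and the round-robin split are illustrative choices, not a prescribed API.

```javascript
"use strict";

// Cloud Run jobs set these per task; default to a single task for local runs.
function taskSlice(items, env = process.env) {
  const index = Number.parseInt(env.CLOUD_RUN_TASK_INDEX ?? "0", 10);
  const count = Number.parseInt(env.CLOUD_RUN_TASK_COUNT ?? "1", 10);
  if (!Number.isInteger(count) || count < 1 || !Number.isInteger(index) || index < 0 || index >= count) {
    throw new Error(`invalid task index/count: ${index}/${count}`);
  }
  // Round-robin: task i processes items i, i + count, i + 2 * count, ...
  return items.filter((_, position) => position % count === index);
}

const orders = ["o1", "o2", "o3", "o4", "o5"];
console.log(taskSlice(orders, { CLOUD_RUN_TASK_INDEX: "1", CLOUD_RUN_TASK_COUNT: "2" }));
```

Every task runs the same image and differs only in its index. A retried task therefore reprocesses the same slice, so keep the per-item work idempotent.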

Q: How do I set up CI/CD for Cloud Run?

You can use Cloud Build, GitHub Actions, or GitLab CI. The typical flow is to build your Docker image with --platform linux/amd64, push it to Artifact Registry, then deploy with gcloud run deploy. Cloud Run already rolls out each new revision without downtime, and a Cloud Build cloudbuild.yaml can automate the whole build-push-deploy pipeline. Use Cloud Deploy for more advanced delivery strategies like canary releases. For GitHub Actions, the 'google-github-actions/deploy-cloudrun' action takes the image and service name as inputs.

At 500+ RPS, a Cloud Run health check that runs a full DB query triggers cascading 503 failures.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.

✓ Production tested · Last updated July 27, 2026 · 1,713 articles, all by Naren

Before you start (⏱ 25 min)

✓ Solid grasp of DevOps fundamentals
✓ Comfortable with command-line tools
✓ Basic Linux administration knowledge


⚡Quick Answer

Cloud Run runs any container that listens on HTTP and is fully managed by Google
Scales to zero when idle – you pay only for request processing time per 100ms
Key components: container image, service account, Secret Manager, VPC connector
Cold start adds 200ms–2s on first request; fix with --min-instances 1
Production break: hardcoding the port causes health check failures silently
Biggest mistake: using the default Compute Engine service account gives blanket editor access

✦ Definition (~90s read)

What is Google Cloud Run?

Cloud Run is Google Cloud's fully managed serverless container platform, designed to run stateless HTTP-driven workloads with automatic scaling down to zero. Unlike traditional container orchestrators like Kubernetes, Cloud Run abstracts away infrastructure management entirely—you provide a container image, and it handles routing, scaling, and availability.


The key architectural insight is that Cloud Run uses a request-driven model. Each container instance can serve several requests at once (up to 80 by default, configurable through the concurrency setting), and the platform adds or removes instances as incoming traffic changes. This model is what makes health checks tricky. If a health check fails, Cloud Run stops sending traffic to that instance. When the health check endpoint itself depends on backend services that are also under load, failures spread across instances and create a feedback loop of 503s.

Health checks in Cloud Run are not optional; they are the platform's mechanism for determining instance readiness and liveness. When a health check fails, the instance is immediately removed from the serving pool, and new requests are routed to other instances—or trigger cold starts if none exist.

The cascade problem emerges when your health check endpoint performs expensive operations (like database queries or external API calls) that can time out under load, causing all instances to fail health checks simultaneously. Traditional load-balancer health checks behave differently, since a single failed check might only reduce traffic weight. In Cloud Run, once every instance is failing its probes at the same moment, nothing is left to serve traffic. A badly designed check can therefore take down your entire service.

To avoid cascading 503s, you must build health checks that are lightweight and independent of your application's critical path—think a simple in-memory status check rather than a full dependency probe. Cloud Run's request-driven model also means you need to handle startup latency gracefully, using startup probes and configuring containerConcurrency appropriately.
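Below is a minimal sketch of the "simple in-memory status check" idea for a Node.js service. The probe handler reads flags held in process memory and never touches the database. Dependency errors are recorded for diagnostics but do not fail the probe. All names here (createHealthState, markStarted, recordDbError, and so on) are hypothetical.

```javascript
"use strict";

// Health state lives in process memory, so answering a probe costs microseconds.
function createHealthState() {
  const state = { started: false, draining: false, lastDbError: null };
  return {
    markStarted() { state.started = true; },
    markDraining() { state.draining = true; },
    // Record dependency trouble for dashboards and logs; it does NOT affect the probe.
    recordDbError(err) { state.lastDbError = err ? String(err.message ?? err) : null; },
    probe() {
      const healthy = state.started && !state.draining;
      return {
        statusCode: healthy ? 200 : 503,
        body: { started: state.started, draining: state.draining, lastDbError: state.lastDbError },
      };
    },
  };
}

const health = createHealthState();
health.markStarted();
health.recordDbError(new Error("connection pool exhausted"));
console.log(health.probe().statusCode); // 200: a struggling database does not fail the probe
```

Wire probe() to whichever path your startup or liveness probe calls. Call markStarted() once initialisation finishes and markDraining() on SIGTERM. If a database outage should reject traffic, handle that in the request handlers with a fast, explicit error rather than in the probe.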

For production, you'll wire up secrets via Secret Manager, use service accounts with minimal permissions, and set up observability with Cloud Monitoring and logging. Advanced features like VPC connectors for private network access, custom domains with SSL, and traffic splitting for canary deployments give you control without sacrificing the serverless benefits.

The bottom line: Cloud Run is powerful for stateless APIs and microservices, but its health check behavior demands a different mindset than traditional container platforms—treat health checks as a separate concern, not a proxy for application health.

Plain-English First

Imagine you run a lemonade stand, but instead of renting a shop 24/7, you only pay for the exact minutes customers are buying lemonade — the stand appears instantly when someone walks up and vanishes when they leave. Google Cloud Run works exactly like that for your code: you package your app into a container, hand it to Google, and they handle spinning it up when requests arrive and shutting it down when things go quiet. You never touch a server, never patch an OS, and never pay for idle time.

Every developer eventually hits the same wall: the app works perfectly on your laptop, but getting it into production means provisioning servers, configuring load balancers, managing autoscaling groups, and babysitting infrastructure at 2am. For teams that just want their code to run reliably at scale, that overhead is expensive, slow, and frankly soul-crushing. Cloud Run was built to eliminate exactly that gap between 'it works on my machine' and 'it's live in production'.

Cloud Run is Google's fully managed serverless container platform. Zip-packaged AWS Lambda functions tie you to supported runtimes and package-size limits. Cloud Run, by contrast, runs any container that listens on a port and responds to HTTP. That one constraint (your app must be stateless and HTTP-driven) unlocks everything else: automatic scaling from zero to thousands of concurrent requests, per-100ms billing, deployment to any Cloud Run region, and zero server management. It bridges the gap between rigid Function-as-a-Service platforms and the full complexity of Kubernetes.

By the end of this article you'll understand exactly how Cloud Run's request-driven execution model works, how to containerize a real Node.js API and deploy it with a single command, how to wire up environment variables and secrets the right way, and how to avoid the three mistakes that burn developers on their first production deployment. You'll also walk away knowing how to answer the Cloud Run questions that actually come up in DevOps and platform engineering interviews.

Why Cloud Run Health Checks Can Cascade Into 503s

Cloud Run is a managed compute platform that runs stateless containers in a serverless model, automatically scaling from zero based on HTTP traffic. The core mechanic: each revision is deployed as a set of instances, and health checks (startup, liveness, readiness) gate traffic routing. A failing readiness probe removes an instance from the load balancer's rotation, but if all instances fail, the service returns 503.

In practice, Cloud Run HTTP health checks are GET requests to a configurable path (default /). If you configure nothing, Cloud Run applies a default TCP startup probe that only checks whether the port is open. Startup probes repeat until the container passes or exhausts its failure threshold. Liveness probes restart the container if they fail, and readiness probes control traffic flow. The critical property is that readiness probes are checked before routing. If an instance fails readiness, it's drained, and new instances may be created to replace it. However, if the failure is systemic (e.g., database connection pool exhaustion), all instances fail readiness simultaneously, causing a full outage.

Use Cloud Run when you need auto-scaling stateless HTTP services with minimal ops overhead — ideal for APIs, web backends, and event-driven workers. The health check system matters because it's the only mechanism for graceful traffic management; misconfigured probes (too strict, too slow, or dependent on external services) are the leading cause of cascading 503 failures in production.

⚠ Readiness Probe Dependency Trap

Never make readiness probes depend on external services (DB, cache) — a downstream outage will kill your entire Cloud Run service, not just the dependent instance.

📊 Production Insight

A team's probe gave up after 1s, but their app needed 3s to start. Every new instance failed its check and got terminated, and the service never became healthy.

Symptom: continuous 503s with zero healthy instances in the Cloud Run dashboard, even though the container image was correct.

Rule: the startup probe's total window (roughly initialDelaySeconds + failureThreshold × periodSeconds) must exceed the 99th percentile of your app's startup time. Readiness checks must never probe external dependencies.
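The startup-time rule is easy to check with arithmetic. The sketch below assumes the probe keeps retrying for roughly initialDelaySeconds + failureThreshold × periodSeconds, which are the Kubernetes-style probe fields Cloud Run uses. It computes a nearest-rank 99th percentile of measured startup times. The helper names and sample numbers are hypothetical.

```javascript
"use strict";

// Roughly how long a startup probe keeps retrying before the instance is given up on.
function startupProbeWindowSeconds({ initialDelaySeconds = 0, periodSeconds, failureThreshold }) {
  return initialDelaySeconds + failureThreshold * periodSeconds;
}

// Nearest-rank 99th percentile of measured startup times (seconds).
function p99(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.99 * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

function probeIsSafe(probe, startupSamples) {
  return startupProbeWindowSeconds(probe) > p99(startupSamples);
}

const measuredStartups = [2.1, 2.4, 2.8, 3.0, 3.2];
console.log(probeIsSafe({ periodSeconds: 1, failureThreshold: 1 }, measuredStartups));  // false: 1s window
console.log(probeIsSafe({ periodSeconds: 10, failureThreshold: 6 }, measuredStartups)); // true: 60s window
```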

🎯 Key Takeaway

Readiness probes must be fast, local, and independent of downstream services.

Startup probes gate traffic — set them generously (60s+) to avoid premature termination.

A single misconfigured probe can take down an entire service; test probe behavior under load before deploying.

thecodeforge.io


How Cloud Run's Request-Driven Model Actually Works

Before writing a single line of code, you need to understand Cloud Run's mental model — because it changes how you architect your app.

When no requests are hitting your service, Cloud Run scales it to zero. There are literally no running instances. The moment a request arrives, Cloud Run starts a container instance, routes the request to it, and keeps that instance alive to handle more requests for a short idle period. If traffic spikes, it starts more instances in parallel. This is called request-driven scaling, and it's the core reason Cloud Run is cheap for low-traffic services and effortless to scale for high-traffic ones.

The critical implication: your container must be stateless. Don't store session data in memory between requests, don't write to local disk expecting it to persist, and don't open background threads that do work outside of a request lifecycle. Any state must live in an external system — Cloud SQL, Firestore, Redis, or Cloud Storage.

Cold starts are the one real trade-off. When Cloud Run starts a fresh instance, there's a brief delay (typically 200ms–2s depending on your image size and runtime) before it can serve traffic. For latency-sensitive APIs, you can set a minimum instance count of 1 to keep a warm instance always running — at the cost of paying for that idle time. For batch jobs or internal tools, cold starts usually don't matter at all.

cloud_run_deploy.sh (Bash)

#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Build a Docker image, push it to Artifact Registry,
#       and deploy it as a Cloud Run service in one script.
#
# PREREQUISITES:
#   - gcloud CLI installed and authenticated
#   - Docker installed and running
#   - A GCP project with billing enabled
#   - An Artifact Registry Docker repository named cloud-run-services
# ─────────────────────────────────────────────────────────────
set -euo pipefail  # abort on the first failure instead of deploying a stale image

# Replace these with your actual values
GCP_PROJECT_ID="my-production-project"
GCP_REGION="us-central1"
SERVICE_NAME="product-api"
IMAGE_NAME="${GCP_REGION}-docker.pkg.dev/${GCP_PROJECT_ID}/cloud-run-services/${SERVICE_NAME}"

# Step 1: Authenticate Docker to push to Artifact Registry
# gcloud configures Docker credentials automatically, so no manual login is needed
gcloud auth configure-docker "${GCP_REGION}-docker.pkg.dev" --quiet

# Step 2: Build the Docker image and tag it for Artifact Registry
# The tag format must match your Artifact Registry repository path exactly
docker build \
  --tag "${IMAGE_NAME}:latest" \
  --platform linux/amd64 \
  . # <── build context is the current directory (where Dockerfile lives)

# Step 3: Push the image to Artifact Registry
# Cloud Run pulls from here at deploy time; it never touches Docker Hub by default
docker push "${IMAGE_NAME}:latest"

# Step 4: Deploy to Cloud Run
gcloud run deploy "${SERVICE_NAME}" \
  --image "${IMAGE_NAME}:latest" \
  --project "${GCP_PROJECT_ID}" \
  --platform managed \
  --region "${GCP_REGION}" \
  --allow-unauthenticated \
  --port 8080 \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 0 \
  --max-instances 100 \
  --concurrency 80
  # --allow-unauthenticated: makes the service publicly accessible
  # --concurrency 80: each instance handles up to 80 simultaneous requests
  # --min-instances 0: scale to zero when idle (cheapest option)
  # --max-instances 100: hard cap to prevent runaway billing

Output

Configuring Docker credentials... Done.

Building Docker image for linux/amd64...

Step 1/8 : FROM node:20-alpine

Step 2/8 : WORKDIR /app

...

Successfully built a3f8c91d2b44

Successfully tagged us-central1-docker.pkg.dev/my-production-project/cloud-run-services/product-api:latest

Pushing image to Artifact Registry...

latest: digest: sha256:7d3e... size: 1847

Deploying container to Cloud Run service [product-api] in [us-central1]

OK Deploying new service... Done.

OK Creating Revision... Revision product-api-00001-abc is active and serving 100% of traffic.

Service URL: https://product-api-abc123-uc.a.run.app

⚠ Watch Out: Platform Mismatch

Always build with --platform linux/amd64. If you're on an Apple Silicon Mac and skip this flag, your image builds for arm64. Cloud Run only runs linux/amd64 images, so either the deploy is rejected or the container crashes at startup with an exec format error. Both failures are confusing if you don't know to look for an architecture mismatch.

📊 Production Insight

A production API that stores session in memory will lose all user sessions on every scale-down.

Solution: use Firestore or Redis for session state.

Cold start latency of 500ms caused a checkout timeout for a retail client — we added --min-instances 2 and cut P95 latency by 80%.

🎯 Key Takeaway

Cloud Run is stateless by design.

Store external state, not local memory.

Cold starts are real — test for your use case before blaming the platform.

Building a Real Containerized API That Cloud Run Will Love

Cloud Run's core contract is that your container listens on 0.0.0.0 at the port given by the PORT environment variable. Cloud Run injects PORT at runtime (8080 unless you change it with --port), so you don't hardcode it. This one detail trips up a lot of developers who hardcode a port like 3000 in their app and then wonder why health checks fail.

The app we'll package follows a real-world pattern: a lightweight Node.js product API. It reads PORT from the environment and exposes a health check endpoint, which you can point Cloud Run's HTTP startup or liveness probe at. It also shuts down cleanly on SIGTERM. Cloud Run sends SIGTERM before stopping an instance during scale-down and waits about 10 seconds before SIGKILL, giving you a chance to finish in-flight requests.
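The API source isn't listed in this section (the Dockerfile below runs it as src/server.js). Here is a minimal sketch of those three behaviours using only Node's built-in http module. The /healthz and /products paths and the sample payload are hypothetical. The module has no side effects when required, so it can be tested. You could save it as a hypothetical src/app.js and make src/server.js the single line require("./app").start();.

```javascript
"use strict";
const http = require("http");

// Cloud Run injects PORT (8080 unless the service is deployed with --port).
// The fallback only matters for local runs.
function resolvePort(env) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 ? port : 8080;
}

// Plain request handler: easy to unit test without opening a socket.
function handler(req, res) {
  if (req.url === "/healthz") {
    // Local check only: no database or network calls on the probe path.
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
    return;
  }
  if (req.url === "/products") {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify([{ id: "p-1", name: "Sample product" }]));
    return;
  }
  res.writeHead(404, { "Content-Type": "text/plain" });
  res.end("not found");
}

function start(env = process.env) {
  const port = resolvePort(env);
  const server = http.createServer(handler);
  // Bind to 0.0.0.0 so traffic arriving from outside the container reaches the app.
  server.listen(port, "0.0.0.0", () => console.log(`listening on ${port}`));
  // On SIGTERM: stop accepting new connections, let in-flight requests finish, then exit.
  process.once("SIGTERM", () => {
    server.close(() => process.exit(0));
  });
  return server;
}

module.exports = { resolvePort, handler, start };
```

Keep the shutdown path short. Anything still running when SIGKILL arrives is cut off.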

The Dockerfile matters as much as the code. Keep your image small — every extra MB adds cold start latency. Use multi-stage builds to separate build dependencies from the runtime image, use Alpine-based base images, and always run as a non-root user. Cloud Run doesn't require root, and running as root is a security smell that will get flagged in any serious security audit.

Dockerfile

# ── Stage 1: Install dependencies ──────────────────────────────
# The Alpine Node image already ships npm, which is all this stage needs
FROM node:20-alpine AS dependency-installer
WORKDIR /build

# Copy package files first — Docker layer caching means if these
# haven't changed, npm install is skipped entirely on rebuild
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# npm ci is stricter than npm install — it respects package-lock.json exactly
# --omit=dev drops devDependencies so they don't bloat the final image

# ── Stage 2: Runtime image ──────────────────────────────────────
# Start fresh from a minimal Alpine image — no build tools, no npm cache
FROM node:20-alpine AS runtime
WORKDIR /app

# Create a non-root user — Cloud Run doesn't need root and neither does your app
RUN addgroup --system api-group && adduser --system --ingroup api-group api-user

# Copy only what we need: the installed node_modules and the app source
COPY --from=dependency-installer /build/node_modules ./node_modules
COPY src/ ./src/
COPY package.json ./

# Switch to non-root user before the app starts
USER api-user

# Cloud Run injects PORT at runtime — we expose 8080 as a documentation hint
# but the app itself must read process.env.PORT, not hardcode this number
EXPOSE 8080

CMD ["node", "src/server.js"]

Output

# When you run: docker build --platform linux/amd64 -t product-api .

[+] Building 14.3s (12/12) FINISHED

=> [dependency-installer 1/4] FROM node:20-alpine 3.1s

=> [dependency-installer 3/4] COPY package.json ... 0.1s

=> [dependency-installer 4/4] RUN npm ci --omit=dev 8.4s

=> [runtime 3/5] COPY --from=dependency-installer ... 0.2s

=> [runtime 4/5] COPY src/ ./src/ 0.1s

=> exporting to image 0.4s

=> naming to docker.io/library/product-api:latest 0.0s

# Final image size: 98MB (vs ~1.1GB if you used a non-Alpine full Node image)

💡Pro Tip: Read PORT from the Environment

In your Node.js app, always start your server with: const port = Number.parseInt(process.env.PORT, 10) || 8080; Cloud Run injects PORT automatically. If you hardcode 3000, nothing listens on the port Cloud Run probes, and the revision never becomes ready. Hardcoding 8080 usually works, but only because it matches the default. The day someone deploys with a different --port, you'll have a silent production failure.

📊 Production Insight

A team pushed a 1.2GB image with full dev tools — cold starts took 8 seconds.

Multi-stage builds cut it to 98MB, cold starts dropped to 400ms.

Use docker image ls to spot bloat. Always run as non-root: Cloud Run doesn't require it, but security audits routinely flag root containers.

🎯 Key Takeaway

The PORT env var is not optional — your app must read it.

In that case, multi-stage builds cut cold starts by over 90%.

SIGTERM handling prevents 502 errors during scale-down.


Wiring Up Secrets, Environment Variables, and Service Accounts Correctly

This is where most tutorials stop, and where real production deployments begin. Your app almost certainly needs secrets — database passwords, API keys, JWT signing keys. Hardcoding them into environment variables in your Cloud Run service definition means they show up in plaintext in your deployment history and in anyone's gcloud run describe output. That's a compliance and security problem.

The right pattern is to store secrets in Google Secret Manager and grant your Cloud Run service's service account permission to read them. Cloud Run can then mount secrets as environment variables or as files at startup — they're injected at runtime, never baked into the image or visible in the service config.

Service accounts are equally important. By default, Cloud Run uses the Compute Engine default service account, which has editor-level access to your entire project. That's far too permissive. Create a dedicated service account for each Cloud Run service, grant it only the specific IAM roles it needs (like roles/secretmanager.secretAccessor and roles/cloudsql.client), and attach it at deploy time. This is the principle of least privilege, and it's not optional in production.
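On the application side, a secret injected with --update-secrets is just another environment variable. A fail-fast loader is a useful companion: the service refuses to start when something is missing and reports only the variable names, never their values. The sketch below uses the DB_HOST, DB_NAME and DB_PASSWORD variables from the deploy script that follows. loadConfig and describeConfig are hypothetical helpers.

```javascript
"use strict";

const REQUIRED = ["DB_HOST", "DB_NAME", "DB_PASSWORD"];

// Read required settings once at startup; fail fast with names only, never values.
function loadConfig(env = process.env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`missing required configuration: ${missing.join(", ")}`);
  }
  return Object.freeze({
    dbHost: env.DB_HOST,
    dbName: env.DB_NAME,
    dbPassword: env.DB_PASSWORD,
  });
}

// Safe to log: the password is redacted.
function describeConfig(config) {
  return { ...config, dbPassword: "[redacted]" };
}
```

Calling loadConfig() at startup means a missing secret binding shows up as a failed revision with a clear log line. Without that check, you get a confusing database error on the first request.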

setup_secrets_and_iam.sh (Bash)

#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Create a dedicated service account for the product-api,
#       store secrets in Secret Manager, and deploy Cloud Run
#       with proper IAM bindings and no plaintext secrets anywhere.
# ─────────────────────────────────────────────────────────────
set -euo pipefail

GCP_PROJECT_ID="my-production-project"
GCP_REGION="us-central1"
SERVICE_NAME="product-api"
SERVICE_ACCOUNT_NAME="product-api-runner"
SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT_NAME}@${GCP_PROJECT_ID}.iam.gserviceaccount.com"
IMAGE_NAME="${GCP_REGION}-docker.pkg.dev/${GCP_PROJECT_ID}/cloud-run-services/${SERVICE_NAME}:latest"

# Step 1: Create a dedicated service account for this Cloud Run service
# Never use the default Compute SA: it has far too many permissions
gcloud iam service-accounts create "${SERVICE_ACCOUNT_NAME}" \
  --display-name "Service Account for product-api Cloud Run service" \
  --project "${GCP_PROJECT_ID}"

# Step 2: Store the database password in Secret Manager
# Prompt for it (-s hides the input) so the value never appears in this script
# or in your shell history
read -r -s -p "Database password: " DB_PASSWORD_VALUE
echo
# printf '%s' adds no trailing newline, which is important for passwords!
printf '%s' "${DB_PASSWORD_VALUE}" | \
  gcloud secrets create DB_PASSWORD \
    --data-file=- \
    --replication-policy="automatic" \
    --project "${GCP_PROJECT_ID}"
unset DB_PASSWORD_VALUE

# Step 3: Grant the service account permission to READ this specific secret
# Note: secretAccessor only allows reading, not creating or deleting secrets
gcloud secrets add-iam-policy-binding DB_PASSWORD \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/secretmanager.secretAccessor" \
  --project "${GCP_PROJECT_ID}"

# Step 4: Deploy Cloud Run with the dedicated service account
# and mount the secret as an environment variable at runtime.
# --update-secrets format: ENV_VAR_NAME=SECRET_NAME:VERSION
# Cloud Run fetches the secret when an instance starts and injects it as an env var;
# your app reads it with process.env.DB_PASSWORD, same as any env var.
gcloud run deploy "${SERVICE_NAME}" \
  --image "${IMAGE_NAME}" \
  --project "${GCP_PROJECT_ID}" \
  --platform managed \
  --region "${GCP_REGION}" \
  --service-account "${SERVICE_ACCOUNT_EMAIL}" \
  --update-secrets="DB_PASSWORD=DB_PASSWORD:latest" \
  --set-env-vars="NODE_ENV=production,DB_HOST=10.0.0.5,DB_NAME=products_db" \
  --allow-unauthenticated

# Step 5: Verify the deployment and check the service account is correct
gcloud run services describe "${SERVICE_NAME}" \
  --project "${GCP_PROJECT_ID}" \
  --region "${GCP_REGION}" \
  --format="value(spec.template.spec.serviceAccountName)"

Output

Created service account [product-api-runner].

Created version [1] of the secret [DB_PASSWORD].

Updated IAM policy for secret [DB_PASSWORD].

bindings:

- members:

- serviceAccount:product-api-runner@my-production-project.iam.gserviceaccount.com

role: roles/secretmanager.secretAccessor

Deploying container to Cloud Run service [product-api]...

OK Deploying... Done.

OK Creating Revision... Revision product-api-00002-xyz is active.

Service URL: https://product-api-abc123-uc.a.run.app

# Output of the describe command:

product-api-runner@my-production-project.iam.gserviceaccount.com

🔥Interview Gold: Secrets vs Environment Variables

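Interviewers love asking how you'd handle secrets in Cloud Run. The wrong answer is --set-env-vars=DB_PASSWORD=mypassword. The right answer is Secret Manager + --update-secrets. Secrets are encrypted at rest, access can be audited in Cloud Audit Logs, and rotation means adding a new secret version rather than editing the service. One nuance: a secret exposed as an environment variable is resolved when an instance starts. With :latest, only newly started instances see a new version, so pin versions and redeploy when you need a clean cutover.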

📊 Production Insight

We once audited a service that used the default Compute SA — it could delete Cloud SQL instances. One compromised container could have taken down the entire database.

Least privilege: create one SA per service.

Use gcloud iam service-accounts list to review.

🎯 Key Takeaway

Never use the default service account.

Secrets go in Secret Manager, never as plaintext values in --set-env-vars.

One service account per service: hard boundary against blast radius.

Deploying and Monitoring: From First Deploy to Production Observability

Deploying your container is just the beginning. Once it's live, you need to monitor health, latency, and errors. Cloud Run integrates directly with Cloud Monitoring and Logging, but you need to know what to look for.

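First, set up a health check endpoint for your startup and liveness probes that stays local. It should confirm that the process is up and serving, not that the database, cache, or external APIs are reachable. Cloud Run uses startup probes to decide when a new instance can take traffic, so a failing check means the revision won't receive traffic. If you want visibility into dependencies, report their status from a separate diagnostic endpoint that no probe calls.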

Second, turn your logs into alerts. Cloud Run sends request logs to Cloud Logging automatically, recording the status code, latency, and instance ID of every request. You can build log-based metrics and alerts from them. The most common production issues are 5xx errors, latency spikes, and concurrency limit hits.
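Request logs come for free, but your own application logs are far easier to filter and turn into metrics when they are structured. Cloud Logging parses a single-line JSON object written to stdout. It recognises "severity" as the log level and "message" as the summary line, and keeps the other fields in jsonPayload. Below is a minimal sketch. The helper names and the sample fields are illustrative.

```javascript
"use strict";

// One JSON object per line; "severity" and "message" are special to Cloud Logging.
function formatLogLine(severity, message, fields = {}) {
  // Spread fields first so they can't overwrite severity or message.
  return JSON.stringify({ ...fields, severity, message });
}

function log(severity, message, fields) {
  process.stdout.write(formatLogLine(severity, message, fields) + "\n");
}

log("ERROR", "checkout failed", { orderId: "o-123", upstream: "payments", latencyMs: 2001 });
```

A log-based metric can then filter on severity=ERROR or on a field such as jsonPayload.upstream instead of pattern-matching free-form text.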

Third, understand concurrency. By default, each instance handles up to 80 concurrent requests. If your app is I/O-bound (calling a database or external API), you can often raise concurrency to 250 or more. If it's CPU-bound, lower it. Monitor the container instance count metric. If it's constantly pinned at your max-instances limit, requests are queuing. Raise concurrency (if your app can handle it) or increase max-instances.
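Little's law gives a back-of-the-envelope estimate for those concurrency numbers. Requests in flight are roughly requests per second × average latency, and dividing by the per-instance concurrency gives a lower bound on busy instances. Cloud Run's autoscaler also weighs CPU utilisation and may not fill instances to the exact limit. Treat the result as a floor, not a prediction. The helper name is hypothetical.

```javascript
"use strict";

// Little's law: average requests in flight = arrival rate x average latency.
// Dividing by per-instance concurrency gives a floor on busy instances.
function estimateBusyInstances({ requestsPerSecond, avgLatencyMs, concurrency }) {
  const inFlight = (requestsPerSecond * avgLatencyMs) / 1000;
  return Math.ceil(inFlight / concurrency);
}

// 500 RPS at 200 ms average latency keeps about 100 requests in flight.
for (const concurrency of [1, 80, 250]) {
  console.log(concurrency, estimateBusyInstances({ requestsPerSecond: 500, avgLatencyMs: 200, concurrency }));
}
```

At concurrency 1, that load needs about 100 instances, each potentially holding its own database pool. At the default of 80, it needs about 2.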

Finally, set up notifications for revision failures and cost anomalies. Cloud Run bills per 100ms, so a runaway instance can cause unexpected bills. Set a budget alert in Google Cloud Billing.

monitoring_setup.sh (Bash)

#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Create a log-based metric for 5xx errors, set up a
#       Cloud Monitoring alert, and view recent logs.
# ─────────────────────────────────────────────────────────────
set -euo pipefail

# Cloud Run request logs for product-api that returned an HTTP 5xx status
FILTER='resource.type="cloud_run_revision" AND resource.labels.service_name="product-api" AND httpRequest.status>=500'

# Create a log-based counter metric for HTTP 5xx responses
gcloud logging metrics create product-api-5xx \
  --description "Count of 5xx responses for product-api" \
  --log-filter="${FILTER}"

# Create an alert policy (simplified; in practice done via the Monitoring UI or Terraform)
# This command just outlines the concept
# gcloud alpha monitoring policies create ...

# View recent 5xx logs directly
gcloud logging read "${FILTER}" \
  --limit 10 \
  --format="table(timestamp, httpRequest.status, httpRequest.latency)"

Output

Created metric [product-api-5xx].

# Sample log output

2025-12-01T12:34:56Z 500 1.234s

2025-12-01T12:35:10Z 502 2.001s

2025-12-01T12:35:15Z 500 0.890s

💡Pro Tip: Set CPU Throttling

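By default (request-based billing), Cloud Run throttles an instance's CPU whenever it isn't handling a request. For work that runs outside the request path, such as background tasks, disable throttling with --no-cpu-throttling so the CPU stays allocated for the instance's whole lifetime. Be aware that this switches the service to instance-based billing.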

📊 Production Insight

We had no log-based alert for 5xx errors, so a database migration caused a 10% error rate for 4 hours before we noticed. The fix was a 20-line Cloud Monitoring alert that now fires within 60 seconds of elevated error rates. Set alerts on day one, not after the incident.

🎯 Key Takeaway

Health checks must stay local; report dependency status on a separate endpoint.

Log-based metrics catch silent failures.

Budgets prevent billing surprises from runaway scaling.

Advanced: VPC Connectors, Custom Domains, and Traffic Splitting

Once you're comfortable with basic deployments, you'll hit the advanced scenarios: connecting to private resources (like Cloud SQL with private IP), using a custom domain with SSL, and rolling out canary revisions.

VPC Connector: To reach resources inside your VPC (e.g., a private Cloud SQL instance), you need a Serverless VPC Access connector or Direct VPC egress. For a connector, create it in your VPC, then attach it to your Cloud Run service with --vpc-connector. Without one of these, your container can reach the internet but not your private resources. This is a common cause of 'connection refused' errors and timeouts that are hard to debug.

Custom domains: By default, you get a .run.app URL. For production, map a custom domain using gcloud beta run domain-mappings create. Cloud Run auto-provisions an SSL certificate via Google-managed certificates. The gotcha: DNS propagation can take 10–30 minutes, and the domain must be verified (you need access to manage DNS records).

Traffic splitting: You can send a percentage of traffic to a specific revision. This powers canary deployments and A/B testing. Use gcloud run services update-traffic to split e.g., 95% to stable, 5% to new-revision. Rollback is instant: set 100% back to the old revision.
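If you script canary rollouts, validate the split before calling gcloud. The sketch below builds the REVISION=PERCENT list passed to --to-revisions. Requiring the percentages to add up to exactly 100 is a safety convention of this sketch, not a gcloud requirement. The revision names are the sample ones used in the script below.

```javascript
"use strict";

// Build "rev-a=95,rev-b=5" for `gcloud run services update-traffic --to-revisions`.
function toRevisionsArg(split) {
  const entries = Object.entries(split);
  for (const [revision, percent] of entries) {
    if (!Number.isInteger(percent) || percent < 0 || percent > 100) {
      throw new Error(`invalid percentage for ${revision}: ${percent}`);
    }
  }
  const total = entries.reduce((sum, [, percent]) => sum + percent, 0);
  if (total !== 100) {
    throw new Error(`traffic split must total 100, got ${total}`);
  }
  return entries.map(([revision, percent]) => `${revision}=${percent}`).join(",");
}

console.log(toRevisionsArg({ "product-api-00001-wib": 95, "product-api-00002-xyz": 5 }));
// → product-api-00001-wib=95,product-api-00002-xyz=5
```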

All three features require the managed platform (not Cloud Run on GKE).

advanced_setup.sh (Bash)

#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Set up VPC connector, map custom domain, split traffic.
# ─────────────────────────────────────────────────────────────
set -euo pipefail

SERVICE_NAME="product-api"
VPC_CONNECTOR_NAME="my-connector"
VPC_NETWORK="default"
REGION="us-central1"

# Step 1: Create a Serverless VPC Access connector (requires VPC Access API)
# Note: this takes about 2-3 minutes to provision
gcloud compute networks vpc-access connectors create "${VPC_CONNECTOR_NAME}" \
  --region "${REGION}" \
  --network "${VPC_NETWORK}" \
  --range 10.8.0.0/28

# Step 2: Attach the connector to the existing service
# private-ranges-only: only traffic to private IP ranges goes through the connector
gcloud run services update "${SERVICE_NAME}" \
  --region "${REGION}" \
  --vpc-connector "${VPC_CONNECTOR_NAME}" \
  --vpc-egress=private-ranges-only

# Step 3: Map custom domain
# First verify ownership: create a TXT record as instructed by the command
# gcloud beta run domain-mappings create --service "${SERVICE_NAME}" --domain api.mycompany.com --region "${REGION}"

# Step 4: Traffic splitting – 5% canary
gcloud run services update-traffic "${SERVICE_NAME}" \
  --region "${REGION}" \
  --to-revisions=product-api-00001-wib=95,product-api-00002-xyz=5

# Rollback: send all traffic back to the old revision
gcloud run services update-traffic "${SERVICE_NAME}" \
  --region "${REGION}" \
  --to-revisions=product-api-00001-wib=100

Output

Created VPC Access Connector [my-connector].

OK Deploying revision... Done.

Service URL: https://product-api-abc123-uc.a.run.app

# Domain mapping output (truncated)

Domain: api.mycompany.com

Status: PENDING_VERIFICATION

Please configure your DNS by adding a TXT record with the following value...

# Traffic split output

Current traffic allocation:

product-api-00001-wib: 95%

product-api-00002-xyz: 5%

OK, traffic updated.

🔥Traffic Splitting Gotcha

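Traffic splitting works on revisions, and you can also address revisions by tag with --to-tags. While the service is set to serve its latest revision, a new deploy receives 100% of traffic. Once you've pinned traffic to specific revisions, however, new revisions start at 0%, and you must shift traffic explicitly. Use the --no-traffic flag on deploy to create a revision without routing traffic to it.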

📊 Production Insight

A team tried to reach Cloud SQL via its private IP but forgot the VPC connector. The app timed out on every database call. After an hour of debugging, they added the connector and it worked instantly. Also: custom domain mapping failed because the TXT record was typo'd — always double-check DNS values.

🎯 Key Takeaway

A VPC connector (or Direct VPC egress) is mandatory for private resource access.

Custom domains need DNS verification – allow 30 min propagation.

Traffic splitting lets you canary with near-instant rollback.

Why You Should Care About Cloud Run's Auto-Scaling Limits (Before It Costs You)

Most devs think serverless means infinite scale. In practice, Cloud Run scales out to 100 instances by default, and long before that your database connection pool can explode. That's not a theory; it's a production incident waiting to happen.

Cloud Run adds instances based on in-flight requests (and CPU load), and each instance serves several requests in parallel up to its concurrency setting (80 by default). The real trap: no one sets max instances. Every instance opens its own database pool, so the instance count multiplies your connections. With concurrency set to 1, 100 concurrent users means 100 containers and 100 connection pools fighting for database resources; even at concurrency 80, a big enough spike fans out to dozens of pools.

Set max instances to a number that your downstream dependencies can handle. Start with 10. Monitor. Adjust. Your database will thank you.

CPU and memory allocation matter here too: if you under-provision, each instance saturates sooner and Cloud Run spins up more instances to compensate. Over-provisioning wastes money. Find the sweet spot with load testing before you go live.
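The multiplication is easy to underestimate, so here is a rough worked sketch in Go. It simplifies Cloud Run's autoscaler to "in-flight requests divided by concurrency, rounded up" (the real autoscaler also looks at CPU), and the traffic, pool size, and instance caps in main are hypothetical numbers.

```go
package main

import "fmt"

// instancesNeeded estimates how many instances are needed for a number of
// in-flight requests at a per-instance concurrency limit (ceiling division).
func instancesNeeded(inFlight, concurrency int) int {
	if inFlight <= 0 || concurrency <= 0 {
		return 0
	}
	return (inFlight + concurrency - 1) / concurrency
}

// worstCaseConnections is the number of database connections opened if every
// instance fills its pool, with the instance count capped at maxInstances.
func worstCaseConnections(inFlight, concurrency, maxInstances, poolSize int) int {
	n := instancesNeeded(inFlight, concurrency)
	if n > maxInstances {
		n = maxInstances
	}
	return n * poolSize
}

func main() {
	// Hypothetical spike: 1,000 in-flight requests, 10 connections per pool.
	fmt.Println(worstCaseConnections(1000, 1, 100, 10))  // 1000: concurrency 1, capped at 100 instances
	fmt.Println(worstCaseConnections(1000, 80, 100, 10)) // 130: 13 instances
	fmt.Println(worstCaseConnections(1000, 80, 10, 10))  // 100: capped at 10 instances
}
```

Once you know the per-instance pool size your database can afford, database/sql's SetMaxOpenConns on each instance keeps the worst case inside that budget.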

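cloudrun-service.yaml (YAML)

# io.thecodeforge
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: order-api
spec:
  template:
    metadata:
      annotations:
        # Max/min instances are set as annotations on the revision template;
        # the Knative spec has no maxInstances/minInstances fields.
        autoscaling.knative.dev/maxScale: "10"
        autoscaling.knative.dev/minScale: "0"
    spec:
      containerConcurrency: 80
      containers:
      - image: gcr.io/project-123/order-api:v2
        resources:
          limits:
            cpu: "1"
            memory: "512Mi"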

Output

$ gcloud run services replace cloudrun-service.yaml
Deploying... Done.
Service [order-api] revision [order-api-00005-xyz] has been deployed.

⚠ Production Trap:

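Default max instances is 100, which is still far more connection pools than most databases can absorb. Your database vendor sent a thank-you note when you hit 200 connections. Set max instances (--max-instances, or the autoscaling.knative.dev/maxScale annotation in YAML) to something sane immediately.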

🎯 Key Takeaway

Always cap max instances to match your downstream capacity, not your ego.

Cold Starts: The Silent Latency Killer (And How to Banish It)

Cloud Run scales to zero by default. That's free money when idle, but the first request after a period of inactivity triggers a cold start: Cloud Run has to start a container from your image, boot the runtime, and start your app. Depending on the runtime, image, and startup work, that takes anywhere from a few hundred milliseconds to 10 seconds or more. Users don't wait that long. They refresh, and your error logs can fill up with 5xx errors from the retry spam.

The fix isn't complicated: set min instances to 1 for production services. Cloud Run keeps one instance warm, always ready to serve. The cost of one idle baseline instance is small next to losing users at scale (check current pricing for your instance size and region).

For services that must respond instantly, pair min instances with startup CPU boost (--cpu-boost), which gives the container extra CPU while it starts and shrinks the JVM cold-start tax. If you also need CPU between requests for background work, that's a separate setting: CPU always allocated (--no-cpu-throttling). If you're running Java or Node.js with heavy dependencies, preload your modules during container startup and validate with a startup probe. Don't let cold starts decide your service's fate.

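main.go (Go)

// io.thecodeforge
package main

import (
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	// Pre-warm the database connection and template cache once at startup so the
	// first request doesn't pay for them. initDatabase, initTemplateCache and
	// fetchOrders are app-specific helpers defined elsewhere in this package;
	// fetchOrders returns the encoded response body as []byte.
	db := initDatabase()
	defer db.Close()
	cache := initTemplateCache()

	http.HandleFunc("/api/orders", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		orders, err := fetchOrders(db, cache, r)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("X-Response-Time", time.Since(start).String())
		w.Write(orders)
	})

	// Cloud Run injects PORT at runtime; fall back to 8080 for local runs.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	log.Fatal(http.ListenAndServe(":"+port, nil))
}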

Output

$ gcloud run deploy order-api --image gcr.io/project-123/order-api:v2 \
    --min-instances=1 \
    --cpu-boost \
    --concurrency=80

Deploying container to Cloud Run service [order-api]... OK

⚠ Production Trap:

Cold starts don't just affect speed; they can cause cascading failures during traffic spikes. When a surge hits a service that is scaled to zero (or scaled too low), requests queue while new instances boot, latency climbs, and client retries add even more load. Set min instances above 0 for any service serving user-facing traffic.

🎯 Key Takeaway

For user-facing production services, set min instances to at least 1 and enable startup CPU boost. Your users won't wait 10 seconds for a cold start.

● Production incident · POST-MORTEM · severity: high

The Silent 503 Spike: How a Missing Health Check Took Down a Cloud Run Service

Symptom

Service returned 503 errors during peak traffic (500+ RPS). Errors vanished after scaling down instance count. No database or infrastructure alerts.

Assumption

Team assumed the 503s were due to database connection pool exhaustion or Cloud SQL CPU limits. They scaled up connection pools and added more CPU — no improvement.

Root cause

The health check endpoint was performing a full database query on every call. During high concurrency, many instances were started and each one immediately ran a health check. The database was overwhelmed not by application traffic but by health check queries, causing new instances to fail their health checks and be killed before serving traffic. Cloud Run then started even more instances, creating a cascading failure.

Fix

Changed the health check endpoint to a lightweight 'alive' check that only confirms the process is running (e.g., return 200 without touching the database). Moved database connectivity checks to a separate /ready endpoint with a longer interval. Enabled startup CPU boost (--cpu-boost) to give new instances extra CPU during cold start.

Key lesson

Health check endpoints must be cheap and independent of backend dependencies.
Separate liveness (is process running?) from readiness (can we serve traffic?).
Use Cloud Logging to correlate 503 spikes with health check latency.
Enable startup CPU boost (--cpu-boost) for services with heavy cold-start work.

Production debug guide: Symptom → Action guide for production issues (5 entries)

Symptom · 01

Container fails to start – health check fails immediately

→

Fix

Check logs: gcloud logging read 'resource.labels.service_name="YOUR_SERVICE" AND severity>=ERROR' --limit 10. Verify PORT env var is read correctly in your app. Run the container locally with docker run -e PORT=8080 -p 8080:8080 your-image and test the health endpoint.

Symptom · 02

Intermittent 502/503 errors under load

→

Fix

Check concurrency: gcloud run services describe YOUR_SERVICE --format='value(spec.template.spec.containerConcurrency)'. Reduce concurrency if CPU-bound. Check the database connection pool: too many connections per instance, multiplied across instances, can exhaust the database. Look for health check cascading (see incident above).

Symptom · 03

Service unreachable via custom domain

→

Fix

Verify DNS: dig YOUR_DOMAIN CNAME should return ghs.googlehosted.com (for subdomains; apex domains use the A/AAAA records listed by the domain mapping). Check domain mapping status: gcloud beta run domain-mappings list. Ensure you have verified ownership of the domain and the TXT verification record is published.

Symptom · 04

Slow requests on first call (cold start)

→

Fix

Set --min-instances 1 or higher for latency-sensitive services. Reduce image size: use multi-stage builds and a slim or Alpine base. Enable startup CPU boost: gcloud run services update YOUR_SERVICE --cpu-boost. Monitor cold start time in Cloud Monitoring.

Symptom · 05

Cannot connect to Cloud SQL or other VPC resources

→

Fix

Check whether a VPC connector exists: gcloud compute networks vpc-access connectors list. Verify the connector is in the same region as your service. Check firewall rules: allow ingress from the connector's IP range (the /28 you assigned when creating it, e.g., 10.8.0.0/28) to your resource. Test connectivity from a Cloud Run job that uses the same connector.

★ Quick Debug Cheat Sheet for Cloud Run: Top 5 symptoms with immediate commands and fixes

Service fails to deploy with 'Container failed to start'

Immediate action

Check logs immediately

Commands

gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="YOUR_SERVICE" AND severity>=ERROR' --limit 5 --format='value(textPayload)'

docker run -d -e PORT=8080 -p 8080:8080 YOUR_IMAGE && sleep 3 && curl http://localhost:8080/health

Fix now

Ensure your app binds to the PORT env var on 0.0.0.0 (not 127.0.0.1). Add a health check endpoint that returns 200 within 10 seconds.

High 5xx error rate during traffic spikes

Private resource unreachable (e.g., Cloud SQL)

Custom domain not loading, shows 'site cannot be reached'

Cost spikes – unexpected billing increase

Cloud Run vs Cloud Functions (Gen 2)

Feature / Aspect	Google Cloud Run	Google Cloud Functions (Gen 2)
Deployment unit	Any Docker container	Source code in supported runtime
Runtime flexibility	Any language, any version	Limited to supported runtimes (Node, Python, Go, Java, etc.)
Max request timeout	60 minutes	60 minutes (HTTP functions); 9 minutes (event-driven functions)
Cold start control	Min instances setting	Min instances setting
Concurrency per instance	Up to 1000 simultaneous requests (default 80)	1 request per instance by default (configurable up to 1000)
Binary/system dependencies	Yes — install anything in Dockerfile	Very limited
Stateful background tasks	Not supported (request-scoped)	Not supported
Best for	APIs, web apps, microservices, ML inference	Event-driven triggers (Pub/Sub, Storage events)
Billing granularity	Per 100ms of CPU+memory usage	Per 100ms of CPU+memory usage
VPC connectivity	Yes — Serverless VPC Access connector	Yes — Serverless VPC Access connector

⚙ Quick Reference

7 commands from this guide

File	Command / Code	Purpose
cloud_run_deploy.sh	GCP_PROJECT_ID="my-production-project"	How Cloud Run's Request-Driven Model Actually Works
Dockerfile	FROM node:20-alpine AS dependency-installer	Building a Real Containerized API That Cloud Run Will Love
setup_secrets_and_iam.sh	GCP_PROJECT_ID="my-production-project"	Wiring Up Secrets, Environment Variables, and Service Accoun
monitoring_setup.sh	gcloud logging metrics create product-api-5xx \	Deploying and Monitoring
advanced_setup.sh	VPC_CONNECTOR_NAME="my-connector"	Advanced
cloudrun-service.yaml	apiVersion: serving.knative.dev/v1	Why You Should Care About Cloud Run's Auto-Scaling Limits (Before It Costs You)
main.go	"log"	Cold Starts

Key takeaways

Cloud Run runs any container that listens on an HTTP port, not just specific runtimes, which makes it dramatically more flexible than traditional FaaS platforms.

Your app must read the port from process.env.PORT (or the equivalent in your language), not hardcode it: Cloud Run injects this at runtime, and the container fails its startup check if it listens on the wrong port.

Secrets belong in Google Secret Manager, referenced via --update-secrets, not in --set-env-vars: the difference is encryption at rest, audit logging, and the ability to rotate secrets without redeploying (volume-mounted secrets that reference the latest version pick up new values; secrets exposed as env vars are read when an instance starts).

Cold starts are real but controllable: use --min-instances 1 for latency-sensitive services, keep your Docker image small with multi-stage builds, and always handle SIGTERM for graceful shutdown.

Health check endpoints must be cheap and independent: a heavy health check that queries a database can cause cascading 503 failures during scaling events.

Common mistakes to avoid

5 patterns

Hardcoding the PORT number in the app

Symptom

Cloud Run health checks fail immediately after deploy with 'Container failed to start' even though the container runs fine locally

Fix

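Always bind your server to parseInt(process.env.PORT, 10) || 8080 and listen on all interfaces (0.0.0.0). Cloud Run injects PORT at runtime; it is 8080 unless you configure a different container port with --port, so a hardcoded value breaks as soon as the two disagree.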

Using the default Compute Engine service account

Symptom

The service works, but it runs as the default Compute Engine service account, which typically holds the Editor role on your project, meaning a compromised container can read, write, or delete almost anything

Fix

Create a dedicated service account per service with gcloud iam service-accounts create, and pass it via --service-account at deploy time. Grant only the specific roles that service actually needs.

Not handling SIGTERM for graceful shutdown

Symptom

During scale-down or redeployment, in-flight requests are abruptly cut off, causing 502 errors for users

Fix

Listen for the SIGTERM signal and stop accepting new requests while finishing active ones. In Node.js: process.on('SIGTERM', () => { server.close(() => process.exit(0)); }); Cloud Run waits up to 10 seconds after SIGTERM before force-killing the instance.

Building image for wrong platform (arm64 on Apple Silicon)

Symptom

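Container runs fine locally on an Apple Silicon Mac, but on Cloud Run it fails to start and the logs show an 'exec format error': Cloud Run executes linux/amd64 (x86-64) binaries, so an arm64-only image cannot run there

Fix

Always build with docker build --platform linux/amd64 (or docker buildx build --platform linux/amd64), or set the DOCKER_DEFAULT_PLATFORM=linux/amd64 environment variable. Confirm the result with docker image inspect --format '{{.Architecture}}' YOUR_IMAGE, which should print amd64.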

Setting concurrency too high for CPU-bound workloads

Symptom

Latency spikes as instances become overloaded; error rate increases under moderate traffic

Fix

Monitor CPU usage per instance. For CPU-bound apps, reduce concurrency to 10–20. For I/O-bound apps (most APIs), 80–250 is fine. Test with actual traffic pattern.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Cloud Run scales to zero by default — what's the trade-off and when woul...

Q02 · SENIOR

How does Cloud Run's concurrency model differ from AWS Lambda, and why d...

Q03 · SENIOR

A Cloud Run service is failing health checks immediately after deploymen...

Q04 · SENIOR

Explain how you'd implement a canary deployment strategy using Cloud Run...

Q01 of 04 · SENIOR

Cloud Run scales to zero by default — what's the trade-off and when would you explicitly set --min-instances to 1 or higher?

ANSWER

The trade-off is cold start latency. When scaling from zero, the first request has to wait for a container to start (typically 200ms–2s, and longer for heavy runtimes such as the JVM). For latency-sensitive APIs (sub-500ms P95) or user-facing endpoints, set --min-instances to 1 or higher to keep a warm instance always available. For batch jobs, internal tools, or development environments, scale-to-zero is fine and saves money. If you must keep min-instances low, enable startup CPU boost (--cpu-boost) to speed up cold starts. The cost of min-instances is paying for idle CPU and memory 24/7, so calculate the trade-off: e.g., roughly $0.09/hour per instance in us-central1 (the exact rate depends on CPU, memory, and billing mode, and idle min instances are billed at a reduced rate, so check current pricing).
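To put numbers on that trade-off, here is a small Go calculation of what keeping N instances warm costs per month. The $0.09/hour input is the illustrative figure quoted above and 730 hours is an average month; both are assumptions to replace with the current rate for your region, instance size, and billing mode.

```go
package main

import "fmt"

// hoursPerMonth approximates an average month (365 days * 24 h / 12).
const hoursPerMonth = 730.0

// monthlyBaselineCost is the cost of keeping n instances warm for a month
// at hourlyRate dollars per instance-hour.
func monthlyBaselineCost(n int, hourlyRate float64) float64 {
	return float64(n) * hourlyRate * hoursPerMonth
}

func main() {
	// 0.09 is the illustrative per-instance hourly rate from the answer above.
	for _, n := range []int{1, 2, 3} {
		fmt.Printf("min-instances=%d: ~$%.2f/month\n", n, monthlyBaselineCost(n, 0.09))
	}
}
```

Compare the result against what a cold start costs you in lost requests or SLO breaches; if the warm instance is cheaper, it pays for itself.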

FAQ · 5 QUESTIONS

Frequently Asked Questions

Does Google Cloud Run support WebSockets or long-lived connections?

How much does Google Cloud Run actually cost for a low-traffic API?

What's the difference between Cloud Run (fully managed) and Cloud Run for Anthos?

Can I use Cloud Run for background jobs or task queues?

How do I set up CI/CD for Cloud Run?

Naren, Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.

✓ Verified: production tested · Last updated: July 27, 2026 · 1,713 articles, all by Naren
