
Google Cloud Run Explained: Deploy Containers Without Managing Servers

In Plain English 🔥
Imagine you run a lemonade stand, but instead of renting a shop 24/7, you only pay for the exact minutes customers are buying lemonade — the stand appears instantly when someone walks up and vanishes when they leave. Google Cloud Run works exactly like that for your code: you package your app into a container, hand it to Google, and they handle spinning it up when requests arrive and shutting it down when things go quiet. You never touch a server, never patch an OS, and never pay for idle time.

Every developer eventually hits the same wall: the app works perfectly on your laptop, but getting it into production means provisioning servers, configuring load balancers, managing autoscaling groups, and babysitting infrastructure at 2am. For teams that just want their code to run reliably at scale, that overhead is expensive, slow, and frankly soul-crushing. Cloud Run was built to eliminate exactly that gap between 'it works on my machine' and 'it's live in production'.

Cloud Run is Google's fully managed serverless container platform. Unlike classic AWS Lambda deployments, which tie you to specific runtimes and small deployment packages, Cloud Run runs any container that listens on a port and responds to HTTP. That one constraint — your app must be stateless and HTTP-driven — unlocks everything else: automatic scaling from zero to thousands of concurrent requests, per-100ms billing, global deployment, and zero server management. It bridges the gap between rigid Function-as-a-Service platforms and the full complexity of Kubernetes.

By the end of this article you'll understand exactly how Cloud Run's request-driven execution model works, how to containerize a real Node.js API and deploy it with a single command, how to wire up environment variables and secrets the right way, and how to avoid the three mistakes that burn developers on their first production deployment. You'll also walk away knowing how to answer the Cloud Run questions that actually come up in DevOps and platform engineering interviews.

How Cloud Run's Request-Driven Model Actually Works

Before writing a single line of code, you need to understand Cloud Run's mental model — because it changes how you architect your app.

When no requests are hitting your service, Cloud Run scales it to zero. There are literally no running instances. The moment a request arrives, Cloud Run starts a container instance, routes the request to it, and keeps that instance alive to handle more requests for a short idle period. If traffic spikes, it starts more instances in parallel. This is called request-driven scaling, and it's the core reason Cloud Run is cheap for low-traffic services and effortless to scale for high-traffic ones.

The critical implication: your container must be stateless. Don't store session data in memory between requests, don't write to local disk expecting it to persist, and don't open background threads that do work outside of a request lifecycle. Any state must live in an external system — Cloud SQL, Firestore, Redis, or Cloud Storage.
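To make the statelessness rule concrete, here's a minimal sketch. The in-memory counter is the anti-pattern; the `store.increment` interface is hypothetical, standing in for a real Redis or Firestore client:

```javascript
// ANTI-PATTERN: in-memory state. Each Cloud Run instance gets its own
// copy of this variable, and it vanishes when the instance scales to zero.
// Two concurrent instances will report two different, diverging counts.
let requestCount = 0;

function handleRequestInMemory() {
  requestCount += 1;
  return requestCount;
}

// CORRECT PATTERN: keep state in an external store. The `store` object
// here is a hypothetical client interface, not a real library; in
// production this would be Redis, Firestore, Cloud SQL, etc.
async function handleRequestExternal(store) {
  return store.increment('request_count');
}
```

Any instance can die between two requests, so code like `handleRequestInMemory` works on your laptop and quietly miscounts in production.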

Cold starts are the one real trade-off. When Cloud Run starts a fresh instance, there's a brief delay (typically 200ms–2s depending on your image size and runtime) before it can serve traffic. For latency-sensitive APIs, you can set a minimum instance count of 1 to keep a warm instance always running — at the cost of paying for that idle time. For batch jobs or internal tools, cold starts usually don't matter at all.

cloud_run_deploy.sh · BASH
#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Build a Docker image, push it to Artifact Registry,
#       and deploy it as a Cloud Run service in one script.
#
# PREREQUISITES:
#   - gcloud CLI installed and authenticated
#   - Docker installed and running
#   - A GCP project with billing enabled
# ─────────────────────────────────────────────────────────────

# Replace these with your actual values
GCP_PROJECT_ID="my-production-project"
GCP_REGION="us-central1"
SERVICE_NAME="product-api"
IMAGE_NAME="us-central1-docker.pkg.dev/${GCP_PROJECT_ID}/cloud-run-services/${SERVICE_NAME}"

# Step 1: Authenticate Docker to push to Artifact Registry
# gcloud configures Docker credentials automatically — no manual login needed
gcloud auth configure-docker us-central1-docker.pkg.dev --quiet

# Step 2: Build the Docker image and tag it for Artifact Registry
# The tag format must match your Artifact Registry repository path exactly
docker build \
  --tag "${IMAGE_NAME}:latest" \
  --platform linux/amd64 \
  . # <── build context is the current directory (where Dockerfile lives)

# Step 3: Push the image to Artifact Registry
# Cloud Run pulls from here at deploy time — it never touches Docker Hub by default
docker push "${IMAGE_NAME}:latest"

# Step 4: Deploy to Cloud Run
gcloud run deploy "${SERVICE_NAME}" \
  --image "${IMAGE_NAME}:latest" \
  --platform managed \
  --region "${GCP_REGION}" \
  --allow-unauthenticated \
  --port 8080 \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 0 \
  --max-instances 100 \
  --concurrency 80
  # --allow-unauthenticated: makes the service publicly accessible
  # --concurrency 80: each instance handles up to 80 simultaneous requests
  # --min-instances 0: scale to zero when idle (cheapest option)
  # --max-instances 100: hard cap to prevent runaway billing
▶ Output
Configuring Docker credentials... Done.
Building Docker image for linux/amd64...
Step 1/8 : FROM node:20-alpine
Step 2/8 : WORKDIR /app
...
Successfully built a3f8c91d2b44
Successfully tagged us-central1-docker.pkg.dev/my-production-project/cloud-run-services/product-api:latest

Pushing image to Artifact Registry...
latest: digest: sha256:7d3e... size: 1847

Deploying container to Cloud Run service [product-api] in [us-central1]
OK Deploying new service... Done.
OK Creating Revision... Revision product-api-00001-abc is active and serving 100% of traffic.

Service URL: https://product-api-abc123-uc.a.run.app
⚠️
Watch Out: Platform Mismatch
Always build with --platform linux/amd64. If you're on an Apple Silicon Mac and skip this flag, your image builds for arm64 and pushes fine, but Cloud Run's x86 infrastructure can't execute it — the container typically fails to start with an 'exec format error' buried in the logs, which is baffling if you don't know to look for it.

Building a Real Containerized API That Cloud Run Will Love

Cloud Run's only requirement is that your container listens on the port defined by the PORT environment variable. That's it. Cloud Run injects PORT at runtime — you don't hardcode it. This one detail trips up a lot of developers who hardcode 3000 or 8080 in their app and then wonder why health checks fail.

Here's a real-world pattern: a lightweight Node.js product API. Notice how the app reads PORT from the environment, handles a health check endpoint (which Cloud Run hits to confirm your container started successfully), and does clean shutdown on SIGTERM (Cloud Run sends SIGTERM before killing an instance during scale-down, giving you a chance to finish in-flight requests).

The Dockerfile matters as much as the code. Keep your image small — every extra MB adds cold start latency. Use multi-stage builds to separate build dependencies from the runtime image, use Alpine-based base images, and always run as a non-root user. Cloud Run doesn't require root, and running as root is a security smell that will get flagged in any serious security audit.

Dockerfile · DOCKERFILE
# ── Stage 1: Install dependencies ──────────────────────────────
# node:20-alpine is tiny but still ships npm, which is all we need here
FROM node:20-alpine AS dependency-installer
WORKDIR /build

# Copy package files first — Docker layer caching means if these
# haven't changed, npm install is skipped entirely on rebuild
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# npm ci is stricter than npm install — it respects package-lock.json exactly
# --omit=dev drops devDependencies so they don't bloat the final image

# ── Stage 2: Runtime image ──────────────────────────────────────
# Start fresh from a minimal Alpine image — no build tools, no npm cache
FROM node:20-alpine AS runtime
WORKDIR /app

# Create a non-root user — Cloud Run doesn't need root and neither does your app
# (Alpine's BusyBox adduser/addgroup take short flags: -S = system, -G = group)
RUN addgroup -S api-group && adduser -S -G api-group api-user

# Copy only what we need: the installed node_modules and the app source
COPY --from=dependency-installer /build/node_modules ./node_modules
COPY src/ ./src/
COPY package.json ./

# Switch to non-root user before the app starts
USER api-user

# Cloud Run injects PORT at runtime — we expose 8080 as a documentation hint
# but the app itself must read process.env.PORT, not hardcode this number
EXPOSE 8080

CMD ["node", "src/server.js"]
▶ Output
# When you run: docker build --platform linux/amd64 -t product-api .
[+] Building 14.3s (12/12) FINISHED
=> [dependency-installer 1/4] FROM node:20-alpine 3.1s
=> [dependency-installer 3/4] COPY package.json ... 0.1s
=> [dependency-installer 4/4] RUN npm ci --omit=dev 8.4s
=> [runtime 3/5] COPY --from=dependency-installer ... 0.2s
=> [runtime 4/5] COPY src/ ./src/ 0.1s
=> exporting to image 0.4s
=> naming to docker.io/library/product-api:latest 0.0s

# Final image size: 98MB (vs ~1.1GB if you used a non-Alpine full Node image)
⚠️
Pro Tip: Read PORT from the Environment
In your Node.js app, always start your server with const port = parseInt(process.env.PORT, 10) || 8080; — Cloud Run injects PORT automatically (8080 by default, but configurable with --port). If you hardcode 3000 or 8080, your app may still work, but only by coincidence. The day the injected port changes, you'll have a silent production failure.

Wiring Up Secrets, Environment Variables, and Service Accounts Correctly

This is where most tutorials stop, and where real production deployments begin. Your app almost certainly needs secrets — database passwords, API keys, JWT signing keys. Hardcoding them into environment variables in your Cloud Run service definition means they show up in plaintext in your deployment history and in anyone's gcloud run services describe output. That's a compliance and security problem.

The right pattern is to store secrets in Google Secret Manager and grant your Cloud Run service's service account permission to read them. Cloud Run can then mount secrets as environment variables or as files at startup — they're injected at runtime, never baked into the image or visible in the service config.

Service accounts are equally important. By default, Cloud Run uses the Compute Engine default service account, which has editor-level access to your entire project. That's far too permissive. Create a dedicated service account for each Cloud Run service, grant it only the specific IAM roles it needs (like roles/secretmanager.secretAccessor and roles/cloudsql.client), and attach it at deploy time. This is the principle of least privilege, and it's not optional in production.

setup_secrets_and_iam.sh · BASH
#!/bin/bash
# ─────────────────────────────────────────────────────────────
# GOAL: Create a dedicated service account for the product-api,
#       store secrets in Secret Manager, and deploy Cloud Run
#       with proper IAM bindings — no plaintext secrets anywhere.
# ─────────────────────────────────────────────────────────────

GCP_PROJECT_ID="my-production-project"
GCP_REGION="us-central1"
SERVICE_NAME="product-api"
SERVICE_ACCOUNT_NAME="product-api-runner"
SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT_NAME}@${GCP_PROJECT_ID}.iam.gserviceaccount.com"
IMAGE_NAME="us-central1-docker.pkg.dev/${GCP_PROJECT_ID}/cloud-run-services/${SERVICE_NAME}:latest"

# Step 1: Create a dedicated service account for this Cloud Run service
# Never use the default Compute SA — it has far too many permissions
gcloud iam service-accounts create "${SERVICE_ACCOUNT_NAME}" \
  --display-name "Service Account for product-api Cloud Run service" \
  --project "${GCP_PROJECT_ID}"

# Step 2: Store the database password in Secret Manager
# Read the secret from stdin so it never appears in your shell history
echo -n "super-secret-db-password" | \
  gcloud secrets create DB_PASSWORD \
    --data-file=- \
    --replication-policy="automatic" \
    --project "${GCP_PROJECT_ID}"
# The -n flag on echo prevents a trailing newline — important for passwords!

# Step 3: Grant the service account permission to READ this specific secret
# Note: secretAccessor only allows reading — not creating or deleting secrets
gcloud secrets add-iam-policy-binding DB_PASSWORD \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/secretmanager.secretAccessor" \
  --project "${GCP_PROJECT_ID}"

# Step 4: Deploy Cloud Run with the dedicated service account
# and mount the secret as an environment variable at runtime.
# --update-secrets format: ENV_VAR_NAME=SECRET_NAME:VERSION
# Cloud Run fetches the secret at startup and injects it as an env var;
# your app reads it with process.env.DB_PASSWORD — same as any env var.
gcloud run deploy "${SERVICE_NAME}" \
  --image "${IMAGE_NAME}" \
  --platform managed \
  --region "${GCP_REGION}" \
  --service-account "${SERVICE_ACCOUNT_EMAIL}" \
  --update-secrets="DB_PASSWORD=DB_PASSWORD:latest" \
  --set-env-vars="NODE_ENV=production,DB_HOST=10.0.0.5,DB_NAME=products_db" \
  --allow-unauthenticated

# Step 5: Verify the deployment and check the service account is correct
gcloud run services describe "${SERVICE_NAME}" \
  --region "${GCP_REGION}" \
  --format="value(spec.template.spec.serviceAccountName)"
▶ Output
Created service account [product-api-runner].

Created version [1] of the secret [DB_PASSWORD].

Updated IAM policy for secret [DB_PASSWORD].
bindings:
- members:
- serviceAccount:product-api-runner@my-production-project.iam.gserviceaccount.com
role: roles/secretmanager.secretAccessor

Deploying container to Cloud Run service [product-api]...
OK Deploying... Done.
OK Creating Revision... Revision product-api-00002-xyz is active.

Service URL: https://product-api-abc123-uc.a.run.app

# Output of the describe command:
product-api-runner@my-production-project.iam.gserviceaccount.com
🔥
Interview Gold: Secrets vs Environment Variables
Interviewers love asking how you'd handle secrets in Cloud Run. The wrong answer is --set-env-vars=DB_PASSWORD=mypassword. The right answer is Secret Manager + --update-secrets, because secrets are encrypted at rest, access is audited in Cloud Audit Logs, and rotating a secret doesn't require a redeployment — just a new secret version.
| Feature / Aspect | Google Cloud Run | Google Cloud Functions (Gen 2) |
|---|---|---|
| Deployment unit | Any Docker container | Source code in a supported runtime |
| Runtime flexibility | Any language, any version | Limited to supported runtimes (Node, Python, Go, Java, etc.) |
| Max request timeout | 60 minutes | 60 minutes (Gen 2) |
| Cold start control | Min instances setting | Min instances setting |
| Concurrency per instance | Up to 1000 simultaneous requests | 1 request per instance (default) |
| Binary/system dependencies | Yes — install anything in the Dockerfile | Very limited |
| Stateful background tasks | Not supported (request-scoped) | Not supported |
| Best for | APIs, web apps, microservices, ML inference | Event-driven triggers (Pub/Sub, Storage events) |
| Billing granularity | Per 100ms of CPU+memory usage | Per 100ms of CPU+memory usage |
| VPC connectivity | Yes — Serverless VPC Access connector | Yes — Serverless VPC Access connector |

🎯 Key Takeaways

  • Cloud Run runs any container that listens on an HTTP port — not just specific runtimes — which makes it dramatically more flexible than traditional FaaS platforms.
  • Your app must read the port from process.env.PORT (or the equivalent in your language), not hardcode it — Cloud Run injects this at runtime and health checks will silently fail if you get it wrong.
  • Secrets belong in Google Secret Manager mounted via --update-secrets, not in --set-env-vars — the difference is encryption at rest, audit logging, and the ability to rotate secrets without redeploying.
  • Cold starts are real but controllable: use --min-instances 1 for latency-sensitive services, keep your Docker image small with multi-stage builds, and always handle SIGTERM for graceful shutdown.

⚠ Common Mistakes to Avoid

  • Mistake 1: Hardcoding the PORT number in the app — Symptom: Cloud Run health checks fail immediately after deploy with 'Container failed to start' even though the container runs fine locally — Fix: Always bind your server to parseInt(process.env.PORT, 10) || 8080. Cloud Run injects PORT at runtime (8080 by default, but configurable with --port), so never assume a fixed value.
  • Mistake 2: Using the default Compute Engine service account — Symptom: The service works, but your Cloud Run service has editor-level access to your entire GCP project, meaning a compromised container can read, write, or delete anything — Fix: Create a dedicated service account per service with gcloud iam service-accounts create, and pass it via --service-account at deploy time. Grant only the specific roles that service actually needs.
  • Mistake 3: Not handling SIGTERM for graceful shutdown — Symptom: During scale-down or redeployment, in-flight requests are abruptly cut off, causing 502 errors for users — Fix: Listen for the SIGTERM signal and stop accepting new requests while finishing active ones. In Node.js: process.on('SIGTERM', () => { server.close(() => process.exit(0)); }); Cloud Run waits up to 10 seconds after SIGTERM before force-killing the instance.

Interview Questions on This Topic

  • QCloud Run scales to zero by default — what's the trade-off and when would you explicitly set --min-instances to 1 or higher?
  • QHow does Cloud Run's concurrency model differ from AWS Lambda, and why does that difference affect how you write your application code?
  • QA Cloud Run service is failing health checks immediately after deployment even though the Docker image works perfectly locally. Walk me through how you'd diagnose and fix it.

Frequently Asked Questions

Does Google Cloud Run support WebSockets or long-lived connections?

Yes. WebSockets and server-sent events are supported on Cloud Run out of the box; end-to-end HTTP/2 is a separate opt-in (the --use-http2 flag) and is not required for WebSockets. Keep in mind that Cloud Run's request timeout caps at 60 minutes, so truly persistent connections (like for real-time gaming) still aren't a great fit, since clients must reconnect when the timeout hits. Use a service like Firebase Realtime Database or a dedicated WebSocket server on GKE instead.

How much does Google Cloud Run actually cost for a low-traffic API?

For a low-traffic API that scales to zero, costs can be near zero. Cloud Run bills per 100ms of CPU and memory usage only while handling requests, plus a small per-request charge. Google also provides a generous free tier: 2 million requests/month, 360,000 GB-seconds of memory, and 180,000 vCPU-seconds free every month. A side project or internal tool that gets a few thousand requests a day will typically cost pennies or nothing.
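A back-of-envelope calculation makes this concrete. The workload here is hypothetical (3,000 requests a day, 200 ms average handling time, 512 MiB memory); the free-tier figures are the ones quoted above:

```javascript
// Back-of-envelope Cloud Run cost check for a low-traffic API.
// Assumptions (illustrative, not pricing advice): 3,000 requests/day,
// 200 ms average request duration, 512 MiB (0.5 GB) memory, 1 vCPU.
const requestsPerMonth = 3000 * 30;            // 90,000 requests
const secondsBilled = requestsPerMonth * 0.2;  // 18,000 vCPU-seconds
const gbSeconds = secondsBilled * 0.5;         // 9,000 GB-seconds

// Free tier as described above: 2M requests, 360k GB-seconds of memory,
// 180k vCPU-seconds per month.
const withinFreeTier =
  requestsPerMonth <= 2_000_000 &&
  gbSeconds <= 360_000 &&
  secondsBilled <= 180_000;

console.log({ requestsPerMonth, secondsBilled, gbSeconds, withinFreeTier });
```

This hypothetical workload uses under 5% of the free tier in every dimension, which is why "pennies or nothing" is the realistic estimate for most side projects.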

What's the difference between Cloud Run (fully managed) and Cloud Run for Anthos?

Cloud Run fully managed runs on Google's infrastructure and requires zero cluster management — you just deploy a container and Google handles everything. Cloud Run for Anthos (originally launched as Cloud Run on GKE) runs the same Cloud Run developer experience on top of a Kubernetes cluster you control, either on GCP or on-premises. Choose fully managed unless you have a specific reason to run on your own infrastructure, like regulatory requirements or needing to co-locate with an existing GKE workload.

TheCodeForge Editorial Team

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
