GCP is Google's cloud platform built on the same infrastructure powering Search and YouTube
The Project is the atomic unit of isolation — billing, IAM, and APIs are per-project
Compute options: GCE (VMs), GKE (Kubernetes), Cloud Run (serverless containers) — pick by ops overhead tolerance
Storage services: GCS (blobs), Cloud SQL (relational), Spanner (global), Firestore (NoSQL) — match your data access pattern
Biggest mistake: using roles/editor on a service account — grants nearly full write access, making any compromise catastrophic
Performance insight: Cloud Run scales to zero, costing $0 at idle; GKE clusters cost ~$70/month minimum even when idle
Plain-English First
Imagine you're opening a restaurant but you don't want to buy the building, the ovens, or hire an electrician. Instead, you rent a fully-equipped kitchen by the hour — use as much or as little as you need, and pay only for what you cook. Google Cloud Platform is exactly that, but for software. Instead of buying servers, databases, and networking gear, your app rents Google's global infrastructure by the second. When traffic spikes on Black Friday, you dial up the kitchen size. When it's quiet, you dial it back down. No hardware, no waste.
Every production application you've ever used — from a startup's API to a Fortune 500's data pipeline — runs on someone's computers. The question is whose, and at what cost. Running your own servers means upfront capital, a team to maintain them, and a very bad Monday when one fails at 2 AM. Cloud platforms exist to flip that model: you get world-class infrastructure on demand, billed like a utility, with Google's Site Reliability Engineers quietly keeping the lights on behind the scenes. Google Cloud Platform is Google's answer to that problem, and it's built on the same infrastructure that runs Search, Gmail, and YouTube — systems engineered to handle billions of requests a day.
The real problem GCP solves isn't just 'running code remotely.' It's the operational complexity that kills engineering teams: patching OS vulnerabilities, provisioning storage that scales automatically, routing traffic across continents, and debugging distributed systems. Before managed cloud services, teams burned enormous engineering hours on infrastructure that added zero value to their product. GCP packages that complexity into opinionated, composable services so your team can stay focused on the thing that actually matters — the software itself.
After reading this, you'll confidently map a real-world application's requirements to specific GCP services, understand the difference between GCP's compute tiers and when each is appropriate, deploy a containerized workload to Google Kubernetes Engine, and avoid the billing and security mistakes that catch new GCP users off guard. This isn't a tour of the UI — it's a mental model you'll actually use.
GCP's Mental Model: Projects, Regions, and the Resource Hierarchy
Before touching any GCP service, you need to understand how GCP organises everything. Get this wrong and you'll end up with sprawling costs, broken IAM permissions, and services that can't talk to each other.
GCP groups resources into a three-tier hierarchy: Organisation → Folders → Projects. A Project is the atomic unit — every resource (a VM, a bucket, a database) lives inside exactly one project. Billing, IAM permissions, and API enablement are all scoped to the project. This is intentional: it means a dev team can have a payments-service-dev project completely isolated from payments-service-prod, with different budgets, different access controls, and separate audit logs.
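One consequence of the hierarchy is that IAM grants flow downward: a role granted on the organisation or a folder also applies to every project beneath it. The sketch below models that inheritance in plain Python. The `Node` class, the folder name, and the member names are hypothetical and exist only for illustration; this is not how the GCP client libraries represent resources.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node in the hierarchy: an organisation, a folder, or a project."""
    name: str
    parent: Optional["Node"] = None
    # role -> members granted that role directly on this node
    bindings: dict = field(default_factory=dict)

    def grant(self, role: str, member: str) -> None:
        self.bindings.setdefault(role, set()).add(member)


def effective_roles(node: Node, member: str) -> set:
    """Union of the roles granted to `member` on the node and all its ancestors."""
    roles = set()
    current = node
    while current is not None:
        for role, members in current.bindings.items():
            if member in members:
                roles.add(role)
        current = current.parent
    return roles


# Hypothetical hierarchy: one organisation, one folder, two projects.
org = Node("example-org")
payments_folder = Node("payments", parent=org)
dev = Node("payments-service-dev", parent=payments_folder)
prod = Node("payments-service-prod", parent=payments_folder)

org.grant("roles/viewer", "group:auditors@example.com")
dev.grant("roles/run.developer", "user:dev@example.com")

if __name__ == "__main__":
    print(effective_roles(prod, "group:auditors@example.com"))
    print(effective_roles(prod, "user:dev@example.com"))
```

The auditors group can view both projects because the grant sits on the organisation, while the developer's role stays confined to the dev project. That confinement is the isolation the separate-projects approach buys you.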
Regions and zones handle physical location. A Region is a geographic area (e.g., us-central1 in Iowa). Each region contains multiple Zones (us-central1-a, us-central1-b, etc.) — these are independent data centres within that region. The rule of thumb: deploy across at least two zones for high availability, across multiple regions only if latency to global users or data sovereignty requires it. Cross-region data transfer costs money, so don't do it by default.
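The zone rule of thumb is easy to check mechanically, because a zone name is just its region plus a letter suffix. The helper below is a minimal sketch under that naming assumption; `region_of` and `placement_summary` are illustrative names, not part of any GCP library.

```python
def region_of(zone: str) -> str:
    """Derive the region from a zone name, e.g. 'us-central1-a' -> 'us-central1'."""
    region, _, suffix = zone.rpartition("-")
    if not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def placement_summary(zones: list) -> dict:
    """Summarise how a deployment is spread across zones and regions."""
    unique_zones = set(zones)
    regions = {region_of(z) for z in unique_zones}
    return {
        "zones": len(unique_zones),
        "regions": len(regions),
        "zone_redundant": len(unique_zones) >= 2,   # survives a single-zone outage
        "cross_region": len(regions) >= 2,          # implies cross-region transfer costs
    }


if __name__ == "__main__":
    print(placement_summary(["us-central1-a", "us-central1-b"]))
```

A deployment that reports `zone_redundant: True` and `cross_region: False` matches the default recommendation above: highly available without paying for cross-region traffic.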
Understanding this hierarchy is what separates developers who get surprised by a $4,000 bill from those who plan budgets accurately from day one.
gcp_project_setup.sh (Bash)
#!/bin/bash
# -----------------------------------------------------------
# GCP PROJECT SETUP SCRIPT
# Run this once to initialise a new GCP project correctly.
# Requires: gcloud CLI authenticated via `gcloud auth login`
# -----------------------------------------------------------

# Stop at the first failed command so later steps never run against a half-built project
set -euo pipefail

# Define project configuration as variables; never hardcode these inline
PROJECT_ID="payments-service-prod"          # Must be globally unique across all GCP
BILLING_ACCOUNT_ID="012345-ABCDEF-789GHI"   # Found in GCP Console > Billing
PRIMARY_REGION="us-central1"                # Closest region to your main user base
PRIMARY_ZONE="us-central1-a"                # Default zone within that region

# Step 1: Create the project
# --set-as-default means subsequent gcloud commands target this project automatically
gcloud projects create "${PROJECT_ID}" \
  --name="Payments Service Production" \
  --set-as-default
echo "Project '${PROJECT_ID}' created."

# Step 2: Link a billing account. Without this, most services won't activate
gcloud billing projects link "${PROJECT_ID}" \
  --billing-account="${BILLING_ACCOUNT_ID}"
echo "Billing account linked."

# Step 3: Set the default region and zone so you don't have to repeat --region/--zone
# on every command. This saves you from accidentally deploying to the wrong region.
gcloud config set compute/region "${PRIMARY_REGION}"
gcloud config set compute/zone "${PRIMARY_ZONE}"
echo "Default region set to ${PRIMARY_REGION}, zone to ${PRIMARY_ZONE}."

# Step 4: Enable only the APIs your project actually needs.
# GCP disables most APIs by default. This is a security feature, not a bug.
# Enabling unused APIs increases your attack surface for nothing.
# sqladmin.googleapis.com is the Cloud SQL Admin API.
gcloud services enable \
  compute.googleapis.com \
  container.googleapis.com \
  sqladmin.googleapis.com \
  storage.googleapis.com
echo "Core APIs enabled."

echo "Project initialisation complete. Run 'gcloud config list' to verify."
Output
Project 'payments-service-prod' created.
Billing account linked.
Default region set to us-central1, zone to us-central1-a.
Core APIs enabled.
Project initialisation complete. Run 'gcloud config list' to verify.
Watch Out: Project IDs Are Permanent
Once a GCP Project ID is created, it cannot be changed, ever. Deleting the project does not free the ID either: the project sits in a 30-day recovery window, and the ID cannot be reused for a new project afterwards. Settle on a naming convention like {team}-{service}-{env} (e.g., platform-auth-prod) before you run that create command.
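Because the ID is permanent, it is worth validating it before the create command ever runs. The following is a minimal sketch that builds an ID from the {team}-{service}-{env} convention and checks it against the documented format rules (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter, not ending with a hyphen). The function name and the allowed environment list are assumptions made for illustration.

```python
import re

# 1 leading letter + 4..28 middle characters + 1 trailing letter/digit = 6..30 chars
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")

# Assumption: the environments your organisation allows
ALLOWED_ENVS = {"dev", "staging", "prod"}


def build_project_id(team: str, service: str, env: str) -> str:
    """Build '{team}-{service}-{env}' and reject anything GCP would refuse."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    project_id = f"{team}-{service}-{env}".lower()
    if not PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(f"invalid project ID: {project_id!r}")
    return project_id


if __name__ == "__main__":
    print(build_project_id("platform", "auth", "prod"))
```

A format check like this cannot tell you whether the ID is already taken globally; only the create call can. It does catch the naming mistakes that are otherwise discovered after the ID is locked in.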
Production Insight
Treating a single project as the boundary for everything leads to sprawling costs and broken IAM.
Many teams run dev, staging, and prod side by side in one project. That is a mistake: only separate projects isolate billing and access.
Rule: one project per environment, one service account per service.
Key Takeaway
The project is your atomic unit of isolation.
Billing, IAM, and APIs are per-project.
Use separate projects for separate environments.
Project Hierarchy Decisions
If: Single service, no need for isolation → Use: One project is fine
If: Multiple environments (dev/staging/prod) → Use: Create separate projects per environment
If: Multi-team, multi-service → Use: Folders under an organisation node for logical grouping; each team gets its own project
GCP Compute Options: Choosing the Right Engine for Your Workload
GCP gives you five distinct ways to run code, and picking the wrong one is one of the most common — and expensive — mistakes teams make. They're not interchangeable; each is optimised for a specific shape of workload.
Compute Engine (GCE) is raw virtual machines. You control the OS, you manage patching, you configure networking. Use this when you're lifting-and-shifting an existing application that has specific OS dependencies, or when you need GPU access for ML training jobs. It's the most flexible and the most operational overhead.
Google Kubernetes Engine (GKE) is managed Kubernetes. GCP handles the control plane (the bit that schedules your containers) and you manage your node pools and workloads. This is the workhorse for microservices architectures — use it when you have multiple services that need independent scaling, resource isolation, and rolling deployments.
Cloud Run is serverless containers. You push a container image, GCP handles everything else — scaling from zero to thousands of instances, load balancing, HTTPS. No cluster to manage. Use this for stateless APIs and event-driven services where you want zero infrastructure management. It's phenomenally cost-efficient for variable traffic.
App Engine is the oldest PaaS on GCP — opinionated, language-specific runtimes. Mostly superseded by Cloud Run for new projects.
Cloud Functions is function-level serverless for event triggers. Use it for glue code: responding to a file upload, processing a Pub/Sub message, or running a webhook handler. Not suited for long-running or compute-heavy work.
Here's the thing: each tier has a hidden cost. GCE's sustained-use discounts only kick in once a VM has run for more than 25% of the month, and they don't apply to preemptible VMs. GKE charges a cluster management fee of $0.10 per hour (roughly $73/month, which the free tier offsets for one zonal or Autopilot cluster per billing account), and node costs add up fast: a three-node n1-standard-2 cluster costs about $200/month before any workload. Cloud Run's per-request billing means you pay nothing at idle, but cold starts can hit 3 seconds for JVM apps. Trade-offs everywhere.
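The GKE numbers are easier to reason about as arithmetic. The sketch below assumes a cluster management fee of $0.10 per hour (which lines up with the roughly $70/month idle minimum quoted earlier), a 730-hour month, and an on-demand n1-standard-2 price of $0.095 per hour. All three figures are assumptions for illustration; check the current pricing pages before budgeting.

```python
HOURS_PER_MONTH = 730  # common billing approximation for one month


def gke_monthly_cost(node_count: int,
                     node_hourly_usd: float,
                     cluster_fee_hourly_usd: float = 0.10) -> float:
    """Fixed monthly cost of a GKE cluster before any workload-specific charges."""
    hourly = cluster_fee_hourly_usd + node_count * node_hourly_usd
    return round(hourly * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    # Assumed on-demand price for one n1-standard-2 node
    n1_standard_2 = 0.095
    print("nodes only:        ", gke_monthly_cost(3, n1_standard_2, cluster_fee_hourly_usd=0.0))
    print("nodes + cluster fee:", gke_monthly_cost(3, n1_standard_2))
    print("empty cluster:      ", gke_monthly_cost(0, 0.0))
```

Under these assumptions the three nodes alone land near the $200/month mentioned above, and the cluster fee adds roughly another $73 on top. A Cloud Run service scaled to zero has no equivalent fixed cost.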
cloud_run_service.yaml (YAML)
# -----------------------------------------------------------
# CLOUD RUN SERVICE DEFINITION
# Deploys a containerised payments API to Cloud Run.
# Cloud Run auto-scales to zero when idle, so you pay nothing
# when your service isn't handling requests.
# Deploy with: gcloud run services replace cloud_run_service.yaml
# -----------------------------------------------------------
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: payments-api
  namespace: "123456789" # Your GCP Project Number (not Project ID)
  annotations:
    # Ingress controls WHERE requests may come from. "all" accepts traffic from
    # the internet; Cloud Run serves its run.app URL over HTTPS regardless.
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        # Scale down to zero instances when there are no requests
        # This is what makes Cloud Run cost-effective for variable traffic
        autoscaling.knative.dev/minScale: "0"
        # Cap at 10 instances to prevent runaway costs during a traffic spike
        autoscaling.knative.dev/maxScale: "10"
        # Run on the second-generation execution environment
        run.googleapis.com/execution-environment: gen2
    spec:
      # Each instance handles max 80 concurrent requests before a new one spins up
      containerConcurrency: 80
      # How long Cloud Run waits for a response before treating it as a timeout
      timeoutSeconds: 30
      containers:
        - image: gcr.io/payments-service-prod/payments-api:v2.1.0
          ports:
            - containerPort: 8080 # Port the container listens on (8080 is the Cloud Run default)
          # CPU and memory are per-instance limits
          resources:
            limits:
              cpu: "1" # 1 vCPU per instance
              memory: "512Mi" # 512MB RAM; right-size this based on profiling
          env:
            # Never hardcode secrets. Reference Secret Manager instead.
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: latest # Secret version to read
                  name: payments-db-password # Name of secret in Secret Manager
  traffic:
    # 100% of traffic goes to the latest revision
    # You can split traffic here for canary deployments (e.g., 90/10)
    - latestRevision: true
      percent: 100
Output
Deploying...
Setting IAM policy
Done.
Service [payments-api] revision [payments-api-00002-xyz] has been deployed and is serving 100 percent of traffic.
Service URL: https://payments-api-abcdef-uc.a.run.app
Pro Tip: Cloud Run Cold Starts
Setting minScale to 0 means you pay nothing at idle, but the first request after a period of inactivity hits a 'cold start' — typically 1-3 seconds for a JVM app, under 300ms for Go or Node. For latency-sensitive services (payments, auth), set minScale to 1. The cost is roughly $10-15/month for one always-warm instance — cheap insurance against SLA breaches.
Production Insight
Choosing the wrong compute tier is one of the most expensive mistakes teams make.
Cloud Run's scale-to-zero saves money for variable traffic but introduces cold start latency.
Rule: start with Cloud Run for stateless services; only move to GKE or GCE when Cloud Run limits constrain you.
Key Takeaway
Compute choice is about ops overhead vs control.
Cloud Run is the default for new stateless services.
GKE for multi-container apps needing control; GCE for legacy lift-and-shift or GPUs.
Compute Decision: Which Service to Use
If: Stateless API, unpredictable traffic, no GPU needed → Use: Cloud Run (scale-to-zero, per-request billing)
If: Multi-service microservices architecture, needs control over networking → Use: GKE (autoscaling, rolling updates, service mesh support)
If: Legacy app with specific OS dependencies or GPU/TPU required → Use: Compute Engine (full VM control, GPU support)
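The decision table above can be written down as a small function, which is handy for architecture reviews or onboarding docs. This is a sketch of the article's rules only. The `Workload` fields are hypothetical names, and sending stateful workloads to GKE is an assumption the table does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    stateless: bool
    needs_gpu_or_tpu: bool = False
    has_os_dependencies: bool = False
    needs_network_control: bool = False  # multi-service, custom networking, service mesh


def choose_compute(workload: Workload) -> str:
    """Map workload traits to a compute service, most restrictive rule first."""
    if workload.needs_gpu_or_tpu or workload.has_os_dependencies:
        return "Compute Engine"
    # Assumption: anything stateful or needing network control goes to GKE
    if workload.needs_network_control or not workload.stateless:
        return "GKE"
    return "Cloud Run"


if __name__ == "__main__":
    print(choose_compute(Workload(stateless=True)))
    print(choose_compute(Workload(stateless=True, needs_network_control=True)))
    print(choose_compute(Workload(stateless=False, needs_gpu_or_tpu=True)))
```

The ordering encodes the rule from the Production Insight: Cloud Run is the fall-through default, and you only leave it when a concrete constraint forces you to.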
Storage on GCP: Matching the Data Shape to the Right Service
Nothing reveals a GCP beginner faster than seeing them store relational data in Cloud Storage or put time-series metrics into Cloud SQL. GCP has six distinct storage services and each one is engineered for a specific data access pattern. Using the wrong one doesn't just waste money — it actively degrades performance.
Cloud Storage (GCS) is object storage — think S3. Binary blobs, static assets, backups, data lake files. Infinitely scalable, globally accessible, extremely cheap. Access pattern: write once, read many, no updates to individual fields.
Cloud SQL is managed relational databases — PostgreSQL, MySQL, or SQL Server. Handles backups, failover, and patching. Use it when you have structured data with relationships and your team already thinks in SQL. Scales vertically (bigger machine) with read replicas for horizontal read scaling.
Cloud Spanner is the exotic one: a globally distributed, horizontally scalable relational database with ACID transactions. It's what powers Google's own financial systems. Use it when a single Cloud SQL instance's storage ceiling (64 TB) isn't enough or when you need active-active multi-region writes. The price point reflects its power: about 20x Cloud SQL.
Firestore is a serverless NoSQL document database, optimised for mobile and web clients with real-time sync built in. Excellent for user profiles, session data, and content that's hierarchical and document-shaped.
Bigtable is a managed wide-column NoSQL store, designed for petabyte-scale time-series, IoT, and financial data with millisecond latency at massive scale. Not a general-purpose database.
Memorystore is managed Redis or Memcached — in-memory caching layer for your hot data.
One more thing: GCS storage classes (Standard, Nearline, Coldline, Archive) let you save 60-90% by picking the right access frequency. Access a Coldline object once? That retrieval costs more than storing it for a month. Pick storage class based on real access patterns, not on what feels right.
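To make the storage-class trade-off concrete, here is a minimal cost sketch. The per-GB prices are illustrative assumptions roughly in line with US single-region list prices, and the model ignores operation charges and minimum storage durations, so treat the output as a way to compare shapes of usage rather than as a quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassPrice:
    storage_per_gb_month: float  # USD per GB stored per month
    retrieval_per_gb: float      # USD per GB read back


# Assumed, illustrative prices. Check the current GCS pricing page.
PRICES = {
    "standard": ClassPrice(0.020, 0.00),
    "nearline": ClassPrice(0.010, 0.01),
    "coldline": ClassPrice(0.004, 0.02),
    "archive": ClassPrice(0.0012, 0.05),
}


def monthly_cost(storage_class: str, stored_gb: float, read_gb_per_month: float) -> float:
    price = PRICES[storage_class]
    return stored_gb * price.storage_per_gb_month + read_gb_per_month * price.retrieval_per_gb


def cheapest_class(stored_gb: float, read_gb_per_month: float) -> str:
    return min(PRICES, key=lambda c: monthly_cost(c, stored_gb, read_gb_per_month))


if __name__ == "__main__":
    for reads in (0, 100, 2000):
        print(f"1000 GB stored, {reads} GB read/month ->", cheapest_class(1000, reads))
```

With these assumed prices, the cheapest class shifts from Archive to Coldline to Standard as monthly reads grow, which is the point of the paragraph above: the access pattern decides the class, not the size of the data.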
gcs_upload_and_signed_url.py (Python)
# -----------------------------------------------------------
# GCS OBJECT UPLOAD + SIGNED URL GENERATION
# Real-world pattern: a user uploads a profile photo.
# We store it privately in GCS, then generate a short-lived
# signed URL so the frontend can display it without making
# the bucket publicly readable (a major security mistake).
#
# Install dependencies: pip install google-cloud-storage
# Auth: set GOOGLE_APPLICATION_CREDENTIALS env var to your
# service account key JSON path, or use Workload Identity.
# -----------------------------------------------------------
import datetime
from pathlib import Path

from google.cloud import storage

GCP_PROJECT_ID = "payments-service-prod"
PRIVATE_BUCKET_NAME = "user-profile-photos-prod"  # This bucket is NOT public
SIGNED_URL_EXPIRY_MINUTES = 15  # Short expiry limits blast radius if the URL leaks


def upload_user_profile_photo(
    user_id: str,
    local_file_path: Path,
    content_type: str = "image/jpeg",
) -> str:
    """
    Uploads a profile photo to GCS and returns a signed URL
    the frontend can use to display it for the next 15 minutes.
    Returns the signed URL string.
    """
    storage_client = storage.Client(project=GCP_PROJECT_ID)
    bucket = storage_client.bucket(PRIVATE_BUCKET_NAME)

    # Build a deterministic object path: makes it easy to find later
    # and naturally organises objects by user without needing folders
    object_name = f"users/{user_id}/profile/avatar.jpg"
    blob = bucket.blob(object_name)

    # Set content type so browsers render it correctly, not download it
    blob.content_type = content_type

    # Upload the file. This overwrites any existing photo for this user
    blob.upload_from_filename(str(local_file_path))
    print(f"Uploaded '{local_file_path}' to gs://{PRIVATE_BUCKET_NAME}/{object_name}")

    # Generate a V4 signed URL: time-limited, cryptographically signed
    # by our service account. The bucket stays private; only holders of
    # this URL can access the object, and only until it expires.
    # Note: signing needs credentials that can sign, such as a service account key.
    signed_url = blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=SIGNED_URL_EXPIRY_MINUTES),
        method="GET",  # Read-only access
    )
    print(f"Signed URL (valid {SIGNED_URL_EXPIRY_MINUTES} mins): {signed_url[:80]}...")
    return signed_url


if __name__ == "__main__":
    # Simulate uploading a photo for user ID 'usr_8821'
    sample_photo_path = Path("/tmp/avatar_upload.jpg")
    # In production this file comes from a multipart form upload
    sample_photo_path.write_bytes(b"<fake-jpeg-bytes-for-demo>")

    url = upload_user_profile_photo(
        user_id="usr_8821",
        local_file_path=sample_photo_path,
    )
    print(f"\nFrontend should use this URL to render the avatar: {url[:60]}...")
Output
Uploaded '/tmp/avatar_upload.jpg' to gs://user-profile-photos-prod/users/usr_8821/profile/avatar.jpg
Signed URL (valid 15 mins): https://storage.googleapis.com/user-profile-photos-prod/users/usr_8821/profile/a...

Frontend should use this URL to render the avatar: https://storage.googleapis.com/user-profile-photos-prod/user...
Watch Out: Never Make a Storage Bucket Containing PII Public
GCS recognises a special 'allUsers' IAM principal. Granting it a role such as roles/storage.objectViewer makes an entire bucket readable by the whole internet, with no authentication. It's convenient for hosting static assets, but it has caused real data breaches when teams accidentally applied it to buckets containing user data. Use signed URLs as shown above: they give time-limited, auditable access without ever opening the bucket publicly.
Production Insight
Using Cloud Storage for relational data or Cloud SQL for blob data wastes money and performance.
A common trap: storing JSON blobs in Cloud SQL when Firestore or GCS would be cheaper and faster.
Rule: analyse your data access pattern before picking a storage service.
GCP IAM and Networking: The Security Layer You Can't Skip
Here's the uncomfortable truth: most cloud security incidents aren't caused by sophisticated attacks. They're caused by over-permissioned service accounts, open firewall rules, and credentials hardcoded into source code. GCP's IAM and VPC model exist specifically to prevent this — but only if you use them intentionally.
IAM (Identity and Access Management) in GCP follows the principle of least privilege. Every service account, user, and group gets only the permissions it needs — nothing more. Roles are either predefined (like roles/storage.objectViewer) or custom. The most dangerous role is roles/editor on a project — it's temptingly broad and you'll see it everywhere in tutorials. Never use it in production.
Workload Identity is the right way for GKE workloads to authenticate to GCP APIs. Instead of downloading a service account key JSON file (a long-lived credential that can be stolen), Workload Identity binds a Kubernetes service account to a GCP service account. The credential is ephemeral and automatically rotated. If you're using key files in a Kubernetes cluster, stop — switch to Workload Identity.
VPC (Virtual Private Cloud) is your private network inside GCP. By default, GCP creates a 'default' VPC with permissive firewall rules. For anything production, create a custom VPC with explicit subnets per region, and firewall rules that deny all ingress by default and allow only what you specify. Use Private Google Access on subnets so VMs can reach GCP APIs without needing a public IP.
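A quick way to enforce the no-roles/editor rule is to lint the project's IAM policy. The sketch below works on the JSON structure that `gcloud projects get-iam-policy PROJECT_ID --format=json` returns (a `bindings` list of `role` plus `members`). The function name and the sample policy are hypothetical.

```python
# Basic roles that should not be held by service accounts in production
BASIC_ROLES = {"roles/owner", "roles/editor"}


def risky_service_account_bindings(policy: dict) -> list:
    """Return (role, member) pairs where a service account holds a basic role."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        if role not in BASIC_ROLES:
            continue
        for member in binding.get("members", []):
            if member.startswith("serviceAccount:"):
                findings.append((role, member))
    return findings


# Hypothetical policy, shaped like the gcloud JSON output
SAMPLE_POLICY = {
    "bindings": [
        {"role": "roles/editor",
         "members": ["serviceAccount:legacy-app@payments-service-prod.iam.gserviceaccount.com",
                     "user:admin@example.com"]},
        {"role": "roles/pubsub.publisher",
         "members": ["serviceAccount:payments-api@payments-service-prod.iam.gserviceaccount.com"]},
    ]
}

if __name__ == "__main__":
    for role, member in risky_service_account_bindings(SAMPLE_POLICY):
        print(f"OVER-PERMISSIONED: {member} holds {role}")
```

Running a check like this in CI is one way to catch an over-permissioned service account before it reaches production. It only inspects project-level bindings, so grants made on folders or the organisation would need the same treatment.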
gcp_iam_least_privilege_setup.sh (Bash)
#!/bin/bash
# -----------------------------------------------------------
# GCP IAM LEAST-PRIVILEGE SETUP
# Creates a service account for a Cloud Run payments service
# with ONLY the permissions it actually needs:
# - Read secrets from Secret Manager
# - Write to a specific Cloud Storage bucket
# - Publish to a specific Pub/Sub topic
# Nothing else. This limits blast radius if the service is compromised.
# -----------------------------------------------------------

# Stop at the first failed command so a partial setup is never reported as complete
set -euo pipefail

PROJECT_ID="payments-service-prod"
SERVICE_NAME="payments-api"

# Step 1: Create a dedicated service account for this service
# One service account per service; never share service accounts
SERVICE_ACCOUNT_EMAIL="${SERVICE_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
gcloud iam service-accounts create "${SERVICE_NAME}" \
  --project="${PROJECT_ID}" \
  --display-name="Payments API Service Account" \
  --description="Identity for the payments-api Cloud Run service. Least-privilege access only."
echo "Service account created: ${SERVICE_ACCOUNT_EMAIL}"

# Step 2: Grant permission to read secrets from Secret Manager
# This is scoped to the PROJECT level. Ideally scope it to individual secrets
# using resource-level IAM for even finer control
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/secretmanager.secretAccessor"
echo "Secret Manager access granted."

# Step 3: Grant permission to write objects to a specific bucket ONLY
# Note: roles/storage.objectCreator is narrower than roles/storage.objectAdmin
# objectCreator can write new objects but cannot delete or overwrite existing ones
TARGET_BUCKET="gs://payments-receipts-prod"
gcloud storage buckets add-iam-policy-binding "${TARGET_BUCKET}" \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/storage.objectCreator"
echo "Storage write access granted to ${TARGET_BUCKET} only."

# Step 4: Grant Pub/Sub publish permission on one specific topic
TARGET_TOPIC="payment-completed-events"
gcloud pubsub topics add-iam-policy-binding "${TARGET_TOPIC}" \
  --project="${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/pubsub.publisher"
echo "Pub/Sub publish access granted to ${TARGET_TOPIC} topic."

# Step 5: Attach this service account to the Cloud Run service
# The service now authenticates as this SA automatically, with no key files needed
gcloud run services update "${SERVICE_NAME}" \
  --project="${PROJECT_ID}" \
  --region="us-central1" \
  --service-account="${SERVICE_ACCOUNT_EMAIL}"
echo "Service account attached to Cloud Run service."

echo "IAM setup complete. This service account has NO other GCP permissions."
Output
Service account created: payments-api@payments-service-prod.iam.gserviceaccount.com
Secret Manager access granted.
Storage write access granted to gs://payments-receipts-prod only.
Pub/Sub publish access granted to payment-completed-events topic.
Service account attached to Cloud Run service.
IAM setup complete. This service account has NO other GCP permissions.
Interview Gold: Why Not Just Use roles/editor?
roles/editor grants write access to almost every GCP resource in the project — including the ability to read secrets, exfiltrate data, and create new compute resources. If a service with this role is compromised, the attacker has near-full control of your GCP project. Interviewers love asking how you'd scope permissions for a specific service. The answer is always: one dedicated service account, one role per permission needed, no editor/owner roles in production.
Production Insight
The most common security incident in GCP is an over-permissioned service account.
A single roles/editor binding on a project lets an attacker control everything.
Rule: use custom roles or least-privilege predefined roles; never use roles/editor in production.
Key Takeaway
IAM is not an afterthought — design least-privilege before deploying.
One SA per service, one role per need.
Use Workload Identity for GKE to avoid key files.
IAM Strategy Decisions
If: Service needs to read from Cloud SQL → Use: Grant roles/cloudsql.client on the instance or project
If: Service needs to publish to Pub/Sub → Use: Grant roles/pubsub.publisher on the specific topic
If: Service needs to write to a specific GCS bucket → Use: Grant roles/storage.objectCreator on the bucket
GCP Networking: VPCs, Firewalls, and Connectivity
GCP's networking model is built around Virtual Private Clouds (VPCs). A VPC is a global isolated network that spans all regions. Within it, you define subnets per region, each with a private IP range. By default, GCP creates a 'default' VPC with permissive firewall rules — convenient for prototyping but dangerous for production. Always create a custom VPC for production workloads.
Subnets are regional IP ranges (e.g., 10.0.0.0/20 in us-central1). Sharing a subnet does not exempt traffic from firewalling: in a custom VPC, even VMs in the same subnet need an explicit allow rule to reach each other (the default network only appears open because it ships with a default-allow-internal rule). Firewall rules are stateful. With no rules defined, all ingress is denied and all egress is allowed. Rule order doesn't matter; priority does, and the lowest number wins. Private Google Access lets VMs without external IPs reach Google APIs via Google's internal network. Cloud NAT gives VMs without an external IP outbound access to the internet. VPC Peering connects two VPCs so they can communicate using internal IPs, which is common for multi-project setups. Shared VPC centralises network management: a host project shares its VPC with service projects.
Best practice: start with a custom VPC, define subnets for each tier (frontend, backend, data), apply firewall rules that deny all ingress except on specific ports from specific source ranges, and use Private Google Access for all API calls.
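The "priority, not order" behaviour is easier to see in code. The sketch below is a simplified model of ingress evaluation: the matching rule with the lowest priority number wins, a deny beats an allow at equal priority, and traffic that matches nothing is dropped by the implied deny. The `IngressRule` type is hypothetical and only models a single TCP port per rule; real rules also match on targets, protocols, and port ranges.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    name: str
    priority: int       # 0-65535, lower number = higher priority
    action: str         # "allow" or "deny"
    source_range: str   # CIDR block
    port: int           # single TCP port (simplification)


def evaluate_ingress(rules: list, source_ip: str, port: int) -> str:
    """Return 'allow' or 'deny' for a packet, mimicking priority-based evaluation."""
    ip = ipaddress.ip_address(source_ip)
    matching = [
        r for r in rules
        if r.port == port and ip in ipaddress.ip_network(r.source_range)
    ]
    if not matching:
        return "deny"  # implied deny-all ingress
    # Lowest priority number wins; on a tie, deny sorts before allow
    winner = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return winner.action


# Hypothetical rule set: SSH only from an office range
RULES = [
    IngressRule("deny-ssh-everywhere", 2000, "deny", "0.0.0.0/0", 22),
    IngressRule("allow-ssh-office", 1000, "allow", "203.0.113.0/24", 22),
]

if __name__ == "__main__":
    print(evaluate_ingress(RULES, "203.0.113.7", 22))
    print(evaluate_ingress(RULES, "198.51.100.9", 22))
    print(evaluate_ingress(RULES, "203.0.113.7", 443))
```

Note that the deny rule is listed first but still loses to the office allow rule, because 1000 outranks 2000. Port 443 is denied without any explicit rule, which is the deny-by-default posture the best practice above relies on.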
GCP's default VPC has an ingress rule allowing SSH (tcp:22) and RDP (tcp:3389) from any IP (0.0.0.0/0). If you deploy a VM with an external IP, it's accessible from the internet within minutes. Always create a custom VPC with strict firewall rules.
Production Insight
Using the default VPC in production often leaves SSH and RDP open on all instances.
Attackers scan GCP IP ranges and find exposed instances within hours.
Rule: always create a custom VPC with ingress firewall rules that only allow your specific IP ranges.
Key Takeaway
Custom VPCs are mandatory for production.
Default VPC is too permissive.
Use Private Google Access to keep VMs off the internet.
VPC Design Decisions
If: Single project, small app → Use: Custom VPC with a single subnet per region is fine
If: Multiple services with different security tiers → Use: Multiple subnets with strict firewall rules between them
If: Multi-project organisation → Use: Shared VPC for centralised networking; use VPC peering for isolated projects
Production incident · Post-mortem · Severity: high
Data Exposure via Public Bucket
Symptom
A security scanner flagged the bucket as publicly accessible. Later analysis showed automated scrapers had downloaded the data.
Assumption
The team believed that 'allUsers' only applied to authenticated Google users, not the entire internet.
Root cause
The IAM binding roles/storage.objectViewer for allUsers made all objects readable without any authentication.
Fix
Immediately removed the allUsers binding using gcloud storage buckets remove-iam-policy-binding gs://BUCKET_NAME --member=allUsers --role=roles/storage.objectViewer. Then rotated any exposed secrets and rotated the bucket's default KMS key. Migrated to signed URLs for temporary access.
Key lesson
Never grant allUsers access to any bucket that contains sensitive data. Use pre-signed URLs for time-limited access.
Audit bucket IAM bindings regularly with Cloud Asset Inventory.
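Enable Data Access audit logs on sensitive buckets so reads can be traced after an exposure. Object Versioning and retention policies protect against accidental deletion or overwrite, not against public reads.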
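The regular audit recommended above can start as something very small. The sketch below scans a bucket IAM policy, in the JSON shape returned by `gcloud storage buckets get-iam-policy gs://BUCKET_NAME --format=json`, for the two public principals. The function name and sample policy are hypothetical, and a full audit would run this across every bucket rather than one.

```python
# Principals that open a resource to people outside your organisation
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def public_roles(bucket_policy: dict) -> dict:
    """Map each role to the public principals that hold it (empty dict = not public)."""
    exposed = {}
    for binding in bucket_policy.get("bindings", []):
        hits = sorted(PUBLIC_PRINCIPALS.intersection(binding.get("members", [])))
        if hits:
            exposed[binding["role"]] = hits
    return exposed


# Hypothetical policy reproducing the incident's root cause
LEAKY_POLICY = {
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
        {"role": "roles/storage.objectCreator",
         "members": ["serviceAccount:payments-api@payments-service-prod.iam.gserviceaccount.com"]},
    ]
}

if __name__ == "__main__":
    print(public_roles(LEAKY_POLICY))
```

The check also flags `allAuthenticatedUsers`, since that principal covers anyone with a Google account and is the one the team in this incident confused `allUsers` with.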
Production debug guide: Symptom → Action guide for the most common GCP issues (5 entries)
Symptom · 01
gcloud command fails with 'Permission denied'
→
Fix
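Confirm which identity is active with gcloud auth list. For the gcloud CLI, re-authenticate with gcloud auth login; for client libraries, run gcloud auth application-default login or set GOOGLE_APPLICATION_CREDENTIALS. Then verify that identity has the required role with gcloud projects get-iam-policy PROJECT_ID.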
Symptom · 02
Can't reach a GCE instance via external IP
→
Fix
Check firewall rules: gcloud compute firewall-rules list --filter=network=default. Ensure an ingress rule allows traffic on the required port from your IP.
Symptom · 03
Cloud Run service returns 403
→
Fix
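Verify that the caller (the user or the calling service's service account) has roles/run.invoker on the service; the service's own runtime service account is not what gets checked. Inspect the policy with gcloud run services get-iam-policy SERVICE_NAME --region=REGION.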
Symptom · 04
GKE pod cannot connect to Cloud SQL
→
Fix
Check the Pod's service account has roles/cloudsql.client. Use Workload Identity mapping. Verify VPC peering or Private Services Access is configured if using Private IP.
Symptom · 05
Billing is unexpectedly high
→
Fix
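Open Billing > Reports and the Cost table in the GCP Console and group by project, service, and SKU to find what changed. Check for forgotten VMs with gcloud compute instances list, label resources so costs can be attributed, and set budget alerts on the billing account so the next spike triggers a notification.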
GCP CLI Debug Cheat Sheet: quick commands to identify and fix common GCP issues in under 60 seconds.
Authentication failure
Immediate action
Re-authenticate with `gcloud auth login`
Commands
gcloud auth login
gcloud config list account
Fix now
Also check GOOGLE_APPLICATION_CREDENTIALS env var is set correctly.
1
The Project is the atomic unit of isolation
Billing, IAM, and APIs are all scoped per project. Use separate projects for dev/staging/prod rather than sharing one project across environments.
2
The compute decision (GCE vs GKE vs Cloud Run) is really a decision about how much operational ownership you want
more control always means more operational overhead. Cloud Run is the default choice for new stateless services unless you have a specific reason not to use it.
3
The right storage service is determined entirely by your data access pattern
Cloud Storage for blobs, Cloud SQL for relational data up to its 64 TB per-instance ceiling, Spanner for globally distributed relational data, Firestore for document-shaped hierarchical data. Mixing these up costs money and performance.
4
IAM is not an afterthought
set up least-privilege service accounts before you deploy your first service. The cost of retrofitting permissions on a live system is far higher than getting it right during initial setup.
5
Custom VPCs are mandatory for production. Default VPC is too permissive. Use Private Google Access to keep VMs off the internet.
Common mistakes to avoid
Mistake · 01
Enabling allUsers IAM on a GCS bucket containing user data
Symptom
All objects in the bucket are publicly readable on the internet, often discovered via a security scanner or a data breach report.
Fix
Remove the allUsers binding immediately using gcloud storage buckets remove-iam-policy-binding gs://BUCKET_NAME --member=allUsers --role=roles/storage.objectViewer. Audit Cloud Audit Logs to check which objects were accessed. Switch to signed URLs for any temporary public access.
Mistake · 02
Using a single service account with roles/editor for every service in a project
Symptom
If one service is compromised or a key file is leaked, an attacker gains near-full write access to the entire GCP project including secrets, databases, and compute.
Fix
Create one service account per service, grant only the specific predefined roles required (e.g., roles/pubsub.publisher, not roles/pubsub.admin), and use Workload Identity for GKE instead of key files.
Mistake · 03
Deploying all resources to a single zone without high-availability consideration
Symptom
A GCP zonal outage (like the 2021 us-central1-b incident) takes down your entire application, violating your SLA.
Fix
For Compute Engine, use Managed Instance Groups (MIGs) spread across multiple zones. For Cloud SQL, enable High Availability to provision a standby instance in a different zone. For GKE, use a regional cluster, or pass --node-locations to spread a node pool across zones (--num-nodes then sets the node count per zone).
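The first two mistakes above can be caught mechanically before they reach production. The sketch below scans an IAM policy in the JSON shape printed by `gcloud projects get-iam-policy PROJECT_ID --format=json` (bucket policies use the same `bindings` layout). The function name, the finding labels, and every account name in the sample are hypothetical; treat it as a starting point rather than a complete audit.

```python
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}
BASIC_ROLES = {"roles/owner", "roles/editor"}


def audit_policy(policy: dict) -> list[str]:
    """Return findings for public bindings and service accounts with basic roles."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        for member in binding.get("members", []):
            if member in PUBLIC_MEMBERS:
                findings.append(f"PUBLIC: {member} has {role}")
            elif role in BASIC_ROLES and member.startswith("serviceAccount:"):
                findings.append(f"BROAD: {member} has {role}")
    return findings


# Hypothetical policy used only to illustrate the output format.
SAMPLE_POLICY = {
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
        {
            "role": "roles/editor",
            "members": [
                "serviceAccount:app@example-project.iam.gserviceaccount.com",
                "user:dev@example.com",
            ],
        },
        {
            "role": "roles/pubsub.publisher",
            "members": [
                "serviceAccount:payments@example-project.iam.gserviceaccount.com"
            ],
        },
    ]
}
```

Running `audit_policy(SAMPLE_POLICY)` flags the `allUsers` viewer binding and the service account holding `roles/editor`, and leaves the narrowly scoped publisher binding alone.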
Interview Questions on This Topic
Q01 of 03 · SENIOR
You're building a payments microservice that needs to read from Cloud SQL and publish to Pub/Sub. Walk me through how you'd set up IAM for it in production — and specifically, what would you NOT do that junior engineers typically get wrong?
ANSWER
First, create a dedicated service account for the payments microservice, with no shared accounts. Grant roles/cloudsql.client for connecting to Cloud SQL and roles/pubsub.publisher for publishing. For Cloud SQL, connect through the Cloud SQL Auth Proxy or over private IP, and give the service its own database-level credentials, since roles/cloudsql.client only permits the connection. What I would NOT do: use roles/editor or roles/cloudsql.admin. Those grant far too much access, including the ability to drop databases or delete instances. Also avoid embedding a long-lived service account key file in the container; use Workload Identity (if on GKE) or attach the SA to the Cloud Run service directly.
Q02 of 03 · SENIOR
A product manager tells you traffic to your API is unpredictable — quiet for hours, then spikes to 10,000 requests per minute around lunchtime. Which GCP compute option would you choose and why? What are the trade-offs of your choice?
ANSWER
I'd choose Cloud Run. It scales from zero to many instances within seconds, and billing follows request handling rather than idle capacity, which matches unpredictable traffic well. Trade-offs: cold starts (often a few seconds for a JVM service) can slow the first request after idle. To mitigate, set minimum instances to 1 (roughly $10/month, depending on CPU and memory settings). Cloud Run also caps request timeouts at 3600 seconds (the default is 300), so if the API does longer-running work, you might need GKE. But for a typical read-from-SQL-and-publish-to-PubSub flow, Cloud Run is ideal.
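To put the 10,000 requests per minute in perspective, a back-of-the-envelope estimate helps. The sketch below applies Little's law (requests in flight = arrival rate × average latency) and divides by per-instance concurrency. The 0.2 s latency is an assumed figure for illustration, and 80 is Cloud Run's default concurrency setting; real autoscaling also reacts to CPU utilization, so treat the result as a rough lower bound.

```python
import math


def instances_needed(requests_per_minute: float,
                     avg_latency_s: float,
                     concurrency: int) -> int:
    """Estimate instances required to hold the requests that are in flight."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Little's law: in-flight requests = arrival rate (per second) * latency.
    in_flight = (requests_per_minute / 60.0) * avg_latency_s
    return math.ceil(in_flight / concurrency)


# Lunchtime spike from the question, with an assumed 0.2 s average latency.
spike_default = instances_needed(10_000, 0.2, concurrency=80)
spike_single = instances_needed(10_000, 0.2, concurrency=1)
idle = instances_needed(0, 0.2, concurrency=80)
```

Under these assumptions the spike fits on a single instance at the default concurrency and needs 34 instances if each one handles a single request at a time, while the idle case comes out at zero, which is the scale-to-zero behavior the answer relies on.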
Q03 of 03 · SENIOR
Your team is moving from a monolith to microservices on GCP. How do you decide between GKE and Cloud Run for each service? What signals in a service's requirements push you toward one versus the other?
ANSWER
Key signal: statefulness. If a service is stateless (no local disk, no sticky sessions) and can be containerized, Cloud Run is the default, with minimal ops overhead. If the service needs hardware Cloud Run does not offer (for example, specific GPU configurations), long timeouts (>1h), or low-level networking (e.g. eBPF), GKE is necessary. Also consider team expertise: if your team knows Kubernetes, GKE gives more control; if not, Cloud Run hides the cluster entirely. Decision: for roughly 80% of microservices (REST APIs, event handlers, webhooks), start with Cloud Run. Use GKE only when Cloud Run limits force you out.
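The signals in this answer can be written down as a checklist that returns both a recommendation and the reasons behind it. This is a sketch of the reasoning, not a rule from Google: `ServiceProfile` and its fields are hypothetical, and it covers only statelessness, request duration, and low-level networking. Team expertise and hardware needs are left to human judgment.

```python
from dataclasses import dataclass

CLOUD_RUN_MAX_TIMEOUT_S = 3600  # request timeout ceiling cited in the answers above


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical summary of one microservice's requirements."""
    stateless: bool
    max_request_seconds: int
    needs_low_level_networking: bool = False


def choose_compute(profile: ServiceProfile) -> tuple[str, list[str]]:
    """Default to Cloud Run; move to GKE only when a listed limit is hit."""
    reasons = []
    if not profile.stateless:
        reasons.append("needs local disk or sticky sessions")
    if profile.max_request_seconds > CLOUD_RUN_MAX_TIMEOUT_S:
        reasons.append("requests run longer than one hour")
    if profile.needs_low_level_networking:
        reasons.append("needs low-level networking such as eBPF")
    if reasons:
        return "GKE", reasons
    return "Cloud Run", ["no Cloud Run limit is hit"]
```

Returning the reasons alongside the choice keeps the decision reviewable: a webhook handler with 30-second requests stays on Cloud Run, while a stateful service comes back as GKE with the blocking requirement named.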
FAQ · 1 QUESTION
Frequently Asked Questions
01
Is Google Cloud Platform better than AWS for beginners?
Neither is objectively better — they solve the same problems with different UX and pricing models. GCP tends to have a cleaner CLI (gcloud) and more opinionated managed services like Cloud Run and BigQuery that reduce setup time. AWS has a larger ecosystem and more third-party integrations. For net-new projects without existing cloud investments, GCP's Cloud Run and managed Kubernetes are genuinely excellent starting points, and GCP's free tier is generous enough to learn without a credit card charge.