Senior 3 min · March 09, 2026

GCP — Service Account Editor Deletes Production DB

A CI/CD service account holding the Editor role, run against a misconfigured gcloud config, deleted the production Cloud SQL instance 'prod-db'.

Naren · Founder
Quick Answer
  • GCP is a cloud platform built on Google's internal infrastructure, optimized for data and containers
  • Core hierarchy: Organization → Folders → Projects → Resources — drives billing and IAM inheritance
  • Primary compute options: Compute Engine (VMs), GKE (Kubernetes), Cloud Run (serverless containers)
  • Global network: 35+ regions and 100+ zones linked by private fiber; distant users can still see roughly 30ms or more of added latency compared with a nearby on-prem site, so pick regions close to them
  • Production trap: Default VPC with open firewall rules can expose services; always create custom VPCs
  • Biggest mistake: Granting basic roles (Owner/Editor, formerly called primitive roles) instead of predefined roles, which violates least privilege
Plain-English First

Think of Google Cloud Platform as a giant, high-tech utility company for your digital ideas. Just like you plug a lamp into a wall to get electricity without building a power plant, GCP lets you 'plug in' your website or app to use Google's massive network of supercomputers. You don't have to buy the hardware; you just pay for the amount of 'power' you use, allowing you to scale from a small garage project to a global service overnight.

Google Cloud Platform (GCP) is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search and YouTube. In the modern DevOps landscape, GCP stands out for two things: container orchestration (Kubernetes originated at Google) and planet-scale data processing.

In this guide, we'll break down exactly what GCP is, why it was designed to prioritize data and containerization, and how to navigate its core hierarchy to manage projects correctly. We will explore the shift from managing physical 'boxes' to managing software-defined ecosystems.

By the end, you'll have both the conceptual understanding and practical CLI examples to start deploying resources on Google Cloud with confidence.

The GCP Resource Hierarchy: Organization to Resources

GCP exists to solve the problem of infrastructure management at global scale. Alongside virtual machines, Google put early emphasis on high-level managed services, Kubernetes (which it originally created), and advanced data analytics. GCP is structured around a strict resource hierarchy: Organization > Folders > Projects > Resources. This hierarchy is the backbone of governance: IAM and organization policies set on a parent are inherited by everything beneath it, and each project is linked to a billing account so that costs can be grouped and reported along the same structure. As a result, permissions and cost centers can be managed granularly across massive enterprise teams without losing centralized control.

io/thecodeforge/gcp/GCPInit.sh (Bash)
# io.thecodeforge: Initializing the Google Cloud SDK and project environment

# 1. Authenticate with Google Cloud securely
gcloud auth login

# 2. Create a new project for TheCodeForge development
# Projects are the primary grouping for billing and APIs
gcloud projects create thecodeforge-dev-2026 --name="Forge Dev Project"

# 3. Set the project as your current active context
gcloud config set project thecodeforge-dev-2026

# 4. Enable core APIs required for common DevOps workflows
gcloud services enable compute.googleapis.com container.googleapis.com bigquery.googleapis.com
Output
Project [thecodeforge-dev-2026] created successfully.
Updated property [core/project].
Operation finished successfully. Services [compute.googleapis.com, container.googleapis.com, bigquery.googleapis.com] are enabled.
Key Insight:
The most important thing to understand about GCP is its Project-centric model. Every resource you create must belong to a Project. This allows for isolated billing and environment-specific security policies.
Production Insight
A common production failure occurs when teams skip Folder creation and put all projects under the Organization node.
Without Folders, IAM policies become tangled — a developer role meant for one team leaks to another.
Rule: always create at least one Folder per environment (dev, staging, prod) to enforce boundary at scale.
Key Takeaway
Resource hierarchy defines IAM and billing inheritance.
Projects are the atomic unit of resource grouping.
Folders prevent policy spaghetti across teams.
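
The one-folder-per-environment rule can be sketched as a small dry-run helper. It only prints the `gcloud resource-manager folders create` commands so they can be reviewed before anyone runs them. The organization ID `123456789012` and the function name `plan_env_folders` are placeholders for illustration, not values from this guide.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) one folder-creation command per environment.
# The organization ID passed below is a made-up placeholder.

plan_env_folders() {
  local org_id="$1"
  shift
  local env
  for env in "$@"; do
    printf 'gcloud resource-manager folders create --display-name=%s --organization=%s\n' \
      "$env" "$org_id"
  done
}

plan_env_folders 123456789012 dev staging prod
```

Printing the plan first keeps a human in the loop; once it looks right, the same lines can be pasted into a shell that is authenticated against the organization.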

Identity and Access Management (IAM): Security at the Core

When starting with GCP, most developers hit the same set of gotchas regarding Identity and Access Management (IAM) and networking. A common mistake is granting the 'Primitive Roles' (Owner, Editor, Viewer; Google now calls them basic roles) at the project level. Each one bundles thousands of permissions across every service, which violates the Principle of Least Privilege. Instead, use 'Predefined Roles' that grant access only to specific services like Cloud Storage or BigQuery. On the networking side, Google's global network allows for 'Global VPCs,' meaning your internal traffic can traverse Google's private fiber across continents without ever hitting the public internet.

io/thecodeforge/gcp/IAMBestPractices.sh (Bash)
# io.thecodeforge: Granting narrow permissions instead of project-wide access

# BANNED: Granting Editor role (Violation of Least Privilege)
# gcloud projects add-iam-policy-binding thecodeforge-dev-2026 --member="user:dev@example.com" --role="roles/editor"

# RECOMMENDED: Granting specific read-only access to Cloud Storage objects
gcloud projects add-iam-policy-binding thecodeforge-dev-2026 \
    --member="user:dev@thecodeforge.io" \
    --role="roles/storage.objectViewer"

# PRODUCTION STEP: Create a specific service account for an application
gcloud iam service-accounts create forge-app-sa \
    --display-name="TheCodeForge App Service Account"
Output
Updated IAM policy for project [thecodeforge-dev-2026].
Created service account [forge-app-sa].
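
One way to keep basic roles out of automation is a guard that rejects them before a binding command is ever built. The sketch below is an assumption about how such a wrapper might look: `is_basic_role` and `plan_binding` are hypothetical helper names, and the wrapper only prints the `gcloud` command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: refuse basic roles before building an IAM binding command.

is_basic_role() {
  case "$1" in
    roles/owner|roles/editor|roles/viewer) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the binding command for review; it does not call gcloud itself.
plan_binding() {
  local project="$1" member="$2" role="$3"
  if is_basic_role "$role"; then
    echo "refusing basic role: $role" >&2
    return 1
  fi
  printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
    "$project" "$member" "$role"
}

# A predefined role passes; the Editor role is rejected.
plan_binding thecodeforge-dev-2026 user:dev@thecodeforge.io roles/storage.objectViewer
plan_binding thecodeforge-dev-2026 user:dev@thecodeforge.io roles/editor || echo "blocked"
```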
Watch Out:
A very common mistake with GCP is building on the 'Default Network'. In production, you should always create a custom VPC (Virtual Private Cloud) to ensure your internal IP ranges don't conflict and your firewall rules are explicitly defined.
Production Insight
Teams often assume that service account keys are safe if stored in Cloud KMS.
But keys have no expiration by default — a leaked key grants access forever.
Rule: use Workload Identity Federation instead of downloadable keys for CI/CD; rotate keys every 90 days if unavoidable.
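
Where keys cannot be avoided, the 90-day rule can be checked mechanically. The sketch below assumes the creation date comes from something like `gcloud iam service-accounts keys list --iam-account=SA_EMAIL --managed-by=user --format="value(validAfterTime)"`; the dates used here are invented, and the helper names are hypothetical. It uses plain integer arithmetic so it runs without `gcloud` or any particular `date` implementation.

```shell
#!/usr/bin/env bash
# Sketch: flag user-managed keys that are 90 or more days old.

# Days since 1970-01-01 for a YYYY-MM-DD date (standard civil-date formula).
# Assumes a date in the common era; any time part after "T" is ignored.
days_from_civil() {
  local y m d
  IFS=- read -r y m d <<< "${1%%T*}"
  # Force base 10 so that "08" and "09" are not read as octal.
  y=$((10#$y)); m=$((10#$m)); d=$((10#$d))
  if (( m <= 2 )); then y=$((y - 1)); fi
  local era=$(( y / 400 ))
  local yoe=$(( y - era * 400 ))
  local mp=$(( m > 2 ? m - 3 : m + 9 ))
  local doy=$(( (153 * mp + 2) / 5 + d - 1 ))
  local doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  echo $(( era * 146097 + doe - 719468 ))
}

# Whole days between a creation date ($1) and a check date ($2).
key_age_days() {
  echo $(( $(days_from_civil "$2") - $(days_from_civil "$1") ))
}

# Succeeds when the key has reached the 90-day rotation threshold.
needs_rotation() {
  (( $(key_age_days "$1" "$2") >= 90 ))
}

# Hypothetical key created on 2026-01-01, checked on 2026-04-01.
if needs_rotation 2026-01-01T00:00:00Z 2026-04-01; then
  echo "rotate: key is $(key_age_days 2026-01-01T00:00:00Z 2026-04-01) days old"
fi
```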
Key Takeaway
Primitive roles are too permissive for production.
Use predefined roles and service accounts for least privilege.
IAM conditions can restrict access by time, IP, or resource.

Compute Services: VMs, Containers, and Serverless

GCP offers three primary compute paths: Compute Engine (raw VMs), Google Kubernetes Engine (managed Kubernetes), and Cloud Run (fully managed serverless containers). Each addresses a different operational profile. Compute Engine gives the most control but requires managing OS updates and scaling. GKE automates container orchestration but introduces cluster maintenance overhead. Cloud Run removes infrastructure management entirely: you supply a container image and GCP handles scaling, load balancing, and even scaling to zero when idle (at the cost of a cold start on the next request). The right choice depends on your team's Kubernetes expertise and traffic predictability.

io/thecodeforge/gcp/ComputeExamples.sh (Bash)
# io.thecodeforge: Creating and deploying compute resources

# 1. Create a Compute Engine instance (n2-standard-2)
gcloud compute instances create forge-web-vm \
    --zone=us-central1-a \
    --machine-type=n2-standard-2 \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud \
    --tags=http-server,https-server

# 2. Deploy a stateless container on Cloud Run
gcloud run deploy forge-api \
    --image=gcr.io/thecodeforge-dev-2026/api:1.0 \
    --region=us-central1 \
    --platform=managed \
    --allow-unauthenticated \
    --memory=512Mi \
    --concurrency=80

# 3. Create a GKE autopilot cluster
gcloud container clusters create-auto forge-cluster \
    --region=us-central1 \
    --project=thecodeforge-dev-2026
Output
Created instance [forge-web-vm].
Deploying container to Cloud Run service [forge-api]...
Creating cluster forge-cluster...done.
Performance Insight
GPU-backed Compute Engine instances can be cost-effective for workloads such as ML training. On Cloud Run, cold starts depend on image size and startup work; keeping warm instances with --min-instances avoids them for latency-sensitive services. GKE Autopilot removes node management, but pods can wait several minutes while new nodes are provisioned.
Production Insight
A startup ran their entire API on Cloud Run with a single container.
When traffic spiked, Cloud Run autoscaled to 250 instances, but each instance hammered the same Postgres connection pool — not designed for 250 connections.
Rule: put a connection pooler (e.g., PgBouncer) in front of stateful backends and cap Cloud Run with --max-instances. The Cloud SQL Auth Proxy secures connections but does not pool them.
Key Takeaway
VMs for full control, GKE for orchestration at scale.
Cloud Run for serverless simplicity — but watch for concurrent backend connections.
Always test autoscaling behavior under load before production launch.
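
The connection-pool failure described above comes down to simple arithmetic: instances multiplied by connections per instance must stay under the database limit. The sketch below works that out and prints a matching `--max-instances` cap. The numbers (100 database connections, 10 reserved for admin use, 5 per instance) are illustrative assumptions, and `max_safe_instances` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: derive a Cloud Run instance cap from a database connection budget.

max_safe_instances() {
  local db_max="$1" reserved="$2" pool_per_instance="$3"
  if (( pool_per_instance <= 0 || db_max <= reserved )); then
    echo "invalid connection budget" >&2
    return 1
  fi
  # Integer division rounds down, which errs on the safe side.
  echo $(( (db_max - reserved) / pool_per_instance ))
}

# Assumed budget: 100 connections, 10 reserved, 5 per container instance.
cap=$(max_safe_instances 100 10 5)

# Printed for review rather than executed.
echo "gcloud run services update forge-api --region=us-central1 --max-instances=${cap}"
```

With these inputs the cap is 18 instances, far below the 250 the autoscaler reached in the incident, which is why the limit needs to be set explicitly.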

Data & Analytics: BigQuery, Dataflow, and Pub/Sub

GCP's strength lies in its data and analytics services. BigQuery is a serverless data warehouse that processes petabytes using SQL, with no infrastructure to manage. Dataflow (based on Apache Beam) handles streaming and batch data processing pipelines. Pub/Sub provides asynchronous messaging at scale, often used for event-driven architectures. Together, these form the backbone of real-time and batch analytics. They integrate tightly with IAM for fine-grained access control and with Cloud DLP for sensitive data protection.

io/thecodeforge/gcp/DataPipeline.sh (Bash)
# io.thecodeforge: Using BigQuery, Dataflow, and Pub/Sub together

# 1. Create a Pub/Sub topic for order events
gcloud pubsub topics create forge-order-events

# 2. Create a BigQuery dataset (datasets are managed with the bq CLI)
bq --location=US mk --dataset thecodeforge-dev-2026:forge_analytics

# 3. Submit a Dataflow pipeline that reads from Pub/Sub and writes to BigQuery
# (the orders table should already exist with a schema matching the messages)
gcloud dataflow jobs run forge-stream-pipeline \
    --region=us-central1 \
    --gcs-location=gs://dataflow-templates/latest/PubSub_to_BigQuery \
    --parameters=inputTopic=projects/thecodeforge-dev-2026/topics/forge-order-events,\
outputTableSpec=thecodeforge-dev-2026:forge_analytics.orders
Output
Created topic [forge-order-events].
Created dataset 'thecodeforge-dev-2026:forge_analytics'.
Job [forge-stream-pipeline] submitted successfully.
How Data Flows in GCP
  • Pub/Sub decouples event producers from consumers — at-least-once delivery, no ordering guarantee by default.
  • Dataflow pipelines auto-scale based on backlog — but beware of data skew causing stragglers.
  • BigQuery on-demand queries are billed by bytes scanned ($5 per TB historically; the list price has since changed, so check current pricing). Use clustering and partitioning to reduce scan bytes.
  • Combine with Cloud Storage for data lakes: cheap storage, then query with BigQuery or Spark on Dataproc.
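
To make the per-scan billing concrete, the sketch below converts a byte count into an estimated on-demand cost. It assumes the byte count comes from a dry run such as `bq query --dry_run --use_legacy_sql=false '...'`, and it takes the price as a parameter; the `5` used here simply reuses the figure quoted in the list above and should be replaced with the current list price. `estimate_query_cost` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: estimate on-demand query cost from bytes scanned.

estimate_query_cost() {
  local bytes="$1" price_per_tib="$2"
  # 1 TiB = 1099511627776 bytes; awk handles the floating-point division.
  awk -v b="$bytes" -v p="$price_per_tib" \
    'BEGIN { printf "%.2f\n", (b / 1099511627776) * p }'
}

# Full scan of a hypothetical 2 TiB table.
estimate_query_cost 2199023255552 5

# The same query after pruning to roughly 10% of the table.
estimate_query_cost 219902325555 5
```

Running the estimate in CI before a scheduled query ships is a cheap way to catch an accidental full-table scan.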
Production Insight
A fintech company used BigQuery for real-time fraud detection.
Queries scanning full tables took 30+ seconds — too slow.
They partitioned by event_timestamp and clustered by user_id, reducing scan to 10% of the table.
Rule: always partition by time-based columns; cluster by high-cardinality filter columns.
Key Takeaway
BigQuery is serverless — no clusters to manage.
Partitioning and clustering reduce cost and latency.
Pub/Sub + Dataflow enables real-time analytics at global scale.
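
The partition-and-cluster rule from the fraud-detection example can be expressed as a single `bq mk` call. The sketch below only prints that command for review. The table name `forge_analytics.events` and the `amount` column are invented for illustration; `event_timestamp` and `user_id` are the columns named in the example above.

```shell
#!/usr/bin/env bash
# Sketch: build a bq command for a day-partitioned, clustered table.

plan_partitioned_table() {
  local table="$1" schema="$2"
  printf 'bq mk --table --time_partitioning_type=DAY --time_partitioning_field=event_timestamp --clustering_fields=user_id %s %s\n' \
    "$table" "$schema"
}

# Hypothetical table and schema; printed, not executed.
plan_partitioned_table forge_analytics.events \
  'event_timestamp:TIMESTAMP,user_id:STRING,amount:NUMERIC'
```

Queries that filter on `event_timestamp` then scan only the matching daily partitions, and clustering on `user_id` narrows the blocks read within each partition.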

Networking and Security: VPCs, Firewalls, and VPNs

GCP's global network is a first-class product. You can create a single VPC that spans regions, with a subnet in each region you use (subnets are regional, not zonal). Firewall rules are stateful, and you can use Cloud NAT to give private instances outbound internet access without public IPs. For hybrid cloud, Cloud VPN or Dedicated Interconnect connects your on-premises network. The default network ships with permissive pre-populated firewall rules (for example, SSH and RDP allowed from any source), so it is not safe for production. Always create custom VPCs in 'Custom Subnet Mode' to define your own CIDR ranges and avoid overlap.

io/thecodeforge/gcp/NetworkSetup.sh (Bash)
# io.thecodeforge: Creating a custom VPC with firewall rules

# 1. Create a custom mode VPC (no default subnets)
gcloud compute networks create forge-vpc --subnet-mode=custom

# 2. Create subnets in multiple regions
gcloud compute networks subnets create us-east-subnet \
    --network=forge-vpc \
    --region=us-east1 \
    --range=10.0.1.0/24

gcloud compute networks subnets create europe-west-subnet \
    --network=forge-vpc \
    --region=europe-west1 \
    --range=10.0.2.0/24

# 3. Create a firewall rule to allow SSH from a specific IP
# (Avoid open 0.0.0.0/0 for production)
gcloud compute firewall-rules create allow-ssh-corp \
    --network=forge-vpc \
    --allow=tcp:22 \
    --source-ranges=203.0.113.0/24
Output
Created network [forge-vpc].
Created subnet [us-east-subnet].
Created subnet [europe-west-subnet].
Created firewall rule [allow-ssh-corp].
Production Pitfall
Using the Default VPC with auto-created subnets can lead to IP range conflicts when connecting to on-premises networks. Always create custom VPCs with planned address ranges.
Production Insight
A media company had two GCP projects peered via VPC peering, but both used 10.0.0.0/16 subnets.
Routing conflicts caused intermittent HTTP 502s.
They had to recreate one project's VPC with a different CIDR — a multi-day migration.
Rule: never use overlapping CIDR ranges across peered VPCs or hybrid connections.
Key Takeaway
Custom VPCs for production — never use the default.
Plan IP ranges to avoid overlaps with peers and on-prem.
Use Cloud NAT for internet egress from private instances.
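
Overlapping ranges are easy to catch before any network is created. The sketch below is one possible pre-flight check for IPv4 CIDR blocks in plain bash: two blocks overlap exactly when they agree on the bits covered by the shorter of the two prefixes. The helper names are hypothetical and the ranges are the ones used in this section.

```shell
#!/usr/bin/env bash
# Sketch: detect overlapping IPv4 CIDR ranges before peering or connecting VPCs.

# Convert a dotted-quad address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))
}

# Succeeds (exit 0) when the two CIDR blocks share any address.
cidrs_overlap() {
  local a_ip=${1%/*} a_len=${1#*/} b_ip=${2%/*} b_len=${2#*/}
  # Compare both networks under the shorter (less specific) prefix.
  local len=$(( a_len < b_len ? a_len : b_len ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  local a b
  a=$(ip_to_int "$a_ip")
  b=$(ip_to_int "$b_ip")
  (( (a & mask) == (b & mask) ))
}

if cidrs_overlap 10.0.0.0/16 10.0.1.0/24; then
  echo "overlap: 10.0.0.0/16 contains 10.0.1.0/24"
fi

if ! cidrs_overlap 10.0.1.0/24 10.0.2.0/24; then
  echo "ok: 10.0.1.0/24 and 10.0.2.0/24 are disjoint"
fi
```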
● Production incident · POST-MORTEM · severity: high

Service Account with Editor Role Deletes Production Database

Symptom
Cloud SQL instance 'prod-db' disappeared. No manual intervention by a human engineer. Monitoring alerts showed zero QPS for the database.
Assumption
The team assumed that using a dedicated service account for CI/CD was sufficient security. They didn't realize the Editor role included the cloudsql.instances.delete permission.
Root cause
A script in the CI/CD pipeline executed a cleanup command that targeted the wrong project because of a misconfigured gcloud config. The Editor role on the service account allowed the deletion without additional authentication.
Fix
1. Revoke the Editor role from all service accounts.
2. Grant only the minimal roles needed (e.g., roles/cloudsql.client to connect to the database, roles/iam.serviceAccountUser for deployment).
3. Enable Organization Policies to restrict deletion of SQL instances, and turn on Cloud SQL deletion protection for production instances.
4. Add a Terraform prevent_destroy lifecycle block to production databases.
Key lesson
  • Never grant primitive roles to service accounts used in CI/CD pipelines.
  • Always test gcloud config and project context in CI/CD steps before destructive commands.
  • Use IAM Recommender and Policy Analyzer to audit granted permissions quarterly.
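
The second lesson, checking project context before anything destructive, can be enforced with a small guard at the top of a pipeline step. This is a sketch under assumptions: `require_project` is a hypothetical helper, and `thecodeforge-prod` is an invented project ID. In a real pipeline the second argument would be omitted so the function reads the active project from `gcloud config get-value project`.

```shell
#!/usr/bin/env bash
# Sketch: abort a CI step when the active gcloud project is not the expected one.

require_project() {
  local expected="$1"
  # The optional second argument overrides the lookup (useful for testing);
  # gcloud is only invoked when it is not supplied.
  local actual="${2:-$(gcloud config get-value project 2>/dev/null)}"
  if [[ "$actual" != "$expected" ]]; then
    echo "ABORT: active project is '${actual}', expected '${expected}'" >&2
    return 1
  fi
}

# Demo with explicit values so the sketch runs without gcloud installed.
require_project thecodeforge-dev-2026 thecodeforge-dev-2026 && echo "ok: safe to continue"
require_project thecodeforge-dev-2026 thecodeforge-prod 2>/dev/null || echo "blocked: wrong project"
```

A cleanup step would then chain the guard in front of the destructive command and also pass `--project` explicitly, so the command never depends on whatever configuration happens to be active on the runner.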
Production debug guide
Quick fixes for the most common GCP production issues
Symptom · 01
Cannot SSH into Compute Engine VM instance
Fix
Check OS Login setting, metadata SSH keys, firewall rule allow-ssh (port 22), and IAM permissions (roles/compute.osLogin). Use gcloud compute ssh with --troubleshoot flag.
Symptom · 02
Cloud Run service returns 403 Forbidden
Fix
Verify that the caller (user or service account) has roles/run.invoker on the Cloud Run service; the role belongs to the identity making the request, not to the revision's own service account. Check that the request includes an Authorization header with a valid Bearer identity token.
Symptom · 03
BigQuery query runs slowly or times out
Fix
Review the query execution plan via the UI. Look for slow stages (shuffle, sort). Optimize: use clustering, partitioning, approximate aggregations, or materialized views.
Symptom · 04
GKE pod stuck in ContainerCreating status
Fix
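Run kubectl describe pod <name> to see events. Common causes: a slow or failing image pull, a volume that cannot be attached or mounted, a missing ConfigMap or Secret, or a pod networking (CNI) setup error. If the pod is Pending instead, look at insufficient quota, an unbound persistent volume claim, or node pool autoscaling delay. Check node resource usage: kubectl top nodes.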
Symptom · 05
Cloud Storage uploads are slow (latency >1s)
Fix
Use the gsutil perfdiag tool to benchmark throughput. Switch to regional bucket in the same region as your compute. For large files, use parallel composite uploads or transfer service.
★ Quick Debug Cheat Sheet for GCP
Commands for diagnosing the most common GCP production issues
Instance unreachable (SSH/HTTP)
Immediate action
Run gcloud compute ssh with troubleshooting
Commands
gcloud compute ssh INSTANCE_NAME --zone=ZONE --troubleshoot
gcloud compute instances get-serial-port-output INSTANCE_NAME --zone=ZONE
Fix now
Check firewall rules: gcloud compute firewall-rules list — ensure allow-ssh (tcp:22) or allow-http (tcp:80) exists.
GKE pod CrashLoopBackOff
Immediate action
Inspect pod logs and describe pod
Commands
kubectl logs POD_NAME --tail=50
kubectl describe pod POD_NAME | grep -A10 Events:
Fix now
If OOMKilled, increase memory limits. If ImagePullBackOff, check image name and registry access.
Cloud Run revision fails with 503
Immediate action
Check revision logs in Cloud Logging
Commands
gcloud run revisions describe REVISION --region=REGION --platform=managed
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.revision_name=REVISION" --limit=50
Fix now
Verify that max instances is not 0, and that the service has enough CPU (use CPU always allocated for latency-sensitive apps).
IAM permission denied for gcloud command
Immediate action
Check current authenticated user/project and IAM policy
Commands
gcloud auth list && gcloud config list project
gcloud projects get-iam-policy PROJECT_ID --flatten='bindings[].members' --filter='bindings.members:MEMBER_EMAIL' --format='table(bindings.role)'
Fix now
Re-authenticate with correct account: gcloud auth login or use service account key: gcloud auth activate-service-account --key-file=...
Cloud vs On-Premise: The Shift to GCP
| Aspect | Traditional On-Premise | Google Cloud Platform |
|---|---|---|
| Hardware | Manual purchase/setup (CapEx) | Software-defined (API driven, OpEx) |
| Provisioning | Weeks for hardware arrival | Seconds to minutes via Terraform/CLI |
| Global Reach | Limited to local data centers | Global network (35+ Regions, 100+ Zones) |
| Security | Perimeter-based (Firewalls) | Identity-based (Zero Trust/BeyondCorp) |
| Maintenance | OS patching/Hardware swaps | Managed Services (Serverless/PaaS) |

Key takeaways

1. GCP is built on a highly-optimized global network, offering superior latency for data-heavy applications and global load balancing.
2. The Resource Hierarchy (Org > Folder > Project) is the mandatory foundation for security and billing governance.
3. Always follow the Principle of Least Privilege: use Predefined or Custom Roles rather than Primitive roles like 'Editor'.
4. Automation is king: use the 'gcloud' CLI and Infrastructure as Code (Terraform) to ensure environments are reproducible and human-error-free.
5. Leverage Google’s innovation: if you are doing Data Analytics (BigQuery) or Containers (GKE), you are using the industry gold standard.

Common mistakes to avoid


Over-provisioning resources

Symptom
Monthly bill 3x higher than expected with many idle VMs.
Fix
Use Google's 'Recommender' tool to find idle VMs or oversized databases. It often suggests moving to smaller machine types based on actual historical utilization.

Leaving the Default VPC in place

Symptom
Internal IP conflicts when peering with another project or on-premises network.
Fix
Create custom VPCs in 'Custom Subnet Mode' to control network isolation, internal IP ranges, and firewall rules explicitly.

Ignoring the service account lifecycle

Symptom
A leaked service account JSON key is used by an attacker to access Cloud Storage.
Fix
Avoid long-lived JSON keys. Use Workload Identity Federation or short-lived tokens to keep CI/CD pipelines secure. Regularly rotate keys if unavoidable.

Running everything on VMs

Symptom
High compute costs for a simple stateless API that could run on Cloud Run.
Fix
If your workload is a container, look at Cloud Run or GKE first. Don't pay for an idle OS if you only need to run a function.

Not enabling VPC Flow Logs

Symptom
During a security incident, no network traffic logs to investigate.
Fix
Enable VPC Flow Logs on subnets (costs money but is essential for forensics). Set log sampling rate to 0.5 for cost-effective coverage.
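
Turning on flow logs with a 0.5 sampling rate might look like the sketch below. It validates the rate and prints the `gcloud compute networks subnets update` command instead of running it. `plan_flow_logs` is a hypothetical helper, the subnet and region are the ones created earlier in this guide, and rejecting a rate of 0 is a choice made here (a zero rate would enable the feature while collecting nothing).

```shell
#!/usr/bin/env bash
# Sketch: build the command that enables VPC Flow Logs on one subnet.

plan_flow_logs() {
  local subnet="$1" region="$2" rate="$3"
  # Accept only a plain decimal greater than 0 and at most 1.
  if ! awk -v r="$rate" \
      'BEGIN { exit !(r ~ /^[0-9]*\.?[0-9]+$/ && r + 0 > 0 && r + 0 <= 1) }'; then
    echo "sampling rate must be greater than 0 and at most 1: $rate" >&2
    return 1
  fi
  printf 'gcloud compute networks subnets update %s --region=%s --enable-flow-logs --logging-flow-sampling=%s\n' \
    "$subnet" "$region" "$rate"
}

plan_flow_logs us-east-subnet us-east1 0.5
plan_flow_logs us-east-subnet us-east1 1.5 2>/dev/null || echo "rejected: rate out of range"
```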
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
Explain the GCP Resource Hierarchy. Why would an enterprise use 'Folders...
Q02 · SENIOR
You need to migrate a high-latency database to GCP. How does Google's 'P...
Q03 · JUNIOR
Describe the difference between Primitive Roles and Predefined Roles in ...
Q04 · SENIOR
What is the function of a 'Service Account' in GCP, and how does it diff...
Q05 · JUNIOR
Compare Regions vs. Zones. If you require high availability for a web ap...
Q06 · SENIOR
What is the difference between BigQuery and Cloud SQL? Which would you u...
Q07 · SENIOR
Explain how GCP's Global VPC works and why it matters for multi-region d...
Q01 of 07 · JUNIOR

Explain the GCP Resource Hierarchy. Why would an enterprise use 'Folders' instead of just 'Projects'?

ANSWER
The hierarchy is Organization > Folders > Projects > Resources. Folders allow grouping projects by team or environment. Without folders, the only level above a project is the Organization, so you cannot apply a common IAM or organization policy to just a subset of projects. For example, you can assign a folder admin to each team lead, giving them autonomy over their projects while the org-level policies remain centrally managed.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01. What is the difference between GCP and AWS?
02. Is GCP cheaper than AWS?
03. What is the GCP free tier?
04. How do I choose between Cloud Run and GKE?
05. How does GCP handle security compliance?