Intermediate 6 min · March 06, 2026

IaC — State Corruption from Untagged S3 Buckets

How manually deleting an untagged S3 bucket can corrupt Terraform state.

Naren · Founder
Quick Answer
  • IaC defines infrastructure in version-controlled files — treat servers like code
  • Declarative (Terraform) describes desired state; imperative (Ansible) lists steps
  • Remote state storage (S3+DynamoDB) prevents team-wide state corruption
  • Idempotency means 10 applies = same result as 1 — safe for CI/CD
  • Production insight: manual changes in the cloud console cause drift; Terraform overwrites them
  • Biggest mistake: committing terraform.tfstate to Git — exposes plaintext secrets permanently

Every modern software team has faced the same nightmare: a production server dies at 2am, and the engineer who built it left the company six months ago. Nobody wrote anything down. The replacement server gets rebuilt from memory, Slack messages, and guesswork — and it's never quite right. This isn't a people problem. It's a process problem, and Infrastructure as Code (IaC) exists specifically to eliminate it. When your infrastructure lives in code, it lives in Git, in pull requests, in code reviews, and in your CI/CD pipeline — just like the application it runs.

Before IaC, provisioning infrastructure meant logging into a cloud console, clicking through wizards, and hoping the person next to you was watching and taking notes. Every environment — dev, staging, production — drifted apart over time because human hands configured them differently. This 'configuration drift' is the silent killer of reliable deployments. IaC solves this by making infrastructure declarative and repeatable: you describe the desired state of your system, and a tool like Terraform or Ansible figures out how to get there. The same code that spins up your staging environment spins up production, byte for byte.

By the end of this article you'll understand why IaC exists at a systems level, know the difference between declarative and imperative approaches, and know the Terraform + GitHub Actions CI/CD pattern (plan on every pull request, apply on merge) that you can adapt for your own projects. You'll also know the mistakes that catch almost every intermediate engineer off guard when they first use IaC in a team setting.

Declarative vs Imperative IaC — Choosing the Right Mental Model

There are two ways to tell someone how to make a cup of coffee. The imperative way: 'Boil water. Measure 18g of beans. Grind them. Pour water at 94°C. Wait 4 minutes.' The declarative way: 'I want a black filter coffee in this cup.' The declarative approach lets the system figure out the steps.

This distinction is the most important conceptual split in IaC. Terraform is declarative — you describe what your infrastructure should look like, and Terraform calculates the diff between current state and desired state, then makes the changes. Ansible is imperative by default — you write a sequence of tasks that run top to bottom. Both are valid. The right choice depends on what you're managing.
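To make "calculate the diff" concrete, here is a toy model in Python. It is not Terraform's actual algorithm: resources are keyed by address, current and desired state are plain dicts, and the plan is the list of create/update/delete actions that reconcile them. Running the plan again after applying it yields no actions, which is the idempotency property from the Quick Answer.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, str]    # attribute name -> value
State = Dict[str, Resource]  # resource address -> attributes


def plan(current: State, desired: State) -> List[Tuple[str, str]]:
    """Return (action, address) pairs that move `current` toward `desired`."""
    actions = []
    for address in sorted(desired):
        if address not in current:
            actions.append(("create", address))
        elif current[address] != desired[address]:
            actions.append(("update", address))
    for address in sorted(current):
        if address not in desired:
            actions.append(("delete", address))
    return actions


def apply(current: State, desired: State) -> State:
    """Execute the plan; in this toy model, 'executing' just copies attributes."""
    new_state = dict(current)
    for action, address in plan(current, desired):
        if action == "delete":
            del new_state[address]
        else:  # create or update
            new_state[address] = dict(desired[address])
    return new_state


desired = {
    "aws_instance.web": {"instance_type": "t3.micro"},
    "aws_s3_bucket.logs": {"acl": "private"},
}
current = {
    "aws_instance.web": {"instance_type": "t3.medium"},  # changed by hand
    "aws_security_group.old": {"name": "legacy"},        # no longer in code
}

print(plan(current, desired))
# [('update', 'aws_instance.web'), ('create', 'aws_s3_bucket.logs'), ('delete', 'aws_security_group.old')]
state = apply(current, desired)
print(plan(state, desired))  # [] : a second apply has nothing to do
```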

Declarative tools shine for cloud resource provisioning: creating VPCs, EC2 instances, databases, and load balancers. You don't want to think about order of operations — you just want the result. Imperative tools shine for configuration management: installing packages, editing config files, restarting services. The order genuinely matters there.
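With imperative tools, ordering and idempotency are the task author's job. The sketch below is plain Python standing in for a configuration-management task (it is not Ansible's API): a naive "append this line" step duplicates the line on every run, while a guarded "ensure this line is present" step checks current state before acting, the same pattern Ansible's lineinfile module follows.

```python
import tempfile
from pathlib import Path


def append_line(path: Path, line: str) -> None:
    """Naive step: acts unconditionally, so every re-run duplicates the line."""
    with path.open("a") as f:
        f.write(line + "\n")


def ensure_line(path: Path, line: str) -> bool:
    """Idempotent step: check first, act only if needed; return True if changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False
    with path.open("a") as f:
        f.write(line + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    naive = Path(tmp) / "naive.conf"
    guarded = Path(tmp) / "guarded.conf"
    for _ in range(3):  # simulate three playbook runs
        append_line(naive, "worker_processes auto;")
        ensure_line(guarded, "worker_processes auto;")
    naive_lines = naive.read_text().splitlines()
    guarded_lines = guarded.read_text().splitlines()

print(len(naive_lines), len(guarded_lines))  # 3 1
```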

In a mature DevOps pipeline you'll often use both: Terraform provisions the server, Ansible configures it. Understanding why they work differently stops you from fighting the tool when it doesn't behave the way you expect.

IaC in a Real CI/CD Pipeline — Automate the Infrastructure Itself

Knowing how to run Terraform locally is a starting point. But IaC's real power unlocks when it runs automatically inside your CI/CD pipeline. Think about it: your application code goes through automated testing before it deploys. Why should your infrastructure changes be any different? A pull request that adds a new RDS database should go through the same review process as a pull request that adds a new API endpoint.

The pattern that works in production is this: on every pull request, run terraform plan and post the output as a PR comment. This gives reviewers an exact, human-readable diff of what will change in the real cloud — before anyone approves it. On merge to main, run terraform apply automatically. No one runs Terraform from their laptop. Ever.
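Posting the plan as a PR comment usually starts from machine-readable output: terraform plan -out=tfplan followed by terraform show -json tfplan emits JSON whose resource_changes entries each carry an address and a change.actions list (a replacement appears as both "delete" and "create"). The sketch below turns that into a short comment body and calls out anything that will be destroyed. The sample plan is hand-written, and how the text gets posted (a GitHub API call or a marketplace action) is left out.

```python
import json
from collections import Counter


def summarize_plan(plan: dict) -> str:
    """Build a PR comment body from `terraform show -json` plan output."""
    counts = Counter()
    destroyed = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will change for this resource
        if "delete" in actions and "create" in actions:
            kind = "replace"   # replacement is encoded as delete + create
        else:
            kind = actions[0]  # "create", "update" or "delete"
        counts[kind] += 1
        if "delete" in actions:
            destroyed.append(rc["address"])
    lines = ["Terraform plan: " + ", ".join(
        f"{counts[k]} to {k}" for k in ("create", "update", "replace", "delete"))]
    for address in destroyed:
        lines.append(f"WARNING: {address} will be destroyed")
    return "\n".join(lines)


sample = json.loads("""
{"resource_changes": [
  {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
  {"address": "aws_s3_bucket.logs",   "change": {"actions": ["create"]}},
  {"address": "aws_iam_role.app",     "change": {"actions": ["no-op"]}}
]}
""")
print(summarize_plan(sample))
# Terraform plan: 1 to create, 0 to update, 1 to replace, 0 to delete
# WARNING: aws_db_instance.main will be destroyed
```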

This approach solves three problems at once. It creates an audit trail (every infrastructure change is a Git commit with an author and a timestamp). It prevents 'works on my machine' infrastructure (the pipeline always runs in a clean, identical environment). And it forces infrastructure changes through code review, which catches mistakes before they hit production.

In GitHub Actions, this maps to one workflow with two triggers: on pull_request it runs terraform plan and posts the result for reviewers; on a push to main it runs terraform apply.
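The branching logic behind that workflow is small enough to sketch. Assuming the job reads GitHub Actions' default GITHUB_EVENT_NAME and GITHUB_REF environment variables, the hypothetical helper terraform_command below returns the command to run: a saved plan for pull requests, a non-interactive apply for pushes to main, and nothing for any other event. In a real workflow this decision usually lives in the on: triggers and if: conditions rather than in a script.

```python
import os
from typing import List, Optional


def terraform_command(event_name: str, ref: str) -> Optional[List[str]]:
    """Map a GitHub Actions event to the Terraform command the job should run."""
    if event_name == "pull_request":
        # Plan only; the saved plan file is what reviewers see summarized.
        return ["terraform", "plan", "-input=false", "-out=tfplan"]
    if event_name == "push" and ref == "refs/heads/main":
        # Merge to main: apply without prompting, since nobody is at a terminal.
        return ["terraform", "apply", "-input=false", "-auto-approve"]
    return None  # any other event: don't touch infrastructure


if __name__ == "__main__":
    cmd = terraform_command(os.environ.get("GITHUB_EVENT_NAME", ""),
                            os.environ.get("GITHUB_REF", ""))
    print(" ".join(cmd) if cmd else "no terraform step for this event")
```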

Remote State and Modules — The Patterns That Make IaC Scale

A single main.tf file works fine for a hobby project. It falls apart the moment you have two engineers, two environments, or two services. This is where two patterns become non-negotiable: remote state backends and modules.

State is how Terraform remembers what it already built. If the state file lives only on one engineer's laptop, everyone else's terraform apply is flying blind. With a remote backend, like an S3 bucket with a DynamoDB lock table, the state file lives in the cloud, is accessible to everyone on the team, and is locked during applies so two engineers can't run Terraform simultaneously and corrupt each other's work.
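The locking works through a conditional write: the S3 backend puts a lock item into the DynamoDB table only if no lock already exists for that state file, so a second writer fails instead of overwriting. The in-memory LockTable class below mimics that "write only if absent" behaviour to show the effect; it is an illustration of the idea, not the backend's actual code or item schema, and the bucket/key string is hypothetical.

```python
import threading
from typing import Dict


class LockTable:
    """In-memory stand-in for a lock table that supports conditional writes."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}
        self._mutex = threading.Lock()  # makes the check-and-set atomic

    def acquire(self, state_key: str, owner: str) -> bool:
        """Write a lock for this state file only if none exists yet."""
        with self._mutex:
            if state_key in self._items:
                return False
            self._items[state_key] = owner
            return True

    def release(self, state_key: str, owner: str) -> None:
        """Remove the lock, but only if `owner` is the one holding it."""
        with self._mutex:
            if self._items.get(state_key) == owner:
                del self._items[state_key]


locks = LockTable()
key = "my-tf-state/prod/terraform.tfstate"  # hypothetical bucket/key
print(locks.acquire(key, "alice"))  # True: alice's apply proceeds
print(locks.acquire(key, "bob"))    # False: bob's apply must wait
locks.release(key, "alice")
print(locks.acquire(key, "bob"))    # True: the lock is free again
```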

Modules are reusable Terraform components. Think of them as functions for infrastructure. Instead of copy-pasting the same EC2 + security group + IAM role configuration for every service, you write it once as a module and call it with different variables for each service. This is the IaC equivalent of the DRY principle and it's what separates a professional IaC setup from a pile of disconnected config files.
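Python makes the "modules are functions" analogy literal. The hypothetical service_module function below plays the role of a module: its parameters are the module's variables, the returned dict is the set of resources it would declare, and each call is like a module block with different arguments. Resource types and naming are illustrative only.

```python
from typing import Dict


def service_module(name: str, instance_type: str = "t3.micro",
                   port: int = 8080) -> Dict[str, dict]:
    """One definition of EC2 + security group + IAM role, parameterised per service."""
    return {
        f"aws_instance.{name}": {"instance_type": instance_type,
                                 "tags": {"Service": name}},
        f"aws_security_group.{name}": {"ingress_port": port},
        f"aws_iam_role.{name}": {"name": f"{name}-role"},
    }


# Two "module calls" with different inputs instead of two copy-pasted blocks.
resources: Dict[str, dict] = {}
resources.update(service_module("billing", instance_type="t3.medium"))
resources.update(service_module("search", port=9200))

for address in sorted(resources):
    print(address)
```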

In a production repository, this usually takes the form of a backend "s3" block in the root configuration, pointing at the state bucket and the DynamoDB lock table, plus one module block per service, each called with its own variables.

IaC Security and Secrets Management — Don't Leak Your Infrastructure's Keys

Infrastructure code often requires secrets: API keys, database passwords, cloud provider credentials. A common rookie mistake is hardcoding these in the IaC files. Terraform state files, in particular, store resource attributes in plaintext, which can include sensitive values like database passwords or IAM secret keys.
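A quick way to see the problem is to look inside a state file. State is JSON; in the current format (version 4) each entry under resources has instances, and each instance has an attributes object holding the values Terraform recorded. The sketch below walks those attributes and flags names that suggest a secret. The keyword list is a crude heuristic chosen for illustration, not a substitute for a real secret scanner, and the sample state is hand-written.

```python
import json
from typing import List

SUSPICIOUS = ("password", "secret", "token", "private_key")  # heuristic only


def find_plaintext_secrets(state: dict) -> List[str]:
    """Return 'type.name.attribute' for non-empty attributes that look like secrets."""
    hits = []
    for res in state.get("resources", []):
        address = f'{res["type"]}.{res["name"]}'
        for inst in res.get("instances", []):
            for key, value in inst.get("attributes", {}).items():
                if value and any(word in key.lower() for word in SUSPICIOUS):
                    hits.append(f"{address}.{key}")
    return hits


state = json.loads("""
{"version": 4, "resources": [
  {"mode": "managed", "type": "aws_db_instance", "name": "main",
   "instances": [{"attributes": {"engine": "postgres", "password": "hunter2"}}]},
  {"mode": "managed", "type": "aws_iam_access_key", "name": "ci",
   "instances": [{"attributes": {"id": "AKIA-EXAMPLE", "secret": "wJalr-EXAMPLE"}}]}
]}
""")
print(find_plaintext_secrets(state))
# ['aws_db_instance.main.password', 'aws_iam_access_key.ci.secret']
```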

The rule: IaC code should never contain secrets. Instead, use environment variables, encrypted secrets in your CI/CD platform (such as GitHub Actions Secrets), or a dedicated secrets manager like HashiCorp Vault or AWS Secrets Manager. For Terraform, mark variables and outputs that carry secrets with sensitive = true so their values are redacted from plan and apply output, and read secrets through data sources that query a secrets manager rather than embedding values. Note that sensitive = true only hides values from CLI output; they are still written to state.

Another critical practice: never commit state files to version control. Even if you delete them later, the secrets remain in Git history. Add *.tfstate and *.tfstate.* (the second pattern covers terraform.tfstate.backup) to .gitignore on day one, and use a remote backend with encryption at rest.
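A CI job can back up the .gitignore rule by checking the list of tracked files (for example, the output of git ls-files) against the standard state-file patterns. The sketch below uses Python's fnmatch on each file name; the tracked list is a made-up example, and in CI you would exit non-zero whenever the result is non-empty.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import List

STATE_PATTERNS = ("*.tfstate", "*.tfstate.*")  # second pattern covers .backup


def committed_state_files(tracked_paths: List[str]) -> List[str]:
    """Return tracked paths whose file name matches a Terraform state pattern."""
    return [p for p in tracked_paths
            if any(fnmatch(PurePosixPath(p).name, pat) for pat in STATE_PATTERNS)]


tracked = [
    "main.tf",
    ".terraform.lock.hcl",
    "envs/prod/terraform.tfstate",
    "envs/prod/terraform.tfstate.backup",
]
offenders = committed_state_files(tracked)
print(offenders)
# ['envs/prod/terraform.tfstate', 'envs/prod/terraform.tfstate.backup']
```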

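Terraform reads any environment variable named TF_VAR_<name> as the value of input variable <name>, so a pipeline can inject secrets at run time without writing them to a file. For secrets already stored in AWS Secrets Manager, the aws_secretsmanager_secret_version data source reads them during plan, although the values it reads still end up in state.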
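Terraform treats an environment variable named TF_VAR_<name> as the value of input variable <name>. The sketch below builds such an environment for a subprocess call so a secret never touches a .tfvars file. Where the value comes from (your CI platform's secret store, Vault, or AWS Secrets Manager) is outside the sketch, and db_password is a hypothetical variable name that the Terraform configuration would declare with sensitive = true.

```python
import os
import subprocess
from typing import Dict, List


def terraform_env(secrets: Dict[str, str]) -> Dict[str, str]:
    """Copy the current environment and add TF_VAR_<name> for each secret."""
    env = dict(os.environ)
    for name, value in secrets.items():
        env[f"TF_VAR_{name}"] = value
    return env


def run_plan(secrets: Dict[str, str]) -> int:
    """Run terraform plan with secrets passed only through the environment."""
    cmd: List[str] = ["terraform", "plan", "-input=false"]
    return subprocess.run(cmd, env=terraform_env(secrets), check=False).returncode


env = terraform_env({"db_password": "from-secret-store"})
print("TF_VAR_db_password" in env)  # True; nothing was written to disk
```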

IaC Testing and Validation — Ensure Your Infrastructure Works Before You Apply

Writing infrastructure code without testing is like deploying a microservice without unit tests. You're one typo away from deleting a production database. IaC testing isn't as mature as application testing, but it's evolving fast. Here are three levels of validation every IaC pipeline needs:

  1. Syntax and static analysis: Use terraform validate to catch basic HCL errors. Use tools like tflint for style and potential bugs, and checkov or tfsec for security policy violations. Run these on every PR before the plan step.
  2. Plan review: The manual step where a human reviews the terraform plan output. This catches logical mistakes — like changing a security group that breaks connectivity, or accidentally destroying a stateful resource.
  3. Integration testing: Tools like Terratest let you write Go tests that deploy real infrastructure, run assertions against it, and then destroy it. This is the gold standard, but it's expensive and slow. Use it sparingly for critical resources.
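Part of the plan review can be automated so a human isn't the only safeguard against destroying a stateful resource. The sketch below reads terraform show -json plan output, in which each resource_changes entry has type, address and change.actions fields, and reports any resource of a protected type that would be deleted or replaced. The PROTECTED set is an example to adapt; for individual resources, Terraform's built-in lifecycle { prevent_destroy = true } setting makes the plan itself fail instead.

```python
import sys
from typing import List

PROTECTED = {"aws_db_instance", "aws_rds_cluster", "aws_dynamodb_table", "aws_s3_bucket"}


def blocked_changes(plan: dict) -> List[str]:
    """Return addresses of protected resources the plan would delete or replace."""
    return [rc["address"] for rc in plan.get("resource_changes", [])
            if rc["type"] in PROTECTED and "delete" in rc["change"]["actions"]]


plan = {"resource_changes": [
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_security_group.web", "type": "aws_security_group",
     "change": {"actions": ["update"]}},
]}

blocked = blocked_changes(plan)
if blocked:
    print("Refusing to continue; plan destroys:", ", ".join(blocked), file=sys.stderr)
```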

In practice, that means a CI step that runs terraform fmt -check, terraform validate, tflint, and checkov (or tfsec) before terraform plan and fails the job on any finding, so issues surface early.
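As a sketch of that ordering, the hypothetical helper run_checks below runs the checks in sequence and stops at the first failure, so the plan never starts on invalid code. The command list assumes terraform, tflint and checkov are installed on the runner; terraform validate also needs an initialised working directory (terraform init -backend=false is enough). The runner is injectable so the logic can be exercised without those tools.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

CHECKS: List[List[str]] = [
    ["terraform", "fmt", "-check", "-recursive"],  # formatting
    ["terraform", "validate"],                     # configuration validity
    ["tflint"],                                    # style and likely bugs
    ["checkov", "-d", "."],                        # security policy checks
]


def default_run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_checks(run: Callable[[Sequence[str]], int] = default_run) -> Optional[str]:
    """Run each check in order; return the first failing command, or None."""
    for cmd in CHECKS:
        if run(cmd) != 0:
            return " ".join(cmd)
    return None
```

In a workflow, a non-None result would fail the job before the plan step runs.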

Terraform vs Ansible: Key Differences
| Aspect | Terraform (Declarative) | Ansible (Imperative) |
|---|---|---|
| Primary use case | Cloud resource provisioning: VPCs, VMs, databases | Configuration management: packages, files, services |
| Approach | You describe desired state; Terraform calculates the steps | You write ordered tasks; Ansible executes them top to bottom |
| Idempotency | Built in; plan always shows the exact diff before apply | Achievable but manual; each task must be written idempotently |
| State management | Explicit state file tracks what was built | Stateless; re-runs the playbook against live systems |
| Language | HCL (HashiCorp Configuration Language) | YAML-based playbooks |
| Cloud-native fit | Excellent; purpose-built for cloud APIs | Good; best for OS-level configuration after provisioning |
| Learning curve | Steeper; must understand state, providers, backends | Gentler; YAML is readable, SSH-based, no agent required |
| Team use case | Infrastructure provisioning in CI/CD pipelines | Application deployment and server hardening automation |

Key Takeaways

  • IaC treats infrastructure the same way you treat application code — versioned, reviewed, tested, and deployed through a pipeline. The moment infrastructure lives in Git, it gains an audit trail and becomes repeatable.
  • Declarative tools like Terraform describe the what (desired end state); imperative tools like Ansible describe the how (ordered steps). Use Terraform to provision cloud resources and Ansible to configure the OS inside them — they solve different halves of the same problem.
  • Remote state backends (S3 + DynamoDB for AWS) are non-negotiable for any team setup. Local state files cause silent corruption when two engineers run Terraform against the same environment simultaneously.
  • The gold-standard CI/CD pattern is terraform plan on PR (posted as a comment for review) and terraform apply on merge to main — never run Terraform locally against production in a team environment.
  • Secrets in IaC must be externalised and state files must be encrypted at rest — one committed .tfstate file can expose every database password your team uses.

Common Mistakes to Avoid

  • Committing terraform.tfstate to Git
    Symptom: State file contains plaintext database passwords, API keys, and sensitive outputs. Git history is permanent — even if you delete the file later, secrets are exposed forever in every clone.
    Fix: Add *.tfstate and *.tfstate.* (which covers terraform.tfstate.backup) to .gitignore on day one. Configure a remote S3 backend before your first apply. Run git-secrets or TruffleHog in your CI pipeline to catch accidental secret commits.
  • Making manual changes to IaC-managed resources in the cloud console
    Symptom: You log into AWS, tweak a security group rule 'just this once' to fix a prod issue. Next time Terraform runs, it overwrites your manual change because the state file says the old rule is the desired state. Terraform wins every time.
    Fix: Treat IaC-managed resources as read-only in any UI. Make the fix in code, open a PR, and let the pipeline apply it. If an emergency forces a console change, update the code to match it immediately so the next plan shows no diff; if the change created a brand-new resource, bring it under management with terraform import.
  • Not pinning provider and module versions
    Symptom: Omitting version constraints entirely, or using only an open-ended lower bound such as `>= 4.0`, means a provider upgrade can break your configuration without any code change on your part. This causes phantom CI failures that are extremely hard to debug because nothing in your diff changed.
    Fix: Always pin with the pessimistic constraint operator: `version = "~> 5.0"` allows any 5.x release (minor and patch updates) but blocks the next major version, 6.0; use `~> 5.0.0` if you want patch updates only. Commit your .terraform.lock.hcl file to Git so the whole team and CI use identical provider versions and checksums.
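The pessimistic operator trips people up because its meaning depends on how many version components you write: only the right-most written component may increase. So ~> 5.0 means ">= 5.0, < 6.0", while ~> 5.0.0 means ">= 5.0.0, < 5.1.0". The sketch below reimplements just that rule in Python for plain numeric versions (no pre-release tags) to make the difference visible; it is not Terraform's constraint parser.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """'5.31.0' -> (5, 31, 0)"""
    return tuple(int(part) for part in version.split("."))


def pessimistic_allows(constraint: str, candidate: str) -> bool:
    """Evaluate '~> X.Y[.Z]': only the right-most written component may increase."""
    base = parse(constraint.replace("~>", "").strip())
    cand = parse(candidate)
    cand = cand + (0,) * (len(base) - len(cand))  # pad "6" to "6.0" if needed
    # Every component before the right-most one must match exactly...
    prefix_len = len(base) - 1
    if cand[:prefix_len] != base[:prefix_len]:
        return False
    # ...and the candidate must not be older than the base version.
    return cand >= base


for v in ["5.0.0", "5.31.0", "6.0.0"]:
    print(v, pessimistic_allows("~> 5.0", v), pessimistic_allows("~> 5.0.0", v))
```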

Interview Questions on This Topic

  • Q: What is configuration drift, and how does Infrastructure as Code prevent it? Can you give a concrete example of how drift occurs without IaC? (Senior)
    Configuration drift is the gradual divergence of infrastructure environments from their intended configuration over time. Without IaC, a sysadmin might manually patch a server, apply a hotfix, or change a firewall rule. That change never gets documented or replicated. Six months later, staging and prod are so different that a deployment that works in staging fails in production. IaC prevents drift by codifying the desired state. Every time you apply the code, the tool reconciles actual vs. desired state, overwriting manual changes to any attribute the code manages. Example: a team manually changed an EC2 instance's type from t3.micro to t3.medium to handle a load spike. They forgot to update the Terraform config. On the next deploy cycle, Terraform scaled it back down to t3.micro, causing a performance outage.
  • Q: Explain the difference between declarative and imperative IaC. If you had to provision a new EC2 instance and then install Nginx on it, which tool would you use for each step and why? (Mid-level)
    Declarative IaC (Terraform, CloudFormation) focuses on the end state. You declare: 'I want an EC2 instance with these properties' and the tool determines the steps. Imperative IaC (Ansible, Chef) focuses on the steps: 'Run this command, then that command'. For provisioning the EC2 instance, use Terraform: it handles dependencies like VPC, subnet, security groups, and can track the resource in state. For installing Nginx, use Ansible: it connects to the running instance and executes a sequence of tasks (install package, configure virtual host, start service). Ansible is better at OS-level configuration because order matters (e.g., install package before editing config). Combining both tools is a common production pattern.
  • Q: If a colleague manually deleted an AWS resource that Terraform created, what happens when the next terraform apply runs, and how would you handle a situation where you need to remove a resource from Terraform management without destroying it? (Senior)
    If a resource is deleted outside Terraform, the state file still records it. On the next terraform plan or apply, Terraform refreshes state against the real infrastructure, finds the resource missing, and plans to create it again, because the configuration still declares it. If recreating it is what you want, simply apply. To update state to match reality without changing any infrastructure, run terraform apply -refresh-only (the standalone terraform refresh command is deprecated). If you want to keep a resource but remove it from Terraform management, use terraform state rm <resource_address>: it removes the entry from the state file without deleting the actual cloud resource. Also delete the matching resource block from the configuration; otherwise the next plan will try to create a duplicate. You can then manage the resource manually or re-import it later with terraform import. This is useful when migrating a resource to a different Terraform configuration or when retiring Terraform but keeping the infrastructure.
  • Q: How do you handle secrets in Terraform? What mechanisms prevent them from leaking in state files or plan output? (Mid-level)
    Never hardcode secrets. Use a secrets manager like AWS Secrets Manager, HashiCorp Vault, or environment variables. For Terraform, read secrets via data sources (aws_secretsmanager_secret_version). Mark outputs containing secrets as sensitive = true to prevent them appearing in plan output. However, note that even sensitive values are stored in plaintext in the state file. Mitigations: use an encrypted remote backend (S3 with KMS, Terraform Cloud with at-rest encryption), and restrict access to the state file via IAM. Also, tools like checkov can detect potential secret leaks in the code. Finally, never commit .tfstate files to Git.
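Drift like the t3.medium example can be caught before it causes an outage by running a scheduled plan. terraform plan -detailed-exitcode exits with 0 when there are no changes, 1 on error, and 2 when changes are pending; on a branch whose code has already been applied, pending changes usually mean someone changed the cloud rather than the code. The sketch below wraps that convention with an injectable runner; the alerting is left out and the status labels are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

PLAN_CMD = ["terraform", "plan", "-input=false", "-detailed-exitcode"]


def classify(exit_code: int) -> str:
    """Interpret the exit code of `terraform plan -detailed-exitcode`."""
    if exit_code == 0:
        return "in-sync"  # no changes: live infrastructure matches the code
    if exit_code == 2:
        return "drift"    # changes pending; on an applied branch, usually drift
    return "error"        # 1 (or anything else): the plan itself failed


def default_run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def check_drift(run: Callable[[Sequence[str]], int] = default_run) -> str:
    """Run the plan through `run` and classify the result."""
    return classify(run(PLAN_CMD))
```

A nightly job would call check_drift() and alert on anything other than "in-sync".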

Frequently Asked Questions

What is the difference between Infrastructure as Code and Configuration Management?

Infrastructure as Code provisions the actual cloud resources — servers, networks, databases — from scratch. Configuration management (Ansible, Chef, Puppet) takes those resources and configures what's running inside them: which packages are installed, which config files exist, which services are running. In practice, you use both: IaC to build the infrastructure, configuration management to set it up.

Do I need to know programming to use Terraform or Ansible?

Not deeply, but you do need to understand variables, conditionals, and loops — concepts that exist in both HCL and Ansible's YAML. Terraform's HCL is specifically designed to be more readable than a general-purpose programming language. The bigger learning curve is understanding cloud concepts (networking, IAM, VPCs) than the IaC syntax itself.

Is Terraform the same as CloudFormation?

Both are declarative IaC tools, but CloudFormation is AWS-only and tightly integrated with AWS services, while Terraform is cloud-agnostic — the same tool works for AWS, Azure, GCP, and hundreds of other providers using the same workflow. Teams on a single cloud often find CloudFormation simpler; multi-cloud teams or those who want portability prefer Terraform.

How much does Infrastructure as Code cost?

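Terraform itself is free to use, though since version 1.6 it has been released under the Business Source License rather than an open-source license; OpenTofu is the open-source fork. HCP Terraform (formerly Terraform Cloud) has a free tier with usage limits. Ansible is free and open source. The main costs are cloud resources and the time to write and maintain the code. IaC saves money in the long run by reducing manual errors, simplifying disaster recovery, and enabling reliable scaling.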
