Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files, rather than manual processes or ad-hoc scripts. It solves the fundamental problem of configuration drift and environment inconsistency by treating infrastructure as version-controlled, reproducible artifacts.
Instead of SSH-ing into servers to fix things or running a series of commands that may work today but fail tomorrow, you declare the desired end state of your infrastructure in code, and the IaC tool figures out how to get there. This shift from imperative 'how-to' scripts to declarative 'what-should-be' definitions is what makes IaC the backbone of modern DevOps and cloud-native operations.
IaC sits at the intersection of configuration management, orchestration, and cloud provisioning. Tools like Terraform and AWS CDK focus on declarative provisioning of cloud resources (VPCs, S3 buckets, load balancers), while Ansible handles configuration management and application deployment.
The key distinction is state management: Terraform maintains a state file that tracks every resource it manages, making it a state machine that can detect and correct drift. Ansible is stateless by default, executing tasks idempotently but without tracking resource lineage.
AWS CDK compiles high-level constructs into CloudFormation templates, giving you the power of a programming language with the safety of declarative infrastructure. When not to use IaC? For tiny, throwaway environments or rapid prototyping where the overhead of state management and code reviews outweighs the benefits.
But for any production system, IaC is non-negotiable.
The push vs. pull model further defines how IaC operates. In a pull model (common with tools like Chef or Puppet), agents on each machine periodically check a central server for desired state and apply changes. In a push model (Terraform, Ansible), you trigger execution from a central point—typically a CI/CD pipeline—that pushes the desired state to the infrastructure.
The push model is simpler for cloud-native architectures where you're provisioning ephemeral resources, while pull models excel in long-lived server fleets. The real power emerges when you combine IaC with CI/CD: every commit to your infrastructure repository triggers a pipeline that plans, validates, and applies changes, with state files stored in remote backends (S3 with DynamoDB locking, Terraform Cloud, or HashiCorp Consul) to prevent corruption.
This is where untagged S3 buckets become a silent killer: a state bucket with no tags, no versioning, and no lifecycle policy looks like disposable clutter, so state files can be accidentally deleted or overwritten, leading to orphaned resources and manual recovery nightmares.
Plain-English First
Imagine you're building a LEGO city. Instead of photographing your city and hoping you can recreate it from memory, you keep the instruction booklet. Whenever a tornado (server crash) hits, you just follow the booklet and rebuild it perfectly in minutes. Infrastructure as Code is that instruction booklet — except for your servers, networks, and cloud resources. Your entire data centre, written down as files you can version, share, and replay on demand.
Every modern software team has faced the same nightmare: a production server dies at 2am, and the engineer who built it left the company six months ago. Nobody wrote anything down. The replacement server gets rebuilt from memory, Slack messages, and guesswork — and it's never quite right. This isn't a people problem. It's a process problem, and Infrastructure as Code (IaC) exists specifically to eliminate it. When your infrastructure lives in code, it lives in Git, in pull requests, in code reviews, and in your CI/CD pipeline — just like the application it runs.
Before IaC, provisioning infrastructure meant logging into a cloud console, clicking through wizards, and hoping the person next to you was watching and taking notes. Every environment — dev, staging, production — drifted apart over time because human hands configured them differently. This 'configuration drift' is the silent killer of reliable deployments. IaC solves this by making infrastructure declarative and repeatable: you describe the desired state of your system, and a tool like Terraform or Ansible figures out how to get there. The same code that spins up your staging environment spins up production, byte for byte.
By the end of this article you'll understand why IaC exists at a systems level, know the difference between declarative and imperative approaches, and have a real working Terraform + GitHub Actions CI/CD pipeline you can adapt for your own projects. You'll also know the two mistakes that catch almost every intermediate engineer off guard when they go to use IaC in a team setting.
Why Infrastructure as Code Is a State Machine, Not a Script
Infrastructure as Code (IaC) is the practice of defining and managing infrastructure — servers, networks, databases — through machine-readable definition files, not manual CLI commands or click-ops. The core mechanic is declarative: you specify the desired end state (e.g., three EC2 instances, one RDS read replica), and the IaC tool computes the diff between current and desired state, then executes only the necessary create/update/delete operations. This turns infrastructure into a reproducible artifact, versioned alongside application code.
In practice, Terraform and Pulumi maintain state that maps real-world resources to your definitions; CloudFormation keeps the equivalent record inside the stack, managed by AWS. This state is the tool's source of truth: it tracks resource IDs, dependencies, and metadata. When you run a plan, Terraform by default first refreshes the recorded state from the provider APIs, then compares your configuration against that refreshed state. Trouble starts when the state no longer describes reality in a way a refresh can repair, for example after a hand-edited or lost state file, or after a resource was recreated manually under a new ID. The tool then sees a resource as missing when it is actually present, or vice versa, leading to duplicate resources, deletion of production data, or failed applies.
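As a rough illustration of the diff step described above, the sketch below compares a desired configuration with a recorded state and emits only the create, update, and delete actions needed to reconcile them. It is a toy model under stated assumptions, not how Terraform is implemented: `ResourceAttributes`, `ResourceMap`, `PlanAction`, and `computePlan` are hypothetical names, resources are reduced to flat string attributes, and dependency ordering is ignored.

```typescript
// Hypothetical, simplified model: a resource is a flat bag of string attributes,
// keyed by an address such as "aws_instance.web".
type ResourceAttributes = Record<string, string>;
type ResourceMap = Record<string, ResourceAttributes>;

type PlanAction =
  | { kind: "create"; address: string }
  | { kind: "update"; address: string; changed: string[] }
  | { kind: "delete"; address: string };

// Compare the desired configuration with the recorded state and return only
// the operations needed to make the two match.
function computePlan(desired: ResourceMap, recorded: ResourceMap): PlanAction[] {
  const actions: PlanAction[] = [];

  for (const address of Object.keys(desired).sort()) {
    const want = desired[address];
    const have = recorded[address];
    if (have === undefined) {
      // Declared in config but unknown to state: it must be created.
      actions.push({ kind: "create", address });
      continue;
    }
    // Collect attribute names whose values differ in either direction.
    const attributeNames = Object.keys({ ...want, ...have }).sort();
    const changed = attributeNames.filter((attr) => want[attr] !== have[attr]);
    if (changed.length > 0) {
      actions.push({ kind: "update", address, changed });
    }
  }

  for (const address of Object.keys(recorded).sort()) {
    if (desired[address] === undefined) {
      // Tracked in state but no longer declared: it must be deleted.
      actions.push({ kind: "delete", address });
    }
  }
  return actions;
}
```

Calling `computePlan` with identical desired and recorded maps returns an empty list, which is the property that makes a second apply a no-op.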
Use IaC for any environment that outlives a single developer session — production, staging, even long-lived dev environments. The value compounds when you need to recreate an environment from scratch (disaster recovery, blue/green deployments) or audit changes across a team. Without IaC, you're one accidental click away from an irreproducible mess. With it, you get deterministic provisioning, change history, and the ability to roll back infrastructure changes like code.
State Is Not Optional
Treat your IaC state file as a production artifact — lock it, back it up, and never edit it manually. One bad merge can orphan an entire environment.
Production Insight
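A team ran terraform apply after "renaming" an S3 bucket by hand (in practice, creating a new bucket and deleting the old one, since S3 has no rename operation). The state still referenced the old name, so Terraform planned to create a bucket under that name again, and the apply failed because the name was not available.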
The symptom: plan shows 'create' for a resource that already exists, or 'destroy' for a resource still in use.
Rule: Never rename or delete cloud resources outside IaC — always update the definition and let the tool reconcile.
Key Takeaway
IaC is a state machine: the state file is the single source of truth, not the cloud console.
Drift is inevitable — detect it with periodic plans or drift detection tools, not manual inspection.
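Tag every resource consistently. Tags do not affect how Terraform tracks state, but untagged resources are invisible to the people and cleanup scripts that decide what is safe to delete, and that is how state buckets and managed resources quietly disappear.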
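One way to automate the periodic plan mentioned above is to run `terraform plan -detailed-exitcode` on a schedule and act on its exit code: 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded and found changes. The sketch below only interprets that exit code; `PlanOutcome`, `classifyPlanExitCode`, and `shouldAlert` are hypothetical helper names, and how the command is executed is left to your scheduler.

```typescript
type PlanOutcome = "in-sync" | "changes-present" | "error";

// Map the exit code of `terraform plan -detailed-exitcode` to an outcome.
// 0 = no changes, 2 = changes present; anything else is treated as an error.
function classifyPlanExitCode(exitCode: number): PlanOutcome {
  switch (exitCode) {
    case 0:
      return "in-sync";
    case 2:
      return "changes-present";
    default:
      return "error";
  }
}

// A scheduled drift check should notify someone for both drift and errors:
// a plan that cannot run tells you nothing about whether drift exists.
function shouldAlert(exitCode: number): boolean {
  return classifyPlanExitCode(exitCode) !== "in-sync";
}
```

On a branch with no pending configuration changes, a "changes-present" result means the live infrastructure has moved away from what the code declares.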
[Figure: IaC State Corruption from Untagged S3 Buckets]
Declarative vs Imperative IaC — Choosing the Right Mental Model
There are two ways to tell someone how to make a cup of coffee. The imperative way: 'Boil water. Measure 18g of beans. Grind them. Pour water at 94°C. Wait 4 minutes.' The declarative way: 'I want a black filter coffee in this cup.' The declarative approach lets the system figure out the steps.
This distinction is the most important conceptual split in IaC. Terraform is declarative — you describe what your infrastructure should look like, and Terraform calculates the diff between current state and desired state, then makes the changes. Ansible is imperative by default — you write a sequence of tasks that run top to bottom. Both are valid. The right choice depends on what you're managing.
Declarative tools shine for cloud resource provisioning: creating VPCs, EC2 instances, databases, and load balancers. You don't want to think about order of operations — you just want the result. Imperative tools shine for configuration management: installing packages, editing config files, restarting services. The order genuinely matters there.
In a mature DevOps pipeline you'll often use both: Terraform provisions the server, Ansible configures it. Understanding why they work differently stops you from fighting the tool when it doesn't behave the way you expect.
main.tf (HCL)
# main.tf: Terraform declarative configuration for a basic web server on AWS
# Run: terraform init && terraform plan && terraform apply

# Tell Terraform which cloud provider to use and which region
provider "aws" {
  region = "us-east-1"
}

# Declare the desired state: an EC2 instance running Ubuntu
# Terraform will CREATE this if it doesn't exist, UPDATE it if the spec changed,
# and do NOTHING if it already matches (this is idempotency in action)
resource "aws_instance" "web_server" {
  # Intended as Ubuntu 22.04 LTS in us-east-1. AMI IDs are region-specific,
  # so verify that this ID resolves to the image you expect before applying.
  ami           = "ami-0c02fb55956c7d316"
  instance_type = "t3.micro" # Small instance size; free-tier eligibility depends on account and region

  # Tag the resource so humans can find it in the AWS console
  tags = {
    Name        = "theforge-web-server"
    Environment = "staging"
    ManagedBy   = "terraform" # Critical: tells the team this isn't manually managed
  }
}

# Output block: print the public IP after apply so we know where to connect
output "web_server_public_ip" {
  description = "The public IP address of our web server"
  value       = aws_instance.web_server.public_ip
}
Always tag IaC-managed resources with 'ManagedBy: terraform' (or whichever tool you use). When a colleague logs into the AWS console and sees an unfamiliar resource, this tag tells them not to touch it manually — because Terraform owns its state. Without it, someone will 'fix' it by hand and create a state drift nightmare on the next apply.
Production Insight
The ManagedBy tag isn't decoration — it's a human-readable signal that prevents accidental manual changes.
In one incident, a missing tag led to a team spending 4 hours debugging a security group that kept reverting every 15 minutes.
Rule: if it's managed by IaC, tag it. No exceptions.
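The tagging rule above can be checked mechanically before an apply. The sketch below scans a simplified version of the `resource_changes` array that `terraform show -json <planfile>` produces and lists resources that would exist after the apply without the expected tag. Treat it as an assumption-laden illustration: `ResourceChange` models only the fields used here, `findUntagged` is a hypothetical helper, and resource types that do not support tags would need an allow-list in practice.

```typescript
// Simplified shape of one entry in `resource_changes`; only the fields
// this check needs. `after` is null when the resource is being deleted.
interface ResourceChange {
  address: string;
  change: {
    actions: string[];
    after: { tags?: Record<string, string> | null } | null;
  };
}

// Return the addresses of resources that will exist after the apply
// but do not carry the required tag key/value pair.
function findUntagged(
  changes: ResourceChange[],
  tagKey: string,
  tagValue: string
): string[] {
  return changes
    .filter((rc) => rc.change.after !== null) // skip pure deletions
    .filter((rc) => {
      const tags: Record<string, string> = rc.change.after?.tags ?? {};
      return tags[tagKey] !== tagValue;
    })
    .map((rc) => rc.address);
}
```

A CI step could fail the pull request whenever `findUntagged(changes, "ManagedBy", "terraform")` returns a non-empty list.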
Key Takeaway
Declarative IaC abstracts order-of-operations; imperative gives you precise control.
Use Terraform for cloud resources, Ansible for OS config — they solve different halves.
Pick declarative when you care about the destination; imperative when the journey matters.
IaC Tools Comparison: Terraform, Ansible, and AWS CDK
Choosing the right IaC tool is like choosing the right hammer — they look similar but each is designed for a specific nail. The three most popular tools each have distinct strengths: Terraform for declarative cloud provisioning, Ansible for imperative configuration management, and AWS CDK for developers who want to write infrastructure in familiar programming languages.
| Feature | Terraform (Declarative) | Ansible (Imperative) | AWS CDK (Code-defined, declarative output) |
| --- | --- | --- | --- |
| Primary use | Cloud resource provisioning | Configuration and application deployment | Cloud resource provisioning (AWS only) |
| Approach | Declare desired state, tool computes steps | Write ordered tasks, tool executes | Write code (TypeScript, Python, etc.) that generates CloudFormation |
| State management | Explicit state file (local or remote) | Stateless; runs against live systems | CloudFormation stack (state managed by AWS) |
| Idempotency | Built-in via plan/apply | Per task: most modules are idempotent, but command and shell tasks need manual guards | Built-in via CloudFormation |
| Language | HCL | YAML | TypeScript, Python, Java, C#, Go |
| Learning curve | Moderate (must understand state, providers) | Gentle (YAML is easy, no agent required) | Steep (need to understand programming and cloud abstractions) |
| Best for | Multi-cloud, team with dedicated IaC knowledge | Quick automation, server configuration, hybrid environments | AWS-only teams with developers comfortable in TypeScript/Python |
CDK (Cloud Development Kit) is unique because it lets you define AWS resources using general-purpose languages. Under the hood, CDK synthesises CloudFormation templates — so you get the safety of declarative state management with the expressiveness of code. This is increasingly popular in DevOps-heavy teams that already use TypeScript for backend services.
In production, pick Terraform if you need multi-cloud or a mature state management story. Pick Ansible if you're configuring servers and don't want to manage state files. Pick CDK if your team lives in AWS and writes TypeScript daily. Mixing Terraform (cloud) + Ansible (config) is the most common pattern among teams that need flexibility.
cdk-stack.ts (TypeScript)
// cdk-stack.ts: AWS CDK example (TypeScript)
// This creates an S3 bucket with versioning enabled.
// Run: cdk synth && cdk deploy
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';

export class MyStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The bucket is created with versioning; idempotent because CloudFormation tracks it
    new s3.Bucket(this, 'TheForgeBucket', {
      versioned: true,
      bucketName: 'theforge-iac-demo-bucket',
      removalPolicy: cdk.RemovalPolicy.RETAIN, // Prevent accidental deletion
    });
  }
}

const app = new cdk.App();
new MyStack(app, 'TheForgeStack');
CDK generates CloudFormation templates, so you inherit CloudFormation's quotas (at the time of writing, 500 resources per stack and a 1 MB template size). If you anticipate needing multi-cloud or complex resource dependencies across providers, Terraform is a safer bet. CDK is AWS-only and opinionated toward AWS best practices.
Production Insight
A team once tried to manage an on-premises VMware cluster with CDK — it doesn't support non-AWS providers.
They wasted two weeks before switching to Terraform.
Rule: CDK is great for AWS-only shops; for anything else, use Terraform or Ansible.
Key Takeaway
Terraform for declarative cloud provisioning, Ansible for imperative config, CDK for AWS-native teams.
Mixing tools is common — pick based on your infrastructure footprint.
The ManagedBy tag applies regardless of tool: tag resources to prevent manual drift.
Push vs Pull Model in IaC — How Agents and Agentless Architectures Compare
IaC tools fall into two operational models: push and pull. This distinction affects everything from security posture to network topology. Understanding it helps you choose the right tool for your environment — and debug failures when the model conflicts with your infrastructure.
Push model (Agentless): The orchestration server (or user's laptop) directly connects to target nodes via SSH or WinRM and executes commands. Ansible is the canonical example. No agent needs to be installed on the target — the orchestrator pushes configuration to the node. This is simple to set up initially but requires network connectivity from the orchestrator to every target. In production pipelines, this often means running Ansible from a CI runner that has SSH access to your fleet.
Pull model (Agent-based): An agent is installed on each target node. The agent periodically polls a central server (or service) for its desired configuration, pulls it down, and applies it locally. Chef and Puppet (in default mode) work this way. Pull models scale better at the cost of more complex initial setup. The agent handles retries, avoids single orchestrator bottlenecks, and works even when the target is behind NAT or a firewall.
The trade-off: push is simpler for small infrastructures, pull is more resilient at scale. Many mature DevOps teams adopt a hybrid approach: Terraform (push for provisioning) + Chef/Puppet (pull for ongoing config).
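The difference in connection direction is easier to see in a small simulation. The sketch below is a toy model, not any tool's real protocol: `ManagedNode`, `ConfigServer`, `pushConfig`, and `agentPollOnce` are hypothetical names, and "reachability" is reduced to a boolean. In the push function the orchestrator must reach every node; in the pull function each node reaches out to the server, so inbound reachability never comes into play.

```typescript
type NodeConfig = { version: number; packages: string[] };

// A node remembers whatever configuration was last applied to it.
class ManagedNode {
  applied: NodeConfig | null = null;
  constructor(
    public hostname: string,
    public reachableFromOrchestrator: boolean
  ) {}
}

// Push: the orchestrator connects to each node in turn. Nodes it cannot
// reach (firewall, NAT) are reported as failures and stay unconfigured.
function pushConfig(nodes: ManagedNode[], desired: NodeConfig): string[] {
  const failed: string[] = [];
  for (const node of nodes) {
    if (!node.reachableFromOrchestrator) {
      failed.push(node.hostname);
      continue;
    }
    node.applied = desired;
  }
  return failed;
}

// Pull: the central server only answers requests for the desired config.
class ConfigServer {
  constructor(private desired: NodeConfig) {}
  fetchDesired(): NodeConfig {
    return this.desired;
  }
}

// One polling cycle of an agent running on the node itself. The connection
// is outbound from the node. Returns true if the node had to converge.
function agentPollOnce(node: ManagedNode, server: ConfigServer): boolean {
  const desired = server.fetchDesired();
  if (node.applied !== null && node.applied.version === desired.version) {
    return false; // already converged, nothing to do
  }
  node.applied = desired;
  return true;
}
```

A node marked unreachable fails under `pushConfig` but converges on its first `agentPollOnce`, which mirrors the firewall trade-off described above.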
ansible-push-vs-chef-pull.md (YAML)
# Ansible (Push): from a control node to managed hosts
# No agents required; uses SSH
- name: Ensure nginx is installed on web servers
  hosts: webservers
  become: true # Package installation needs root (added here; not in the original snippet)
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

# Chef (Pull): agent on each node polls the Chef Server periodically
# Requires a chef-client agent and a Chef Server
# client.rb (Ruby, on the node)
chef_server_url "https://chef-server.example.com/organizations/myorg"
node_name "webserver1.example.com"
client_key "/etc/chef/client.pem"
# A run-list on the Chef Server defines which recipes to apply
# The agent runs every 30 minutes (or when triggered manually)
Output
Ansible push:
PLAY [webservers] ************************************************
Chef pull:
[2026-05-12T10:00:01+00:00] INFO: Chef Run complete in 0.5 seconds
Production Reality: Hybrid Approach
In many production environments, you'll use push for provisioning (Terraform applies infrastructure) and pull for ongoing management (Chef/Puppet agents keep servers in compliance). This gives you the simplicity of push for initial setup and the resilience of pull for long-term configuration drift prevention.
Production Insight
A financial services company had 2000 VMs behind strict firewalls — Ansible couldn't connect because SSH was blocked.
They switched to Chef's pull model, and the agents ran successfully through NAT.
Rule: if your network topology prohibits inbound connections, pull models are your only option.
Key Takeaway
Push (Ansible) is simpler but requires network reachability; pull (Chef, Puppet) scales better behind firewalls.
Choose based on network constraints and fleet size.
Many teams use both: push for provisioning, pull for ongoing config.
[Figure: Push vs Pull Model Flow]
Idempotency vs Immutability — Two Pillars of Reliable Infrastructure
Two concepts are often confused but serve different purposes: idempotency and immutability. Both make infrastructure safer, but they achieve safety through different mechanisms.
Idempotency means running the same operation multiple times produces the same result as running it once. Terraform is idempotent: if you apply the same configuration twice, the second apply does nothing (no changes). Idempotent tools are forgiving — you can rerun them as often as needed without side effects. This is essential for CI/CD where multiple triggers may attempt an apply.
Immutability means you never modify a running resource. Instead, you replace it entirely. When a configuration change is needed, you build a new server (or container) and switch traffic to it, then destroy the old one. This guarantees that the running system always matches the artifact that was built in CI. No drift, no snowflake servers. Tools like Packer and Docker champion immutability — you bake the configuration into an image, then deploy the image.
| Aspect | Idempotency | Immutability |
| --- | --- | --- |
| What it guarantees | Safe re-runs of the same apply | The running resource is exactly what was built |
| How it's achieved | State tracking, diff calculation | Blue/green deployments, image-based builds |
| Tool examples | Terraform, Ansible (if written idempotently) | Packer, Docker, AWS AMI pipelines |
| Conflict with manual changes | Overwrites drift (idempotent apply) | Manual changes do not persist; the server is replaced |
| Production benefit | Quick recovery from partial failures | Predictable, no drift, easier rollback (switch to old image) |
| Drawback | State can still drift between applies | Slower deployment: build and test a new image each time |
In a mature production environment you'll leverage both: use immutable images for core infrastructure (AMI pipelines) and idempotent configuration management for runtime adjustments (config files, secret rotation).
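The two properties can be contrasted in a few lines. The sketch below is an illustrative model only: `ServerInstance`, `ensurePackage`, and `replaceWithImage` are hypothetical names, and a "server" is reduced to an image version plus a package list. The first function changes a server in place but is safe to repeat; the second never touches the running server and returns a replacement built from a new image.

```typescript
interface ServerInstance {
  id: number;
  imageVersion: string;
  packages: string[];
}

// Idempotent, in-place change: ensure a package is present.
// A second call finds the package already there and changes nothing.
// Returns true only when the server was actually modified.
function ensurePackage(server: ServerInstance, pkg: string): boolean {
  if (server.packages.some((installed) => installed === pkg)) {
    return false;
  }
  server.packages.push(pkg);
  return true;
}

// Immutable change: leave the running server alone and build a replacement
// from the new image. The caller shifts traffic, then retires the old one.
function replaceWithImage(
  old: ServerInstance,
  imageVersion: string,
  bakedPackages: string[]
): ServerInstance {
  return {
    id: old.id + 1,
    imageVersion,
    packages: bakedPackages.slice(), // copy so the image definition stays untouched
  };
}
```

Rollback under the immutable model is another call to `replaceWithImage` with the previous image version, which is why it tends to be faster than re-running older configuration tasks.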
immutable-vs-idempotent.tf (HCL)
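# Immutable approach: use an AMI built by Packer, not an inline user_data script
# This ensures the server is exactly what CI validated
variable "baked_ami_id" { # Assumed declaration, added so the snippet is self-contained
  description = "ID of the AMI produced by the CI image build"
  type        = string
}

resource "aws_instance" "immutable_server" {
  ami           = var.baked_ami_id # AMI built in CI pipeline; never changed after launch
  instance_type = "t3.micro"
  # No provisioner: config is baked into the AMI
}

# Idempotent approach: Terraform ensures the instance exists and has certain tags,
# but configuration inside the instance is managed by Ansible (idempotent tasks)
data "aws_ami" "ubuntu" { # Assumed lookup for a Canonical Ubuntu 22.04 image
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account ID
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "idempotent_instance" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  tags = {
    Name = "idempotent-inst"
  }
  # user_data script runs once at launch; Terraform won't rerun it on re-apply
  user_data = file("setup.sh")
}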
Output
# Immutable: no drift possible, but changing the AMI requires a new instance
# Idempotent: Terraform plan shows no changes, but user_data is one-shot — not idempotent!
# Ansible must manage runtime state to make it truly idempotent.
Interview Question: Which is more secure?
Immutability is generally considered more secure because there's no opportunity for drift to introduce vulnerabilities. An attacker cannot persistently alter an immutable server — the next deployment replaces it entirely. Idempotent tools can revert drift, but only if the drift is detected. Many security-conscious teams enforce immutability for production and use idempotent IaC for staging.
Production Insight
A team using immutable AMIs for a critical API service reduced incident response time from 45 minutes to 3 minutes — rollback was just a matter of redeploying the previous AMI.
With idempotent tools, rollback involved running an old playbook that often failed due to dependencies.
Rule: for stateless services, prefer immutability. For stateful databases, idempotent config management (with careful planning) is safer.
Key Takeaway
Idempotency makes reruns safe; immutability makes servers replaceable.
Use both: immutable images for stateless compute, idempotent config for mutable state.
Immutability is easier to audit and rollback — perfect for production critical paths.
IaC in a Real CI/CD Pipeline — Automate the Infrastructure Itself
Knowing how to run Terraform locally is a starting point. But IaC's real power unlocks when it runs automatically inside your CI/CD pipeline. Think about it: your application code goes through automated testing before it deploys. Why should your infrastructure changes be any different? A pull request that adds a new RDS database should go through the same review process as a pull request that adds a new API endpoint.
The pattern that works in production is this: on every pull request, run terraform plan and post the output as a PR comment. This gives reviewers an exact, human-readable diff of what will change in the real cloud — before anyone approves it. On merge to main, run terraform apply automatically. No one runs Terraform from their laptop. Ever.
This approach solves three problems at once. It creates an audit trail (every infrastructure change is a Git commit with an author and a timestamp). It prevents 'works on my machine' infrastructure (the pipeline always runs from a clean state). And it forces infrastructure changes through code review, which catches mistakes before they hit production.
The GitHub Actions workflow below implements this exact pattern. It's the real thing — not a toy example.
.github/workflows/terraform-ci-cd.yml (YAML)
# .github/workflows/terraform-ci-cd.yml
# This pipeline runs terraform plan on PRs and terraform apply on merge to main.
# Prerequisites: store AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as GitHub Secrets.
name: Terraform Infrastructure Pipeline

on:
  pull_request:
    branches: [main]        # Run plan on any PR targeting main
    paths: ['terraform/**'] # Only trigger when Terraform files actually changed
  push:
    branches: [main]        # Run apply when code lands on main
    paths: ['terraform/**']

env:
  TF_VERSION: '1.7.0'           # Pin the version; never use 'latest' in CI
  TF_WORKING_DIR: './terraform' # All .tf files live in this directory
  AWS_REGION: 'us-east-1'

jobs:
  terraform-plan:
    name: Plan Infrastructure Changes
    runs-on: ubuntu-latest
    # Only run the plan job on pull requests, not on direct pushes to main
    if: github.event_name == 'pull_request'
    permissions:
      contents: read
      pull-requests: write # Needed to post the plan as a PR comment
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Configure AWS credentials from GitHub Secrets
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Install the pinned version of Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init - download providers and configure backend
        working-directory: ${{ env.TF_WORKING_DIR }}
        # -input=false prevents the pipeline hanging waiting for keyboard input
        run: terraform init -input=false

      - name: Terraform Validate - catch syntax errors before planning
        working-directory: ${{ env.TF_WORKING_DIR }}
        run: terraform validate

      - name: Terraform Plan - calculate what will change
        working-directory: ${{ env.TF_WORKING_DIR }}
        id: tf_plan
        # -no-color prevents ANSI escape codes from cluttering the GitHub PR comment
        run: terraform plan -no-color -input=false -out=tfplan.binary

      - name: Post plan output as a PR comment so reviewers can see the diff
        uses: actions/github-script@v7
        env:
          # The stdout output is provided by the setup-terraform wrapper (enabled by default)
          PLAN: ${{ steps.tf_plan.outputs.stdout }}
        with:
          script: |
            // Read the plan from the environment instead of interpolating it into
            // the script, so backticks or quotes in the output cannot break it
            const body = '### Terraform Plan\n```\n' + process.env.PLAN + '\n```';
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body
            });

  terraform-apply:
    name: Apply Infrastructure Changes
    runs-on: ubuntu-latest
    # Only run apply when a PR is merged to main, never on open PRs
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    environment: production # Requires a GitHub environment approval gate if configured
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Install Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init
        working-directory: ${{ env.TF_WORKING_DIR }}
        run: terraform init -input=false

      - name: Terraform Apply - make the changes in the real cloud
        working-directory: ${{ env.TF_WORKING_DIR }}
        # -auto-approve skips the interactive yes/no prompt. It is safe here because
        # this only runs after a human approved and merged the PR
        run: terraform apply -input=false -auto-approve
Output
# On Pull Request — plan posted as PR comment:
### Terraform Plan
```
Terraform will perform the following actions:
# aws_instance.web_server will be created
+ resource "aws_instance" "web_server" {
+ ami = "ami-0c02fb55956c7d316"
+ instance_type = "t3.micro"
}
Plan: 1 to add, 0 to change, 0 to destroy.
```
# On merge to main — apply job output:
Run terraform apply -input=false -auto-approve
aws_instance.web_server: Creating...
aws_instance.web_server: Still creating... [10s elapsed]
aws_instance.web_server: Creation complete after 32s
Watch Out: Never Store Terraform State Locally in a Team
The default terraform.tfstate file is created locally. If two engineers run Terraform against the same environment from their own machines, each works from a different copy of the state, and together they create duplicate or orphaned resources. Always configure a remote backend (S3 + DynamoDB for AWS, or Terraform Cloud) before anyone else joins the project. Add *.tfstate and *.tfstate.backup to your .gitignore immediately: state files can contain secrets in plaintext.
Production Insight
A team of three engineers once ran 'terraform apply' simultaneously from their laptops — state corruption caused two load balancers to be created, doubling the monthly bill for 6 weeks.
The fix: enforce a CI-only apply policy and use remote state with locking.
Rule: if more than one person touches the same environment, your state must be remote and locked.
Key Takeaway
Run 'terraform plan' on PRs and 'terraform apply' on merge — never from a laptop.
Remote state with locking prevents corruption.
Treat infrastructure changes like code: review, test, and deploy through CI/CD.
Remote State and Modules — The Patterns That Make IaC Scale
A single main.tf file works fine for a hobby project. It falls apart the moment you have two engineers, two environments, or two services. This is where two patterns become non-negotiable: remote state backends and modules.
State is how Terraform remembers what it already built; without it, every terraform apply is flying blind. A local state file only helps the one person who holds it. With a remote backend, such as an S3 bucket with a DynamoDB lock table, the state file lives in the cloud, is accessible to everyone on the team, and is locked during applies so two engineers can't run it simultaneously and corrupt each other's work.
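To make the locking idea concrete, here is a minimal in-memory sketch of the "acquire only if nobody holds it" rule that a lock table enforces. It is a simulation under stated assumptions, not the S3 backend's actual implementation: `LockTable` and `applyWithLock` are hypothetical names, and a real backend relies on an atomic conditional write rather than a check followed by an assignment.

```typescript
// In-memory stand-in for a lock table keyed by state path.
class LockTable {
  private holders: Record<string, string> = {};

  // Succeeds only if nobody currently holds the lock for this state path.
  acquire(statePath: string, owner: string): boolean {
    if (this.holders[statePath] !== undefined) {
      return false;
    }
    this.holders[statePath] = owner;
    return true;
  }

  // Only the current holder may release the lock.
  release(statePath: string, owner: string): boolean {
    if (this.holders[statePath] !== owner) {
      return false;
    }
    delete this.holders[statePath];
    return true;
  }
}

// Run an apply only while holding the lock, and always release afterwards,
// even if the apply throws.
function applyWithLock(
  table: LockTable,
  statePath: string,
  owner: string,
  apply: () => void
): "applied" | "locked" {
  if (!table.acquire(statePath, owner)) {
    return "locked";
  }
  try {
    apply();
    return "applied";
  } finally {
    table.release(statePath, owner);
  }
}
```

If a second caller invokes `applyWithLock` for the same state path while the first apply is still running, it gets "locked" back instead of writing over the first caller's state.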
Modules are reusable Terraform components. Think of them as functions for infrastructure. Instead of copy-pasting the same EC2 + security group + IAM role configuration for every service, you write it once as a module and call it with different variables for each service. This is the IaC equivalent of the DRY principle and it's what separates a professional IaC setup from a pile of disconnected config files.
Below is a minimal but real remote backend configuration alongside a module call pattern. This is the structure you'd actually find in a production repository.
backend.tf + modules/web_server/main.tf (HCL)
# ─────────────────────────────────────────────
# FILE 1: backend.tf (remote state configuration)
# This tells Terraform to store its state file in S3 instead of locally.
# DynamoDB provides a lock so two engineers can't apply simultaneously.
# ─────────────────────────────────────────────
terraform {
  required_version = ">= 1.7.0" # Prevent older Terraform versions from running this config

  backend "s3" {
    bucket         = "theforge-terraform-state-prod" # Must exist before you run terraform init
    key            = "services/web-app/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                       # Encrypt state at rest; critical because state contains secrets
    dynamodb_table = "theforge-terraform-locks" # Table must have a partition key named 'LockID'
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # ~> means 5.x but not 6.x; prevents breaking changes
    }
  }
}

# ─────────────────────────────────────────────
# FILE 2: modules/web_server/main.tf (a reusable module)
# This module encapsulates all the resources needed for one web server.
# Call it multiple times with different variables for different services.
# ─────────────────────────────────────────────
variable "service_name" {
  description = "Name of the service this server runs, e.g. 'payments' or 'user-api'"
  type        = string
}

variable "environment" {
  description = "Deployment environment: dev, staging, or production"
  type        = string

  validation {
    # Terraform will refuse to apply if someone passes an unexpected value
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_type" {
  description = "EC2 instance size: use t3.micro for dev, t3.medium for production"
  type        = string
  default     = "t3.micro"
}

resource "aws_instance" "service_server" {
  ami           = "ami-0c02fb55956c7d316"
  instance_type = var.instance_type

  tags = {
    Name        = "${var.service_name}-${var.environment}" # e.g. 'payments-production'
    Environment = var.environment
    Service     = var.service_name
    ManagedBy   = "terraform"
  }
}

output "server_id" {
  value = aws_instance.service_server.id
}

# ─────────────────────────────────────────────
# FILE 3: main.tf (calling the module twice for two different services)
# This is the DRY pattern in action. Same infrastructure blueprint, different values.
# ─────────────────────────────────────────────
module "payments_server" {
  source        = "./modules/web_server"
  service_name  = "payments"
  environment   = "production"
  instance_type = "t3.medium" # Payments needs more power than default
}

module "user_api_server" {
  source       = "./modules/web_server"
  service_name = "user-api"
  environment  = "production"
  # instance_type not specified; falls back to the default t3.micro
}
Output
Terraform will perform the following actions:
# module.payments_server.aws_instance.service_server will be created
+ resource "aws_instance" "service_server" {
+ instance_type = "t3.medium"
+ tags = {
+ "Environment" = "production"
+ "ManagedBy" = "terraform"
+ "Name" = "payments-production"
+ "Service" = "payments"
}
}
# module.user_api_server.aws_instance.service_server will be created
+ resource "aws_instance" "service_server" {
+ instance_type = "t3.micro"
+ tags = {
+ "Environment" = "production"
+ "ManagedBy" = "terraform"
+ "Name" = "user-api-production"
+ "Service" = "user-api"
}
}
Plan: 2 to add, 0 to change, 0 to destroy.
Interview Gold: Why is idempotency central to IaC?
Idempotency means running the same operation 10 times produces the same result as running it once. This is what makes Terraform safe to run in CI/CD: if infrastructure already matches the declared state, terraform apply does nothing. If you can't guarantee this, your pipeline becomes dangerous — applying twice could duplicate resources and double your AWS bill. This is a concept interviewers probe hard on.
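To make the idea concrete, here is a minimal Python sketch of an idempotent reconcile step. It is a toy model and not how Terraform is implemented: the `reconcile` function and the dictionaries standing in for desired and actual infrastructure are assumptions made purely for illustration.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Make `actual` match `desired` and return the actions that were taken."""
    actions = []
    for name, config in desired.items():
        if name not in actual:
            actual[name] = dict(config)
            actions.append(f"create {name}")
        elif actual[name] != config:
            actual[name] = dict(config)
            actions.append(f"update {name}")
    for name in list(actual):
        if name not in desired:
            del actual[name]
            actions.append(f"destroy {name}")
    return actions


desired = {"payments-production": {"instance_type": "t3.medium"}}
actual = {}

first_run = reconcile(desired, actual)   # creates the missing server
second_run = reconcile(desired, actual)  # nothing left to do
print(first_run)   # ['create payments-production']
print(second_run)  # []
```

The second run returns an empty action list because the function compares against the current state before acting, which is the property that makes repeated pipeline runs safe.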
Production Insight
Without remote state, two developers applying simultaneously can corrupt the state file, leading to orphaned resources or duplicate creations.
The DynamoDB lock table prevents concurrent applies — it's a distributed mutex.
Rule: always use a backend that supports locking; S3 alone isn't enough.
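The "distributed mutex" behaviour can be sketched with a small in-memory model. This is only an illustration of the write-if-absent idea behind a lock table; the `LockTable` class and its methods are hypothetical and single-process, and they do not talk to DynamoDB.

```python
class LockTable:
    """In-memory stand-in for a lock table keyed by LockID (hypothetical)."""

    def __init__(self):
        self._items = {}

    def acquire(self, lock_id: str, owner: str) -> bool:
        # Mirrors a conditional write: succeed only if no item with this key exists.
        if lock_id in self._items:
            return False
        self._items[lock_id] = owner
        return True

    def release(self, lock_id: str, owner: str) -> None:
        # Only the current holder may release the lock.
        if self._items.get(lock_id) == owner:
            del self._items[lock_id]


table = LockTable()
state_path = "theforge-terraform-state-prod/services/web-app/terraform.tfstate"

alice_got_lock = table.acquire(state_path, "alice")
bob_got_lock = table.acquire(state_path, "bob")   # refused while alice holds it
table.release(state_path, "alice")
bob_retry = table.acquire(state_path, "bob")      # succeeds after the release
print(alice_got_lock, bob_got_lock, bob_retry)    # True False True
```

The important design point is that acquiring the lock is a single check-and-write step, so two callers can never both believe they hold it.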
Key Takeaway
Remote state is non-negotiable for teams — it's the source of truth.
Modules enforce DRY and consistency across environments.
Locking prevents corruption; encrypt state to protect secrets.
IaC Security and Secrets Management — Don't Leak Your Infrastructure's Keys
Infrastructure code often requires secrets: API keys, database passwords, cloud provider credentials. A common rookie mistake is hardcoding these in the IaC files. Terraform state files, in particular, store resource attributes in plaintext, which can include sensitive values like database passwords or IAM secret keys.
The rule: IaC code should never contain secrets. Instead, use environment variables, encrypted variables in your CI/CD platform, or a dedicated secrets manager like HashiCorp Vault, AWS Secrets Manager, or GitHub Actions Secrets. For Terraform, use the sensitive = true attribute on outputs, and avoid outputting secrets in plan output. Use data sources that read from a secrets manager rather than embedding values.
Another critical practice: never commit .tfstate files to version control. Even if you delete them later, the secrets remain exposed in Git history. Add the patterns *.tfstate and *.tfstate.* (which covers .tfstate.backup) to .gitignore on day one, and use a remote backend with encryption at rest.
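As a safety net on top of .gitignore, a pre-commit check can refuse any staged path that looks like a state file. The following is a rough sketch: the function name `find_state_files` and the pattern list are assumptions for illustration, and in a real hook the list of paths would come from something like `git diff --cached --name-only`.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns assumed for illustration; adjust them to your repository layout.
STATE_PATTERNS = ("*.tfstate", "*.tfstate.*")


def find_state_files(paths):
    """Return the paths whose file name looks like a Terraform state file."""
    return [
        p for p in paths
        if any(fnmatch(PurePosixPath(p).name, pat) for pat in STATE_PATTERNS)
    ]


staged = [
    "main.tf",
    "envs/prod/terraform.tfstate",
    "envs/prod/terraform.tfstate.backup",
    "docs/state-management.md",
]
print(find_state_files(staged))
# ['envs/prod/terraform.tfstate', 'envs/prod/terraform.tfstate.backup']
```

A hook would exit non-zero whenever the returned list is non-empty, blocking the commit before the file ever reaches Git history.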
Below is an example of using environment variables with Terraform, and a pattern for reading secrets from AWS Secrets Manager.
secrets.tf (HCL)
variable "aws_region" {
  description = "AWS region to deploy into"
  type        = string
}

# Use environment variables for provider credentials; never hardcode them
provider "aws" {
  region = var.aws_region
  # Credentials are sourced from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
}

# Read a secret from AWS Secrets Manager at plan/apply time.
# This keeps the secret out of your code, but not out of state:
# the value read by the data source is still recorded in the state file.
data "aws_secretsmanager_secret" "db_password" {
  name = "prod/db/password"
}

data "aws_secretsmanager_secret_version" "db_password_version" {
  secret_id = data.aws_secretsmanager_secret.db_password.id
}

# Use the secret in a resource
resource "aws_db_instance" "main" {
  identifier = "theforge-prod-db"
  engine     = "postgres"
  username   = "admin"
  password   = data.aws_secretsmanager_secret_version.db_password_version.secret_string
  # Terraform redacts this value in plan output.
  # The state file still contains the password, so it relies on
  # the S3 backend's server-side encryption for protection at rest.
}

output "db_endpoint" {
  value     = aws_db_instance.main.endpoint
  sensitive = false # endpoint is not secret
}

output "db_password" {
  value     = aws_db_instance.main.password
  sensitive = true # prevents Terraform from printing the value in plan output
}
Output
# Terraform plan output for the above configuration (password is redacted):
Terraform will perform the following actions:
# data.aws_secretsmanager_secret.db_password will be read during apply
# (configuration not shown)
# data.aws_secretsmanager_secret_version.db_password_version will be read during apply
# aws_db_instance.main will be created
+ resource "aws_db_instance" "main" {
+ identifier = "theforge-prod-db"
+ engine = "postgres"
+ username = "admin"
+ password = (sensitive value)
}
Plan: 1 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ db_endpoint = (known after apply)
Critical: State Files Contain Secrets in Plaintext
Even when you mark an output as sensitive, the secret value still lives in the state file. If your backend doesn't encrypt state at rest, anyone with access to the backend bucket can read your database passwords and API keys. Always enable encryption on your remote state backend (e.g., S3 server-side encryption with AWS KMS) and restrict IAM permissions to the least privilege.
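A quick way to see the problem is to look inside a state file. The sketch below scans a simplified, made-up excerpt modeled on the general layout of Terraform state (resources, instances, attributes); the `find_plaintext_secrets` helper, the key-word list, and the sample values are all assumptions for illustration.

```python
import json

# Simplified, made-up excerpt modeled on the general layout of a state file.
STATE_JSON = """
{
  "version": 4,
  "resources": [
    {
      "type": "aws_db_instance",
      "name": "main",
      "instances": [
        {"attributes": {"identifier": "theforge-prod-db",
                        "password": "example-not-a-real-password"}}
      ]
    }
  ]
}
"""

SUSPICIOUS_KEYS = ("password", "secret", "token")


def find_plaintext_secrets(state: dict) -> list:
    """List resource attributes whose names suggest they hold a secret."""
    hits = []
    for resource in state.get("resources", []):
        for instance in resource.get("instances", []):
            for key, value in instance.get("attributes", {}).items():
                if value and any(word in key.lower() for word in SUSPICIOUS_KEYS):
                    hits.append(f"{resource['type']}.{resource['name']}.{key}")
    return hits


print(find_plaintext_secrets(json.loads(STATE_JSON)))
# ['aws_db_instance.main.password']
```

Anyone who can read the state object can run the same kind of scan, which is why backend encryption and tight IAM permissions matter more than redacted plan output.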
Production Insight
A startup once committed terraform.tfstate containing AWS root credentials to a public GitHub repo.
The credentials were scraped within 2 hours, resulting in a $40,000 crypto mining bill.
Rule: never commit state files; use encrypted remote backends and scan for secrets in every commit.
Key Takeaway
Secrets in IaC must come from external stores — never hardcoded.
State files inherently contain sensitive data — encrypt them at rest.
Use 'sensitive = true' on outputs and audit plan output for leaks.
IaC Testing and Validation — Ensure Your Infrastructure Works Before You Apply
Writing infrastructure code without testing is like deploying a microservice without unit tests. You're one typo away from deleting a production database. IaC testing isn't as mature as application testing, but it's evolving fast. Here are three levels of validation every IaC pipeline needs:
Syntax and static analysis: Use terraform validate to catch basic HCL errors. Use tools like tflint for style and potential bugs, and checkov or tfsec for security policy violations. Run these on every PR before the plan step.
Plan review: The manual step where a human reviews the terraform plan output. This catches logical mistakes — like changing a security group that breaks connectivity, or accidentally destroying a stateful resource.
Integration testing: Tools like Terratest let you write Go tests that deploy real infrastructure, run assertions against it, and then destroy it. This is the gold standard, but it's expensive and slow. Use it sparingly for critical resources.
Below is a minimal GitHub Actions step that runs static analysis before the plan. Integrate this into your workflow to catch issues early.
static-analysis.yml (YAML)
# Add these steps to your terraform-plan job, before the step that runs terraform plan.
# They run tflint and checkov on the Terraform code.
- name: Set up tflint (Terraform linter)
  uses: terraform-linters/setup-tflint@v3
  with:
    tflint_version: 'v0.50.0'

- name: Lint Terraform code
  run: tflint --format compact
  working-directory: ${{ env.TF_WORKING_DIR }}

- name: Run checkov (security scanning)
  id: checkov
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: ${{ env.TF_WORKING_DIR }}
    framework: terraform
    # Fail the pipeline if the security scan reports any failed check
    soft_fail: false

# checkov prints its report to the job log; on failure, point reviewers at it
- name: Flag static analysis failure
  if: failure()
  run: echo "Static analysis failed. See the tflint and checkov step logs above for details."
Output
# tflint output when there are no issues:
No issues found.
# checkov output example:
Check: CKV_AWS_79: "Ensure Instance Metadata Service Version 1 is not enabled"
Check: CKV_AWS_126: "Ensure that detailed monitoring is enabled for EC2 instances"
PASSED
Summary: 1 failed, 0 passed, 0 skipped
Start Simple: Validate and Lint First
Don't try to implement full integration tests on day one. Start with terraform validate and a linter like tflint. That alone catches most of the simple mistakes, such as missing required tags, invalid arguments, and incompatible provider versions. Add policy scanning (checkov, tfsec) as your next step.
Production Insight
A team once skipped validation and applied a configuration with a typo in 'ami' — they created an instance with an Ubuntu 16.04 AMI instead of 22.04.
The outdated AMI had a known kernel vulnerability, and the instance was compromised within 24 hours.
Rule: every pipeline must run validate and at least one security scanner before plan.
Key Takeaway
IaC testing has three layers: static analysis, plan review, and integration tests.
Start with validate + tflint — it's free and catches most mistakes.
Never apply without a human having seen the plan output.
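Human plan review can be backed by a small automated guard. The sketch below inspects the JSON form of a plan (as produced by `terraform show -json <planfile>`) and flags stateful resources that would be deleted or replaced. The helper name `find_destructive_changes` and the set of "stateful" resource types are assumptions chosen for illustration, and the sample plan is hand-written.

```python
# Resource types treated as stateful here are an assumption; extend as needed.
STATEFUL_TYPES = {"aws_db_instance", "aws_s3_bucket", "aws_ebs_volume"}


def find_destructive_changes(plan: dict) -> list:
    """Return addresses of stateful resources the plan would delete or replace."""
    flagged = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if "delete" in actions and change.get("type") in STATEFUL_TYPES:
            flagged.append(change["address"])
    return flagged


# Hand-written sample in the shape of `terraform show -json <planfile>` output.
sample_plan = {
    "resource_changes": [
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.main", "type": "aws_db_instance",
         "change": {"actions": ["delete", "create"]}},
    ]
}
print(find_destructive_changes(sample_plan))  # ['aws_db_instance.main']
```

A pipeline could require an explicit approval label on the PR whenever this list is non-empty, so the reviewer's attention goes to the changes that can lose data.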
Why Your Deployment Pipeline Should Fail When the State Lock Is Missing
Most teams treat state locking as optional. It's not. Without a lock, two engineers (or worse, two pipelines) can apply changes simultaneously. The result is state corruption, partial deployments, and an infrastructure that matches neither configuration. I've debugged a three-hour outage caused by concurrent Terraform applies fighting over the same S3 backend. The fix was one line: dynamodb_table = "terraform-lock".
Treat state locking like a database transaction. If your CI/CD pipeline doesn't fail when the lock is missing, you're accepting risk. In production, configure your backend to reject concurrent operations. Check lock status before planning. If the lock exists, abort.
Your staging environment isn't a rehearsal; it's the same code path. If you skip safety there, you'll skip it in production. Add a pre-flight check that verifies lock availability before any apply command.
preflight_lock_check.py (Python)
# io.thecodeforge.state_lock_preflight
import sys

import boto3


def check_lock(table, bucket, key, region='us-east-1'):
    """Exit non-zero if Terraform's S3 backend currently holds a lock in DynamoDB."""
    dynamodb = boto3.client('dynamodb', region_name=region)
    # With dynamodb_table set, the lock lives in DynamoDB (not in S3) as an item
    # whose LockID is '<bucket>/<key>' for the state file being locked.
    resp = dynamodb.get_item(
        TableName=table,
        Key={'LockID': {'S': f'{bucket}/{key}'}},
        ConsistentRead=True,
    )
    if 'Item' in resp:
        print('⚠️ State lock ACTIVE. Aborting apply.')
        sys.exit(1)
    print('✅ No active lock. Proceeding.')


check_lock('terraform-lock', 'my-terraform-state-bucket', 'env/prod/terraform.tfstate')
Output
⚠️ State lock ACTIVE. Aborting apply.
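Production Trap:
Lock contention under load: Terraform's DynamoDB locks do not expire on their own. A lock stays in the table until the run that took it releases it, or until someone clears it with terraform force-unlock after a crashed run. What does time out is the wait to acquire the lock: by default Terraform fails immediately if another run holds it. Pass -lock-timeout (for example -lock-timeout=10m) to plan and apply so queued pipelines wait for roughly your worst-case apply duration instead of failing.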
Key Takeaway
State locking isn't optional — it's the difference between deterministic infrastructure and a three-way merge nightmare.
Stop Patching Your VMs — Burn Them and Rebuild
Mutable infrastructure feels comfortable. You SSH in, run yum update, and move on. But after six months, that server has accumulated packages, configs, and cron jobs no one remembers. Configuration drift turns your 'pet' into a fragile mystery. Immutable infrastructure solves this. When you need to update a server, you don't patch it — you destroy it and create a fresh one from a golden image. Your CI/CD pipeline builds that image, runs security scans, and deploys it to staging. If tests pass, the image goes to production. The old server is terminated. This shifts your mindset: you're not managing servers, you're managing releases. Use tools like Packer to build AMIs or Azure Image Builder for VHDs. Store images in a registry with version tags. Never SSH into production. If something's wrong, roll back to the previous image — not a 'patch Tuesday' prayer.
Building images takes time. Staging should use a CI/CD cache: store built AMIs in a shared registry with a TTL. If the image exists and its security scan is fresh, skip the rebuild. This can speed up deployment by around 40%, depending on how long your image builds take.
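The cache decision itself is a small piece of logic. Here is a minimal sketch, assuming image metadata is available as a dictionary with a `scanned_at` timestamp; the `should_rebuild` function, the metadata shape, and the seven-day default are hypothetical choices for illustration, not a registry API.

```python
from datetime import datetime, timedelta, timezone


def should_rebuild(image, now, max_scan_age=timedelta(days=7)):
    """Rebuild if there is no cached image or its security scan is stale."""
    if image is None:
        return True
    return now - image["scanned_at"] > max_scan_age


now = datetime(2024, 6, 15, tzinfo=timezone.utc)
fresh = {"tag": "web-1.4.2", "scanned_at": now - timedelta(days=2)}
stale = {"tag": "web-1.4.1", "scanned_at": now - timedelta(days=30)}

print(should_rebuild(fresh, now))  # False
print(should_rebuild(stale, now))  # True
print(should_rebuild(None, now))   # True
```

Keying the decision on scan freshness rather than image age alone means a newly published vulnerability database forces a rebuild even when the image itself has not changed.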
Key Takeaway
Treat servers like cattle, not pets — if you're patching in place, you're accumulating technical debt in silicon.
Production incident · POST-MORTEM · severity: high
The Terraform State Corruption That Took Down Production
Symptom
Users report missing files. The S3 bucket exists but is empty. No recent code changes. The last Terraform apply happened a week ago.
Assumption
The data loss was caused by a Terraform bug or a malicious actor.
Root cause
A developer manually deleted the bucket from the AWS console to 'clean up unused resources' without checking if Terraform managed it. The bucket lacked the 'ManagedBy: terraform' tag. Terraform's state still recorded the bucket, but the actual resource was gone.
Fix
Restored the bucket from a backup (versioning was enabled). Added the 'ManagedBy' tag to all critical Terraform resources. Set up S3 bucket policies to prevent deletion from the console. Ran 'terraform plan' to confirm state matched reality.
Key lesson
Always tag IaC-managed resources with ManagedBy: <tool> — it tells humans not to touch them manually.
Enable versioning on state-critical resources so you can recover from accidental deletion.
Use S3 bucket policies or IAM permissions to block console modifications for production resources.
Run 'terraform plan' periodically in a CI job to catch drift before it becomes a crisis.
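A scheduled drift check can lean on the `-detailed-exitcode` flag of `terraform plan`, which distinguishes "no changes" from "changes present". The sketch below is one possible wrapper; the function names are hypothetical, and `check_drift` assumes the terraform CLI is installed and the working directory has already been initialised.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in-sync"   # no changes: live infrastructure matches the code
    if code == 2:
        return "drift"     # plan succeeded and found a non-empty diff
    return "error"         # 1 (or anything else) means the plan itself failed


def check_drift(working_dir: str) -> str:
    """Run a plan in `working_dir`; needs the terraform CLI and an initialised backend."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=working_dir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)


print(interpret_plan_exit_code(2))  # drift
```

A nightly CI job could call `check_drift` for each environment and open an alert whenever the result is anything other than "in-sync".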
Production debug guide: Symptom → Action
Resolve the most common Terraform production failures fast.
Symptom · 01
terraform apply hangs forever showing 'Acquiring state lock...'
→
Fix
First, verify that no other apply is still running. If the lock was left behind by a previous apply that was killed abruptly, release it with 'terraform force-unlock <LOCK_ID>', using the lock ID shown in the error message (it is also stored in the DynamoDB lock table).
Symptom · 02
terraform plan shows changes for resources nobody modified
→
Fix
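Check if someone made manual changes in the cloud console. Run 'terraform plan -refresh-only' to see how live resources differ from state, and 'terraform apply -refresh-only' to record those differences (the standalone 'terraform refresh' command is deprecated). If drift persists, investigate who has console access and enforce read-only policies.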
Symptom · 03
terraform apply fails with 'Error: Error creating resource: ConflictException'
→
Fix
A resource already exists with the same name outside Terraform management. Import it: 'terraform import <resource_type>.<name> <id>'. Alternatively, adjust the name in code to avoid collision.
★ Quick Debug: Terraform CI Failures
Common Terraform errors in CI pipelines and the exact commands to fix them.
terraform init fails: 'Failed to query available provider packages'
Immediate action
Check your network. If behind a proxy, configure HTTP_PROXY and HTTPS_PROXY env vars.
Commands
terraform init -input=false -lock=false
terraform providers mirror <path> # to cache providers offline
Fix now
Add 'required_providers' version constraints in the terraform block and ensure they match the lock file.
terraform plan reports 'No changes. Infrastructure is up-to-date.' but you expected changes
Immediate action
Check if the PR branch is behind main. The pipeline might be running plan against an old state.
Commands
git fetch origin main && git merge origin main
terraform plan -out=tfplan.binary -refresh-only
Fix now
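Re-run the plan after updating the branch, and use 'terraform plan -refresh-only' to compare state with the live infrastructure (the standalone 'terraform refresh' command is deprecated). Ensure the pipeline uses the correct backend configuration.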
| Aspect | Terraform | Ansible |
| --- | --- | --- |
| Approach | You describe desired state; Terraform calculates the steps | You write ordered tasks; Ansible executes them top to bottom |
| Idempotency | Built-in: plan always shows exact diff before apply | Achievable but manual: each task must be written idempotently |
| State management | Explicit state file tracks what was built | Stateless: re-runs the playbook against live systems |
| Language | HCL (HashiCorp Configuration Language) | YAML-based playbooks |
| Cloud-native fit | Excellent: purpose-built for cloud APIs | Good: best for OS-level configuration after provisioning |
| Learning curve | Steeper: must understand state, providers, backends | Gentler: YAML is readable; SSH-based, no agent required |
| Team use case | Infrastructure provisioning in CI/CD pipelines | Application deployment and server hardening automation |
Key takeaways
1. IaC treats infrastructure the same way you treat application code: versioned, reviewed, tested, and deployed through a pipeline. The moment infrastructure lives in Git, it gains an audit trail and becomes repeatable.
2. Declarative tools like Terraform describe the what (desired end state); imperative tools like Ansible describe the how (ordered steps). Use Terraform to provision cloud resources and Ansible to configure the OS inside them; they solve different halves of the same problem.
3. Remote state backends (S3 + DynamoDB for AWS) are non-negotiable for any team setup. Local state files cause silent corruption when two engineers run Terraform against the same environment simultaneously.
4. The gold-standard CI/CD pattern is terraform plan on PR (posted as a comment for review) and terraform apply on merge to main; never run Terraform locally against production in a team environment.
5. Secrets in IaC must be externalised and state files must be encrypted at rest; one committed .tfstate file can expose every database password your team uses.
Common mistakes to avoid
1. Committing terraform.tfstate to Git
Symptom
State file contains plaintext database passwords, API keys, and sensitive outputs. Git history is permanent: even if you delete the file later, secrets are exposed forever in every clone.
Fix
Add *.tfstate and *.tfstate.* to .gitignore on day one. Configure a remote S3 backend before your first apply. Run git secrets or truffleHog in your CI pipeline to catch accidental secret commits.
2. Making manual changes to IaC-managed resources in the cloud console
Symptom
You log into AWS and tweak a security group rule 'just this once' to fix a prod issue. Next time Terraform runs, it overwrites your manual change because the code, not the console, defines the desired state. Terraform wins every time.
Fix
Treat IaC-managed resources as read-only in any UI. Make the fix in code, open a PR, let the pipeline apply it. If an emergency forces a manual change, mirror it in the Terraform code immediately and run terraform plan to confirm there is no remaining diff; use terraform import only for resources that were created outside Terraform.
3. Not pinning provider and module versions
Symptom
Omitting version constraints (or leaving them open-ended) means a provider upgrade can break your configuration without any code change on your part. This causes phantom CI failures that are extremely hard to debug because nothing in your diff changed.
Fix
Always pin with a pessimistic constraint operator: version = "~> 5.0" allows minor and patch updates within 5.x but blocks the next major version. Commit your .terraform.lock.hcl file to Git so the whole team and CI select identical provider versions.
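The semantics of the pessimistic operator are easy to get wrong, so here is a small Python sketch of the rule for plain numeric versions. It is an illustrative re-implementation, not Terraform's own constraint parser; the function names are hypothetical and pre-release suffixes are not handled.

```python
def parse(version: str) -> tuple:
    """Turn '5.31.0' into (5, 31, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check `version` against a '~> X.Y' or '~> X.Y.Z' constraint."""
    base = parse(constraint.replace("~>", "").strip())
    candidate = parse(version)
    # Everything except the last stated component is locked...
    same_prefix = candidate[: len(base) - 1] == base[:-1]
    # ...and the candidate must not be older than the stated version.
    return same_prefix and candidate >= base


print(satisfies_pessimistic("5.31.0", "~> 5.0"))   # True: minor and patch may rise
print(satisfies_pessimistic("6.0.0", "~> 5.0"))    # False: major is locked
print(satisfies_pessimistic("5.1.0", "~> 5.0.2"))  # False: only the patch may rise
```

Only the rightmost component named in the constraint is allowed to increase, so adding a third component tightens the pin from "any 5.x" to "any 5.0.x".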
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01 of 04 · SENIOR
What is configuration drift, and how does Infrastructure as Code prevent it? Can you give a concrete example of how drift occurs without IaC?
ANSWER
Configuration drift is the gradual divergence of infrastructure environments from their intended configuration over time. Without IaC, a sysadmin might manually patch a server, apply a hotfix, or change a firewall rule. That change never gets documented or replicated. Six months later, staging and prod are so different that a deployment that works in staging fails in production. IaC prevents drift by codifying the desired state. Every time you apply the code, the system reconciles actual vs. desired state, overwriting any manual changes. Example: a team manually updated the instance type of an EC2 from t3.micro to t3.medium to handle a load spike. They forgot to update the Terraform config. Next deploy cycle, Terraform scaled it back down to t3.micro, causing a performance outage.
Q02 of 04 · SENIOR
Explain the difference between declarative and imperative IaC. If you had to provision a new EC2 instance and then install Nginx on it, which tool would you use for each step and why?
ANSWER
Declarative IaC (Terraform, CloudFormation) focuses on the end state. You declare: 'I want an EC2 instance with these properties' and the tool determines the steps. Imperative IaC (Ansible, Chef) focuses on the steps: 'Run this command, then that command'. For provisioning the EC2 instance, use Terraform: it handles dependencies like VPC, subnet, security groups, and can track the resource in state. For installing Nginx, use Ansible: it connects to the running instance and executes a sequence of tasks (install package, configure virtual host, start service). Ansible is better at OS-level configuration because order matters (e.g., install package before editing config). Combining both tools is a common production pattern.
Q03 of 04 · SENIOR
If a colleague manually deleted an AWS resource that Terraform created, what happens when the next `terraform apply` runs — and how would you handle a situation where you need to remove a resource from Terraform management without destroying it?
ANSWER
If a resource is deleted outside Terraform, the state file still records it. terraform plan refreshes state against the real infrastructure by default, notices the resource is missing, and proposes to create it again; the next terraform apply recreates it. If the deletion was intentional, remove the resource block from the code as well (running terraform apply -refresh-only first records the deletion in state). To keep a resource but stop managing it with Terraform, use terraform state rm <resource_address> and delete its block from the configuration. That removes the resource from the state file without deleting the actual cloud resource. Then you can manage it manually or re-import it later. This is useful when migrating a resource to a different Terraform configuration or when retiring Terraform but keeping the infrastructure.
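The two scenarios in this answer can be modeled with three sets: what the code declares, what the state records, and what really exists. This is a deliberately simplified toy, and the `plan` function below is hypothetical rather than a description of Terraform internals.

```python
def plan(config: set, state: set, real: set) -> dict:
    """Toy plan: refresh state against reality, then diff it against the code."""
    refreshed = state & real  # forget entries whose real resource has vanished
    return {
        "create": sorted(config - refreshed),
        "destroy": sorted(refreshed - config),
    }


# Case 1: a colleague deletes the bucket in the console.
after_manual_delete = plan(config={"aws_s3_bucket.assets"},
                           state={"aws_s3_bucket.assets"},
                           real=set())
print(after_manual_delete)  # {'create': ['aws_s3_bucket.assets'], 'destroy': []}

# Case 2: `terraform state rm` plus removing the block from the code.
real = {"aws_s3_bucket.assets"}
after_state_rm = plan(config=set(), state=set(), real=real)
print(after_state_rm)       # {'create': [], 'destroy': []}
print(real)                 # {'aws_s3_bucket.assets'}  (the bucket still exists)
```

In the second case nothing is planned and the real bucket is untouched, because a resource that appears in neither the code nor the state is simply invisible to the tool.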
Q04 of 04 · SENIOR
How do you handle secrets in Terraform? What mechanisms prevent them from leaking in state files or plan output?
ANSWER
Never hardcode secrets. Use a secrets manager like AWS Secrets Manager or HashiCorp Vault, or pass values through environment variables. For Terraform, read secrets via data sources (aws_secretsmanager_secret_version). Mark outputs containing secrets as sensitive = true to prevent them appearing in plan output. However, note that even sensitive values are stored in plaintext in the state file. Mitigations: use an encrypted remote backend (S3 with KMS, Terraform Cloud with at-rest encryption), and restrict access to the state file via IAM. Also, tools like checkov can detect potential secret leaks in the code. Finally, never commit .tfstate files to Git.
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
What is the difference between Infrastructure as Code and Configuration Management?
Infrastructure as Code provisions the actual cloud resources — servers, networks, databases — from scratch. Configuration management (Ansible, Chef, Puppet) takes those resources and configures what's running inside them: which packages are installed, which config files exist, which services are running. In practice, you use both: IaC to build the infrastructure, configuration management to set it up.
02
Do I need to know programming to use Terraform or Ansible?
Not deeply, but you do need to understand variables, conditionals, and loops, concepts that exist in both HCL and Ansible's YAML. Terraform's HCL is specifically designed to be more readable than a general-purpose programming language. The bigger learning curve is understanding cloud concepts (networking, IAM, VPCs) rather than the IaC syntax itself.
03
Is Terraform the same as CloudFormation?
Both are declarative IaC tools, but CloudFormation is AWS-only and tightly integrated with AWS services, while Terraform is cloud-agnostic — the same tool works for AWS, Azure, GCP, and hundreds of other providers using the same workflow. Teams on a single cloud often find CloudFormation simpler; multi-cloud teams or those who want portability prefer Terraform.
04
How much does Infrastructure as Code cost?
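The Terraform CLI is free to use. Note that since August 2023 HashiCorp licenses new releases under the Business Source License rather than an open-source license; OpenTofu is the community-maintained open-source fork. Terraform Cloud (now HCP Terraform) has a free tier, but its limits have changed over time, so check the current pricing. Ansible is free and open source. The main costs are cloud resources and the time to write and maintain the code. IaC saves money in the long run by reducing manual errors, simplifying disaster recovery, and enabling reliable scaling.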