Infrastructure as Code Explained — Why, How, and Real-World IaC Patterns
Every modern software team has faced the same nightmare: a production server dies at 2am, and the engineer who built it left the company six months ago. Nobody wrote anything down. The replacement server gets rebuilt from memory, Slack messages, and guesswork — and it's never quite right. This isn't a people problem. It's a process problem, and Infrastructure as Code (IaC) exists specifically to eliminate it. When your infrastructure lives in code, it lives in Git, in pull requests, in code reviews, and in your CI/CD pipeline — just like the application it runs.
Before IaC, provisioning infrastructure meant logging into a cloud console, clicking through wizards, and hoping the person next to you was watching and taking notes. Every environment — dev, staging, production — drifted apart over time because human hands configured them differently. This 'configuration drift' is the silent killer of reliable deployments. IaC solves this by making infrastructure declarative and repeatable: you describe the desired state of your system, and a tool like Terraform or Ansible figures out how to get there. The same code that spins up your staging environment spins up production, byte for byte.
By the end of this article you'll understand why IaC exists at a systems level, know the difference between declarative and imperative approaches, and have a real working Terraform + GitHub Actions CI/CD pipeline you can adapt for your own projects. You'll also know the two mistakes that catch almost every intermediate engineer off guard when they go to use IaC in a team setting.
Declarative vs Imperative IaC — Choosing the Right Mental Model
There are two ways to tell someone how to make a cup of coffee. The imperative way: 'Boil water. Measure 18g of beans. Grind them. Pour water at 94°C. Wait 4 minutes.' The declarative way: 'I want a black filter coffee in this cup.' The declarative approach lets the system figure out the steps.
This distinction is the most important conceptual split in IaC. Terraform is declarative — you describe what your infrastructure should look like, and Terraform calculates the diff between current state and desired state, then makes the changes. Ansible is imperative by default — you write a sequence of tasks that run top to bottom. Both are valid. The right choice depends on what you're managing.
Declarative tools shine for cloud resource provisioning: creating VPCs, EC2 instances, databases, and load balancers. You don't want to think about order of operations — you just want the result. Imperative tools shine for configuration management: installing packages, editing config files, restarting services. The order genuinely matters there.
In a mature DevOps pipeline you'll often use both: Terraform provisions the server, Ansible configures it. Understanding why they work differently stops you from fighting the tool when it doesn't behave the way you expect.
```hcl
# main.tf — Terraform declarative configuration for a basic web server on AWS
# Run: terraform init && terraform plan && terraform apply

# Tell Terraform which cloud provider to use and which region
provider "aws" {
  region = "us-east-1"
}

# Declare the desired state: an EC2 instance running Ubuntu
# Terraform will CREATE this if it doesn't exist, UPDATE it if the spec changed,
# and do NOTHING if it already matches — this is idempotency in action
resource "aws_instance" "web_server" {
  ami           = "ami-0c02fb55956c7d316" # Ubuntu 22.04 LTS in us-east-1
  instance_type = "t3.micro"              # Free-tier eligible size

  # Tag the resource so humans can find it in the AWS console
  tags = {
    Name        = "theforge-web-server"
    Environment = "staging"
    ManagedBy   = "terraform" # Critical: tells the team this isn't manually managed
  }
}

# Output block: print the public IP after apply so we know where to connect
output "web_server_public_ip" {
  description = "The public IP address of our web server"
  value       = aws_instance.web_server.public_ip
}
```
```
  # aws_instance.web_server will be created
  + resource "aws_instance" "web_server" {
      + ami           = "ami-0c02fb55956c7d316"
      + instance_type = "t3.micro"
      + tags          = {
          + "Environment" = "staging"
          + "ManagedBy"   = "terraform"
          + "Name"        = "theforge-web-server"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

web_server_public_ip = "54.234.101.88"
```
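For contrast, here is what the imperative half of the split looks like: a minimal Ansible playbook sketch doing the configuration-management jobs described above (install a package, drop in a config file, restart a service). The host group, file paths, and package choice here are illustrative assumptions, not part of any setup shown elsewhere in this article:

```yaml
# configure-web.yml — an imperative sequence: tasks run top to bottom, in order
- name: Configure the web server that Terraform provisioned
  hosts: web_servers      # illustrative inventory group
  become: true            # run tasks with sudo
  tasks:
    - name: Install Nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Deploy the site configuration
      ansible.builtin.copy:
        src: files/theforge.conf          # illustrative local path
        dest: /etc/nginx/conf.d/theforge.conf

    - name: Restart Nginx so the new config takes effect
      ansible.builtin.service:
        name: nginx
        state: restarted
```

Notice the difference in mental model: each task is a step you ordered, and reordering them (restarting Nginx before the config file lands, say) changes the outcome. In the Terraform file above, block order is irrelevant.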
IaC in a Real CI/CD Pipeline — Automate the Infrastructure Itself
Knowing how to run Terraform locally is a starting point. But IaC's real power unlocks when it runs automatically inside your CI/CD pipeline. Think about it: your application code goes through automated testing before it deploys. Why should your infrastructure changes be any different? A pull request that adds a new RDS database should go through the same review process as a pull request that adds a new API endpoint.
The pattern that works in production is this: on every pull request, run `terraform plan` and post the output as a PR comment. This gives reviewers an exact, human-readable diff of what will change in the real cloud — before anyone approves it. On merge to main, run `terraform apply` automatically. No one runs Terraform from their laptop. Ever.
This approach solves three problems at once. It creates an audit trail (every infrastructure change is a Git commit with an author and a timestamp). It prevents 'works on my machine' infrastructure (the pipeline always runs from a clean state). And it forces infrastructure changes through code review, which catches mistakes before they hit production.
The GitHub Actions workflow below implements this exact pattern. It's the real thing — not a toy example.
```yaml
# .github/workflows/terraform-ci-cd.yml
# This pipeline runs Terraform plan on PRs and Terraform apply on merge to main.
# Prerequisites: Store AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as GitHub Secrets.

name: Terraform Infrastructure Pipeline

on:
  pull_request:
    branches: [main]        # Run plan on any PR targeting main
    paths: ['terraform/**'] # Only trigger when Terraform files actually changed
  push:
    branches: [main]        # Run apply when code lands on main
    paths: ['terraform/**']

env:
  TF_VERSION: '1.7.0'           # Pin the version — never use 'latest' in CI
  TF_WORKING_DIR: './terraform' # All .tf files live in this directory
  AWS_REGION: 'us-east-1'

jobs:
  terraform-plan:
    name: Plan Infrastructure Changes
    runs-on: ubuntu-latest
    # Only run the plan job on pull requests, not on direct pushes to main
    if: github.event_name == 'pull_request'
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Configure AWS credentials from GitHub Secrets
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Install the pinned version of Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init — download providers and configure backend
        working-directory: ${{ env.TF_WORKING_DIR }}
        # -input=false prevents the pipeline hanging waiting for keyboard input
        run: terraform init -input=false

      - name: Terraform Validate — catch syntax errors before planning
        working-directory: ${{ env.TF_WORKING_DIR }}
        run: terraform validate

      - name: Terraform Plan — calculate what will change
        working-directory: ${{ env.TF_WORKING_DIR }}
        id: tf_plan
        # -no-color prevents ANSI escape codes from cluttering the GitHub PR comment
        run: terraform plan -no-color -out=tfplan.binary

      - name: Post plan output as a PR comment so reviewers can see the diff
        uses: actions/github-script@v7
        with:
          script: |
            const planOutput = `${{ steps.tf_plan.outputs.stdout }}`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '### Terraform Plan\n```\n' + planOutput + '\n```'
            })

  terraform-apply:
    name: Apply Infrastructure Changes
    runs-on: ubuntu-latest
    # Only run apply when a PR is merged to main — never on open PRs
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    environment: production # Requires a GitHub environment approval gate if configured
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Install Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init
        working-directory: ${{ env.TF_WORKING_DIR }}
        run: terraform init -input=false

      - name: Terraform Apply — make the changes in the real cloud
        working-directory: ${{ env.TF_WORKING_DIR }}
        # -auto-approve skips the interactive yes/no prompt — safe here because
        # this only runs after a human approved and merged the PR
        run: terraform apply -input=false -auto-approve
```
### Terraform Plan
```
Terraform will perform the following actions:

  # aws_instance.web_server will be created
  + resource "aws_instance" "web_server" {
      + ami           = "ami-0c02fb55956c7d316"
      + instance_type = "t3.micro"
    }

Plan: 1 to add, 0 to change, 0 to destroy.
```
```
# On merge to main — apply job output:
Run terraform apply -input=false -auto-approve
aws_instance.web_server: Creating...
aws_instance.web_server: Still creating... [10s elapsed]
aws_instance.web_server: Creation complete after 32s

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
```
Remote State and Modules — The Patterns That Make IaC Scale
A single main.tf file works fine for a hobby project. It falls apart the moment you have two engineers, two environments, or two services. This is where two patterns become non-negotiable: remote state backends and modules.
Remote state is how Terraform remembers what it already built. Without it, every terraform apply is flying blind. With a remote backend — like an S3 bucket with a DynamoDB lock table — the state file lives in the cloud, is accessible to everyone on the team, and is locked during applies so two engineers can't run it simultaneously and corrupt each other's work.
Modules are reusable Terraform components. Think of them as functions for infrastructure. Instead of copy-pasting the same EC2 + security group + IAM role configuration for every service, you write it once as a module and call it with different variables for each service. This is the IaC equivalent of the DRY principle and it's what separates a professional IaC setup from a pile of disconnected config files.
Below is a minimal but real remote backend configuration alongside a module call pattern. This is the structure you'd actually find in a production repository.
```hcl
# ─────────────────────────────────────────────
# FILE 1: backend.tf — remote state configuration
# This tells Terraform to store its state file in S3 instead of locally.
# DynamoDB provides a lock so two engineers can't apply simultaneously.
# ─────────────────────────────────────────────
terraform {
  required_version = ">= 1.7.0" # Prevent older Terraform versions from running this config

  backend "s3" {
    bucket         = "theforge-terraform-state-prod" # Must exist before you run terraform init
    key            = "services/web-app/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true # Encrypt state at rest — critical because state contains secrets
    dynamodb_table = "theforge-terraform-locks" # Table must have a partition key named 'LockID'
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # ~> means 5.x but not 6.x — prevents breaking changes
    }
  }
}

# ─────────────────────────────────────────────
# FILE 2: modules/web_server/main.tf — a reusable module
# This module encapsulates all the resources needed for one web server.
# Call it multiple times with different variables for different services.
# ─────────────────────────────────────────────
variable "service_name" {
  description = "Name of the service this server runs, e.g. 'payments' or 'user-api'"
  type        = string
}

variable "environment" {
  description = "Deployment environment: dev, staging, or production"
  type        = string

  validation {
    # Terraform will refuse to apply if someone passes an unexpected value
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_type" {
  description = "EC2 instance size — use t3.micro for dev, t3.medium for production"
  type        = string
  default     = "t3.micro"
}

resource "aws_instance" "service_server" {
  ami           = "ami-0c02fb55956c7d316"
  instance_type = var.instance_type

  tags = {
    Name        = "${var.service_name}-${var.environment}" # e.g. 'payments-production'
    Environment = var.environment
    Service     = var.service_name
    ManagedBy   = "terraform"
  }
}

output "server_id" {
  value = aws_instance.service_server.id
}

# ─────────────────────────────────────────────
# FILE 3: main.tf — calling the module twice for two different services
# This is the DRY pattern in action. Same infrastructure blueprint, different values.
# ─────────────────────────────────────────────
module "payments_server" {
  source        = "./modules/web_server"
  service_name  = "payments"
  environment   = "production"
  instance_type = "t3.medium" # Payments needs more power than default
}

module "user_api_server" {
  source       = "./modules/web_server"
  service_name = "user-api"
  environment  = "production"
  # instance_type not specified — falls back to the default t3.micro
}
```
```
  # module.payments_server.aws_instance.service_server will be created
  + resource "aws_instance" "service_server" {
      + instance_type = "t3.medium"
      + tags          = {
          + "Environment" = "production"
          + "ManagedBy"   = "terraform"
          + "Name"        = "payments-production"
          + "Service"     = "payments"
        }
    }

  # module.user_api_server.aws_instance.service_server will be created
  + resource "aws_instance" "service_server" {
      + instance_type = "t3.micro"
      + tags          = {
          + "Environment" = "production"
          + "ManagedBy"   = "terraform"
          + "Name"        = "user-api-production"
          + "Service"     = "user-api"
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.
```
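The same module pattern also covers the dev/staging/production split mentioned earlier. If the root module exposed `environment` and `instance_type` as variables (that wiring is not shown above), the usual approach is one variable file per environment. A sketch, with illustrative file names:

```hcl
# environments/production.tfvars — values for one environment
environment   = "production"
instance_type = "t3.medium"

# environments/staging.tfvars would hold the cheaper staging values,
# e.g. environment = "staging" and instance_type = "t3.micro"
```

You then select the environment at plan time with Terraform's `-var-file` flag, for example `terraform plan -var-file=environments/production.tfvars`. The blueprint stays identical; only the values differ.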
| Aspect | Terraform (Declarative) | Ansible (Imperative) |
|---|---|---|
| Primary use case | Cloud resource provisioning — VPCs, VMs, databases | Configuration management — packages, files, services |
| Approach | You describe desired state; Terraform calculates the steps | You write ordered tasks; Ansible executes them top to bottom |
| Idempotency | Built-in — plan always shows exact diff before apply | Achievable but manual — each task must be written idempotently |
| State management | Explicit state file tracks what was built | Stateless — re-runs the playbook against live systems |
| Language | HCL (HashiCorp Configuration Language) | YAML-based playbooks |
| Cloud-native fit | Excellent — purpose-built for cloud APIs | Good — best for OS-level configuration after provisioning |
| Learning curve | Steeper — must understand state, providers, backends | Gentler — YAML is readable; SSH-based, no agent required |
| Team use case | Infrastructure provisioning in CI/CD pipelines | Application deployment and server hardening automation |
🎯 Key Takeaways
- IaC treats infrastructure the same way you treat application code — versioned, reviewed, tested, and deployed through a pipeline. The moment infrastructure lives in Git, it gains an audit trail and becomes repeatable.
- Declarative tools like Terraform describe the what (desired end state); imperative tools like Ansible describe the how (ordered steps). Use Terraform to provision cloud resources and Ansible to configure the OS inside them — they solve different halves of the same problem.
- Remote state backends (S3 + DynamoDB for AWS) are non-negotiable for any team setup. Local state files cause silent corruption when two engineers run Terraform against the same environment simultaneously.
- The gold-standard CI/CD pattern is `terraform plan` on PR (posted as a comment for review) and `terraform apply` on merge to main — never run Terraform locally against production in a team environment.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Committing terraform.tfstate to Git — The state file grows to contain plaintext database passwords, API keys and sensitive outputs. Git history is permanent, so even if you delete it later, the secrets are exposed forever in every clone. Fix: add `.tfstate` and `.tfstate.backup` to `.gitignore` on day one, and configure a remote S3 backend before your first apply. Run `git-secrets` or `truffleHog` in your CI pipeline to catch accidental secret commits.
- ✕ Mistake 2: Making manual changes to IaC-managed resources in the cloud console — You log into AWS, tweak a security group rule 'just this once' to fix a prod issue. Next time Terraform runs, it overwrites your manual change because the state file says the old rule is the desired state. Terraform wins every time. Fix: treat IaC-managed resources as read-only in any UI. Make the fix in code, open a PR, let the pipeline apply it. If an emergency change is truly unavoidable, reconcile it immediately: update the code to match what you changed, or, if the emergency created a new resource outside Terraform, bring it under management with `terraform import`.
- ✕ Mistake 3: Not pinning provider and module versions — Writing `version = "latest"` or omitting versions entirely means a provider upgrade can break your configuration without any code change on your part. This causes phantom CI failures that are extremely hard to debug because nothing in your diff changed. Fix: always pin with a pessimistic constraint operator — `version = "~> 5.0"` allows any 5.x release but blocks the breaking jump to 6.0. Commit your `.terraform.lock.hcl` file to Git so the whole team and CI use identical provider binaries.
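As a concrete starting point for the state-file mistake, a minimal `.gitignore` for a Terraform repository might look like this (a sketch based on common practice; adapt it to your own repo layout):

```gitignore
# Never commit state — it can contain plaintext secrets
*.tfstate
*.tfstate.backup

# Local provider binaries and module cache (re-created by terraform init)
.terraform/

# Crash logs
crash.log
```

Note that `.terraform.lock.hcl` is deliberately absent from this list: per Mistake 3, the lock file should be committed.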
Interview Questions on This Topic
- Q: What is configuration drift, and how does Infrastructure as Code prevent it? Can you give a concrete example of how drift occurs without IaC?
- Q: Explain the difference between declarative and imperative IaC. If you had to provision a new EC2 instance and then install Nginx on it, which tool would you use for each step and why?
- Q: If a colleague manually deleted an AWS resource that Terraform created, what happens when the next `terraform apply` runs — and how would you handle a situation where you need to remove a resource from Terraform management without destroying it?
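That last scenario maps onto real Terraform CLI commands. A sketch — the resource address matches the example earlier in this article, but the instance ID is illustrative:

```shell
# A colleague deleted the instance by hand? The next plan detects it as
# missing from the real cloud and schedules it for re-creation; apply
# rebuilds it to match the code. This is IaC's self-healing property.
terraform plan

# Stop managing a resource WITHOUT destroying it: remove it from state only.
# The real instance keeps running; Terraform simply forgets about it.
terraform state rm aws_instance.web_server

# The reverse: adopt an existing, manually created resource into Terraform
# management (instance ID below is a placeholder)
terraform import aws_instance.web_server i-0123456789abcdef0
```

The key distinction interviewers look for: `terraform destroy` removes the real resource, while `terraform state rm` only removes Terraform's record of it.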
Frequently Asked Questions
What is the difference between Infrastructure as Code and Configuration Management?
Infrastructure as Code provisions the actual cloud resources — servers, networks, databases — from scratch. Configuration management (Ansible, Chef, Puppet) takes those resources and configures what's running inside them: which packages are installed, which config files exist, which services are running. In practice, you use both: IaC to build the infrastructure, configuration management to set it up.
Do I need to know programming to use Terraform or Ansible?
Not deeply, but you do need to understand variables, conditionals, and loops — concepts that exist in both HCL and Ansible's YAML. Terraform's HCL is specifically designed to be more readable than a general-purpose programming language. The bigger learning curve is understanding cloud concepts (networking, IAM, VPCs), not the IaC syntax itself.
Is Terraform the same as CloudFormation?
Both are declarative IaC tools, but CloudFormation is AWS-only and tightly integrated with AWS services, while Terraform is cloud-agnostic — the same tool works for AWS, Azure, GCP, and hundreds of other providers using the same workflow. Teams on a single cloud often find CloudFormation simpler; multi-cloud teams or those who want portability prefer Terraform.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.