
Terraform Basics Explained — State, Providers, and Real-World IaC Patterns

In Plain English 🔥
Imagine you're building a LEGO city. Instead of just building it by hand and hoping you remember every piece, you write down the exact instructions — 'place a red 2x4 brick here, a blue window there.' Terraform is that instruction manual for cloud infrastructure. You write down exactly what servers, databases, and networks you want, and Terraform builds it. Tear it down and rebuild it tomorrow? Same instructions, identical city — every single time.

Every company running in the cloud eventually hits the same wall: someone clicks around the AWS console to spin up a server, another person does it slightly differently, and six months later nobody knows what's running or why. Servers become 'pets' — hand-crafted, irreplaceable, and terrifying to touch. Terraform exists to end that chaos by letting you describe your entire infrastructure in version-controlled code, the same way you describe your application logic.

Before Terraform, teams either wrote brittle bash scripts full of AWS CLI commands or relied entirely on cloud-specific tools like CloudFormation (which only works on AWS) or Azure ARM templates (which only work on Azure). Terraform solved the vendor lock-in problem by introducing a single declarative language — HCL — that works across AWS, GCP, Azure, and hundreds of other providers. You write your intent ('I want three EC2 instances'), and Terraform figures out the sequence of API calls to make it real.
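That intent maps almost one-to-one onto HCL. A minimal sketch of "I want three EC2 instances" (the AMI ID here is a placeholder, not a real image):

```hcl
# Declarative intent: three identical EC2 instances.
# The AMI ID is a placeholder — in practice you'd look it up with a data source.
resource "aws_instance" "web" {
  count         = 3                        # Terraform reconciles reality to this number
  ami           = "ami-0abcdef1234567890"  # placeholder
  instance_type = "t3.micro"

  tags = {
    Name = "web-${count.index}"
  }
}
```

Drop count from 3 to 2 and the next apply destroys exactly one instance; you never write that deletion logic yourself.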

By the end of this article you'll understand why the Terraform state file is both its superpower and its biggest footgun, how providers and modules keep your code DRY at scale, and how a real-world multi-environment setup actually looks — not a toy example, but the kind of structure you'd find in a production codebase at a fast-growing startup or enterprise engineering team.

How Terraform's Core Loop Actually Works — Plan, Apply, State

Most tutorials show you terraform apply and move on. But the real magic — and the real danger — lives in the three-step loop Terraform runs every single time you touch your infrastructure.

First, Terraform reads your .tf files and builds a desired state — a model of what you want the world to look like. Then it reads the state file (more on this shortly) to understand what it already built, refreshing those records against your cloud provider's APIs. Finally, it computes a diff between those two pictures. That diff is your plan.

This is fundamentally different from imperative tools like Ansible where you say 'run these steps.' Terraform is declarative — you say 'here's the destination' and it plots the route. The benefit is idempotency: running terraform apply ten times on an unchanged config does nothing after the first run, because the desired state already matches reality.

The critical thing to internalise is that Terraform only reasons about resources recorded in its state file. During a plan it refreshes the attributes of those tracked resources, but anything created outside Terraform — an extra instance, a hand-added security group rule — is invisible to it and will never appear in a plan. Your state file lies by omission. That's the source of more production incidents than almost any other Terraform mistake.

main.tf · HCL
# main.tf — A minimal but complete AWS setup that demonstrates the core loop
# This creates a VPC and a single EC2 instance inside it.
# Run: terraform init -> terraform plan -> terraform apply

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"  # Pin to major version to avoid surprise breaking changes
    }
  }
}

# The provider block tells Terraform WHERE to build — credentials come from
# environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or an IAM role.
# Never hardcode credentials in .tf files — they'll end up in version control.
provider "aws" {
  region = var.aws_region
}

# Variables make this config reusable across environments.
# Actual values live in terraform.tfvars (git-ignored) or are passed via -var flags.
variable "aws_region" {
  type        = string
  description = "AWS region where all resources will be created"
  default     = "us-east-1"
}

variable "environment_name" {
  type        = string
  description = "Environment tag applied to every resource (e.g. staging, production)"
}

# The VPC is our private network — everything else lives inside it.
resource "aws_vpc" "primary_network" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true  # Needed so EC2 instances get resolvable DNS names

  tags = {
    Name        = "${var.environment_name}-vpc"
    Environment = var.environment_name
    ManagedBy   = "terraform"  # Tagging as Terraform-managed helps ops teams know NOT to edit manually
  }
}

# A public subnet within that VPC.
resource "aws_subnet" "public_web_subnet" {
  vpc_id                  = aws_vpc.primary_network.id  # Reference to the VPC above — Terraform builds the dependency graph from this
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "${var.aws_region}a"
  map_public_ip_on_launch = true

  tags = {
    Name        = "${var.environment_name}-public-subnet"
    Environment = var.environment_name
  }
}

# Data source — reads EXISTING resources rather than creating new ones.
# Here we fetch the latest Amazon Linux 2023 AMI ID dynamically,
# so we're never hardcoding an AMI that gets deprecated.
data "aws_ami" "amazon_linux_2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]  # Amazon Linux 2023 naming pattern
  }
}

# The EC2 instance. Notice it references both the subnet and the AMI data source.
resource "aws_instance" "web_server" {
  ami           = data.aws_ami.amazon_linux_2023.id  # Dynamic AMI from the data source above
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public_web_subnet.id

  tags = {
    Name        = "${var.environment_name}-web-server"
    Environment = var.environment_name
  }
}

# Outputs let you extract values after apply — useful for feeding into CI/CD pipelines
# or just confirming what was built.
output "web_server_public_ip" {
  description = "Public IP of the web server — use this to SSH in or configure DNS"
  value       = aws_instance.web_server.public_ip
}

output "vpc_id" {
  description = "ID of the created VPC — needed if other Terraform workspaces reference this network"
  value       = aws_vpc.primary_network.id
}
▶ Output
$ terraform plan -var='environment_name=staging'

Terraform will perform the following actions:

  # aws_instance.web_server will be created
  + resource "aws_instance" "web_server" {
      + ami           = "ami-0abcdef1234567890"
      + instance_type = "t3.micro"
      + tags          = {
          + "Environment" = "staging"
          + "Name"        = "staging-web-server"
        }
      ...
    }

  # aws_subnet.public_web_subnet will be created
  + resource "aws_subnet" "public_web_subnet" { ... }

  # aws_vpc.primary_network will be created
  + resource "aws_vpc" "primary_network" { ... }

Plan: 3 to add, 0 to change, 0 to destroy.

$ terraform apply -var='environment_name=staging' -auto-approve

aws_vpc.primary_network: Creating...
aws_vpc.primary_network: Creation complete after 2s [id=vpc-0a1b2c3d4e5f67890]
aws_subnet.public_web_subnet: Creating...
aws_subnet.public_web_subnet: Creation complete after 1s [id=subnet-0f9e8d7c6b5a43210]
aws_instance.web_server: Creating...
aws_instance.web_server: Creation complete after 32s [id=i-0123456789abcdef0]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Outputs:

vpc_id = "vpc-0a1b2c3d4e5f67890"
web_server_public_ip = "54.210.167.83"
⚠️
Watch Out: The Dependency Graph Is Implicit
Terraform builds its execution order from resource references, not from the order you write blocks. If `aws_instance` references `aws_subnet.public_web_subnet.id`, Terraform knows to create the subnet first — automatically. You don't need `depends_on` for this. Only use `depends_on` for hidden dependencies that don't appear in a reference, like an IAM policy attachment that must exist before an EC2 instance can assume a role.
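As a sketch of that hidden-dependency case (resource names here are illustrative; the AMI data source is the one defined in main.tf above):

```hcl
resource "aws_iam_role" "app_role" {
  name = "app-server-role"

  # Standard EC2 assume-role policy, inlined so the sketch is self-contained
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_instance_profile" "app_profile" {
  name = "app-server-profile"
  role = aws_iam_role.app_role.name
}

# Nothing in the instance below references this attachment, so Terraform
# cannot infer the ordering from the graph — this is the hidden dependency.
resource "aws_iam_role_policy_attachment" "app_s3_read" {
  role       = aws_iam_role.app_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

resource "aws_instance" "app_server" {
  ami                  = data.aws_ami.amazon_linux_2023.id
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app_profile.name

  # Explicit ordering: the policy must be attached before the instance
  # boots and tries to read from S3 on startup.
  depends_on = [aws_iam_role_policy_attachment.app_s3_read]
}
```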

The State File — Why It's the Heart of Terraform and How to Not Kill It

The state file (terraform.tfstate) is a JSON document that maps your HCL resource names to real cloud resource IDs. When you write aws_instance.web_server, Terraform stores the fact that this logical name corresponds to i-0123456789abcdef0 in AWS. Without it, Terraform would have no idea what it already built and would try to create duplicates on every apply.
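A heavily abridged sketch of what that mapping looks like inside terraform.tfstate (real state files carry provider metadata and many more attributes per instance):

```json
{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web_server",
      "instances": [
        {
          "attributes": {
            "id": "i-0123456789abcdef0",
            "instance_type": "t3.micro"
          }
        }
      ]
    }
  ]
}
```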

Here's the problem: by default the state file sits on your local machine. The moment two engineers on a team both run terraform apply, you have a race condition. Whoever writes their state file last wins — and the loser's changes get orphaned in AWS with no state record. Those resources become ghost infrastructure: real, billing you, invisible to Terraform.

The solution is remote state — storing the state file in a shared, locked backend like S3 with DynamoDB locking (for AWS teams) or Terraform Cloud. The DynamoDB lock table is what prevents two simultaneous applies: the first engineer acquires the lock, the second gets a clear error message and must wait.

You should also never manually edit the state file. If something goes wrong — a resource gets deleted outside of Terraform — use terraform import to bring the real resource back under management, or terraform state rm to drop a resource from state without destroying it.
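Since Terraform 1.5 you can also do the import declaratively with an import block, which makes the import show up in a reviewable plan rather than being a one-off CLI command (the resource address and instance ID here are illustrative):

```hcl
# Bring a resource that was created (or recreated) outside Terraform
# back under management. Run `terraform plan` to preview the import,
# then `terraform apply` to record it in state.
import {
  to = aws_instance.web_server
  id = "i-0fedcba9876543210"  # the real AWS instance ID — illustrative here
}
```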

backend.tf · HCL
# backend.tf — Remote state configuration using S3 + DynamoDB
# This file MUST be committed to version control so the entire team
# uses the same backend. The S3 bucket and DynamoDB table themselves
# are usually bootstrapped manually (or via a separate 'bootstrap' Terraform workspace)
# because you can't store state for the thing that stores your state.

terraform {
  backend "s3" {
    bucket = "my-company-terraform-state"  # Must already exist — Terraform won't create it
    key    = "services/web-app/staging/terraform.tfstate"  # Path within the bucket — use a consistent naming scheme
    region = "us-east-1"

    # DynamoDB table provides state locking — prevents concurrent applies
    # Table must have a partition key named exactly 'LockID' (string type)
    dynamodb_table = "terraform-state-locks"

    # Encrypt the state file at rest — your state contains sensitive values
    # like database passwords and private IPs
    encrypt = true
  }
}

# ─── How to bootstrap the S3 bucket and DynamoDB table themselves ───
# The resources below are meant to run ONCE in a dedicated 'bootstrap' workspace
# that uses local state (it contains only the bucket and table IDs — no secrets).
# After running this once, you never touch it again.

resource "aws_s3_bucket" "terraform_state_store" {
  bucket = "my-company-terraform-state"

  # Prevent accidental deletion of this bucket — if it's gone, all your state is gone
  lifecycle {
    prevent_destroy = true
  }

  tags = {
    Name      = "Terraform Remote State"
    ManagedBy = "terraform-bootstrap"
  }
}

# Versioning on the bucket means you can recover from a botched state write
# by rolling back to a previous version — this has saved production more than once
resource "aws_s3_bucket_versioning" "state_store_versioning" {
  bucket = aws_s3_bucket.terraform_state_store.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Block all public access — state files contain secrets, never make them public
resource "aws_s3_bucket_public_access_block" "state_store_access_block" {
  bucket                  = aws_s3_bucket.terraform_state_store.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The DynamoDB table for distributed locking
resource "aws_dynamodb_table" "terraform_lock_table" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"  # No need to provision capacity for a low-traffic lock table
  hash_key     = "LockID"           # Must be exactly 'LockID'; Terraform expects this name

  attribute {
    name = "LockID"
    type = "S"  # String type
  }

  lifecycle {
    prevent_destroy = true  # Losing this table means losing state locking — never delete it
  }

  tags = {
    Name      = "Terraform State Lock Table"
    ManagedBy = "terraform-bootstrap"
  }
}
▶ Output
$ terraform init

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.31.0...
- Installed hashicorp/aws v5.31.0 (signed by HashiCorp)

Terraform has been successfully initialized!

# When a second engineer tries to apply at the same time:
$ terraform apply
Acquiring state lock. This may take a few moments...

Error: Error acquiring the state lock

Error message: ConditionalCheckFailedException: The conditional request failed
Lock Info:
ID: f2a1b3c4-d5e6-7890-abcd-ef1234567890
Path: my-company-terraform-state/services/web-app/staging/terraform.tfstate
Operation: OperationTypeApply
Who: alice@build-server-01
Created: 2024-03-15 14:22:01 UTC

Terraform acquires a state lock to protect from concurrent modifications.
Another Terraform process is currently running. Wait for it to complete,
or use `terraform force-unlock f2a1b3c4-d5e6-7890-abcd-ef1234567890` if it crashed.
⚠️
Pro Tip: Sensitive Values in State Are Stored in Plaintext
Even with `sensitive = true` on a variable or output, Terraform still writes the actual value to the state file in plaintext JSON. This means your S3 bucket encryption and IAM permissions on that bucket aren't optional — they're your last line of defence against leaking database passwords and API keys. Always enable `encrypt = true` in your S3 backend config and restrict bucket access to only the IAM roles that run Terraform.

Modules and Workspaces — Structuring Terraform for Real Teams at Scale

Once you move beyond a single environment, two problems emerge fast: you're copy-pasting .tf files between staging and production (violating DRY), and you're terrified of running terraform apply in the wrong directory.

Modules solve the DRY problem. A module is just a folder of .tf files with defined inputs (variables) and outputs. You write the VPC setup once as a module, then call it from your staging config with environment_name = "staging" and from your production config with environment_name = "production". Changes to the VPC logic happen in one place.

Workspaces solve the isolation problem — but with a caveat. Terraform workspaces let you maintain separate state files for the same configuration, switching between them with terraform workspace select staging. They're great for lightweight environment separation, but they use the same backend bucket and the same code, so a misconfigured variable in terraform.tfvars can still nuke production.
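Inside your config, the current workspace is exposed as terraform.workspace, which is typically used to vary naming and sizing per environment. A sketch (the size map values are illustrative, and the AMI data source is the one from main.tf above):

```hcl
locals {
  # The currently selected workspace name, e.g. "staging" or "production"
  environment = terraform.workspace

  instance_sizes = {
    staging    = "t3.micro"
    production = "t3.large"
  }
}

resource "aws_instance" "web_server" {
  ami = data.aws_ami.amazon_linux_2023.id

  # Fall back to t3.micro for ad-hoc workspaces like feature branches
  instance_type = lookup(local.instance_sizes, local.environment, "t3.micro")

  tags = {
    Name        = "${local.environment}-web-server"
    Environment = local.environment
  }
}
```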

For serious multi-environment setups, most teams graduate to a directory-based structure instead: environments/staging/ and environments/production/ each have their own main.tf that calls shared modules. Each directory has its own state file with its own backend key. It's more files, but an apply can only ever touch the environment whose directory you're standing in — you can't nuke production from the staging folder.

modules/web_application/main.tf · HCL
# ─── Project structure ───────────────────────────────────────────
# terraform-infrastructure/
# ├── modules/
# │   └── web_application/
# │       ├── main.tf       ← You are here
# │       ├── variables.tf
# │       └── outputs.tf
# ├── environments/
# │   ├── staging/
# │   │   ├── main.tf       ← Calls the module with staging values
# │   │   └── terraform.tfvars
# │   └── production/
# │       ├── main.tf       ← Calls the module with production values
# │       └── terraform.tfvars
# └── backend.tf
# ─────────────────────────────────────────────────────────────────

# modules/web_application/variables.tf
variable "environment_name" {
  type        = string
  description = "Deployment environment — controls naming and sizing"
  validation {
    condition     = contains(["staging", "production"], var.environment_name)
    error_message = "environment_name must be either 'staging' or 'production'."  # Catches typos before they hit AWS
  }
}

variable "instance_type" {
  type        = string
  description = "EC2 instance size — use t3.micro for staging, t3.large for production"
  default     = "t3.micro"
}

variable "vpc_cidr_block" {
  type        = string
  description = "CIDR block for the VPC — must not overlap with other environments"
}

# modules/web_application/main.tf
resource "aws_vpc" "app_network" {
  cidr_block = var.vpc_cidr_block

  tags = {
    Name        = "${var.environment_name}-app-vpc"
    Environment = var.environment_name
  }
}

resource "aws_subnet" "app_subnet" {
  vpc_id     = aws_vpc.app_network.id
  cidr_block = cidrsubnet(var.vpc_cidr_block, 8, 1)  # cidrsubnet carves a /24 out of the /16 automatically

  tags = {
    Name        = "${var.environment_name}-app-subnet"
    Environment = var.environment_name
  }
}

# modules/web_application/outputs.tf
output "vpc_id" {
  value       = aws_vpc.app_network.id
  description = "VPC ID — expose this so callers can attach other resources to the same network"
}

output "subnet_id" {
  value       = aws_subnet.app_subnet.id
  description = "Subnet ID for the primary application subnet"
}

# ─── environments/staging/main.tf ────────────────────────────────
# This is how you CALL the module from an environment directory.
# The module keyword points to the relative path of the module folder.

terraform {
  required_version = ">= 1.5.0"
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "environments/staging/terraform.tfstate"  # Unique key per environment
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = "us-east-1"
}

module "staging_web_app" {
  source = "../../modules/web_application"  # Relative path to the module

  environment_name = "staging"
  instance_type    = "t3.micro"   # Cheaper instance for non-production
  vpc_cidr_block   = "10.1.0.0/16"  # Non-overlapping CIDR — staging uses 10.1.x.x
}

# ─── environments/production/main.tf ─────────────────────────────

module "production_web_app" {
  source = "../../modules/web_application"

  environment_name = "production"
  instance_type    = "t3.large"     # Larger instance for production load
  vpc_cidr_block   = "10.2.0.0/16"  # Production uses 10.2.x.x — no CIDR collision
}

# Outputs from a module are accessed via module.<module_name>.<output_name>
output "production_vpc_id" {
  value = module.production_web_app.vpc_id
}
▶ Output
# Running from environments/staging/
$ terraform init && terraform apply

Initializing modules...
- staging_web_app in ../../modules/web_application

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

# Running from environments/production/
$ terraform init && terraform apply

Initializing modules...
- production_web_app in ../../modules/web_application

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:
production_vpc_id = "vpc-0b2c3d4e5f67890a1"

# If you accidentally type the wrong environment name in a tfvars file:
$ terraform plan

Error: Invalid value for variable

on ../../modules/web_application/variables.tf line 4, in variable "environment_name":
4: validation {

Validation failed: environment_name must be either 'staging' or 'production'.

# The validation block caught a typo ('Staging' vs 'staging') before any AWS API call was made.
🔥
Interview Gold: Modules vs Workspaces
Interviewers love this distinction. Workspaces share code and differ only by state — they're ideal for feature branch testing where the infrastructure topology is identical. Directory-based environments share logic through modules but have fully independent configurations, backends, and state files — they're the right choice for staging vs production where sizing, redundancy, and access controls genuinely differ. Most mature teams use both: modules for DRY logic, directories for environment isolation.
Aspect | Terraform Workspaces | Directory-Based Environments
State isolation | Separate state files per workspace in the same backend path | Fully separate backend key and state file per directory
Code duplication | Zero — single set of .tf files for all workspaces | Minimal — environments share modules, not raw .tf files
Risk of cross-environment mistakes | High — easy to forget which workspace is selected | Low — you physically cd into the right directory
Supports different topologies | No — staging and prod must have identical resource shapes | Yes — production can have Multi-AZ, staging can be single-AZ
Best for | Ephemeral feature branches, developer sandboxes | Long-lived environments: staging, UAT, production
CI/CD integration complexity | Medium — must inject workspace selection step | Low — different pipeline steps point to different directories

🎯 Key Takeaways

  • Terraform's plan is only as accurate as its state file — if someone manually edits cloud resources, the state lies and your next apply may surprise you. Run terraform plan -refresh-only to detect drift, and use plan output as a change review gate in CI/CD.
  • Remote state with DynamoDB locking isn't optional for teams — it's the single change that takes Terraform from a solo tool to a safe team tool. Set it up before your second engineer runs their first apply.
  • Modules are just folders with variables and outputs — there's no magic syntax. The power is in how you structure them: one module per logical concern (networking, compute, database) that gets called with different variable values per environment.
  • The prevent_destroy = true lifecycle rule is your last line of defence against accidental terraform destroy on critical resources. Add it to every stateful resource (databases, S3 buckets, DNS zones) immediately after creation.

⚠ Common Mistakes to Avoid

  • Mistake 1: Committing terraform.tfstate to Git — The state file contains plaintext sensitive values (passwords, private keys, IP addresses). Even one accidental commit exposes secrets in your repo history permanently. Fix: add .tfstate and .tfstate.backup to .gitignore immediately, then configure a remote backend (S3 + DynamoDB or Terraform Cloud) so the state file never lives on disk in your repo.
  • Mistake 2: Running terraform destroy in the wrong directory or workspace — The exact symptom is a confirmation prompt that lists your production database as 'will be destroyed'. Because Terraform destroy tears down everything in the current state, running it in the wrong context is catastrophic. Fix: add a lifecycle { prevent_destroy = true } block to any resource that should never be deleted (RDS instances, S3 buckets with data), and enforce a CI/CD pipeline where applies require a PR review rather than running locally.
  • Mistake 3: Hardcoding provider credentials in .tf files — You'll see access_key and secret_key arguments committed directly in a provider block, usually by someone following an old tutorial. It fails in CI/CD environments and exposes credentials in version control. Fix: always use environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or, better yet, IAM roles attached to your CI runner or EC2 instance. The provider block should contain nothing but region.

Interview Questions on This Topic

  • Q: What happens when two engineers run `terraform apply` simultaneously against the same state file, and how does Terraform prevent corruption?
  • Q: Explain the difference between a data source and a resource in Terraform. When would you use a data source over creating a new resource?
  • Q: If a colleague manually deleted an AWS resource that Terraform manages, what state does Terraform think the world is in, and what are your options to fix it without causing an outage?

Frequently Asked Questions

What is the Terraform state file and why is it important?

The state file is a JSON document Terraform uses to map your HCL resource names (like aws_instance.web_server) to real cloud resource IDs (like i-0123456789abcdef0). Without it, Terraform can't tell what it already built, so it would try to recreate everything on every apply. It's critical because it's the source of truth for Terraform's diff engine — if it's wrong or missing, your infrastructure can get out of sync with your code.

What's the difference between terraform plan and terraform apply?

terraform plan is a dry run — it reads your .tf files and state file, calls cloud APIs to check current resource state, and shows you exactly what it would create, change, or destroy. No real changes happen. terraform apply executes that plan against your cloud provider. In production workflows, you should always review the plan output before applying, and many teams require a second engineer to approve the plan in a pull request before the pipeline runs apply.

When should I use Terraform modules?

Use a module the moment you find yourself copying the same block of Terraform code into more than one place. Good candidates are patterns you repeat per environment (VPC + subnets + routing), per service (an ECS service with its IAM role and security group), or per team (a standard developer environment template). Modules enforce consistency — if the security group rule needs changing, you change it once in the module and all callers get the update on their next apply.

TheCodeForge Editorial Team

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
