Terraform Basics Explained — State, Providers, and Real-World IaC Patterns
Every company running in the cloud eventually hits the same wall: someone clicks around the AWS console to spin up a server, another person does it slightly differently, and six months later nobody knows what's running or why. Servers become 'pets' — hand-crafted, irreplaceable, and terrifying to touch. Terraform exists to end that chaos by letting you describe your entire infrastructure in version-controlled code, the same way you describe your application logic.
Before Terraform, teams either wrote brittle bash scripts full of AWS CLI commands or relied entirely on cloud-specific tools like CloudFormation (which only works on AWS) or Azure ARM templates (which only work on Azure). Terraform solved the vendor lock-in problem by introducing a single declarative language — HCL — that works across AWS, GCP, Azure, and hundreds of other providers. You write your intent ('I want three EC2 instances'), and Terraform figures out the sequence of API calls to make it real.
By the end of this article you'll understand why the Terraform state file is both its superpower and its biggest footgun, how providers and modules keep your code DRY at scale, and how a real-world multi-environment setup actually looks — not a toy example, but the kind of structure you'd find in a production codebase at a fast-growing startup or enterprise engineering team.
How Terraform's Core Loop Actually Works — Plan, Apply, State
Most tutorials show you terraform apply and move on. But the real magic — and the real danger — lives in the three-step loop Terraform runs every single time you touch your infrastructure.
First, Terraform reads your .tf files and builds a desired state — a mental model of what you want the world to look like. Then it reads the state file (more on this shortly) to understand what it already built. Finally, it refreshes that picture against your cloud provider's APIs and computes a diff between the two. That diff is your plan.
This is fundamentally different from imperative tools like Ansible where you say 'run these steps.' Terraform is declarative — you say 'here's the destination' and it plots the route. The benefit is idempotency: running terraform apply ten times on an unchanged config does nothing after the first run, because the desired state already matches reality.
The critical thing to internalise is that Terraform's view of the world is bounded by its state file. During a plan it refreshes the resources it already tracks, so drift in those resources does surface, but only when someone next runs a plan. Anything created outside Terraform, like a security group someone clicked together in the AWS console, is completely invisible to it. That blind spot causes more production incidents than almost any other Terraform mistake.
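One way to surface that drift on demand is a refresh-only plan, which compares state against reality without proposing any changes from your configuration. A minimal sketch of the workflow (the drifted tag shown in the comment is hypothetical):

```shell
# Compare the state file against what actually exists in the cloud,
# without planning any changes driven by your .tf files.
terraform plan -refresh-only

# Drift shows up as changes Terraform detected outside of itself, e.g. a tag
# someone edited by hand in the console on a tracked instance.

# If the out-of-band change is what you actually want, accept it into state:
terraform apply -refresh-only
```

This is also a useful periodic job in CI: a non-empty refresh-only plan is an early warning that someone has been clicking around the console.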
```hcl
# main.tf — A minimal but complete AWS setup that demonstrates the core loop
# This creates a VPC and a single EC2 instance inside it.
# Run: terraform init -> terraform plan -> terraform apply

terraform {
  required_version = ">= 1.5.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # Pin to major version to avoid surprise breaking changes
    }
  }
}

# The provider block tells Terraform WHERE to build — credentials come from
# environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or an IAM role.
# Never hardcode credentials in .tf files — they'll end up in version control.
provider "aws" {
  region = var.aws_region
}

# Variables make this config reusable across environments.
# Actual values live in terraform.tfvars (git-ignored) or are passed via -var flags.
variable "aws_region" {
  type        = string
  description = "AWS region where all resources will be created"
  default     = "us-east-1"
}

variable "environment_name" {
  type        = string
  description = "Environment tag applied to every resource (e.g. staging, production)"
}

# The VPC is our private network — everything else lives inside it.
resource "aws_vpc" "primary_network" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true # Needed so EC2 instances get resolvable DNS names

  tags = {
    Name        = "${var.environment_name}-vpc"
    Environment = var.environment_name
    ManagedBy   = "terraform" # Tagging as Terraform-managed helps ops teams know NOT to edit manually
  }
}

# A public subnet within that VPC.
resource "aws_subnet" "public_web_subnet" {
  vpc_id                  = aws_vpc.primary_network.id # Reference to the VPC above — Terraform builds the dependency graph from this
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "${var.aws_region}a"
  map_public_ip_on_launch = true

  tags = {
    Name        = "${var.environment_name}-public-subnet"
    Environment = var.environment_name
  }
}

# Data source — reads EXISTING resources rather than creating new ones.
# Here we fetch the latest Amazon Linux 2023 AMI ID dynamically,
# so we're never hardcoding an AMI that gets deprecated.
data "aws_ami" "amazon_linux_2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"] # Amazon Linux 2023 naming pattern
  }
}

# The EC2 instance. Notice it references both the subnet and the AMI data source.
resource "aws_instance" "web_server" {
  ami           = data.aws_ami.amazon_linux_2023.id # Dynamic AMI from the data source above
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public_web_subnet.id

  tags = {
    Name        = "${var.environment_name}-web-server"
    Environment = var.environment_name
  }
}

# Outputs let you extract values after apply — useful for feeding into CI/CD pipelines
# or just confirming what was built.
output "web_server_public_ip" {
  description = "Public IP of the web server — use this to SSH in or configure DNS"
  value       = aws_instance.web_server.public_ip
}

output "vpc_id" {
  description = "ID of the created VPC — needed if other Terraform workspaces reference this network"
  value       = aws_vpc.primary_network.id
}
```
```text
$ terraform plan -var='environment_name=staging'

Terraform will perform the following actions:

  # aws_instance.web_server will be created
  + resource "aws_instance" "web_server" {
      + ami           = "ami-0abcdef1234567890"
      + instance_type = "t3.micro"
      + tags          = {
          + "Environment" = "staging"
          + "ManagedBy"   = "terraform"
          + "Name"        = "staging-web-server"
        }
      ...
    }

  # aws_subnet.public_web_subnet will be created
  + resource "aws_subnet" "public_web_subnet" { ... }

  # aws_vpc.primary_network will be created
  + resource "aws_vpc" "primary_network" { ... }

Plan: 3 to add, 0 to change, 0 to destroy.

$ terraform apply -var='environment_name=staging' -auto-approve

aws_vpc.primary_network: Creating...
aws_vpc.primary_network: Creation complete after 2s [id=vpc-0a1b2c3d4e5f67890]
aws_subnet.public_web_subnet: Creating...
aws_subnet.public_web_subnet: Creation complete after 1s [id=subnet-0f9e8d7c6b5a43210]
aws_instance.web_server: Creating...
aws_instance.web_server: Creation complete after 32s [id=i-0123456789abcdef0]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Outputs:

vpc_id = "vpc-0a1b2c3d4e5f67890"
web_server_public_ip = "54.210.167.83"
```
The State File — Why It's the Heart of Terraform and How to Not Kill It
The state file (terraform.tfstate) is a JSON document that maps your HCL resource names to real cloud resource IDs. When you write aws_instance.web_server, Terraform stores the fact that this logical name corresponds to i-0123456789abcdef0 in AWS. Without it, Terraform would have no idea what it already built and would try to create duplicates on every apply.
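To make that mapping concrete, here is a heavily abridged sketch of a state entry for the web server from the earlier example. The exact schema varies by Terraform version, and real files contain far more attributes than shown:

```json
{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web_server",
      "instances": [
        {
          "attributes": {
            "id": "i-0123456789abcdef0",
            "instance_type": "t3.micro"
          }
        }
      ]
    }
  ]
}
```

The "id" under attributes is the thread Terraform follows on every plan. Lose that entry and Terraform forgets the instance exists, even though AWS is still billing you for it.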
Here's the problem: by default the state file sits on your local machine. The moment two engineers on a team both run terraform apply, you have a race condition. Whoever writes their state file last wins — and the loser's changes get orphaned in AWS with no state record. Those resources become ghost infrastructure: real, billing you, invisible to Terraform.
The solution is remote state — storing the state file in a shared, locked backend like S3 with DynamoDB locking (for AWS teams) or Terraform Cloud. The DynamoDB lock table is what prevents two simultaneous applies: the first engineer acquires the lock, the second gets a clear error message and must wait.
You should also never manually edit the state file. Use the tooling instead: if a resource exists in the cloud but not in state (created by hand, or orphaned by a lost state write), terraform import brings it under management; if a resource exists in state but should no longer be managed by Terraform, terraform state rm drops the entry without destroying the real resource.
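In command form, using the resource address from the earlier example (the instance ID is illustrative):

```shell
# A resource exists in AWS but not in state — adopt it instead of recreating it:
terraform import aws_instance.web_server i-0123456789abcdef0

# A resource should leave Terraform's management but keep running in AWS:
terraform state rm aws_instance.web_server

# Inspect which resource addresses state currently tracks, before and after:
terraform state list
```

Both commands rewrite the state file through Terraform's own locking and backup machinery, which is exactly why they're safer than opening terraform.tfstate in an editor.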
```hcl
# backend.tf — Remote state configuration using S3 + DynamoDB
# This file MUST be committed to version control so the entire team
# uses the same backend. The S3 bucket and DynamoDB table themselves
# are usually bootstrapped manually (or via a separate 'bootstrap' Terraform workspace)
# because you can't store state for the thing that stores your state.
terraform {
  backend "s3" {
    bucket = "my-company-terraform-state"                 # Must already exist — Terraform won't create it
    key    = "services/web-app/staging/terraform.tfstate" # Path within the bucket — use a consistent naming scheme
    region = "us-east-1"

    # DynamoDB table provides state locking — prevents concurrent applies
    # Table must have a partition key named exactly 'LockID' (string type)
    dynamodb_table = "terraform-state-locks"

    # Encrypt the state file at rest — your state contains sensitive values
    # like database passwords and private IPs
    encrypt = true
  }
}

# ─── How to bootstrap the S3 bucket and DynamoDB table themselves ───
# The resources below are meant to run ONCE in a dedicated 'bootstrap' workspace
# that uses local state (committed to git for reference).
# After running this once, you never touch it again.

resource "aws_s3_bucket" "terraform_state_store" {
  bucket = "my-company-terraform-state"

  # Prevent accidental deletion of this bucket — if it's gone, all your state is gone
  lifecycle {
    prevent_destroy = true
  }

  tags = {
    Name      = "Terraform Remote State"
    ManagedBy = "terraform-bootstrap"
  }
}

# Versioning on the bucket means you can recover from a botched state write
# by rolling back to a previous version — this has saved production more than once
resource "aws_s3_bucket_versioning" "state_store_versioning" {
  bucket = aws_s3_bucket.terraform_state_store.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Block all public access — state files contain secrets, never make them public
resource "aws_s3_bucket_public_access_block" "state_store_access_block" {
  bucket                  = aws_s3_bucket.terraform_state_store.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The DynamoDB table for distributed locking
resource "aws_dynamodb_table" "terraform_lock_table" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST" # No need to provision capacity for a low-traffic lock table
  hash_key     = "LockID"          # Must be exactly 'LockID' — Terraform expects this name

  attribute {
    name = "LockID"
    type = "S" # String type
  }

  lifecycle {
    prevent_destroy = true # Losing this table means losing state locking — never delete it
  }

  tags = {
    Name      = "Terraform State Lock Table"
    ManagedBy = "terraform-bootstrap"
  }
}
```
```text
$ terraform init

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.31.0...
- Installed hashicorp/aws v5.31.0 (signed by HashiCorp)

Terraform has been successfully initialized!

# When a second engineer tries to apply at the same time:
$ terraform apply

Acquiring state lock. This may take a few moments...

Error: Error acquiring the state lock

Error message: ConditionalCheckFailedException: The conditional request failed
Lock Info:
  ID:        f2a1b3c4-d5e6-7890-abcd-ef1234567890
  Path:      my-company-terraform-state/services/web-app/staging/terraform.tfstate
  Operation: OperationTypeApply
  Who:       alice@build-server-01
  Created:   2024-03-15 14:22:01 UTC

Terraform acquires a state lock to protect from concurrent modifications.
Another Terraform process is currently running. Wait for it to complete,
or use `terraform force-unlock f2a1b3c4-d5e6-7890-abcd-ef1234567890` if it crashed.
```
Modules and Workspaces — Structuring Terraform for Real Teams at Scale
Once you move beyond a single environment, two problems emerge fast: you're copy-pasting .tf files between staging and production (violating DRY), and you're terrified of running terraform apply in the wrong directory.
Modules solve the DRY problem. A module is just a folder of .tf files with defined inputs (variables) and outputs. You write the VPC setup once as a module, then call it from your staging config with environment_name = staging and from your production config with environment_name = production. Changes to the VPC logic happen in one place.
Workspaces solve the isolation problem — but with a caveat. Terraform workspaces let you maintain separate state files for the same configuration, switching between them with terraform workspace select staging. They're great for lightweight environment separation, but they use the same backend bucket and the same code, so a misconfigured variable in terraform.tfvars can still nuke production.
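A minimal sketch of the workspace workflow, including the built-in terraform.workspace value you can interpolate so resource names don't collide between workspaces:

```shell
# Create and switch between per-environment state files for the same code:
terraform workspace new staging
terraform workspace new production
terraform workspace select staging
terraform workspace show   # prints the currently selected workspace name

# Inside your .tf files, terraform.workspace interpolates that name, e.g.:
#   tags = { Name = "${terraform.workspace}-vpc" }
# so a single configuration produces distinctly named resources per workspace.
```

Note that terraform workspace show is also the command to run before every apply when using workspaces; forgetting it is exactly the failure mode described above.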
For serious multi-environment setups, most teams graduate to a directory-based structure instead: environments/staging/ and environments/production/ each have their own main.tf that calls shared modules. Each directory has its own state file with its own backend key. It's more files, but an apply can only ever touch the environment whose directory you're standing in.
```hcl
# ─── Project structure ───────────────────────────────────────────
# terraform-infrastructure/
# ├── modules/
# │   └── web_application/
# │       ├── main.tf          ← You are here
# │       ├── variables.tf
# │       └── outputs.tf
# ├── environments/
# │   ├── staging/
# │   │   ├── main.tf          ← Calls the module with staging values
# │   │   └── terraform.tfvars
# │   └── production/
# │       ├── main.tf          ← Calls the module with production values
# │       └── terraform.tfvars
# └── backend.tf
# ─────────────────────────────────────────────────────────────────

# modules/web_application/variables.tf
variable "environment_name" {
  type        = string
  description = "Deployment environment — controls naming and sizing"

  validation {
    condition     = contains(["staging", "production"], var.environment_name)
    error_message = "environment_name must be either 'staging' or 'production'." # Catches typos before they hit AWS
  }
}

variable "instance_type" {
  type        = string
  description = "EC2 instance size — use t3.micro for staging, t3.large for production"
  default     = "t3.micro"
}

variable "vpc_cidr_block" {
  type        = string
  description = "CIDR block for the VPC — must not overlap with other environments"
}

# modules/web_application/main.tf
resource "aws_vpc" "app_network" {
  cidr_block = var.vpc_cidr_block

  tags = {
    Name        = "${var.environment_name}-app-vpc"
    Environment = var.environment_name
  }
}

resource "aws_subnet" "app_subnet" {
  vpc_id     = aws_vpc.app_network.id
  cidr_block = cidrsubnet(var.vpc_cidr_block, 8, 1) # cidrsubnet carves a /24 out of the /16 automatically

  tags = {
    Name        = "${var.environment_name}-app-subnet"
    Environment = var.environment_name
  }
}

# modules/web_application/outputs.tf
output "vpc_id" {
  value       = aws_vpc.app_network.id
  description = "VPC ID — expose this so callers can attach other resources to the same network"
}

output "subnet_id" {
  value       = aws_subnet.app_subnet.id
  description = "Subnet ID for the primary application subnet"
}

# ─── environments/staging/main.tf ────────────────────────────────
# This is how you CALL the module from an environment directory.
# The module's source argument points to the relative path of the module folder.
terraform {
  required_version = ">= 1.5.0"

  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "environments/staging/terraform.tfstate" # Unique key per environment
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = "us-east-1"
}

module "staging_web_app" {
  source           = "../../modules/web_application" # Relative path to the module
  environment_name = "staging"
  instance_type    = "t3.micro"    # Cheaper instance for non-production
  vpc_cidr_block   = "10.1.0.0/16" # Non-overlapping CIDR — staging uses 10.1.x.x
}

# ─── environments/production/main.tf ─────────────────────────────
module "production_web_app" {
  source           = "../../modules/web_application"
  environment_name = "production"
  instance_type    = "t3.large"    # Larger instance for production load
  vpc_cidr_block   = "10.2.0.0/16" # Production uses 10.2.x.x — no CIDR collision
}

# Outputs from a module are accessed via module.<module_name>.<output_name>
output "production_vpc_id" {
  value = module.production_web_app.vpc_id
}
```
```text
# Running from environments/staging/
$ terraform init && terraform apply

Initializing modules...
- staging_web_app in ../../modules/web_application
...
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

# Running from environments/production/
$ terraform init && terraform apply

Initializing modules...
- production_web_app in ../../modules/web_application
...
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

production_vpc_id = "vpc-0b2c3d4e5f67890a1"

# If you accidentally type the wrong environment name in a tfvars file:
$ terraform plan

Error: Invalid value for variable

  on ../../modules/web_application/variables.tf line 4, in variable "environment_name":
   4:   validation {

Validation failed: environment_name must be either 'staging' or 'production'.

# The validation block caught a typo ('Staging' vs 'staging') before any AWS API call was made.
```
| Aspect | Terraform Workspaces | Directory-Based Environments |
|---|---|---|
| State isolation | Separate state files per workspace in the same backend path | Fully separate backend key and state file per directory |
| Code duplication | Zero — single set of .tf files for all workspaces | Minimal — environments share modules, not raw .tf files |
| Risk of cross-environment mistakes | High — easy to forget which workspace is selected | Low — you physically cd into the right directory |
| Supports different topologies | No — staging and prod must have identical resource shapes | Yes — production can have Multi-AZ, staging can be single-AZ |
| Best for | Ephemeral feature branches, developer sandboxes | Long-lived environments: staging, UAT, production |
| CI/CD integration complexity | Medium — must inject workspace selection step | Low — different pipeline steps point to different directories |
🎯 Key Takeaways
- Terraform's plan is only as accurate as its state file — if someone manually edits cloud resources, the state lies and your next apply may surprise you. Run `terraform plan -refresh-only` to detect drift, and use `terraform plan` output as a change-review gate in CI/CD.
- Remote state with DynamoDB locking isn't optional for teams — it's the single change that takes Terraform from a solo tool to a safe team tool. Set it up before your second engineer runs their first apply.
- Modules are just folders with variables and outputs — there's no magic syntax. The power is in how you structure them: one module per logical concern (networking, compute, database) that gets called with different variable values per environment.
- The `prevent_destroy = true` lifecycle rule is your last line of defence against an accidental `terraform destroy` on critical resources. Add it to every stateful resource (databases, S3 buckets, DNS zones) immediately after creation.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Committing terraform.tfstate to Git — The state file contains plaintext sensitive values (passwords, private keys, IP addresses). Even one accidental commit exposes secrets in your repo history permanently. Fix: add `*.tfstate` and `*.tfstate.backup` to `.gitignore` immediately, then configure a remote backend (S3 + DynamoDB or Terraform Cloud) so the state file never lives on disk in your repo.
- ✕ Mistake 2: Running terraform destroy in the wrong directory or workspace — The telltale symptom is a confirmation prompt that lists your production database as 'will be destroyed'. Because terraform destroy tears down everything in the current state, running it in the wrong context is catastrophic. Fix: add a `lifecycle { prevent_destroy = true }` block to any resource that should never be deleted (RDS instances, S3 buckets with data), and enforce a CI/CD pipeline where applies require a PR review rather than running locally.
- ✕ Mistake 3: Hardcoding provider credentials in .tf files — You'll see access keys committed directly in a provider block, usually by someone following an old tutorial. It fails in CI/CD environments and exposes credentials in version control. Fix: always use environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) or, better yet, IAM roles attached to your CI runner or EC2 instance. The provider block should contain nothing but `region`.
Interview Questions on This Topic
- QWhat happens when two engineers run `terraform apply` simultaneously against the same state file, and how does Terraform prevent corruption?
- QExplain the difference between a data source and a resource in Terraform. When would you use a data source over creating a new resource?
- QIf a colleague manually deleted an AWS resource that Terraform manages, what state does Terraform think the world is in, and what are your options to fix it without causing an outage?
Frequently Asked Questions
What is the Terraform state file and why is it important?
The state file is a JSON document Terraform uses to map your HCL resource names (like aws_instance.web_server) to real cloud resource IDs (like i-0123456789abcdef0). Without it, Terraform can't tell what it already built, so it would try to recreate everything on every apply. It's critical because it's the source of truth for Terraform's diff engine — if it's wrong or missing, your infrastructure can get out of sync with your code.
What's the difference between terraform plan and terraform apply?
terraform plan is a dry run — it reads your .tf files and state file, calls cloud APIs to check current resource state, and shows you exactly what it would create, change, or destroy. No real changes happen. terraform apply executes that plan against your cloud provider. In production workflows, you should always review the plan output before applying, and many teams require a second engineer to approve the plan in a pull request before the pipeline runs apply.
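One common way teams enforce that review step is the saved-plan workflow. A sketch (the plan file name is arbitrary):

```shell
# Write the computed plan to a file instead of just printing it:
terraform plan -out=tfplan

# A reviewer can render the saved plan, human-readable or as JSON for tooling:
terraform show tfplan
terraform show -json tfplan

# Apply exactly the reviewed plan — if state has changed since the plan was
# created, Terraform refuses rather than applying a stale plan:
terraform apply tfplan
```

This guarantees the thing that ran in production is byte-for-byte the thing a human approved, which a plain `terraform apply` after a separate `terraform plan` does not.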
When should I use Terraform modules?
Use a module the moment you find yourself copying the same block of Terraform code into more than one place. Good candidates are patterns you repeat per environment (VPC + subnets + routing), per service (an ECS service with its IAM role and security group), or per team (a standard developer environment template). Modules enforce consistency — if the security group rule needs changing, you change it once in the module and all callers get the update on their next apply.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.