AWS IAM Explained: Roles, Policies, and Least Privilege in Practice
- IAM controls who can do what in your AWS account — Users (humans), Roles (machines), Groups (management containers)
- Every Role has two separate policies: Trust Policy (who can assume it) and Permission Policy (what it can do) — both must be correct or you get AccessDenied
- AWS IAM is default-deny — if a permission is not explicitly granted, it is blocked. Explicit Deny always wins over Allow, regardless of what any other policy says.
- Roles issue short-lived STS tokens (15min–12hr) that auto-rotate — leaked credentials expire in hours, not forever
- Use IAM Access Analyzer to generate least-privilege policies from actual CloudTrail data — deploy broad in staging, tighten in production
- Biggest mistake: using AdministratorAccess on Lambda or EC2 roles 'just to make it work' — a single compromised function gets full account access in seconds
Production Debug Guide
Systematic path from an AccessDenied error to its root cause.
- AccessDenied, and you need to identify which policy is causing the denial:
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=<api-action> --max-results 5 --query 'Events[*].CloudTrailEvent' --output text | jq .
aws iam simulate-principal-policy --policy-source-arn <role-arn> --action-names <failing-action> --resource-arns <resource-arn>
- You need to see every policy attached to a role, both managed and inline:
aws iam list-attached-role-policies --role-name <role-name>
aws iam list-role-policies --role-name <role-name>
- Lambda or EC2 may be using the wrong credentials, and you need to verify the active role (the curl call below is IMDSv1; instances that require IMDSv2 also need a session token header):
aws sts get-caller-identity
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
- You suspect an SCP is blocking access at the organization level (run these from the management account; list-policies-for-target only shows SCPs attached directly to the account, so check its parent OUs and the root as well):
aws organizations list-policies-for-target --target-id <account-id> --filter SERVICE_CONTROL_POLICY
aws organizations describe-policy --policy-id <scp-id> | jq '.Policy.Content | fromjson'
Most significant AWS breaches trace back to the same root cause: IAM was misconfigured or ignored. Exposed S3 buckets, compromised Lambda functions, leaked credentials on GitHub: all IAM failures at their core. IAM is not a niche security topic you deal with after everything else is working; it is the foundation every other AWS service is built on, and getting it wrong is how companies make the news.
IAM introduced fine-grained, programmable permissions to replace the blunt root-account-or-nothing model. You can now specify that 'this Lambda function may read from exactly one S3 bucket and nothing else.' That specificity is the difference between a contained incident and a company-ending breach. The controls exist. The question is whether your team uses them.
The three concepts to master: Users (human identities with permanent credentials), Roles (machine identities with temporary STS tokens that auto-expire), and Policies (JSON documents defining what is allowed or denied). Getting the relationship between these three right — and specifically understanding that Trust Policies and Permission Policies are completely separate documents with completely different jobs — is what separates a secure architecture from a ticking time bomb.
In 2026, with AWS IAM Identity Center (the successor to AWS Single Sign-On) handling human access and OIDC-based authentication now the standard for CI/CD pipelines, the era of long-lived Access Keys for automation should be over. The patterns in this guide reflect where the security bar actually sits today, not where it was in 2019.
Users, Groups, and Roles — Picking the Right Identity Tool
The most common IAM confusion comes from mixing up three distinct concepts that look superficially similar but serve completely different architectural purposes.
IAM User represents a human or legacy script requiring long-term credentials — a password for the console and an Access Key and Secret Key pair for programmatic access. The modern rule is stark: machines should almost never be IAM Users. A User's Access Key does not expire. If it leaks, it works until someone manually revokes it.
IAM Group is a management container for Users. Instead of attaching policies to ten developers individually, you attach them once to a BackendEngineers group. Groups have no credentials of their own and cannot be assumed by services — they are purely an administrative convenience, not a security boundary.
IAM Role is the most important tool in your toolkit. A Role has no long-term credentials whatsoever. Instead, it is assumed temporarily by a principal: an EC2 instance, a Lambda function, a CI/CD pipeline, or a cross-account service. When something assumes a Role, AWS STS issues a set of temporary credentials (access key, secret key, and session token) that expire automatically, anywhere from 15 minutes up to the role's maximum session duration (1 hour by default, configurable up to 12 hours). That auto-expiry is your primary defence against credential leaks: a leaked temporary credential becomes useless within hours, not years.
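To make that auto-expiry concrete, the sketch below reports how many seconds remain before an STS credential expires, given the Expiration timestamp STS returns. It assumes GNU date (the Access Analyzer script later in this article makes the same assumption). The function name credential_seconds_left and the role ARN in the usage comment are hypothetical, and that role's Trust Policy would have to allow your caller.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Seconds until an STS credential expires (negative once it has expired).
# $1: Expiration timestamp as returned by STS, ISO 8601 (e.g. 2026-04-20T12:00:00Z)
# $2: optional "now" timestamp, defaults to the current UTC time (handy for testing)
# Assumes GNU date, which can parse ISO 8601 strings with -d.
credential_seconds_left() {
  local expires_at="$1"
  local now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  local exp_epoch now_epoch
  exp_epoch=$(date -u -d "${expires_at}" +%s)
  now_epoch=$(date -u -d "${now}" +%s)
  echo $(( exp_epoch - now_epoch ))
}

# Usage with real credentials (requires AWS access; not executed here):
#   EXP=$(aws sts assume-role \
#     --role-arn arn:aws:iam::123456789012:role/ExampleRole \
#     --role-session-name debug-session \
#     --duration-seconds 900 \
#     --query 'Credentials.Expiration' --output text)
#   credential_seconds_left "${EXP}"   # at most 900 right after the call
```

Asking for the 900-second minimum is a cheap way to watch a credential die on schedule; nothing has to revoke it.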
In 2026 the modern standard is unambiguous: AWS IAM Identity Center (formerly SSO) for all human access, Roles for all machine and service identities, and OIDC-based federation for CI/CD pipelines. Permanent Access Keys on developer machines should be a red flag in any security review, not a routine practice.
# io.thecodeforge: Production-grade IAM Role for EC2 with S3 read access
# Deploy:
#   aws cloudformation deploy \
#     --template-file ec2_s3_read_role.yaml \
#     --stack-name forge-iam-stack \
#     --capabilities CAPABILITY_NAMED_IAM
AWSTemplateFormatVersion: '2010-09-09'
Description: Minimalist IAM Role for EC2 with scoped S3 Read Access
Parameters:
  TargetBucketName:
    Type: String
    Default: thecodeforge-app-data
    Description: The exact S3 bucket this instance is allowed to read from
Resources:
  # 1. THE ROLE — defines who can wear the badge (Trust Policy)
  # The Principal block answers: which AWS service can assume this role?
  # Missing 'ec2.amazonaws.com' here = "Unable to locate credentials" at runtime
  AppServerInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub "${AWS::StackName}-EC2S3ReadOnly"
      Description: "Allows EC2 instances to read from the app data bucket only"
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com  # TRUST POLICY: EC2 service can assume this
            Action: sts:AssumeRole
      Tags:
        - Key: ManagedBy
          Value: CloudFormation
        - Key: Purpose
          Value: AppServerS3Access
  # 2. THE POLICY — defines what the role can do (Permission Policy)
  # Scoped to a single bucket and only the two actions actually needed.
  # s3:ListBucket requires the bucket ARN; s3:GetObject requires the object ARN.
  # These are different ARN formats — a common gotcha.
  AppServerS3ReadPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: !Sub "${AWS::StackName}-S3ScopedReadAccess"
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: AllowBucketList
            Effect: Allow
            Action:
              - s3:ListBucket
            Resource:
              - !Sub "arn:aws:s3:::${TargetBucketName}"
            Condition:
              StringLike:
                s3:prefix: "app-data/*"  # further scope to a prefix if possible
          - Sid: AllowObjectRead
            Effect: Allow
            Action:
              - s3:GetObject
            Resource:
              - !Sub "arn:aws:s3:::${TargetBucketName}/app-data/*"
      Roles:
        - !Ref AppServerInstanceRole
  # 3. THE INSTANCE PROFILE — the bridge between EC2 and the Role
  # EC2 cannot use an IAM Role directly. It needs this wrapper.
  # Reference this profile in your LaunchTemplate — NOT the Role ARN.
  AppServerInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      InstanceProfileName: !Sub "${AWS::StackName}-EC2Profile"
      Roles:
        - !Ref AppServerInstanceRole
Outputs:
  RoleArn:
    Description: Role ARN for cross-stack references
    Value: !GetAtt AppServerInstanceRole.Arn
    Export:
      Name: !Sub "${AWS::StackName}-RoleArn"
  InstanceProfileArn:
    Description: Instance Profile ARN to attach to EC2 LaunchTemplate
    Value: !GetAtt AppServerInstanceProfile.Arn
    Export:
      Name: !Sub "${AWS::StackName}-InstanceProfileArn"
# Successfully created/updated stack - forge-iam-stack
#
# Stack Outputs:
# RoleArn: arn:aws:iam::123456789012:role/forge-iam-stack-EC2S3ReadOnly
# InstanceProfileArn: arn:aws:iam::123456789012:instance-profile/forge-iam-stack-EC2Profile
#
# The EC2 metadata service now serves rotating credentials automatically:
# curl http://169.254.169.254/latest/meta-data/iam/security-credentials/forge-iam-stack-EC2S3ReadOnly
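The curl call above uses IMDSv1, which instances configured to require IMDSv2 will reject. Below is a sketch of the token-based request, with a small helper that pulls the role name out of the ARN returned by aws sts get-caller-identity. That helper is the quickest way to confirm which role your code is actually running as. The helper name is hypothetical; the metadata paths and headers are the standard IMDSv2 ones.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Extract the role name from an assumed-role ARN, such as the Arn field returned by
# `aws sts get-caller-identity` on an EC2 instance or inside a Lambda function:
#   arn:aws:sts::123456789012:assumed-role/<role-name>/<session-name>
# Returns 1 (and prints nothing) if the ARN is not an assumed-role ARN, e.g. an IAM user.
role_name_from_caller_arn() {
  local arn="$1"
  local resource="${arn#*:assumed-role/}"   # drop everything up to "assumed-role/"
  if [[ "${resource}" == "${arn}" ]]; then
    return 1                                # pattern did not match: not a role session
  fi
  echo "${resource%%/*}"                    # keep the part before the session name
}

# On the instance itself (requires IMDS access; not executed here):
#   TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
#     -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
#   curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
#     http://169.254.169.254/latest/meta-data/iam/security-credentials/
#   role_name_from_caller_arn "$(aws sts get-caller-identity --query Arn --output text)"
```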
IAM Policies Deep-Dive — Trust Policies vs Permission Policies
IAM policies are JSON documents that define permissions, but there are two completely different types of policies in play on every Role, and confusing them is the source of most AccessDenied errors and most of the time spent debugging them.
The Permission Policy answers the question: what actions can this identity perform? It is attached to a User or Role and defines the allowed or denied API calls, the resources those calls can target, and the conditions under which the permission applies. This is what most people think of when they hear 'IAM policy.'
The Trust Policy answers a completely different question: who is allowed to assume this role? It lives exclusively in the AssumeRolePolicyDocument on the Role resource itself and is checked when a principal calls sts:AssumeRole, before the role's permission policy ever comes into play. If your Trust Policy does not list lambda.amazonaws.com as a trusted principal, no Lambda function can assume that role, no matter how many permissions the role has. The role is simply unavailable to that service.
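Here is a minimal sketch of the Trust Policy that makes a role available to Lambda. It is rendered by a shell function, so the same template works for any service principal. The function name and the role name in the usage comment are hypothetical. The document structure (a Service principal plus sts:AssumeRole) is the same one the CloudFormation template above uses for EC2.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Render a Trust Policy (AssumeRolePolicyDocument) that lets one AWS service
# assume the role, e.g. trust_policy_for_service lambda.amazonaws.com
trust_policy_for_service() {
  local service="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "${service}" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Usage (requires AWS credentials; not executed here):
#   aws iam create-role --role-name forge-lambda-exec \
#     --assume-role-policy-document "$(trust_policy_for_service lambda.amazonaws.com)"
```

The role this creates can be assumed by Lambda but can do nothing yet. Permissions still come from a separate Permission Policy attached afterwards.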
The evaluation order, from assuming the role to making an API call with it, runs in three steps. First: can this principal assume the role at all? This is the Trust Policy, checked at sts:AssumeRole. Second: is there an explicit Deny anywhere in the policy stack? A Deny in any SCP, Permission Boundary, identity-based or resource-based policy wins immediately. Third: is the action allowed? Every applicable SCP and any Permission Boundary must allow it, and at least one identity-based policy (the Permission Policy) or resource-based policy must grant it for this specific resource. If the answer to the second question is yes, the request is denied regardless of any Allows. If the answer to the first or third question is no, the request is also denied. Default deny means that every request starts at denied and must earn its way to allowed through explicit grants.
Understanding this evaluation chain is what makes you fast at debugging access issues. Most engineers start at the permission policy and work outward. The right approach is to start at the Trust Policy and work forward through the chain.
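Starting at the Trust Policy can take as little as two read-only steps. This sketch fetches a role's Trust Policy with aws iam get-role, then runs a deliberately crude text check for a service principal. The helper name is hypothetical. Because it only matches text, it ignores Effect, Conditions and Deny statements, so treat it as quick triage rather than real policy evaluation.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Crude triage: does this Trust Policy JSON mention the given service principal?
# $1: Trust Policy JSON, $2: service principal such as lambda.amazonaws.com
# Plain text matching only; Effect, Conditions and Deny statements are not evaluated.
trust_mentions_service() {
  local policy="$1" service="$2"
  grep -Fq "\"${service}\"" <<<"${policy}"
}

# Step 1 of the chain against a live account (not executed here):
#   POLICY=$(aws iam get-role --role-name <role-name> \
#     --query 'Role.AssumeRolePolicyDocument' --output json)
#   trust_mentions_service "${POLICY}" lambda.amazonaws.com \
#     || echo "Trust Policy does not name lambda.amazonaws.com: fix this first"
# Step 2 onwards: aws iam simulate-principal-policy for the failing action.
```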
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowECRImagePush",
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload"
      ],
      "Resource": [
        "arn:aws:ecr:eu-west-1:123456789012:repository/thecodeforge-app"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "eu-west-1"
        }
      }
    },
    {
      "Sid": "AllowECRAuth",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "eu-west-1"
        }
      }
    },
    {
      "Sid": "AllowLambdaDeploymentOnly",
      "Effect": "Allow",
      "Action": [
        "lambda:UpdateFunctionCode",
        "lambda:UpdateFunctionConfiguration",
        "lambda:PublishVersion"
      ],
      "Resource": [
        "arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-*"
      ]
    },
    {
      "Sid": "DenyDestructiveActionsAlways",
      "Effect": "Deny",
      "Action": [
        "lambda:DeleteFunction",
        "ecr:DeleteRepository",
        "ecr:DeleteRepositoryPolicy"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyAllOutsideApprovedRegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["eu-west-1", "us-east-1"]
        }
      }
    }
  ]
}
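# Validate once the policy is attached to the role. simulate-principal-policy evaluates
# the policies already attached; to test this document before attaching it, use
# aws iam simulate-custom-policy --policy-input-list file://<policy-file>.json instead.
# The statements above use aws:RequestedRegion, so supply it as context:
# aws iam simulate-principal-policy \
#   --policy-source-arn arn:aws:iam::123456789012:role/CICDDeployerRole \
#   --action-names ecr:PutImage \
#   --resource-arns arn:aws:ecr:eu-west-1:123456789012:repository/thecodeforge-app \
#   --context-entries ContextKeyName=aws:RequestedRegion,ContextKeyValues=eu-west-1,ContextKeyType=string
#
# Expected (abridged): "EvalDecision": "allowed"
#
# Test the deny guard too:
# aws iam simulate-principal-policy \
#   --policy-source-arn arn:aws:iam::123456789012:role/CICDDeployerRole \
#   --action-names lambda:DeleteFunction \
#   --resource-arns arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-api \
#   --context-entries ContextKeyName=aws:RequestedRegion,ContextKeyValues=eu-west-1,ContextKeyType=string
#
# Expected (abridged): "EvalDecision": "explicitDeny"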
- Gate 1: Trust Policy — can this principal assume the role at all? If no, the request never reaches the other gates.
- Gate 2: SCPs — does the AWS Organization allow this action in this account? An SCP Deny is absolute and cannot be overridden.
- Gate 3: Permission Boundary — does this action fall within the role's maximum ceiling? A Boundary Deny overrides any Allow in the Permission Policy.
- Gate 4: Permission Policy — does the role's identity-based policy explicitly allow this action on this resource?
- Gate 5: Resource Policy — does the target resource (S3 bucket, KMS key, SQS queue) allow access from this role? Required for cross-account access.
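The five gates can be condensed into a toy decision function for reasoning about AccessDenied errors. This is a simplified model, not AWS's evaluation engine. It ignores session policies, cross-account subtleties and condition evaluation, and it takes each gate's outcome as a yes/no input. The function name and argument order are made up for illustration. The results allowed, explicitDeny and implicitDeny mirror the policy simulator's EvalDecision values, while cannotAssumeRole is a label invented here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Toy model of the gate order above. Each argument is "yes" or "no":
#   $1 trusted          Trust Policy lets this principal assume the role?
#   $2 explicit_deny    any SCP, boundary, identity or resource policy Denies?
#   $3 scp_allows       SCPs allow the action?
#   $4 boundary_allows  Permission Boundary allows it? (pass "yes" if there is none)
#   $5 policy_allows    an identity-based or resource-based policy Allows it?
# Prints one of: allowed | explicitDeny | implicitDeny | cannotAssumeRole
iam_gate_decision() {
  local trusted="$1" explicit_deny="$2" scp_allows="$3" boundary_allows="$4" policy_allows="$5"
  if [[ "${trusted}" != "yes" ]]; then
    echo "cannotAssumeRole"   # Gate 1 fails: nothing else is evaluated
  elif [[ "${explicit_deny}" == "yes" ]]; then
    echo "explicitDeny"       # an explicit Deny beats every Allow
  elif [[ "${scp_allows}" != "yes" || "${boundary_allows}" != "yes" ]]; then
    echo "implicitDeny"       # a guardrail (Gate 2 or 3) does not grant the action
  elif [[ "${policy_allows}" != "yes" ]]; then
    echo "implicitDeny"       # default deny: no policy grants the action
  else
    echo "allowed"
  fi
}
```

Take the interview scenario where an identity policy allows s3:* but an SCP denies s3:DeleteBucket. The explicit_deny argument is yes, so the model returns explicitDeny whatever the other inputs are.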
Least Privilege in Practice — Building a Real-World IAM Strategy
Least privilege is the security practice of granting the absolute minimum permissions required for a task to complete — nothing more. In fast-moving teams, AdministratorAccess is tempting because it eliminates permission debugging entirely. It also means that any bug in your code, any compromised dependency, or any leaked credential is immediately a full-account incident.
The practical path to least privilege is not guessing permissions upfront; it is observing what your application actually calls and generating a policy from that data. IAM Access Analyzer's policy generation feature does exactly this. You run your application in a staging environment under a deliberately broad permission policy while CloudTrail records every API call it makes. Access Analyzer then generates a draft JSON policy limited to the services and actions the role actually used. Once CloudTrail has enough history, the generation job itself usually finishes within minutes. The result is grounded in real usage rather than someone's best guess. It only covers code paths exercised during the observation window, though, so review it before applying it and replace any broad or placeholder resources with exact ARNs.
For human access, the standard in 2026 is short-lived sessions via AWS IAM Identity Center. Developers authenticate through their corporate identity provider (Okta, Google Workspace, Entra ID), receive temporary credentials valid for 1 to 12 hours, and use those to make API calls. A stolen laptop yields, at most, credentials that expire within hours. There are no permanent Access Keys in ~/.aws/credentials to exfiltrate.
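On a developer machine with AWS CLI v2, Identity Center access comes down to a profile that points at an sso-session, followed by aws sso login. The sketch renders such a config block. The profile name, account ID, permission set name and start URL are placeholders, and it assumes Identity Center lives in the same region used as the profile's default. Running aws configure sso produces an equivalent block interactively.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Render an AWS CLI v2 config snippet for IAM Identity Center (placeholders throughout).
# $1 profile name, $2 account ID, $3 permission set name, $4 start URL, $5 region
sso_profile_config() {
  local profile="$1" account_id="$2" role_name="$3" start_url="$4" region="$5"
  cat <<EOF
[profile ${profile}]
sso_session = ${profile}-session
sso_account_id = ${account_id}
sso_role_name = ${role_name}
region = ${region}

[sso-session ${profile}-session]
sso_start_url = ${start_url}
sso_region = ${region}
sso_registration_scopes = sso:account:access
EOF
}

# Usage (not executed here):
#   sso_profile_config forge-dev 123456789012 DeveloperAccess \
#     https://example.awsapps.com/start eu-west-1 >> ~/.aws/config
#   aws sso login --profile forge-dev          # browser sign-in through your IdP
#   aws sts get-caller-identity --profile forge-dev
```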
For CI/CD pipelines, the modern standard is OIDC federation. GitHub Actions, GitLab CI, and most modern CI platforms can authenticate as themselves using an OIDC token and assume an IAM role that trusts their identity. No stored credentials, no rotation, no credential scanning needed for a secret that does not exist.
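For GitHub Actions, the role's Trust Policy names GitHub's OIDC provider as a federated principal and pins the token's audience and subject claims. Here is a minimal sketch, assuming the IAM OIDC identity provider for token.actions.githubusercontent.com has already been created in the account. The organisation, repository, branch and role names are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Render a Trust Policy for GitHub Actions OIDC federation.
# $1 account ID, $2 GitHub organisation, $3 repository, $4 branch allowed to deploy
github_oidc_trust_policy() {
  local account_id="$1" org="$2" repo="$3" branch="$4"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:${org}/${repo}:ref:refs/heads/${branch}"
        }
      }
    }
  ]
}
EOF
}

# Usage (requires AWS credentials; not executed here):
#   aws iam create-role --role-name CICDDeployerRole \
#     --assume-role-policy-document "$(github_oidc_trust_policy 123456789012 my-org thecodeforge-app main)"
```

The sub condition does the real work. It restricts the role to one repository and branch, whereas a wildcard such as repo:* would let any GitHub repository assume it. On the workflow side, the job needs permissions: id-token: write, and the aws-actions/configure-aws-credentials action exchanges the token for role credentials.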
The three-layer defence for least privilege at scale works as follows. Access Analyzer generates tight policies per role. SCPs at the AWS Organizations level set hard limits that no account-level policy can override: denying iam:CreateUser in non-security accounts, restricting compute to approved regions, preventing S3 public access changes. AWS Config rules provide continuous monitoring and alert within minutes when a role receives broader permissions than its baseline, catching the "AdministratorAccess just to make it work" mistake before it turns into an incident.
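Here is a sketch of two of those guardrails as a single SCP: deny creating IAM users and access keys, and deny changes to S3 Block Public Access settings. The policy name and the exact action list are illustrative. A region restriction would follow the same pattern as the DenyAllOutsideApprovedRegions statement shown earlier. Keep in mind that SCPs never restrict the organization's management account.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Render an illustrative SCP for non-security member accounts.
# Attach it only after Block Public Access is switched on: denying
# s3:PutBucketPublicAccessBlock also blocks later attempts to enable it.
scp_guardrails() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyLongLivedIamUsers",
      "Effect": "Deny",
      "Action": ["iam:CreateUser", "iam:CreateAccessKey"],
      "Resource": "*"
    },
    {
      "Sid": "DenyPublicAccessBlockChanges",
      "Effect": "Deny",
      "Action": ["s3:PutBucketPublicAccessBlock", "s3:PutAccountPublicAccessBlock"],
      "Resource": "*"
    }
  ]
}
EOF
}

# Usage from the management account (not executed here):
#   aws organizations create-policy --type SERVICE_CONTROL_POLICY \
#     --name forge-baseline-guardrails \
#     --description "Deny IAM users, access keys and public-access changes" \
#     --content "$(scp_guardrails)"
#   aws organizations attach-policy --policy-id <scp-id> --target-id <ou-or-account-id>
```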
#!/usr/bin/env bash
# io.thecodeforge: Automated Least-Privilege Policy Generation
# Prerequisites:
#   - CloudTrail enabled in the target region
#   - An AccessAnalyzer ServiceRole with CloudTrail read permissions
#   - The target role has been running in staging for at least 7 days
set -euo pipefail
ROLE_NAME="StagingAppRole"
ACCOUNT_ID="123456789012"
REGION="eu-west-1"
CLOUDTRAIL_ARN="arn:aws:cloudtrail:${REGION}:${ACCOUNT_ID}:trail/main-trail"
ANALYZER_ROLE="arn:aws:iam::${ACCOUNT_ID}:role/AccessAnalyzerServiceRole"
ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/${ROLE_NAME}"
echo "==> Starting policy generation for ${ROLE_NAME}"
echo "    Based on CloudTrail activity in ${REGION} for the last 90 days"
echo ""
# Step 1: Start the policy generation job
# Access Analyzer reads CloudTrail logs and identifies the services and actions the role used
JOB_ID=$(aws accessanalyzer start-policy-generation \
  --region "${REGION}" \
  --policy-generation-details "principalArn=${ROLE_ARN}" \
  --cloud-trail-details "{\
    \"accessRole\": \"${ANALYZER_ROLE}\",\
    \"trails\": [{\
      \"cloudTrailArn\": \"${CLOUDTRAIL_ARN}\",\
      \"regions\": [\"${REGION}\"]\
    }],\
    \"startTime\": \"$(date -d '90 days ago' --iso-8601=seconds)\",\
    \"endTime\": \"$(date --iso-8601=seconds)\"\
  }" \
  --query 'jobId' \
  --output text)
echo "==> Job ID: ${JOB_ID} — waiting for completion..."
# Step 2: Poll until the job completes (typically 1–5 minutes)
# Terminal states are SUCCEEDED, FAILED and CANCELED; stop on any of them.
while true; do
  STATUS=$(aws accessanalyzer get-generated-policy \
    --region "${REGION}" \
    --job-id "${JOB_ID}" \
    --query 'jobDetails.status' \
    --output text)
  if [[ "${STATUS}" == "SUCCEEDED" ]]; then
    echo "==> Job completed successfully"
    break
  elif [[ "${STATUS}" == "FAILED" || "${STATUS}" == "CANCELED" ]]; then
    echo "ERROR: Policy generation ${STATUS}. Check CloudTrail permissions on ${ANALYZER_ROLE}" >&2
    exit 1
  fi
  echo "    Status: ${STATUS} — checking again in 20s..."
  sleep 20
done
# Step 3: Extract the generated policy document
OUTPUT_FILE="least_privilege_${ROLE_NAME}_$(date +%Y%m%d).json"
aws accessanalyzer get-generated-policy \
  --region "${REGION}" \
  --job-id "${JOB_ID}" \
  --query 'generatedPolicyResult.generatedPolicies[0].policy' \
  --output text > "${OUTPUT_FILE}"
echo ""
echo "==> Generated policy written to: ${OUTPUT_FILE}"
echo "    Review the policy before applying. Key things to verify:"
echo "    1. No wildcard (*) resources — every action should target a specific ARN"
echo "    2. No unexpected actions — if you see iam:* or s3:* in the output, investigate"
echo "    3. Condition keys — add aws:RequestedRegion conditions for all write actions"
echo ""
echo "==> To apply to the production role:"
echo "    aws iam put-role-policy \\"
echo "      --role-name ProductionAppRole \\"
echo "      --policy-name GeneratedLeastPrivilege \\"
echo "      --policy-document file://${OUTPUT_FILE}"
echo ""
jq . "${OUTPUT_FILE}"
==> Starting policy generation for StagingAppRole
    Based on CloudTrail activity in eu-west-1 for the last 90 days
==> Job ID: job_abc123def456 — waiting for completion...
Status: IN_PROGRESS — checking again in 20s...
Status: IN_PROGRESS — checking again in 20s...
==> Job completed successfully
==> Generated policy written to: least_privilege_StagingAppRole_20260420.json
# The output contains only the API calls the application actually made:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::thecodeforge-app-data/uploads/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:GetItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:eu-west-1:123456789012:table/ForgeOrders"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": [
        "arn:aws:kms:eu-west-1:123456789012:key/mrk-abc123"
      ]
    }
  ]
}
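The script's review checklist (no wildcard resources, no service-wide actions) can be partly automated before anyone runs put-role-policy. Here is a crude, text-based sketch. It is a triage aid, not a substitute for reading the policy or running IAM Access Analyzer's validate-policy check, and some actions such as ecr:GetAuthorizationToken legitimately need "Resource": "*". The function name is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print (with line numbers) every line of a policy file that contains a full
# wildcard ("*") or a service-wide action such as "s3:*" or "iam:*".
# grep semantics: exit 0 if something needs review, 1 if nothing was flagged.
# Text matching only: it does not understand JSON structure or NotAction.
flag_wildcards() {
  local policy_file="$1"
  grep -nE '"\*"|"[a-z0-9-]+:\*"' "${policy_file}"
}

# Usage against the generated file (not executed here):
#   flag_wildcards least_privilege_StagingAppRole_20260420.json \
#     && echo "Review the flagged lines before applying"
#   aws accessanalyzer validate-policy --policy-type IDENTITY_POLICY \
#     --policy-document file://least_privilege_StagingAppRole_20260420.json
```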
| Aspect | IAM Role | IAM User with Access Key |
|---|---|---|
| Credential lifetime | Temporary — 15 minutes to 12 hours, auto-rotated by STS on every assumption | Permanent — never expires until manually rotated or the key is explicitly deleted |
| Best used for | EC2 instances, Lambda functions, ECS tasks, CI/CD pipelines via OIDC, cross-account access | Legacy scripts or third-party tools that have no support for role assumption or OIDC federation |
| Rotation required? | No — STS handles credential rotation automatically on every assumption; you cannot create a stale role credential | Yes — manual rotation is mandatory, AWS recommends every 90 days, and it is frequently skipped or forgotten |
| Leaked credential impact | Low — the token expires within hours and cannot be renewed without re-assuming the role through the trust policy | High — the Access Key works indefinitely until someone notices and manually revokes it; discovery often takes days or weeks |
| Multi-account support | Yes: role chaining across accounts is native; a role in Account A can assume a role in Account B with appropriate trust policies | Limited: the keys belong to the account where the user was created, so reaching another account still means assuming a role there |
| MFA enforcement | Can require MFA to assume the role via the aws:MultiFactorAuthPresent condition key in the Trust Policy | MFA can be required for console access; API calls signed with the Access Key alone carry no MFA context, so MFA-conditioned policies only work if the user first obtains a session via sts:GetSessionToken with an MFA code |
| AWS recommended for workloads? | Yes: AWS security best practices recommend Roles and temporary credentials for all workload identities; it is the designed pattern | No: AWS best-practice guidance recommends against long-term Access Keys for workloads wherever temporary credentials are an option |
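The MFA row in practice: below is a sketch of a Trust Policy that lets principals from a given account assume a role only when they authenticated with MFA. The account ID and the names in the usage comment are placeholders. Trusting the account root delegates the decision to that account's own IAM policies, which must also allow sts:AssumeRole on this role.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Render a Trust Policy that requires MFA for role assumption.
# $1: ID of the account whose principals may assume the role
mfa_trust_policy() {
  local trusted_account="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${trusted_account}:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "Bool": { "aws:MultiFactorAuthPresent": "true" }
      }
    }
  ]
}
EOF
}

# Assuming the role from the CLI then needs the MFA device and a current code
# (not executed here):
#   aws sts assume-role --role-arn arn:aws:iam::<account-id>:role/<role-name> \
#     --role-session-name admin-session \
#     --serial-number arn:aws:iam::<account-id>:mfa/<device-name> --token-code <code>
```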
🎯 Key Takeaways
- Roles beat Users for every machine identity — STS tokens auto-expire in hours, Access Keys do not expire until someone manually revokes them. A leaked temporary credential becomes useless. A leaked Access Key works until someone notices, which is often measured in days.
- There are two completely separate policies on every IAM Role: the Trust Policy controlling who can assume the role, and the Permission Policy controlling what the role can do. Both must be correct. The Trust Policy is evaluated first — if it fails, the Permission Policy is never consulted.
- An explicit Deny in any policy always wins over an Allow, regardless of what any other policy says. AWS IAM is default-deny: silence equals blocked, and every request must earn its way to allowed through explicit grants.
- IAM Access Analyzer's policy generation feature is the practical path to least privilege — run your application in staging with CloudTrail enabled, generate a policy from actual API calls, apply in production. It converts least privilege from a guessing exercise into a data-driven process.
Interview Questions on This Topic
- Q: What is the difference between a Trust Policy and a Permission Policy on an IAM Role? What happens if the Trust Policy is missing the service that needs to use the role? (Mid-level)
- Q: A Lambda function in Account A needs to read from DynamoDB in Account A and write to an S3 bucket in Account B. Explain the complete configuration required for this to work securely. (Senior)
- Q: Explain the Confused Deputy problem in AWS IAM. How does the ExternalId condition in a Trust Policy prevent it? (Senior)
- Q: Describe the IAM policy evaluation order when Identity-based, Resource-based, SCPs, and Permission Boundaries all apply to a single API call. What happens if the Identity-based policy allows s3:* but an SCP denies s3:DeleteBucket? (Mid-level)
- Q: A developer's Access Key was committed to a public GitHub repository. Walk through the immediate remediation steps and the architectural change to prevent recurrence. (Mid-level)
Frequently Asked Questions
What is the difference between an IAM Role and an IAM User in AWS?
An IAM User has permanent, long-term credentials: a password for the console and an Access Key and Secret Key for programmatic access. Those credentials do not expire until manually revoked. An IAM Role has no permanent credentials. Instead, it issues temporary tokens via AWS STS when assumed, valid for 15 minutes to 12 hours. Use Roles for all application workloads (Lambda, EC2, ECS), CI/CD pipelines, and cross-account access. Reserve Users only for humans who genuinely cannot use AWS IAM Identity Center, or legacy systems that have no support for role assumption. In those cases, rotate keys at least every 90 days and monitor compliance with the AWS Config managed rule access-keys-rotated.
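To find keys that have outlived a 90-day rotation window, a sketch like the following compares each key's CreateDate (as returned by aws iam list-access-keys) with today's date. It assumes GNU date. The function names and the 90-day default are illustrative, and the Config rule mentioned above performs the same check continuously.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Whole days between an access key's CreateDate and "now".
# $1: CreateDate (ISO 8601), $2: optional "now" for testing. Assumes GNU date.
access_key_age_days() {
  local created="$1"
  local now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  echo $(( ( $(date -u -d "${now}" +%s) - $(date -u -d "${created}" +%s) ) / 86400 ))
}

# Succeeds (exit 0) when the key is older than max_days (default 90).
# $1: CreateDate, $2: optional max_days, $3: optional "now" for testing
access_key_needs_rotation() {
  local created="$1" max_days="${2:-90}" now="${3:-}"
  local age
  age=$(access_key_age_days "${created}" "${now}")
  (( age > max_days ))
}

# Usage against a real user (requires AWS credentials; not executed here):
#   aws iam list-access-keys --user-name <user-name> \
#     --query 'AccessKeyMetadata[].[AccessKeyId,CreateDate]' --output text |
#   while read -r key_id created; do
#     access_key_needs_rotation "${created}" && echo "${key_id} is older than 90 days"
#   done
```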
What does 'least privilege' mean in AWS IAM and how do you actually achieve it in practice?
Least privilege means each identity has exactly the permissions it needs to do its specific job — no broader. In practice, you do not guess the right permissions upfront. Instead, deploy your application in staging with CloudTrail enabled and a reasonably scoped policy, let it run for 7 or more days to exercise all code paths under realistic conditions, then use IAM Access Analyzer's policy generation feature to produce a policy based only on the API calls your application actually made. Apply that policy in production. Revisit it whenever your application's feature set changes significantly, and monitor for policy drift using AWS Config rules.
Why does my EC2 instance say 'Unable to locate credentials' even though I created an IAM Role for it?
EC2 cannot use an IAM Role directly — it needs an Instance Profile, which is a separate wrapper resource that contains the role. In the AWS Console, this is handled silently when you attach a role to an instance through the UI. In CloudFormation, you must create an explicit AWS::IAM::InstanceProfile resource, reference the Role inside it, and then attach the Instance Profile ARN (not the Role ARN) to the EC2 Instance or LaunchTemplate. Once correctly attached, the EC2 metadata service at 169.254.169.254 serves automatically rotating temporary credentials to any code running on the instance.
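Outside CloudFormation, the same steps can be done with the AWS CLI. The sketch wraps them in a DRY_RUN switch that prints the commands instead of running them (the default here), so the sequence can be reviewed first. The wrapper, function and resource names are hypothetical; the aws subcommands are the standard ones.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print commands instead of executing them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Wrap an existing role in an instance profile and attach it to a running instance.
# $1 role name, $2 instance profile name, $3 EC2 instance ID
# A newly created instance profile can take a few seconds to propagate, so the
# final associate call may need a retry when DRY_RUN=0.
attach_role_to_instance() {
  local role="$1" profile="$2" instance_id="$3"
  run aws iam create-instance-profile --instance-profile-name "${profile}"
  run aws iam add-role-to-instance-profile --instance-profile-name "${profile}" --role-name "${role}"
  # The instance gets the profile (by name here), never the Role ARN.
  run aws ec2 associate-iam-instance-profile --instance-id "${instance_id}" \
    --iam-instance-profile "Name=${profile}"
}

# Usage:
#   attach_role_to_instance forge-app-role forge-app-profile i-0123456789abcdef0             # preview
#   DRY_RUN=0 attach_role_to_instance forge-app-role forge-app-profile i-0123456789abcdef0   # execute
```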
What is the difference between an Inline Policy and a Managed Policy in IAM?
A Managed Policy is a standalone IAM resource that can be attached to multiple users, groups, or roles. It has its own ARN, can be versioned, and persists independently of the identities it is attached to. AWS provides a library of AWS-managed policies (ReadOnlyAccess, PowerUserAccess) and you can create customer-managed policies. An Inline Policy is embedded directly within a single identity — it cannot be shared, has no independent ARN, and is deleted when the identity is deleted. Use Managed Policies for nearly everything: they are visible in the IAM console, can be referenced across stacks, and are easier to audit. Reserve Inline Policies for permissions that are strictly specific to one role and should not be accidentally reused.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.