AWS IAM Explained: Roles, Policies, and Least Privilege in Practice

📍 Part of: Cloud → Topic 8 of 23
Master AWS IAM: Learn to control access with Roles, Policies, and Least-Privilege patterns.
⚙️ Intermediate — basic DevOps knowledge assumed
In this tutorial, you'll learn
  • Roles beat Users for every machine identity — STS tokens auto-expire in hours, Access Keys do not expire until someone manually revokes them. A leaked temporary credential becomes useless. A leaked Access Key works until someone notices, which is often measured in days.
  • There are two completely separate policies on every IAM Role: the Trust Policy controlling who can assume the role, and the Permission Policy controlling what the role can do. Both must be correct. The Trust Policy is evaluated first — if it fails, the Permission Policy is never consulted.
  • An explicit Deny in any policy always wins over an Allow, regardless of what any other policy says. AWS IAM is default-deny: silence equals blocked, and every request must earn its way to allowed through explicit grants.
Quick Answer
  • IAM controls who can do what in your AWS account — Users (humans), Roles (machines), Groups (management containers)
  • Every Role has two separate policies: Trust Policy (who can assume it) and Permission Policy (what it can do) — both must be correct or you get AccessDenied
  • AWS IAM is default-deny — if a permission is not explicitly granted, it is blocked. Explicit Deny always wins over Allow, regardless of what any other policy says.
  • Roles issue short-lived STS tokens (15min–12hr) that auto-rotate — leaked credentials expire in hours, not forever
  • Use IAM Access Analyzer to generate least-privilege policies from actual CloudTrail data — deploy broad in staging, tighten in production
  • Biggest mistake: using AdministratorAccess on Lambda or EC2 roles 'just to make it work' — a single compromised function gets full account access in seconds
🚨 START HERE
IAM Debug Cheat Sheet
Quick commands to diagnose IAM permission issues in production.
🟡 AccessDenied: need to identify which policy is causing the denial.
Immediate Action: Check CloudTrail for the exact event and then simulate it with the policy simulator.
Commands
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=<api-action> --max-results 5 --query 'Events[*].CloudTrailEvent' --output text | jq .
aws iam simulate-principal-policy --policy-source-arn <role-arn> --action-names <failing-action> --resource-arns <resource-arn>
Fix Now: The simulator output shows the evaluation decision (allowed, implicitDeny, or explicitDeny) and the statements that matched, including which policy each came from. Fix that specific statement. Do not add more Allows hoping one will stick.
🟡 Need to see all policies attached to a role, both managed and inline.
Immediate Action: List managed and inline policies separately, because the API exposes them through different calls.
Commands
aws iam list-attached-role-policies --role-name <role-name>
aws iam list-role-policies --role-name <role-name>
Fix Now: For each managed policy ARN returned, inspect its document: aws iam get-policy-version --policy-arn <arn> --version-id <version>. For inline policies: aws iam get-role-policy --role-name <name> --policy-name <name>
🟡 Lambda or EC2 may be using wrong credentials: need to verify the active role.
Immediate Action: Call STS from inside the running workload to confirm the actual identity.
Commands
aws sts get-caller-identity
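TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") && curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/
Fix Now: If the role ARN does not match your expectation, check the Lambda's execution role in its function configuration or the EC2 instance's attached Instance Profile, not just the role definition itself. The curl command above applies to EC2 only and uses an IMDSv2 session token, which instances that enforce IMDSv2 require.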
🟡 Suspect an SCP is blocking access at the organization level.
Immediate Action: List SCPs applied to the account and inspect their statements.
Commands
aws organizations list-policies-for-target --target-id <account-id> --filter SERVICE_CONTROL_POLICY
aws organizations describe-policy --policy-id <scp-id> | jq '.Policy.Document | fromjson'
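Fix Now: Look for explicit Deny statements matching your failing action. list-policies-for-target returns only the SCPs attached directly to the target, so repeat the call for the parent OUs and the organization root to see inherited policies. SCPs override all identity-based and resource-based policies: an SCP deny cannot be overridden by any IAM policy at the account level.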
Production Incident: Compromised Lambda with AdministratorAccess, full account takeover in 4 minutes
A developer attached AdministratorAccess to a Lambda to debug a permissions issue and never removed it. An attacker exploited a code vulnerability in the function, obtained the role's IAM credentials, and used them to create backdoor IAM users, spin up cryptocurrency mining EC2 instances, and exfiltrate S3 data, all within 4 minutes of initial compromise.
Symptom: AWS Cost Explorer shows $47,000 in unexpected EC2 charges overnight. CloudTrail logs reveal three new IAM users created at 3:14 AM with AdministratorAccess attached. S3 access logs show 2TB downloaded from a customer data bucket. The Lambda function's CloudWatch logs show no errors. The function itself was working fine, which is why nobody looked there first.
Assumption: The team initially assumed the cost spike was a misconfigured auto-scaling group. Then they suspected a compromised developer laptop, which was reasonable given the IAM user creation events. The Lambda function was never considered because it 'only processes webhooks' and was not a security concern in anyone's mental model.
Root cause: Two weeks prior, a developer debugging an S3 permissions error attached the AdministratorAccess managed policy to the Lambda execution role for a quick test. The debugging was completed, the original error was resolved, and the policy was never removed. The Lambda had a transitive dependency, an npm package, with a known prototype pollution vulnerability that had a published CVE and a public proof-of-concept exploit. An attacker sent a crafted webhook payload that exploited the vulnerability and achieved code execution inside the Lambda runtime. The Lambda's IAM role had AdministratorAccess, so every API call the attacker's code made had the same permissions as a root-adjacent administrator. In 4 minutes: three IAM users created for persistent backdoor access, 50 p3.8xlarge EC2 instances launched for cryptocurrency mining, and 2TB of customer data downloaded from S3.
Fix:
  1. Immediate: revoked the Lambda role's active sessions by attaching a deny-all inline policy conditioned on aws:TokenIssueTime (the console's Revoke active sessions action), because editing the trust policy does not invalidate credentials that STS has already issued. Deleted the three backdoor IAM users and terminated all mining EC2 instances.
  2. Short-term: replaced AdministratorAccess with a scoped policy generated by IAM Access Analyzer from the previous 90 days of CloudTrail activity. The actual required permissions were s3:GetObject, s3:PutObject, and logs:CreateLogGroup.
  3. Long-term: implemented an AWS Config rule iam-policy-no-statements-with-admin-access to alert within 5 minutes of any role receiving admin permissions. Added an SCP in AWS Organizations denying iam:CreateUser and iam:AttachUserPolicy in all non-security accounts. Added Snyk and Dependabot to the CI/CD pipeline for dependency vulnerability scanning before merge.
Key Lesson
  • AdministratorAccess on a workload role is a single point of total account compromise: a code vulnerability in the function is now a full account vulnerability.
  • Use IAM Access Analyzer to generate least-privilege policies from actual CloudTrail calls. It takes 10 minutes and eliminates a category of risk entirely.
  • A compromised Lambda inherits its role's permissions exactly. The security of the function's code and the security of its IAM role are the same problem.
  • AWS Config rules and SCPs provide automated guardrails that catch what code review misses. Deploy them before you need them, not after.
Production Debug Guide
Systematic path from AccessDenied error to root cause.
  • AccessDenied with no additional context in the error message. Check CloudTrail for the exact API call, the principal ARN, and the error code. The event record contains the errorCode and errorMessage fields that describe the denial. Filter by eventName=<failing-api> and errorCode=AccessDenied. The CloudTrail event is almost always more informative than the client-side error message.
  • AccessDenied even though the permission policy clearly allows the action. Check the Trust Policy first, meaning the AssumeRolePolicyDocument on the role itself. If the calling service or principal is not listed in the Trust Policy's Principal block, the role cannot be assumed at all, regardless of what the permission policy says. This is the single most common cause of 'but my policy allows it' debugging sessions. Then check for SCPs at the organization level and Permission Boundaries on the role.
  • Lambda or EC2 works with local credentials but gets AccessDenied in AWS. Your local credentials carry your IAM user's permissions. The Lambda or EC2 runs under its execution role, which has different, and usually more restricted, permissions. Run aws sts get-caller-identity inside the Lambda (via a log statement) or on the EC2 instance to confirm which role is actually active. Then check that role's attached policies, not your user's policies.
  • AccessDenied for cross-account resource access despite both sides configured. Cross-account access always needs a grant on each side. If the caller assumes a role in the target account, the role's Trust Policy must allow the source principal and the caller's own identity policy must allow sts:AssumeRole on that role. If the caller accesses the resource directly, its identity policy must allow the action and the resource-based policy on the target (S3 bucket policy, KMS key policy, SQS queue policy) must allow the caller's ARN. If either side is missing, the access fails. Check both sides before assuming either is wrong.
  • Intermittent AccessDenied: works sometimes, fails other times. Check for Service Control Policies at the AWS Organizations level. SCPs override all identity-based and resource-based policies, and an SCP deny is absolute. Intermittent failures often occur when requests route through different accounts or when SCPs are applied at the OU level rather than the account level. Run aws organizations list-policies-for-target --target-id <account-id> --filter SERVICE_CONTROL_POLICY and examine the results.
  • AccessDenied for an action that worked yesterday with no apparent changes. Check if a Permission Boundary was recently attached to the role. Permission boundaries set a maximum ceiling on what a role can do: even if the permission policy explicitly allows an action, the request is denied when the boundary does not also allow it. Run aws iam get-role --role-name <name> and look for the PermissionsBoundary field in the response. Permission boundaries are often added by platform teams without notifying application teams.
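The first step of the guide, reading the CloudTrail record rather than the client-side error, can be sketched in a few lines. This is a minimal illustration that assumes you already have the raw event JSON (for example from the lookup-events command in the cheat sheet). The sample record and the helper name `summarize_denial` are hypothetical; only the field names (`eventSource`, `eventName`, `errorCode`, `errorMessage`, `userIdentity.arn`) come from the CloudTrail record format.

```python
import json


def summarize_denial(cloudtrail_event_json: str) -> dict:
    """Pull out the fields that matter when investigating an AccessDenied."""
    event = json.loads(cloudtrail_event_json)
    identity = event.get("userIdentity", {})
    return {
        "event_source": event.get("eventSource"),
        "event_name": event.get("eventName"),
        "principal": identity.get("arn"),
        "error_code": event.get("errorCode"),
        "error_message": event.get("errorMessage"),
    }


# Hypothetical, heavily trimmed record used only for illustration.
SAMPLE_EVENT = json.dumps({
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetObject",
    "errorCode": "AccessDenied",
    "errorMessage": "Access Denied",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::123456789012:assumed-role/forge-iam-stack-EC2S3ReadOnly/i-0abc1234567890def",
    },
})

summary = summarize_denial(SAMPLE_EVENT)
for field, value in summary.items():
    print(f"{field}: {value}")
```

The principal ARN in the record is the assumed-role session, which is usually the quickest way to confirm that the workload is running under the role you think it is.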

Many of the most damaging AWS breaches trace back to the same root cause: IAM was misconfigured or ignored. Exposed S3 buckets, compromised Lambda functions, leaked credentials on GitHub: all IAM failures at their core. IAM is not a niche security topic you deal with after everything else is working. It is the foundation every other AWS service is built on, and getting it wrong is how companies make the news.

IAM introduced fine-grained, programmable permissions to replace the blunt root-account-or-nothing model. You can now specify that 'this Lambda function may read from exactly one S3 bucket and nothing else.' That specificity is the difference between a contained incident and a company-ending breach. The controls exist. The question is whether your team uses them.

The three concepts to master: Users (human identities with permanent credentials), Roles (machine identities with temporary STS tokens that auto-expire), and Policies (JSON documents defining what is allowed or denied). Getting the relationship between these three right — and specifically understanding that Trust Policies and Permission Policies are completely separate documents with completely different jobs — is what separates a secure architecture from a ticking time bomb.

In 2026, with AWS IAM Identity Center having replaced AWS SSO and OIDC-based authentication now the standard for CI/CD pipelines, the era of long-lived Access Keys for automation should be over. The patterns in this guide reflect where the security bar actually sits today, not where it was in 2019.

Users, Groups, and Roles — Picking the Right Identity Tool

The most common IAM confusion comes from mixing up three distinct concepts that look superficially similar but serve completely different architectural purposes.

An IAM User represents a human or legacy script requiring long-term credentials: a password for the console and an Access Key and Secret Key pair for programmatic access. The modern rule is stark: machines should almost never be IAM Users. A User's Access Key does not expire. If it leaks, it works until someone manually revokes it.

An IAM Group is a management container for Users. Instead of attaching policies to ten developers individually, you attach them once to a BackendEngineers group. Groups have no credentials of their own and cannot be assumed by services. They are purely an administrative convenience, not a security boundary.

An IAM Role is the most important tool in your toolkit. A Role has no long-term credentials whatsoever. Instead, it is assumed temporarily by a principal: an EC2 instance, a Lambda function, a CI/CD pipeline, or a cross-account service. When something assumes a Role, AWS STS issues a set of temporary credentials (access key, secret key, and session token) that expire automatically after a configurable lifetime of 15 minutes to 12 hours. That auto-expiry is your primary defence against credential leaks: a leaked temporary credential becomes useless within hours, not years.
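To make the 15-minute to 12-hour window concrete, here is a small sketch of the bounds involved when a session duration is requested. It assumes the STS AssumeRole limits of 900 seconds minimum and a per-role maximum session duration between 3,600 and 43,200 seconds. The function name `validate_session_duration` is hypothetical and nothing here calls AWS.

```python
MIN_SESSION_SECONDS = 900         # 15 minutes: lowest duration a caller can request
MIN_ROLE_MAX_SECONDS = 3_600      # 1 hour: lowest maximum a role can be configured with
MAX_ROLE_MAX_SECONDS = 43_200     # 12 hours: highest maximum a role can be configured with


def validate_session_duration(requested: int, role_max: int = 3_600) -> int:
    """Return the requested duration if it fits the role's limits, else raise."""
    if not MIN_ROLE_MAX_SECONDS <= role_max <= MAX_ROLE_MAX_SECONDS:
        raise ValueError(f"role maximum {role_max}s is outside 3600-43200s")
    if not MIN_SESSION_SECONDS <= requested <= role_max:
        raise ValueError(
            f"requested {requested}s is outside {MIN_SESSION_SECONDS}-{role_max}s"
        )
    return requested


# A leaked session token is usable for at most the duration it was issued with.
for seconds in (900, 3_600):
    print(f"{seconds}s session -> worst-case exposure of {seconds / 3600:.2f} hours")
    validate_session_duration(seconds)
```

The point of the arithmetic is the contrast with an Access Key, which has no equivalent upper bound until someone revokes it.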

In 2026 the modern standard is unambiguous: AWS IAM Identity Center (formerly SSO) for all human access, Roles for all machine and service identities, and OIDC-based federation for CI/CD pipelines. Permanent Access Keys on developer machines should be a red flag in any security review, not a routine practice.

io/thecodeforge/iam/ec2_s3_read_role.yaml · YAML
# io.thecodeforge: Production-grade IAM Role for EC2 with S3 read access
# Deploy: aws cloudformation deploy \
#   --template-file ec2_s3_read_role.yaml \
#   --stack-name forge-iam-stack \
#   --capabilities CAPABILITY_NAMED_IAM

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimalist IAM Role for EC2 with scoped S3 Read Access

Parameters:
  TargetBucketName:
    Type: String
    Default: thecodeforge-app-data
    Description: The exact S3 bucket this instance is allowed to read from

Resources:

  # 1. THE ROLE — defines who can wear the badge (Trust Policy)
  # The Principal block answers: which AWS service can assume this role?
  # Missing 'ec2.amazonaws.com' here = "Unable to locate credentials" at runtime
  AppServerInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub "${AWS::StackName}-EC2S3ReadOnly"
      Description: "Allows EC2 instances to read from the app data bucket only"
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com   # TRUST POLICY: EC2 service can assume this
            Action: sts:AssumeRole
      Tags:
        - Key: ManagedBy
          Value: CloudFormation
        - Key: Purpose
          Value: AppServerS3Access

  # 2. THE POLICY — defines what the role can do (Permission Policy)
  # Scoped to a single bucket and only the two actions actually needed.
  # s3:ListBucket requires the bucket ARN; s3:GetObject requires the object ARN.
  # These are different ARN formats — a common gotcha.
  AppServerS3ReadPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: !Sub "${AWS::StackName}-S3ScopedReadAccess"
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: AllowBucketList
            Effect: Allow
            Action:
              - s3:ListBucket
            Resource:
              - !Sub "arn:aws:s3:::${TargetBucketName}"
            Condition:
              StringLike:
                s3:prefix: "app-data/*"   # further scope to a prefix if possible

          - Sid: AllowObjectRead
            Effect: Allow
            Action:
              - s3:GetObject
            Resource:
              - !Sub "arn:aws:s3:::${TargetBucketName}/app-data/*"
      Roles:
        - !Ref AppServerInstanceRole

  # 3. THE INSTANCE PROFILE — the bridge between EC2 and the Role
  # EC2 cannot use an IAM Role directly. It needs this wrapper.
  # Reference this profile in your LaunchTemplate, NOT the Role ARN.
  AppServerInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      InstanceProfileName: !Sub "${AWS::StackName}-EC2Profile"
      Roles:
        - !Ref AppServerInstanceRole

Outputs:
  RoleArn:
    Description: Role ARN for cross-stack references
    Value: !GetAtt AppServerInstanceRole.Arn
    Export:
      Name: !Sub "${AWS::StackName}-RoleArn"
  InstanceProfileArn:
    Description: Instance Profile ARN to attach to EC2 LaunchTemplate
    Value: !GetAtt AppServerInstanceProfile.Arn
    Export:
      Name: !Sub "${AWS::StackName}-InstanceProfileArn"
▶ Output
# Deployment output:
# Successfully created/updated stack - forge-iam-stack
#
# Stack Outputs:
# RoleArn: arn:aws:iam::123456789012:role/forge-iam-stack-EC2S3ReadOnly
# InstanceProfileArn: arn:aws:iam::123456789012:instance-profile/forge-iam-stack-EC2Profile
#
# The EC2 metadata service now serves rotating credentials automatically:
# curl http://169.254.169.254/latest/meta-data/iam/security-credentials/forge-iam-stack-EC2S3ReadOnly
⚠ Watch Out: EC2 Roles Require an Instance Profile Wrapper
You cannot attach an IAM Role directly to an EC2 instance — you must wrap it in an AWS::IAM::InstanceProfile resource first. This catches almost everyone writing their first CloudFormation template. The Role and the InstanceProfile are two separate resources that must both exist. The AWS Console hides this detail by creating the InstanceProfile automatically when you attach a role to an instance interactively — CloudFormation does not. Always reference the InstanceProfile ARN in your LaunchTemplate or Instance resource, never the Role ARN.
📊 Production Insight
IAM Users with Access Keys are permanent credentials — they do not expire until manually rotated or deleted. A key sitting in a developer's ~/.aws/credentials file for two years is a two-year window of compromise if that machine is lost or that file is accidentally committed.
Roles issue STS tokens that auto-expire in 15 minutes to 12 hours — a leaked token from a Lambda log becomes useless before most incident response processes even start.
AWS IAM Identity Center (the renamed SSO service) connects to your corporate directory (Okta, Google Workspace, Entra ID) and issues temporary credentials on demand. Developers run aws sso login, get a token valid for a few hours, and when their machine is stolen the attacker gets credentials that expire before lunch.
Rule: Roles for all machine identities. IAM Identity Center for all human identities. Access Keys only for legacy systems that genuinely cannot support role assumption, with 90-day rotation enforced by AWS Config.
🎯 Key Takeaway
Roles beat Users for every machine identity — STS tokens auto-expire, Access Keys do not. Groups are management containers only — they have no credentials and cannot be assumed by any service. The 2026 standard: IAM Identity Center for humans, Roles for everything else, OIDC federation for CI/CD. Permanent Access Keys for automation are a security anti-pattern that no new architecture should introduce.
Choosing the Right IAM Identity
If: A human developer needs AWS Console and CLI access
Use: IAM Identity Center (SSO) linked to your corporate directory. No permanent Access Keys, only temporary credentials via role assumption that expire in 1–12 hours.
If: An EC2 instance or Lambda function needs AWS API access
Use: An IAM Role, attached via Instance Profile for EC2 or configured directly as the Lambda execution role. The metadata service handles credential rotation automatically.
If: A CI/CD pipeline (GitHub Actions, GitLab CI, CircleCI) needs deployment access
Use: OIDC-based role assumption. The pipeline authenticates with its identity provider token and assumes an IAM role. No stored credentials anywhere, no rotation required.
If: A legacy third-party tool only supports Access Keys with no role support
Use: An IAM User with the minimum required permissions, 90-day key rotation checked by an AWS Config rule, and usage monitoring via CloudTrail alerts. This is the last resort, not the default.

IAM Policies Deep-Dive — Trust Policies vs Permission Policies

IAM policies are JSON documents that define permissions, but there are two completely different types of policies in play on every Role, and confusing them is the source of most AccessDenied errors and most of the time spent debugging them.

The Permission Policy answers the question: what actions can this identity perform? It is attached to a User or Role and defines the allowed or denied API calls, the resources those calls can target, and the conditions under which the permission applies. This is what most people think of when they hear 'IAM policy.'

The Trust Policy answers a completely different question: who is allowed to assume this role? It lives exclusively in the AssumeRolePolicyDocument on the Role resource itself and is evaluated before the permission policy is even consulted. If your Trust Policy does not list lambda.amazonaws.com as a trusted principal, no Lambda function can assume that role — it does not matter how many permissions the role has. The role is simply unavailable to that service.
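A quick way to check the point above is to read the trust policy document and list the service principals it actually names. The sketch below is illustrative only: `trusted_services` is a hypothetical helper, it looks only at `Allow` statements for `sts:AssumeRole`, and it ignores `Condition` blocks, which a real evaluation would also apply.

```python
from typing import Set


def trusted_services(trust_policy: dict) -> Set[str]:
    """Collect service principals that an Allow statement lets call sts:AssumeRole."""
    services: Set[str] = set()
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):          # a single statement may not be wrapped in a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):   # e.g. the bare "*" principal
            continue
        service = principal.get("Service", [])
        services.update([service] if isinstance(service, str) else service)
    return services


# Same shape as the AssumeRolePolicyDocument in the CloudFormation template above.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

services = trusted_services(EC2_TRUST_POLICY)
print("ec2 trusted:   ", "ec2.amazonaws.com" in services)
print("lambda trusted:", "lambda.amazonaws.com" in services)
```

With this document, a Lambda function configured to use the role would be refused no matter what the permission policy grants.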

The evaluation order when an API call hits AWS: first, can this principal assume the role at all? (Trust Policy). Second, is there an explicit Deny anywhere in the policy stack? (any SCP, Permission Boundary, or policy with Deny wins immediately). Third, is there at least one Allow for this specific action on this specific resource? (Permission Policy, Resource Policy). If the answer to the second question is yes, the request is denied regardless of any Allows. If the answer to the first or third question is no, the request is also denied. Default deny means that every request starts at denied and must earn its way to allowed through explicit grants.
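The deny-wins and default-deny rules can be sketched as a small evaluator over a list of statements. This is a deliberate simplification, not AWS's engine: it covers a single policy, ignores conditions, principals, SCPs, and boundaries, and uses Python's `fnmatch` as a stand-in for IAM wildcard matching. The statements are adapted from the deployer policy in this section, with a broad `lambda:*` Allow added as an assumption to show the Deny overriding it.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def _matches(statement: dict, action: str, resource: str) -> bool:
    # IAM action names are case-insensitive; resource ARNs are compared as written.
    action_ok = any(
        fnmatchcase(action.lower(), pattern.lower())
        for pattern in _as_list(statement["Action"])
    )
    resource_ok = any(
        fnmatchcase(resource, pattern) for pattern in _as_list(statement["Resource"])
    )
    return action_ok and resource_ok


def evaluate(statements: list, action: str, resource: str) -> str:
    """Return 'explicitDeny', 'allowed', or 'implicitDeny' for one request."""
    decision = "implicitDeny"                 # default deny: nothing matched yet
    for statement in statements:
        if not _matches(statement, action, resource):
            continue
        if statement["Effect"] == "Deny":
            return "explicitDeny"             # an explicit Deny ends evaluation
        decision = "allowed"
    return decision


STATEMENTS = [
    {"Effect": "Allow", "Action": "lambda:*",
     "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-*"},
    {"Effect": "Deny", "Action": ["lambda:DeleteFunction"], "Resource": "*"},
]
FUNCTION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-api"

update = evaluate(STATEMENTS, "lambda:UpdateFunctionCode", FUNCTION_ARN)
delete = evaluate(STATEMENTS, "lambda:DeleteFunction", FUNCTION_ARN)
unrelated = evaluate(STATEMENTS, "s3:GetObject", "arn:aws:s3:::thecodeforge-app-data/x")
print(update, delete, unrelated)
```

The delete request matches the `lambda:*` Allow and is still refused, and the S3 request is refused without any Deny at all because nothing granted it.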

Understanding this evaluation chain is what makes you fast at debugging access issues. Most engineers start at the permission policy and work outward. The right approach is to start at the Trust Policy and work forward through the chain.

io/thecodeforge/iam/production_deployer_policy.json · JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowECRImagePush",
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload"
      ],
      "Resource": [
        "arn:aws:ecr:eu-west-1:123456789012:repository/thecodeforge-app"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "eu-west-1"
        }
      }
    },
    {
      "Sid": "AllowECRAuth",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "eu-west-1"
        }
      }
    },
    {
      "Sid": "AllowLambdaDeploymentOnly",
      "Effect": "Allow",
      "Action": [
        "lambda:UpdateFunctionCode",
        "lambda:UpdateFunctionConfiguration",
        "lambda:PublishVersion"
      ],
      "Resource": [
        "arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-*"
      ]
    },
    {
      "Sid": "DenyDestructiveActionsAlways",
      "Effect": "Deny",
      "Action": [
        "lambda:DeleteFunction",
        "ecr:DeleteRepository",
        "ecr:DeleteRepositoryPolicy"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyAllOutsideApprovedRegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["eu-west-1", "us-east-1"]
        }
      }
    }
  ]
}
▶ Output
# Attach this as a managed policy to your CI/CD pipeline's assumed role.
# Validate after attaching. The policy conditions on aws:RequestedRegion,
# so supply that key to the simulator with --context-entries:
# aws iam simulate-principal-policy \
# --policy-source-arn arn:aws:iam::123456789012:role/CICDDeployerRole \
# --action-names ecr:PutImage \
# --resource-arns arn:aws:ecr:eu-west-1:123456789012:repository/thecodeforge-app \
# --context-entries ContextKeyName=aws:RequestedRegion,ContextKeyValues=eu-west-1,ContextKeyType=string
#
# Expected: "EvalDecision": "allowed"
#
# Test the deny guard too:
# aws iam simulate-principal-policy \
# --policy-source-arn arn:aws:iam::123456789012:role/CICDDeployerRole \
# --action-names lambda:DeleteFunction \
# --resource-arns arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-api \
# --context-entries ContextKeyName=aws:RequestedRegion,ContextKeyValues=eu-west-1,ContextKeyType=string
#
# Expected: "EvalDecision": "explicitDeny"
Mental Model
The Five Gates Every IAM Request Must Pass
AccessDenied means at least one gate said no. Default Deny means all gates must say yes.
  • Gate 1: Trust Policy — can this principal assume the role at all? If no, the request never reaches the other gates.
  • Gate 2: SCPs — does the AWS Organization allow this action in this account? An SCP Deny is absolute and cannot be overridden.
  • Gate 3: Permission Boundary — does this action fall within the role's maximum ceiling? A Boundary Deny overrides any Allow in the Permission Policy.
  • Gate 4: Permission Policy — does the role's identity-based policy explicitly allow this action on this resource?
  • Gate 5: Resource Policy — does the target resource (S3 bucket, KMS key, SQS queue) allow access from this role? Required for cross-account access.
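For gates 2 to 4, a useful way to think about the result is as a set intersection: an action is effective only if every layer permits it. The sketch below assumes each layer has already been reduced to a plain set of exact action names, which is a strong simplification (no wildcards, resources, conditions, or explicit Denies), and the layer contents are made up for illustration.

```python
# Hypothetical layers, each reduced to the exact actions it permits.
LAYERS = {
    "SCP": {"s3:GetObject", "s3:PutObject", "lambda:UpdateFunctionCode"},
    "Permission Boundary": {"s3:GetObject", "s3:PutObject"},
    "Permission Policy": {"s3:GetObject", "lambda:UpdateFunctionCode"},
}


def effective_actions(layers: dict) -> set:
    """An action survives only if every layer permits it."""
    return set.intersection(*layers.values())


def blocking_layers(layers: dict, action: str) -> list:
    """Name the layers that do not permit the action, in gate order."""
    return [name for name, allowed in layers.items() if action not in allowed]


effective = effective_actions(LAYERS)
print("effective:", sorted(effective))
print("lambda:UpdateFunctionCode blocked by:",
      blocking_layers(LAYERS, "lambda:UpdateFunctionCode"))
print("s3:PutObject blocked by:", blocking_layers(LAYERS, "s3:PutObject"))
```

Listing the layers that block an action, rather than just the final answer, mirrors how the debugging usually goes: the permission policy looks right, and the missing grant turns out to sit one layer up.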
📊 Production Insight
In most debugging sessions, the Trust Policy is the problem — the service or principal is not listed, or the Principal is correct but the account or ARN has a typo. Check the Trust Policy before anything else.
The IAM Policy Simulator (aws iam simulate-principal-policy) is the single most underused tool in IAM debugging. It tells you exactly which statement caused a deny without requiring you to reproduce the actual API call.
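Condition keys in policies are powerful but fragile. aws:RequestedRegion, aws:PrincipalOrgID, and aws:SourceVpc are essential for defence-in-depth, but a typo in a condition key name can silently change what the policy does. A misspelled key is never present in the request context, so a positive operator such as StringEquals evaluates to false. On an Allow statement that blocks the request, which you will notice. On a Deny guardrail it means the Deny never applies, which fails open and which you will not notice.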
Rule: always simulate policy changes with the IAM simulator before deploying them. A policy that silently fails open is as dangerous as one that is too broad.
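How a misspelled condition key behaves depends on the operator and on the statement's Effect. The sketch below models the documented rule that a key absent from the request context makes a positive operator false and a negated operator true. The two helper functions and the misspelled key `aws:RequestRegion` are hypothetical and exist only to show the effect.

```python
def string_equals(context: dict, key: str, expected: list) -> bool:
    """Positive operator: a key missing from the request context never matches."""
    return key in context and context[key] in expected


def string_not_equals(context: dict, key: str, excluded: list) -> bool:
    """Negated operator: a key missing from the request context counts as a match."""
    return key not in context or context[key] not in excluded


request = {"aws:RequestedRegion": "ap-south-1"}   # a request to an unapproved region
approved = ["eu-west-1", "us-east-1"]

# Correct key on a Deny guard: the condition matches, so the Deny applies.
deny_correct = string_not_equals(request, "aws:RequestedRegion", approved)

# Typo on a Deny guard written with a positive operator: never matches, guard is dead.
deny_typo_positive = string_equals(request, "aws:RequestRegion", ["ap-south-1"])

# Typo on an Allow with a positive operator: never matches, so nothing is granted.
allow_typo_positive = string_equals(request, "aws:RequestRegion", approved)

# Typo on a Deny guard with a negated operator: always matches, denying every region.
deny_typo_negated = string_not_equals(request, "aws:RequestRegion", approved)

print(deny_correct, deny_typo_positive, allow_typo_positive, deny_typo_negated)
```

Only the second case fails open, and it is the hardest to spot because every request keeps working.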
🎯 Key Takeaway
Trust Policy controls WHO can assume a role. Permission Policy controls WHAT the role can do. Both must be correct — missing either one produces AccessDenied. Explicit Deny always wins over Allow at every level of the policy stack. Default Deny means silence equals blocked, never allowed. Check the Trust Policy before the Permission Policy when debugging.
Debugging AccessDenied Errors
If: AccessDenied with no additional context
Use: Check CloudTrail for the exact event, principal ARN, and error. Run aws iam simulate-principal-policy to reproduce in a controlled environment and see which statement caused the deny.
If: Permission policy clearly allows the action but AccessDenied persists
Use: Check the Trust Policy first, because the principal may not be listed. Then check SCPs at the organization level and any Permission Boundaries on the role. Permission policy is evaluated last, not first.
If: Cross-account access fails despite both sides appearing correctly configured
Use: For direct access, both the caller's identity policy and the resource-based policy on the target resource must allow the action. For role assumption, both the target role's Trust Policy and the caller's sts:AssumeRole permission must be in place. Check both sides independently. A missing bucket policy or KMS key policy is the most common cause.
If: AccessDenied only for destructive actions like Delete or Put
Use: Check for an explicit Deny in an SCP or Permission Boundary. Organizations often deny destructive actions globally as a guardrail. The deny in an SCP overrides any Allow in any other policy and cannot be worked around at the account level.

Least Privilege in Practice — Building a Real-World IAM Strategy

Least privilege is the security practice of granting the absolute minimum permissions required for a task to complete — nothing more. In fast-moving teams, AdministratorAccess is tempting because it eliminates permission debugging entirely. It also means that any bug in your code, any compromised dependency, or any leaked credential is immediately a full-account incident.

The practical path to least privilege is not guessing permissions upfront. It is observing what your application actually calls and generating a policy from that data. IAM Access Analyzer's policy generation feature does exactly this. You run your application in a staging environment under a deliberately broad policy with CloudTrail logging enabled, CloudTrail captures the API calls the application makes, and Access Analyzer generates a JSON policy scoped to the activity it observed. The process takes minutes and produces a policy grounded in real usage data rather than someone's best guess. Review the result before applying it, since generated policies can contain resource placeholders that you need to fill in.

For human access, the standard in 2026 is short-lived sessions via AWS IAM Identity Center. Developers authenticate through their corporate identity provider (Okta, Google Workspace, Entra ID), receive temporary credentials valid for 1 to 12 hours, and use those to make API calls. A stolen laptop means credentials that expire before the attacker can do meaningful damage. There are no permanent Access Keys in ~/.aws/credentials to exfiltrate.

For CI/CD pipelines, the modern standard is OIDC federation. GitHub Actions, GitLab CI, and most modern CI platforms can authenticate as themselves using an OIDC token and assume an IAM role that trusts their identity. No stored credentials, no rotation, no credential scanning needed for a secret that does not exist.
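For GitHub Actions specifically, the role's trust policy names GitHub's OIDC provider as a federated principal and pins the token's audience and subject claims. The sketch below builds such a document. The repository name `thecodeforge/app` and the helper `github_oidc_trust_policy` are assumptions for illustration, and the OIDC provider must already be registered in the account for the role to be assumable.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Trust policy letting one branch of one GitHub repository assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Audience used when the workflow requests AWS credentials.
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        # Pin the subject so other repositories and branches are refused.
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# Hypothetical repository name.
policy = github_oidc_trust_policy("123456789012", "thecodeforge/app")
print(json.dumps(policy, indent=2))
```

Pinning the `sub` claim is the important design choice: without it, any workflow on GitHub that knows the role ARN could present a valid token from the same provider.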

The three-layer defence for least privilege at scale: Access Analyzer generates tight policies per role. SCPs at the AWS Organizations level set hard limits that no account-level policy can override — denying iam:CreateUser in non-security accounts, restricting compute to approved regions, preventing S3 public access changes. AWS Config rules provide continuous monitoring and alert within minutes when a role receives broader permissions than its baseline, catching the exact mistake from the production incident above.
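The monitoring layer can be approximated locally before a policy ever reaches an account. The sketch below flags Allow statements that grant every action on every resource, which is the pattern behind the incident described earlier. It is an illustration of the idea and not the implementation of any AWS Config managed rule. The helper name and the scoped sample policy are hypothetical.

```python
def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def admin_statements(policy: dict) -> list:
    """Return the Sids (or a placeholder) of Allow statements granting * on *."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions and "*" in resources:
            flagged.append(statement.get("Sid", "<no Sid>"))
    return flagged


# Same shape as the AdministratorAccess managed policy document.
ADMIN_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

# Hypothetical scoped policy in the spirit of the post-incident fix.
SCOPED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WebhookBucketAccess",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::thecodeforge-app-data/*",
        }
    ],
}

print("admin policy flagged: ", admin_statements(ADMIN_POLICY))
print("scoped policy flagged:", admin_statements(SCOPED_POLICY))
```

A check like this fits naturally in a pre-merge step for infrastructure code, with the account-level guardrails acting as the backstop for changes made outside the pipeline.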

io/thecodeforge/security/generate_least_privilege_policy.sh · BASH
#!/usr/bin/env bash
# io.thecodeforge: Automated Least-Privilege Policy Generation
# Prerequisites:
#   - CloudTrail enabled in the target region
#   - An AccessAnalyzer ServiceRole with CloudTrail read permissions
#   - The target role has been running in staging for at least 7 days
#   - GNU date (the -d and --iso-8601 flags used below are not supported by BSD/macOS date)

set -euo pipefail

ROLE_NAME="StagingAppRole"
ACCOUNT_ID="123456789012"
REGION="eu-west-1"
CLOUDTRAIL_ARN="arn:aws:cloudtrail:${REGION}:${ACCOUNT_ID}:trail/main-trail"
ANALYZER_ROLE="arn:aws:iam::${ACCOUNT_ID}:role/AccessAnalyzerServiceRole"
ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/${ROLE_NAME}"

echo "==> Starting policy generation for ${ROLE_NAME}"
echo "    Based on CloudTrail activity in ${REGION} for the last 90 days"
echo ""

# Step 1: Start the policy generation job
# Access Analyzer reads CloudTrail logs and identifies every API call made by the role
JOB_ID=$(aws accessanalyzer start-policy-generation \
  --region "${REGION}" \
  --policy-generation-details "principalArn=${ROLE_ARN}" \
  --cloud-trail-details "{\
    \"accessRole\": \"${ANALYZER_ROLE}\",\
    \"trails\": [{\
      \"cloudTrailArn\": \"${CLOUDTRAIL_ARN}\",\
      \"regions\": [\"${REGION}\"]\
    }],\
    \"startTime\": \"$(date -d '90 days ago' --iso-8601=seconds)\",\
    \"endTime\": \"$(date --iso-8601=seconds)\"\
  }" \
  --query 'jobId' \
  --output text)

echo "==> Job ID: ${JOB_ID} — waiting for completion..."

# Step 2: Poll until the job completes (this can take several minutes)
while true; do
  STATUS=$(aws accessanalyzer get-generated-policy \
    --region "${REGION}" \
    --job-id "${JOB_ID}" \
    --query 'jobDetails.status' \
    --output text)

  if [[ "${STATUS}" == "SUCCEEDED" ]]; then
    echo "==> Job completed successfully"
    break
  elif [[ "${STATUS}" == "FAILED" || "${STATUS}" == "CANCELED" ]]; then
    echo "ERROR: Policy generation ended with status ${STATUS}. Check CloudTrail permissions on ${ANALYZER_ROLE}"
    exit 1
  fi

  echo "    Status: ${STATUS} — checking again in 20s..."
  sleep 20
done

# Step 3: Extract the generated policy document
OUTPUT_FILE="least_privilege_${ROLE_NAME}_$(date +%Y%m%d).json"
aws accessanalyzer get-generated-policy \
  --region "${REGION}" \
  --job-id "${JOB_ID}" \
  --query 'generatedPolicyResult.generatedPolicies[0].policy' \
  --output text > "${OUTPUT_FILE}"

echo ""
echo "==> Generated policy written to: ${OUTPUT_FILE}"
echo "    Review the policy before applying. Key things to verify:"
echo "    1. No wildcard (*) resources — every action should target a specific ARN"
echo "    2. No unexpected actions — if you see iam:* or s3:* in the output, investigate"
echo "    3. Condition keys — add aws:RequestedRegion conditions for all write actions"
echo ""
echo "==> To apply to the production role:"
echo "    aws iam put-role-policy \\"
echo "      --role-name ProductionAppRole \\"
echo "      --policy-name GeneratedLeastPrivilege \\"
echo "      --policy-document file://${OUTPUT_FILE}"
echo ""
jq . "${OUTPUT_FILE}"
▶ Output
==> Starting policy generation for StagingAppRole
    Based on CloudTrail activity in eu-west-1 for the last 90 days

==> Job ID: job_abc123def456 — waiting for completion...
    Status: IN_PROGRESS — checking again in 20s...
    Status: IN_PROGRESS — checking again in 20s...
==> Job completed successfully

==> Generated policy written to: least_privilege_StagingAppRole_20260420.json

# The output contains only the API calls the application actually made:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::thecodeforge-app-data/uploads/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:GetItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:eu-west-1:123456789012:table/ForgeOrders"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": [
        "arn:aws:kms:eu-west-1:123456789012:key/mrk-abc123"
      ]
    }
  ]
}
💡The Three-Layer Defence for Least Privilege at Scale
Layer 1 (IAM Access Analyzer): generates tight, data-driven policies per role from actual CloudTrail usage, which removes most of the guesswork.
Layer 2 (AWS Organizations SCPs): sets hard organisational limits that no account-level policy can override. Deny iam:CreateUser in application accounts. Restrict compute to approved regions. Prevent changes to S3 Block Public Access settings.
Layer 3 (AWS Config rules): continuously monitors and alerts when any role receives permissions above its approved baseline. iam-policy-no-statements-with-admin-access flags customer-managed policies that allow every action on every resource, and iam-policy-blacklisted-check can flag the AWS-managed AdministratorAccess policy being attached, which is the pattern from the production incident above.
This is a continuous process, not a one-time audit.
📊 Production Insight
AdministratorAccess on any workload role is a single point of total account compromise — it converts every vulnerability in that workload from an application incident into an infrastructure incident.
IAM Access Analyzer policy generation is based on actual API calls from CloudTrail, not guesses. The output is specific to your application's real behaviour, not a generic template. It takes 10 minutes and produces a policy accurate enough to ship to production after review.
For GitHub Actions OIDC setup, the trust policy condition should validate both the repository and the branch: StringLike on token.actions.githubusercontent.com:sub matching repo:your-org/your-repo:ref:refs/heads/main. Without the branch condition, any branch in the repo can assume the deployment role.
Rule: deploy broad in staging with CloudTrail enabled, let it run for at least 7 days to capture all code paths, use Access Analyzer to generate the tight policy, apply in production, and set up a Config rule to alert if permissions ever broaden again.
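The GitHub Actions trust policy condition described above could look roughly like the sketch below. The account ID, the output file name, and the your-org/your-repo placeholder are assumptions for illustration. The extra check on the aud claim is a commonly recommended addition rather than something this article requires.

```shell
#!/usr/bin/env bash
# Sketch only: writes an example GitHub Actions OIDC trust policy to a local file.
set -euo pipefail

ACCOUNT_ID="123456789012"                      # placeholder account ID
TRUST_FILE="github-oidc-trust-policy.json"     # hypothetical file name

# Unquoted heredoc so that ${ACCOUNT_ID} is expanded into the provider ARN.
cat > "${TRUST_FILE}" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
EOF

echo "Wrote ${TRUST_FILE}"

# Possible next step (role name is a placeholder):
#   aws iam create-role --role-name GitHubDeployRole \
#     --assume-role-policy-document "file://${TRUST_FILE}"
```

Replacing the sub value with repo:your-org/your-repo:* would reintroduce the problem described above, since any branch, tag, or pull request workflow in the repository could then assume the role.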
🎯 Key Takeaway
Least privilege is not a one-time task — it is a continuous process of generation, validation, and monitoring. Access Analyzer turns CloudTrail data into tight, accurate policies in 10 minutes. Three-layer defence: Access Analyzer per role, SCPs per organisation, Config rules for continuous alerting. The combination of all three makes AdministratorAccess on workload roles a Config alarm within minutes rather than a problem discovered during a breach investigation.
Least Privilege Implementation Strategy
If: New service being deployed for the first time with unknown permission requirements
Use: Deploy in staging with a scoped policy covering the expected actions plus a broad read policy for the relevant service. Run real traffic. Use Access Analyzer to generate the exact minimum policy from 7+ days of CloudTrail data.
If: Existing service with overly broad permissions that needs tightening
Use: Enable CloudTrail if not already enabled, let the service run for 7 to 14 days under normal load to capture all code paths, then use Access Analyzer to generate a policy from actual usage. Apply in staging first, validate, then promote to production.
If: Organisation-wide enforcement needed to prevent future overpermissioning
Use: Implement SCPs at the AWS Organizations OU level: deny iam:CreateUser in application accounts, restrict ec2:RunInstances to approved instance types and regions, and deny s3:PutBucketPublicAccessBlock so that Block Public Access settings cannot be switched off.
If: Continuous compliance monitoring across many accounts and roles
Use: Deploy AWS Config rules: iam-policy-no-statements-with-admin-access flags customer-managed policies that grant admin permissions, iam-policy-blacklisted-check flags attachments of the AWS-managed AdministratorAccess policy, iam-user-no-policies-check flags direct user policy attachments, and iam-root-access-key-check confirms no root access keys exist.
🗂 IAM Role vs IAM User with Access Key
Understanding when to use each identity type and why the gap between them matters
| Aspect | IAM Role | IAM User with Access Key |
| --- | --- | --- |
| Credential lifetime | Temporary: 15 minutes to 12 hours, auto-rotated by STS on every assumption | Permanent: never expires until manually rotated or the key is explicitly deleted |
| Best used for | EC2 instances, Lambda functions, ECS tasks, CI/CD pipelines via OIDC, cross-account access | Legacy scripts or third-party tools that have no support for role assumption or OIDC federation |
| Rotation required? | No: STS handles credential rotation automatically on every assumption; you cannot create a stale role credential | Yes: manual rotation is mandatory, AWS recommends every 90 days, and it is frequently skipped or forgotten |
| Leaked credential impact | Low: the token expires within hours and cannot be renewed without re-assuming the role through the trust policy | High: the Access Key works indefinitely until someone notices and manually revokes it; discovery often takes days or weeks |
| Multi-account support | Yes: role chaining across accounts is native; a role in Account A can assume a role in Account B with appropriate trust policies | No: Access Keys are scoped to the account where the user was created |
| MFA enforcement | Can require MFA to assume the role via the aws:MultiFactorAuthPresent condition key in the Trust Policy | MFA can be required for console access; programmatic Access Key usage bypasses MFA entirely by design |
| AWS recommended for workloads? | Yes: the AWS security documentation explicitly recommends Roles for all workload identities; it is the designed pattern | No: AWS explicitly recommends against Access Keys for workloads and has been deprecating their use in official guidance since 2022 |

🎯 Key Takeaways

  • Roles beat Users for every machine identity — STS tokens auto-expire in hours, Access Keys do not expire until someone manually revokes them. A leaked temporary credential becomes useless. A leaked Access Key works until someone notices, which is often measured in days.
  • There are two completely separate policies on every IAM Role: the Trust Policy controlling who can assume the role, and the Permission Policy controlling what the role can do. Both must be correct. The Trust Policy is evaluated first — if it fails, the Permission Policy is never consulted.
  • An explicit Deny in any policy always wins over an Allow, regardless of what any other policy says. AWS IAM is default-deny: silence equals blocked, and every request must earn its way to allowed through explicit grants.
  • IAM Access Analyzer's policy generation feature is the practical path to least privilege — run your application in staging with CloudTrail enabled, generate a policy from actual API calls, apply in production. It converts least privilege from a guessing exercise into a data-driven process.
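The "explicit Deny wins, silence is blocked" rule from the takeaways above can be modelled in a few lines of Bash. This is a deliberately simplified sketch for a single same-account request: the function name and the allow/deny/none vocabulary are invented for the example, and resource-based and session policies are left out.

```shell
#!/usr/bin/env bash
# Simplified model of IAM evaluation for one same-account request.
# Each argument is one of: allow | deny | none
#   $1 scp      - result of the SCPs that apply to the account
#   $2 boundary - result of the permission boundary (pass "allow" if none is attached)
#   $3 identity - result of the identity-based policies
evaluate_request() {
  local scp="$1" boundary="$2" identity="$3"

  # 1. An explicit Deny in any layer ends evaluation immediately.
  if [[ "${scp}" == "deny" || "${boundary}" == "deny" || "${identity}" == "deny" ]]; then
    echo "DENY (explicit)"
    return
  fi

  # 2. Every layer must contain an Allow; anything else is an implicit deny.
  if [[ "${scp}" == "allow" && "${boundary}" == "allow" && "${identity}" == "allow" ]]; then
    echo "ALLOW"
  else
    echo "DENY (implicit)"
  fi
}

evaluate_request allow allow allow   # prints: ALLOW
evaluate_request deny  allow allow   # prints: DENY (explicit)
evaluate_request allow allow none    # prints: DENY (implicit)
```

The second call mirrors an identity policy that allows an action while an SCP denies it: the Allow is irrelevant once any layer returns an explicit Deny.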

⚠ Common Mistakes to Avoid

    Using AdministratorAccess on a Lambda or EC2 execution role to avoid debugging permission errors
    Symptom

    The application works correctly, so nobody notices. But if the function is compromised through a code vulnerability or a dependency with a CVE, the attacker has full account access — every IAM action, every service, every region. A single npm package vulnerability becomes a company-wide incident.

    Fix

    Use IAM Access Analyzer to generate the exact minimum policy from 7+ days of staging CloudTrail data. It takes 10 minutes. The resulting policy contains only the specific actions and resources the application actually used. Apply that policy to production and set up a Config rule to alert if permissions ever broaden beyond the approved baseline.

    Forgetting the InstanceProfile wrapper when attaching an IAM Role to EC2 in CloudFormation
    Symptom

    You create a perfect IAM Role in CloudFormation, deploy successfully, and EC2 throws 'Unable to locate credentials' at runtime. The role exists, the permissions are correct, but the instance cannot access credentials because no Instance Profile was created.

    Fix

    Always create a separate AWS::IAM::InstanceProfile resource and list the Role in its Roles property. Then reference the Instance Profile, not the Role, from your EC2 resource: AWS::EC2::Instance takes the profile name in IamInstanceProfile, while a LaunchTemplate accepts either the profile name or its ARN. The EC2 console hides this by creating the Instance Profile automatically; CloudFormation does not.

    Writing a Trust Policy with Principal set to asterisk with no conditions
    Symptom

    Any AWS principal in any account anywhere can assume your role. A security audit flags it as a critical finding. In a worst case, an attacker who discovers the role ARN can assume it from their own AWS account without any stolen credentials.

    Fix

    Always scope the Principal to a specific AWS service principal (lambda.amazonaws.com), a specific account ID, or a specific ARN. For cross-account access inside your own organisation, add a Condition requiring aws:PrincipalOrgID to match your AWS Organization ID, so that only principals within your organisation can assume the role even if account IDs change. For third parties outside your organisation, require an sts:ExternalId condition instead.

    Storing Access Keys in ~/.aws/credentials for automation instead of using role assumption or OIDC
    Symptom

    A developer's laptop is stolen. The Access Key in credentials works indefinitely until the security team notices and manually revokes it — typically hours to days later. API access bypasses MFA entirely, so there is no second factor protecting the key.

    Fix

    Implement AWS IAM Identity Center linked to your corporate identity provider. Developers authenticate with their corporate credentials and receive temporary credentials expiring in 1 to 12 hours. For CI/CD, configure OIDC federation so pipelines assume a role using their identity token. No stored credentials exist to steal, rotate, or accidentally commit to a repository.

    Debugging AccessDenied without checking SCPs at the AWS Organizations level
    Symptom

    The permission policy allows the action, the Trust Policy is correct, no Permission Boundary is attached, but AccessDenied persists. The developer spends hours modifying the permission policy, running simulations, and re-reading the documentation before someone mentions the account is in an AWS Organization.

    Fix

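    Always check SCPs early in the debugging process: aws organizations list-policies-for-target --target-id <account-id> --filter SERVICE_CONTROL_POLICY. Run it from the management account or a delegated administrator account. The command returns only the policies attached directly to the target, so repeat it for each parent OU and the root, because SCPs are inherited. Inspect each SCP for explicit Deny statements matching your failing action. An SCP Deny overrides every other policy type and cannot be overridden by any policy within the account.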

Interview Questions on This Topic

  • Q: What is the difference between a Trust Policy and a Permission Policy on an IAM Role? What happens if the Trust Policy is missing the service that needs to use the role? (Mid-level)
    A Permission Policy defines what actions a role can perform — it is attached to the role and specifies allowed or denied API calls, resource targets, and conditions. A Trust Policy defines who is allowed to assume the role — it lives in the AssumeRolePolicyDocument on the role itself and is evaluated before the permission policy is ever consulted. If the Trust Policy does not list a service (for example, lambda.amazonaws.com), that service cannot assume the role regardless of what permissions the role has. The role is effectively invisible to that service. This is the most common cause of 'my policy allows it but I still get AccessDenied' — the Permission Policy is not the problem, the Trust Policy gate never opened.
  • Q: A Lambda function in Account A needs to read from DynamoDB in Account A and write to an S3 bucket in Account B. Explain the complete configuration required for this to work securely. (Senior)
    Step 1: The Lambda in Account A has an execution role (Role A) with permission for DynamoDB reads in Account A, plus sts:AssumeRole permission targeting Role B's ARN in Account B. Step 2: Account B has a role (Role B) whose Trust Policy explicitly allows Role A's full ARN (the specific role ARN, not just the account ID) to assume it. Step 3: Role B's Permission Policy allows s3:PutObject on the specific target bucket ARN, scoped to the specific prefix if possible. Step 4: Because Role B and the bucket both live in Account B, Role B's Permission Policy is sufficient on its own, provided the bucket policy contains no explicit Deny. A bucket policy grant becomes mandatory only if Role A writes to the bucket directly without assuming a role in Account B; in that direct cross-account case, both the identity policy in Account A and the bucket policy in Account B must allow the action. Step 5: The Lambda code calls sts:AssumeRole at runtime to get temporary credentials for Role B, then uses those credentials to write to S3. Credential caching is important here: assume the role once per Lambda execution context and refresh the credentials before they expire, rather than assuming it on every API call.
  • Q: Explain the Confused Deputy problem in AWS IAM. How does the ExternalId condition in a Trust Policy prevent it? (Senior)
    The Confused Deputy problem occurs when a trusted third party (a monitoring tool, a security scanner, a SaaS integration) is given access to your AWS account via a cross-account role. The problem: if the Trust Policy only checks the third party's AWS account ID, any customer of that third party could potentially trick the service into performing actions on your account by providing your role ARN as the target. The third party is the confused deputy: it has legitimate access, but it cannot distinguish between requests that are legitimately about your account and requests fabricated by an attacker claiming to target your account. ExternalId adds a per-customer identifier to the Trust Policy's Condition block: a unique value the third party generates for your account during integration setup and always sends when it assumes your role. An attacker who is merely another customer of the same third party has a different ExternalId bound to their own account, so when they point the service at your role ARN, the condition fails and the assumption is denied. AWS does not treat ExternalId as a secret; the protection comes from the third party binding each value to exactly one customer. AWS recommends that all cross-account integrations with third parties use ExternalId.
  • Q: Describe the IAM policy evaluation order when Identity-based, Resource-based, SCPs, and Permission Boundaries all apply to a single API call. What happens if the Identity-based policy allows s3:* but an SCP denies s3:DeleteBucket? (Mid-level)
    AWS evaluates policies in a specific order with Explicit Deny as the absolute override at every level. The process: (1) Start with a default deny: every request begins as denied. (2) Evaluate all SCPs from the AWS Organization: if any SCP contains an explicit Deny for the action, the request is denied immediately and evaluation stops, and no other policy can override an SCP deny. The action must also be allowed by an SCP, which the default FullAWSAccess policy does. (3) Evaluate the Permission Boundary if one is attached: if the boundary does not include the action, the request is denied even if the permission policy allows it. (4) Evaluate Identity-based policies: if an explicit Allow is found and no Deny exists, the action can proceed. (5) For cross-account or resource-specific access, evaluate Resource-based policies. In your example: the Identity-based policy allows s3:*, but the SCP contains an explicit Deny for s3:DeleteBucket. The SCP deny wins absolutely, so the bucket cannot be deleted regardless of what any other policy says. This is the intended behaviour: SCPs are organisation-level guardrails that individual accounts cannot override.
  • Q: A developer's Access Key was committed to a public GitHub repository. Walk through the immediate remediation steps and the architectural change to prevent recurrence. (Mid-level)
    Immediate steps, in order: (1) Disable the compromised key immediately — aws iam update-access-key --access-key-id <key-id> --status Inactive. Do not delete yet — you may need it for forensics. (2) Search CloudTrail for all API calls made with that Access Key ID starting from when the commit first appeared publicly. Filter CloudTrail events by the AccessKeyId field. (3) Assess damage: were any resources created, data accessed, or permissions modified? (4) If unauthorized resources were created, identify and terminate them. If IAM changes were made, revert them and check for backdoor users or roles. (5) Delete the key after forensics are complete. (6) Rotate all other credentials for that user as a precaution. Architectural change to prevent recurrence: implement AWS IAM Identity Center so developers authenticate through your corporate identity provider and receive temporary credentials expiring in 1 to 12 hours. No permanent Access Keys exist to leak. Add git-secrets or truffleHog as a pre-commit hook to scan for AWS credential patterns before they reach the repository. Configure an AWS Config rule (access-keys-rotated) to alert on keys older than 90 days for any remaining legacy Users that genuinely require Access Keys.
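For the Confused Deputy question above, a Trust Policy with an ExternalId condition might be sketched as follows. The vendor account ID, the ExternalId value, and the file name are all hypothetical; in a real integration the third party would supply its own account ID and the identifier.

```shell
#!/usr/bin/env bash
# Sketch only: writes an example third-party trust policy to a local file.
set -euo pipefail

VENDOR_ACCOUNT_ID="999999999999"          # hypothetical third-party account
EXTERNAL_ID="example-external-id-1234"    # hypothetical per-customer identifier
TRUST_FILE="vendor-trust-policy.json"     # hypothetical file name

# Unquoted heredoc so that the two variables are expanded into the JSON.
cat > "${TRUST_FILE}" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${VENDOR_ACCOUNT_ID}:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "${EXTERNAL_ID}" }
      }
    }
  ]
}
EOF

echo "Wrote ${TRUST_FILE}"

# The third party would then assume the role with a matching value, e.g.:
#   aws sts assume-role --role-arn <your-role-arn> \
#     --role-session-name vendor-session --external-id "${EXTERNAL_ID}"
```

A call to assume the role without the matching --external-id value fails the condition, even when it comes from the trusted vendor account.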

Frequently Asked Questions

What is the difference between an IAM Role and an IAM User in AWS?

An IAM User has permanent, long-term credentials — a password for the console and an Access Key and Secret Key for programmatic access. Those credentials do not expire until manually revoked. An IAM Role has no permanent credentials — it issues temporary tokens via AWS STS when assumed, valid for 15 minutes to 12 hours. Use Roles for all application workloads (Lambda, EC2, ECS), CI/CD pipelines, and cross-account access. Reserve Users only for humans who genuinely cannot use AWS IAM Identity Center, or legacy systems that have no support for role assumption — and in those cases, enforce mandatory 90-day key rotation via an AWS Config rule.

What does 'least privilege' mean in AWS IAM and how do you actually achieve it in practice?

Least privilege means each identity has exactly the permissions it needs to do its specific job — no broader. In practice, you do not guess the right permissions upfront. Instead, deploy your application in staging with CloudTrail enabled and a reasonably scoped policy, let it run for 7 or more days to exercise all code paths under realistic conditions, then use IAM Access Analyzer's policy generation feature to produce a policy based only on the API calls your application actually made. Apply that policy in production. Revisit it whenever your application's feature set changes significantly, and monitor for policy drift using AWS Config rules.

Why does my EC2 instance say 'Unable to locate credentials' even though I created an IAM Role for it?

EC2 cannot use an IAM Role directly. It needs an Instance Profile, which is a separate wrapper resource that contains the role. In the AWS Console, this is handled silently when you attach a role to an instance through the UI. In CloudFormation, you must create an explicit AWS::IAM::InstanceProfile resource, reference the Role inside it, and then attach the Instance Profile (not the Role) to the EC2 Instance or LaunchTemplate: an Instance takes the profile name, while a LaunchTemplate accepts the name or the ARN. Once correctly attached, the EC2 metadata service at 169.254.169.254 serves automatically rotating temporary credentials to any code running on the instance.
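Outside CloudFormation, the same wrapper can be created with the CLI. The sketch below assumes a role that already exists; the role name, profile name, and instance ID are placeholders, and the run helper is a hypothetical dry-run wrapper added so that the commands are only printed by default.

```shell
#!/usr/bin/env bash
# Sketch only: creates an Instance Profile, puts an existing role in it,
# and associates it with a running instance.
set -euo pipefail

ROLE_NAME="AppInstanceRole"            # placeholder, must already exist
PROFILE_NAME="AppInstanceProfile"      # placeholder
INSTANCE_ID="i-0123456789abcdef0"      # placeholder
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands, 0 = execute them

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Create the empty wrapper.
run aws iam create-instance-profile --instance-profile-name "${PROFILE_NAME}"

# 2. Put the role inside it.
run aws iam add-role-to-instance-profile \
  --instance-profile-name "${PROFILE_NAME}" \
  --role-name "${ROLE_NAME}"

# 3. Attach the profile (not the role) to the instance.
run aws ec2 associate-iam-instance-profile \
  --instance-id "${INSTANCE_ID}" \
  --iam-instance-profile "Name=${PROFILE_NAME}"
```

Running the script with DRY_RUN=0 executes the three calls for real, so review the names first.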

What is the difference between an Inline Policy and a Managed Policy in IAM?

A Managed Policy is a standalone IAM resource that can be attached to multiple users, groups, or roles. It has its own ARN, can be versioned, and persists independently of the identities it is attached to. AWS provides a library of AWS-managed policies (ReadOnlyAccess, PowerUserAccess) and you can create customer-managed policies. An Inline Policy is embedded directly within a single identity — it cannot be shared, has no independent ARN, and is deleted when the identity is deleted. Use Managed Policies for nearly everything: they are visible in the IAM console, can be referenced across stacks, and are easier to audit. Reserve Inline Policies for permissions that are strictly specific to one role and should not be accidentally reused.

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
