Mid-level 13 min · March 06, 2026

AWS IAM Roles — Why AdministratorAccess Cost $47K

Lambda with AdministratorAccess: full account takeover in 4 minutes.

Naren · Founder
Quick Answer
  • IAM controls who can do what in your AWS account — Users (humans), Roles (machines), Groups (management containers)
  • Every Role has two separate policies: Trust Policy (who can assume it) and Permission Policy (what it can do) — both must be correct or you get AccessDenied
  • AWS IAM is default-deny — if a permission is not explicitly granted, it is blocked. Explicit Deny always wins over Allow, regardless of what any other policy says.
  • Roles issue short-lived STS tokens (15min–12hr) that auto-rotate — leaked credentials expire in hours, not forever
  • Use IAM Access Analyzer to generate least-privilege policies from actual CloudTrail data — deploy broad in staging, tighten in production
  • Biggest mistake: using AdministratorAccess on Lambda or EC2 roles 'just to make it work' — a single compromised function gets full account access in seconds
Plain-English First

Imagine your AWS account is a giant office building. IAM is the security desk at the front door — it decides who gets a key card, which floors they can visit, and whether they can open the filing cabinets once they get there. A contractor (an EC2 instance) might get a temporary badge that expires at 5pm, while a full-time employee (a developer) gets access to their own floor but cannot wander into the CEO's office. The key insight most people miss: there are two separate questions the security desk asks. First, are you allowed through the front door at all? That is the Trust Policy. Second, once you are inside, which rooms can you enter? That is the Permission Policy. Both gates must open or you are not getting in.

Many of the most widely reported AWS breaches trace back to the same root cause: IAM was misconfigured or ignored. Exposed S3 buckets, compromised Lambda functions, leaked credentials on GitHub: these are IAM failures at their core. IAM is not a niche security topic you deal with after everything else is working; it is the foundation every other AWS service is built on, and getting it wrong is how companies make the news.

IAM introduced fine-grained, programmable permissions to replace the blunt root-account-or-nothing model. You can now specify that 'this Lambda function may read from exactly one S3 bucket and nothing else.' That specificity is the difference between a contained incident and a company-ending breach. The controls exist. The question is whether your team uses them.

The three concepts to master: Users (human identities with permanent credentials), Roles (machine identities with temporary STS tokens that auto-expire), and Policies (JSON documents defining what is allowed or denied). Getting the relationship between these three right — and specifically understanding that Trust Policies and Permission Policies are completely separate documents with completely different jobs — is what separates a secure architecture from a ticking time bomb.

In 2026, with AWS IAM Identity Center (the successor to AWS SSO) handling human access and OIDC-based authentication now the standard for CI/CD pipelines, the era of long-lived Access Keys for automation should be over. The patterns in this guide reflect where the security bar actually sits today, not where it was in 2019.

Users, Groups, and Roles — Picking the Right Identity Tool

The most common IAM confusion comes from mixing up three distinct concepts that look superficially similar but serve completely different architectural purposes.

IAM User represents a human or legacy script requiring long-term credentials — a password for the console and an Access Key and Secret Key pair for programmatic access. The modern rule is stark: machines should almost never be IAM Users. A User's Access Key does not expire. If it leaks, it works until someone manually revokes it.

IAM Group is a management container for Users. Instead of attaching policies to ten developers individually, you attach them once to a BackendEngineers group. Groups have no credentials of their own and cannot be assumed by services — they are purely an administrative convenience, not a security boundary.

IAM Role is the most important tool in your toolkit. A Role has no long-term credentials whatsoever. Instead, it is assumed temporarily by a principal: an EC2 instance, a Lambda function, a CI/CD pipeline, or a principal in another account. When something assumes a Role, AWS STS issues a set of temporary credentials (access key, secret key, and session token) whose lifetime is set somewhere between 15 minutes and 12 hours, after which they expire automatically. That auto-expiry is your primary defence against credential leaks: a leaked temporary credential becomes useless within hours, not years.

In 2026 the modern standard is unambiguous: AWS IAM Identity Center (formerly SSO) for all human access, Roles for all machine and service identities, and OIDC-based federation for CI/CD pipelines. Permanent Access Keys on developer machines should be a red flag in any security review, not a routine practice.

io/thecodeforge/iam/ec2_s3_read_role.yaml (YAML)
# io.thecodeforge: Production-grade IAM Role for EC2 with S3 read access
# Deploy: aws cloudformation deploy \
#   --template-file ec2_s3_read_role.yaml \
#   --stack-name forge-iam-stack \
#   --capabilities CAPABILITY_NAMED_IAM

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimalist IAM Role for EC2 with scoped S3 Read Access

Parameters:
  TargetBucketName:
    Type: String
    Default: thecodeforge-app-data
    Description: The exact S3 bucket this instance is allowed to read from

Resources:

  # 1. THE ROLE — defines who can wear the badge (Trust Policy)
  # The Principal block answers: which AWS service can assume this role?
  # Missing 'ec2.amazonaws.com' here = "Unable to locate credentials" at runtime
  AppServerInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub "${AWS::StackName}-EC2S3ReadOnly"
      Description: "Allows EC2 instances to read from the app data bucket only"
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com   # TRUST POLICY: EC2 service can assume this
            Action: sts:AssumeRole
      Tags:
        - Key: ManagedBy
          Value: CloudFormation
        - Key: Purpose
          Value: AppServerS3Access

  # 2. THE POLICY — defines what the role can do (Permission Policy)
  # Scoped to a single bucket and only the two actions actually needed.
  # s3:ListBucket requires the bucket ARN; s3:GetObject requires the object ARN.
  # These are different ARN formats — a common gotcha.
  AppServerS3ReadPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: !Sub "${AWS::StackName}-S3ScopedReadAccess"
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: AllowBucketList
            Effect: Allow
            Action:
              - s3:ListBucket
            Resource:
              - !Sub "arn:aws:s3:::${TargetBucketName}"
            Condition:
              StringLike:
                s3:prefix: "app-data/*"   # further scope to a prefix if possible

          - Sid: AllowObjectRead
            Effect: Allow
            Action:
              - s3:GetObject
            Resource:
              - !Sub "arn:aws:s3:::${TargetBucketName}/app-data/*"
      Roles:
        - !Ref AppServerInstanceRole

  # 3. THE INSTANCE PROFILE — the bridge between EC2 and the Role
  # EC2 cannot use an IAM Role directly. It needs this wrapper.
  # Reference this profile in your LaunchTemplate, NOT the Role ARN.
  AppServerInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      InstanceProfileName: !Sub "${AWS::StackName}-EC2Profile"
      Roles:
        - !Ref AppServerInstanceRole

Outputs:
  RoleArn:
    Description: Role ARN for cross-stack references
    Value: !GetAtt AppServerInstanceRole.Arn
    Export:
      Name: !Sub "${AWS::StackName}-RoleArn"
  InstanceProfileArn:
    Description: Instance Profile ARN to attach to EC2 LaunchTemplate
    Value: !GetAtt AppServerInstanceProfile.Arn
    Export:
      Name: !Sub "${AWS::StackName}-InstanceProfileArn"
Output
# Deployment output:
# Successfully created/updated stack - forge-iam-stack
#
# Stack Outputs:
# RoleArn: arn:aws:iam::123456789012:role/forge-iam-stack-EC2S3ReadOnly
# InstanceProfileArn: arn:aws:iam::123456789012:instance-profile/forge-iam-stack-EC2Profile
#
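# The EC2 metadata service now serves rotating credentials automatically.
# IMDSv1 form shown; where IMDSv2 is enforced, fetch a session token first and
# pass it in the X-aws-ec2-metadata-token header:
# curl http://169.254.169.254/latest/meta-data/iam/security-credentials/forge-iam-stack-EC2S3ReadOnly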
Watch Out: EC2 Roles Require an Instance Profile Wrapper
You cannot attach an IAM Role directly to an EC2 instance; you must wrap it in an AWS::IAM::InstanceProfile resource first. This catches almost everyone writing their first CloudFormation template. The Role and the InstanceProfile are two separate resources that must both exist. The AWS Console hides this detail by creating the InstanceProfile automatically when you attach a role to an instance interactively; CloudFormation does not. Always reference the InstanceProfile, never the Role ARN: a LaunchTemplate accepts the profile's ARN or name, while an AWS::EC2::Instance resource takes the profile name.
Production Insight
IAM Users with Access Keys are permanent credentials — they do not expire until manually rotated or deleted. A key sitting in a developer's ~/.aws/credentials file for two years is a two-year window of compromise if that machine is lost or that file is accidentally committed.
Roles issue STS tokens that auto-expire in 15 minutes to 12 hours — a leaked token from a Lambda log becomes useless before most incident response processes even start.
AWS IAM Identity Center (the renamed SSO service) connects to your corporate directory (Okta, Google Workspace, Entra ID) and issues temporary credentials on demand. Developers run aws sso login, get a token valid for a few hours, and when their machine is stolen the attacker gets credentials that expire before lunch.
Rule: Roles for all machine identities. IAM Identity Center for all human identities. Access Keys only for legacy systems that genuinely cannot support role assumption, with a 90-day rotation policy and an AWS Config rule (access-keys-rotated) flagging any key that exceeds it.
Key Takeaway
Roles beat Users for every machine identity — STS tokens auto-expire, Access Keys do not. Groups are management containers only — they have no credentials and cannot be assumed by any service. The 2026 standard: IAM Identity Center for humans, Roles for everything else, OIDC federation for CI/CD. Permanent Access Keys for automation are a security anti-pattern that no new architecture should introduce.
Choosing the Right IAM Identity
If: A human developer needs AWS Console and CLI access
Use: IAM Identity Center (SSO) linked to your corporate directory. No permanent Access Keys; temporary credentials via role assumption that expire in 1–12 hours.
If: An EC2 instance or Lambda function needs AWS API access
Use: An IAM Role, attached via an Instance Profile for EC2 or configured directly as the Lambda execution role. AWS delivers and rotates the temporary credentials automatically (through the instance metadata service on EC2, through the runtime environment on Lambda).
If: A CI/CD pipeline (GitHub Actions, GitLab CI, CircleCI) needs deployment access
Use: OIDC-based role assumption. The pipeline authenticates with its identity provider token and assumes an IAM role. No stored credentials anywhere, no rotation required.
If: A legacy third-party tool only supports Access Keys and has no role support
Use: An IAM User with the minimum required permissions, a 90-day key rotation policy checked by an AWS Config rule, and usage monitoring via CloudTrail alerts. This is the last resort, not the default.

IAM Policies Deep-Dive — Trust Policies vs Permission Policies

IAM policies are JSON documents that define permissions, but there are two completely different types of policies in play on every Role, and confusing them is the source of most AccessDenied errors and most of the time spent debugging them.

The Permission Policy answers the question: what actions can this identity perform? It is attached to a User or Role and defines the allowed or denied API calls, the resources those calls can target, and the conditions under which the permission applies. This is what most people think of when they hear 'IAM policy.'

The Trust Policy answers a completely different question: who is allowed to assume this role? It lives exclusively in the AssumeRolePolicyDocument on the Role resource itself and is evaluated before the permission policy is even consulted. If your Trust Policy does not list lambda.amazonaws.com as a trusted principal, no Lambda function can assume that role — it does not matter how many permissions the role has. The role is simply unavailable to that service.
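To make the "who may assume this role" question concrete, here is a minimal Python sketch that builds a trust policy for one service and checks whether a given service principal is listed. The helper names `build_trust_policy` and `trusts_service` are hypothetical and exist only for illustration; the check looks at service principals only and ignores conditions, so treat it as a reading aid rather than a model of how STS evaluates trust.

```python
import json


def build_trust_policy(service_principal: str) -> dict:
    """Return a trust policy that lets a single AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trusts_service(trust_policy: dict, service_principal: str) -> bool:
    """True if any Allow statement names the service as a trusted principal."""
    for stmt in trust_policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "Principal": "*" is out of scope for this sketch
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        if service_principal in services:
            return True
    return False


ec2_trust = build_trust_policy("ec2.amazonaws.com")
print(json.dumps(ec2_trust, indent=2))
print(trusts_service(ec2_trust, "ec2.amazonaws.com"))
print(trusts_service(ec2_trust, "lambda.amazonaws.com"))
```

The second check is the situation described above: the role may carry any number of permissions, but a Lambda function cannot use it because the trust policy names only EC2.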

The evaluation order when an API call hits AWS: first, can this principal assume the role at all? (Trust Policy). Second, is there an explicit Deny anywhere in the policy stack? (any SCP, Permission Boundary, or policy with Deny wins immediately). Third, is there at least one Allow for this specific action on this specific resource? (Permission Policy, Resource Policy). If the answer to the second question is yes, the request is denied regardless of any Allows. If the answer to the first or third question is no, the request is also denied. Default deny means that every request starts at denied and must earn its way to allowed through explicit grants.

Understanding this evaluation chain is what makes you fast at debugging access issues. Most engineers start at the permission policy and work outward. The right approach is to start at the Trust Policy and work forward through the chain.
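The deny-wins and default-deny rules in that chain can be sketched in a few lines of Python. This is a deliberately simplified model, not AWS's evaluation engine: it ignores conditions, SCPs, permission boundaries and resource policies, and it uses `fnmatch` wildcards as a stand-in for IAM's `*` and `?` matching. The function name `evaluate` and the over-broad `lambda:*` statement are assumptions added for illustration; the other statements are trimmed from the deployer policy in this section.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Action and Resource may be a single string or a list; normalise to a list."""
    return [value] if isinstance(value, str) else list(value)


def _action_matches(patterns, action):
    # IAM action names are case-insensitive, so compare in lower case.
    return any(fnmatchcase(action.lower(), p.lower()) for p in _as_list(patterns))


def _resource_matches(patterns, resource):
    return any(fnmatchcase(resource, p) for p in _as_list(patterns))


def evaluate(statements, action, resource):
    """Return 'explicitDeny', 'allowed' or 'implicitDeny' for one request."""
    decision = "implicitDeny"  # default deny: silence means blocked
    for stmt in statements:
        if not _action_matches(stmt["Action"], action):
            continue
        if not _resource_matches(stmt["Resource"], resource):
            continue
        if stmt["Effect"] == "Deny":
            return "explicitDeny"  # one matching Deny ends the evaluation
        decision = "allowed"
    return decision


FUNCTION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-api"

statements = [
    {"Effect": "Allow",
     "Action": ["lambda:UpdateFunctionCode", "lambda:PublishVersion"],
     "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-*"},
    # Hypothetical over-broad grant from some other attached policy.
    {"Effect": "Allow", "Action": "lambda:*", "Resource": "*"},
    {"Effect": "Deny", "Action": ["lambda:DeleteFunction"], "Resource": "*"},
]

print(evaluate(statements, "lambda:UpdateFunctionCode", FUNCTION_ARN))
print(evaluate(statements, "lambda:DeleteFunction", FUNCTION_ARN))
print(evaluate(statements, "s3:GetObject", "arn:aws:s3:::some-bucket/key"))
```

The delete request matches the broad `lambda:*` Allow and is still refused because the Deny matches too, while the S3 request is refused simply because nothing allows it.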

io/thecodeforge/iam/production_deployer_policy.json (JSON)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowECRImagePush",
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload"
      ],
      "Resource": [
        "arn:aws:ecr:eu-west-1:123456789012:repository/thecodeforge-app"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "eu-west-1"
        }
      }
    },
    {
      "Sid": "AllowECRAuth",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "eu-west-1"
        }
      }
    },
    {
      "Sid": "AllowLambdaDeploymentOnly",
      "Effect": "Allow",
      "Action": [
        "lambda:UpdateFunctionCode",
        "lambda:UpdateFunctionConfiguration",
        "lambda:PublishVersion"
      ],
      "Resource": [
        "arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-*"
      ]
    },
    {
      "Sid": "DenyDestructiveActionsAlways",
      "Effect": "Deny",
      "Action": [
        "lambda:DeleteFunction",
        "ecr:DeleteRepository",
        "ecr:DeleteRepositoryPolicy"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyAllOutsideApprovedRegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["eu-west-1", "us-east-1"]
        }
      }
    }
  ]
}
Output
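# Attach this as a managed policy to your CI/CD pipeline's assumed role.
# simulate-principal-policy evaluates the policies already attached to the role; to test
# the document before attaching it, pass it to aws iam simulate-custom-policy instead.
# The statements are conditioned on aws:RequestedRegion, so supply that key as context:
# aws iam simulate-principal-policy \
# --policy-source-arn arn:aws:iam::123456789012:role/CICDDeployerRole \
# --action-names ecr:PutImage \
# --resource-arns arn:aws:ecr:eu-west-1:123456789012:repository/thecodeforge-app \
# --context-entries ContextKeyName=aws:RequestedRegion,ContextKeyValues=eu-west-1,ContextKeyType=string
#
# Expected: EvaluationResults[0].EvalDecision is "allowed"
#
# Test the deny guard too:
# aws iam simulate-principal-policy \
# --policy-source-arn arn:aws:iam::123456789012:role/CICDDeployerRole \
# --action-names lambda:DeleteFunction \
# --resource-arns arn:aws:lambda:eu-west-1:123456789012:function:thecodeforge-api \
# --context-entries ContextKeyName=aws:RequestedRegion,ContextKeyValues=eu-west-1,ContextKeyType=string
#
# Expected: EvaluationResults[0].EvalDecision is "explicitDeny"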
The Five Gates Every IAM Request Must Pass
  • Gate 1: Trust Policy — can this principal assume the role at all? If no, the request never reaches the other gates.
  • Gate 2: SCPs — does the AWS Organization allow this action in this account? An SCP Deny is absolute and cannot be overridden.
  • Gate 3: Permission Boundary — does this action fall within the role's maximum ceiling? A Boundary Deny overrides any Allow in the Permission Policy.
  • Gate 4: Permission Policy — does the role's identity-based policy explicitly allow this action on this resource?
  • Gate 5: Resource Policy — does the target resource (S3 bucket, KMS key, SQS queue) allow access from this role? Required for cross-account access.
Production Insight
In most debugging sessions, the Trust Policy is the problem — the service or principal is not listed, or the Principal is correct but the account or ARN has a typo. Check the Trust Policy before anything else.
The IAM Policy Simulator (aws iam simulate-principal-policy) is the single most underused tool in IAM debugging. It tells you exactly which statement caused a deny without requiring you to reproduce the actual API call.
Condition keys in policies are powerful but fragile. aws:RequestedRegion, aws:PrincipalOrgID, and aws:SourceVpc are essential for defence-in-depth, but a typo in a condition key name raises no error at request time: the misspelled key is simply absent from the request context, so a StringEquals condition on it never matches. On an Allow that shows up as unexpected denials; on a Deny guardrail it means the Deny silently never applies, which is the failure mode that hurts.
Rule: always simulate policy changes with the IAM simulator before deploying them. A guardrail that silently never applies is as dangerous as a policy that is too broad.
Key Takeaway
Trust Policy controls WHO can assume a role. Permission Policy controls WHAT the role can do. Both must be correct — missing either one produces AccessDenied. Explicit Deny always wins over Allow at every level of the policy stack. Default Deny means silence equals blocked, never allowed. Check the Trust Policy before the Permission Policy when debugging.
Debugging AccessDenied Errors
If: AccessDenied with no additional context
Use: Check CloudTrail for the exact event, principal ARN, and error. Run aws iam simulate-principal-policy to reproduce in a controlled environment and see which statement caused the deny.
If: Permission policy clearly allows the action but AccessDenied persists
Use: Check the Trust Policy first, since the principal may not be listed. Then check SCPs at the organization level and any Permission Boundaries on the role. When debugging, treat the permission policy as the last thing to inspect, not the first.
If: Cross-account access fails despite both sides appearing correctly configured
Use: Check both sides independently. For cross-account role assumption, the caller's identity policy must allow sts:AssumeRole on the target role and the target role's Trust Policy must name the caller. For direct access to a resource in another account, the caller's identity policy and the resource-based policy must both allow the action. A missing bucket policy or KMS key policy statement is a common cause.
If: AccessDenied only for destructive actions like Delete or Put
Use: Check for an explicit Deny in an SCP or Permission Boundary; organizations often deny destructive actions globally as a guardrail. A Deny in an SCP overrides any Allow in any other policy and cannot be worked around at the account level.

Least Privilege in Practice — Building a Real-World IAM Strategy

Least privilege is the security practice of granting the absolute minimum permissions required for a task to complete — nothing more. In fast-moving teams, AdministratorAccess is tempting because it eliminates permission debugging entirely. It also means that any bug in your code, any compromised dependency, or any leaked credential is immediately a full-account incident.

The practical path to least privilege is not guessing permissions upfront; it is observing what your application actually calls and generating a policy from that data. IAM Access Analyzer's policy generation feature does exactly this. You run your application in a staging environment under a deliberately broad policy with CloudTrail logging enabled, CloudTrail captures every API call the application makes, and Access Analyzer generates a tight JSON policy containing only those calls. Once the CloudTrail data exists, the generation job itself takes minutes, and it produces a policy you can trust to be accurate because it came from real usage data, not someone's best guess.

For human access, the standard in 2026 is short-lived sessions via AWS IAM Identity Center. Developers authenticate through their corporate identity provider (Okta, Google Workspace, Entra ID), receive temporary credentials valid for 1 to 12 hours, and use those to make API calls. A stolen laptop means credentials that expire before the attacker can do meaningful damage. There are no permanent Access Keys in ~/.aws/credentials to exfiltrate.

For CI/CD pipelines, the modern standard is OIDC federation. GitHub Actions, GitLab CI, and most modern CI platforms can authenticate as themselves using an OIDC token and assume an IAM role that trusts their identity. No stored credentials, no rotation, no credential scanning needed for a secret that does not exist.
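As an illustration of what "a role that trusts their identity" looks like for GitHub Actions, the following Python sketch assembles the trust policy document. The function name `github_oidc_trust_policy` and the example organisation and repository names are hypothetical. The sketch pins the `sub` claim to one branch with `StringEquals`, an exact match; `StringLike` with wildcards is the looser alternative when several branches or tags must be able to deploy. It assumes the GitHub OIDC provider has already been registered in the account.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, org: str, repo: str,
                             branch: str = "main") -> dict:
    """Trust policy allowing one repo and branch to assume the role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Audience the token was issued for.
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        # Repository and branch the workflow ran from.
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


policy = github_oidc_trust_policy("123456789012", "your-org", "your-repo")
print(json.dumps(policy, indent=2))
```

Dropping the `sub` condition would let any repository whose tokens carry the matching audience assume the role, so the repository and branch constraint is the part to review most carefully.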

The three-layer defence for least privilege at scale: Access Analyzer generates tight policies per role. SCPs at the AWS Organizations level set hard limits that no account-level policy can override — denying iam:CreateUser in non-security accounts, restricting compute to approved regions, preventing S3 public access changes. AWS Config rules provide continuous monitoring and alert within minutes when a role receives broader permissions than its baseline, catching the exact mistake from the production incident above.
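The SCP layer is easiest to picture as a document of Deny statements. The Python sketch below builds one covering two of the guardrails named above; the function name `build_guardrail_scp`, the Sids and the choice of `ec2:RunInstances` as the "compute" action are assumptions for illustration, and a real SCP would need exemptions (for example for the security account) that are left out here.

```python
import json


def build_guardrail_scp(approved_regions) -> dict:
    """Build an SCP document that only ever denies; SCPs grant nothing themselves."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyIamUserCreation",
                "Effect": "Deny",
                "Action": "iam:CreateUser",
                "Resource": "*",
            },
            {
                "Sid": "DenyComputeOutsideApprovedRegions",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "*",
                "Condition": {
                    # Matches when the requested region is NOT in the list.
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(approved_regions)
                    }
                },
            },
        ],
    }


scp = build_guardrail_scp(["us-east-1", "eu-west-1"])
print(json.dumps(scp, indent=2))
```

Because every statement is a Deny, attaching this document at the OU level caps what account-level policies can do without granting anything itself.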

io/thecodeforge/security/generate_least_privilege_policy.sh (BASH)
#!/usr/bin/env bash
# io.thecodeforge: Automated Least-Privilege Policy Generation
# Prerequisites:
#   - CloudTrail enabled in the target region
#   - An AccessAnalyzer ServiceRole with CloudTrail read permissions
#   - The target role has been running in staging for at least 7 days

set -euo pipefail

ROLE_NAME="StagingAppRole"
ACCOUNT_ID="123456789012"
REGION="eu-west-1"
CLOUDTRAIL_ARN="arn:aws:cloudtrail:${REGION}:${ACCOUNT_ID}:trail/main-trail"
ANALYZER_ROLE="arn:aws:iam::${ACCOUNT_ID}:role/AccessAnalyzerServiceRole"
ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/${ROLE_NAME}"

echo "==> Starting policy generation for ${ROLE_NAME}"
echo "    Based on CloudTrail activity in ${REGION} for the last 90 days"
echo ""

# Step 1: Start the policy generation job
# Access Analyzer reads CloudTrail logs and identifies every API call made by the role
JOB_ID=$(aws accessanalyzer start-policy-generation \
  --region "${REGION}" \
  --policy-generation-details "principalArn=${ROLE_ARN}" \
  --cloud-trail-details "{\
    \"accessRole\": \"${ANALYZER_ROLE}\",\
    \"trails\": [{\
      \"cloudTrailArn\": \"${CLOUDTRAIL_ARN}\",\
      \"regions\": [\"${REGION}\"]\
    }],\
    \"startTime\": \"$(date -d '90 days ago' --iso-8601=seconds)\",\
    \"endTime\": \"$(date --iso-8601=seconds)\"\
  }" \
  --query 'jobId' \
  --output text)

echo "==> Job ID: ${JOB_ID} — waiting for completion..."

# Step 2: Poll until the job completes (this can take several minutes)
while true; do
  STATUS=$(aws accessanalyzer get-generated-policy \
    --region "${REGION}" \
    --job-id "${JOB_ID}" \
    --query 'jobDetails.status' \
    --output text)

  if [[ "${STATUS}" == "SUCCEEDED" ]]; then
    echo "==> Job completed successfully"
    break
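  elif [[ "${STATUS}" == "FAILED" || "${STATUS}" == "CANCELED" ]]; then
    # CANCELED is also a terminal state; without this check the loop would poll forever
    echo "ERROR: Policy generation ended with status ${STATUS}. Check CloudTrail permissions on ${ANALYZER_ROLE}"
    exit 1
  fi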

  echo "    Status: ${STATUS} — checking again in 20s..."
  sleep 20
done

# Step 3: Extract the generated policy document
OUTPUT_FILE="least_privilege_${ROLE_NAME}_$(date +%Y%m%d).json"
aws accessanalyzer get-generated-policy \
  --region "${REGION}" \
  --job-id "${JOB_ID}" \
  --query 'generatedPolicyResult.generatedPolicies[0].policy' \
  --output text > "${OUTPUT_FILE}"

echo ""
echo "==> Generated policy written to: ${OUTPUT_FILE}"
echo "    Review the policy before applying. Key things to verify:"
echo "    1. No wildcard (*) resources — every action should target a specific ARN"
echo "    2. No unexpected actions — if you see iam:* or s3:* in the output, investigate"
echo "    3. Condition keys — add aws:RequestedRegion conditions for all write actions"
echo ""
echo "==> To apply to the production role:"
echo "    aws iam put-role-policy \\"
echo "      --role-name ProductionAppRole \\"
echo "      --policy-name GeneratedLeastPrivilege \\"
echo "      --policy-document file://${OUTPUT_FILE}"
echo ""
jq . "${OUTPUT_FILE}"
Output
==> Starting policy generation for StagingAppRole
Based on CloudTrail activity in eu-west-1 for the last 90 days
==> Job ID: job_abc123def456 — waiting for completion...
Status: IN_PROGRESS — checking again in 20s...
Status: IN_PROGRESS — checking again in 20s...
==> Job completed successfully
==> Generated policy written to: least_privilege_StagingAppRole_20260420.json
# The output contains only the API calls the application actually made:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::thecodeforge-app-data/uploads/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:GetItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:eu-west-1:123456789012:table/ForgeOrders"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": [
        "arn:aws:kms:eu-west-1:123456789012:key/mrk-abc123"
      ]
    }
  ]
}
The Three-Layer Defence for Least Privilege at Scale
  • Layer 1 (IAM Access Analyzer): generates tight, data-driven policies per role from actual CloudTrail usage, replacing guesswork with observed behaviour.
  • Layer 2 (AWS Organizations SCPs): sets hard organisational limits that no account-level policy can override. Deny iam:CreateUser in application accounts. Restrict compute to approved regions. Prevent public S3 bucket creation globally.
  • Layer 3 (AWS Config rules): continuously monitors and alerts when any role receives permissions above its approved baseline. iam-policy-no-statements-with-admin-access flags customer managed policies that allow every action on every resource, and iam-policy-blacklisted-check flags identities with the AWS managed AdministratorAccess policy attached; together they catch the pattern from the production incident above shortly after the change is made.
This is a continuous process, not a one-time audit.
Production Insight
AdministratorAccess on any workload role is a single point of total account compromise — it converts every vulnerability in that workload from an application incident into an infrastructure incident.
IAM Access Analyzer policy generation is based on actual API calls from CloudTrail, not guesses. The output is specific to your application's real behaviour, not a generic template. It takes 10 minutes and produces a policy accurate enough to ship to production after review.
For GitHub Actions OIDC setup, the trust policy condition should validate both the repository and the branch: StringLike on token.actions.githubusercontent.com:sub matching repo:your-org/your-repo:ref:refs/heads/main. Without the branch condition, any branch in the repo can assume the deployment role.
Rule: deploy broad in staging with CloudTrail enabled, let it run for at least 7 days to capture all code paths, use Access Analyzer to generate the tight policy, apply in production, and set up a Config rule to alert if permissions ever broaden again.
Key Takeaway
Least privilege is not a one-time task — it is a continuous process of generation, validation, and monitoring. Access Analyzer turns CloudTrail data into tight, accurate policies in 10 minutes. Three-layer defence: Access Analyzer per role, SCPs per organisation, Config rules for continuous alerting. The combination of all three makes AdministratorAccess on workload roles a Config alarm within minutes rather than a problem discovered during a breach investigation.
Least Privilege Implementation Strategy
If: New service being deployed for the first time with unknown permission requirements
Use: Deploy in staging with a scoped policy covering the expected actions plus a broad read policy for the relevant service. Run real traffic. Use Access Analyzer to generate the exact minimum policy from 7+ days of CloudTrail data.
If: Existing service with overly broad permissions that needs tightening
Use: Enable CloudTrail if not already enabled, let the service run for 7 to 14 days under normal load to capture all code paths, then use Access Analyzer to generate a policy from actual usage. Apply in staging first, validate, then promote to production.
If: Organisation-wide enforcement needed to prevent future overpermissioning
Use: Implement SCPs at the AWS Organizations OU level: deny iam:CreateUser in application accounts, restrict ec2:RunInstances to approved instance types and regions, and deny s3:PutBucketPublicAccessBlock so that Block Public Access settings cannot be switched off.
If: Continuous compliance monitoring across many accounts and roles
Use: Deploy AWS Config rules: iam-policy-no-statements-with-admin-access flags customer managed policies that grant all actions on all resources, iam-policy-blacklisted-check flags identities with the AWS managed AdministratorAccess policy attached, iam-user-no-policies-check flags direct user policy attachments, and iam-root-access-key-check confirms no root access keys exist.

IAM Policy JSON Structure — Effect, Action, Resource, and Condition

Every IAM policy is a JSON document with a specific structure. Understanding this structure down to the field level is essential for writing, auditing, and troubleshooting policies. The core of any policy is the Statement array, where each statement is an independent permission rule.

The table below breaks down each top-level field and the sub-fields inside a statement.

| Field | Required? | Description | Example Value |
|---|---|---|---|
| Version | Yes in practice | Policy language version. If omitted, AWS falls back to the older "2008-10-17", which lacks features such as policy variables, so always set "2012-10-17". | "2012-10-17" |
| Statement | Yes | Array of one or more individual permission statements. Each statement is evaluated independently; if one allows and another denies, the deny wins. | [ { ... } ] |
| Sid | No | Optional statement identifier. Useful for auditing and debugging: you can give each rule a human-readable name. | "AllowS3ReadAccess" |
| Effect | Yes | Either "Allow" or "Deny". Deny always overrides Allow, regardless of the order of statements. | "Deny" |
| Action | Yes | One or more AWS API actions, written with the service prefix, such as s3:GetObject. A wildcard such as ec2:* matches every action in that service. | [ "s3:GetObject", "s3:PutObject" ] or "ec2:*" |
| Resource | Yes (identity-based policies) | One or more ARNs that the action applies to. Use "*" sparingly; each wildcard is an opportunity for privilege escalation. | "arn:aws:s3:::my-bucket/*" or "*" |
| Condition | No | Optional block that specifies when the statement applies. Conditions use operators like StringEquals, IpAddress, ArnLike with keys like aws:SourceIp or aws:RequestedRegion. | { "StringEquals": { "aws:RequestedRegion": "us-east-1" } } |
| Principal | Only in trust and resource-based policies | Defines who the policy applies to. In identity-based policies this field is not allowed; using Principal in a permission policy causes an error. | { "Service": "lambda.amazonaws.com" } or { "AWS": "arn:aws:iam::123456789012:root" } |
| NotAction / NotResource | No | Inverse matching: the statement applies to every action or resource except those listed. Use with care; NotAction combined with Allow often grants far more than intended. | "NotAction": "iam:*" paired with "Effect": "Deny" denies everything except IAM actions |

Key rule for beginners: every statement in an identity-based policy must have Effect, Action, and Resource (trust policies use Principal in place of Resource). The most common mistake is forgetting the Resource field; leaving it out of an identity-based policy causes a validation error when you create or attach the policy. Use the IAM Policy Simulator before putting a new policy into production to catch these mistakes early.

Production insight: When auditing existing policies, look for statements that use "Resource": "*" combined with "Effect": "Allow" and a broad action like s3:* or ec2:*. Each such statement is a potential privilege escalation vector. IAM Access Analyzer's policy validation can flag overly permissive statements, giving you a concrete backlog of items to tighten.
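
The audit described above can be approximated offline before reaching for any AWS tooling. The sketch below is a minimal Python check, not an AWS API: the function names and the sample policy are hypothetical, and it only looks at the simple case of an Allow statement that combines a wildcard resource with a wildcard action.

```python
def as_list(value):
    """IAM accepts either a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def find_broad_statements(policy):
    """Return the Sid (or position) of each Allow statement that pairs
    Resource "*" with a wildcard action such as "*" or "s3:*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may omit the array
        statements = [statements]
    flagged = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if "*" in resources and wildcard_action:
            flagged.append(stmt.get("Sid", f"statement[{index}]"))
    return flagged


# Hypothetical policy used only to exercise the check.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "BroadS3", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {
            "Sid": "ScopedRead",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-bucket/*",
        },
    ],
}

print(find_broad_statements(EXAMPLE_POLICY))  # prints ['BroadS3']
```

A check like this is deliberately conservative: it ignores NotAction, conditions, and partial wildcards such as s3:Get*, so treat a clean result as a starting point rather than proof of least privilege.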

io/thecodeforge/iam/policy_structure_example.jsonJSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDynamoDBReadWrite",
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders"
    },
    {
      "Sid": "DenyDeleteTable",
      "Effect": "Deny",
      "Action": "dynamodb:DeleteTable",
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders"
    }
  ]
}
Output
# Attach this policy to a role; the two statements will be evaluated:
# First statement: Allows GetItem, PutItem, Query on the Orders table.
# Second statement: Denies DeleteTable on the same table.
# Because Deny overrides Allow, the role cannot delete the table even though other actions are permitted.
# Use the simulator to confirm:
# aws iam simulate-principal-policy --policy-source-arn <role-arn> --action-names dynamodb:DeleteTable --resource-arns arn:aws:dynamodb:us-east-1:123456789012:table/Orders
# Expected: {"EvalDecision": "explicitDeny"}
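To make the deny-overrides-allow rule concrete, here is a minimal Python sketch that evaluates the Orders policy above locally. It is a simplification under stated assumptions: the `evaluate` function is hypothetical, it handles only Action and Resource (no Condition, NotAction, boundaries, or SCPs), and it uses shell-style wildcard matching as a stand-in for IAM's own matcher. The return values mirror the simulator's decision names.

```python
from fnmatch import fnmatchcase

# The same two statements as the example policy above.
ORDERS_TABLE = "arn:aws:dynamodb:us-east-1:123456789012:table/Orders"
ORDERS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDynamoDBReadWrite",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
            "Resource": ORDERS_TABLE,
        },
        {
            "Sid": "DenyDeleteTable",
            "Effect": "Deny",
            "Action": "dynamodb:DeleteTable",
            "Resource": ORDERS_TABLE,
        },
    ],
}


def as_list(value):
    return [value] if isinstance(value, str) else list(value)


def matches(patterns, value, ignore_case=False):
    """Wildcard match; action names are case-insensitive, ARNs are not."""
    if ignore_case:
        value = value.lower()
    for pattern in as_list(patterns):
        if ignore_case:
            pattern = pattern.lower()
        if fnmatchcase(value, pattern):
            return True
    return False


def evaluate(policy, action, resource):
    """Return 'explicitDeny', 'allowed', or 'implicitDeny' for one request."""
    decision = "implicitDeny"  # default-deny until a statement says otherwise
    for stmt in policy["Statement"]:
        applies = matches(stmt["Action"], action, ignore_case=True) and matches(
            stmt["Resource"], resource
        )
        if not applies:
            continue
        if stmt["Effect"] == "Deny":
            return "explicitDeny"  # a matching Deny ends evaluation immediately
        decision = "allowed"
    return decision


print(evaluate(ORDERS_POLICY, "dynamodb:DeleteTable", ORDERS_TABLE))  # explicitDeny
print(evaluate(ORDERS_POLICY, "dynamodb:GetItem", ORDERS_TABLE))      # allowed
print(evaluate(ORDERS_POLICY, "dynamodb:Scan", ORDERS_TABLE))         # implicitDeny
```

Note that dynamodb:Scan is blocked even though no statement mentions it: silence is an implicit deny.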
Condition: Your Secret Weapon for Scoping Permissions
Many policies are wildly broad because the writer did not know about Condition keys. For example, instead of allowing s3:GetObject on all buckets, you can write a single statement with a Condition that restricts access to a specific source IP range or a specific VPC endpoint. Condition keys like aws:SourceIp, aws:RequestedRegion, aws:PrincipalOrgID, and aws:ResourceTag are your primary tools for narrowing permissions without adding more statements. Always think: can I add a condition to this Allow to make it narrower?
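As an illustration of narrowing with a Condition, the sketch below builds a statement that allows s3:GetObject only from one IP range and then checks candidate addresses against it. Everything named here is an assumption for the example: the bucket name, the documentation CIDR range 203.0.113.0/24, and the `source_ip_allowed` helper, which only mimics the IpAddress operator for this one key.

```python
import ipaddress

# Hypothetical statement: read access limited to one office network range.
SCOPED_STATEMENT = {
    "Sid": "AllowReadFromOfficeNetwork",
    "Effect": "Allow",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-reports-bucket/*",
    "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
}


def source_ip_allowed(statement, source_ip):
    """Mimic the IpAddress operator: true if the caller's IP is in any listed CIDR."""
    cidrs = statement["Condition"]["IpAddress"]["aws:SourceIp"]
    if isinstance(cidrs, str):
        cidrs = [cidrs]
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


print(source_ip_allowed(SCOPED_STATEMENT, "203.0.113.25"))  # True
print(source_ip_allowed(SCOPED_STATEMENT, "198.51.100.7"))  # False
```

The Action and Resource lists did not grow at all; the Condition alone shrank who can use the permission.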
Production Insight
The Principal field is the most common source of confusion for engineers new to cross-account access. In identity-based policies (attached to a User/Role), Principal is forbidden — AWS tells you who you are from the credentials, and the policy just defines what you can do. In resource-based policies (S3 bucket policy, KMS key policy, SQS queue policy) and trust policies, Principal is mandatory: it specifies who gets the permission. Putting Principal in the wrong kind of policy is a syntax error that prevents deployment.
Wildcard resources ("Resource": "*") are necessary for actions that do not support resource-level permissions (like cloudwatch:PutMetricData, ec2:DescribeInstances, or s3:ListAllMyBuckets) but should be audited rigorously. Actions such as sts:AssumeRole do accept a specific ARN and should be scoped to it. Every wildcard resource that can be replaced with a specific ARN should be. The IAM Policy Simulator can help you test whether a specific ARN would work instead of "*".
Key Takeaway
The IAM policy JSON is a well-defined structure: Version, Statement array, each statement with Effect, Action, Resource, and optional Sid/Condition. NotAction and NotResource exist but are rarely needed. The most impactful skill is knowing how to use Condition keys to narrow permissions without expanding the Action or Resource lists.

Root Account Hardening — MFA and Access Key Removal Checklist

The AWS root user is the most powerful identity in any AWS account. It has unlimited access to every service and cannot be restricted by any IAM policy or permission boundary; the only guardrail that reaches it is an SCP, and only for member accounts in AWS Organizations. For that exact reason, root credentials must never be used for daily operations, and the account must be hardened so that even if credentials leak, the blast radius is limited.

The following checklist covers the essential steps to secure the root user. Complete them as soon as a new account is created, ideally within the first 24 hours.

☐ Enable MFA on the root account
- Go to the IAM console, select "Security credentials" for the root user, and activate a virtual MFA device (Google Authenticator, Authy) or a hardware TOTP token. Do not rely on SMS; it is vulnerable to SIM-swap attacks.
- Store the recovery code (QR code or secret key) in a secure offline location accessible to at least two trusted team members. The most common cause of lockout is losing the MFA device and having no backup.

☐ Remove or disable root access keys
- Root access keys are rarely needed and are a massive security risk because no IAM policy can restrict them. Run aws iam get-account-summary and check the AccountAccessKeysPresent field. If it is 1, delete the keys immediately: sign in as root and remove them under "Security credentials", or run aws iam delete-access-key --access-key-id <key> using the root credentials themselves. The root user is not an IAM user, so no --user-name value targets it.
- If you must keep a root access key for a legacy use case (such as a long-running CloudFormation bootstrap), create a strict rotation policy with a calendar reminder and set up an AWS Config rule (iam-root-access-key-check) to alert if the key exists at all.

☐ Set up an IAM user or role for administrative tasks
- Create a dedicated IAM user for a few break-glass administrators, or better, use IAM Identity Center with role assumption. Never use the root user for anything except the initial account setup.
- Attach the AdministratorAccess policy to this admin identity, then enforce MFA on that user as well.

☐ Configure an email alias for root account recovery
- The root user's contact email address is used for password reset and billing notifications. Ensure it goes to a monitored distribution list (e.g., aws-admin@yourcompany.com) rather than an individual's inbox.
- Keep the phone number for the root account current; it is used as a second factor for some support cases.

☐ Enable CloudTrail for the account
- CloudTrail logs all root user API calls. Create a trail that applies to all regions and delivers logs to a central S3 bucket in a security account. Set up an SNS notification for any root-level action via CloudTrail events with userIdentity.type = Root.

☐ Set up AWS Config rules to monitor root account activity
- Rule: iam-root-access-key-check, which alerts if any root access key exists.
- Rule: root-account-mfa-enabled, which alerts if root MFA is not enabled.
- Rule: cloudtrail-security-trail-enabled, which ensures CloudTrail is logging.

☐ Lock down root user API access with an SCP (for Organizations)
- If your account is part of AWS Organizations, create an SCP that explicitly denies all actions from the root user except GetAccountSummary, ChangePassword, and List* read actions. This prevents the root user from making changes even if credentials are compromised.
- Note: SCPs cannot restrict the root user in a standalone account or in the organization's management account; they apply only to member accounts.

☐ Test the hardening
- Log out, then attempt to sign in as root. Verify MFA works. Attempt an API call with root credentials (if any remain) and confirm it is blocked by SCP or policy.
- Test that your administrative IAM user can perform all required management tasks without ever touching the root credentials.
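
The CloudTrail alerting item in the checklist above comes down to one filter: userIdentity.type equal to Root. The following Python sketch shows that filter applied to already-parsed CloudTrail records. The helper names and the two sample records are hypothetical; a real deployment would more likely express the same match as an EventBridge rule pattern.

```python
def is_root_event(record):
    """True when a CloudTrail record was generated by the account's root user."""
    return record.get("userIdentity", {}).get("type") == "Root"


def root_activity(records):
    """Return (eventTime, eventName) pairs for every root-user event."""
    return [
        (record.get("eventTime"), record.get("eventName"))
        for record in records
        if is_root_event(record)
    ]


# Hypothetical records trimmed to the fields the filter needs.
SAMPLE_RECORDS = [
    {
        "eventTime": "2026-03-06T03:14:00Z",
        "eventName": "ConsoleLogin",
        "userIdentity": {"type": "Root", "arn": "arn:aws:iam::123456789012:root"},
    },
    {
        "eventTime": "2026-03-06T03:15:00Z",
        "eventName": "GetObject",
        "userIdentity": {"type": "AssumedRole"},
    },
]

print(root_activity(SAMPLE_RECORDS))
# prints [('2026-03-06T03:14:00Z', 'ConsoleLogin')]
```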

Production insight: The most common root account incident is not a sophisticated attack — it is an engineer needing to change a support case or update billing and realising they have lost the MFA device. The fix for that is a multi-day support case with AWS. Having a backup MFA recovery code in a password manager accessible to two people prevents this. The second most common incident is a compromised AWS Partner Network (APN) account where the root access key was shared for convenience.

io/thecodeforge/security/root_account_hardening.shBASH
#!/usr/bin/env bash
# io.thecodeforge: Audit root account hardening.
# Run as an admin identity with iam:GetAccountSummary and
# cloudtrail:DescribeTrails; root credentials are not required.
set -euo pipefail

echo "=== Root Account Security Audit ==="
echo "Date: $(date)"
echo ""

# 1. Check if root access keys exist (1 = present, 0 = absent)
ROOT_KEY_STATUS=$(aws iam get-account-summary --query 'SummaryMap.AccountAccessKeysPresent' --output text)
if [ "$ROOT_KEY_STATUS" == "1" ]; then
  echo "⚠ WARNING: Root access keys are present!"
  echo "   Root is not an IAM user, so --user-name cannot target it."
  echo "   Delete as root: console > Security credentials > Access keys"
else
  echo "✓ Good: No root access keys found."
fi

# 2. Check if MFA is enabled on root (1 = enabled, 0 = disabled)
ROOT_MFA=$(aws iam get-account-summary --query 'SummaryMap.AccountMFAEnabled' --output text)
if [ "$ROOT_MFA" == "1" ]; then
  echo "✓ MFA enabled on root user."
else
  echo "⚠ WARNING: Root MFA is NOT enabled. Enable immediately in IAM console."
fi

# 3. Check CloudTrail status
# With --output text the CLI prints "None" when the query matches nothing.
TRAIL_NAME=$(aws cloudtrail describe-trails --query 'trailList[0].Name' --output text 2>/dev/null || echo "None")
if [ -z "$TRAIL_NAME" ] || [ "$TRAIL_NAME" == "None" ]; then
  echo "⚠ WARNING: No CloudTrail trails found. Create one."
else
  echo "✓ CloudTrail trail '$TRAIL_NAME' exists."
fi

echo ""
echo "=== Next Steps ==="
echo "- Enable Config rules: iam-root-access-key-check, root-account-mfa-enabled"
echo "- Set up CloudTrail alert for root activity (EventBridge rule filtering Root user identity)"
echo "- Store MFA recovery code in a secure location"
Output
=== Root Account Security Audit ===
Date: Tue May 12 12:00:00 UTC 2026

⚠ WARNING: Root access keys are present!
   Root is not an IAM user, so --user-name cannot target it.
   Delete as root: console > Security credentials > Access keys
⚠ WARNING: Root MFA is NOT enabled. Enable immediately in IAM console.
⚠ WARNING: No CloudTrail trails found. Create one.

=== Next Steps ===
- Enable Config rules: iam-root-access-key-check, root-account-mfa-enabled
- Set up CloudTrail alert for root activity (EventBridge rule filtering Root user identity)
- Store MFA recovery code in a secure location
Never Use Root Credentials for Daily Operations
The root user is not subject to any IAM policy restrictions. If root credentials are used in a CI/CD pipeline or local environment, a compromise gives an attacker full control of the account with no guardrails. Reserve root for the short list of tasks that require it, such as initial account setup, changing the support plan, and closing the account. Use IAM Identity Center with an AdministratorAccess role for everything else. If you find a root access key in a codebase, treat it as a critical incident and delete the key immediately.
Production Insight
Even after enabling MFA and removing access keys, the root password remains part of the attack surface. MFA stops an attacker who guesses the password from signing in, but a weak password leaves MFA as the only barrier. Ensure the root password is complex, stored in a password manager, and changed every 90 days. For accounts in AWS Organizations, an SCP can deny root actions to add an extra layer:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRootActions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:root"
        }
      }
    }
  ]
}
```
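This SCP blocks every API action made by the root user of a member account. An explicit Deny in an SCP cannot be overridden by an Allow in another SCP, so any root tasks you want to keep available (like changing the password or contact info) must be carved out of this statement itself, for example with NotAction. SCPs do not apply to the organization's management account, so that root user still depends on MFA and key removal. Attach the SCP at the organization root to cover all member accounts.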
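To see which principals the StringLike condition in that SCP actually catches, the sketch below replays the match in Python. It is an approximation: `scp_blocks` is a hypothetical helper, shell-style wildcard matching stands in for IAM's StringLike operator, and the sketch assumes every Deny statement covers all actions and resources, as the SCP above does.

```python
from fnmatch import fnmatchcase

# The SCP from the text above.
ROOT_DENY_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRootActions",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringLike": {"aws:PrincipalArn": "arn:aws:iam::*:root"}
            },
        }
    ],
}


def scp_blocks(scp, principal_arn):
    """True if any Deny statement's aws:PrincipalArn pattern matches the caller."""
    for stmt in scp["Statement"]:
        if stmt["Effect"] != "Deny":
            continue
        patterns = (
            stmt.get("Condition", {}).get("StringLike", {}).get("aws:PrincipalArn")
        )
        if patterns is None:
            return True  # an unconditional Deny applies to every principal
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(fnmatchcase(principal_arn, pattern) for pattern in patterns):
            return True
    return False


print(scp_blocks(ROOT_DENY_SCP, "arn:aws:iam::123456789012:root"))                 # True
print(scp_blocks(ROOT_DENY_SCP, "arn:aws:iam::123456789012:role/AppServerRole"))  # False
```

The wildcard in the account-ID position is what lets a single SCP cover the root user of every member account it is attached to.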
Key Takeaway
Root account hardening is a one-time setup that prevents a class of catastrophic, non-auditable access. Enable MFA, delete all root access keys, set up CloudTrail alerts for root activity, and use an SCP to deny root API actions if using Organizations. Store MFA recovery codes in a team-accessible vault with at least two recovery contacts.

Cross-Account Access — STS AssumeRole Visual Walkthrough

Cross-account access is the ability for a resource or user in one AWS account to access resources in another account. This is essential for multi-account strategies, third-party integrations, and centralised logging or deployment pipelines. For role-based access the mechanism is always the same: the requesting account assumes an IAM Role in the target account using the AWS Security Token Service (STS) via the sts:AssumeRole API call.

Tracing this flow end to end is much easier than reading the policy documents in isolation. The walkthrough below maps the complete handshake between Account A (the caller) and Account B (the target resource owner).

The flow depends on one permission in Account A and up to three policy documents in Account B:

1. Identity policy on the caller (user or role in Account A): grants sts:AssumeRole on the role ARN in Account B.
2. Trust policy on the target IAM Role: allows the calling principal from Account A to assume the role.
3. Permission policy attached to the target role: defines what actions the role can perform after assumption.
4. Resource-based policy on the target resource (e.g., S3 bucket policy, KMS key policy, SQS queue policy): must not deny the assumed role. An explicit allow is needed only where the resource policy is the gatekeeper, for example a KMS key policy that does not delegate to IAM, or a resource owned by an account other than the role's.

Step-by-step process:

1. The caller (a Lambda function in Account A that has its own IAM role with sts:AssumeRole permission) sends an AssumeRole API call to Account B's role ARN.
2. AWS evaluates the trust policy on the target role in Account B. If the caller's principal ARN is allowed (and optional conditions like MFA or ExternalId are satisfied), STS returns temporary credentials (AccessKeyId, SecretAccessKey, SessionToken) that are valid for a configurable duration (default 1 hour, maximum 12 hours; a session obtained through role chaining, as in this Lambda example, is capped at 1 hour).
3. The caller uses those temporary credentials to make API calls to the target resource (e.g., write to an S3 bucket) in Account B.
4. AWS authorizes each call against the role's permission policy and any resource-based policy on the target. A bucket policy with an explicit Deny, or a KMS key policy that neither names the role nor delegates to IAM, still blocks the request.

The caller's sts:AssumeRole permission and the trust policy must both be in place, and no resource-based policy may block the role; a gap in any of these results in AccessDenied. This is the most common debugging pitfall for cross-account setups.

Best practices:
- Always use the ExternalId condition in the trust policy when granting access to a third party. This prevents the confused deputy problem, where another customer of the same third party could trick the service into assuming your role.
- When the trusted principal is an AWS service rather than a third-party account, use the aws:SourceArn or aws:SourceAccount condition keys. They tie the assumption to a specific resource or account of yours and address the same confused deputy risk for service principals.
- Set the role's MaxSessionDuration to the minimum required. For a nightly batch job, 1 hour is enough. For a long-running ETL, maybe 4 hours. Never use the maximum 12 hours unless there is a concrete need.
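
The two gates that decide whether AssumeRole succeeds can be sketched as a small local check. This is a hedged simplification, not how STS is implemented: the function names are hypothetical, principals are compared by exact ARN only, and the only condition handled is StringEquals on sts:ExternalId. The account IDs, role names, and ExternalId value are the illustrative ones used in this section's example.

```python
CALLER_ARN = "arn:aws:iam::111111111111:role/AppServerRole"
ROLE_ARN = "arn:aws:iam::222222222222:role/CrossAccountRole"

# Identity policy on the caller in Account A.
CALLER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": ROLE_ARN}
    ],
}

# Trust policy on the target role in Account B.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": CALLER_ARN},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"sts:ExternalId": "secure-shared-secret-abc123"}
            },
        }
    ],
}


def as_list(value):
    return [value] if isinstance(value, str) else list(value)


def identity_allows_assume(identity_policy, role_arn):
    """Gate 1: the caller's own policy must allow sts:AssumeRole on the role."""
    return any(
        stmt["Effect"] == "Allow"
        and "sts:AssumeRole" in as_list(stmt["Action"])
        and role_arn in as_list(stmt["Resource"])
        for stmt in identity_policy["Statement"]
    )


def trust_allows(trust_policy, caller_arn, external_id=None):
    """Gate 2: the role's trust policy must name the caller and accept the ExternalId."""
    for stmt in trust_policy["Statement"]:
        if stmt["Effect"] != "Allow" or "sts:AssumeRole" not in as_list(stmt["Action"]):
            continue
        if caller_arn not in as_list(stmt["Principal"].get("AWS", [])):
            continue
        required = stmt.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        if required is None or external_id in as_list(required):
            return True
    return False


def can_assume(identity_policy, trust_policy, caller_arn, role_arn, external_id=None):
    return identity_allows_assume(identity_policy, role_arn) and trust_allows(
        trust_policy, caller_arn, external_id
    )


print(can_assume(CALLER_POLICY, TRUST_POLICY, CALLER_ARN, ROLE_ARN, "secure-shared-secret-abc123"))  # True
print(can_assume(CALLER_POLICY, TRUST_POLICY, CALLER_ARN, ROLE_ARN, "wrong-id"))  # False
```

Keeping the two gates as separate functions mirrors the debugging order: confirm the caller side first, then the trust side.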

io/thecodeforge/iam/cross_account_role_example.jsonJSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TrustPolicyForAccountA",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/AppServerRole"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "secure-shared-secret-abc123"
        }
      }
    }
  ]
}
Output
# This is the AssumeRolePolicyDocument (Trust Policy) attached to CrossAccountRole in Account B (222222222222).
# It allows only the specific role in Account A to assume CrossAccountRole, and only if the caller provides the correct ExternalId.
# The Permission Policy on CrossAccountRole would then allow s3:PutObject on the target bucket.
# The caller's identity policy must also include:
# {
# "Effect": "Allow",
# "Action": "sts:AssumeRole",
# "Resource": "arn:aws:iam::222222222222:role/CrossAccountRole"
# }
# To test:
# aws sts assume-role --role-arn arn:aws:iam::222222222222:role/CrossAccountRole --role-session-name test-session --external-id secure-shared-secret-abc123
Debugging Cross-Account Access: Check Both Sides
When cross-account access fails, the error message often says "AccessDenied" but doesn't say which side is wrong. Your debugging process must check three things sequentially: (1) Does the caller have sts:AssumeRole permission on the target role ARN? (2) Does the target role's trust policy allow the caller's principal? (3) Do the role's permission policy and any resource-based policy on the target (bucket policy, KMS key policy) permit the action for the role's ARN? If any of these is missing or wrong, the call fails. Use the IAM Policy Simulator on both the caller and the target role to isolate the issue.
Production Insight
One common cross-account pattern that catches teams off guard is when the target resource uses KMS encryption. The KMS key policy must explicitly grant kms:Decrypt and kms:GenerateDataKey to the assumed role ARN. Without that, you get an AccessDenied from KMS even when S3 and IAM policies are perfect. Always check KMS key policies when debugging cross-account S3 or DynamoDB access with server-side encryption.
Another pitfall: the trust policy's Principal field must use the full ARN of the caller (including the role name), not just the account ID. Using "AWS": "arn:aws:iam::111111111111:role/AppServerRole" is precise. Using "AWS": "111111111111" (the account ID) would allow any principal in that account — including IAM users — which may be too broad. Be explicit.
Key Takeaway
Cross-account access through a role depends on three checks: the caller's permission to assume the role, the target role's trust policy allowing the caller, and the role's permission policy together with any resource-based policy on the target (bucket policy, KMS key policy) permitting the action. The ExternalId condition prevents the confused deputy problem when working with third parties. Always validate both the identity side and the resource side using the Policy Simulator.
● Production incident · POST-MORTEM · severity: high

Compromised Lambda with AdministratorAccess — full account takeover in 4 minutes

Symptom
AWS Cost Explorer shows $47,000 in unexpected EC2 charges overnight. CloudTrail logs reveal three new IAM users created at 3:14 AM with AdministratorAccess attached. S3 access logs show 2TB downloaded from a customer data bucket. The Lambda function's CloudWatch logs show no errors — the function itself was working fine, which is why nobody looked there first.
Assumption
Team initially assumed the cost spike was a misconfigured auto-scaling group. Then they suspected a compromised developer laptop — reasonable, given the IAM user creation events. The Lambda function was never considered because it 'only processes webhooks' and was not a security concern in anyone's mental model.
Root cause
Two weeks prior, a developer debugging an S3 permissions error attached the AdministratorAccess managed policy to the Lambda execution role for a quick test. The debugging was completed, the original error was resolved, and the policy was never removed. The Lambda had a transitive dependency — an npm package — with a known prototype pollution vulnerability that had a published CVE and a public proof-of-concept exploit. An attacker sent a crafted webhook payload that exploited the vulnerability and achieved code execution inside the Lambda runtime. The Lambda's IAM role had AdministratorAccess, so every API call the attacker's code made had the same permissions as a root-adjacent administrator. In 4 minutes: three IAM users created for persistent backdoor access, 50 p3.8xlarge EC2 instances launched for cryptocurrency mining, and 2TB of customer data downloaded from S3.
Fix
1. Immediate: revoked the Lambda role's active STS sessions with a deny-all inline policy conditioned on aws:TokenIssueTime (editing the trust policy alone blocks new AssumeRole calls but does not invalidate tokens already issued), deleted the three backdoor IAM users, and terminated all mining EC2 instances.
2. Short-term: replaced AdministratorAccess with a scoped policy generated by IAM Access Analyzer from the previous 90 days of CloudTrail activity. The actual required permissions were s3:GetObject, s3:PutObject, and logs:CreateLogGroup.
3. Long-term: implemented an AWS Config rule iam-policy-no-statements-with-admin-access to alert within 5 minutes of any role receiving admin permissions. Added an SCP in AWS Organizations denying iam:CreateUser and iam:AttachUserPolicy in all non-security accounts. Added Snyk and Dependabot to the CI/CD pipeline for dependency vulnerability scanning before merge.
Key lesson
  • AdministratorAccess on a workload role is a single point of total account compromise — a code vulnerability in the function is now a full account vulnerability
  • Use IAM Access Analyzer to generate least-privilege policies from actual CloudTrail calls — it takes 10 minutes and eliminates a category of risk entirely
  • A compromised Lambda inherits its role's permissions exactly — the security of the function's code and the security of its IAM role are the same problem
  • AWS Config rules and SCPs provide automated guardrails that catch what code review misses — deploy them before you need them, not after
Production debug guide · Systematic path from AccessDenied error to root cause. · 6 entries
Symptom · 01
AccessDenied with no additional context in the error message.
Fix
Check CloudTrail for the exact API call, the principal ARN, and the error code. The event record contains the errorCode and errorMessage fields that specify which policy evaluation caused the denial. Filter by eventName=<failing-api> and errorCode=AccessDenied. The CloudTrail event is almost always more informative than the client-side error message.
Symptom · 02
AccessDenied even though the permission policy clearly allows the action.
Fix
Check the Trust Policy first — the AssumeRolePolicyDocument on the role itself. If the calling service or principal is not listed in the Trust Policy's Principal block, the role cannot be assumed at all, regardless of what the permission policy says. This is the single most common cause of 'but my policy allows it' debugging sessions. Then check for SCPs at the organization level and Permission Boundaries on the role.
Symptom · 03
Lambda or EC2 works with local credentials but gets AccessDenied in AWS.
Fix
Your local credentials are your IAM user's permissions. The Lambda or EC2 runs under its execution role, which has different — and usually more restricted — permissions. Run aws sts get-caller-identity inside the Lambda (via a log statement) or on the EC2 instance to confirm which role is actually active. Then check that role's attached policies, not your user's policies.
Symptom · 04
AccessDenied for cross-account resource access despite both sides configured.
Fix
Cross-account access through a role has two sides to verify: the Trust Policy on the target role must allow the source account's principal to assume it, AND any resource-based policy on the target resource (S3 bucket policy, KMS key policy, SQS queue policy) must permit the role's ARN, or at least not deny it. If either side blocks the request, the access fails. Check both sides before assuming either is wrong.
Symptom · 05
Intermittent AccessDenied — works sometimes, fails other times.
Fix
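Check for Service Control Policies at the AWS Organizations level. An explicit Deny in an SCP cannot be overridden by any identity-based or resource-based policy in the account. Intermittent failures often occur when requests route through different accounts or when SCPs are applied at the OU level rather than the account level. Run aws organizations list-policies-for-target --target-id <account-id> --filter SERVICE_CONTROL_POLICY and examine the results. The command returns only directly attached policies, so repeat it for each parent OU (found with aws organizations list-parents --child-id <account-id>) and for the organization root to see inherited SCPs.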
Symptom · 06
AccessDenied for an action that worked yesterday with no apparent changes.
Fix
Check if a Permission Boundary was recently attached to the role. Permission boundaries set a maximum ceiling on what a role can do — even if the permission policy explicitly allows an action, the boundary can deny it, and the denial wins. Run aws iam get-role --role-name <name> and look for the PermissionsBoundary field in the response. Permission boundaries are often added by platform teams without notifying application teams.
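The first entry's CloudTrail check can be sketched in Python: given one raw CloudTrail event (the JSON string found in the CloudTrailEvent field of lookup-events output), pull out the fields that identify who was denied and for what. The `summarize_denial` helper and the sample event are hypothetical, and the error-code test is a heuristic because services spell their denial codes differently.

```python
import json


def summarize_denial(raw_event):
    """Extract the fields needed to debug a denied call from one CloudTrail event."""
    event = json.loads(raw_event)
    code = event.get("errorCode", "")
    return {
        # Heuristic: denial codes vary by service (AccessDenied,
        # AccessDeniedException, Client.UnauthorizedOperation, ...).
        "denied": "AccessDenied" in code or "Unauthorized" in code,
        "principal": event.get("userIdentity", {}).get("arn"),
        "eventSource": event.get("eventSource"),
        "eventName": event.get("eventName"),
        "errorMessage": event.get("errorMessage"),
    }


# Hypothetical event trimmed to the relevant fields.
SAMPLE_EVENT = json.dumps(
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "errorCode": "AccessDenied",
        "errorMessage": "Access Denied",
        "userIdentity": {
            "type": "AssumedRole",
            "arn": "arn:aws:sts::123456789012:assumed-role/WebhookRole/webhook-fn",
        },
    }
)

summary = summarize_denial(SAMPLE_EVENT)
print(summary["denied"], summary["principal"], summary["eventName"])
```

The principal ARN is usually the most valuable field: it shows which role was actually in effect, which is often not the one you expected.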
★ IAM Debug Cheat Sheet · Quick commands to diagnose IAM permission issues in production.
AccessDenied — need to identify which policy is causing the denial.
Immediate action
Check CloudTrail for the exact event and then simulate it with the policy simulator.
Commands
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=<api-action> --max-results 5 --query 'Events[*].CloudTrailEvent' --output text | jq .
aws iam simulate-principal-policy --policy-source-arn <role-arn> --action-names <failing-action> --resource-arns <resource-arn>
Fix now
The simulator output shows exactly which statement evaluated to deny and at which policy level. Fix that specific statement — do not add more Allows hoping one will stick.
Need to see all policies attached to a role, both managed and inline.
Immediate action
List both managed and inline policies separately — AWS separates them in the API.
Commands
aws iam list-attached-role-policies --role-name <role-name>
aws iam list-role-policies --role-name <role-name>
Fix now
For each managed policy ARN returned, inspect its document: aws iam get-policy-version --policy-arn <arn> --version-id <version>. For inline policies: aws iam get-role-policy --role-name <name> --policy-name <name>
Lambda or EC2 may be using the wrong credentials; need to verify the active role.
Immediate action
Call STS from inside the running workload to confirm the actual identity.
Commands
aws sts get-caller-identity
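TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60"); curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/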
Fix now
If the role ARN does not match your expectation, check the Lambda's execution role in its function configuration or the EC2 instance's attached Instance Profile — not just the role definition itself.
Suspect an SCP is blocking access at the organization level.
Immediate action
List SCPs applied to the account and inspect their statements.
Commands
aws organizations list-policies-for-target --target-id <account-id> --filter SERVICE_CONTROL_POLICY
aws organizations describe-policy --policy-id <scp-id> | jq '.Policy.Content | fromjson'
Fix now
Look for explicit Deny statements matching your failing action. SCPs override all identity-based and resource-based policies — an SCP deny cannot be overridden by any IAM policy at the account level.
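Scanning an SCP for Deny statements that match a failing action can be done mechanically. The sketch below takes a dict shaped like describe-policy output (where the policy body is a JSON string under Policy.Content) and lists the matching Deny statements. The helper names and the sample SCP are hypothetical, conditions are ignored, and shell-style wildcard matching approximates IAM's action matching.

```python
import json
from fnmatch import fnmatchcase


def as_list(value):
    return [value] if isinstance(value, str) else list(value)


def any_match(patterns, action):
    """Case-insensitive wildcard match of one action against a pattern list."""
    action = action.lower()
    return any(fnmatchcase(action, p.lower()) for p in as_list(patterns))


def matching_denies(describe_policy_output, action):
    """Return the Sid (or position) of each Deny statement covering the action."""
    document = json.loads(describe_policy_output["Policy"]["Content"])
    statements = document["Statement"]
    if isinstance(statements, dict):
        statements = [statements]
    hits = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Deny":
            continue
        if "Action" in stmt:
            applies = any_match(stmt["Action"], action)
        else:  # NotAction: the Deny covers everything except the listed actions
            applies = not any_match(stmt.get("NotAction", []), action)
        if applies:
            hits.append(stmt.get("Sid", f"statement[{index}]"))
    return hits


# Hypothetical SCP, wrapped the way describe-policy returns it.
SAMPLE_OUTPUT = {
    "Policy": {
        "Content": json.dumps(
            {
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Sid": "DenyUserCreation",
                        "Effect": "Deny",
                        "Action": ["iam:CreateUser", "iam:AttachUserPolicy"],
                        "Resource": "*",
                    },
                    {
                        "Sid": "DenyOutsideApprovedServices",
                        "Effect": "Deny",
                        "NotAction": ["s3:*", "logs:*", "iam:*"],
                        "Resource": "*",
                    },
                ],
            }
        )
    }
}

print(matching_denies(SAMPLE_OUTPUT, "iam:CreateUser"))    # ['DenyUserCreation']
print(matching_denies(SAMPLE_OUTPUT, "ec2:RunInstances"))  # ['DenyOutsideApprovedServices']
print(matching_denies(SAMPLE_OUTPUT, "s3:GetObject"))      # []
```

The NotAction branch matters in practice: allow-list style SCPs deny by omission, so the failing action never appears in the document text.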
IAM Role vs IAM User with Access Key
| Aspect | IAM Role | IAM User with Access Key |
|---|---|---|
| Credential lifetime | Temporary: 15 minutes to 12 hours, issued fresh by STS on every assumption | Permanent: never expires until manually rotated or the key is explicitly deleted |
| Best used for | EC2 instances, Lambda functions, ECS tasks, CI/CD pipelines via OIDC, cross-account access | Legacy scripts or third-party tools that have no support for role assumption or OIDC federation |
| Rotation required? | No. STS issues new credentials on every assumption, so a role credential cannot go stale | Yes. Manual rotation is mandatory, commonly every 90 days, and it is frequently skipped or forgotten |
| Leaked credential impact | Low: the token expires within hours and cannot be renewed without re-assuming the role through the trust policy | High: the Access Key works indefinitely until someone notices and manually revokes it; discovery often takes days or weeks |
| Multi-account support | Yes. Role chaining across accounts is native; a role in Account A can assume a role in Account B with appropriate trust policies | Limited. An Access Key belongs to a user in a single account and reaches other accounts only by assuming a role there |
| MFA enforcement | Can require MFA to assume the role via the aws:MultiFactorAuthPresent condition key in the Trust Policy | MFA can be required for console access; programmatic Access Key usage is not MFA-protected unless policies explicitly require aws:MultiFactorAuthPresent |
| AWS recommended for workloads? | Yes. AWS security guidance recommends Roles for all workload identities; it is the designed pattern | No. AWS guidance recommends temporary credentials over long-term Access Keys for workloads |

Key takeaways

1. Roles beat Users for every machine identity: STS tokens auto-expire in hours, while Access Keys do not expire until someone manually revokes them. A leaked temporary credential becomes useless. A leaked Access Key works until someone notices, which is often measured in days.
2. There are two completely separate policies on every IAM Role: the Trust Policy controlling who can assume the role, and the Permission Policy controlling what the role can do. Both must be correct. The Trust Policy is evaluated first; if it fails, the Permission Policy is never consulted.
3. An explicit Deny in any policy always wins over an Allow, regardless of what any other policy says. AWS IAM is default-deny: silence equals blocked, and every request must earn its way to allowed through explicit grants.
4. IAM Access Analyzer's policy generation feature is the practical path to least privilege: run your application in staging with CloudTrail enabled, generate a policy from actual API calls, apply in production. It converts least privilege from a guessing exercise into a data-driven process.

Common mistakes to avoid

5 patterns

Using AdministratorAccess on a Lambda or EC2 execution role to avoid debugging permission errors

Symptom
The application works correctly, so nobody notices. But if the function is compromised through a code vulnerability or a dependency with a CVE, the attacker has full account access — every IAM action, every service, every region. A single npm package vulnerability becomes a company-wide incident.
Fix
Use IAM Access Analyzer to generate the exact minimum policy from 7+ days of staging CloudTrail data. It takes 10 minutes. The resulting policy contains only the specific actions and resources the application actually used. Apply that policy to production and set up a Config rule to alert if permissions ever broaden beyond the approved baseline.

Forgetting the InstanceProfile wrapper when attaching an IAM Role to EC2 in CloudFormation

Symptom
You create a perfect IAM Role in CloudFormation, deploy successfully, and EC2 throws 'Unable to locate credentials' at runtime. The role exists, the permissions are correct, but the instance cannot access credentials because no Instance Profile was created.
Fix
Always create a separate AWS::IAM::InstanceProfile resource and attach the Role to it. Then reference the Instance Profile (not the Role) in your EC2 resource: AWS::EC2::Instance takes the profile name in IamInstanceProfile, while a launch template accepts the profile's name or ARN. The EC2 console hides this by creating the Instance Profile automatically; CloudFormation does not.
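The Role, Instance Profile, and Instance wiring is easy to get wrong, so here is a hedged sketch that builds a minimal CloudFormation template as a Python dict and checks that every instance points at an Instance Profile rather than at a Role. The logical IDs, the placeholder AMI ID, and the `misattached_instances` helper are all hypothetical, and the check only understands references written as {"Ref": ...}.

```python
import json

# Minimal template: the logical IDs and the AMI ID are placeholders.
TEMPLATE = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AppRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Effect": "Allow",
                            "Principal": {"Service": "ec2.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }
                    ],
                }
            },
        },
        # The wrapper that is easy to forget: it carries the Role to EC2.
        "AppInstanceProfile": {
            "Type": "AWS::IAM::InstanceProfile",
            "Properties": {"Roles": [{"Ref": "AppRole"}]},
        },
        "AppInstance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-EXAMPLE",  # placeholder, not a real AMI
                "InstanceType": "t3.micro",
                "IamInstanceProfile": {"Ref": "AppInstanceProfile"},
            },
        },
    },
}


def misattached_instances(template):
    """List EC2 instances whose IamInstanceProfile is missing or is a Ref
    to something other than an AWS::IAM::InstanceProfile resource."""
    resources = template["Resources"]
    problems = []
    for name, resource in resources.items():
        if resource["Type"] != "AWS::EC2::Instance":
            continue
        profile = resource["Properties"].get("IamInstanceProfile")
        if isinstance(profile, str):
            continue  # a literal profile name cannot be checked from the template
        ref = profile.get("Ref") if isinstance(profile, dict) else None
        if resources.get(ref, {}).get("Type") != "AWS::IAM::InstanceProfile":
            problems.append(name)
    return problems


print(misattached_instances(TEMPLATE))  # prints []
print(json.dumps(TEMPLATE["Resources"]["AppInstanceProfile"], indent=2))
```

Pointing IamInstanceProfile at AppRole instead would make the check return ['AppInstance'], which is the situation behind the 'Unable to locate credentials' symptom described above.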

Writing a Trust Policy with Principal set to asterisk with no conditions

Symptom
Any AWS principal in any account anywhere can assume your role. A security audit flags it as a critical finding. In a worst case, an attacker who discovers the role ARN can assume it from their own AWS account without any stolen credentials.
Fix
Always scope the Principal to a specific AWS service endpoint (lambda.amazonaws.com), a specific account ID, or a specific ARN. For cross-account access within your own organization, add a Condition requiring aws:PrincipalOrgID to match your AWS Organization ID, which ensures only principals within your organisation can assume the role even if account IDs change. For third parties outside your organization, require an sts:ExternalId condition instead.

Storing Access Keys in ~/.aws/credentials for automation instead of using role assumption or OIDC

Symptom
A developer's laptop is stolen. The Access Key in credentials works indefinitely until the security team notices and manually revokes it — typically hours to days later. API access bypasses MFA entirely, so there is no second factor protecting the key.
Fix
Implement AWS IAM Identity Center linked to your corporate identity provider. Developers authenticate with their corporate credentials and receive temporary credentials expiring in 1 to 12 hours. For CI/CD, configure OIDC federation so pipelines assume a role using their identity token. No stored credentials exist to steal, rotate, or accidentally commit to a repository.

Debugging AccessDenied without checking SCPs at the AWS Organizations level

Symptom
The permission policy allows the action, the Trust Policy is correct, no Permission Boundary is attached, but AccessDenied persists. The developer spends hours modifying the permission policy, running simulations, and re-reading the documentation before someone mentions the account is in an AWS Organization.
Fix
Always check SCPs early in the debugging process: aws organizations list-policies-for-target --target-id <account-id> --filter SERVICE_CONTROL_POLICY. Inspect each SCP for explicit Deny statements matching your failing action. An SCP Deny overrides every other policy type and cannot be overridden by any policy within the account.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR: What is the difference between a Trust Policy and a Permission Policy on...
Q02 · SENIOR: A Lambda function in Account A needs to read from DynamoDB in Account A ...
Q03 · SENIOR: Explain the Confused Deputy problem in AWS IAM. How does the ExternalId ...
Q04 · SENIOR: Describe the IAM policy evaluation order when Identity-based, Resource-b...
Q05 · SENIOR: A developer's Access Key was committed to a public GitHub repository. Wa...
Q01 of 05 · SENIOR

What is the difference between a Trust Policy and a Permission Policy on an IAM Role? What happens if the Trust Policy is missing the service that needs to use the role?

ANSWER
A Permission Policy defines what actions a role can perform — it is attached to the role and specifies allowed or denied API calls, resource targets, and conditions. A Trust Policy defines who is allowed to assume the role — it lives in the AssumeRolePolicyDocument on the role itself and is evaluated before the permission policy is ever consulted. If the Trust Policy does not list a service (for example, lambda.amazonaws.com), that service cannot assume the role regardless of what permissions the role has. The role is effectively invisible to that service. This is the most common cause of 'my policy allows it but I still get AccessDenied' — the Permission Policy is not the problem, the Trust Policy gate never opened.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
What is the difference between an IAM Role and an IAM User in AWS?
02
What does 'least privilege' mean in AWS IAM and how do you actually achieve it in practice?
03
Why does my EC2 instance say 'Unable to locate credentials' even though I created an IAM Role for it?
04
What is the difference between an Inline Policy and a Managed Policy in IAM?