Mid-level 14 min · March 06, 2026
AWS S3 Basics

S3 Principal * Bucket Policy — Why 50K SSNs Hit Google

One Principal * bucket policy made 50K customer SSNs searchable on Google.

Naren, Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

Production tested · Last updated May 23, 2026
Quick Answer
  • S3 stores data as objects in globally unique buckets
  • Buckets are region-scoped; objects have keys (flat namespace) - not folders
  • Access control: bucket policies (persistent) vs presigned URLs (time-limited)
  • Lifecycle policies auto-tier objects (Standard → IA → Glacier) to cut costs by up to 95%
  • Multipart uploads required for files >5GB; incomplete uploads silently cost money
  • Enable versioning on every production bucket to prevent accidental data loss
What is AWS S3?

Amazon S3 (Simple Storage Service) is object storage at planetary scale — you store files as objects in buckets, each object gets a URL, and you pay for what you use. The core problem S3 solves is durable, infinitely scalable storage without managing servers.


But the real trap is access control: a single bucket policy statement that allows s3:GetObject to Principal: "*" on every object makes the whole bucket publicly readable. That's how 50K Social Security Numbers end up indexed by Google: no authentication required, just a GET request to the object URL.

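S3 Block Public Access, enabled by default on new buckets since April 2023, rejects a policy like that. Once someone switches it off, though, S3 won't stop you; it faithfully executes the policy you wrote.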

S3's structure is deceptively simple: buckets are flat containers (no folders, just prefix-based key naming), objects are the files themselves (up to 5TB each), and every object has a unique key like reports/2024/ssn-export.csv. Access control happens at two levels: bucket policies (resource-based, attached to the bucket itself) and IAM policies (identity-based, attached to users/roles).

Bucket policies with Principal: "*" are the number one cause of data leaks because they grant anonymous access; no AWS account is needed. IAM policies are safer because they only ever apply to authenticated AWS identities, but on their own they can't give another account access to your bucket. That takes a bucket policy (or a role the other account is allowed to assume).

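When NOT to use S3: for structured data you need to query (use RDS for relational data or DynamoDB for key-value access), for low-latency random access on small files (use EBS or instance store), or when you need POSIX file system semantics (use EFS or FSx). S3 is optimized for throughput, not IOPS: AWS documents a baseline of at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, and spreading keys across more prefixes scales that out.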

Real-world numbers: Netflix stores 8+ PB of video assets, Reddit uses S3 for all user uploads, and Snowflake stores compressed data in S3. The 11 9s durability design target (99.999999999%) makes losing an object to hardware failure vanishingly unlikely, but losing control of it through a bad bucket policy, by leaking it or by locking yourself out, is entirely possible and happens daily.
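To put the 11 nines in perspective, here is a back-of-envelope sketch. It assumes the design durability can be read as an independent annual loss probability of 1e-11 per object, which is a simplification for intuition rather than an SLA; under that assumption it reproduces AWS's own framing that 10 million stored objects lose, on average, one object every 10,000 years. The helper names are hypothetical.

```python
# Back-of-envelope: expected object loss at S3's 11-nines design durability.
# Assumption: 99.999999999% durability ~ an independent 1e-11 annual loss
# probability per object. This is an intuition aid, not an SLA.

ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_losses_per_year(object_count: int) -> float:
    """Expected number of objects lost per year across object_count objects."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_single_loss(object_count: int) -> float:
    """Average number of years before a single object is expected to be lost."""
    return 1 / expected_losses_per_year(object_count)


if __name__ == "__main__":
    n = 10_000_000
    print(f"{n:,} objects: {expected_losses_per_year(n):.0e} expected losses per year")
    print(f"about one lost object every {years_per_single_loss(n):,.0f} years")
```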

Plain-English First

Imagine a massive warehouse with unlimited shelf space. You rent a named section of that warehouse (a 'bucket'), and inside it you can store any box (a 'file') of any size. Each box gets a unique label so anyone — or only you — can find it later. That's S3: Amazon's unlimited, pay-per-byte file warehouse in the cloud, accessible from anywhere on the internet.

Every modern application eventually needs somewhere to put files: profile pictures, invoice PDFs, video uploads, database backups, static websites. The moment you outgrow a single server's disk, you need distributed, durable storage. AWS S3 is where the industry landed, and it has powered everything from Netflix's video catalogue to your favourite startup's user uploads since it launched in 2006. If you're building anything serious on AWS, S3 is the first service you'll touch.

Before S3, teams had to spin up dedicated file servers, worry about disk failures, manually handle replication, and scramble when traffic spiked. S3 solved all of that in one API. It stores your data across multiple physical locations automatically, scales to literally exabytes without any configuration, and charges you only for what you use. The result: you stop thinking about storage infrastructure and start thinking about your product.

By the end of this article you'll know exactly how S3 is structured, how to create buckets and upload objects using both the AWS CLI and the Python Boto3 SDK, how to control who can access your data and when, and the real-world patterns that experienced engineers use daily — plus the costly gotchas that trip up everyone the first time.

Why a Single S3 Bucket Policy Leaked 50K SSNs

S3 bucket policies are JSON resource-based policies that define who can access a bucket and which actions they can perform on which objects. Unlike IAM policies, which are attached to users or roles, bucket policies are attached directly to the bucket and can name principals in other AWS accounts (those principals still need an IAM policy in their own account that allows the action). The dangerous part is that the policy's Principal element can be set to "*" to allow anonymous access, which is the single most common cause of data exposure.

When you attach a bucket policy with Principal: "*" and an Allow effect, you are explicitly granting access to every unauthenticated user on the internet. For requests within the same account, AWS evaluates bucket policies and IAM policies together: an Allow in either one grants access unless an explicit Deny applies. This means a single misconfigured policy can open access that no other control has granted. Object ACLs won't save you either: ACLs can only grant permissions, never deny them, so a private object ACL does not stop a bucket policy that allows public reads.
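The combination rule in the paragraph above can be modelled in a few lines. This is a toy sketch, not AWS's evaluation engine: `evaluate` is a hypothetical helper, it assumes the statements have already been filtered down to those matching the request, and it ignores cross-account rules, SCPs, permission boundaries, session policies, and Block Public Access.

```python
# Toy model of how S3 combines policies for a same-account request:
# collect every matching statement from the bucket policy AND the caller's
# IAM policies, then
#   1. any explicit Deny wins,
#   2. otherwise any Allow grants access,
#   3. otherwise the request is implicitly denied.
from typing import Iterable


def evaluate(effects: Iterable[str]) -> str:
    """Return 'Deny', 'Allow' or 'ImplicitDeny' for the matching statements' effects."""
    effects = list(effects)
    if "Deny" in effects:
        return "Deny"
    if "Allow" in effects:
        return "Allow"
    return "ImplicitDeny"


# An anonymous caller has no IAM policy, so only the bucket policy matters.
bucket_policy_effects = ["Allow"]  # e.g. a Principal "*" s3:GetObject statement
iam_policy_effects: list[str] = []

print(evaluate(bucket_policy_effects + iam_policy_effects))
print(evaluate(["Allow", "Deny"]))
print(evaluate([]))
```

The first call shows why a wildcard Allow is enough on its own: nothing else has to agree.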

Use bucket policies when you need to grant cross-account access, enforce HTTPS-only access, or set a blanket permission across all objects in a bucket. They are essential for public website hosting, but dangerous for any bucket containing sensitive data. High-profile exposures such as the 2017 Verizon incident (up to 14 million customer records) and the Pegasus Airlines leak reported in 2022 both traced back to S3 buckets left open to public reads.

Principal: "*" Is Not a Test
Setting Principal to "*" in a bucket policy grants access to every anonymous user on the internet — not just authenticated AWS users. This is the #1 cause of S3 data leaks.
Production Insight
A data engineering team set Principal: "*" on a bucket containing 50K SSNs for a 'quick test' and forgot to remove it. The bucket was indexed by Google within 48 hours. Symptom: the bucket's ListObjects operation returned 200 OK to any curl request without credentials. Rule: never use Principal: "*" in a bucket policy unless the bucket is explicitly designed for public content, and even then restrict actions to s3:GetObject only.
Key Takeaway
Bucket policies with Principal: "*" grant anonymous access — there is no implicit authentication check.
Always test bucket policies with the AWS Policy Simulator before applying to production.
Use IAM policies for user/role access and bucket policies only for cross-account or service-level controls.
Figure: S3 Bucket Policy vs IAM Policy Access Control (thecodeforge.io). How a misconfigured bucket policy leaked 50K SSNs to Google: a bucket policy (resource-based, attached to the bucket) with a wildcard Principal allows any principal, including anonymous callers, to call s3:GetObject, and the SSNs were crawled and indexed. An IAM policy is identity-based and attached to a user or role. Warning: never use Principal: * in a bucket policy without conditions; restrict it to specific IAM roles or use aws:SourceArn.
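Before a policy ever reaches a bucket, a quick pre-flight check can catch the 'quick test' pattern from the incident above. The sketch below is a hypothetical helper, not an AWS tool. It only flags Allow statements whose Principal is a bare wildcard and which carry no Condition, so a wildcard statement with a weak condition can still be public, and IAM Access Analyzer remains the authoritative check.

```python
import json


def find_public_statements(policy: dict) -> list[str]:
    """Return the Sid (or index) of each unconditioned Allow open to any principal.

    Hypothetical helper: a rough lint, not a replacement for IAM Access Analyzer.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    flagged = []
    for index, stmt in enumerate(statements):
        principal = stmt.get("Principal")
        # Both "Principal": "*" and "Principal": {"AWS": "*"} mean everyone.
        is_wildcard = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") in ("*", ["*"])
        )
        if stmt.get("Effect") == "Allow" and is_wildcard and "Condition" not in stmt:
            flagged.append(stmt.get("Sid", f"statement[{index}]"))
    return flagged


leaky_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "QuickTest", "Effect": "Allow", "Principal": "*",
     "Action": ["s3:GetObject", "s3:ListBucket"],
     "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"]}
  ]
}
""")
print(find_public_statements(leaky_policy))
```

Wiring a check like this into CI, before `put_bucket_policy` runs, turns the 'forgot to remove it' failure into a failed build.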

Buckets and Objects — How S3 is Actually Structured

S3 has exactly two building blocks: buckets and objects. A bucket is a top-level container — think of it as a named partition of Amazon's storage infrastructure. Every bucket name must be globally unique across all AWS accounts everywhere. Not just unique to you — unique to every person using S3 on Earth. That's why 'images' is taken, but 'acme-corp-product-images-prod' probably isn't.
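The naming rules can be checked locally before you ever call AWS. A minimal sketch covering the core rules for general purpose buckets; `bucket_name_problems` is a hypothetical helper, AWS documents a few more reserved prefixes and suffixes than the two checked here, and global uniqueness can only be confirmed by actually creating the bucket.

```python
import re

# Core naming rules for general purpose buckets (a subset of AWS's full list).
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
_IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name: str) -> list[str]:
    """Return the naming rules `name` breaks (an empty list means it looks valid)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be 3-63 characters long")
    if not _ALLOWED.fullmatch(name):
        problems.append(
            "only lowercase letters, digits, '.' and '-'; start and end with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    if _IP_LIKE.fullmatch(name):
        problems.append("must not be formatted as an IP address")
    if name.startswith("xn--") or name.endswith("-s3alias"):
        problems.append("uses a reserved prefix or suffix")
    return problems


for candidate in ("acme-corp-user-uploads-dev", "Acme_Uploads", "192.168.5.4"):
    print(candidate, bucket_name_problems(candidate))
```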

An object is any file you store inside a bucket. It has three parts: the key (the file path, e.g. 'invoices/2024/jan/invoice-001.pdf'), the data itself (up to 5TB per object), and metadata (key-value pairs like content type or custom tags).

Here's the important mental model shift: S3 is NOT a filesystem. There are no real folders. 'invoices/2024/jan/' is just a prefix in the key name. The AWS console and SDK simulate folders for your convenience, but under the hood every object lives flat in the bucket identified by its full key string. This matters when you're listing, filtering, or managing costs at scale.
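To see how the console's 'folders' fall out of flat keys, the sketch below mimics what ListObjectsV2 does when you pass Prefix and Delimiter='/': keys with no further delimiter come back as Contents, and everything deeper is rolled up into CommonPrefixes. `list_like_s3` is a hypothetical offline model for illustration only. Against a real bucket the call is `s3_client.list_objects_v2(Bucket=..., Prefix='invoices/', Delimiter='/')`, which returns CommonPrefixes as a list of `{'Prefix': ...}` dicts.

```python
def list_like_s3(keys, prefix="", delimiter="/"):
    """Group flat keys the way a Prefix + Delimiter listing does (offline model)."""
    contents, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            contents.append(key)  # no further delimiter: a "file" at this level
        else:
            common_prefixes.add(prefix + rest[: cut + 1])  # rolled up into a "folder"
    return {"Contents": contents, "CommonPrefixes": sorted(common_prefixes)}


keys = [
    "invoices/2024/jan/invoice-001.pdf",
    "invoices/2024/feb/invoice-002.pdf",
    "invoices/readme.txt",
    "avatars/user-4821/profile.jpg",
]
print(list_like_s3(keys, prefix="invoices/"))
```

Nothing called `invoices/2024/` exists in the bucket; the "folder" is computed from key strings on every listing.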

Buckets are also region-specific. When you create a bucket in us-east-1, your data lives there unless you explicitly set up replication. Always create buckets in the region closest to your users or your compute layer.

s3_bucket_and_object_basics.sh (BASH)
#!/bin/bash
# ── Prerequisites: AWS CLI installed and configured with `aws configure` ──

# 1. Create a new bucket in us-east-1
#    Bucket names: lowercase, 3-63 chars, no underscores, globally unique
aws s3api create-bucket \
  --bucket acme-corp-user-uploads-dev \
  --region us-east-1
# Note: us-east-1 is the only region that does NOT need a LocationConstraint.
# Every other region requires --create-bucket-configuration LocationConstraint=<region>

# 2. Upload a local file as an S3 object
#    The key here is 'avatars/user-4821/profile.jpg'
#    S3 doesn't create a folder -- 'avatars/user-4821/' is just part of the key name
aws s3 cp ./profile.jpg \
  s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg

# 3. List objects using a prefix filter (simulates folder browsing)
#    This returns ONLY objects whose key starts with 'avatars/user-4821/'
aws s3 ls s3://acme-corp-user-uploads-dev/avatars/user-4821/

# 4. Download the object back to verify the round-trip
aws s3 cp \
  s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg \
  ./profile_downloaded.jpg

# 5. Delete the object
aws s3 rm s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg
Output
# Step 1 -- create-bucket
{
"Location": "/acme-corp-user-uploads-dev"
}
# Step 2 -- cp upload
upload: ./profile.jpg to s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg
# Step 3 -- ls
2024-06-12 14:03:22 84231 profile.jpg
# Step 4 -- cp download
download: s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg to ./profile_downloaded.jpg
# Step 5 -- rm
delete: s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg
Watch Out: The us-east-1 LocationConstraint Trap
If you run create-bucket with --create-bucket-configuration LocationConstraint=us-east-1, AWS throws a cryptic 'InvalidLocationConstraint' error. us-east-1 is the default region and must NOT have a LocationConstraint. Every other region — eu-west-1, ap-southeast-2, etc. — requires it. This catches almost everyone on their first cross-region script.
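The same trap applies to Boto3's `create_bucket`. A minimal sketch, assuming you choose the region yourself: the hypothetical `create_bucket_kwargs` helper only adds `CreateBucketConfiguration` when the region is not us-east-1, and the `acme-corp-user-uploads-eu` bucket name is made up for illustration.

```python
def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build create_bucket arguments; us-east-1 must NOT get a LocationConstraint."""
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket: str, region: str) -> dict:
    """Create the bucket in `region` (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the pure helper above runs without boto3

    s3_client = boto3.client("s3", region_name=region)
    return s3_client.create_bucket(**create_bucket_kwargs(bucket, region))


print(create_bucket_kwargs("acme-corp-user-uploads-dev", "us-east-1"))
print(create_bucket_kwargs("acme-corp-user-uploads-eu", "eu-west-1"))
```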
Production Insight
Using prefix-based folder simulation means listing 10M objects under one prefix is slow.
ListObjectsV2 returns at most 1,000 keys per call, so those 10M keys take at least 10,000 sequential paginated requests.
Rule: design your key hierarchy to keep object counts per prefix under 100k for fast listing.
Key Takeaway
S3 object keys are flat strings, not real file paths.
'nested/folders/' is just a key prefix.
Design key structure for performance, not aesthetics.
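Whatever the key layout, listing code has to page through results 1,000 keys at a time. A minimal sketch using Boto3's built-in `list_objects_v2` paginator; `iter_keys` and `min_list_calls` are hypothetical helper names, and the call count is a lower bound that ignores retries and throttling.

```python
import math

MAX_KEYS_PER_CALL = 1000  # ListObjectsV2 returns at most 1,000 keys per response


def min_list_calls(object_count: int) -> int:
    """Lower bound on the sequential ListObjectsV2 calls needed to list object_count keys."""
    return max(1, math.ceil(object_count / MAX_KEYS_PER_CALL))


def iter_keys(s3_client, bucket: str, prefix: str = ""):
    """Yield every key under prefix; the paginator follows continuation tokens for you."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):  # pages with no matches omit 'Contents'
            yield obj["Key"]


print(min_list_calls(10_000_000))
# Usage (needs boto3 and credentials):
# import boto3
# for key in iter_keys(boto3.client("s3"), "acme-corp-user-uploads-dev", "avatars/"):
#     print(key)
```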

Access Control — Who Can See Your Files and Why It Matters

By default, every S3 bucket and every object inside it is completely private. Nothing is publicly accessible unless you deliberately make it so. This is the right default, but it means you need to understand the two main ways to grant access: bucket policies and presigned URLs.

A bucket policy is a JSON document attached to the bucket that grants broad, persistent permissions — for example, allowing your application's IAM role to read everything under the 'invoices/' prefix, or making a 'public-assets/' prefix readable by the entire internet for a static website. Bucket policies are evaluated on every request to that bucket, so they're perfect for service-to-service access.

A presigned URL is the smarter choice for user-facing file access. Instead of making objects public, you generate a time-limited URL server-side that temporarily grants access to one specific object. The URL carries your access key ID, an expiry, and a signature computed from your secret key (the secret itself is never included). When it expires, access is gone automatically, with no cleanup needed; a URL signed with temporary credentials also stops working once those credentials expire. This is the standard pattern serious applications use for file downloads and uploads: the backend stays in control, and the client gets a URL that works just long enough.

Never make a bucket public unless it's genuinely meant to serve public static assets. Even then, use a CloudFront distribution in front of it rather than direct public bucket access.
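The simplest guard against an accidental Principal: "*" is S3 Block Public Access. New buckets have all four settings enabled by default since April 2023, but older buckets, or ones someone relaxed for a 'quick test', may not. A minimal sketch using Boto3's `put_public_access_block`; `block_all_public_access` is a hypothetical wrapper name. With BlockPublicPolicy on, S3 rejects any `put_bucket_policy` call whose policy grants public access.

```python
# All four S3 Block Public Access settings switched on.
PUBLIC_ACCESS_BLOCK_ALL = {
    "BlockPublicAcls": True,        # reject requests that add public ACLs
    "IgnorePublicAcls": True,       # ignore public ACLs that already exist
    "BlockPublicPolicy": True,      # reject bucket policies that grant public access
    "RestrictPublicBuckets": True,  # a public policy only admits AWS services and the owner account
}


def block_all_public_access(s3_client, bucket: str) -> None:
    """Apply every Block Public Access setting to one bucket via a Boto3 S3 client."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK_ALL,
    )


# Usage (needs boto3 and credentials):
# import boto3
# block_all_public_access(boto3.client("s3"), "acme-corp-user-uploads-dev")
```

For accounts with no public buckets at all, the same four settings can also be applied account-wide.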

s3_access_control.py (PYTHON)
import boto3
import json
from datetime import datetime

# ── Boto3 picks up credentials from env vars, ~/.aws/credentials, or IAM role ──
s3_client = boto3.client('s3', region_name='us-east-1')

BUCKET_NAME = 'acme-corp-user-uploads-dev'

# ── PART 1: Attach a bucket policy ──
# This policy allows ONLY our application's IAM role to read objects
# under the 'invoices/' prefix. Nothing else in the bucket is affected.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAppRoleInvoiceRead",
            "Effect": "Allow",
            "Principal": {
                # Replace with your actual IAM role ARN
                "AWS": "arn:aws:iam::123456789012:role/AcmeAppServerRole"
            },
            "Action": "s3:GetObject",
            # The /* at the end means: any object under this prefix
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/invoices/*"
        }
    ]
}

s3_client.put_bucket_policy(
    Bucket=BUCKET_NAME,
    Policy=json.dumps(bucket_policy)  # Policy must be serialised to a JSON string
)
print(f"Bucket policy applied to {BUCKET_NAME}")

# ── PART 2: Generate a presigned URL for secure, time-limited object access ──
# Use case: a user clicks 'Download Invoice' in your web app.
# Your backend generates this URL and returns it. The browser hits S3 directly.
# The object itself stays private the whole time.

object_key = 'invoices/2024/jan/invoice-001.pdf'
expiry_seconds = 3600  # URL valid for exactly 1 hour

presigned_download_url = s3_client.generate_presigned_url(
    ClientMethod='get_object',  # 'put_object' works the same way for uploads
    Params={
        'Bucket': BUCKET_NAME,
        'Key': object_key
    },
    ExpiresIn=expiry_seconds
)

print(f"\nPresigned download URL (valid for {expiry_seconds}s):")
print(presigned_download_url)
print(f"\nURL expires at approximately: {datetime.utcnow()} + {expiry_seconds // 60} minutes")

# ── PART 3: Generate a presigned URL for direct-upload from the browser ──
# The client uploads straight to S3 — your server never handles the file bytes.
# This is the standard pattern for large file uploads.
presigned_upload_url = s3_client.generate_presigned_url(
    ClientMethod='put_object',
    Params={
        'Bucket': BUCKET_NAME,
        'Key': 'avatars/user-9001/profile.jpg',
        'ContentType': 'image/jpeg'  # Enforce content type at signing time
    },
    ExpiresIn=300  # Upload must start within 5 minutes
)

print(f"\nPresigned upload URL (valid for 300s):")
print(presigned_upload_url)
Output
Bucket policy applied to acme-corp-user-uploads-dev
Presigned download URL (valid for 3600s):
https://acme-corp-user-uploads-dev.s3.amazonaws.com/invoices/2024/jan/invoice-001.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20240612%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240612T140322Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=3d8a2f...
URL expires at approximately: 2024-06-12 14:03:22 + 60 minutes
Presigned upload URL (valid for 300s):
https://acme-corp-user-uploads-dev.s3.amazonaws.com/avatars/user-9001/profile.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20240612%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240612T140322Z&X-Amz-Expires=300&X-Amz-SignedHeaders=content-type%3Bhost&X-Amz-Signature=9c1b4e...
Pro Tip: Presigned Upload URLs = No File Size Limit on Your Server
When you generate a presigned PUT URL and give it to the browser, the user uploads directly from their device to S3. Your application server never receives the file bytes, so you're not capped by your server's memory and your servers don't spend bandwidth or CPU relaying uploads. One caveat: a single PUT, presigned or not, tops out at 5 GB, so a 10 GB video needs a multipart upload with one presigned URL per part. This pattern is used by almost every major SaaS product.
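For files past the 5 GB single-PUT ceiling, a backend can call `create_multipart_upload`, presign each part with `generate_presigned_url(ClientMethod='upload_part', Params={'Bucket': ..., 'Key': ..., 'UploadId': ..., 'PartNumber': n})`, and finish with `complete_multipart_upload`. The first decision is the part size. A minimal sketch of that arithmetic, assuming S3's multipart limits of at most 10,000 parts and 5 MiB to 5 GiB per part (the last part may be smaller); the 64 MiB preferred size is an arbitrary choice and `choose_part_size` is a hypothetical helper.

```python
import math

MiB = 1024 ** 2
MIN_PART = 5 * MiB         # every part except the last must be at least 5 MiB
MAX_PART = 5 * 1024 ** 3   # no part may exceed 5 GiB
MAX_PARTS = 10_000         # an upload can have at most 10,000 parts


def choose_part_size(file_size: int, preferred: int = 64 * MiB) -> tuple[int, int]:
    """Return (part_size, part_count) that respects S3's multipart limits."""
    # Grow the part size if the preferred size would need more than MAX_PARTS parts.
    part_size = max(preferred, MIN_PART, math.ceil(file_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("file too large for a single multipart upload")
    part_count = max(1, math.ceil(file_size / part_size))
    return part_size, part_count


ten_gib = 10 * 1024 ** 3
print(choose_part_size(ten_gib))
```

The backend would then presign `part_count` URLs, one per `PartNumber` from 1 upward, and the browser uploads each slice of the file to its URL.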
Production Insight
S3 doesn't evaluate bucket policies, IAM policies, and ACLs in a fixed order; it considers every applicable policy together.
If any policy contains a matching explicit Deny, access is denied: no Allow overrides a Deny.
Rule: use bucket policies for bucket-wide rules, IAM for user-specific, presigned URLs for per-object client access.
Key Takeaway
Presigned URLs are the production-grade pattern for user file access.
They keep objects private while granting time-limited, per-object access.
Never make a bucket that holds private data public; serve genuinely public assets through CloudFront instead.

Bucket Policy vs IAM Policy Comparison Matrix

AWS provides two primary mechanisms for controlling access to S3: bucket policies and IAM policies. Choosing between them (or using both together) is a frequent source of confusion. This matrix clarifies the differences so you can make the right choice for each access scenario.

| Feature | Bucket Policy | IAM Policy |
| --- | --- | --- |
| Scope | Attached to a specific S3 bucket | Attached to a user, group, or role (IAM principal) |
| Who can be granted access | Any AWS account, IAM role, or anonymous (Principal: "*") | Only the principal the policy is attached to |
| Cross-account access | Yes: name another account's IAM role in Principal (that account must also allow the action in its own IAM policies) | No: use a cross-account role or a bucket policy |
| Anonymous access | Yes: set Principal: "*" | No: IAM policies never apply to unauthenticated requests |
| Granularity | Bucket, prefix (Resource ending in /*), or individual object ARNs | Same resource ARNs, and one policy can span many buckets |
| Default effect | Deny (no policy means no access) | Deny (no policy means no access) |
| Policy evaluation | Evaluated together with the caller's IAM policies; explicit Deny always wins | Evaluated together with the bucket policy; explicit Deny always wins |
| Use case example | Allow anonymous read for a static website | Allow a specific developer to read all buckets in an account |
| Management | One policy per bucket, managed via the S3 console, CLI, or API | Centralized in IAM, with AWS Organizations SCPs as account-wide guardrails |
| Maximum size | 20 KB | 6,144 characters per managed policy; inline policies share a per-identity quota (10,240 characters for a role, 2,048 for a user, 5,120 for a group) |

When to use which:
- Bucket policy when you need to grant access to non-AWS identities (like anonymous users) or manage permissions for a shared bucket used by multiple accounts.
- IAM policy when you want to follow least privilege and centrally manage permissions for your own users and roles.
- Both in production: use IAM for your application roles and add bucket policies only for specific overrides (e.g., deny all public access as a safety net).

A common anti-pattern is using a permissive bucket policy to allow your own IAM role, then tightening via IAM. Instead, let IAM do the fine-grained control and keep bucket policies simpler.

Production Insight
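Bucket policies and IAM policies both support global condition keys such as aws:SourceIp, aws:SecureTransport, and aws:MultiFactorAuthPresent. To require MFA for sensitive S3 operations, a common approach is a bucket policy Deny statement conditioned on aws:MultiFactorAuthPresent; an IAM policy condition works too, but only for principals in your own account.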
Key Takeaway
Bucket policies handle anonymous/cross-account access; IAM policies control your own team. Combine both for defense in depth, but never nest overly complex logic in bucket policies.
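A concrete bucket-policy safety net is refusing any request that doesn't use TLS, via the aws:SecureTransport condition key. Note the contrast with the incident earlier: Principal "*" is safe here because the effect is Deny. A minimal sketch; the helper name is hypothetical and the bucket name is reused from the earlier examples. `put_bucket_policy` replaces the bucket's whole policy, so in practice merge this statement into the existing policy rather than overwriting it.

```python
import json


def deny_insecure_transport_statement(bucket: str) -> dict:
    """Deny every S3 action on the bucket and its objects when TLS is not used."""
    return {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",  # safe: this statement can only remove access
        "Action": "s3:*",
        "Resource": [
            f"arn:aws:s3:::{bucket}",    # bucket-level actions (e.g. ListBucket)
            f"arn:aws:s3:::{bucket}/*",  # object-level actions (e.g. GetObject)
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [deny_insecure_transport_statement("acme-corp-user-uploads-dev")],
}
print(json.dumps(policy, indent=2))
# Apply with: s3_client.put_bucket_policy(Bucket=..., Policy=json.dumps(policy))
```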

S3 Storage Classes — Cost vs Durability Comparison Table

S3 offers multiple storage classes designed to balance cost, retrieval latency, and durability across your data lifecycle. The table below summarises each class along with typical use cases and cost trade-offs.

| Storage Class | Best For | Retrieval Latency | Approx. Cost/GB/Month (us-east-1) | Retrieval Fee | Durability | Minimum Billable Object Size | Minimum Storage Duration |
| --- | --- | --- | --- | --- | --- | --- | --- |
| S3 Standard | Frequently accessed, active data | Milliseconds | $0.023 | None | 99.999999999% | None | None |
| S3 Intelligent-Tiering | Unknown or changing access patterns | Milliseconds (frequent and infrequent tiers) | $0.023 (frequent tier) + monitoring fee | None | 99.999999999% | None (objects under 128 KB stay in the frequent tier) | None (objects move to the infrequent tier after 30 days without access) |
| S3 Standard-IA | Infrequently accessed but needs quick access | Milliseconds | $0.0125 | $0.01/GB | 99.999999999% | 128 KB | 30 days |
| S3 One Zone-IA | Re-creatable, infrequently accessed data | Milliseconds | $0.01 | $0.01/GB | 99.999999999% (single AZ) | 128 KB | 30 days |
| S3 Glacier Instant Retrieval | Archive data accessed quarterly | Milliseconds | $0.004 | $0.03/GB | 99.999999999% | 128 KB | 90 days |
| S3 Glacier Flexible Retrieval | Archive data accessed yearly | Minutes to hours | $0.0036 | Up to $0.03/GB (expedited) | 99.999999999% | 40 KB | 90 days |
| S3 Glacier Deep Archive | Long-term compliance, accessed rarely | Within 12 hours (standard), up to 48 hours (bulk) | $0.00099 | $0.02/GB (standard) | 99.999999999% | 40 KB | 180 days |
| S3 on Outposts | On-premises workloads requiring local S3 | Milliseconds (local) | Varies | None | Depends on hardware | None | None |

Key cost factors:
- Standard costs the most per GB but has no retrieval fees or minimums.
- Standard-IA cuts storage cost by about 45% but adds a per-GB retrieval fee. Ideal for backups you access a few times a year.
- Glacier Deep Archive is about 96% cheaper than Standard, but retrieval takes up to 12 hours; perfect for regulatory retention.
- Minimum billable object sizes and minimum storage durations apply to the Infrequent Access and Glacier tiers; storing many small objects or deleting early incurs additional charges.
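The percentages follow directly from the table's per-GB prices, and the minimum-duration rule explains why short-lived data should stay in Standard. A rough sketch, assuming the us-east-1 list prices shown in the table (check current pricing before relying on them), 30-day months, and no request or retrieval fees; the helper names are hypothetical.

```python
# us-east-1 storage prices per GB-month, taken from the storage class table.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}
# Minimum billable storage duration in days for each class.
MIN_DAYS = {"STANDARD": 0, "STANDARD_IA": 30, "GLACIER_IR": 90, "DEEP_ARCHIVE": 180}


def saving_vs_standard(storage_class: str) -> float:
    """Fractional storage saving per GB-month compared with S3 Standard."""
    return 1 - PRICE_PER_GB_MONTH[storage_class] / PRICE_PER_GB_MONTH["STANDARD"]


def billed_storage_cost(gb: float, storage_class: str, days_stored: int) -> float:
    """Storage charge in USD, billing at least the class's minimum duration (30-day months)."""
    billable_days = max(days_stored, MIN_DAYS[storage_class])
    return gb * PRICE_PER_GB_MONTH[storage_class] * billable_days / 30


for cls in ("STANDARD_IA", "GLACIER_IR", "DEEP_ARCHIVE"):
    print(f"{cls}: {saving_vs_standard(cls):.0%} cheaper per GB-month than STANDARD")

# 1,000 GB of logs deleted after 7 days: Standard-IA still bills the full 30 days.
print(f"STANDARD:    ${billed_storage_cost(1000, 'STANDARD', 7):.2f}")
print(f"STANDARD_IA: ${billed_storage_cost(1000, 'STANDARD_IA', 7):.2f}")
```

With these inputs, 7 days in Standard comes to about $5.37 versus $12.50 in Standard-IA, which is the early-delete penalty discussed in this section expressed in dollars.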

Use lifecycle policies to automatically transition objects between these classes as they age, maximising cost savings without manual intervention.

Intelligent-Tiering: Set and Forget for Unpredictable Access
If your data has spiky access patterns (some months heavily read, others idle), S3 Intelligent-Tiering automatically moves objects between frequent and infrequent tiers based on usage. You pay a small monthly monitoring fee per object (typically $0.0025 per 1,000 objects) but avoid manual tier selection. No retrieval charges, no minimum durations.
Production Insight
Always check minimum storage durations: moving objects out of Standard-IA before 30 days incurs a pro-rated charge. For high-turnover data (logs kept 7 days), stay in Standard — the early-delete penalty wipes out any savings.
Key Takeaway
Match storage class to access frequency: Standard for hot data, Standard-IA for warm, Glacier Deep Archive for cold. Use lifecycle rules to automate transitions so savings don't depend on someone remembering to move data by hand.

Lifecycle Transition Visual — Standard → IA → Glacier

Automated lifecycle transitions are the backbone of S3 cost optimisation. Instead of manually moving files between storage classes, define rules that trigger based on object age. The diagram below visualises a typical production lifecycle path.

How it works:
1. Objects are uploaded into S3 Standard (fastest, most expensive).
2. After 30 days, they automatically transition to S3 Standard-IA (about 45% cheaper, same millisecond latency).
3. After 180 days, they move to S3 Glacier Instant Retrieval (about 83% cheaper than Standard, still millisecond latency).
4. After 365 days, they move to S3 Glacier Deep Archive (about 96% cheaper, retrieval within 12 hours).
5. After 7 years (2555 days), objects are permanently deleted (if desired).

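Transitions to Standard-IA and Glacier Instant Retrieval are invisible to your application: the S3 API works identically and only the billing changes. Glacier Flexible Retrieval and Deep Archive are different: a GetObject on an archived object fails with an InvalidObjectState error until you restore a temporary copy with RestoreObject.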

Important: Transitions are one-way. You cannot automatically move objects from Glacier back to Standard without manual restoration (which incurs retrieval fees). Plan your tiers carefully: once data is archived, treat it as read-only.
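Reading an object that has moved to Glacier Flexible Retrieval or Deep Archive takes two steps: request a restore, then poll until the temporary copy is ready, because a direct GetObject fails until then. A minimal sketch using Boto3's `restore_object` and the `Restore` field of `head_object`, which carries the x-amz-restore header (for example `ongoing-request="false", expiry-date="..."`). The helper names and the 7-day / Standard-tier defaults are assumptions to adjust; Deep Archive does not offer the Expedited tier.

```python
import re


def restore_status(restore_header):
    """Interpret HeadObject's 'Restore' value (the x-amz-restore header).

    Returns 'not-requested', 'in-progress', 'available', or 'unknown'.
    """
    if not restore_header:
        return "not-requested"  # no restore requested, or the restored copy has expired
    match = re.search(r'ongoing-request="(true|false)"', restore_header)
    if match is None:
        return "unknown"
    return "in-progress" if match.group(1) == "true" else "available"


def request_restore(s3_client, bucket, key, days=7, tier="Standard"):
    """Ask S3 for a temporary copy of an archived object, readable for `days` days."""
    s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
    )


# Usage (needs boto3 and credentials):
# request_restore(s3_client, "acme-prod-uploads", "reports/2019/q1.csv")
# status = restore_status(s3_client.head_object(
#     Bucket="acme-prod-uploads", Key="reports/2019/q1.csv").get("Restore"))
```

The object key in the usage comment is hypothetical. The restored copy is temporary; the archived object itself keeps its storage class.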

s3_lifecycle_transition.py (PYTHON)
import boto3

s3 = boto3.client('s3', region_name='us-east-1')
bucket = 'acme-prod-uploads'

lifecycle = {
    'Rules': [
        {
            'ID': 'StandardToGlacier',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'Transitions': [
                {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                {'Days': 180, 'StorageClass': 'GLACIER_IR'},
                {'Days': 365, 'StorageClass': 'DEEP_ARCHIVE'}
            ],
            'Expiration': {'Days': 2555}
        },
        {
            'ID': 'CleanupMultipart',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7}
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration=lifecycle
)
print('Lifecycle policy applied successfully.')
Output
Lifecycle policy applied successfully.
Production Insight
Lifecycle transitions are evaluated once per day. If you need immediate cost savings (e.g., after a data migration), you can manually change the storage class on existing objects using S3 Batch Operations — but automated transitions are sufficient for ongoing lifecycle management.
Key Takeaway
A single lifecycle rule with three transitions and an expiration can reduce your S3 bill by over 95% for archival data. Always include an AbortIncompleteMultipartUpload rule to prevent silent cost leaks.
Figure: S3 Lifecycle Transition Flow. Upload to S3 Standard → (30 days) S3 Standard-IA → (180 days) S3 Glacier Instant Retrieval → (365 days) S3 Glacier Deep Archive → (2555 days) Delete.

Storage Classes and Lifecycle Policies — Cutting Your S3 Bill in Half

Not all data is accessed equally often. Your app might read a user's profile picture dozens of times a day, but that invoice from January 2021? Probably never again unless there's an audit. S3 gives you storage classes — different pricing tiers based on how frequently and quickly you need to access data.

S3 Standard is the default and the most expensive per GB. It's designed for data you access regularly with millisecond latency. S3 Standard-IA (Infrequent Access) costs about 45% less per GB but charges a per-GB retrieval fee, making it ideal for backups and older content you occasionally need. S3 Glacier Instant Retrieval drops the cost further for archival data you access maybe once a quarter. S3 Glacier Deep Archive is the cheapest tier, about a tenth of a cent per GB per month, for data you might need once a year and can wait up to 12 hours to retrieve.

The power move is combining storage classes with lifecycle policies: automated rules that transition objects to cheaper tiers (or delete them entirely) based on their age. You configure this once, and S3 handles the cost optimisation forever. A common real-world pattern: keep user uploads in Standard for 30 days, move to Standard-IA for 6 months, then Glacier Deep Archive indefinitely — with zero manual work after initial setup.

s3_lifecycle_policy.py (PYTHON)
import boto3

s3_client = boto3.client('s3', region_name='us-east-1')

BUCKET_NAME = 'acme-corp-user-uploads-dev'

# ── Apply a lifecycle policy that automatically manages storage costs ──
# This policy covers objects under the 'invoices/' prefix only.
# Other prefixes (avatars/, reports/) are not affected.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "InvoiceArchivalPolicy",
            "Status": "Enabled",  # 'Disabled' to pause without deleting the rule
            "Filter": {
                # Only apply this rule to objects whose key starts with 'invoices/'
                "Prefix": "invoices/"
            },
            "Transitions": [
                {
                    # After 30 days, move to Standard-IA
                    # ~45% cheaper per GB, small per-retrieval fee applies
                    "Days": 30,
                    "StorageClass": "STANDARD_IA"
                },
                {
                    # After 180 days, move to Glacier Instant Retrieval
                    # ~83% cheaper than Standard, millisecond retrieval still available
                    "Days": 180,
                    "StorageClass": "GLACIER_IR"
                },
                {
                    # After 365 days, move to Glacier Deep Archive
                    # Cheapest tier. Retrieval takes up to 12 hours.
                    # Perfect for compliance-required long-term retention
                    "Days": 365,
                    "StorageClass": "DEEP_ARCHIVE"
                }
            ],
            # Permanently delete objects after 7 years (2555 days)
            # Adjust to match your legal/compliance retention requirements
            "Expiration": {
                "Days": 2555
            }
        },
        {
            # Second rule: automatically clean up incomplete multipart uploads
            # Without this, partially uploaded large files silently accumulate costs
            "ID": "CleanupIncompleteMultipartUploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # Apply to the entire bucket
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 7  # Abort uploads not completed within 7 days
            }
        }
    ]
}

response = s3_client.put_bucket_lifecycle_configuration(
    Bucket=BUCKET_NAME,
    LifecycleConfiguration=lifecycle_configuration
)

print(f"Lifecycle policy applied. HTTP status: {response['ResponseMetadata']['HTTPStatusCode']}")

# ── Verify the policy was saved correctly ──
saved_policy = s3_client.get_bucket_lifecycle_configuration(Bucket=BUCKET_NAME)
for rule in saved_policy['Rules']:
    print(f"\nRule ID: {rule['ID']} | Status: {rule['Status']}")
    if 'Transitions' in rule:
        for transition in rule['Transitions']:
            print(f"  → After {transition['Days']} days: {transition['StorageClass']}")
    if 'Expiration' in rule:
        print(f"  → Delete after: {rule['Expiration']['Days']} days")
Output
Lifecycle policy applied. HTTP status: 200
Rule ID: InvoiceArchivalPolicy | Status: Enabled
→ After 30 days: STANDARD_IA
→ After 180 days: GLACIER_IR
→ After 365 days: DEEP_ARCHIVE
→ Delete after: 2555 days
Rule ID: CleanupIncompleteMultipartUploads | Status: Enabled
Interview Gold: The Hidden Multipart Upload Cost Leak
Interviewers love asking about unexpected S3 costs. The answer they want: incomplete multipart uploads. When a large file upload fails halfway, S3 stores the uploaded parts and charges you for them — indefinitely, silently, without showing them in a normal bucket listing. The fix is exactly what's in the code above: an AbortIncompleteMultipartUpload lifecycle rule. Every production bucket should have one.
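The lifecycle rule is the fix. For a one-off audit of what is already accumulating, you can list in-progress uploads yourself. A minimal sketch with Boto3's `list_multipart_uploads` paginator and `abort_multipart_upload`; `stale_uploads` and `abort_stale_uploads` are hypothetical helpers, and the 7-day default mirrors the lifecycle rule in the code above.

```python
from datetime import datetime, timedelta, timezone


def stale_uploads(uploads, max_age_days=7, now=None):
    """Return entries from a ListMultipartUploads page older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    # Boto3 returns 'Initiated' as a timezone-aware datetime.
    return [upload for upload in uploads if upload["Initiated"] < cutoff]


def abort_stale_uploads(s3_client, bucket, max_age_days=7):
    """Abort every incomplete multipart upload older than max_age_days; return the count."""
    aborted = 0
    paginator = s3_client.get_paginator("list_multipart_uploads")
    for page in paginator.paginate(Bucket=bucket):
        for upload in stale_uploads(page.get("Uploads", []), max_age_days):
            s3_client.abort_multipart_upload(
                Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
            )
            aborted += 1
    return aborted
```

Aborting an upload deletes its stored parts, which is what stops the charges.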
Production Insight
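A transition to Standard-IA (or One Zone-IA) must happen at least 30 days after object creation; the API rejects shorter values.
When a rule moves objects to Standard-IA and later to a Glacier class, the Glacier transition must come at least 30 days after the Standard-IA one. The 90- and 180-day figures for Glacier tiers are minimum billable storage durations, not transition limits.
Rule: always check the transition constraints before writing your policy.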
Key Takeaway
Storage class transitions via lifecycle policies are a one-time setup that permanently reduces costs.
Standard → Standard-IA → Glacier Deep Archive can cut your S3 bill by over 95% for archival data.
Add an AbortIncompleteMultipartUpload rule to every bucket.
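Catching transition constraints before the API call keeps deployment scripts from failing halfway. A minimal sketch of a hypothetical pre-flight check that encodes two constraints: a Standard-IA or One Zone-IA transition needs Days of at least 30, and a later Glacier-class transition in the same rule must come at least 30 days after the IA one. Storage-class names are the ones Boto3 expects; S3 itself remains the authority on what a lifecycle configuration may contain.

```python
# Partial pre-flight check for a lifecycle rule's 'Transitions' list.
IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}
GLACIER_CLASSES = {"GLACIER_IR", "GLACIER", "DEEP_ARCHIVE"}


def transition_problems(transitions: list[dict]) -> list[str]:
    """Return human-readable problems for a rule's 'Transitions' list."""
    problems = []
    ia_days = [t["Days"] for t in transitions if t["StorageClass"] in IA_CLASSES]
    for t in transitions:
        cls, days = t["StorageClass"], t["Days"]
        if cls in IA_CLASSES and days < 30:
            problems.append(f"{cls} at day {days}: IA transitions need at least 30 days")
        if cls in GLACIER_CLASSES and ia_days and days < min(ia_days) + 30:
            problems.append(f"{cls} at day {days}: must be 30+ days after the IA transition")
    return problems


print(transition_problems([
    {"Days": 30, "StorageClass": "STANDARD_IA"},
    {"Days": 180, "StorageClass": "GLACIER_IR"},
    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
]))
```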

Versioning and Object Lock — Protect Against Accidental Deletion and Compliance

Without versioning, every PUT overwrites the object and every DELETE makes it gone forever. That's fine for temp files, but for user content, production configs, or audit logs, you need versioning.

When versioning is enabled, every write (PUT, POST, or COPY) creates a new version with its own version ID. A DELETE without a version ID doesn't remove anything; it adds a delete marker that becomes the current version. You can restore any previous version instantly. Versioning also integrates with lifecycle policies to automatically expire noncurrent versions and reduce storage costs.
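Because a plain DELETE on a versioned bucket only adds a delete marker, 'undeleting' an object means deleting that marker by its version ID, which makes the previous version current again. A minimal sketch with Boto3's `list_object_versions` and `delete_object`; the helper names are hypothetical, and it inspects only the first page of results, which is enough for a single key with fewer than 1,000 versions.

```python
def latest_delete_marker(versions_response: dict, key: str):
    """Return the VersionId of key's current delete marker, or None if the key isn't deleted."""
    for marker in versions_response.get("DeleteMarkers", []):
        # Prefix matching can return other keys too, so compare the exact key.
        if marker["Key"] == key and marker["IsLatest"]:
            return marker["VersionId"]
    return None


def undelete(s3_client, bucket: str, key: str) -> bool:
    """Remove the current delete marker so the previous version becomes current again."""
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    marker_id = latest_delete_marker(response, key)
    if marker_id is None:
        return False
    # Deleting a specific version ID (here, the marker) removes it permanently.
    s3_client.delete_object(Bucket=bucket, Key=key, VersionId=marker_id)
    return True
```

Unlike copying an old version back on top, this approach creates no new version.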

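S3 Object Lock takes protection further by making object versions write-once-read-many (WORM). You can protect a version with a retention period (in days or years), a legal hold, or both. This matters for record-retention regulations such as SEC and FINRA rules. Retention comes in two modes. In compliance mode, no user, including the root user of the AWS account, can overwrite or delete a protected version until its retention period expires. In governance mode, users granted s3:BypassGovernanceRetention can override the lock.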
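The retention rules reduce to a small decision function. This is a hypothetical sketch of the documented semantics, not an AWS API. It ignores details such as extending retention periods and the x-amz-bypass-governance-retention request header that a governance bypass also requires. Note that a simple DELETE without a version ID is always allowed in a locked bucket. It only adds a delete marker, because the lock protects the versions themselves.

```python
from datetime import datetime, timedelta, timezone


def retain_until(created_at, retention_days):
    """Retain-until date for a version written under a default retention rule."""
    return created_at + timedelta(days=retention_days)


def can_delete_version(mode, retain_until_date, now,
                       legal_hold=False, can_bypass_governance=False):
    """True if a specific object version may be permanently deleted right now."""
    if legal_hold:
        return False                    # a legal hold blocks deletion in any mode until removed
    if now >= retain_until_date:
        return True                     # retention period has expired
    if mode == "GOVERNANCE":
        return can_bypass_governance    # requires s3:BypassGovernanceRetention
    if mode == "COMPLIANCE":
        return False                    # nobody, including root, can delete early
    raise ValueError(f"unknown retention mode: {mode}")
```

With the 7-day GOVERNANCE default used in the example below, a version written on January 1 is protected until January 8 unless the caller holds the bypass permission.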

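Versioning and Object Lock together give you an immutable data layer. Enable versioning on every production bucket from day one. Object Lock is simplest to enable when the bucket is created. AWS also supports enabling it on an existing bucket that already has versioning enabled, but once it's on, you can't turn it off or suspend versioning.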

s3_versioning_and_lock.shBASH
#!/bin/bash
# ── Enable versioning on an existing bucket ──
aws s3api put-bucket-versioning \
  --bucket acme-corp-user-uploads-prod \
  --versioning-configuration Status=Enabled

# ── List all versions of an object (including delete markers) ──
aws s3api list-object-versions \
  --bucket acme-corp-user-uploads-prod \
  --prefix invoices/jan-2024/report.pdf

# ── Restore a deleted or overwritten object by copying a previous version ──
# Quote the copy source so the shell never treats "?" as a glob character.
aws s3api copy-object \
  --bucket acme-corp-user-uploads-prod \
  --copy-source "acme-corp-user-uploads-prod/invoices/jan-2024/report.pdf?versionId=abc123" \
  --key invoices/jan-2024/report.pdf

# ── Create a bucket with Object Lock enabled (this also enables versioning) ──
aws s3api create-bucket \
  --bucket acme-corp-compliant-logs \
  --region us-east-1 \
  --object-lock-enabled-for-bucket

# ── Set a default retention period: 7 days in GOVERNANCE mode ──
# GOVERNANCE can be bypassed by users with s3:BypassGovernanceRetention;
# use "COMPLIANCE" when nobody, including root, may delete early.
aws s3api put-object-lock-configuration \
  --bucket acme-corp-compliant-logs \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "GOVERNANCE",
        "Days": 7
      }
    }
  }'
Output
# Enable versioning (no output on success, check with get-bucket-versioning)
# List object versions
{
  "Versions": [
    {
      "Key": "invoices/jan-2024/report.pdf",
      "VersionId": "abc123",
      "IsLatest": false,
      "LastModified": "2024-01-15T10:00:00Z",
      "Size": 12345
    },
    {
      "Key": "invoices/jan-2024/report.pdf",
      "VersionId": "xyz789",
      "IsLatest": true,
      "LastModified": "2024-06-12T14:00:00Z",
      "Size": 12345
    }
  ],
  "DeleteMarkers": []
}
# Restore object (success returns metadata)
# Object lock configuration set
Plan Object Lock at Bucket Creation
Object Lock is easiest to enable when you create the bucket with --object-lock-enabled-for-bucket, even if you don't configure retention right away. AWS also supports enabling it on an existing bucket, but versioning must already be on, and once Object Lock is enabled it can't be disabled and versioning can't be suspended. Decide early so those constraints don't surprise you.
Production Insight
Versioning multiplies your storage costs by the number of versions kept.
A file updated 100 times stores 100 copies.
Rule: combine versioning with a lifecycle policy to expire old versions after 30-90 days.
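The rule above can be expressed as a lifecycle configuration. The sketch below builds the dict boto3's put_bucket_lifecycle_configuration expects. The function name, rule ID, and default values (60 days, keep 3, abort after 7 days) are illustrative assumptions, not recommendations from AWS.

```python
def version_cleanup_lifecycle(noncurrent_days=60, keep_newest=3, abort_mpu_days=7):
    """Build a LifecycleConfiguration dict that expires old object versions."""
    if noncurrent_days < 1:
        raise ValueError("noncurrent_days must be at least 1")
    expiration = {"NoncurrentDays": noncurrent_days}
    if keep_newest:
        # Retain this many of the newest noncurrent versions even past noncurrent_days
        expiration["NewerNoncurrentVersions"] = keep_newest
    return {
        "Rules": [
            {
                "ID": "ExpireNoncurrentVersions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # empty prefix: applies to the whole bucket
                "NoncurrentVersionExpiration": expiration,
                # Clean up delete markers that no longer hide any version
                "Expiration": {"ExpiredObjectDeleteMarker": True},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_mpu_days},
            }
        ]
    }
```

You would apply it with s3_client.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=cfg). That call replaces the bucket's entire lifecycle configuration, so merge these rules with existing ones (such as InvoiceArchivalPolicy) instead of overwriting them.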
Key Takeaway
Enable versioning on every production bucket.
It's a one-line CLI command that gives you undo for S3.
Object Lock can't be turned off once enabled, so decide on it early, ideally at bucket creation.

Performance Patterns — Multipart Uploads, Transfer Acceleration, and Cross-Region Replication

S3 is fast, but you can make it faster — and more expensive if you're not careful.

Multipart Upload: AWS recommends multipart upload for objects larger than 100 MB, and it's required above 5 GB, the single-PUT limit. It splits the file into parts (at least 5 MB each except the last, and at most 10,000 parts) and uploads them in parallel. If a part fails, only that part is retried, not the entire file. boto3's upload_file switches to multipart automatically above its default 8 MB threshold, and aws s3 cp does the same.
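To see how the 5 MB minimum and the 10,000-part maximum interact, here is a hypothetical helper that picks a part size for a given file. boto3's transfer manager makes a comparable adjustment on its own, so you'd only need something like this when calling create_multipart_upload and upload_part yourself. Rounding up to whole MiB is this sketch's choice.

```python
import math

MIB = 1024 ** 2
MIN_PART = 5 * MIB          # every part except the last must be at least 5 MiB
MAX_PART = 5 * 1024 ** 3    # a single part can be at most 5 GiB
MAX_PARTS = 10_000          # an upload can have at most 10,000 parts


def plan_parts(file_size, preferred_part=8 * MIB):
    """Return (part_size, part_count) that satisfies S3's multipart limits."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    part = max(preferred_part, MIN_PART)
    # If the preferred size needs too many parts, grow it to fit in 10,000.
    if math.ceil(file_size / part) > MAX_PARTS:
        part = math.ceil(file_size / MAX_PARTS)
        part = math.ceil(part / MIB) * MIB   # round up to a whole MiB
    if part > MAX_PART:
        raise ValueError("file exceeds what multipart limits allow")
    return part, math.ceil(file_size / part)
```

A 100 MiB file uses 13 parts of 8 MiB. A 100 GiB file would need 12,800 parts at 8 MiB, so the part size grows to 11 MiB.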

S3 Transfer Acceleration: routes transfers through AWS edge locations and over the AWS backbone network instead of the public internet. It can significantly speed up uploads and downloads for clients far from the bucket's region, but the gain varies by location, so measure it. It carries an extra per-GB charge. Enable it on the bucket, then use the accelerated endpoint.

Cross-Region Replication (CRR): automatically replicates objects to a bucket in another region. Use it for disaster recovery, lower latency for global users, or compliance. Replication is asynchronous. Most objects replicate within 15 minutes, but some can take a couple of hours. S3 Replication Time Control (RTC) adds a 15-minute SLA for an additional fee. Versioning must be enabled on both the source and destination buckets.
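A replication setup boils down to one configuration document. The sketch below builds the dict boto3's put_bucket_replication expects. The role ARN, bucket names, and rule ID are placeholders. The role must already grant S3 the replication permissions it needs, and both buckets must have versioning enabled.

```python
def crr_config(role_arn, dest_bucket, prefix="", storage_class="STANDARD", with_rtc=False):
    """Build a ReplicationConfiguration dict for put_bucket_replication."""
    rule = {
        "ID": "ReplicateToDR",
        "Priority": 1,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        # Required when a rule uses Filter; "Disabled" keeps deletes out of the replica
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {
            "Bucket": f"arn:aws:s3:::{dest_bucket}",
            "StorageClass": storage_class,
        },
    }
    if with_rtc:
        # Replication Time Control (15-minute SLA) must be paired with replication metrics
        rule["Destination"]["ReplicationTime"] = {"Status": "Enabled", "Time": {"Minutes": 15}}
        rule["Destination"]["Metrics"] = {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
    return {"Role": role_arn, "Rules": [rule]}
```

Apply it with s3_client.put_bucket_replication(Bucket=source_bucket, ReplicationConfiguration=cfg). Only objects written after the rule exists are replicated.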

Key trade-off: Transfer Acceleration and CRR cost money. Don't enable them unless you have a measurable need. For most apps, a single bucket with CloudFront is cheaper and faster.

s3_performance_patterns.pyPYTHON
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

s3_client = boto3.client('s3', region_name='us-east-1')
BUCKET = 'acme-corp-large-uploads-prod'

# ── Multipart upload using boto3's high-level upload_file ──
# Files above multipart_threshold are split into multipart_chunksize parts and
# uploaded in parallel. These are boto3's defaults, shown explicitly.
transfer_config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,
    multipart_chunksize=8 * 1024 * 1024,
    max_concurrency=10,
)
s3_client.upload_file(
    Filename='/tmp/large_backup.tar.gz',
    Bucket=BUCKET,
    Key='backups/june-2026.tar.gz',
    Config=transfer_config,
)
print("Upload completed (multipart used automatically)")

# ── Transfer Acceleration ──
# First enable acceleration on the bucket (CLI):
# aws s3api put-bucket-accelerate-configuration --bucket <bucket> --accelerate-configuration Status=Enabled
s3_accelerated = boto3.client(
    's3',
    region_name='us-east-1',
    config=Config(s3={'use_accelerate_endpoint': True}),
)
# Every request from this client goes to the bucket's s3-accelerate endpoint
s3_accelerated.upload_file('/tmp/large_file.mp4', BUCKET, 'videos/promo.mp4')
print("Uploaded via Transfer Acceleration")

# ── List incomplete multipart uploads (stale ones keep costing money) ──
# Paginate: a single call returns at most 1,000 uploads.
found = False
for page in s3_client.get_paginator('list_multipart_uploads').paginate(Bucket=BUCKET):
    for upload in page.get('Uploads', []):
        found = True
        print(f"Incomplete upload: {upload['Key']} initiated {upload['Initiated']}")
if not found:
    print("No incomplete multipart uploads found")
Output
Upload completed (multipart used automatically)
Uploaded via Transfer Acceleration
No incomplete multipart uploads found
When to Avoid Transfer Acceleration
If your clients are in or near the bucket's region, Transfer Acceleration usually brings little or no speedup. It accelerates downloads as well as uploads, so evaluate both directions. Benchmark aws s3 cp with and without --endpoint-url https://s3-accelerate.amazonaws.com before enabling it.
Production Insight
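Cross-Region Replication costs storage in both regions, plus replication PUT requests and inter-region data transfer.
CRR only replicates objects written after the rule is in place. If you're replicating for DR, backfill existing objects with S3 Batch Replication.
Rule: enable S3 Replication metrics and watch ReplicationLatency and OperationsPendingReplication. Sustained lag of an hour or more points to a bottleneck.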
Key Takeaway
Multipart uploads are essential for files >100MB — they're automatic in the SDK.
Transfer Acceleration helps global uploads but costs extra; benchmark first.
CRR is async and costs double storage; only use when required.

Use an S3 Bucket Like a Senior — Console, CLI, or Scripts

You don't "use" S3 via the console if you manage more than 3 buckets. The console is for debugging, not operations. Real teams script everything.

The AWS CLI is your hammer: aws s3 cp, aws s3 sync, and aws s3 rm cover 90% of your daily interactions. Sync is particularly useful because it only transfers new or changed files (it compares size and last-modified time), saving bandwidth and time. But watch out: sync never deletes anything from the destination unless you pass --delete, and with --delete it removes every destination file that isn't in the source. That flag is how you accidentally nuke a production bucket.
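A simplified model of the decision aws s3 sync makes shows why --delete is dangerous. By default, the CLI uploads a local file when it's missing at the destination, its size differs, or its local last-modified time is newer. The function below mirrors only that default comparison and ignores options like --size-only and --exact-timestamps. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int
    mtime: float   # last-modified time, seconds since the epoch


def plan_sync(source, dest, delete=False):
    """Return (keys_to_upload, keys_to_delete) for dicts of key -> FileInfo."""
    to_upload = sorted(
        key for key, src in source.items()
        if key not in dest
        or src.size != dest[key].size
        or src.mtime > dest[key].mtime
    )
    # Without --delete, nothing at the destination is ever removed.
    to_delete = sorted(set(dest) - set(source)) if delete else []
    return to_upload, to_delete
```

The last test case is the production trap in miniature: an empty (or wrong) source combined with delete=True schedules every destination file for deletion.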

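For programmatic access, use the AWS SDK (boto3 in Python, the AWS SDK for JavaScript v3 in Node.js). Always use IAM roles; never hardcode keys. S3 has encrypted every new object with SSE-S3 by default since January 2023, so the decision that matters is whether you need SSE-KMS. If you do, set ServerSideEncryption='aws:kms' together with BucketKeyEnabled=True (which cuts KMS request costs), either as the bucket's default encryption or on every PutObject call. Your future self will thank you when the auditor asks which key protects your data.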
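Both places to set SSE-KMS take small parameter sets. This sketch returns the bucket-default configuration for boto3's put_bucket_encryption and the matching keyword arguments for put_object. The KMS key ARN is a placeholder, and the helper name is hypothetical.

```python
def sse_kms_settings(kms_key_arn):
    """Return (bucket default-encryption config, put_object kwargs) for SSE-KMS."""
    bucket_default = {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # S3 Bucket Keys reduce the number (and cost) of KMS requests
                "BucketKeyEnabled": True,
            }
        ]
    }
    put_object_kwargs = {
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_arn,
        "BucketKeyEnabled": True,
    }
    return bucket_default, put_object_kwargs
```

Usage: s3_client.put_bucket_encryption(Bucket=..., ServerSideEncryptionConfiguration=bucket_default) once, or s3_client.put_object(Bucket=..., Key=..., Body=..., **put_object_kwargs) per call. The bucket default is harder to forget.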

The console is fine for one-off uploads or checking bucket properties. But if you click "Upload" more than once a week, you're doing it wrong.

S3SyncProduction.shBASH
#!/bin/bash
# io.thecodeforge — devops tutorial
# Sync local logs to S3 with SSE-S3 encryption, uploading only *.log files.
# Filters apply in order: exclude everything, then re-include *.log.
aws s3 sync /var/log/app/ s3://prod-logs-bucket/ \
  --region us-east-1 \
  --sse AES256 \
  --exclude "*" \
  --include "*.log" \
  --no-follow-symlinks

# Warning: --delete removes destination files that aren't in the source.
# Preview such a run first; --dryrun prints the actions without performing them:
# aws s3 sync /var/log/app/ s3://prod-logs-bucket/ --delete --dryrun
Output
upload: /var/log/app/2025-04-01.log to s3://prod-logs-bucket/2025-04-01.log
upload: /var/log/app/2025-04-02.log to s3://prod-logs-bucket/2025-04-02.log
Production Trap:
Running aws s3 sync --delete in the wrong direction (or from an empty source) deletes every destination file that isn't in the source, which can be the whole bucket. Use --dryrun first. Always.
Key Takeaway
Automate S3 operations with CLI or SDK. Console is for debugging, not daily work.

S3 Data Consistency — The Read-After-Write Promise You Can Actually Rely On

S3 gives you strong read-after-write consistency for PUTs of new objects: you upload, then immediately read, and you get the data. This isn't eventual consistency; it's immediate. Overwrite PUTs and DELETEs are strongly consistent too, and so are LIST operations, so if you delete an object and then list the bucket, the object is gone.

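But here's the edge case that bites people: bucket-level operations and bucket configuration are eventually consistent. A bucket you just deleted can still show up in ListBuckets results for a short time, and after you first enable versioning on a bucket, AWS recommends waiting up to 15 minutes before writing objects to it. In production automation, a window like that is enough to fail a deploy script that creates and configures a bucket back to back.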
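Deploy scripts usually handle this with retries and exponential backoff. The sketch below is a generic, hypothetical helper (not part of boto3). The sleep function is injectable so it can be tested without waiting, and the default schedule (0.5 s doubling, 6 attempts, about 15 seconds in total) is an assumption you'd tune. It is far shorter than the 15-minute versioning guidance.

```python
import time


def retry_until(fn, is_retryable, attempts=6, base_delay=0.5, sleep=time.sleep):
    """Call fn() until it succeeds, backing off exponentially on retryable errors."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt, or on errors that waiting won't fix
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(delay)
            delay *= 2
```

With boto3, is_retryable could check a ClientError's code, for example lambda e: getattr(e, "response", {}).get("Error", {}).get("Code") == "NoSuchBucket". For the specific case of waiting for a new bucket, boto3 also ships a built-in waiter: s3_client.get_waiter("bucket_exists").wait(Bucket=name).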

Versioning changes the game. With versioning enabled, every overwrite creates a new version and the old one is preserved. Deleting an object only adds a delete marker; the data is still there. Versioning doesn't change the consistency model, but it makes rollbacks safe. If a bad deploy deletes production assets, you remove the delete markers. If it overwrites them, you copy the previous versions back over the current ones.
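Choosing between those two rollback paths can be automated from a list_object_versions response. This hypothetical helper only decides what to do. It works on the response dict (Versions and DeleteMarkers entries with Key, VersionId, IsLatest, and LastModified), and you would then call the matching boto3 operation.

```python
def plan_rollback(list_versions_response, key):
    """Decide how to undo the last change to `key`.

    Returns ("delete_marker", version_id) if the key is currently deleted,
    ("copy_version", version_id) to restore the newest noncurrent version
    after an overwrite, or ("nothing_to_restore", None).
    """
    # --prefix matches every key that starts with the prefix, so filter on the exact key
    markers = [m for m in list_versions_response.get("DeleteMarkers", []) if m["Key"] == key]
    versions = [v for v in list_versions_response.get("Versions", []) if v["Key"] == key]
    latest_marker = next((m for m in markers if m["IsLatest"]), None)
    if latest_marker is not None:
        # Removing the marker makes the previous version current again
        return ("delete_marker", latest_marker["VersionId"])
    noncurrent = [v for v in versions if not v["IsLatest"]]
    if not noncurrent:
        return ("nothing_to_restore", None)
    # LastModified is a datetime in boto3 and an ISO-8601 string in CLI JSON;
    # either sorts chronologically as long as the two aren't mixed.
    previous = max(noncurrent, key=lambda v: v["LastModified"])
    return ("copy_version", previous["VersionId"])
```

For "delete_marker", call s3_client.delete_object(Bucket=..., Key=key, VersionId=version_id). For "copy_version", call s3_client.copy_object(Bucket=..., Key=key, CopySource={"Bucket": ..., "Key": key, "VersionId": version_id}).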

Cross-Region Replication (CRR) is asynchronous, so the destination bucket is only eventually consistent with the source. Never design a system that depends on CRR being instant.

S3ConsistencyCheck.shBASH
#!/bin/bash
# io.thecodeforge — devops tutorial
# Verify read-after-write consistency in a script: write, then immediately read metadata
aws s3api put-object --bucket prod-assets --key config.json --body config.json

aws s3api head-object --bucket prod-assets --key config.json

# head-object reflects the write immediately. With versioning off, an overwrite
# replaces the object; with versioning on, each PUT creates a new VersionId.
Output
{
  "LastModified": "2025-04-02T14:30:00Z",
  "ContentLength": 2048,
  "ETag": "\"abc123...\"",
  "VersionId": "v1"
}
Senior Shortcut:
For production-critical data, always enable versioning. It doesn't change S3's consistency model, but it turns every overwrite and delete into something you can undo.
Key Takeaway
S3 is strongly consistent for most operations. Versioning makes rollbacks trivial — enable it on every bucket that matters.

Computing in AWS: Beyond Static Hosting

S3 is object storage; it doesn't compute. In production, you pair S3 with AWS compute services to process uploads, serve dynamic content, or run batch jobs. EC2 gives you full control: launch a virtual machine, install web servers, and pull data from S3 via the SDK. For containerized tasks without managing servers, use AWS Fargate; ECS and EKS orchestrate clusters.

The key pattern: S3 can invoke a Lambda function when an object is created (for example, to resize images). Design for decoupling: S3 sends event notifications to SQS or EventBridge, and compute services consume them. Because nothing runs until an event arrives, this pattern costs nothing when idle. Always attach IAM roles to your compute to access S3; never hard-code keys. Placement affects latency too, so run compute in the same region as your S3 bucket.
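One detail trips up many S3-triggered functions: object keys in the event payload are URL-encoded, with spaces sent as "+". Here is a minimal Python handler sketch that decodes them with urllib.parse.unquote_plus. The returned dict is this sketch's own choice, not something Lambda requires, and a real resizer would fetch and process the object where the print statement is.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Log each object from an S3 ObjectCreated event, with keys decoded."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded: "my photo(1).jpg" is sent as "my+photo%281%29.jpg"
        key = unquote_plus(record["s3"]["object"]["key"])
        print(f"Processing s3://{bucket}/{key}")
        processed.append((bucket, key))
    return {"processed": processed}
```

Without the decoding step, a later get_object call for a key containing spaces or parentheses fails with NoSuchKey.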

s3-compute-pattern.ymlYAML
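# io.thecodeforge — devops tutorial
# Event-driven compute with S3 + Lambda
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  BucketName:
    Type: String
    Description: Globally unique, lowercase bucket name
Resources:
  Bucket:
    Type: AWS::S3::Bucket
    # S3 validates the Lambda destination when the notification is created,
    # so the invoke permission must exist first.
    DependsOn: ResizerInvokePermission
    Properties:
      BucketName: !Ref BucketName
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Function: !GetAtt Resizer.Arn
  ResizerInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref Resizer
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      # Built from the parameter (not !GetAtt Bucket.Arn) to avoid a circular dependency
      SourceArn: !Sub 'arn:aws:s3:::${BucketName}'
  Resizer:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs20.x
      Role: !GetAtt LambdaRole.Arn
      Code:
        ZipFile: |
          exports.handler = async (event) => {
            // Object keys arrive URL-encoded (spaces become '+')
            for (const record of event.Records) {
              const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
              console.log('Processing', key);
            }
          };
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        # CloudWatch Logs only; add a scoped s3:GetObject policy for real processing
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole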
Production Trap:
Never put IAM user access keys on an EC2 instance. Use an instance profile (an IAM role) with a least-privilege S3 policy. Its credentials are temporary and rotated automatically, so there are no long-lived keys to leak.
Key Takeaway
Pair S3 with Lambda or Fargate for event-driven processing; decouple storage from compute for scalability.

AWS Elastic Beanstalk: Managed Application Deployments

Elastic Beanstalk abstracts infrastructure so you can focus on code. Upload a ZIP or connect CodePipeline, and Beanstalk provisions EC2 instances, load balancers, Auto Scaling groups, and an S3 bucket for deployment artifacts and logs. Supported platforms include Python, Node.js, Go, Docker, and Java. It integrates with RDS for databases and CloudWatch for monitoring. Senior engineers use the .ebextensions directory for custom configuration (environment variables, security group rules, and so on).

Critical: Beanstalk creates an S3 bucket in your account to store deployment artifacts; never delete it manually. For blue-green deployments, swap environment URLs. Monitoring tip: enable enhanced health reporting for near-real-time CPU, memory, and latency metrics. Beanstalk isn't for every workload: if you need fine-grained networking (for example, VPC peering), manage the infrastructure with CDK or Terraform instead. You pay only for EC2 and the other underlying resources; there's no extra Beanstalk fee.

beanstalk-config.ymlYAML
# io.thecodeforge — devops tutorial
# .ebextensions/01-environment.config (YAML) for Elastic Beanstalk
option_settings:
  - namespace: aws:elasticbeanstalk:environment
    option_name: EnvironmentType
    value: LoadBalanced
  - namespace: aws:autoscaling:launchconfiguration
    option_name: InstanceType
    value: t3.small
  - namespace: aws:elasticbeanstalk:application:environment
    option_name: NODE_ENV
    value: production
  # Copy rotated instance logs to the environment's S3 bucket
  - namespace: aws:elasticbeanstalk:hostmanager
    option_name: LogPublicationControl
    value: true
Production Trap:
Never store database credentials in Beanstalk environment variables. Use AWS Secrets Manager and retrieve at runtime via SDK.
Key Takeaway
Elastic Beanstalk auto-manages infrastructure from your code; use .ebextensions for production-grade tuning.
● Production incident · Post-mortem · Severity: high

Public Bucket Exposed 50,000 Customer Records

Symptom
Bucket objects appeared in Google search results. Legal notified the team that customer data (names, addresses, SSNs) was publicly accessible.
Assumption
Making the bucket public was the quickest way to share a file with a client. The engineer assumed no one would discover the bucket URL.
Root cause
The bucket policy granted s3:GetObject to Principal * without any prefix restriction. Every object in the bucket became world-readable.
Fix
1. Immediately remove the Principal * Allow statement from the bucket policy (or enable Block Public Access on the bucket) to stop public reads.
2. Enable Block Public Access at the account level.
3. Use presigned URLs for all future file sharing.
4. Rotate any exposed credentials found in the objects.
Key lesson
  • Never make a bucket public. Always use presigned URLs for per-file sharing.
  • Enable S3 Block Public Access at the account level — it's a safety net that prevents accidental public exposure.
  • If you must serve public static assets, use CloudFront with an Origin Access Control (OAC) to keep the bucket itself private.
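A crude lint in CI can catch this exact policy shape before it ships. The sketch flags Allow statements whose principal is a wildcard and that carry no Condition. The function name is hypothetical. A Condition doesn't guarantee a statement is non-public, so treat aws s3api get-bucket-policy-status (and IAM Access Analyzer) as the authoritative check.

```python
import json


def public_allow_statements(policy_json):
    """Return Sids of Allow statements granting access to any principal with no Condition."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):        # a single statement may be a bare object
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        aws = principal.get("AWS") if isinstance(principal, dict) else None
        wildcard = (
            principal == "*"
            or aws == "*"
            or (isinstance(aws, list) and "*" in aws)
        )
        if wildcard and not stmt.get("Condition"):
            flagged.append(stmt.get("Sid", "<no Sid>"))
    return flagged
```

Failing the pipeline whenever this list is non-empty would have blocked the policy behind this incident.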
Production debug guide: common symptoms and immediate actions for S3 issues in production
Symptom · 01
AccessDenied when trying to read an object
Fix
Check bucket policy, IAM role permissions, and whether Block Public Access is enabled. Use aws s3api get-object-acl --bucket <name> --key <key> to verify object ownership.
Symptom · 02
Slow uploads or downloads (>500ms latency)
Fix
Check client region vs bucket region. Use S3 Transfer Acceleration for large objects across continents. For uploads >100MB, switch to multipart upload.
Symptom · 03
Bucket creation fails with 'BucketAlreadyExists'
Fix
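Bucket names are globally unique across all AWS accounts, so BucketAlreadyExists means another account owns the name. Pick a more specific name (e.g., acme-prod-eu-logs). If you just deleted one of your own buckets and are recreating it, you may instead see an OperationAborted ("A conflicting conditional operation is currently in progress") error for a while. Wait and retry, and be aware that someone else can claim the name once it's freed.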
Symptom · 04
Unexpected high S3 bill
Fix
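List incomplete multipart uploads with aws s3api list-multipart-uploads and add an AbortIncompleteMultipartUpload lifecycle rule to clean them up. Review storage class transitions, because objects may be stuck in Standard. Use S3 Storage Lens or S3 Inventory for granular cost analysis.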
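To clean up existing stale uploads right away, rather than waiting for a lifecycle rule, you can filter a list_multipart_uploads response by age. This hypothetical helper works on one response page, whose Initiated values boto3 returns as timezone-aware datetimes. The 7-day default is an assumption.

```python
from datetime import datetime, timedelta, timezone


def stale_uploads(response, older_than_days=7, now=None):
    """Return (key, upload_id) pairs for uploads initiated before the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=older_than_days)
    return [
        (u["Key"], u["UploadId"])
        for u in response.get("Uploads", [])
        if u["Initiated"] < cutoff
    ]
```

For each pair, s3_client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id) deletes the stored parts and stops the charges. Iterate over paginator pages for buckets with more than 1,000 in-progress uploads.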
★ S3 Quick Debug Cheat Sheet: five common S3 production issues and the exact commands to diagnose and fix them
Can't access object — AccessDenied even with correct credentials
Immediate action
Run `aws s3api get-bucket-policy --bucket <bucket>` to check if a deny policy exists
Commands
aws s3api get-bucket-policy-status --bucket <bucket> --query 'PolicyStatus.IsPublic'
aws s3api get-object-acl --bucket <bucket> --key <key>
Fix now
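An explicit Deny in any applicable policy overrides every Allow, so remove or narrow it. If neither the bucket policy nor the caller's IAM policy allows the action, add a scoped Allow. On buckets with Object Ownership set to "Bucket owner enforced" (the default for new buckets since April 2023), ACLs are disabled and only policies matter.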
Upload speeds are terrible for large files
Immediate action
Check file size; if >100MB, abort current upload and use multipart upload
Commands
aws s3api list-multipart-uploads --bucket <bucket> (check for hanging uploads)
aws configure set default.s3.max_concurrent_requests 20
Fix now
Raise CLI concurrency as shown above. For cross-continent transfers, enable S3 Transfer Acceleration on the bucket, then pass --endpoint-url https://s3-accelerate.amazonaws.com to aws s3 cp (or set default.s3.use_accelerate_endpoint to true). On very slow links, --cli-read-timeout 0 and --cli-connect-timeout 0 remove the socket timeouts (0 means wait indefinitely).
Object deleted accidentally: need recovery
Immediate action
Check if versioning is enabled on the bucket
Commands
aws s3api get-bucket-versioning --bucket <bucket>
aws s3api list-object-versions --bucket <bucket> --prefix <key>
Fix now
If versioning is enabled, restore the object by copying from a previous version: aws s3api copy-object --bucket <bucket> --copy-source <bucket>/<key>?versionId=<versionId> --key <key>
Bucket policy not taking effect
Immediate action
Verify the policy syntax and that it's attached to the correct bucket
Commands
aws s3api get-bucket-policy --bucket <bucket>
aws iam simulate-custom-policy --policy-input-list file://policy.json --action-names s3:GetObject --resource-arns arn:aws:s3:::<bucket>/*
Fix now
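Use the IAM policy simulator to test. Common mistake: object actions such as s3:GetObject need the object ARN with /* (arn:aws:s3:::<bucket>/*), while bucket actions such as s3:ListBucket need the bucket ARN without it. If the policy's Principal names an IAM role, make sure the role ARN is correct.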
S3 website endpoint returns 403 Forbidden
Immediate action
Verify the bucket policy allows public read and that Static Website Hosting is enabled
Commands
aws s3api get-bucket-website --bucket <bucket>
aws s3api get-bucket-policy --bucket <bucket>
Fix now
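Ensure the bucket policy has {"Effect":"Allow","Principal":"*","Action":"s3:GetObject","Resource":"arn:aws:s3:::<bucket>/*"}. Also check that Block Public Access settings aren't blocking public bucket policies. Without s3:ListBucket, a missing object (such as a missing index document) also returns 403 rather than 404.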
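If you do need a public website bucket, scope the public statement to one prefix so private data can't leak through it. This hypothetical helper builds such a policy. The bucket name and the "public/" prefix are placeholders, Block Public Access must permit public bucket policies for it to take effect, and CloudFront with Origin Access Control remains the safer default.

```python
import json


def public_read_policy(bucket, prefix="public/"):
    """Bucket policy JSON allowing anonymous s3:GetObject under one key prefix only."""
    if not prefix:
        # An empty prefix would make every object in the bucket public
        raise ValueError("refusing to make the entire bucket public")
    if not prefix.endswith("/"):
        prefix += "/"   # "public" alone would also match keys like "publicity.csv"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadForPrefixOnly",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Apply the returned string with aws s3api put-bucket-policy, or with s3_client.put_bucket_policy(Bucket=bucket, Policy=policy_json) in boto3.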
S3 Storage Class Comparison
| Storage Class | Best For | Retrieval Latency | Approx. Cost / GB / Month (us-east-1) | Per-Retrieval Fee |
|---|---|---|---|---|
| S3 Standard | Active user data, frequently accessed files | Milliseconds | $0.023 | None |
| S3 Standard-IA | Backups, older content, DR copies | Milliseconds | $0.0125 | Yes (~$0.01/GB) |
| S3 Glacier Instant Retrieval | Quarterly-access archives | Milliseconds | $0.004 | Yes (~$0.03/GB) |
| S3 Glacier Flexible Retrieval | Annual-access compliance data | Minutes to hours | $0.0036 | Yes |
| S3 Glacier Deep Archive | Long-term legal/compliance retention | Within 12 hours (standard), up to 48 hours (bulk) | $0.00099 | Yes (~$0.02/GB) |
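The "over 95%" savings figure follows directly from the storage prices in the table. This small calculator uses those approximate us-east-1 list prices. It covers storage only and ignores request, retrieval, and minimum-duration charges, so treat its output as a lower bound on real cost. The names are hypothetical.

```python
PRICE_PER_GB_MONTH = {   # approximate us-east-1 list prices from the table
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(gb, storage_class):
    """Storage-only monthly cost in USD, rounded to cents."""
    return round(gb * PRICE_PER_GB_MONTH[storage_class], 2)


def savings_vs_standard(storage_class):
    """Fractional storage saving compared with S3 Standard."""
    return 1 - PRICE_PER_GB_MONTH[storage_class] / PRICE_PER_GB_MONTH["STANDARD"]


for cls in PRICE_PER_GB_MONTH:
    print(f"{cls:<13} 1,000 GB: ${monthly_storage_cost(1000, cls):>6.2f}/month, "
          f"{savings_vs_standard(cls):.1%} cheaper than Standard")
```

For 1,000 GB, Standard costs about $23.00 a month and Deep Archive about $0.99, a storage saving of roughly 95.7%.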

Key takeaways

1
S3 object keys are flat strings, not real file paths
'folder' prefixes are a UI convenience, not a filesystem hierarchy. This changes how you design efficient list and filter operations at scale.
2
Presigned URLs are the production-grade pattern for user file access
they keep objects private while granting time-limited, per-object access without exposing IAM credentials to the client.
3
Every bucket that receives large file uploads needs an AbortIncompleteMultipartUpload lifecycle rule
without it, failed partial uploads accumulate silently and cost you real money.
4
Storage class transitions via lifecycle policies are a one-time setup that permanently reduces costs
Standard → Standard-IA → Glacier Deep Archive can cut your S3 bill by over 95% for archival data.
5
Enable versioning on every production bucket
it's a single CLI command that gives you undo for S3 and is a prerequisite for Object Lock and replication.

Common mistakes to avoid

×

Making the entire bucket public to share one file

Symptom
Anyone can browse and download all your data — search engines index it.
Fix
Use presigned URLs for any per-file sharing. If you genuinely need a public static hosting bucket, lock it down to a specific key prefix using a bucket policy Resource like 'arn:aws:s3:::your-bucket/public/*' and never put private data under that prefix.
×

Creating buckets in us-east-1 for users in Sydney

Symptom
Upload and download speeds are painfully slow for non-US users; latency adds 200-300ms to every S3 operation.
Fix
Always create buckets in the AWS region geographically closest to your users or your EC2/Lambda compute. For a global audience, put CloudFront in front of a single S3 bucket — it caches content at edge locations worldwide.
×

Ignoring S3 versioning and then accidentally deleting production files

Symptom
A one-line script bug runs aws s3 rm on the wrong prefix and irreplaceable data is gone.
Fix
Enable versioning on any bucket holding user-generated content or production assets with aws s3api put-bucket-versioning --bucket your-bucket --versioning-configuration Status=Enabled. With versioning on, 'deleted' objects are just marked with a delete marker and can be restored instantly. Combine with MFA Delete for critical buckets.
×

Not setting up an AbortIncompleteMultipartUpload lifecycle rule

Symptom
Failing large uploads leave partial parts in S3, silently accumulating storage costs — not visible in normal bucket listings.
Fix
Add a lifecycle rule with AbortIncompleteMultipartUpload set to 7 days. This automatically cleans up any partial upload that wasn't completed within a week.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
S3 is often described as 'eventually consistent' — can you explain what ...
Q02SENIOR
Walk me through how you'd architect a file upload feature for a web app ...
Q03SENIOR
What's the difference between an S3 bucket policy and an IAM policy, and...
Q01 of 03SENIOR

S3 is often described as 'eventually consistent' — can you explain what that means and whether it's still true today, and how it would affect an application that writes and immediately reads the same object?

ANSWER
Before December 2020, S3 provided read-after-write consistency for PUTs of new objects but only eventual consistency for overwrites, deletes, and LIST operations. In December 2020, AWS made S3 strongly consistent for all GET, PUT, LIST, and DELETE operations on objects, in all regions, at no extra cost. If you write an object and immediately read or list it, you see the latest version, including after overwrites and deletes. What's still eventually consistent: bucket-level configuration (creating or deleting buckets, enabling versioning, policy changes) and Cross-Region Replication. Concurrent writers to the same key also aren't coordinated; the write with the latest timestamp wins. The practical impact: applications no longer need retry-until-visible logic after writing an object, but they still need care around bucket configuration changes and concurrent updates to the same key.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is the maximum file size you can upload to S3?
02
Does S3 guarantee that my data won't be lost?
03
What's the difference between S3 and EBS — when do you use each?
04
Can I enable Object Lock on an existing bucket?
05
How do I reduce S3 costs without losing data?