Intermediate 11 min · March 06, 2026

AWS S3 Basics

S3 Principal * Bucket Policy — Why 50K SSNs Hit Google

Q: What is the maximum file size you can upload to S3?

A single S3 object can be up to 5TB in size. However, the maximum size for a single PUT upload is 5GB. For anything larger — or for better reliability with large files — you should use the multipart upload API, which splits the file into parts (minimum 5MB each) and uploads them in parallel. Boto3's `upload_file` method handles this automatically when the file exceeds 8MB.
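As a rough illustration of those limits, the sketch below works out how many parts a multipart upload would need. The function name `plan_multipart` and the 64 MiB default part size are illustrative assumptions, not part of any AWS SDK; the 5 MiB minimum part size and the 5 GiB single-PUT ceiling follow the answer above, and the 10,000-part cap is S3's documented multipart limit.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART_SIZE = 5 * MIB     # every part except the last must be at least this big
MAX_SINGLE_PUT = 5 * GIB    # largest object a single PUT accepts
MAX_PARTS = 10_000          # S3's cap on parts per multipart upload


def plan_multipart(size_bytes: int, part_size: int = 64 * MIB) -> dict:
    """Return a simple upload plan for an object of size_bytes (hypothetical helper)."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = max(1, math.ceil(size_bytes / part_size))
    if parts > MAX_PARTS:
        # Grow the part size until the object fits into 10,000 parts.
        part_size = math.ceil(size_bytes / MAX_PARTS)
        parts = math.ceil(size_bytes / part_size)
    return {
        "multipart_required": size_bytes > MAX_SINGLE_PUT,
        "part_size": part_size,
        "parts": parts,
    }


plan = plan_multipart(20 * GIB)
print(plan)
```

In practice Boto3's transfer manager makes these decisions for you; the sketch only makes the arithmetic behind them visible.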

Q: Does S3 guarantee that my data won't be lost?

S3 Standard is designed for 99.999999999% durability (11 nines) by automatically storing your data redundantly across a minimum of three Availability Zones within a region. In practical terms, if you stored 10 million objects, you'd expect to lose one object every 10,000 years. That said, durability doesn't protect against accidental deletion — for that, enable versioning and optionally MFA Delete on critical buckets.
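The "one object every 10,000 years" figure follows directly from the durability number. A small worked calculation, treating 11 nines as the probability that a given object survives one year (which is how AWS states the design target):

```python
from fractions import Fraction

durability = Fraction(99_999_999_999, 100_000_000_000)  # eleven nines
annual_loss_rate = 1 - durability                       # chance one object is lost in a year

objects_stored = 10_000_000
expected_losses_per_year = objects_stored * annual_loss_rate
years_per_lost_object = 1 / expected_losses_per_year

print(f"Expected losses per year: {float(expected_losses_per_year)}")
print(f"Average years per lost object: {int(years_per_lost_object)}")
```

Exact fractions are used here because `1 - 0.99999999999` picks up rounding error in floating point.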

Q: What's the difference between S3 and EBS — when do you use each?

EBS (Elastic Block Store) is a network-attached block device — it works like a hard drive mounted to a single EC2 instance, with low latency and read/write for databases or OS volumes. S3 is an object store accessed via HTTP API — it's not mountable as a drive (without third-party tools), but it's infinitely scalable, globally accessible, and far cheaper per GB. Use EBS for your EC2 instance storage, databases, and OS. Use S3 for files, media, backups, static assets, and any data that needs to be accessed by more than one service.

Q: Can I enable Object Lock on an existing bucket?

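Yes, with conditions. Object Lock was originally a creation-time-only setting (the `--object-lock-enabled-for-bucket` flag on `create-bucket`), but since late 2023 AWS also lets you turn it on for an existing bucket, provided versioning is already enabled. Enabling it is irreversible: you cannot disable Object Lock or suspend versioning afterwards. Objects already in the bucket are not locked retroactively; you apply retention to them individually or with S3 Batch Operations. Plan ahead: if there's any chance you'll need WORM compliance, enable Object Lock at creation time even if you don't configure retention immediately.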

Q: How do I reduce S3 costs without losing data?

1. Use lifecycle policies to automatically transition objects to cheaper storage classes (Standard → IA → Glacier) based on age.
2. Add an AbortIncompleteMultipartUpload rule to clean up failed uploads.
3. Enable S3 Intelligent-Tiering for unpredictable access patterns.
4. Use S3 Inventory to identify and delete unneeded objects.
5. For infrequently accessed data, consider Standard-IA instead of Standard.

One Principal * bucket policy made 50K customer SSNs searchable on Google.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

Last updated: July 18, 2026

Before you start ⏱ 25 min

✓ Solid grasp of DevOps fundamentals
✓ Comfortable with command-line tools
✓ Basic Linux administration knowledge

⚡Quick Answer

S3 stores data as objects in globally unique buckets
Buckets are region-scoped; objects have keys (flat namespace) - not folders
Access control: bucket policies (persistent) vs presigned URLs (time-limited)
Lifecycle policies auto-tier objects (Standard → IA → Glacier) to cut costs by up to 95%
Multipart uploads required for files >5GB; incomplete uploads silently cost money
Enable versioning on every production bucket to prevent accidental data loss

✦ Definition~90s read

What is AWS S3 Basics?

Amazon S3 (Simple Storage Service) is object storage at planetary scale — you store files as objects in buckets, each object gets a URL, and you pay for what you use. The core problem S3 solves is durable, infinitely scalable storage without managing servers.

But the real trap is access control: a single misconfigured bucket policy that allows s3:GetObject for Principal: "*" makes every object it covers publicly readable. That's how 50K Social Security Numbers end up indexed by Google: no authentication required, just a GET request to the object URL.

S3 doesn't warn you; it faithfully executes the policy you wrote.

S3's structure is deceptively simple: buckets are flat containers (no folders, just prefix-based key naming), objects are the files themselves (up to 5TB each), and every object has a unique key like reports/2024/ssn-export.csv. Access control happens at two levels: bucket policies (resource-based, attached to the bucket itself) and IAM policies (identity-based, attached to users/roles).

Bucket policies with Principal: "*" are the number one cause of data leaks because they grant anonymous access — no AWS account needed. IAM policies are safer because they require authenticated AWS credentials, but they can't grant cross-account access without a bucket policy.

When NOT to use S3: for structured data that needs queries and transactions (use RDS or DynamoDB), for low-latency random access on small files (use EBS or instance store), or when you need POSIX file system semantics (use EFS or FSx). S3 is optimized for throughput, not IOPS: expect at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.

Real-world numbers: Netflix stores 8+ PB of video assets, Reddit uses S3 for all user uploads, and Snowflake stores compressed data in S3. The 11 9s durability (99.999999999%) means losing an object to hardware failure is extremely unlikely, but losing control of it because of a bad bucket policy is entirely possible and happens daily.

Plain-English First

Imagine a massive warehouse with unlimited shelf space. You rent a named section of that warehouse (a 'bucket'), and inside it you can store any box (a 'file') of any size. Each box gets a unique label so anyone — or only you — can find it later. That's S3: Amazon's unlimited, pay-per-byte file warehouse in the cloud, accessible from anywhere on the internet.

Every modern application eventually needs somewhere to put files — profile pictures, invoice PDFs, video uploads, database backups, static websites. The moment you outgrow a single server's disk, you need distributed, durable storage. AWS S3 is where the industry landed, and it's powered everything from Netflix's video catalogue to your favourite startup's user uploads for nearly two decades. If you're building anything serious on AWS, S3 is the first service you'll touch.

Before S3, teams had to spin up dedicated file servers, worry about disk failures, manually handle replication, and scramble when traffic spiked. S3 solved all of that in one API. It stores your data across multiple physical locations automatically, scales to literally exabytes without any configuration, and charges you only for what you use. The result: you stop thinking about storage infrastructure and start thinking about your product.

By the end of this article you'll know exactly how S3 is structured, how to create buckets and upload objects using both the AWS CLI and the Python Boto3 SDK, how to control who can access your data and when, and the real-world patterns that experienced engineers use daily — plus the costly gotchas that trip up everyone the first time.

Why a Single S3 Bucket Policy Leaked 50K SSNs

S3 bucket policies are JSON-based resource-based policies that define who can access a bucket and what actions they can perform on which objects. Unlike IAM policies attached to users or roles, bucket policies are attached directly to the bucket and can grant cross-account access without requiring any IAM configuration in the target account. The core mechanic is that the policy's Principal element can be set to "*" to allow anonymous access, which is the single most common cause of data exposure.

When you attach a bucket policy with Principal: "*" and an Allow effect, you are explicitly granting access to every unauthenticated internet user. Within a single account, AWS evaluates bucket policies together with IAM policies: an explicit Allow in either is enough unless some policy contains an explicit Deny. This means a single misconfigured bucket policy can open up data that every IAM policy keeps locked down. The policy applies to every object its Resource matches, so objects become readable even when their ACLs grant nothing to the public (ACLs can only grant access, never deny it). The Block Public Access settings at account and bucket level are the guardrail that stops such a policy from taking effect.

Use bucket policies when you need to grant cross-account access, enforce HTTPS-only access, or set a blanket permission across all objects in a bucket. They are essential for public website hosting, but dangerous for any bucket containing sensitive data. The 2017 Verizon exposure of as many as 14 million customer records and the 2022 Pegasus Airlines leak of roughly 6.5 TB of data (about 23 million files) both involved an S3 bucket left publicly readable.
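For the HTTPS-only case, a minimal sketch of the policy document might look like this. The helper name `https_only_policy` is an assumption for illustration; the `aws:SecureTransport` condition key and the overall statement shape follow the standard AWS pattern. Note that this is one of the few places where Principal "*" is the right choice, because the effect is Deny rather than Allow.

```python
import json


def https_only_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects any request not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both bucket-level and object-level operations.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy_json = json.dumps(https_only_policy("acme-corp-user-uploads-dev"), indent=2)
print(policy_json)
```

The resulting string is what `put_bucket_policy` expects in its `Policy` argument. Keep in mind that the call replaces the whole bucket policy, so merge this statement into any existing document rather than applying it on its own.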

⚠ Principal: "*" Is Not a Test

Setting Principal to "*" in a bucket policy grants access to every anonymous user on the internet — not just authenticated AWS users. This is the #1 cause of S3 data leaks.

📊 Production Insight

A data engineering team set Principal: "*" on a bucket containing 50K SSNs for a 'quick test' and forgot to remove it. The bucket was indexed by Google within 48 hours. Symptom: the bucket's 'ListObjects' operation returned a 200 OK from any curl request without credentials. Rule: never use Principal: "*" in a bucket policy unless the bucket is explicitly designed for public content, and even then, restrict actions to s3:GetObject only.

🎯 Key Takeaway

Bucket policies with Principal: "*" grant anonymous access — there is no implicit authentication check.

Always test bucket policies with the IAM Policy Simulator before applying them to production.

Use IAM policies for user/role access and bucket policies only for cross-account or service-level controls.

thecodeforge.io

Aws S3 Basics

Buckets and Objects — How S3 is Actually Structured

S3 has exactly two building blocks: buckets and objects. A bucket is a top-level container — think of it as a named partition of Amazon's storage infrastructure. Every bucket name must be globally unique across all AWS accounts everywhere. Not just unique to you — unique to every person using S3 on Earth. That's why 'images' is taken, but 'acme-corp-product-images-prod' probably isn't.

An object is any file you store inside a bucket. It has three parts: the key (the file path, e.g. 'invoices/2024/jan/invoice-001.pdf'), the data itself (up to 5TB per object), and metadata (key-value pairs like content type or custom tags).

Here's the important mental model shift: S3 is NOT a filesystem. There are no real folders. 'invoices/2024/jan/' is just a prefix in the key name. The AWS console and SDK simulate folders for your convenience, but under the hood every object lives flat in the bucket identified by its full key string. This matters when you're listing, filtering, or managing costs at scale.
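To make the "no real folders" point concrete, here is a small simulation of how a folder view can be derived from flat keys. It mirrors the idea behind the `Prefix` and `Delimiter` parameters of ListObjectsV2 and its `CommonPrefixes` response field, but `list_folder` itself is a hypothetical helper that runs on a plain Python list, not a description of S3's internals.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Mimic a delimiter-based listing over a flat collection of keys."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is rolled up into one "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = [
    "invoices/2024/jan/invoice-001.pdf",
    "invoices/2024/jan/invoice-002.pdf",
    "invoices/2024/feb/invoice-003.pdf",
    "invoices/README.txt",
    "avatars/user-4821/profile.jpg",
]

objects, folders = list_folder(keys, prefix="invoices/")
print("objects:", objects)
print("folders:", folders)
```

The "folders" exist only in the listing result. Delete the last object under a prefix and the folder disappears with it.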

Buckets are also region-specific. When you create a bucket in us-east-1, your data lives there unless you explicitly set up replication. Always create buckets in the region closest to your users or your compute layer.

s3_bucket_and_object_basics.sh (Bash)

#!/bin/bash
# ── Prerequisites: AWS CLI installed and configured with `aws configure` ──

# 1. Create a new bucket in us-east-1
#    Bucket names: lowercase, 3-63 chars, no underscores, globally unique
aws s3api create-bucket \
  --bucket acme-corp-user-uploads-dev \
  --region us-east-1
# Note: us-east-1 is the only region that does NOT need a LocationConstraint.
# Every other region requires --create-bucket-configuration LocationConstraint=<region>

# 2. Upload a local file as an S3 object
#    The key here is 'avatars/user-4821/profile.jpg'
#    S3 doesn't create a folder -- 'avatars/user-4821/' is just part of the key name
aws s3 cp ./profile.jpg \
  s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg

# 3. List objects using a prefix filter (simulates folder browsing)
#    This returns ONLY objects whose key starts with 'avatars/user-4821/'
aws s3 ls s3://acme-corp-user-uploads-dev/avatars/user-4821/

# 4. Download the object back to verify the round-trip
aws s3 cp \
  s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg \
  ./profile_downloaded.jpg

# 5. Delete the object
aws s3 rm s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg

Output

# Step 1 -- create-bucket
{
    "Location": "/acme-corp-user-uploads-dev"
}

# Step 2 -- cp upload
upload: ./profile.jpg to s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg

# Step 3 -- ls
2024-06-12 14:03:22 84231 profile.jpg

# Step 4 -- cp download
download: s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg to ./profile_downloaded.jpg

# Step 5 -- rm
delete: s3://acme-corp-user-uploads-dev/avatars/user-4821/profile.jpg

⚠ Watch Out: The us-east-1 LocationConstraint Trap

If you run create-bucket with --create-bucket-configuration LocationConstraint=us-east-1, AWS throws a cryptic 'InvalidLocationConstraint' error. us-east-1 is the default region and must NOT have a LocationConstraint. Every other region — eu-west-1, ap-southeast-2, etc. — requires it. This catches almost everyone on their first cross-region script.
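The same trap exists in Boto3, where the constraint is passed as `CreateBucketConfiguration`. One way to sidestep it, sketched here with a hypothetical helper named `create_bucket_kwargs`, is to build the arguments conditionally:

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build arguments for create_bucket, omitting the constraint for us-east-1."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("acme-corp-user-uploads-dev", "us-east-1"))
print(create_bucket_kwargs("acme-corp-user-uploads-eu", "eu-west-1"))

# Intended usage (requires boto3 and credentials, so it is not executed here):
#   s3_client = boto3.client("s3", region_name=region)
#   s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

The second bucket name is a placeholder; the point is only that one code path can serve every region.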

📊 Production Insight

Using prefix-based folder simulation means listing 10M objects under one prefix is slow.

S3 ListObjectsV2 is limited to 1000 keys per call.

Rule: design your key hierarchy to keep object counts per prefix under 100k for fast listing.
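Because each ListObjectsV2 call returns at most 1,000 keys, any complete listing has to follow continuation tokens. The sketch below shows that loop; `iter_keys` is a hypothetical helper, and `StubListClient` is a stand-in written for this example so the code runs without AWS access. With a real Boto3 client the same function should work unchanged, and Boto3 also ships a built-in paginator (`get_paginator('list_objects_v2')`) that does the same job.

```python
def iter_keys(s3_client, bucket, prefix=""):
    """Yield every key under prefix, following continuation tokens page by page."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class StubListClient:
    """Stand-in for a Boto3 S3 client; serves keys a few at a time."""

    def __init__(self, keys, page_size=2):
        self.keys = sorted(keys)
        self.page_size = page_size
        self.calls = 0

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        self.calls += 1
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)
        chunk = matching[start:start + self.page_size]
        end = start + len(chunk)
        page = {
            "Contents": [{"Key": k} for k in chunk],
            "IsTruncated": end < len(matching),
        }
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(end)
        return page


client = StubListClient([f"logs/2024/file-{i}.gz" for i in range(5)] + ["other/x"])
all_keys = list(iter_keys(client, "acme-corp-user-uploads-dev", prefix="logs/"))
print(len(all_keys), "keys fetched in", client.calls, "calls")
```

At 1,000 keys per call, listing 10 million objects takes at least 10,000 sequential round trips, which is why key design matters.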

🎯 Key Takeaway

S3 object keys are flat strings, not real file paths.

'nested/folders/' is just a key prefix.

Design key structure for performance, not aesthetics.

Access Control — Who Can See Your Files and Why It Matters

By default, every S3 bucket and every object inside it is completely private. Nothing is publicly accessible unless you deliberately make it so. This is the right default, but it means you need to understand the two main ways to grant access: bucket policies and presigned URLs.

A bucket policy is a JSON document attached to the bucket that grants broad, persistent permissions — for example, allowing your application's IAM role to read everything under the 'invoices/' prefix, or making a 'public-assets/' prefix readable by the entire internet for a static website. Bucket policies are evaluated on every request to that bucket, so they're perfect for service-to-service access.

A presigned URL is the smarter choice for user-facing file access. Instead of making objects public, you generate a time-limited URL server-side that temporarily grants access to one specific object. The URL embeds cryptographic credentials and an expiry timestamp. When it expires, access is gone — automatically, no cleanup needed. This is how every serious application handles file downloads and uploads: the backend stays in control, and the client gets a URL that works just long enough.

Never make a bucket public unless it's genuinely meant to serve public static assets. Even then, use a CloudFront distribution in front of it rather than direct public bucket access.

s3_access_control.py (Python)

import boto3
import json
from datetime import datetime

# ── Boto3 picks up credentials from env vars, ~/.aws/credentials, or IAM role ──
s3_client = boto3.client('s3', region_name='us-east-1')

BUCKET_NAME = 'acme-corp-user-uploads-dev'

# ── PART 1: Attach a bucket policy ──
# This policy allows ONLY our application's IAM role to read objects
# under the 'invoices/' prefix. Nothing else in the bucket is affected.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAppRoleInvoiceRead",
            "Effect": "Allow",
            "Principal": {
                # Replace with your actual IAM role ARN
                "AWS": "arn:aws:iam::123456789012:role/AcmeAppServerRole"
            },
            "Action": "s3:GetObject",
            # The /* at the end means: any object under this prefix
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/invoices/*"
        }
    ]
}

s3_client.put_bucket_policy(
    Bucket=BUCKET_NAME,
    Policy=json.dumps(bucket_policy)  # Policy must be serialised to a JSON string
)
print(f"Bucket policy applied to {BUCKET_NAME}")

# ── PART 2: Generate a presigned URL for secure, time-limited object access ──
# Use case: a user clicks 'Download Invoice' in your web app.
# Your backend generates this URL and returns it. The browser hits S3 directly.
# The object itself stays private the whole time.

object_key = 'invoices/2024/jan/invoice-001.pdf'
expiry_seconds = 3600  # URL valid for exactly 1 hour

presigned_download_url = s3_client.generate_presigned_url(
    ClientMethod='get_object',  # 'put_object' works the same way for uploads
    Params={
        'Bucket': BUCKET_NAME,
        'Key': object_key
    },
    ExpiresIn=expiry_seconds
)

print(f"\nPresigned download URL (valid for {expiry_seconds}s):")
print(presigned_download_url)
print(f"\nURL expires at approximately: {datetime.utcnow()} + {expiry_seconds // 60} minutes")

# ── PART 3: Generate a presigned URL for direct-upload from the browser ──
# The client uploads straight to S3 — your server never handles the file bytes.
# This is the standard pattern for large file uploads.
presigned_upload_url = s3_client.generate_presigned_url(
    ClientMethod='put_object',
    Params={
        'Bucket': BUCKET_NAME,
        'Key': 'avatars/user-9001/profile.jpg',
        'ContentType': 'image/jpeg'  # Enforce content type at signing time
    },
    ExpiresIn=300  # Upload must start within 5 minutes
)

print(f"\nPresigned upload URL (valid for 300s):")
print(presigned_upload_url)

Output

Bucket policy applied to acme-corp-user-uploads-dev

Presigned download URL (valid for 3600s):
https://acme-corp-user-uploads-dev.s3.amazonaws.com/invoices/2024/jan/invoice-001.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20240612%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240612T140322Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=3d8a2f...

URL expires at approximately: 2024-06-12 14:03:22 + 60 minutes

Presigned upload URL (valid for 300s):
https://acme-corp-user-uploads-dev.s3.amazonaws.com/avatars/user-9001/profile.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20240612%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240612T140322Z&X-Amz-Expires=300&X-Amz-SignedHeaders=content-type%3Bhost&X-Amz-Signature=9c1b4e...

💡Pro Tip: Presigned Upload URLs = No File Size Limit on Your Server

When you generate a presigned PUT URL and give it to the browser, the user uploads directly from their device to S3. Your application server never receives the file bytes. This means you're not capped by your server's memory, you're not paying for bandwidth through your own servers, and multi-gigabyte video files upload just as smoothly as 10 KB thumbnails, up to the 5 GB limit of a single PUT. For anything larger, presign each part of a multipart upload instead. This pattern is used by almost every major SaaS product.

📊 Production Insight

Bucket policies, IAM policies, and ACLs are evaluated together, not in sequence.

If any of them contains an explicit Deny that matches the request, access is denied. No Allow overrides a Deny.

Rule: use bucket policies for bucket-wide rules, IAM for user-specific, presigned URLs for per-object client access.
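The "Deny always wins" rule can be written down in a few lines. This is a deliberately simplified model for requests within one account: `evaluate` is a hypothetical function, and it leaves out service control policies, permission boundaries, session policies, and the cross-account case where both sides must allow the request.

```python
def evaluate(matching_effects):
    """Decide a request from the effects of every policy statement that matches it."""
    if "Deny" in matching_effects:
        return "Deny"          # an explicit Deny beats any number of Allows
    if "Allow" in matching_effects:
        return "Allow"         # at least one Allow and no Deny
    return "ImplicitDeny"      # nothing matched, so the default applies


print(evaluate(["Allow"]))                   # IAM policy allows, nothing denies
print(evaluate(["Allow", "Deny", "Allow"]))  # one Deny anywhere blocks the request
print(evaluate([]))                          # no statement matches at all
```

This is why a Deny statement in a bucket policy works as a safety net: it holds no matter what any IAM policy grants.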

🎯 Key Takeaway

Presigned URLs are the production-grade pattern for user file access.

They keep objects private while granting time-limited, per-object access.

Never expose private objects to the public internet; reserve public access for genuinely public static assets.

Bucket Policy vs IAM Policy Comparison Matrix

AWS provides two primary mechanisms for controlling access to S3: bucket policies and IAM policies. Choosing between them (or using both together) is a frequent source of confusion. This matrix clarifies the differences so you can make the right choice for each access scenario.

Feature	Bucket Policy	IAM Policy
Scope	Attached to a specific S3 bucket	Attached to a user, group, or role (IAM principal)
Who can be granted access	Any AWS account, IAM role, or anonymous (`Principal: "*"`)	Only the principal the policy is attached to
Cross-account access	Yes — specify another account's IAM role in `Principal`	No — must use cross-account roles or bucket policies
Anonymous access	Yes — set `Principal: "*"`	No — IAM policies never apply to unauthenticated requests
Granularity	Bucket, prefix, or individual object ARNs within that one bucket	Buckets, prefixes, or individual object ARNs (`arn:aws:s3:::bucket/key`) across any bucket
Default effect	Deny (no policy means no access)	Deny (no policy means no access)
Policy evaluation order	Evaluated together with IAM policies; explicit Deny always wins	Evaluated together with the bucket policy; explicit Deny always wins
Use case example	Allow anonymous read for static website	Allow a specific developer to read all buckets in an account
Management	One policy per bucket, manageable via S3 console	Centralized via IAM console or AWS Organizations SCPs
Maximum size	20 KB	6,144 characters (managed policies); inline limits are 2,048 (user), 5,120 (group), or 10,240 (role) characters

When to use which:
- Bucket policy: when you need to grant access to non-AWS identities (like anonymous users) or manage permissions for a shared bucket used by multiple accounts.
- IAM policy: when you want to follow least-privilege and centrally manage permissions for your own users and roles.
- Both in production: use IAM for your application roles and add bucket policies only for specific overrides (e.g., deny all public access as a safety net).

A common anti-pattern is using a permissive bucket policy to allow your own IAM role, then tightening via IAM. Instead, let IAM do the fine-grained control and keep bucket policies simpler.

📊 Production Insight

Both policy types accept condition keys such as aws:SourceIp, aws:SecureTransport, and aws:MultiFactorAuthAge. A bucket policy can therefore deny any request made without recent MFA or over plain HTTP, no matter which IAM policy allowed it. Put conditions that must hold for every caller in the bucket policy, and per-user conditions in IAM policies.

🎯 Key Takeaway

Bucket policies handle anonymous/cross-account access; IAM policies control your own team. Combine both for defense in depth, but never nest overly complex logic in bucket policies.

S3 Storage Classes — Cost vs Durability Comparison Table

S3 offers multiple storage classes designed to balance cost, retrieval latency, and durability across your data lifecycle. The table below summarises each class along with typical use cases and cost trade-offs.

Storage Class	Best For	Retrieval Latency	Approx. Cost/GB/Month (us-east-1)	Retrieval Fee	Durability	Minimum Object Size	Minimum Storage Duration
S3 Standard	Frequently accessed, active data	Milliseconds	$0.023	None	99.999999999%	None	None
S3 Intelligent-Tiering	Unknown or changing access patterns	Milliseconds	$0.023 (frequent tier) + monitoring fee	None	99.999999999%	None (objects under 128KB are not auto-tiered)	None
S3 Standard-IA	Infrequently accessed but needs quick access	Milliseconds	$0.0125	$0.01/GB	99.999999999%	128KB	30 days
S3 One Zone-IA	Re-creatable, infrequently accessed data	Milliseconds	$0.01	$0.01/GB	99.999999999% (single AZ)	128KB	30 days
S3 Glacier Instant Retrieval	Archive data accessed quarterly	Milliseconds	$0.004	$0.03/GB	99.999999999%	128KB	90 days
S3 Glacier Flexible Retrieval	Archive data accessed yearly	Minutes to hours	$0.0036	$0.03/GB (expedited)	99.999999999%	None (about 40KB of metadata billed per object)	90 days
S3 Glacier Deep Archive	Long-term compliance, accessed rarely	Up to 12 hours (standard retrieval)	$0.00099	$0.02/GB	99.999999999%	None (about 40KB of metadata billed per object)	180 days
S3 on Outposts	On-premises workloads requiring local S3	Milliseconds (local)	Varies	None	Depends on hardware	None	None

Key cost factors:
- Standard costs the most per GB but has no retrieval fees or minimums.
- Standard-IA cuts storage cost by ~45% but adds a per-GB retrieval fee. Ideal for backups you access a few times a year.
- Glacier Deep Archive is 96% cheaper than Standard, but retrieval takes up to 12 hours. Perfect for regulatory retention.
- Minimum object sizes and minimum storage durations apply to Infrequent Access and Glacier tiers; storing many small objects or deleting early incurs additional charges.

Use lifecycle policies to automatically transition objects between these classes as they age, maximising cost savings without manual intervention.
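The percentages quoted above can be checked with a few lines of arithmetic. This sketch uses the approximate us-east-1 storage prices from the table and nothing else: it deliberately ignores retrieval, request, and monitoring fees, and the function names are illustrative.

```python
PRICE_PER_GB_MONTH = {      # approximate us-east-1 storage prices from the table
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(size_gb, storage_class):
    """Storage-only cost in USD for one month."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


def savings_vs_standard(storage_class):
    """Percentage saved on storage compared with S3 Standard."""
    standard = PRICE_PER_GB_MONTH["STANDARD"]
    return 100 * (1 - PRICE_PER_GB_MONTH[storage_class] / standard)


for name in PRICE_PER_GB_MONTH:
    print(
        f"{name:13s} ${monthly_storage_cost(1024, name):7.2f}/month for 1 TB, "
        f"{savings_vs_standard(name):5.1f}% cheaper than Standard"
    )
```

Prices change over time, so treat the dictionary as a snapshot and check the current pricing page before relying on the output.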

🔥Intelligent-Tiering: Set and Forget for Unpredictable Access

If your data has spiky access patterns (some months heavily read, others idle), S3 Intelligent-Tiering automatically moves objects between frequent and infrequent tiers based on usage. You pay a small monthly monitoring fee per object (typically $0.0025/1000 objects) but avoid manual tier selection. No retrieval charges, no minimum durations.

📊 Production Insight

Always check minimum storage durations: moving objects out of Standard-IA before 30 days incurs a pro-rated charge. For high-turnover data (logs kept 7 days), stay in Standard — the early-delete penalty wipes out any savings.
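A worked example shows why short-lived data belongs in Standard. The model below is a simplification under stated assumptions: a 30-day billing month, storage charges only, Standard-IA billed for at least 30 days and at least 128 KB per object, and the table's approximate prices. The function names are hypothetical.

```python
STANDARD_PRICE = 0.023       # USD per GB-month, from the table above
STANDARD_IA_PRICE = 0.0125   # USD per GB-month, from the table above
MIN_IA_DAYS = 30             # minimum storage duration billed for Standard-IA
MIN_IA_GB = 128 / 1024 ** 2  # 128 KB minimum billable object size, in GB


def standard_cost(size_gb, days_stored):
    """Standard bills only for the days the object actually exists."""
    return size_gb * STANDARD_PRICE * days_stored / 30


def standard_ia_cost(size_gb, days_stored):
    """Standard-IA bills at least 30 days and at least 128 KB per object."""
    billable_days = max(days_stored, MIN_IA_DAYS)
    billable_gb = max(size_gb, MIN_IA_GB)
    return billable_gb * STANDARD_IA_PRICE * billable_days / 30


for days in (7, 30, 60):
    print(
        f"100 GB kept {days:2d} days: "
        f"Standard ${standard_cost(100, days):.2f}, "
        f"Standard-IA ${standard_ia_cost(100, days):.2f}"
    )
```

Under these assumptions the break-even point sits a little past 16 days: anything deleted sooner is cheaper in Standard despite the higher per-GB price.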

🎯 Key Takeaway

Match storage class to access frequency: Standard for hot data, Standard-IA for warm, Glacier Deep Archive for cold. Use lifecycle rules to automate transitions and avoid manual cost leak.

Lifecycle Transition Visual — Standard → IA → Glacier

Automated lifecycle transitions are the backbone of S3 cost optimisation. Instead of manually moving files between storage classes, define rules that trigger based on object age. The diagram below visualises a typical production lifecycle path.

How it works:
1. Objects are uploaded into S3 Standard (fastest, most expensive).
2. After 30 days, they automatically transition to S3 Standard-IA (about 45% cheaper than Standard, same latency).
3. After 180 days, they move to S3 Glacier Instant Retrieval (about 68% cheaper than Standard-IA, same latency).
4. After 365 days, they move to S3 Glacier Deep Archive (about 96% cheaper than Standard, retrieval within 12 hours).
5. After 7 years (2555 days), objects are permanently deleted (if desired).

Each transition is invisible to your application — the S3 API continues to work identically regardless of storage class. Only the billing changes.

Important: Transitions are one-way. You cannot automatically move objects from Glacier back to Standard without manual restoration (which incurs retrieval fees). Plan your tiers carefully: once data is archived, treat it as read-only.
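The schedule described above can be expressed as a simple lookup from object age to storage class. This is an illustrative model of the rule, not of S3's scheduler: `storage_class_at` is a hypothetical function, and since S3 processes lifecycle rules roughly once a day, real transitions can lag the thresholds slightly.

```python
SCHEDULE = [                 # (minimum age in days, storage class), in ascending order
    (0, "STANDARD"),
    (30, "STANDARD_IA"),
    (180, "GLACIER_IR"),
    (365, "DEEP_ARCHIVE"),
]
EXPIRE_AFTER_DAYS = 2555     # roughly 7 years


def storage_class_at(age_days):
    """Return the class an object should be in at a given age, or 'EXPIRED'."""
    if age_days >= EXPIRE_AFTER_DAYS:
        return "EXPIRED"
    current = SCHEDULE[0][1]
    for threshold, storage_class in SCHEDULE:
        if age_days >= threshold:
            current = storage_class
    return current


for age in (0, 29, 30, 200, 365, 2555):
    print(f"day {age:4d}: {storage_class_at(age)}")
```

The thresholds match the `Days` values in the lifecycle configuration, so the two are easy to keep in sync if the retention policy changes.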

s3_lifecycle_transition.py (Python)

import boto3
import json

s3 = boto3.client('s3', region_name='us-east-1')
bucket = 'acme-prod-uploads'

lifecycle = {
    'Rules': [
        {
            'ID': 'StandardToGlacier',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'Transitions': [
                {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                {'Days': 180, 'StorageClass': 'GLACIER_IR'},
                {'Days': 365, 'StorageClass': 'DEEP_ARCHIVE'}
            ],
            'Expiration': {'Days': 2555}
        },
        {
            'ID': 'CleanupMultipart',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7}
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration=lifecycle
)
print('Lifecycle policy applied successfully.')

Output

Lifecycle policy applied successfully.

📊 Production Insight

Lifecycle transitions are evaluated once per day. If you need immediate cost savings (e.g., after a data migration), you can manually change the storage class on existing objects using S3 Batch Operations — but automated transitions are sufficient for ongoing lifecycle management.

🎯 Key Takeaway

A single lifecycle policy with three transitions and an expiration can reduce your S3 bill by over 95% for archival data. Always include an AbortIncompleteMultipartUpload rule to prevent silent cost leaks.

S3 Lifecycle Transition Flow

Storage Classes and Lifecycle Policies — Cutting Your S3 Bill in Half

Not all data is accessed equally often. Your app might read a user's profile picture dozens of times a day, but that invoice from January 2021? Probably never again unless there's an audit. S3 gives you storage classes — different pricing tiers based on how frequently and quickly you need to access data.

S3 Standard is the default and the most expensive per GB. It's designed for data you access regularly with millisecond latency. S3 Standard-IA (Infrequent Access) costs about 45% less per GB but charges a per-GB retrieval fee, making it ideal for backups and older content you occasionally need. S3 Glacier Instant Retrieval drops the cost further for archival data you access maybe once a quarter. S3 Glacier Deep Archive is the cheapest tier, at roughly a tenth of a cent per GB per month, for data you might need once a year and can wait up to 12 hours to retrieve.

The power move is combining storage classes with lifecycle policies: automated rules that transition objects to cheaper tiers (or delete them entirely) based on their age. You configure this once, and S3 handles the cost optimisation forever. A common real-world pattern: keep user uploads in Standard for 30 days, move to Standard-IA for 6 months, then Glacier Deep Archive indefinitely — with zero manual work after initial setup.

s3_lifecycle_policy.py (Python)

import boto3
import json

s3_client = boto3.client('s3', region_name='us-east-1')

BUCKET_NAME = 'acme-corp-user-uploads-dev'

# ── Apply a lifecycle policy that automatically manages storage costs ──
# This policy covers objects under the 'invoices/' prefix only.
# Other prefixes (avatars/, reports/) are not affected.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "InvoiceArchivalPolicy",
            "Status": "Enabled",  # 'Disabled' to pause without deleting the rule
            "Filter": {
                # Only apply this rule to objects whose key starts with 'invoices/'
                "Prefix": "invoices/"
            },
            "Transitions": [
                {
                    # After 30 days, move to Standard-IA
                    # ~45% cheaper per GB, small per-retrieval fee applies
                    "Days": 30,
                    "StorageClass": "STANDARD_IA"
                },
                {
                    # After 180 days, move to Glacier Instant Retrieval
                    # ~68% cheaper than Standard-IA, millisecond retrieval still available
                    "Days": 180,
                    "StorageClass": "GLACIER_IR"
                },
                {
                    # After 365 days, move to Glacier Deep Archive
                    # Cheapest tier. Retrieval takes up to 12 hours.
                    # Perfect for compliance-required long-term retention
                    "Days": 365,
                    "StorageClass": "DEEP_ARCHIVE"
                }
            ],
            # Permanently delete objects after 7 years (2555 days)
            # Adjust to match your legal/compliance retention requirements
            "Expiration": {
                "Days": 2555
            }
        },
        {
            # Second rule: automatically clean up incomplete multipart uploads
            # Without this, partially uploaded large files silently accumulate costs
            "ID": "CleanupIncompleteMultipartUploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # Apply to the entire bucket
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 7  # Abort uploads not completed within 7 days
            }
        }
    ]
}

response = s3_client.put_bucket_lifecycle_configuration(
    Bucket=BUCKET_NAME,
    LifecycleConfiguration=lifecycle_configuration
)

print(f"Lifecycle policy applied. HTTP status: {response['ResponseMetadata']['HTTPStatusCode']}")

# ── Verify the policy was saved correctly ──
saved_policy = s3_client.get_bucket_lifecycle_configuration(Bucket=BUCKET_NAME)
for rule in saved_policy['Rules']:
    print(f"\nRule ID: {rule['ID']} | Status: {rule['Status']}")
    if 'Transitions' in rule:
        for transition in rule['Transitions']:
            print(f"  → After {transition['Days']} days: {transition['StorageClass']}")
    if 'Expiration' in rule:
        print(f"  → Delete after: {rule['Expiration']['Days']} days")

Output

Lifecycle policy applied. HTTP status: 200

Rule ID: InvoiceArchivalPolicy | Status: Enabled
  → After 30 days: STANDARD_IA
  → After 180 days: GLACIER_IR
  → After 365 days: DEEP_ARCHIVE
  → Delete after: 2555 days

Rule ID: CleanupIncompleteMultipartUploads | Status: Enabled

🔥Interview Gold: The Hidden Multipart Upload Cost Leak

Interviewers love asking about unexpected S3 costs. The answer they want: incomplete multipart uploads. When a large file upload fails halfway, S3 stores the uploaded parts and charges you for them — indefinitely, silently, without showing them in a normal bucket listing. The fix is exactly what's in the code above: an AbortIncompleteMultipartUpload lifecycle rule. Every production bucket should have one.
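To make the leak concrete, here is a minimal sketch of how you might flag stale uploads before a lifecycle rule is in place. It assumes a dict shaped like the response from boto3's `list_multipart_uploads`; the function name `find_stale_uploads` and the sample data are hypothetical, not part of the original script.

```python
from datetime import datetime, timedelta, timezone


def find_stale_uploads(response, now, max_age_days=7):
    """Return (key, upload_id) pairs for uploads older than max_age_days.

    `response` is a dict shaped like the result of list_multipart_uploads;
    the "Uploads" key is absent when nothing is in progress.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        (upload["Key"], upload["UploadId"])
        for upload in response.get("Uploads", [])
        if upload["Initiated"] < cutoff
    ]


# Hypothetical sample data standing in for a real API response
sample_response = {
    "Uploads": [
        {"Key": "backups/old.tar.gz", "UploadId": "id-1",
         "Initiated": datetime(2026, 1, 1, tzinfo=timezone.utc)},
        {"Key": "backups/new.tar.gz", "UploadId": "id-2",
         "Initiated": datetime(2026, 1, 9, tzinfo=timezone.utc)},
    ]
}
now = datetime(2026, 1, 10, tzinfo=timezone.utc)
stale = find_stale_uploads(sample_response, now)
print(stale)
```

Each returned pair could then be passed to `abort_multipart_upload(Bucket=..., Key=..., UploadId=...)` for a one-off cleanup; the lifecycle rule remains the durable fix.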

📊 Production Insight

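Lifecycle rules cannot transition an object from Standard to Standard-IA until it is at least 30 days old. The Glacier classes also bill a minimum storage duration (90 days for Glacier Instant Retrieval and Flexible Retrieval, 180 days for Deep Archive), so objects moved or deleted sooner incur an early-deletion charge.

If you set a Standard → Standard-IA transition shorter than 30 days, the API rejects the configuration.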

Rule: always check the minimum transition days before writing your policy.

🎯 Key Takeaway

Storage class transitions via lifecycle policies are a one-time setup that permanently reduces costs.

Standard → Standard-IA → Glacier Deep Archive can cut your S3 bill by over 95% for archival data.

Add an AbortIncompleteMultipartUpload rule to every bucket.

Versioning and Object Lock — Protect Against Accidental Deletion and Compliance

Without versioning, every PUT overwrites the object and every DELETE makes it gone forever. That's fine for temp files, but for user content, production configs, or audit logs, you need versioning.

When versioning is enabled, every object operation creates a new version ID. A DELETE doesn't remove the object — it just adds a delete marker. You can restore any previous version instantly. Versioning also integrates with lifecycle policies to automatically expire old versions and reduce storage costs.

S3 Object Lock takes protection further by making objects write-once-read-many (WORM). You can lock objects for a retention period (days or years) or place a legal hold. This is critical for compliance with SEC, FINRA, or GDPR retention rules. In compliance mode, no user, including the root user of the AWS account, can delete or overwrite a locked object version before the retention period expires. In governance mode, users with the s3:BypassGovernanceRetention permission still can.

Versioning and Object Lock together give you an immutable data layer. Enable versioning on every production bucket from day one. Object Lock must be enabled when the bucket is created — you cannot add it later.
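As a rough illustration of how restore works on a versioned bucket, the sketch below picks the newest real version of a key from a response shaped like boto3's `list_object_versions`. Delete markers are reported separately under "DeleteMarkers", so they never appear as restore candidates. The function name and the `del456` marker ID are hypothetical.

```python
from datetime import datetime, timezone


def latest_restorable_version(response, key):
    """Return the VersionId of the newest real version of `key`, or None."""
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v["LastModified"])
    return newest["VersionId"]


KEY = "invoices/jan-2024/report.pdf"

# Hypothetical response: two real versions hidden behind a delete marker
sample = {
    "Versions": [
        {"Key": KEY, "VersionId": "abc123", "IsLatest": False,
         "LastModified": datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)},
        {"Key": KEY, "VersionId": "xyz789", "IsLatest": False,
         "LastModified": datetime(2024, 6, 12, 14, 0, tzinfo=timezone.utc)},
    ],
    "DeleteMarkers": [
        {"Key": KEY, "VersionId": "del456", "IsLatest": True,
         "LastModified": datetime(2024, 7, 1, 9, 0, tzinfo=timezone.utc)},
    ],
}

version_id = latest_restorable_version(sample, KEY)
print(version_id)
```

With that ID in hand, you can either copy the version back over the key or delete the delete marker by its own version ID; both leave the history intact.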

s3_versioning_and_lock.shBASH

#!/bin/bash
# ── Enable versioning on an existing bucket ──
aws s3api put-bucket-versioning \
  --bucket acme-corp-user-uploads-prod \
  --versioning-configuration Status=Enabled

# ── List all versions of an object (including deleted) ──
aws s3api list-object-versions \
  --bucket acme-corp-user-uploads-prod \
  --prefix invoices/jan-2024/report.pdf

# ── Restore a deleted object by copying from a previous version ──
aws s3api copy-object \
  --bucket acme-corp-user-uploads-prod \
  --copy-source 'acme-corp-user-uploads-prod/invoices/jan-2024/report.pdf?versionId=abc123' \
  --key invoices/jan-2024/report.pdf

# ── Enable Object Lock at bucket creation (required during creation) ──
aws s3api create-bucket \
  --bucket acme-corp-compliant-logs \
  --region us-east-1 \
  --object-lock-enabled-for-bucket

# ── Set a default retention period on the bucket (7 days, GOVERNANCE mode) ──
# GOVERNANCE can be bypassed with s3:BypassGovernanceRetention; use COMPLIANCE for strict WORM
aws s3api put-object-lock-configuration \
  --bucket acme-corp-compliant-logs \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "GOVERNANCE",
        "Days": 7
      }
    }
  }'

Output

# Enable versioning (no output on success, check with get-bucket-versioning)

# List object versions

{
    "Versions": [
        {
            "Key": "invoices/jan-2024/report.pdf",
            "VersionId": "abc123",
            "IsLatest": false,
            "LastModified": "2024-01-15T10:00:00Z",
            "Size": 12345
        },
        {
            "Key": "invoices/jan-2024/report.pdf",
            "VersionId": "xyz789",
            "IsLatest": true,
            "LastModified": "2024-06-12T14:00:00Z",
            "Size": 12345
        }
    ],
    "DeleteMarkers": []
}

# Restore object (success returns metadata)

# Object lock configuration set

⚠ Object Lock Must Be Enabled at Bucket Creation

You cannot enable Object Lock on an existing bucket. If you think you might need WORM compliance later, create the bucket with --object-lock-enabled-for-bucket even if you don't configure retention immediately. That keeps the door open. Without it, you're forced to migrate data to a new bucket.

📊 Production Insight

Versioning multiplies your storage costs by the number of versions kept.

A file updated 100 times stores 100 copies.

Rule: combine versioning with a lifecycle policy to expire old versions after 30-90 days.
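A minimal sketch of such a rule, built as a plain dict in the shape `put_bucket_lifecycle_configuration` accepts. The rule ID, the 30-day window, and the choice to keep the three newest noncurrent versions are illustrative assumptions; tune them to your own recovery needs.

```python
import json


def noncurrent_version_rule(days=30, keep_newest=3):
    """Build a lifecycle rule that expires old object versions."""
    return {
        "ID": "ExpireNoncurrentVersions",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # apply to the entire bucket
        "NoncurrentVersionExpiration": {
            # Delete a version once it has been noncurrent this long...
            "NoncurrentDays": days,
            # ...but always retain this many recent noncurrent versions
            "NewerNoncurrentVersions": keep_newest,
        },
        # Remove delete markers that no longer hide any versions
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


rule = noncurrent_version_rule()
print(json.dumps(rule, indent=2))
```

Note that `put_bucket_lifecycle_configuration` replaces the whole configuration, so a rule like this should be appended to the existing `Rules` list rather than sent on its own.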

🎯 Key Takeaway

Enable versioning on every production bucket.

It's a one-line CLI command that gives you undo for S3.

Object Lock must be decided at bucket creation — plan ahead for compliance needs.

Performance Patterns — Multipart Uploads, Transfer Acceleration, and Cross-Region Replication

S3 is fast, but you can make it faster — and more expensive if you're not careful.

Multipart Upload: For files larger than 100MB (recommended), use multipart upload. It splits the file into parts (min 5MB each) and uploads them in parallel. If a part fails, only that part is retried, not the entire file. The boto3 upload_file method handles this automatically above 8MB. CLI aws s3 cp does the same.

S3 Transfer Acceleration: Uses AWS edge locations to route uploads over the AWS backbone network instead of the public internet. This can cut upload times by 50-80% for users far from the bucket region. It costs extra per GB uploaded. Enable it on the bucket and use the accelerated endpoint.

Cross-Region Replication (CRR): Automatically replicates objects to a bucket in another region. Use for disaster recovery, lower latency for global users, or compliance. Replication is asynchronous — expect a few seconds to a few hours of lag. You need versioning enabled on both source and destination buckets.

Key trade-off: Transfer Acceleration and CRR cost money. Don't enable them unless you have a measurable need. For most apps, a single bucket with CloudFront is cheaper and faster.
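The part-size arithmetic behind multipart upload is worth seeing once. The sketch below assumes S3's documented limits (5 MiB minimum part size, 10,000 parts per upload) and an 8 MiB preferred part size; the helper names are hypothetical.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 limit on parts per multipart upload


def choose_part_size(file_size, preferred=8 * MIB):
    """Pick a part size in bytes that keeps the upload within MAX_PARTS."""
    needed = math.ceil(file_size / MAX_PARTS)
    return max(MIN_PART_SIZE, preferred, needed)


def part_count(file_size, part_size):
    """Number of parts needed to upload file_size bytes."""
    return max(1, math.ceil(file_size / part_size))


for size in (100 * MIB, 1 * GIB, 1 * TIB):
    chosen = choose_part_size(size)
    print(f"{size / GIB:10.2f} GiB -> part size {chosen / MIB:7.1f} MiB, "
          f"{part_count(size, chosen)} parts")
```

With boto3 you would normally not compute this by hand: `TransferConfig(multipart_chunksize=...)` lets you set the part size. The calculation mainly explains why very large files need parts much bigger than the default.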

s3_performance_patterns.pyPYTHON

import boto3
import os

s3_client = boto3.client('s3', region_name='us-east-1')
BUCKET = 'acme-corp-large-uploads-prod'

# ── Multipart Upload using boto3's high-level upload_file ──
# Automatically uses multipart for files >8MB
s3_client.upload_file(
    Filename='/tmp/large_backup.tar.gz',
    Bucket=BUCKET,
    Key='backups/june-2026.tar.gz'
)
print("Upload completed (multipart used automatically)")

# ── Transfer Acceleration example ──
# First, enable acceleration on the bucket (CLI):
# aws s3api put-bucket-accelerate-configuration --bucket <bucket> --accelerate-configuration Status=Enabled

# Then upload using the accelerated endpoint
s3_accelerated = boto3.client(
    's3',
    region_name='us-east-1',
    config=boto3.session.Config(
        s3={'use_accelerate_endpoint': True}
    )
)
# Subsequent uploads go through the accelerated endpoint automatically
s3_accelerated.upload_file(
    '/tmp/large_file.mp4',
    BUCKET,
    'videos/promo.mp4'
)
print("Uploaded via Transfer Acceleration")

# ── List ongoing multipart uploads (to find stuck ones) ──
response = s3_client.list_multipart_uploads(Bucket=BUCKET)
if 'Uploads' in response:
    for upload in response['Uploads']:
        print(f"Stuck upload: {upload['Key']} initiated {upload['Initiated']}")
else:
    print("No incomplete multipart uploads found")

Output

Upload completed (multipart used automatically)

Uploaded via Transfer Acceleration

No incomplete multipart uploads found

💡When to Avoid Transfer Acceleration

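If your client and bucket are in the same region, Transfer Acceleration adds cost without a meaningful speedup. It works for downloads as well as uploads, but for cacheable downloads CloudFront is usually the better fit. Benchmark aws s3 cp against the regular endpoint and against the accelerate endpoint (--endpoint-url https://s3-accelerate.amazonaws.com) before enabling.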

📊 Production Insight

Cross-Region Replication costs storage in both regions + replication PUT requests.

If you're replicating for DR, consider using S3 Batch Replication first to seed, then CRR for ongoing.

Rule: monitor replication lag with S3 metrics — lag of over an hour indicates a bottleneck.

🎯 Key Takeaway

Multipart uploads are essential for files >100MB — they're automatic in the SDK.

Transfer Acceleration helps global uploads but costs extra; benchmark first.

CRR is async and costs double storage; only use when required.

Use an S3 Bucket Like a Senior — Console, CLI, or Scripts

You don't "use" S3 via the console if you manage more than 3 buckets. The console is for debugging, not operations. Real teams script everything.

The AWS CLI is your hammer. aws s3 cp, aws s3 sync, aws s3 rm. That's 90% of your daily interactions. Sync is particularly powerful: it only transfers new or changed files, saving bandwidth and time. But watch out: by default sync never deletes anything in the destination. Once you pass --delete, it removes every destination file that isn't in the source. That flag is how you accidentally nuke a production bucket.

For programmatic access, use the AWS SDK (boto3 in Python, aws-sdk in JS). Always use IAM roles, never hardcode keys. Set ServerSideEncryption and BucketKeyEnabled in every PutObject call. Your future self will thank you when the auditor asks why your data isn't encrypted at rest.

The console is fine for one-off uploads or checking bucket properties. But if you click "Upload" more than once a week, you're doing it wrong.

S3SyncProduction.ymlYAML

# io.thecodeforge — devops tutorial

# Sync only *.log files to S3 with SSE-S3 encryption
# Filters apply in order: exclude everything, then re-include *.log
aws s3 sync /var/log/app/ s3://prod-logs-bucket/ \
    --region us-east-1 \
    --sse AES256 \
    --exclude "*" \
    --include "*.log" \
    --no-follow-symlinks

# Output:
# upload: /var/log/app/2025-04-01.log to s3://prod-logs-bucket/2025-04-01.log
# upload: /var/log/app/2025-04-02.log to s3://prod-logs-bucket/2025-04-02.log
# Warning: use --delete carefully; it removes destination files not in source

Output

upload: /var/log/app/2025-04-01.log to s3://prod-logs-bucket/2025-04-01.log

upload: /var/log/app/2025-04-02.log to s3://prod-logs-bucket/2025-04-02.log


⚠ Production Trap:

Running aws s3 sync with --delete in the wrong direction (for example, from an empty or stale directory into the bucket) deletes every destination object that is missing from the source. Use --dryrun first. Always.
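One way to make the dry run the default rather than a habit is to build the command in a small wrapper. This is a sketch under that assumption; `build_sync_command` is a hypothetical helper, and it only assembles the argument list without running anything.

```python
import shlex


def build_sync_command(source, destination, delete=False, dry_run=True):
    """Return the argv list for `aws s3 sync`, defaulting to a dry run."""
    command = ["aws", "s3", "sync", source, destination]
    if delete:
        command.append("--delete")
    if dry_run:
        # --dryrun prints the planned operations without performing them
        command.append("--dryrun")
    return command


preview = build_sync_command("/var/log/app/", "s3://prod-logs-bucket/", delete=True)
print(shlex.join(preview))

# Only after reviewing the preview would you opt out of the dry run
real = build_sync_command("/var/log/app/", "s3://prod-logs-bucket/",
                          delete=True, dry_run=False)
print(shlex.join(real))
```

The resulting list can be handed to `subprocess.run(command, check=True)`. The design choice is that a destructive run requires an explicit `dry_run=False`.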

🎯 Key Takeaway

Automate S3 operations with CLI or SDK. Console is for debugging, not daily work.

S3 Data Consistency — The Read-After-Write Promise You Can Actually Rely On

S3 gives you strong read-after-write consistency for PUTs of new objects. You upload, then immediately read, and you get the data. This is not eventual consistency. Overwrite PUTs, DELETEs, and LIST operations are strongly consistent too: if you delete an object and then list the bucket, the object is gone.

The edge case that bites people is bucket-level operations, which are still eventually consistent. If you delete a bucket and immediately list all buckets, it might still appear, and a freshly created bucket or a configuration change such as enabling versioning may take a short while to propagate. In production automation, that window is enough to fail a deploy script.

Versioning changes the game. With versioning enabled, every overwrite creates a new version. The old version is preserved. Deleting an object adds a delete marker, and the object is still there. This makes rollbacks straightforward. If a bad deploy overwrites production assets, you copy the previous version back over the key (or delete the bad version by its version ID). If it deleted them, you remove the delete marker.

Cross-region replication (CRR) is eventually consistent: changes replicate asynchronously. Never design a system that depends on CRR being instant.
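For the asynchronous cases above, automation usually needs to poll rather than assume. The following is a minimal sketch of a bounded retry with exponential backoff. `wait_until` and its parameters are hypothetical, and the `check` callable stands in for whatever probe you use (for example, a head-bucket call wrapped in a try/except).

```python
import time


def wait_until(check, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call `check` until it returns True or the attempts run out.

    Sleeps delay_seconds, then 2x, 4x, ... between attempts.
    Returns True on success, False if every attempt failed.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds * (2 ** attempt))
    return False


# Simulated probe that succeeds on the third call
calls = {"count": 0}


def fake_probe():
    calls["count"] += 1
    return calls["count"] >= 3


ok = wait_until(fake_probe, sleep=lambda seconds: None)
print(ok, calls["count"])
```

boto3 also ships built-in waiters such as `s3_client.get_waiter("bucket_exists")`, which are usually preferable where one exists. The hand-rolled loop is mainly useful for conditions without a waiter, such as confirming a replica has arrived.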

S3ConsistencyCheck.ymlYAML

# io.thecodeforge — devops tutorial

# Verify read-after-write consistency in a script
aws s3api put-object --bucket prod-assets --key config.json --body config.json

aws s3api head-object --bucket prod-assets --key config.json

# Output:
# {
#     "LastModified": "2025-04-02T14:30:00Z",
#     "ContentLength": 2048,
#     "ETag": "\"abc123...\"",
#     "VersionId": "v1"
# }

# Note: If versioning is off, overwrite is immediate. If on, you get a new version.

Output

{

"LastModified": "2025-04-02T14:30:00Z",

"ContentLength": 2048,

"ETag": "\"abc123...\"",

"VersionId": "v1"

}

🔥Senior Shortcut:

For production critical data, always enable versioning. It turns consistency from a theoretical guarantee into a practical safety net.

🎯 Key Takeaway

S3 is strongly consistent for most operations. Versioning makes rollbacks trivial — enable it on every bucket that matters.

Computing in AWS: Beyond Static Hosting

S3 is object storage, but it doesn't compute. In production, you pair S3 with AWS compute services to process uploads, serve dynamic content, or run batch jobs. EC2 gives you full control: launch a virtual machine, install web servers, and pull data from S3 via SDK. For stateless tasks, use AWS Fargate—containerized computing without managing servers. ECS and EKS orchestrate clusters. The key insight: S3 triggers Lambda functions on object creation (e.g., resizing images). Design for decoupling—S3 pushes events to SQS or EventBridge, and compute services consume them. This pattern scales to zero cost when idle. Always set IAM roles on compute instances to access S3, never hard-code keys. Compute choice impacts latency: co-locate compute in the same region as your S3 bucket.

s3-compute-pattern.ymlYAML

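# io.thecodeforge — devops tutorial
# Event-driven compute with S3 + Lambda
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  Bucket:
    Type: AWS::S3::Bucket
    # S3 validates the notification target when the bucket is created,
    # so the invoke permission must exist first
    DependsOn: ResizerInvokePermission
    Properties:
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Function: !GetAtt Resizer.Arn
  ResizerInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt Resizer.Arn
      Principal: s3.amazonaws.com
      # Limit invocation to buckets in this account; SourceArn is left out
      # to avoid a circular dependency with the Bucket resource
      SourceAccount: !Ref AWS::AccountId
  Resizer:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs20.x
      Code:
        ZipFile: |
          exports.handler = async (event) => {
            console.log('Processing', event.Records[0].s3.object.key);
          };
      Role: !GetAtt LambdaRole.Arn
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        # Lets the function write its logs to CloudWatch
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole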


⚠ Production Trap:

Never assign an IAM user key to an EC2 instance. Use instance profiles with least-privilege S3 policies; the temporary credentials they provide are rotated automatically.

🎯 Key Takeaway

Pair S3 with Lambda or Fargate for event-driven processing; decouple storage from compute for scalability.
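To show what the compute side of this pattern might look like, here is a minimal Python sketch of a Lambda handler that reads bucket and key from an S3 event notification. The function names and the sample event are hypothetical, and the event is trimmed to the fields used here. Object keys arrive URL-encoded, which is a common source of "key not found" bugs.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event, with spaces sent as '+'
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Lambda entry point: log each new object and report the count."""
    objects = extract_objects(event)
    for bucket, key in objects:
        print(f"Processing s3://{bucket}/{key}")
    return {"processed": len(objects)}


# Hypothetical, trimmed-down event for local testing
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "acme-corp-user-uploads-prod"},
                "object": {"key": "uploads/summer+photo%281%29.jpg"}}}
    ]
}
result = handler(sample_event, None)
```

Keeping the parsing in its own function makes the handler easy to test locally without deploying anything.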


AWS Elastic Beanstalk: Managed Application Deployments

Elastic Beanstalk abstracts infrastructure so you focus on code. Upload a ZIP or connect to CodePipeline, and Beanstalk provisions EC2, load balancers, auto-scaling groups, and S3 bucket for logs. Choose platform: Python, Node.js, Go, Docker, or Java. It integrates with RDS for databases and CloudWatch for monitoring. Senior engineers use .ebextensions directory for custom config (e.g., environment variables, security group rules). Critical: Beanstalk creates an S3 bucket in your account to store deployment artifacts—never delete it manually. For blue-green deployments, swap environment URLs. Monitoring tip: enable enhanced health reporting for real-time CPU, memory, and latency. Beanstalk is not for every workload—if you need fine-grained networking (e.g., VPC peering), use CDK or Terraform instead. Pricing is EC2 + resources; no extra Beanstalk fee.

beanstalk-config.ymlYAML

# io.thecodeforge — devops tutorial
# option_settings for an Elastic Beanstalk .ebextensions config file
option_settings:
  - namespace: aws:elasticbeanstalk:environment
    option_name: EnvironmentType
    value: LoadBalanced
  - namespace: aws:autoscaling:launchconfiguration
    option_name: InstanceType
    value: t3.small
  - namespace: aws:elasticbeanstalk:application:environment
    option_name: NODE_ENV
    value: production
  # Copy instance log files to the environment's S3 bucket
  - namespace: aws:elasticbeanstalk:hostmanager
    option_name: LogPublicationControl
    value: true


⚠ Production Trap:

Never store database credentials in Beanstalk environment variables. Use AWS Secrets Manager and retrieve at runtime via SDK.

🎯 Key Takeaway

Elastic Beanstalk auto-manages infrastructure from your code; use .ebextensions for production-grade tuning.

● Production incident · POST-MORTEM · severity: high

Public Bucket Exposed 50,000 Customer Records

Symptom

Bucket objects appeared in Google search results. Legal notified the team that customer data (names, addresses, SSNs) was publicly accessible.

Assumption

Making the bucket public was the quickest way to share a file with a client. The engineer assumed no one would discover the bucket URL.

Root cause

The bucket policy granted s3:GetObject to Principal * without any prefix restriction. Every object in the bucket became world-readable.

Fix

1. Immediately update the bucket policy to deny all public access.
2. Enable Block Public Access at the account level.
3. Use presigned URLs for all future file sharing.
4. Rotate any exposed credentials found in the objects.

Key lesson

Never make a bucket public. Always use presigned URLs for per-file sharing.
Enable S3 Block Public Access at the account level — it's a safety net that prevents accidental public exposure.
If you must serve public static assets, use CloudFront with an Origin Access Control (OAC) to keep the bucket itself private.
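The root cause above is easy to scan for. The sketch below is a simplified heuristic, not a replacement for `get-bucket-policy-status` or IAM Access Analyzer: it flags Allow statements with a wildcard principal and no Condition block. The function name and the sample policy (including `example-bucket`) are hypothetical.

```python
import json


def public_allow_statements(policy_json):
    """Return Allow statements whose Principal is the wildcard.

    Statements with a Condition are skipped because they may be scoped
    (for example to a VPC endpoint); review those by hand.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    flagged = []
    for statement in statements:
        principal = statement.get("Principal")
        is_wildcard = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") in ("*", ["*"])
        )
        if (statement.get("Effect") == "Allow" and is_wildcard
                and "Condition" not in statement):
            flagged.append(statement)
    return flagged


# Hypothetical policy resembling the one from the incident
risky_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ShareWithClient",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})

for statement in public_allow_statements(risky_policy):
    print(f"PUBLIC: {statement['Sid']} allows {statement['Action']} on {statement['Resource']}")
```

A check like this fits naturally in CI for infrastructure code, so a wildcard principal is caught in review rather than in search results.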

Production debug guide · Common symptoms and immediate actions for S3 issues in production · 4 entries

Symptom · 01

AccessDenied when trying to read an object

→

Fix

Check bucket policy, IAM role permissions, and whether Block Public Access is enabled. Use aws s3api get-object-acl --bucket <name> --key <key> to verify object ownership.

Symptom · 02

Slow uploads or downloads (>500ms latency)

→

Fix

Check client region vs bucket region. Use S3 Transfer Acceleration for large objects across continents. For uploads >100MB, switch to multipart upload.

Symptom · 03

Bucket creation fails with 'BucketAlreadyExists'

→

Fix

Bucket names are globally unique. Try a more specific name (e.g., acme-prod-eu-logs). You cannot delete and recreate the same name quickly — DNS propagation delays cause this error.

Symptom · 04

Unexpected high S3 bill

→

Fix

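List incomplete multipart uploads with aws s3api list-multipart-uploads --bucket <bucket>, and add an AbortIncompleteMultipartUpload lifecycle rule to clean them up. Review storage class transitions; objects may be stuck in Standard. Use S3 Inventory for granular cost analysis.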

★ S3 Quick Debug Cheat Sheet · Five common S3 production issues and the exact commands to diagnose and fix them

Can't access object — AccessDenied even with correct credentials

Immediate action

Run `aws s3api get-bucket-policy --bucket <bucket>` to check if a deny policy exists

Commands

aws s3api get-bucket-policy-status --bucket <bucket> --query 'PolicyStatus.IsPublic'

aws s3api get-object-acl --bucket <bucket> --key <key>

Fix now

If the bucket policy has an explicit Deny, update it. If the object ACL is private but bucket policy allows public, the bucket policy wins — fix the policy.

Upload speeds are terrible for large files

Object deleted accidentally — need recovery

Bucket policy not taking effect

S3 website endpoint returns 403 Forbidden

S3 Storage Class Comparison

Storage Class	Best For	Retrieval Latency	Approx. Cost / GB / Month	Per-Retrieval Fee
S3 Standard	Active user data, frequently accessed files	Milliseconds	$0.023	None
S3 Standard-IA	Backups, older content, DR copies	Milliseconds	$0.0125	Yes (~$0.01/GB)
S3 Glacier Instant Retrieval	Quarterly-access archives	Milliseconds	$0.004	Yes (~$0.03/GB)
S3 Glacier Flexible Retrieval	Annual-access compliance data	Minutes to hours	$0.0036	Yes
S3 Glacier Deep Archive	Long-term legal/compliance retention	Up to 12 hours	$0.00099	Yes (~$0.02/GB)
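The table's storage prices are enough to sanity-check the savings claim. This sketch uses only the approximate per-GB figures listed above and deliberately ignores retrieval fees, request charges, and minimum storage durations, so treat the result as an upper bound on savings. The 10,000 GB archive size is an arbitrary example.

```python
# Approximate storage prices from the comparison table (USD per GB-month)
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_cost(size_gb, storage_class):
    """Storage-only monthly cost in USD for size_gb in one class."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


def savings_percent(from_class, to_class):
    """Percentage reduction in storage price when moving between classes."""
    ratio = PRICE_PER_GB_MONTH[to_class] / PRICE_PER_GB_MONTH[from_class]
    return (1 - ratio) * 100


archive_gb = 10_000  # example archive size
for storage_class in PRICE_PER_GB_MONTH:
    print(f"{storage_class:13s} ${monthly_cost(archive_gb, storage_class):8.2f}/month")

print(f"Standard -> Deep Archive saves "
      f"{savings_percent('STANDARD', 'DEEP_ARCHIVE'):.1f}% on storage")
```

Frequent retrievals from the colder tiers can erase this saving quickly, which is why the access pattern matters as much as the price per GB.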

⚙ Quick Reference

10 commands from this guide

File	Command / Code	Purpose
s3_bucket_and_object_basics.sh	aws s3api create-bucket \	Buckets and Objects
s3_access_control.py	from datetime import datetime	Access Control
s3_lifecycle_transition.py	s3 = boto3.client('s3', region_name='us-east-1')	Lifecycle Transition Visual
s3_lifecycle_policy.py	s3_client = boto3.client('s3', region_name='us-east-1')	Storage Classes and Lifecycle Policies
s3_versioning_and_lock.sh	aws s3api put-bucket-versioning \	Versioning and Object Lock
s3_performance_patterns.py	s3_client = boto3.client('s3', region_name='us-east-1')	Performance Patterns
S3SyncProduction.yml	aws s3 sync /var/log/app/ s3://prod-logs-bucket/	Use an S3 Bucket Like a Senior
S3ConsistencyCheck.yml	aws s3api put-object --bucket prod-assets --key config.json --body config.json	S3 Data Consistency
s3-compute-pattern.yml	AWSTemplateFormatVersion: '2010-09-09'	Computing in AWS
beanstalk-config.yml	option_settings:	AWS Elastic Beanstalk

Key takeaways

S3 object keys are flat strings, not real file paths

'folder' prefixes are a UI convenience, not a filesystem hierarchy. This changes how you design efficient list and filter operations at scale.

Presigned URLs are the production-grade pattern for user file access

they keep objects private while granting time-limited, per-object access without exposing IAM credentials to the client.

Every bucket that receives large file uploads needs an AbortIncompleteMultipartUpload lifecycle rule

without it, failed partial uploads accumulate silently and cost you real money.

Storage class transitions via lifecycle policies are a one-time setup that permanently reduces costs

Standard → Standard-IA → Glacier Deep Archive can cut your S3 bill by over 95% for archival data.

Enable versioning on every production bucket

it's a single CLI command that gives you undo for S3 and is a prerequisite for Object Lock and replication.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

S3 is often described as 'eventually consistent' — can you explain what ...

Q02SENIOR

Walk me through how you'd architect a file upload feature for a web app ...

Q03SENIOR

What's the difference between an S3 bucket policy and an IAM policy, and...

Q01 of 03SENIOR

S3 is often described as 'eventually consistent' — can you explain what that means and whether it's still true today, and how it would affect an application that writes and immediately reads the same object?

ANSWER

Before December 2020, S3 provided read-after-write consistency for PUTs of new objects, but eventual consistency for overwrites and deletes. In December 2020, AWS announced strong read-after-write consistency for all S3 GET, PUT, LIST, and DELETE operations in all regions. That means if you write an object and immediately read or list it, you'll see the latest version. What remains eventually consistent is bucket configuration (for example, a just-deleted bucket can still appear when you list buckets) and cross-region replication. S3 also provides no locking for concurrent writers: when two PUTs target the same key, the request with the latest timestamp wins. The practical impact: your application no longer needs retry logic for immediate read-after-write, but it does need its own coordination if several writers update the same key.

FAQ · 5 QUESTIONS

Frequently Asked Questions

What is the maximum file size you can upload to S3?

Does S3 guarantee that my data won't be lost?

What's the difference between S3 and EBS — when do you use each?

Can I enable Object Lock on an existing bucket?

How do I reduce S3 costs without losing data?

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

✓ Verified

production tested

July 18, 2026

last updated

2,466

articles · all by Naren
