AWS Snowball: Data Migration, Edge Computing, and Physical Data Transport at Scale
- AWS Snowball Family: Snowball, Snowball Edge, and Snowmobile
- Snowball Job Lifecycle: From Order to S3
- Snowball Edge: Local Compute and Edge Computing Use Cases
- Snowball: 80TB (Standard) or 210TB (Snowball Edge Storage Optimized) usable capacity
- Snowball Edge: adds local compute (EC2 instances, Lambda) for edge processing
- Snowmobile: 100PB container truck for exabyte-scale migrations
- Data transfer: copy files to device via S3 adapter or NFS mount, ship device back to AWS
- Encryption: AES-256, keys managed via AWS KMS; the device is useless without the key
- Speed: 80TB over a 1Gbps link = ~7 days of network transfer, while the full Snowball round trip runs 2-3 weeks. Net savings only for datasets large enough to beat that overhead (roughly 200TB+ at 1Gbps); run the math for your link speed.
- Cost: $300 per job + shipping. Cheaper than egress fees for large datasets.
- The S3 adapter is not a real S3 endpoint; it emulates the S3 API. Some SDK features (multipart upload completion, versioning) behave differently.
- The manifest file is your only proof of what was loaded. Lose it, lose the job.
- Not validating the final manifest against the source before shipping: if the device arrives at AWS with missing files, you start over and lose another 2-3 weeks.
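That pre-ship spot check can be scripted in a few lines. The sketch below compares the live source tree against a saved manifest; it assumes the manifest is a JSON file with an `entries` list (the layout used by the manifest tooling later in this guide), which is this guide's convention, not an AWS-defined format.

```python
import json
import os


def count_source_files(source_dir: str) -> int:
    """Count regular files under the source tree (like `find -type f | wc -l`)."""
    return sum(len(filenames) for _, _, filenames in os.walk(source_dir))


def manifest_matches_source(manifest_path: str, source_dir: str) -> bool:
    """Compare the manifest entry count against the live source tree."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return len(manifest.get("entries", [])) == count_source_files(source_dir)
```

A count match is a necessary but not sufficient check; for full confidence, compare per-file hashes as the validator later in this guide does.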
Production Debug Guide: symptom-to-action reference for Snowball job lifecycle issues, transfer failures, and shipping rejections.

| Symptom | Diagnostic commands |
|---|---|
| Manifest file count does not match source directory | `find /source -type f \| wc -l` vs. `jq '.entries \| length' manifest.json` |
| Copy speed < 50MB/s on a 10GbE connection | `ethtool eth1 \| grep Speed`, then `iftop -i eth1 -t -s 1` |
| KMS 'Access Denied' during copy or after AWS loading | `aws iam simulate-principal-policy --policy-source-arn <role-arn> --action-names kms:Decrypt kms:GenerateDataKey --resource-arns <kms-key-arn>`, then `aws kms describe-key --key-id <kms-key-arn> --query 'KeyMetadata.KeyState'` |
| Snowball Edge compute instance disk full | `snowballEdge describe-device --device-id <id>`, then `df -h /mnt/snowball` |
AWS Snowball is a physical data transport service designed for datasets too large to transfer over the network. When transferring 50TB+ over a 1Gbps connection takes weeks, Snowball ships a ruggedized storage device to your location, you copy data onto it, and ship it back. AWS loads the data into S3.
The Snowball family includes three tiers: Snowball (80TB storage, no compute), Snowball Edge (up to 210TB storage with local EC2/Lambda compute), and Snowmobile (100PB shipping container). Each tier serves a different scale and use case, from one-time data center migrations to ongoing edge computing in disconnected environments.
The common misconception is that Snowball is simple: order, copy, ship. In practice, the job lifecycle involves IAM roles, KMS key management, manifest validation, S3 adapter compatibility issues, and shipping logistics that span 2-4 weeks per job. A single mistake in the manifest or key management can force a complete restart.
AWS Snowball Family: Snowball, Snowball Edge, and Snowmobile
The AWS Snow family consists of three physical data transport devices, each designed for a different scale and use case.
Snowball (Standard):
- 80TB usable storage (50TB on older models)
- No local compute capability
- Purpose: one-time data migration to/from S3
- Cost: $300 per job + shipping
- Typical use: data center decommission, initial cloud migration

Snowball Edge:
- Three variants: Storage Optimized (210TB), Compute Optimized (100TB + GPU option), Compute Optimized with EBS (28TB NVMe)
- Local EC2 instances and Lambda functions via AWS IoT Greengrass
- Purpose: edge computing in disconnected or limited-connectivity environments
- Cost: $400-600 per job + shipping, or on-premises pricing for long-term edge deployment
- Typical use: military field operations, oil rig data processing, content distribution, disaster recovery

Snowmobile:
- 100PB capacity in a 45-foot shipping container
- Purpose: exabyte-scale data center migration
- Cost: approximately $0.005/GB (~$500K per 100PB)
- Typical use: enterprise data center consolidation, media archive migration
- Requires a semi-truck, dedicated security escort, and site preparation
Decision point: if your dataset is under 80TB and you have a stable network connection exceeding 1Gbps, consider AWS DataSync or direct S3 transfer instead. The Snowball shipping overhead (order processing, transit, AWS loading) adds 2-3 weeks. For 80TB over a 10Gbps link, direct transfer takes ~18 hours, faster than Snowball.
```python
# Snowball capacity and transfer time calculator.
# Use this to determine whether Snowball or direct transfer is faster.
from dataclasses import dataclass
from enum import Enum


class SnowballDevice(Enum):
    """Snowball device tiers with usable capacity in TB."""
    SNOWBALL_STANDARD = 80
    SNOWBALL_EDGE_STORAGE = 210
    SNOWBALL_EDGE_COMPUTE = 100
    # Same capacity as SNOWBALL_EDGE_COMPUTE, so Python treats this
    # member as an alias of it (duplicate Enum values collapse).
    SNOWBALL_EDGE_COMPUTE_GPU = 100
    SNOWBALL_EDGE_EBS = 28
    SNOWMOBILE = 100_000


@dataclass
class TransferEstimate:
    device: SnowballDevice
    data_size_tb: float
    network_speed_gbps: float
    snowball_shipping_days: int = 14  # average round-trip shipping time
    aws_loading_days: int = 2  # time for AWS to load data into S3

    def direct_transfer_days(self) -> float:
        """Calculate days to transfer over network."""
        # Convert TB to bits, Gbps to bits/day
        data_bits = self.data_size_tb * 1e12 * 8
        bits_per_day = self.network_speed_gbps * 1e9 * 86400
        return data_bits / bits_per_day

    def snowball_transfer_days(self) -> float:
        """Calculate total Snowball timeline."""
        # Copy to device: assume 100MB/s average (realistic for mixed files)
        copy_speed_mbps = 100
        data_mb = self.data_size_tb * 1e6
        copy_days = data_mb / (copy_speed_mbps * 86400)
        return copy_days + self.snowball_shipping_days + self.aws_loading_days

    def recommendation(self) -> str:
        """Return recommendation: Snowball or direct transfer."""
        direct = self.direct_transfer_days()
        snowball = self.snowball_transfer_days()
        if self.data_size_tb > self.device.value:
            return (
                f"DATA EXCEEDS DEVICE CAPACITY. "
                f"{self.data_size_tb}TB > {self.device.value}TB. "
                f"Use multiple devices or Snowmobile."
            )
        if direct < snowball:
            return (
                f"DIRECT TRANSFER is faster. "
                f"Direct: {direct:.1f} days vs Snowball: {snowball:.1f} days. "
                f"At {self.network_speed_gbps}Gbps, network transfer wins."
            )
        else:
            savings = snowball - direct
            return (
                f"SNOWBALL is faster. "
                f"Snowball: {snowball:.1f} days vs Direct: {direct:.1f} days. "
                f"Saves {savings:.1f} days."
            )


def main():
    scenarios = [
        TransferEstimate(SnowballDevice.SNOWBALL_STANDARD, 50, 1),
        TransferEstimate(SnowballDevice.SNOWBALL_STANDARD, 50, 10),
        TransferEstimate(SnowballDevice.SNOWBALL_EDGE_STORAGE, 150, 1),
        TransferEstimate(SnowballDevice.SNOWBALL_EDGE_STORAGE, 150, 10),
        TransferEstimate(SnowballDevice.SNOWBALL_EDGE_STORAGE, 200, 1),
    ]
    for scenario in scenarios:
        print(f"\n--- {scenario.data_size_tb}TB @ {scenario.network_speed_gbps}Gbps ---")
        print(f"  Device: {scenario.device.name} ({scenario.device.value}TB)")
        print(f"  Direct transfer: {scenario.direct_transfer_days():.1f} days")
        print(f"  Snowball total: {scenario.snowball_transfer_days():.1f} days")
        print(f"  {scenario.recommendation()}")


if __name__ == '__main__':
    main()
```
- 1Gbps network: 80TB takes ~7 days. Snowball round-trip: ~16 days. Direct transfer wins.
- 1Gbps network: 200TB takes ~18 days. Snowball round-trip: ~18 days. Break-even.
- 1Gbps network: 500TB takes ~46 days. Snowball round-trip: ~18 days. Snowball wins by 28 days.
- 10Gbps network: 500TB takes ~4.6 days. Snowball round-trip: ~18 days. Direct transfer wins.
- Rule: calculate both timelines before committing to Snowball. It is not always faster.
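The rule above reduces to a break-even formula. This sketch uses the same simplification as the bullets: the Snowball timeline is treated as a fixed ~18-day round trip, ignoring local copy time, so treat the result as a rough planning number rather than a precise threshold.

```python
def break_even_tb(network_gbps: float, snowball_days: float = 18.0) -> float:
    """Dataset size (TB) at which direct network transfer takes as long
    as the quoted Snowball round trip."""
    # Days to push one TB over the link: TB -> bits, Gbps -> bits/day
    days_per_tb = (1e12 * 8) / (network_gbps * 1e9 * 86400)
    return snowball_days / days_per_tb


# At 1Gbps the break-even lands near 200TB, matching the bullets above
print(f"1Gbps break-even: {break_even_tb(1):.0f}TB")
print(f"10Gbps break-even: {break_even_tb(10):.0f}TB")
```

Anything below the break-even favors the network; anything well above it favors the device, shipping delays included.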
Snowball Job Lifecycle: From Order to S3
A Snowball job has five phases. Each phase has specific failure modes that can add weeks to your timeline.
Phase 1: Create Job
- Specify source/destination S3 bucket, KMS key, IAM role, and device type.
- AWS provisions the device and prepares the manifest.
- Timeline: 1-3 business days.
- Failure mode: IAM role or KMS key misconfiguration discovered later at device loading.

Phase 2: Receive Device
- AWS ships the device to your specified address.
- Timeline: 2-5 business days depending on region.
- Failure mode: shipping delays, incorrect address, device arrives damaged.

Phase 3: Copy Data
- Install the Snowball client or use the S3 adapter.
- Copy files to the device using s3 cp, s3 sync, or the Snowball client.
- Timeline: depends on data size and network speed (typically 100-300MB/s).
- Failure mode: manifest corruption from re-running copy scripts, KMS permission errors, network bottlenecks.

Phase 4: Ship Device Back
- Use the pre-paid shipping label.
- Verify the device status is 'Ready to ship' in the console.
- Timeline: 2-5 business days.
- Failure mode: shipping damage, lost in transit, incorrect return address.

Phase 5: AWS Loads Data
- AWS validates the manifest, decrypts data, and loads it into your S3 bucket.
- Timeline: 1-3 business days.
- Failure mode: manifest mismatch (device rejected), KMS decryption failure, S3 bucket policy blocking writes.
Total typical timeline: 10-20 business days per device. Plan your migration timeline accordingly.
```python
import boto3
import json
import hashlib
import os
from dataclasses import dataclass
from typing import List
from datetime import datetime


@dataclass
class SnowballJobConfig:
    """Configuration for a Snowball migration job."""
    job_id: str
    device_type: str  # STANDARD | EDGE_STORAGE | EDGE_COMPUTE
    s3_bucket: str
    s3_prefix: str
    kms_key_arn: str
    iam_role_arn: str
    source_directory: str
    shipping_address: dict
    notification_sns_topic: str


@dataclass
class ManifestEntry:
    """Single entry in the transfer manifest."""
    file_path: str
    s3_key: str
    size_bytes: int
    sha256_hash: str
    last_modified: str


class SnowballJobManager:
    """Manages Snowball job lifecycle and pre-ship validation."""

    def __init__(self, config: SnowballJobConfig):
        self.config = config
        self.snowball_client = boto3.client('snowball')
        self.kms_client = boto3.client('kms')
        self.iam_client = boto3.client('iam')
        self.s3_client = boto3.client('s3')

    def generate_manifest(self, source_dir: str) -> List[ManifestEntry]:
        """Generate a manifest of all files to be transferred."""
        manifest = []
        skipped_files = []
        for dirpath, dirnames, filenames in os.walk(source_dir):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                # Check path length: the Snowball S3 adapter rejects > 1024 bytes
                if len(filepath.encode('utf-8')) > 1024:
                    skipped_files.append(filepath)
                    continue
                try:
                    size = os.path.getsize(filepath)
                    sha256 = hashlib.sha256()
                    with open(filepath, 'rb') as f:
                        for chunk in iter(lambda: f.read(8192), b''):
                            sha256.update(chunk)
                    s3_key = os.path.relpath(filepath, source_dir)
                    manifest.append(ManifestEntry(
                        file_path=filepath,
                        s3_key=f"{self.config.s3_prefix}/{s3_key}",
                        size_bytes=size,
                        sha256_hash=sha256.hexdigest(),
                        last_modified=datetime.fromtimestamp(
                            os.path.getmtime(filepath)
                        ).isoformat(),
                    ))
                except OSError as e:
                    skipped_files.append(f"{filepath}: {e}")
        if skipped_files:
            print(f"WARNING: {len(skipped_files)} files skipped "
                  f"(path too long or inaccessible)")
            for f in skipped_files[:10]:
                print(f"  - {f}")
        return manifest

    def validate_pre_ship(self, manifest: List[ManifestEntry]) -> dict:
        """Run all pre-ship validations. Returns validation result."""
        errors = []
        warnings = []

        # 1. Validate KMS key exists and is accessible
        try:
            key_response = self.kms_client.describe_key(
                KeyId=self.config.kms_key_arn
            )
            key_state = key_response['KeyMetadata']['KeyState']
            if key_state != 'Enabled':
                errors.append(f"KMS key is not enabled: {key_state}")
        except Exception as e:
            errors.append(f"KMS key validation failed: {e}")

        # 2. Validate IAM role has required permissions
        required_actions = ['s3:PutObject', 's3:CreateBucket', 'kms:GenerateDataKey']
        for action in required_actions:
            try:
                response = self.iam_client.simulate_principal_policy(
                    PolicySourceArn=self.config.iam_role_arn,
                    ActionNames=[action],
                    ResourceArns=[f"arn:aws:s3:::{self.config.s3_bucket}/*"],
                )
                decision = response['EvaluationResults'][0]['EvalDecision']
                if decision != 'allowed':
                    errors.append(f"IAM role missing {action} permission")
            except Exception as e:
                errors.append(f"IAM permission check failed for {action}: {e}")

        # 3. Validate manifest has entries
        if not manifest:
            errors.append("Manifest is empty: no files to transfer")

        # 4. Validate total size fits on device (capacities are in TB,
        #    so convert bytes -> TB, not GB)
        total_tb = sum(e.size_bytes for e in manifest) / (1024 ** 4)
        device_limits = {
            'STANDARD': 80,
            'EDGE_STORAGE': 210,
            'EDGE_COMPUTE': 100,
        }
        limit = device_limits.get(self.config.device_type, 80)
        if total_tb > limit:
            errors.append(
                f"Data size {total_tb:.1f}TB exceeds device capacity {limit}TB"
            )
        elif total_tb > limit * 0.9:
            warnings.append(
                f"Data size {total_tb:.1f}TB is >90% of device capacity {limit}TB. "
                f"Consider leaving headroom."
            )

        # 5. Validate no files have paths > 1024 bytes
        long_paths = [e for e in manifest
                      if len(e.file_path.encode('utf-8')) > 1024]
        if long_paths:
            errors.append(
                f"{len(long_paths)} files have paths > 1024 bytes. "
                f"S3 adapter will reject them."
            )

        return {
            'valid': len(errors) == 0,
            'errors': errors,
            'warnings': warnings,
            'manifest_entries': len(manifest),
            'total_size_tb': round(total_tb, 2),
            'device_type': self.config.device_type,
            'device_capacity_tb': limit,
        }

    def save_manifest(self, manifest: List[ManifestEntry], output_path: str):
        """Save manifest to JSON for post-transfer verification."""
        manifest_data = {
            'job_id': self.config.job_id,
            'generated_at': datetime.utcnow().isoformat(),
            'total_entries': len(manifest),
            'total_bytes': sum(e.size_bytes for e in manifest),
            'entries': [
                {
                    'file_path': e.file_path,
                    's3_key': e.s3_key,
                    'size_bytes': e.size_bytes,
                    'sha256': e.sha256_hash,
                    'last_modified': e.last_modified,
                }
                for e in manifest
            ],
        }
        with open(output_path, 'w') as f:
            json.dump(manifest_data, f, indent=2)
        print(f"Manifest saved: {len(manifest)} entries, "
              f"{manifest_data['total_bytes'] / (1024**3):.2f}GB")

    def verify_post_load(self, manifest_path: str) -> dict:
        """Verify S3 contents match manifest after AWS loading."""
        with open(manifest_path, 'r') as f:
            manifest_data = json.load(f)
        mismatches = []
        missing = []
        for entry in manifest_data['entries']:
            try:
                response = self.s3_client.head_object(
                    Bucket=self.config.s3_bucket,
                    Key=entry['s3_key'],
                )
                s3_size = response['ContentLength']
                if s3_size != entry['size_bytes']:
                    mismatches.append({
                        'key': entry['s3_key'],
                        'expected_size': entry['size_bytes'],
                        'actual_size': s3_size,
                    })
            except self.s3_client.exceptions.ClientError:
                missing.append(entry['s3_key'])
        return {
            'verified': len(mismatches) == 0 and len(missing) == 0,
            'total_checked': len(manifest_data['entries']),
            'missing': len(missing),
            'size_mismatches': len(mismatches),
            'missing_files': missing[:20],
            'mismatched_files': mismatches[:20],
        }
```
- Phase 1 (Create): 1-3 days. Validate IAM, KMS, S3 before creating the job.
- Phase 2 (Receive): 2-5 days. Shipping time varies by region and carrier.
- Phase 3 (Copy): variable. Depends on data size and local network speed.
- Phase 4 (Return): 2-5 days. Use pre-paid label. Verify 'Ready to ship' status.
- Phase 5 (Load): 1-3 days. AWS validates manifest and loads into S3. Failures here mean starting over.
Snowball Edge: Local Compute and Edge Computing Use Cases
Snowball Edge extends the basic Snowball by adding local compute capabilities. You can run EC2 instances and Lambda functions directly on the device, processing data at the edge before (or instead of) transferring it to S3.
Snowball Edge variants:
- Storage Optimized: 210TB usable storage, 40 vCPUs, 80GB RAM. Best for bulk data transfer with light processing.
- Compute Optimized: 100TB storage, 52 vCPUs, 208GB RAM. Best for heavy compute at the edge.
- Compute Optimized with GPU: 100TB storage, 52 vCPUs, 208GB RAM, NVIDIA V100 GPU. Best for ML inference at the edge.
- Compute Optimized with EBS: 28TB NVMe EBS-like storage. Best for applications requiring fast random access.

Edge computing use cases:
- Military field operations: process ISR (intelligence, surveillance, reconnaissance) data in disconnected environments. Run ML inference locally, ship only results.
- Oil and gas: process seismic data at remote drilling sites. 100TB of raw seismic data reduced to 2TB of processed results.
- Content distribution: pre-load content libraries onto Snowball Edge devices and ship to retail locations for local streaming.
- Disaster recovery: deploy Snowball Edge to a disaster site, run local infrastructure (databases, web servers) while the primary site is being restored.
Important: Snowball Edge compute instances share storage with the data transfer volume. If you load 200TB onto a Storage Optimized device, the EC2 instance has minimal remaining storage. Plan compute and storage allocations separately.
```python
import boto3
from dataclasses import dataclass


@dataclass
class EdgeComputeConfig:
    """Configuration for Snowball Edge compute workloads."""
    device_id: str
    instance_type: str  # sbg1.medium, sbg1.large, sbg1.xlarge
    ami_id: str
    vcpus: int
    memory_gb: int
    ebs_volumes: list  # list of {size_gb, mount_point}


class SnowballEdgeManager:
    """Manages Snowball Edge compute instances and edge workloads."""

    def __init__(self, device_ip: str, unlock_code: str, manifest_path: str):
        self.device_ip = device_ip
        self.unlock_code = unlock_code
        self.manifest_path = manifest_path
        # Snowball Edge exposes a local EC2-compatible endpoint
        self.ec2_client = boto3.client(
            'ec2',
            endpoint_url=f"https://{device_ip}:8443",
            verify=False,  # Snowball Edge uses self-signed certs
        )
        self.s3_client = boto3.client(
            's3',
            endpoint_url=f"https://{device_ip}:8443",
            verify=False,
        )

    def list_local_instances(self) -> list:
        """List EC2 instances running on the Snowball Edge device."""
        response = self.ec2_client.describe_instances()
        instances = []
        for reservation in response['Reservations']:
            for instance in reservation['Instances']:
                instances.append({
                    'instance_id': instance['InstanceId'],
                    'state': instance['State']['Name'],
                    'type': instance['InstanceType'],
                    'private_ip': instance.get('PrivateIpAddress', 'N/A'),
                })
        return instances

    def launch_edge_instance(self, config: EdgeComputeConfig) -> dict:
        """Launch an EC2 instance on the Snowball Edge device."""
        response = self.ec2_client.run_instances(
            ImageId=config.ami_id,
            InstanceType=config.instance_type,
            MinCount=1,
            MaxCount=1,
        )
        instance = response['Instances'][0]
        return {
            'instance_id': instance['InstanceId'],
            'state': instance['State']['Name'],
        }

    def list_local_buckets(self) -> list:
        """List S3 buckets on the Snowball Edge device."""
        response = self.s3_client.list_buckets()
        return [bucket['Name'] for bucket in response['Buckets']]

    def get_device_storage_status(self) -> dict:
        """Get storage utilization on the Snowball Edge device."""
        # Note: actual API varies by Snowball Edge SDK version.
        # This is a simplified example.
        return {
            'note': 'Use the snowballEdge describe-device CLI command',
            'command': f'snowballEdge describe-device --device-id {self.device_ip}',
        }

    def validate_edge_workload(self, config: EdgeComputeConfig) -> list:
        """Validate that the edge workload fits on the device."""
        errors = []
        # Check instance type availability
        valid_types = ['sbg1.medium', 'sbg1.large', 'sbg1.xlarge']
        if config.instance_type not in valid_types:
            errors.append(
                f"Invalid instance type: {config.instance_type}. Valid: {valid_types}"
            )
        # Check EBS volume sizes
        total_ebs_gb = sum(v['size_gb'] for v in config.ebs_volumes)
        if total_ebs_gb > 28000:  # 28TB max for EBS volumes
            errors.append(f"Total EBS size {total_ebs_gb}GB exceeds 28TB limit")
        return errors
```
- Storage Optimized (210TB): 40 vCPUs, 80GB RAM. Best for transfer-heavy workloads.
- Compute Optimized (100TB): 52 vCPUs, 208GB RAM. Best for compute-heavy workloads.
- Compute Optimized with GPU: adds NVIDIA V100 for ML inference at the edge.
- EBS variant (28TB NVMe): fast random access. Best for database workloads at the edge.
- Rule: if compute is primary, use Compute Optimized. If transfer is primary, use Storage Optimized. Never assume you get both at full capacity.
Encryption, KMS Key Management, and Cross-Account Migrations
Every Snowball device encrypts data at rest using AES-256. The encryption key is managed via AWS KMS. Understanding the key management model is critical: a KMS mismatch discovered at AWS's loading facility means a 2-3 week round-trip restart.
Encryption flow:
1. You specify a KMS key ARN when creating the Snowball job.
2. When you copy data to the device, the S3 adapter encrypts each object using a data key derived from the KMS key.
3. The device stores encrypted data. Without the KMS key, the data is unreadable.
4. When AWS receives the device, they use the KMS key to decrypt and load data into your S3 bucket.

Cross-account migration:
- The Snowball job is created in Account A.
- The S3 destination bucket is in Account A.
- The KMS key can be in Account A or Account B.
- If the KMS key is in Account B, Account A's Snowball job IAM role must have kms:Decrypt and kms:GenerateDataKey permissions on Account B's key.
- Account B's KMS key policy must explicitly allow Account A's IAM role.
Common failure: the job is created with a default KMS key (aws/s3) in Account A, but the S3 bucket has a bucket policy requiring a specific KMS key from Account B. AWS loads the data with Account A's key, but S3 rejects writes because the key does not match the bucket policy.
Best practice: always specify the KMS key ARN explicitly. Never rely on default key behavior for Snowball jobs.
```python
import boto3
import json
from dataclasses import dataclass


@dataclass
class KMSValidationResult:
    key_arn: str
    key_exists: bool
    key_enabled: bool
    key_account: str
    job_account: str
    account_match: bool
    permissions_valid: bool
    bucket_policy_compatible: bool
    errors: list


class SnowballKMSValidator:
    """Validates KMS key configuration for Snowball jobs."""

    def __init__(self, job_account_region: str = 'us-east-1'):
        self.kms_client = boto3.client('kms', region_name=job_account_region)
        self.iam_client = boto3.client('iam', region_name=job_account_region)
        self.s3_client = boto3.client('s3', region_name=job_account_region)
        self.sts_client = boto3.client('sts', region_name=job_account_region)

    def validate_kms_for_snowball(
        self,
        kms_key_arn: str,
        iam_role_arn: str,
        s3_bucket: str,
    ) -> KMSValidationResult:
        """Comprehensive KMS validation for a Snowball job."""
        errors = []

        # Get current account
        job_account = self.sts_client.get_caller_identity()['Account']

        # 1. Verify KMS key exists and is enabled
        key_exists = False
        key_enabled = False
        key_account = ''
        try:
            key_response = self.kms_client.describe_key(KeyId=kms_key_arn)
            key_metadata = key_response['KeyMetadata']
            key_exists = True
            key_enabled = key_metadata['KeyState'] == 'Enabled'
            key_account = key_metadata['Arn'].split(':')[4]
            if not key_enabled:
                errors.append(
                    f"KMS key is not enabled. State: {key_metadata['KeyState']}"
                )
            if key_metadata.get('KeyManager') == 'AWS':
                errors.append(
                    "AWS-managed keys (aws/s3) are not recommended for Snowball. "
                    "Use a customer-managed key for cross-account compatibility."
                )
        except Exception as e:
            errors.append(f"KMS key not found or not accessible: {e}")

        # 2. Check cross-account permissions
        account_match = key_account == job_account
        permissions_valid = False
        if not account_match:
            errors.append(
                f"KMS key is in account {key_account} but Snowball job is in "
                f"{job_account}. Cross-account KMS permissions required."
            )
            # Check if the IAM role has permissions on the cross-account key
            try:
                for action in ['kms:Decrypt', 'kms:GenerateDataKey', 'kms:DescribeKey']:
                    response = self.iam_client.simulate_principal_policy(
                        PolicySourceArn=iam_role_arn,
                        ActionNames=[action],
                        ResourceArns=[kms_key_arn],
                    )
                    decision = response['EvaluationResults'][0]['EvalDecision']
                    if decision != 'allowed':
                        errors.append(
                            f"IAM role missing {action} on cross-account KMS key. "
                            f"Add permission to IAM policy AND KMS key policy."
                        )
                    else:
                        permissions_valid = True
            except Exception as e:
                errors.append(f"Permission simulation failed: {e}")
        else:
            permissions_valid = True

        # 3. Check S3 bucket policy compatibility
        bucket_policy_compatible = True
        try:
            bucket_policy = self.s3_client.get_bucket_policy(Bucket=s3_bucket)
            policy = json.loads(bucket_policy['Policy'])
            # Check if bucket policy requires a specific KMS key
            for statement in policy.get('Statement', []):
                if statement.get('Effect') == 'Deny':
                    condition = statement.get('Condition', {})
                    kms_condition = condition.get('StringNotEquals', {}).get(
                        's3:x-amz-server-side-encryption-aws-kms-key-id', ''
                    )
                    if kms_condition and kms_condition != kms_key_arn:
                        errors.append(
                            f"Bucket policy denies writes not encrypted with "
                            f"{kms_condition}. Snowball job uses {kms_key_arn}. "
                            f"Mismatch."
                        )
                        bucket_policy_compatible = False
        except self.s3_client.exceptions.ClientError as e:
            # A missing bucket policy is fine; surface anything else
            if e.response['Error']['Code'] != 'NoSuchBucketPolicy':
                errors.append(f"Bucket policy check failed: {e}")

        return KMSValidationResult(
            key_arn=kms_key_arn,
            key_exists=key_exists,
            key_enabled=key_enabled,
            key_account=key_account,
            job_account=job_account,
            account_match=account_match,
            permissions_valid=permissions_valid,
            bucket_policy_compatible=bucket_policy_compatible,
            errors=errors,
        )

    def generate_cross_account_kms_policy(
        self,
        job_account_id: str,
        job_role_arn: str,
    ) -> str:
        """Generate a KMS key policy for cross-account Snowball access."""
        policy = {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "AllowSnowballJobAccount",
                    "Effect": "Allow",
                    "Principal": {"AWS": job_role_arn},
                    "Action": [
                        "kms:Decrypt",
                        "kms:GenerateDataKey",
                        "kms:DescribeKey",
                    ],
                    "Resource": "*",
                },
                {
                    "Sid": "AllowRootAccountFullAccess",
                    "Effect": "Allow",
                    "Principal": {"AWS": f"arn:aws:iam::{job_account_id}:root"},
                    "Action": "kms:*",
                    "Resource": "*",
                },
            ],
        }
        return json.dumps(policy, indent=2)
```
- Same account: IAM role needs kms:Decrypt and kms:GenerateDataKey on the key.
- Cross account: IAM role needs the same permissions. KMS key policy must allow the role ARN.
- Always specify the KMS key ARN explicitly in the Snowball job. Never rely on default key.
- AWS-managed keys (aws/s3) cannot be shared cross-account. Use customer-managed keys.
- Validate with iam:SimulatePrincipalPolicy before creating the job. Do not discover the error at AWS loading.
Performance Optimization: Maximizing Copy Speed to Snowball Devices
The copy phase is the only phase you control. Maximizing copy speed directly reduces your migration timeline.
Bottleneck analysis:
- Network: Snowball Edge supports 10GbE and 25GbE. Standard Snowball supports 10GbE. If you connect at 1GbE, you are limited to ~100MB/s.
- File size: millions of small files (<1MB) are 10-50x slower than large files due to per-file overhead (manifest entry, metadata, S3 adapter handshake). A 100MB/s large-file throughput drops to 5-10MB/s with millions of small files.
- S3 adapter concurrency: the S3 adapter is single-threaded per copy session. Run multiple sessions in parallel to saturate the network.
- Source disk I/O: if the source is on spinning disks, random read patterns for small files are 100x slower than sequential reads for large files.

Optimization strategies:
1. Tar small files: archive millions of small files into large tar files before copying. Copy 10GB tar files instead of 10 million 1KB files.
2. Parallel copy sessions: run 4-8 concurrent s3 cp sessions, each copying a different subdirectory.
3. Use 10GbE or 25GbE: verify the physical connection. The Snowball device's 1GbE management port is NOT the data port.
4. Pre-sort by file size: copy large files first (they saturate the network), then small files in batches.
5. Skip unnecessary files: exclude temp files, logs, and caches. Every unnecessary file adds per-file overhead.
```python
import os
import subprocess
import tarfile
import tempfile
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


@dataclass
class CopyStrategy:
    """Copy strategy based on file size distribution."""
    small_file_threshold_mb: float = 1.0
    tar_batch_size_mb: float = 1024.0  # 1GB tar archives
    parallel_sessions: int = 4
    exclude_patterns: list = None

    def __post_init__(self):
        if self.exclude_patterns is None:
            self.exclude_patterns = [
                '*.tmp', '*.temp', '*.log', '*.cache', '.DS_Store',
                'Thumbs.db', '__pycache__', '.git', 'node_modules',
            ]


class SnowballCopyOptimizer:
    """Optimizes data copy to Snowball device for maximum throughput."""

    def __init__(self, source_dir: str, snowball_endpoint: str,
                 bucket: str, prefix: str):
        self.source_dir = Path(source_dir)
        self.snowball_endpoint = snowball_endpoint
        self.bucket = bucket
        self.prefix = prefix

    def analyze_source(self) -> dict:
        """Analyze source directory to determine optimal copy strategy."""
        total_files = 0
        total_size = 0
        small_files = 0
        small_files_size = 0
        large_files = 0
        large_files_size = 0
        max_path_length = 0
        long_path_files = []
        for dirpath, dirnames, filenames in os.walk(self.source_dir):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                try:
                    size = os.path.getsize(filepath)
                except OSError:
                    continue
                total_files += 1
                total_size += size
                if len(filepath) > max_path_length:
                    max_path_length = len(filepath)
                if len(filepath) > 1024:
                    long_path_files.append(filepath)
                if size < 1 * 1024 * 1024:  # < 1MB
                    small_files += 1
                    small_files_size += size
                else:
                    large_files += 1
                    large_files_size += size
        return {
            'total_files': total_files,
            'total_size_gb': total_size / (1024**3),
            'small_files': small_files,
            'small_files_pct': (small_files / total_files * 100) if total_files > 0 else 0,
            'small_files_size_gb': small_files_size / (1024**3),
            'large_files': large_files,
            'large_files_size_gb': large_files_size / (1024**3),
            'max_path_length': max_path_length,
            'long_path_files': len(long_path_files),
            'recommendation': self._recommend_strategy(small_files, total_files),
        }

    def _recommend_strategy(self, small_files: int, total_files: int) -> str:
        """Recommend copy strategy based on file size distribution."""
        small_pct = (small_files / total_files * 100) if total_files > 0 else 0
        if small_pct > 50:
            return (
                "HIGH SMALL FILE COUNT. "
                "Tar small files into large archives before copying. "
                "Expected speedup: 10-50x."
            )
        elif small_pct > 20:
            return (
                "MODERATE SMALL FILE COUNT. "
                "Consider tarring files < 1MB. "
                "Run 4+ parallel copy sessions."
            )
        else:
            return (
                "LARGE FILE DOMINANT. "
                "Direct copy with parallel sessions should achieve near-max "
                "throughput. Use 4-8 parallel s3 cp sessions."
            )

    def create_tar_archives(self, strategy: CopyStrategy) -> list:
        """Create tar archives of small files for efficient transfer."""
        archives = []
        current_batch = []
        current_batch_size = 0
        batch_num = 0
        for dirpath, dirnames, filenames in os.walk(self.source_dir):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                try:
                    size = os.path.getsize(filepath)
                except OSError:
                    continue
                if size < strategy.small_file_threshold_mb * 1024 * 1024:
                    current_batch.append(filepath)
                    current_batch_size += size
                    if current_batch_size >= strategy.tar_batch_size_mb * 1024 * 1024:
                        archives.append(self._write_tar(current_batch, batch_num))
                        current_batch = []
                        current_batch_size = 0
                        batch_num += 1
        # Write remaining batch
        if current_batch:
            archives.append(self._write_tar(current_batch, batch_num))
        return archives

    def _write_tar(self, files: list, batch_num: int) -> str:
        """Write a batch of files to a tar archive."""
        archive_name = f"small_files_batch_{batch_num:04d}.tar.gz"
        archive_path = os.path.join(tempfile.gettempdir(), archive_name)
        with tarfile.open(archive_path, 'w:gz') as tar:
            for filepath in files:
                arcname = os.path.relpath(filepath, self.source_dir)
                tar.add(filepath, arcname=arcname)
        return archive_path

    def parallel_copy(self, source_dirs: list, strategy: CopyStrategy) -> list:
        """Copy multiple directories in parallel to the Snowball device."""
        results = []
        with ThreadPoolExecutor(max_workers=strategy.parallel_sessions) as executor:
            futures = {}
            for i, source in enumerate(source_dirs):
                s3_key = f"{self.prefix}/batch_{i:04d}"
                future = executor.submit(self._copy_to_snowball, source, s3_key)
                futures[future] = source
            for future in as_completed(futures):
                source = futures[future]
                try:
                    result = future.result()
                    results.append({'source': source, 'status': 'success',
                                    'result': result})
                except Exception as e:
                    results.append({'source': source, 'status': 'error',
                                    'error': str(e)})
        return results

    def _copy_to_snowball(self, source: str, s3_key: str) -> dict:
        """Copy a single directory to the Snowball device via S3 adapter."""
        cmd = [
            'aws', 's3', 'cp', source,
            f"s3://{self.bucket}/{s3_key}/",
            '--recursive',
            '--endpoint-url', f"https://{self.snowball_endpoint}:8443",
            '--no-verify-ssl',
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"Copy failed: {result.stderr}")
        return {'command': ' '.join(cmd), 'stdout': result.stdout}
```
- Per-file overhead: each file requires a manifest entry, metadata update, and S3 adapter handshake.
- 10 million 1KB files = 10GB total data, yet the copy can take ~2-3 days: per-file overhead caps effective throughput at tens of KB/s, not MB/s.
- 10 thousand 1MB files = 10GB total data. Copy time: ~2 minutes at 100MB/s.
- Tar small files into 1GB archives: 10 archives instead of 10 million files. Copy time: ~2 minutes.
- Always run a source analysis before copying. If small files >20%, tar them first.
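The pre-copy source analysis can be sketched as a standalone scan, a minimal version of the analyzer logic in the class above (the 1MB threshold and 20% cutoff are this article's heuristics, not AWS limits):

```python
import os

def analyze_source(source_dir: str, small_threshold_bytes: int = 1024 * 1024) -> dict:
    """Walk source_dir and report what fraction of files are 'small'."""
    total, small = 0, 0
    for dirpath, _, filenames in os.walk(source_dir):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish mid-scan
            total += 1
            if size < small_threshold_bytes:
                small += 1
    small_pct = (small / total * 100) if total else 0.0
    # Article's rule of thumb: tar small files first when they exceed 20%
    return {'total': total, 'small': small, 'small_pct': small_pct,
            'tar_first': small_pct > 20}
```

Run this before the first `s3 cp`; if `tar_first` is true, batch the small files into archives as shown above.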
| Feature / Aspect | Snowball Standard | Snowball Edge Storage Optimized | Snowball Edge Compute Optimized | Snowball Edge Compute with GPU | Snowmobile |
|---|---|---|---|---|---|
| Usable storage | 80TB | 210TB | 100TB | 100TB | 100PB |
| Compute | None | 40 vCPUs, 80GB RAM | 52 vCPUs, 208GB RAM | 52 vCPUs, 208GB RAM, NVIDIA V100 | N/A |
| Network interfaces | 1x 10GbE | 1x 10GbE, 1x 25GbE | 1x 10GbE, 1x 25GbE | 1x 10GbE, 1x 25GbE | Fiber optic connection on-site |
| Local S3 endpoint | Yes | Yes | Yes | Yes | N/A |
| EC2/Lambda support | No | Yes (limited) | Yes (full) | Yes (full, with GPU) | No |
| Cost per job | $300 | $400-600 | $400-600 | $600+ | ~$0.005/GB |
| Shipping | Standard carrier | Standard carrier | Standard carrier | Standard carrier | Semi-truck with security escort |
| Typical timeline | 10-20 business days | 10-20 business days | 10-20 business days | 10-20 business days | Weeks to months (site prep required) |
| Best for | One-time data migration <80TB | Bulk transfer + light compute | Edge compute + moderate transfer | ML inference at edge | Exabyte-scale data center migration |
| Encryption | AES-256, KMS-managed | AES-256, KMS-managed | AES-256, KMS-managed | AES-256, KMS-managed | AES-256, KMS-managed |
| Tamper detection | TPM chip, tamper-evident enclosure, 256-bit encryption | TPM chip, tamper-evident enclosure, 256-bit encryption | TPM chip, tamper-evident enclosure, 256-bit encryption | TPM chip, tamper-evident enclosure, 256-bit encryption | Armed security escort, GPS tracking, tamper-evident seals |
| Use case example | Data center decommission, initial cloud migration | Content distribution, disaster recovery, bulk archival | Military field ops, oil rig processing, disconnected environments | Real-time video analysis, ML inference at edge | Enterprise data center consolidation, media archive migration |
🎯 Key Takeaways
Frequently Asked Questions
What is AWS Snowball?
AWS Snowball is a ruggedized physical storage device that AWS ships to your data center for offline data transfer to and from S3. It is designed for datasets too large to transfer over the network, typically 50TB+. You copy data onto the device, ship it back to AWS, and they load it into S3 for you.
What is the difference between Snowball, Snowball Edge, and Snowmobile?
Snowball (Standard) provides 80TB usable storage with no compute capability, intended for one-time data migration to/from S3 at $300 per job. Snowball Edge adds local compute (EC2 instances, Lambda) for edge processing, with variants ranging from 28TB to 210TB storage at $400-600 per job. Snowmobile is a 100PB shipping container for exabyte-scale data center migrations at approximately $0.005/GB.
How much usable storage does each Snowball device provide?
Snowball Standard: 80TB. Snowball Edge Storage Optimized: 210TB. Snowball Edge Compute Optimized: 100TB. Snowball Edge Compute Optimized with EBS: 28TB NVMe. Snowmobile: 100PB.
How long does a Snowball job take from start to finish?
A Snowball job is a 10-20 business day pipeline with five phases: Phase 1 (Create Job): 1-3 business days. Phase 2 (Receive Device): 2-5 business days depending on region. Phase 3 (Copy Data): depends on data size and network speed, typically 100-300MB/s. Phase 4 (Ship Device Back): 2-5 business days. Phase 5 (AWS Loads Data): 1-3 business days.
When should I use Snowball instead of direct network transfer?
Snowball is only faster than direct transfer when the network transfer time exceeds the Snowball round-trip time (shipping + loading). At 1Gbps, the crossover is approximately 50-180TB. At 10Gbps, the crossover is approximately 500TB-1.8PB. For datasets under 10TB, use direct S3 upload or AWS DataSync. Always calculate both timelines before committing.
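The timeline comparison reduces to simple arithmetic. A hedged sketch, where the shipping, setup, and loading defaults are rough estimates taken from this article's phase durations, not AWS guarantees:

```python
def direct_transfer_days(dataset_tb: float, link_gbps: float,
                         efficiency: float = 0.8) -> float:
    """Days to move dataset_tb over the network at the given sustained efficiency."""
    bytes_total = dataset_tb * 1e12
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_sec / 86400

def snowball_days(copy_days: float, shipping_each_way: float = 3.5,
                  aws_load: float = 2, job_setup: float = 2) -> float:
    """Rough Snowball round-trip: setup + ship out + copy + ship back + AWS load."""
    return job_setup + shipping_each_way + copy_days + shipping_each_way + aws_load

# 500TB over 1Gbps: ~46 days at line rate (~58 days at 80% efficiency)
# vs roughly two weeks end-to-end for a Snowball job
print(round(direct_transfer_days(500, 1, efficiency=1.0)))  # ~46
print(round(snowball_days(copy_days=3)))                    # ~14
```

Whichever number is smaller wins; rerun the calculation with your actual link speed and measured copy rate before committing to either path.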
How does encryption work on Snowball devices?
Every Snowball device encrypts data at rest using AES-256 with keys managed via AWS KMS. You specify a KMS key ARN when creating the job. The S3 adapter encrypts each object using a data key derived from the KMS key. Without the KMS key, the data is unreadable. When AWS receives the device, they use the KMS key to decrypt and load data into your S3 bucket.
How do I handle cross-account KMS key migrations?
If the KMS key is in Account B but the Snowball job is in Account A, Account A's Snowball job IAM role must have kms:Decrypt and kms:GenerateDataKey permissions on Account B's key. Account B's KMS key policy must explicitly allow Account A's IAM role. Always specify the KMS key ARN explicitly in the job configuration. Never rely on default key behavior.
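As an illustration, the statement Account B would add to its KMS key policy might look like the sketch below (the account ID and role name are hypothetical placeholders; adapt them to your accounts):

```python
import json

# Hypothetical ARN of the Snowball job role in Account A
SNOWBALL_ROLE_ARN = "arn:aws:iam::111111111111:role/snowball-import-role"

# Statement for Account B's KMS key policy, granting Account A's job role
# the permissions the Snowball job needs on the key
key_policy_statement = {
    "Sid": "AllowSnowballJobRoleFromAccountA",
    "Effect": "Allow",
    "Principal": {"AWS": SNOWBALL_ROLE_ARN},
    "Action": ["kms:Decrypt", "kms:GenerateDataKey", "kms:DescribeKey"],
    "Resource": "*",  # in a key policy, the resource is the key itself
}
print(json.dumps(key_policy_statement, indent=2))
```

This statement goes into the key policy on Account B's side; Account A's IAM role policy must grant the same actions on the key's ARN from its side.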
What is the manifest file and why is it important?
The manifest file is the contract between your data and AWS. It records all files loaded onto the device. If the manifest does not match the device contents exactly, AWS rejects the device upon arrival. Losing or corrupting the manifest means losing the job. Always validate the manifest against the source directory before shipping.
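A minimal pre-ship validation sketch, assuming the manifest is JSON with an `entries` list (adapt the parsing to your actual manifest format):

```python
import json
import os

def validate_manifest(source_dir: str, manifest_path: str) -> dict:
    """Compare source file count against manifest entry count before shipping."""
    source_count = sum(len(files) for _, _, files in os.walk(source_dir))
    with open(manifest_path) as f:
        manifest_count = len(json.load(f)['entries'])
    return {
        'source': source_count,
        'manifest': manifest_count,
        'match': source_count == manifest_count,  # do NOT ship unless True
    }
```

Counting files is the cheapest check; comparing total bytes, or hashing a sample of files, catches truncated copies that a pure count misses.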
What should I do if AWS rejects my Snowball device?
Common rejection reasons include manifest mismatch (files on device do not match manifest), KMS key mismatch (device encrypted with wrong key), or device tampering (security seal broken). If the device is rejected, you must order a new device (5-day wait), fix the source issues, re-copy the data, and re-ship. There is no fix for a rejected device; rejection adds 2-3 weeks to your timeline.
Can I re-run copy scripts on a Snowball device?
Never re-run copy scripts on a Snowball device. Overwriting files on the device corrupts the manifest. If you must re-copy, cancel the job and start a new one. In one production incident, a cleanup script re-ran the initial copy, corrupted the manifest, and AWS rejected the device.
How can I maximize copy speed to a Snowball device?
Use 10GbE or 25GbE connections (not the 1GbE management port). Tar millions of small files (<1MB) into large archives before copying, as small files are 10-50x slower due to per-file overhead. Run 4-8 concurrent s3 cp sessions in parallel. Pre-sort by file size, copying large files first. Skip unnecessary files like temp files, logs, and caches.
What are the Snowball Edge compute capabilities?
Snowball Edge supports local EC2 instances and Lambda functions via AWS IoT Greengrass. Compute Optimized variant offers 52 vCPUs and 208GB RAM. There is also a GPU variant with NVIDIA V100 for ML inference. Note that compute instances share storage with the data transfer volume β loading 200TB onto a 210TB device leaves only 10TB for compute workloads.
What are typical use cases for Snowball Edge?
Military field operations for processing ISR data in disconnected environments. Oil and gas for processing seismic data at remote drilling sites. Content distribution by pre-loading content libraries and shipping to retail locations. Disaster recovery by deploying to a disaster site and running local infrastructure while the primary site is being restored.
What permissions does the Snowball job IAM role need?
The Snowball job IAM role needs s3:PutObject, s3:CreateBucket, and kms:GenerateDataKey on the target bucket and KMS key. For cross-account migrations, the role also needs kms:Decrypt and kms:DescribeKey permissions on the cross-account KMS key. Verify permissions with iam:SimulatePrincipalPolicy before creating the job.
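A sketch of that verification using boto3's `simulate_principal_policy` (the role and key ARNs are your own; `boto3` must be installed and AWS credentials configured for the live call):

```python
def check_snowball_kms_permissions(role_arn: str, key_arn: str) -> list:
    """Simulate the job role against the KMS key; returns denied actions."""
    import boto3  # assumed installed; the live call also needs AWS credentials
    iam = boto3.client('iam')
    resp = iam.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=['kms:Decrypt', 'kms:GenerateDataKey', 'kms:DescribeKey'],
        ResourceArns=[key_arn],
    )
    return denied_actions(resp)

def denied_actions(simulation_response: dict) -> list:
    """Extract the actions the policy simulation did not allow."""
    return [
        r['EvalActionName']
        for r in simulation_response['EvaluationResults']
        if r['EvalDecision'] != 'allowed'
    ]
```

An empty list means the role can use the key; any returned action must be fixed before creating the job, since a denial discovered after AWS receives the device costs weeks.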
What files does the Snowball S3 adapter reject?
The S3 adapter rejects files with paths exceeding 1024 bytes and files with unsupported characters in their names. It also imposes per-file overhead that significantly slows transfer of millions of small files. Run a pre-flight check before copying: diff the source file count against the manifest entry count, and grep for files with long paths or special characters.
What is the Snowball crossover point for direct transfer vs Snowball?
At 1Gbps network: 80TB takes ~7 days direct transfer vs ~16 days Snowball round-trip (direct wins). 200TB takes ~18 days direct vs ~18 days Snowball (break-even). 500TB takes ~46 days direct vs ~18 days Snowball (Snowball wins by 28 days). At 10Gbps network: 500TB takes ~4.6 days direct vs ~18 days Snowball (direct wins). Rule: calculate both timelines before committing.
How much does Snowball cost?
Snowball Standard costs $300 per job plus shipping. Snowball Edge costs $400-600 per job plus shipping, or on-premises pricing for long-term edge deployment. Snowmobile costs approximately $0.005/GB (~$500K per 100PB). Snowball is cheaper than egress fees for large datasets.
What is the biggest mistake teams make with Snowball?
Not validating the final manifest against the source before shipping. If the device arrives at AWS with missing files, you start over, losing another 2-3 weeks. Other major mistakes include not verifying KMS key permissions across accounts before creating the job, re-running copy scripts on the same device, and not calculating whether direct transfer would be faster.
What should a pre-ship checklist include?
A pre-ship checklist should verify: (a) manifest matches source file count and total size, (b) KMS key ARN matches target account, (c) IAM role has kms:Decrypt on the specified key, (d) device status is 'Ready to ship' in the console. This should be a mandatory gate in your migration pipeline.
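A minimal sketch of such a gate as a hard stop in a migration pipeline (the check names are illustrative; the boolean values would come from your own validation steps):

```python
def pre_ship_gate(checks: dict) -> bool:
    """Raise if any pre-ship check failed; call as the last step before shipping."""
    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        raise RuntimeError("Do NOT ship. Failed checks: " + ", ".join(failed))
    return True

# Example gate mirroring the checklist above
pre_ship_gate({
    'manifest_matches_source': True,
    'kms_key_arn_matches_target_account': True,
    'role_has_kms_decrypt': True,
    'device_status_ready_to_ship': True,
})
```

Raising instead of logging matters here: a warning can be ignored at 2am on ship day, an exception cannot.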
How should I plan storage allocation on Snowball Edge for compute workloads?
Snowball Edge compute instances and the S3 data transfer volume share the same physical storage. Plan storage allocation before loading data. If you need both bulk transfer AND compute, use separate devices or reduce the data volume to leave headroom. Partition device storage explicitly: for example, reserve 30TB for compute and load only 70TB of reference data. Test the full pipeline on the device before shipping.
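The headroom arithmetic is worth automating as a pipeline check. A minimal sketch, assuming the 70TB/30TB split from the example above on a 100TB Compute Optimized device:

```python
def plan_edge_storage(device_tb: float, data_tb: float,
                      compute_reserve_tb: float) -> float:
    """Return free headroom (TB) after data load and compute reserve.

    Raises ValueError when the plan exceeds device capacity, so the
    migration pipeline fails before any data is copied.
    """
    headroom = device_tb - data_tb - compute_reserve_tb
    if headroom < 0:
        raise ValueError(f"Over capacity by {-headroom}TB: "
                         "reduce data volume or compute reserve")
    return headroom

# Article's example: 30TB reserved for compute, 70TB of reference data
plan_edge_storage(100, data_tb=70, compute_reserve_tb=30)  # fits exactly, 0TB spare
```

Running this at job-creation time, before any copy starts, is the cheap version of the "test the full pipeline on the device" advice above.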
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.