AWS Snowball: Data Migration, Edge Computing, and Physical Data Transport at Scale
- AWS Snowball Family: Snowball, Snowball Edge, and Snowmobile
- Snowball Job Lifecycle: From Order to S3
- Snowball Edge: Local Compute and Edge Computing Use Cases
- Snowball: 80TB (Standard) or 210TB (Snowball Edge Storage Optimized) usable capacity
- Snowball Edge: adds local compute (EC2 instances, Lambda) for edge processing
- Snowmobile: 100PB container truck for exabyte-scale migrations
- Data transfer: copy files to device via S3 adapter or NFS mount, ship device back to AWS
- Encryption: AES-256, keys managed via AWS KMS; the device is useless without the key
- Speed: 80TB over a 1Gbps link = ~7 days of network transfer, while the full Snowball round trip runs 2-3 weeks. Net savings only for datasets large enough to beat that overhead (roughly 200TB+ at 1Gbps); run the math for your link speed.
- Cost: $300 per job + shipping. Cheaper than egress fees for large datasets.
- The S3 adapter is not a real S3 endpoint; it emulates the S3 API. Some SDK features (multipart upload completion, versioning) behave differently.
- The manifest file is your only proof of what was loaded. Lose it, lose the job.
- Not validating the final manifest against the source before shipping: if the device arrives at AWS with missing files, you start over and lose another 2-3 weeks.
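That pre-ship spot check can be scripted in a few lines. The sketch below compares the live source tree against a saved manifest; it assumes the manifest is a JSON file with an `entries` list (the layout used by the manifest tooling later in this guide), which is this guide's convention, not an AWS-defined format.

```python
import json
import os


def count_source_files(source_dir: str) -> int:
    """Count regular files under the source tree (like `find -type f | wc -l`)."""
    return sum(len(filenames) for _, _, filenames in os.walk(source_dir))


def manifest_matches_source(manifest_path: str, source_dir: str) -> bool:
    """Compare the manifest entry count against the live source tree."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return len(manifest.get("entries", [])) == count_source_files(source_dir)
```

A count match is a necessary but not sufficient check; for full confidence, compare per-file hashes as the validator later in this guide does.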
Production Debug Guide: symptom-to-action reference for Snowball job lifecycle issues, transfer failures, and shipping rejections.

| Symptom | Diagnostic commands |
|---|---|
| Manifest file count does not match source directory | `find /source -type f \| wc -l` vs. `jq '.entries \| length' manifest.json` |
| Copy speed < 50MB/s on a 10GbE connection | `ethtool eth1 \| grep Speed`, then `iftop -i eth1 -t -s 1` |
| KMS 'Access Denied' during copy or after AWS loading | `aws iam simulate-principal-policy --policy-source-arn <role-arn> --action-names kms:Decrypt kms:GenerateDataKey --resource-arns <kms-key-arn>`, then `aws kms describe-key --key-id <kms-key-arn> --query 'KeyMetadata.KeyState'` |
| Snowball Edge compute instance disk full | `snowballEdge describe-device --device-id <id>`, then `df -h /mnt/snowball` |
AWS Snowball is a physical data transport service designed for datasets too large to transfer over the network. When transferring 50TB+ over a 1Gbps connection takes weeks, Snowball ships a ruggedized storage device to your location, you copy data onto it, and ship it back. AWS loads the data into S3.
The Snowball family includes three tiers: Snowball (80TB storage, no compute), Snowball Edge (up to 210TB storage with local EC2/Lambda compute), and Snowmobile (100PB shipping container). Each tier serves a different scale and use case, from one-time data center migrations to ongoing edge computing in disconnected environments.
The common misconception is that Snowball is simple: order, copy, ship. In practice, the job lifecycle involves IAM roles, KMS key management, manifest validation, S3 adapter compatibility issues, and shipping logistics that span 2-4 weeks per job. A single mistake in the manifest or key management can force a complete restart.
AWS Snowball Family: Snowball, Snowball Edge, and Snowmobile
The AWS Snow family consists of three physical data transport devices, each designed for a different scale and use case.
Snowball (Standard):
- 80TB usable storage (50TB on older models)
- No local compute capability
- Purpose: one-time data migration to/from S3
- Cost: $300 per job + shipping
- Typical use: data center decommission, initial cloud migration

Snowball Edge:
- Three variants: Storage Optimized (210TB), Compute Optimized (100TB + GPU option), Compute Optimized with EBS (28TB NVMe)
- Local EC2 instances and Lambda functions via AWS IoT Greengrass
- Purpose: edge computing in disconnected or limited-connectivity environments
- Cost: $400-600 per job + shipping, or on-premises pricing for long-term edge deployment
- Typical use: military field operations, oil rig data processing, content distribution, disaster recovery

Snowmobile:
- 100PB capacity in a 45-foot shipping container
- Purpose: exabyte-scale data center migration
- Cost: approximately $0.005/GB (~$500K per 100PB)
- Typical use: enterprise data center consolidation, media archive migration
- Requires a semi-truck, dedicated security escort, and site preparation
Decision point: if your dataset is under 80TB and you have a stable network connection exceeding 1Gbps, consider AWS DataSync or direct S3 transfer instead. The Snowball shipping overhead (order processing, transit, AWS loading) adds 2-3 weeks. For 80TB over a 10Gbps link, direct transfer takes ~18 hours, faster than Snowball.
```python
# Snowball capacity and transfer time calculator.
# Use this to determine whether Snowball or direct transfer is faster.
from dataclasses import dataclass
from enum import Enum


class SnowballDevice(Enum):
    """Snowball device tiers with usable capacity in TB."""
    SNOWBALL_STANDARD = 80
    SNOWBALL_EDGE_STORAGE = 210
    SNOWBALL_EDGE_COMPUTE = 100
    # Same capacity as SNOWBALL_EDGE_COMPUTE, so Python treats this
    # member as an alias of it (duplicate Enum values collapse).
    SNOWBALL_EDGE_COMPUTE_GPU = 100
    SNOWBALL_EDGE_EBS = 28
    SNOWMOBILE = 100_000


@dataclass
class TransferEstimate:
    device: SnowballDevice
    data_size_tb: float
    network_speed_gbps: float
    snowball_shipping_days: int = 14  # average round-trip shipping time
    aws_loading_days: int = 2  # time for AWS to load data into S3

    def direct_transfer_days(self) -> float:
        """Calculate days to transfer over network."""
        # Convert TB to bits, Gbps to bits/day
        data_bits = self.data_size_tb * 1e12 * 8
        bits_per_day = self.network_speed_gbps * 1e9 * 86400
        return data_bits / bits_per_day

    def snowball_transfer_days(self) -> float:
        """Calculate total Snowball timeline."""
        # Copy to device: assume 100MB/s average (realistic for mixed files)
        copy_speed_mbps = 100
        data_mb = self.data_size_tb * 1e6
        copy_days = data_mb / (copy_speed_mbps * 86400)
        return copy_days + self.snowball_shipping_days + self.aws_loading_days

    def recommendation(self) -> str:
        """Return recommendation: Snowball or direct transfer."""
        direct = self.direct_transfer_days()
        snowball = self.snowball_transfer_days()
        if self.data_size_tb > self.device.value:
            return (
                f"DATA EXCEEDS DEVICE CAPACITY. "
                f"{self.data_size_tb}TB > {self.device.value}TB. "
                f"Use multiple devices or Snowmobile."
            )
        if direct < snowball:
            return (
                f"DIRECT TRANSFER is faster. "
                f"Direct: {direct:.1f} days vs Snowball: {snowball:.1f} days. "
                f"At {self.network_speed_gbps}Gbps, network transfer wins."
            )
        else:
            savings = snowball - direct
            return (
                f"SNOWBALL is faster. "
                f"Snowball: {snowball:.1f} days vs Direct: {direct:.1f} days. "
                f"Saves {savings:.1f} days."
            )


def main():
    scenarios = [
        TransferEstimate(SnowballDevice.SNOWBALL_STANDARD, 50, 1),
        TransferEstimate(SnowballDevice.SNOWBALL_STANDARD, 50, 10),
        TransferEstimate(SnowballDevice.SNOWBALL_EDGE_STORAGE, 150, 1),
        TransferEstimate(SnowballDevice.SNOWBALL_EDGE_STORAGE, 150, 10),
        TransferEstimate(SnowballDevice.SNOWBALL_EDGE_STORAGE, 200, 1),
    ]
    for scenario in scenarios:
        print(f"\n--- {scenario.data_size_tb}TB @ {scenario.network_speed_gbps}Gbps ---")
        print(f"  Device: {scenario.device.name} ({scenario.device.value}TB)")
        print(f"  Direct transfer: {scenario.direct_transfer_days():.1f} days")
        print(f"  Snowball total: {scenario.snowball_transfer_days():.1f} days")
        print(f"  {scenario.recommendation()}")


if __name__ == '__main__':
    main()
```
- 1Gbps network: 80TB takes ~7 days. Snowball round-trip: ~16 days. Direct transfer wins.
- 1Gbps network: 200TB takes ~18 days. Snowball round-trip: ~18 days. Break-even.
- 1Gbps network: 500TB takes ~46 days. Snowball round-trip: ~18 days. Snowball wins by 28 days.
- 10Gbps network: 500TB takes ~4.6 days. Snowball round-trip: ~18 days. Direct transfer wins.
- Rule: calculate both timelines before committing to Snowball. It is not always faster.
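The rule above reduces to a break-even formula. This sketch uses the same simplification as the bullets: the Snowball timeline is treated as a fixed ~18-day round trip, ignoring local copy time, so treat the result as a rough planning number rather than a precise threshold.

```python
def break_even_tb(network_gbps: float, snowball_days: float = 18.0) -> float:
    """Dataset size (TB) at which direct network transfer takes as long
    as the quoted Snowball round trip."""
    # Days to push one TB over the link: TB -> bits, Gbps -> bits/day
    days_per_tb = (1e12 * 8) / (network_gbps * 1e9 * 86400)
    return snowball_days / days_per_tb


# At 1Gbps the break-even lands near 200TB, matching the bullets above
print(f"1Gbps break-even: {break_even_tb(1):.0f}TB")
print(f"10Gbps break-even: {break_even_tb(10):.0f}TB")
```

Anything below the break-even favors the network; anything well above it favors the device, shipping delays included.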
Snowball Job Lifecycle: From Order to S3
A Snowball job has five phases. Each phase has specific failure modes that can add weeks to your timeline.
Phase 1: Create Job
- Specify source/destination S3 bucket, KMS key, IAM role, and device type.
- AWS provisions the device and prepares the manifest.
- Timeline: 1-3 business days.
- Failure mode: IAM role or KMS key misconfiguration discovered later at device loading.

Phase 2: Receive Device
- AWS ships the device to your specified address.
- Timeline: 2-5 business days depending on region.
- Failure mode: shipping delays, incorrect address, device arrives damaged.

Phase 3: Copy Data
- Install the Snowball client or use the S3 adapter.
- Copy files to the device using s3 cp, s3 sync, or the Snowball client.
- Timeline: depends on data size and network speed (typically 100-300MB/s).
- Failure mode: manifest corruption from re-running copy scripts, KMS permission errors, network bottlenecks.

Phase 4: Ship Device Back
- Use the pre-paid shipping label.
- Verify the device status is 'Ready to ship' in the console.
- Timeline: 2-5 business days.
- Failure mode: shipping damage, lost in transit, incorrect return address.

Phase 5: AWS Loads Data
- AWS validates the manifest, decrypts data, and loads it into your S3 bucket.
- Timeline: 1-3 business days.
- Failure mode: manifest mismatch (device rejected), KMS decryption failure, S3 bucket policy blocking writes.
Total typical timeline: 10-20 business days per device. Plan your migration timeline accordingly.
```python
import boto3
import json
import hashlib
import os
from dataclasses import dataclass
from typing import List
from datetime import datetime


@dataclass
class SnowballJobConfig:
    """Configuration for a Snowball migration job."""
    job_id: str
    device_type: str  # STANDARD | EDGE_STORAGE | EDGE_COMPUTE
    s3_bucket: str
    s3_prefix: str
    kms_key_arn: str
    iam_role_arn: str
    source_directory: str
    shipping_address: dict
    notification_sns_topic: str


@dataclass
class ManifestEntry:
    """Single entry in the transfer manifest."""
    file_path: str
    s3_key: str
    size_bytes: int
    sha256_hash: str
    last_modified: str


class SnowballJobManager:
    """Manages Snowball job lifecycle and pre-ship validation."""

    def __init__(self, config: SnowballJobConfig):
        self.config = config
        self.snowball_client = boto3.client('snowball')
        self.kms_client = boto3.client('kms')
        self.iam_client = boto3.client('iam')
        self.s3_client = boto3.client('s3')

    def generate_manifest(self, source_dir: str) -> List[ManifestEntry]:
        """Generate a manifest of all files to be transferred."""
        manifest = []
        skipped_files = []
        for dirpath, dirnames, filenames in os.walk(source_dir):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                # Check path length: the Snowball S3 adapter rejects > 1024 bytes
                if len(filepath.encode('utf-8')) > 1024:
                    skipped_files.append(filepath)
                    continue
                try:
                    size = os.path.getsize(filepath)
                    sha256 = hashlib.sha256()
                    with open(filepath, 'rb') as f:
                        for chunk in iter(lambda: f.read(8192), b''):
                            sha256.update(chunk)
                    s3_key = os.path.relpath(filepath, source_dir)
                    manifest.append(ManifestEntry(
                        file_path=filepath,
                        s3_key=f"{self.config.s3_prefix}/{s3_key}",
                        size_bytes=size,
                        sha256_hash=sha256.hexdigest(),
                        last_modified=datetime.fromtimestamp(
                            os.path.getmtime(filepath)
                        ).isoformat(),
                    ))
                except OSError as e:
                    skipped_files.append(f"{filepath}: {e}")
        if skipped_files:
            print(f"WARNING: {len(skipped_files)} files skipped "
                  f"(path too long or inaccessible)")
            for f in skipped_files[:10]:
                print(f"  - {f}")
        return manifest

    def validate_pre_ship(self, manifest: List[ManifestEntry]) -> dict:
        """Run all pre-ship validations. Returns validation result."""
        errors = []
        warnings = []

        # 1. Validate KMS key exists and is accessible
        try:
            key_response = self.kms_client.describe_key(
                KeyId=self.config.kms_key_arn
            )
            key_state = key_response['KeyMetadata']['KeyState']
            if key_state != 'Enabled':
                errors.append(f"KMS key is not enabled: {key_state}")
        except Exception as e:
            errors.append(f"KMS key validation failed: {e}")

        # 2. Validate IAM role has required permissions
        required_actions = ['s3:PutObject', 's3:CreateBucket', 'kms:GenerateDataKey']
        for action in required_actions:
            try:
                response = self.iam_client.simulate_principal_policy(
                    PolicySourceArn=self.config.iam_role_arn,
                    ActionNames=[action],
                    ResourceArns=[f"arn:aws:s3:::{self.config.s3_bucket}/*"],
                )
                decision = response['EvaluationResults'][0]['EvalDecision']
                if decision != 'allowed':
                    errors.append(f"IAM role missing {action} permission")
            except Exception as e:
                errors.append(f"IAM permission check failed for {action}: {e}")

        # 3. Validate manifest has entries
        if not manifest:
            errors.append("Manifest is empty: no files to transfer")

        # 4. Validate total size fits on device (capacities are in TB,
        #    so convert bytes -> TB, not GB)
        total_tb = sum(e.size_bytes for e in manifest) / (1024 ** 4)
        device_limits = {
            'STANDARD': 80,
            'EDGE_STORAGE': 210,
            'EDGE_COMPUTE': 100,
        }
        limit = device_limits.get(self.config.device_type, 80)
        if total_tb > limit:
            errors.append(
                f"Data size {total_tb:.1f}TB exceeds device capacity {limit}TB"
            )
        elif total_tb > limit * 0.9:
            warnings.append(
                f"Data size {total_tb:.1f}TB is >90% of device capacity {limit}TB. "
                f"Consider leaving headroom."
            )

        # 5. Validate no files have paths > 1024 bytes
        long_paths = [e for e in manifest
                      if len(e.file_path.encode('utf-8')) > 1024]
        if long_paths:
            errors.append(
                f"{len(long_paths)} files have paths > 1024 bytes. "
                f"S3 adapter will reject them."
            )

        return {
            'valid': len(errors) == 0,
            'errors': errors,
            'warnings': warnings,
            'manifest_entries': len(manifest),
            'total_size_tb': round(total_tb, 2),
            'device_type': self.config.device_type,
            'device_capacity_tb': limit,
        }

    def save_manifest(self, manifest: List[ManifestEntry], output_path: str):
        """Save manifest to JSON for post-transfer verification."""
        manifest_data = {
            'job_id': self.config.job_id,
            'generated_at': datetime.utcnow().isoformat(),
            'total_entries': len(manifest),
            'total_bytes': sum(e.size_bytes for e in manifest),
            'entries': [
                {
                    'file_path': e.file_path,
                    's3_key': e.s3_key,
                    'size_bytes': e.size_bytes,
                    'sha256': e.sha256_hash,
                    'last_modified': e.last_modified,
                }
                for e in manifest
            ],
        }
        with open(output_path, 'w') as f:
            json.dump(manifest_data, f, indent=2)
        print(f"Manifest saved: {len(manifest)} entries, "
              f"{manifest_data['total_bytes'] / (1024**3):.2f}GB")

    def verify_post_load(self, manifest_path: str) -> dict:
        """Verify S3 contents match manifest after AWS loading."""
        with open(manifest_path, 'r') as f:
            manifest_data = json.load(f)
        mismatches = []
        missing = []
        for entry in manifest_data['entries']:
            try:
                response = self.s3_client.head_object(
                    Bucket=self.config.s3_bucket,
                    Key=entry['s3_key'],
                )
                s3_size = response['ContentLength']
                if s3_size != entry['size_bytes']:
                    mismatches.append({
                        'key': entry['s3_key'],
                        'expected_size': entry['size_bytes'],
                        'actual_size': s3_size,
                    })
            except self.s3_client.exceptions.ClientError:
                missing.append(entry['s3_key'])
        return {
            'verified': len(mismatches) == 0 and len(missing) == 0,
            'total_checked': len(manifest_data['entries']),
            'missing': len(missing),
            'size_mismatches': len(mismatches),
            'missing_files': missing[:20],
            'mismatched_files': mismatches[:20],
        }
```
- Phase 1 (Create): 1-3 days. Validate IAM, KMS, S3 before creating the job.
- Phase 2 (Receive): 2-5 days. Shipping time varies by region and carrier.
- Phase 3 (Copy): variable. Depends on data size and local network speed.
- Phase 4 (Return): 2-5 days. Use pre-paid label. Verify 'Ready to ship' status.
- Phase 5 (Load): 1-3 days. AWS validates manifest and loads into S3. Failures here mean starting over.
Snowball Edge: Local Compute and Edge Computing Use Cases
Snowball Edge extends the basic Snowball by adding local compute capabilities. You can run EC2 instances and Lambda functions directly on the device, processing data at the edge before (or instead of) transferring it to S3.
Snowball Edge variants:
- Storage Optimized: 210TB usable storage, 40 vCPUs, 80GB RAM. Best for bulk data transfer with light processing.
- Compute Optimized: 100TB storage, 52 vCPUs, 208GB RAM. Best for heavy compute at the edge.
- Compute Optimized with GPU: 100TB storage, 52 vCPUs, 208GB RAM, NVIDIA V100 GPU. Best for ML inference at the edge.
- Compute Optimized with EBS: 28TB NVMe EBS-like storage. Best for applications requiring fast random access.

Edge computing use cases:
- Military field operations: process ISR (intelligence, surveillance, reconnaissance) data in disconnected environments. Run ML inference locally, ship only results.
- Oil and gas: process seismic data at remote drilling sites. 100TB of raw seismic data reduced to 2TB of processed results.
- Content distribution: pre-load content libraries onto Snowball Edge devices and ship to retail locations for local streaming.
- Disaster recovery: deploy Snowball Edge to a disaster site, run local infrastructure (databases, web servers) while the primary site is being restored.
Important: Snowball Edge compute instances share storage with the data transfer volume. If you load 200TB onto a Storage Optimized device, the EC2 instance has minimal remaining storage. Plan compute and storage allocations separately.
```python
import boto3
from dataclasses import dataclass


@dataclass
class EdgeComputeConfig:
    """Configuration for Snowball Edge compute workloads."""
    device_id: str
    instance_type: str  # sbg1.medium, sbg1.large, sbg1.xlarge
    ami_id: str
    vcpus: int
    memory_gb: int
    ebs_volumes: list  # list of {size_gb, mount_point}


class SnowballEdgeManager:
    """Manages Snowball Edge compute instances and edge workloads."""

    def __init__(self, device_ip: str, unlock_code: str, manifest_path: str):
        self.device_ip = device_ip
        self.unlock_code = unlock_code
        self.manifest_path = manifest_path
        # Snowball Edge exposes a local EC2-compatible endpoint
        self.ec2_client = boto3.client(
            'ec2',
            endpoint_url=f"https://{device_ip}:8443",
            verify=False,  # Snowball Edge uses self-signed certs
        )
        self.s3_client = boto3.client(
            's3',
            endpoint_url=f"https://{device_ip}:8443",
            verify=False,
        )

    def list_local_instances(self) -> list:
        """List EC2 instances running on the Snowball Edge device."""
        response = self.ec2_client.describe_instances()
        instances = []
        for reservation in response['Reservations']:
            for instance in reservation['Instances']:
                instances.append({
                    'instance_id': instance['InstanceId'],
                    'state': instance['State']['Name'],
                    'type': instance['InstanceType'],
                    'private_ip': instance.get('PrivateIpAddress', 'N/A'),
                })
        return instances

    def launch_edge_instance(self, config: EdgeComputeConfig) -> dict:
        """Launch an EC2 instance on the Snowball Edge device."""
        response = self.ec2_client.run_instances(
            ImageId=config.ami_id,
            InstanceType=config.instance_type,
            MinCount=1,
            MaxCount=1,
        )
        instance = response['Instances'][0]
        return {
            'instance_id': instance['InstanceId'],
            'state': instance['State']['Name'],
        }

    def list_local_buckets(self) -> list:
        """List S3 buckets on the Snowball Edge device."""
        response = self.s3_client.list_buckets()
        return [bucket['Name'] for bucket in response['Buckets']]

    def get_device_storage_status(self) -> dict:
        """Get storage utilization on the Snowball Edge device."""
        # Note: actual API varies by Snowball Edge SDK version.
        # This is a simplified example.
        return {
            'note': 'Use the snowballEdge describe-device CLI command',
            'command': f'snowballEdge describe-device --device-id {self.device_ip}',
        }

    def validate_edge_workload(self, config: EdgeComputeConfig) -> list:
        """Validate that the edge workload fits on the device."""
        errors = []
        # Check instance type availability
        valid_types = ['sbg1.medium', 'sbg1.large', 'sbg1.xlarge']
        if config.instance_type not in valid_types:
            errors.append(
                f"Invalid instance type: {config.instance_type}. Valid: {valid_types}"
            )
        # Check EBS volume sizes
        total_ebs_gb = sum(v['size_gb'] for v in config.ebs_volumes)
        if total_ebs_gb > 28000:  # 28TB max for EBS volumes
            errors.append(f"Total EBS size {total_ebs_gb}GB exceeds 28TB limit")
        return errors
```
- Storage Optimized (210TB): 40 vCPUs, 80GB RAM. Best for transfer-heavy workloads.
- Compute Optimized (100TB): 52 vCPUs, 208GB RAM. Best for compute-heavy workloads.
- Compute Optimized with GPU: adds NVIDIA V100 for ML inference at the edge.
- EBS variant (28TB NVMe): fast random access. Best for database workloads at the edge.
- Rule: if compute is primary, use Compute Optimized. If transfer is primary, use Storage Optimized. Never assume you get both at full capacity.
Encryption, KMS Key Management, and Cross-Account Migrations
Every Snowball device encrypts data at rest using AES-256. The encryption key is managed via AWS KMS. Understanding the key management model is critical: a KMS mismatch discovered at AWS's loading facility means a 2-3 week round-trip restart.
Encryption flow:
1. You specify a KMS key ARN when creating the Snowball job.
2. When you copy data to the device, the S3 adapter encrypts each object using a data key derived from the KMS key.
3. The device stores encrypted data. Without the KMS key, the data is unreadable.
4. When AWS receives the device, they use the KMS key to decrypt and load data into your S3 bucket.

Cross-account migration:
- The Snowball job is created in Account A.
- The S3 destination bucket is in Account A.
- The KMS key can be in Account A or Account B.
- If the KMS key is in Account B, Account A's Snowball job IAM role must have kms:Decrypt and kms:GenerateDataKey permissions on Account B's key.
- Account B's KMS key policy must explicitly allow Account A's IAM role.
Common failure: the job is created with a default KMS key (aws/s3) in Account A, but the S3 bucket has a bucket policy requiring a specific KMS key from Account B. AWS loads the data with Account A's key, but S3 rejects writes because the key does not match the bucket policy.
Best practice: always specify the KMS key ARN explicitly. Never rely on default key behavior for Snowball jobs.
```python
import boto3
import json
from dataclasses import dataclass


@dataclass
class KMSValidationResult:
    key_arn: str
    key_exists: bool
    key_enabled: bool
    key_account: str
    job_account: str
    account_match: bool
    permissions_valid: bool
    bucket_policy_compatible: bool
    errors: list


class SnowballKMSValidator:
    """Validates KMS key configuration for Snowball jobs."""

    def __init__(self, job_account_region: str = 'us-east-1'):
        self.kms_client = boto3.client('kms', region_name=job_account_region)
        self.iam_client = boto3.client('iam', region_name=job_account_region)
        self.s3_client = boto3.client('s3', region_name=job_account_region)
        self.sts_client = boto3.client('sts', region_name=job_account_region)

    def validate_kms_for_snowball(
        self,
        kms_key_arn: str,
        iam_role_arn: str,
        s3_bucket: str,
    ) -> KMSValidationResult:
        """Comprehensive KMS validation for a Snowball job."""
        errors = []

        # Get current account
        job_account = self.sts_client.get_caller_identity()['Account']

        # 1. Verify KMS key exists and is enabled
        key_exists = False
        key_enabled = False
        key_account = ''
        try:
            key_response = self.kms_client.describe_key(KeyId=kms_key_arn)
            key_metadata = key_response['KeyMetadata']
            key_exists = True
            key_enabled = key_metadata['KeyState'] == 'Enabled'
            key_account = key_metadata['Arn'].split(':')[4]
            if not key_enabled:
                errors.append(
                    f"KMS key is not enabled. State: {key_metadata['KeyState']}"
                )
            if key_metadata.get('KeyManager') == 'AWS':
                errors.append(
                    "AWS-managed keys (aws/s3) are not recommended for Snowball. "
                    "Use a customer-managed key for cross-account compatibility."
                )
        except Exception as e:
            errors.append(f"KMS key not found or not accessible: {e}")

        # 2. Check cross-account permissions
        account_match = key_account == job_account
        permissions_valid = False
        if not account_match:
            errors.append(
                f"KMS key is in account {key_account} but Snowball job is in "
                f"{job_account}. Cross-account KMS permissions required."
            )
            # Check if the IAM role has permissions on the cross-account key
            try:
                for action in ['kms:Decrypt', 'kms:GenerateDataKey', 'kms:DescribeKey']:
                    response = self.iam_client.simulate_principal_policy(
                        PolicySourceArn=iam_role_arn,
                        ActionNames=[action],
                        ResourceArns=[kms_key_arn],
                    )
                    decision = response['EvaluationResults'][0]['EvalDecision']
                    if decision != 'allowed':
                        errors.append(
                            f"IAM role missing {action} on cross-account KMS key. "
                            f"Add permission to IAM policy AND KMS key policy."
                        )
                    else:
                        permissions_valid = True
            except Exception as e:
                errors.append(f"Permission simulation failed: {e}")
        else:
            permissions_valid = True

        # 3. Check S3 bucket policy compatibility
        bucket_policy_compatible = True
        try:
            bucket_policy = self.s3_client.get_bucket_policy(Bucket=s3_bucket)
            policy = json.loads(bucket_policy['Policy'])
            # Check if bucket policy requires a specific KMS key
            for statement in policy.get('Statement', []):
                if statement.get('Effect') == 'Deny':
                    condition = statement.get('Condition', {})
                    kms_condition = condition.get('StringNotEquals', {}).get(
                        's3:x-amz-server-side-encryption-aws-kms-key-id', ''
                    )
                    if kms_condition and kms_condition != kms_key_arn:
                        errors.append(
                            f"Bucket policy denies writes not encrypted with "
                            f"{kms_condition}. Snowball job uses {kms_key_arn}. "
                            f"Mismatch."
                        )
                        bucket_policy_compatible = False
        except self.s3_client.exceptions.ClientError as e:
            # A missing bucket policy is fine; surface anything else
            if e.response['Error']['Code'] != 'NoSuchBucketPolicy':
                errors.append(f"Bucket policy check failed: {e}")

        return KMSValidationResult(
            key_arn=kms_key_arn,
            key_exists=key_exists,
            key_enabled=key_enabled,
            key_account=key_account,
            job_account=job_account,
            account_match=account_match,
            permissions_valid=permissions_valid,
            bucket_policy_compatible=bucket_policy_compatible,
            errors=errors,
        )

    def generate_cross_account_kms_policy(
        self,
        job_account_id: str,
        job_role_arn: str,
    ) -> str:
        """Generate a KMS key policy for cross-account Snowball access."""
        policy = {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "AllowSnowballJobAccount",
                    "Effect": "Allow",
                    "Principal": {"AWS": job_role_arn},
                    "Action": [
                        "kms:Decrypt",
                        "kms:GenerateDataKey",
                        "kms:DescribeKey",
                    ],
                    "Resource": "*",
                },
                {
                    "Sid": "AllowRootAccountFullAccess",
                    "Effect": "Allow",
                    "Principal": {"AWS": f"arn:aws:iam::{job_account_id}:root"},
                    "Action": "kms:*",
                    "Resource": "*",
                },
            ],
        }
        return json.dumps(policy, indent=2)
```
- Same account: IAM role needs kms:Decrypt and kms:GenerateDataKey on the key.
- Cross account: IAM role needs the same permissions. KMS key policy must allow the role ARN.
- Always specify the KMS key ARN explicitly in the Snowball job. Never rely on default key.
- AWS-managed keys (aws/s3) cannot be shared cross-account. Use customer-managed keys.
- Validate with iam:SimulatePrincipalPolicy before creating the job. Do not discover the error at AWS loading.
Performance Optimization: Maximizing Copy Speed to Snowball Devices
The copy phase is the only phase you control. Maximizing copy speed directly reduces your migration timeline.
Bottleneck analysis:
- Network: Snowball Edge supports 10GbE and 25GbE. Standard Snowball supports 10GbE. If you connect at 1GbE, you are limited to ~100MB/s.
- File size: millions of small files (<1MB) are 10-50x slower than large files due to per-file overhead (manifest entry, metadata, S3 adapter handshake). A 100MB/s large-file throughput drops to 5-10MB/s with millions of small files.
- S3 adapter concurrency: the S3 adapter is single-threaded per copy session. Run multiple sessions in parallel to saturate the network.
- Source disk I/O: if the source is on spinning disks, random read patterns for small files are 100x slower than sequential reads for large files.

Optimization strategies:
1. Tar small files: archive millions of small files into large tar files before copying. Copy 10GB tar files instead of 10 million 1KB files.
2. Parallel copy sessions: run 4-8 concurrent s3 cp sessions, each copying a different subdirectory.
3. Use 10GbE or 25GbE: verify the physical connection. The Snowball device's 1GbE management port is NOT the data port.
4. Pre-sort by file size: copy large files first (they saturate the network), then small files in batches.
5. Skip unnecessary files: exclude temp files, logs, and caches. Every unnecessary file adds per-file overhead.
```python
import os
import subprocess
import tarfile
import tempfile
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


@dataclass
class CopyStrategy:
    """Copy strategy based on file size distribution."""
    small_file_threshold_mb: float = 1.0
    tar_batch_size_mb: float = 1024.0  # 1GB tar archives
    parallel_sessions: int = 4
    exclude_patterns: list = None

    def __post_init__(self):
        if self.exclude_patterns is None:
            self.exclude_patterns = [
                '*.tmp', '*.temp', '*.log', '*.cache', '.DS_Store',
                'Thumbs.db', '__pycache__', '.git', 'node_modules',
            ]


class SnowballCopyOptimizer:
    """Optimizes data copy to Snowball device for maximum throughput."""

    def __init__(self, source_dir: str, snowball_endpoint: str,
                 bucket: str, prefix: str):
        self.source_dir = Path(source_dir)
        self.snowball_endpoint = snowball_endpoint
        self.bucket = bucket
        self.prefix = prefix

    def analyze_source(self) -> dict:
        """Analyze source directory to determine optimal copy strategy."""
        total_files = 0
        total_size = 0
        small_files = 0
        small_files_size = 0
        large_files = 0
        large_files_size = 0
        max_path_length = 0
        long_path_files = []
        for dirpath, dirnames, filenames in os.walk(self.source_dir):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                try:
                    size = os.path.getsize(filepath)
                except OSError:
                    continue
                total_files += 1
                total_size += size
                if len(filepath) > max_path_length:
                    max_path_length = len(filepath)
                if len(filepath) > 1024:
                    long_path_files.append(filepath)
                if size < 1 * 1024 * 1024:  # < 1MB
                    small_files += 1
                    small_files_size += size
                else:
                    large_files += 1
                    large_files_size += size
        return {
            'total_files': total_files,
            'total_size_gb': total_size / (1024**3),
            'small_files': small_files,
            'small_files_pct': (small_files / total_files * 100) if total_files > 0 else 0,
            'small_files_size_gb': small_files_size / (1024**3),
            'large_files': large_files,
            'large_files_size_gb': large_files_size / (1024**3),
            'max_path_length': max_path_length,
            'long_path_files': len(long_path_files),
            'recommendation': self._recommend_strategy(small_files, total_files),
        }

    def _recommend_strategy(self, small_files: int, total_files: int) -> str:
        """Recommend copy strategy based on file size distribution."""
        small_pct = (small_files / total_files * 100) if total_files > 0 else 0
        if small_pct > 50:
            return (
                "HIGH SMALL FILE COUNT. "
                "Tar small files into large archives before copying. "
                "Expected speedup: 10-50x."
            )
        elif small_pct > 20:
            return (
                "MODERATE SMALL FILE COUNT. "
                "Consider tarring files < 1MB. "
                "Run 4+ parallel copy sessions."
            )
        else:
            return (
                "LARGE FILE DOMINANT. "
                "Direct copy with parallel sessions should achieve near-max "
                "throughput. Use 4-8 parallel s3 cp sessions."
            )

    def create_tar_archives(self, strategy: CopyStrategy) -> list:
        """Create tar archives of small files for efficient transfer."""
        archives = []
        current_batch = []
        current_batch_size = 0
        batch_num = 0
        for dirpath, dirnames, filenames in os.walk(self.source_dir):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                try:
                    size = os.path.getsize(filepath)
                except OSError:
                    continue
                if size < strategy.small_file_threshold_mb * 1024 * 1024:
                    current_batch.append(filepath)
                    current_batch_size += size
                    if current_batch_size >= strategy.tar_batch_size_mb * 1024 * 1024:
                        archives.append(self._write_tar(current_batch, batch_num))
                        current_batch = []
                        current_batch_size = 0
                        batch_num += 1
        # Write remaining batch
        if current_batch:
            archives.append(self._write_tar(current_batch, batch_num))
        return archives

    def _write_tar(self, files: list, batch_num: int) -> str:
        """Write a batch of files to a tar archive."""
        archive_name = f"small_files_batch_{batch_num:04d}.tar.gz"
        archive_path = os.path.join(tempfile.gettempdir(), archive_name)
        with tarfile.open(archive_path, 'w:gz') as tar:
            for filepath in files:
                arcname = os.path.relpath(filepath, self.source_dir)
                tar.add(filepath, arcname=arcname)
        return archive_path

    def parallel_copy(self, source_dirs: list, strategy: CopyStrategy) -> list:
        """Copy multiple directories in parallel to the Snowball device."""
        results = []
        with ThreadPoolExecutor(max_workers=strategy.parallel_sessions) as executor:
            futures = {}
            for i, source in enumerate(source_dirs):
                s3_key = f"{self.prefix}/batch_{i:04d}"
                future = executor.submit(self._copy_to_snowball, source, s3_key)
                futures[future] = source
            for future in as_completed(futures):
                source = futures[future]
                try:
                    result = future.result()
                    results.append({'source': source, 'status': 'success',
                                    'result': result})
                except Exception as e:
                    results.append({'source': source, 'status': 'error',
                                    'error': str(e)})
        return results

    def _copy_to_snowball(self, source: str, s3_key: str) -> dict:
        """Copy a single directory to the Snowball device via S3 adapter."""
        cmd = [
            'aws', 's3', 'cp', source,
            f"s3://{self.bucket}/{s3_key}/",
            '--recursive',
            '--endpoint-url', f"https://{self.snowball_endpoint}:8443",
            '--no-verify-ssl',
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"Copy failed: {result.stderr}")
        return {'command': ' '.join(cmd), 'stdout': result.stdout}
```
- Per-file overhead: each file requires a manifest entry, metadata update, and S3 adapter handshake.
- 10 million 1KB files = 10GB total data, yet the copy can take ~2-3 days: per-file overhead caps effective throughput at tens of KB/s, not MB/s.
- 10 thousand 1MB files = 10GB total data. Copy time: ~2 minutes at 100MB/s.
- Tar small files into 1GB archives: 10 archives instead of 10 million files. Copy time: ~2 minutes.
- Always run a source analysis before copying. If small files >20%, tar them first.
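The pre-copy source analysis can be sketched as a standalone scan, a minimal version of the analyzer logic in the class above (the 1MB threshold and 20% cutoff are this article's heuristics, not AWS limits):

```python
import os

def analyze_source(source_dir: str, small_threshold_bytes: int = 1024 * 1024) -> dict:
    """Walk source_dir and report what fraction of files are 'small'."""
    total, small = 0, 0
    for dirpath, _, filenames in os.walk(source_dir):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish mid-scan
            total += 1
            if size < small_threshold_bytes:
                small += 1
    small_pct = (small / total * 100) if total else 0.0
    # Article's rule of thumb: tar small files first when they exceed 20%
    return {'total': total, 'small': small, 'small_pct': small_pct,
            'tar_first': small_pct > 20}
```

Run this before the first `s3 cp`; if `tar_first` is true, batch the small files into archives as shown above.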
| Feature / Aspect | Snowball Standard | Snowball Edge Storage Optimized | Snowball Edge Compute Optimized | Snowball Edge Compute with GPU | Snowmobile |
|---|---|---|---|---|---|
| Usable storage | 80TB | 210TB | 100TB | 100TB | 100PB |
| Compute | None | 40 vCPUs, 80GB RAM | 52 vCPUs, 208GB RAM | 52 vCPUs, 208GB RAM, NVIDIA V100 | N/A |
| Network interfaces | 1x 10GbE | 1x 10GbE, 1x 25GbE | 1x 10GbE, 1x 25GbE | 1x 10GbE, 1x 25GbE | Fiber optic connection on-site |
| Local S3 endpoint | Yes | Yes | Yes | Yes | N/A |
| EC2/Lambda support | No | Yes (limited) | Yes (full) | Yes (full, with GPU) | No |
| Cost per job | $300 | $400-600 | $400-600 | $600+ | ~$0.005/GB |
| Shipping | Standard carrier | Standard carrier | Standard carrier | Standard carrier | Semi-truck with security escort |
| Typical timeline | 10-20 business days | 10-20 business days | 10-20 business days | 10-20 business days | Weeks to months (site prep required) |
| Best for | One-time data migration <80TB | Bulk transfer + light compute | Edge compute + moderate transfer | ML inference at edge | Exabyte-scale data center migration |
| Encryption | AES-256, KMS-managed | AES-256, KMS-managed | AES-256, KMS-managed | AES-256, KMS-managed | AES-256, KMS-managed |
| Tamper detection | TPM chip, tamper-evident enclosure, 256-bit encryption | TPM chip, tamper-evident enclosure, 256-bit encryption | TPM chip, tamper-evident enclosure, 256-bit encryption | TPM chip, tamper-evident enclosure, 256-bit encryption | Armed security escort, GPS tracking, tamper-evident seals |
| Use case example | Data center decommission, initial cloud migration | Content distribution, disaster recovery, bulk archival | Military field ops, oil rig processing, disconnected environments | Real-time video analysis, ML inference at edge | Enterprise data center consolidation, media archive migration |
🎯 Key Takeaways
Frequently Asked Questions
What is AWS Snowball?
AWS Snowball is a ruggedized physical storage device that AWS ships to your data center for offline data transfer to and from S3. It is designed for datasets too large to transfer over the network, typically 50TB+. You copy data onto the device, ship it back to AWS, and they load it into S3 for you.
What is the difference between Snowball, Snowball Edge, and Snowmobile?
Snowball (Standard) provides 80TB usable storage with no compute capability, intended for one-time data migration to/from S3 at $300 per job. Snowball Edge adds local compute (EC2 instances, Lambda) for edge processing, with variants ranging from 28TB to 210TB storage at $400-600 per job. Snowmobile is a 100PB shipping container for exabyte-scale data center migrations at approximately $0.005/GB.
How much usable storage does each Snowball device provide?
Snowball Standard: 80TB. Snowball Edge Storage Optimized: 210TB. Snowball Edge Compute Optimized: 100TB. Snowball Edge Compute Optimized with EBS: 28TB NVMe. Snowmobile: 100PB.
How long does a Snowball job take from start to finish?
A Snowball job is a 10-20 business day pipeline with five phases: Phase 1 (Create Job): 1-3 business days. Phase 2 (Receive Device): 2-5 business days depending on region. Phase 3 (Copy Data): depends on data size and network speed, typically 100-300MB/s. Phase 4 (Ship Device Back): 2-5 business days. Phase 5 (AWS Loads Data): 1-3 business days.
When should I use Snowball instead of direct network transfer?
Snowball is only faster than direct transfer when the network transfer time exceeds the Snowball round-trip time (shipping + loading). At 1Gbps, the crossover is approximately 50-180TB. At 10Gbps, the crossover is approximately 500TB-1.8PB. For datasets under 10TB, use direct S3 upload or AWS DataSync. Always calculate both timelines before committing.
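The timeline comparison reduces to simple arithmetic. A hedged sketch, where the shipping, setup, and loading defaults are rough estimates taken from this article's phase durations, not AWS guarantees:

```python
def direct_transfer_days(dataset_tb: float, link_gbps: float,
                         efficiency: float = 0.8) -> float:
    """Days to move dataset_tb over the network at the given sustained efficiency."""
    bytes_total = dataset_tb * 1e12
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_sec / 86400

def snowball_days(copy_days: float, shipping_each_way: float = 3.5,
                  aws_load: float = 2, job_setup: float = 2) -> float:
    """Rough Snowball round-trip: setup + ship out + copy + ship back + AWS load."""
    return job_setup + shipping_each_way + copy_days + shipping_each_way + aws_load

# 500TB over 1Gbps: ~46 days at line rate (~58 days at 80% efficiency)
# vs roughly two weeks end-to-end for a Snowball job
print(round(direct_transfer_days(500, 1, efficiency=1.0)))  # ~46
print(round(snowball_days(copy_days=3)))                    # ~14
```

Whichever number is smaller wins; rerun the calculation with your actual link speed and measured copy rate before committing to either path.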
How does encryption work on Snowball devices?
Every Snowball device encrypts data at rest using AES-256 with keys managed via AWS KMS. You specify a KMS key ARN when creating the job. The S3 adapter encrypts each object using a data key derived from the KMS key. Without the KMS key, the data is unreadable. When AWS receives the device, they use the KMS key to decrypt and load data into your S3 bucket.
How do I handle cross-account KMS key migrations?
If the KMS key is in Account B but the Snowball job is in Account A, Account A's Snowball job IAM role must have kms:Decrypt and kms:GenerateDataKey permissions on Account B's key. Account B's KMS key policy must explicitly allow Account A's IAM role. Always specify the KMS key ARN explicitly in the job configuration. Never rely on default key behavior.
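As an illustration, the statement Account B would add to its KMS key policy might look like the sketch below (the account ID and role name are hypothetical placeholders; adapt them to your accounts):

```python
import json

# Hypothetical ARN of the Snowball job role in Account A
SNOWBALL_ROLE_ARN = "arn:aws:iam::111111111111:role/snowball-import-role"

# Statement for Account B's KMS key policy, granting Account A's job role
# the permissions the Snowball job needs on the key
key_policy_statement = {
    "Sid": "AllowSnowballJobRoleFromAccountA",
    "Effect": "Allow",
    "Principal": {"AWS": SNOWBALL_ROLE_ARN},
    "Action": ["kms:Decrypt", "kms:GenerateDataKey", "kms:DescribeKey"],
    "Resource": "*",  # in a key policy, the resource is the key itself
}
print(json.dumps(key_policy_statement, indent=2))
```

This statement goes into the key policy on Account B's side; Account A's IAM role policy must grant the same actions on the key's ARN from its side.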
What is the manifest file and why is it important?
The manifest file is the contract between your data and AWS. It records all files loaded onto the device. If the manifest does not match the device contents exactly, AWS rejects the device upon arrival. Losing or corrupting the manifest means losing the job. Always validate the manifest against the source directory before shipping.
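A minimal pre-ship validation sketch, assuming the manifest is JSON with an `entries` list (adapt the parsing to your actual manifest format):

```python
import json
import os

def validate_manifest(source_dir: str, manifest_path: str) -> dict:
    """Compare source file count against manifest entry count before shipping."""
    source_count = sum(len(files) for _, _, files in os.walk(source_dir))
    with open(manifest_path) as f:
        manifest_count = len(json.load(f)['entries'])
    return {
        'source': source_count,
        'manifest': manifest_count,
        'match': source_count == manifest_count,  # do NOT ship unless True
    }
```

Counting files is the cheapest check; comparing total bytes, or hashing a sample of files, catches truncated copies that a pure count misses.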
What should I do if AWS rejects my Snowball device?
Common rejection reasons include manifest mismatch (files on device do not match manifest), KMS key mismatch (device encrypted with wrong key), or device tampering (security seal broken). If the device is rejected, you must order a new device (5-day wait), fix the source issues, re-copy the data, and re-ship. There is no fix for a rejected device; rejection adds 2-3 weeks to your timeline.
Can I re-run copy scripts on a Snowball device?
Never re-run copy scripts on a Snowball device. Overwriting files on the device corrupts the manifest. If you must re-copy, cancel the job and start a new one. In one production incident, a cleanup script re-ran the initial copy, corrupted the manifest, and AWS rejected the device.
How can I maximize copy speed to a Snowball device?
Use 10GbE or 25GbE connections (not the 1GbE management port). Tar millions of small files (<1MB) into large archives before copying, as small files are 10-50x slower due to per-file overhead. Run 4-8 concurrent s3 cp sessions in parallel. Pre-sort by file size, copying large files first. Skip unnecessary files like temp files, logs, and caches.
What are the Snowball Edge compute capabilities?
Snowball Edge supports local EC2 instances and Lambda functions via AWS IoT Greengrass. Compute Optimized variant offers 52 vCPUs and 208GB RAM. There is also a GPU variant with NVIDIA V100 for ML inference. Note that compute instances share storage with the data transfer volume β loading 200TB onto a 210TB device leaves only 10TB for compute workloads.
What are typical use cases for Snowball Edge?
Military field operations for processing ISR data in disconnected environments. Oil and gas for processing seismic data at remote drilling sites. Content distribution by pre-loading content libraries and shipping to retail locations. Disaster recovery by deploying to a disaster site and running local infrastructure while the primary site is being restored.
What permissions does the Snowball job IAM role need?
The Snowball job IAM role needs s3:PutObject, s3:CreateBucket, and kms:GenerateDataKey on the target bucket and KMS key. For cross-account migrations, the role also needs kms:Decrypt and kms:DescribeKey permissions on the cross-account KMS key. Verify permissions with iam:SimulatePrincipalPolicy before creating the job.
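A sketch of that verification using boto3's `simulate_principal_policy` (the role and key ARNs are your own; `boto3` must be installed and AWS credentials configured for the live call):

```python
def check_snowball_kms_permissions(role_arn: str, key_arn: str) -> list:
    """Simulate the job role against the KMS key; returns denied actions."""
    import boto3  # assumed installed; the live call also needs AWS credentials
    iam = boto3.client('iam')
    resp = iam.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=['kms:Decrypt', 'kms:GenerateDataKey', 'kms:DescribeKey'],
        ResourceArns=[key_arn],
    )
    return denied_actions(resp)

def denied_actions(simulation_response: dict) -> list:
    """Extract the actions the policy simulation did not allow."""
    return [
        r['EvalActionName']
        for r in simulation_response['EvaluationResults']
        if r['EvalDecision'] != 'allowed'
    ]
```

An empty list means the role can use the key; any returned action must be fixed before creating the job, since a denial discovered after AWS receives the device costs weeks.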
What files does the Snowball S3 adapter reject?
The S3 adapter rejects files with paths exceeding 1024 bytes and files with unsupported characters in their names. It also imposes per-file overhead that significantly slows transfer of millions of small files. Run a pre-flight check before copying: diff the source file count against the manifest entry count, and grep for files with long paths or special characters.
What is the Snowball crossover point for direct transfer vs Snowball?
At 1Gbps network: 80TB takes ~7 days direct transfer vs ~16 days Snowball round-trip (direct wins). 200TB takes ~18 days direct vs ~18 days Snowball (break-even). 500TB takes ~46 days direct vs ~18 days Snowball (Snowball wins by 28 days). At 10Gbps network: 500TB takes ~4.6 days direct vs ~18 days Snowball (direct wins). Rule: calculate both timelines before committing.
How much does Snowball cost?
Snowball Standard costs $300 per job plus shipping. Snowball Edge costs $400-600 per job plus shipping, or on-premises pricing for long-term edge deployment. Snowmobile costs approximately $0.005/GB (~$500K per 100PB). Snowball is cheaper than egress fees for large datasets.
What is the biggest mistake teams make with Snowball?
Not validating the final manifest against the source before shipping. If the device arrives at AWS with missing files, you start over, losing another 2-3 weeks. Other major mistakes include not verifying KMS key permissions across accounts before creating the job, re-running copy scripts on the same device, and not calculating whether direct transfer would be faster.
What should a pre-ship checklist include?
A pre-ship checklist should verify: (a) manifest matches source file count and total size, (b) KMS key ARN matches target account, (c) IAM role has kms:Decrypt on the specified key, (d) device status is 'Ready to ship' in the console. This should be a mandatory gate in your migration pipeline.
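A minimal sketch of such a gate as a hard stop in a migration pipeline (the check names are illustrative; the boolean values would come from your own validation steps):

```python
def pre_ship_gate(checks: dict) -> bool:
    """Raise if any pre-ship check failed; call as the last step before shipping."""
    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        raise RuntimeError("Do NOT ship. Failed checks: " + ", ".join(failed))
    return True

# Example gate mirroring the checklist above
pre_ship_gate({
    'manifest_matches_source': True,
    'kms_key_arn_matches_target_account': True,
    'role_has_kms_decrypt': True,
    'device_status_ready_to_ship': True,
})
```

Raising instead of logging matters here: a warning can be ignored at 2am on ship day, an exception cannot.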
How should I plan storage allocation on Snowball Edge for compute workloads?
Snowball Edge compute instances and the S3 data transfer volume share the same physical storage. Plan storage allocation before loading data. If you need both bulk transfer AND compute, use separate devices or reduce the data volume to leave headroom. Partition device storage explicitly: for example, reserve 30TB for compute and load only 70TB of reference data. Test the full pipeline on the device before shipping.
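The headroom arithmetic is worth automating as a pipeline check. A minimal sketch, assuming the 70TB/30TB split from the example above on a 100TB Compute Optimized device:

```python
def plan_edge_storage(device_tb: float, data_tb: float,
                      compute_reserve_tb: float) -> float:
    """Return free headroom (TB) after data load and compute reserve.

    Raises ValueError when the plan exceeds device capacity, so the
    migration pipeline fails before any data is copied.
    """
    headroom = device_tb - data_tb - compute_reserve_tb
    if headroom < 0:
        raise ValueError(f"Over capacity by {-headroom}TB: "
                         "reduce data volume or compute reserve")
    return headroom

# Article's example: 30TB reserved for compute, 70TB of reference data
plan_edge_storage(100, data_tb=70, compute_reserve_tb=30)  # fits exactly, 0TB spare
```

Running this at job-creation time, before any copy starts, is the cheap version of the "test the full pipeline on the device" advice above.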
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.