Senior 7 min · March 06, 2026

Dropbox Design — Hash Collisions That Corrupted User Files

A chunking bug stored truncated blocks under reused SHA-256 hashes, so reads returned garbage for 4 MB blocks.

Naren · Founder
Quick Answer
  • Dropbox uses client-server sync with block-level chunking (4 MB blocks) for efficient uploads
  • Deduplication via SHA-256 hashes eliminates duplicate storage across all users
  • Metadata service stores file hierarchy in a horizontally sharded relational store (sharded MySQL)
  • Delta sync transfers only changed parts of files, not entire files
  • Conflict resolution keeps the first committed version as canonical and saves the later edit as a conflict copy, so no data is lost
  • Notification system uses long-polling to detect remote changes within seconds
Plain-English First

Imagine you have a magic folder on your desk. Whatever paper you drop in it instantly appears in the exact same folder on your friend's desk across the world — and on your phone too. If you both edit the same paper at the same time, the magic folder figures out how to combine your changes without losing either person's work. Dropbox is that magic folder, built for hundreds of millions of people simultaneously.

File synchronization sounds deceptively simple until you're the one building it at scale. Dropbox processes over 1.2 billion file syncs per day, maintains over 500 petabytes of user data, and must deliver sub-second sync latency while handling everything from a 2 KB sticky note to a 50 GB video file. The gap between 'copy a file to the cloud' and 'build a production sync platform' is enormous, and every corner of that gap has killed startups.

The core problem is elegant to state and brutal to solve: multiple clients, on different networks, with different OS file systems, modifying a shared namespace — and every client must converge to the same state, eventually, without data loss, even when the network disappears for days. Throw in deduplication to save petabytes of storage, delta sync to save bandwidth, and conflict resolution that doesn't confuse non-technical users, and you have a genuinely hard distributed systems challenge.

By the end of this article you'll be able to walk into a senior system design interview and draw the complete Dropbox architecture from memory — the client sync engine, the block store, the metadata service, the notification system, and the conflict resolution strategy. More importantly, you'll understand why each component exists and what breaks if you cut corners on any of them.

What is Design Dropbox?

You don't start with a dry definition. You start with the problem: multiple clients on different networks, editing the same namespace, going offline for days. Dropbox's design is a classic distributed file synchronization system. The core challenges: handling offline edits, conflict resolution, efficient storage via deduplication, and scaling to billions of files. In this article, we build the complete architecture from the client sync engine to the backend block store and metadata service.

ForgeExample.javaSYSTEM DESIGN
// TheCodeForge: Design Dropbox example
// Always use meaningful names, not x or n
public class ForgeExample {
    public static void main(String[] args) {
        String topic = "Design Dropbox";
        System.out.println("Learning: " + topic + " 🔥");
    }
}
Output
Learning: Design Dropbox 🔥
Forge Tip:
Type this code yourself rather than copy-pasting. The muscle memory of writing it will help it stick.
Production Insight
The biggest mistake interview candidates make is treating Dropbox as a simple 'upload to S3' service.
In reality, the sync engine, changelog, and conflict resolution are the hard parts.
Rule: focus on the client–server protocol, not the storage layer.
Key Takeaway
Dropbox is a distributed file system with eventual consistency.
The sync algorithm is the core — not the block store.
Rule: design for conflict, not for consistency.

Core Architecture: Client ↔ Server Sync Model

Dropbox uses a simple but robust pull-based sync model. The client maintains a local file system watcher (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows). When a change is detected, the client builds a local file tree and compares it with the server's tree.

The server stores metadata in a horizontally sharded MySQL cluster. Each user's files are partitioned by user ID. The metadata schema includes: file_id, parent_id, name, hash (SHA-256 of file content), size, and mtime. The block store is an Amazon S3-compatible object store, with blocks referenced by content hash.

The client syncs in three phases: 1) Upload changed blocks (only if hash not in block store), 2) Update metadata (send new checksums to server), 3) Poll for remote changes (every 3 seconds via long-polling HTTP). Server notifies clients of changes by returning the updated file tree delta.

When the metadata update succeeds, the server broadcasts a notification to all connected clients via the long-poll notification service, indicating that the user's file tree has changed.

But here's the nuance: the server doesn't push. It holds the HTTP response open (long-poll) until there's a change or timeout. This keeps connection overhead low. If you implement this naively, you'll hit connection limits on your load balancer. Dropbox's notification servers use a consistent hash ring to route the same user to the same server, so the server can track which users are connected without synchronising state across all servers.
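The routing idea can be sketched in a few lines. This is a minimal illustration of a consistent hash ring, not Dropbox's implementation: the server names, the `HashRing` class, and the choice of 100 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping user IDs to notification servers."""

    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def server_for(self, user_id):
        # First ring point clockwise from the user's hash, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(str(user_id)))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical server names, for illustration only.
ring = HashRing(["notify-1", "notify-2", "notify-3"])
smaller = HashRing(["notify-1", "notify-2"])  # same ring after notify-3 is removed

# Users that were NOT on notify-3 should keep their server when it leaves.
moved = sum(
    1
    for user in range(1000)
    if ring.server_for(user) != "notify-3"
    and ring.server_for(user) != smaller.server_for(user)
)
print("users remapped unnecessarily:", moved)
```

The property worth noticing is that removing one server only remaps the users that were on it; everyone else keeps their long-poll home, which is what lets each server track its own connected users without cross-server state.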

Production Insight
The polling interval is a tension point: 3 seconds gives good latency but scales to millions of connections only when requests are lightweight.
Long-polling without proper timeouts causes thread pool exhaustion on the server — each poll ties up a connection for 30 seconds.
Fix: use WebSockets or Server-Sent Events for modern deployments, but Dropbox's original HTTP polling was a pragmatic choice for 2007 infrastructure.
With 500M active users, the notification service handles ~14.4 trillion poll requests daily — batching and consistent hashing are essential.
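The daily figure above follows from simple arithmetic. The sketch below reproduces it under one stated assumption: a single polling client per user (multiple devices per user would scale the result up proportionally).

```python
ACTIVE_USERS = 500_000_000
POLL_INTERVAL_S = 3
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: exactly one polling client per active user.
polls_per_user_per_day = SECONDS_PER_DAY // POLL_INTERVAL_S  # 28,800
polls_per_day = ACTIVE_USERS * polls_per_user_per_day        # 14.4 trillion
polls_per_second = ACTIVE_USERS / POLL_INTERVAL_S            # ~167 million

print(f"{polls_per_day:,} polls/day, {polls_per_second:,.0f} polls/sec")
```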
Key Takeaway
Sync is pull-based, not push, to handle disconnected clients.
The client is responsible for conflict detection by comparing local and server state.
Rule: never make the server responsible for client state — the client drives the sync.

Block Store: Chunking, Deduplication & Delta Sync

Every file is split into 4 MB blocks. The last block is often smaller. Each block gets a SHA-256 hash. The block store is a content-addressable store: blocks are stored at paths like /blocks/{hash[0:2]}/{hash[2:4]}/{hash}. This two-level prefix directory avoids huge single directories on S3.

Deduplication is trivial: if the hash already exists in the block store, we skip upload. Since Dropbox stores over 500 PB with only ~200 PB of unique blocks (60% dedup ratio), this saves billions of dollars in storage costs.
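To make the key layout and the "skip upload if the hash exists" rule concrete, here is a small in-memory sketch. The `BlockStore` class and its counters are hypothetical names for illustration; a real store would sit on an object store rather than a dict.

```python
import hashlib


def block_key(data: bytes) -> str:
    """Build the /blocks/{hash[0:2]}/{hash[2:4]}/{hash} key for a block."""
    h = hashlib.sha256(data).hexdigest()
    return f"/blocks/{h[0:2]}/{h[2:4]}/{h}"


class BlockStore:
    """In-memory stand-in for a content-addressable block store."""

    def __init__(self):
        self._objects = {}
        self.logical_bytes = 0   # bytes clients asked to store
        self.physical_bytes = 0  # bytes actually written

    def put(self, data: bytes) -> str:
        key = block_key(data)
        self.logical_bytes += len(data)
        if key not in self._objects:  # dedup: an existing hash means no write
            self._objects[key] = data
            self.physical_bytes += len(data)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = BlockStore()
video_block = b"cat video bytes" * 1000
keys = [store.put(video_block) for _ in range(10)]  # ten users, same content
saved = 1 - store.physical_bytes / store.logical_bytes
print(f"dedup savings: {saved:.0%}")
```

Measuring `physical_bytes` against `logical_bytes` like this is also how a dedup ratio would be tracked in production, rather than assumed.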

Delta sync: when a file is edited, the client recomputes block boundaries and uploads only the blocks that changed. However, with fixed-size blocks, an insertion or deletion shifts every subsequent block boundary. Dropbox uses content-defined chunking (CDC), which places boundaries with a rolling hash over the content, to keep them stable across edits. This means editing one byte in a 500 MB video changes only the block containing the edit (occasionally its neighbor too), not all blocks after it.
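The boundary-stability argument is easier to see in code. The sketch below uses a gear-style rolling hash, one common way to do CDC; it is not claimed to be the algorithm Dropbox ships. Chunk sizes are scaled down to bytes so it runs quickly, and the `GEAR` table, `cdc_chunks`, and the size parameters are assumptions for the demo.

```python
import hashlib
import random

_rng = random.Random(0)  # fixed seed so the gear table is reproducible
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data, min_size=64, avg_bits=8, max_size=1024):
    """Split data at content-defined boundaries using a gear rolling hash."""
    cut_mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        # Cut when the low bits of the hash are zero, within the size limits.
        if (size >= min_size and (h & cut_mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data, size=320):
    """Split data into fixed-size chunks for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_after_insert(chunker, data):
    """Return (chunks surviving a one-byte insert at the front, total chunks)."""
    before = {hashlib.sha256(c).digest() for c in chunker(data)}
    after = {hashlib.sha256(c).digest() for c in chunker(b"!" + data)}
    return len(before & after), len(before)


data_rng = random.Random(1)
data = bytes(data_rng.getrandbits(8) for _ in range(64 * 1024))
cdc_shared, cdc_total = shared_after_insert(cdc_chunks, data)
fixed_shared, fixed_total = shared_after_insert(fixed_chunks, data)
print(f"CDC:   {cdc_shared}/{cdc_total} chunks reusable after the insert")
print(f"fixed: {fixed_shared}/{fixed_total} chunks reusable after the insert")
```

Because a cut depends only on the last few bytes seen, the chunker falls back into step shortly after the inserted byte, so most chunk hashes are unchanged. With fixed-size chunks every boundary shifts and nothing is reusable.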

io/thecodeforge/dropbox/BlockChunker.javaJAVA
package io.thecodeforge.dropbox;

import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

public class BlockChunker {
    private static final int BLOCK_SIZE = 4 * 1024 * 1024; // 4 MB
    private static final int MIN_BLOCK = 1024 * 1024; // 1 MB minimum for last block

    public static List<Block> chunk(byte[] file) {
        List<Block> blocks = new ArrayList<>();
        int offset = 0;
        while (offset < file.length) {
            int size = Math.min(BLOCK_SIZE, file.length - offset);
            if (size < MIN_BLOCK && blocks.size() > 0) {
                // Merge small last block with previous
                Block last = blocks.remove(blocks.size() - 1);
                byte[] merged = new byte[last.data.length + size];
                System.arraycopy(last.data, 0, merged, 0, last.data.length);
                System.arraycopy(file, offset, merged, last.data.length, size);
                blocks.add(new Block(merged));
                break;
            }
            byte[] data = new byte[size];
            System.arraycopy(file, offset, data, 0, size);
            blocks.add(new Block(data));
            offset += size;
        }
        return blocks;
    }

    static class Block {
        byte[] data;
        String hash;
        Block(byte[] data) {
            this.data = data;
            try {
                MessageDigest digest = MessageDigest.getInstance("SHA-256");
                hash = bytesToHex(digest.digest(data));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        private static String bytesToHex(byte[] hashBytes) {
            StringBuilder hexString = new StringBuilder();
            for (byte b : hashBytes) {
                hexString.append(String.format("%02x", b));
            }
            return hexString.toString();
        }
    }
}
Deduplication: The 60% Rule
  • If 10 users upload the same cat video, only one copy is stored.
  • Block store is a giant map from hash → data. Uploading a block with an existing hash is a no-op.
  • The 60% dedup ratio means every 100 PB of logical storage costs only 40 PB of physical storage.
  • But dedup has a hidden cost: integrity checks. A hash collision can destroy data (see production incident).
  • Always re-verify on read: check a stored CRC or recompute the hash over the whole block before returning cached data. Comparing only the first few bytes misses truncated blocks.
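A verify-on-read layer can be sketched as follows. This is an illustrative in-memory version with hypothetical names (`VerifiedBlockStore`, `IntegrityError`); it stores a CRC-32 next to each block and also recomputes the SHA-256 on every read.

```python
import hashlib
import zlib


class IntegrityError(Exception):
    """Raised when stored bytes no longer match their key or checksum."""


class VerifiedBlockStore:
    """Toy block store that re-verifies every block before returning it."""

    def __init__(self):
        self._blocks = {}  # sha256 hex -> (data, crc32 recorded at write time)

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blocks.setdefault(key, (data, zlib.crc32(data)))
        return key

    def get(self, key: str) -> bytes:
        data, crc = self._blocks[key]
        # Check the whole block, not a prefix: a truncated block keeps its prefix.
        if zlib.crc32(data) != crc or hashlib.sha256(data).hexdigest() != key:
            raise IntegrityError(f"block {key[:12]} failed verification")
        return data


store = VerifiedBlockStore()
key = store.put(b"four megabytes of user data")
assert store.get(key) == b"four megabytes of user data"

# Simulate the incident: the stored bytes get truncated under the same key.
data, crc = store._blocks[key]
store._blocks[key] = (data[:-1], crc)
try:
    store.get(key)
    detected = False
except IntegrityError:
    detected = True
print("corruption detected:", detected)
```

The CRC is the cheap first check and the hash is the authoritative one; whether both are needed on every read is a cost trade-off the sketch does not try to settle.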
Production Insight
Content-defined chunking (CDC) is CPU-intensive: processing a 10 GB file at 200 MB/s per core adds ~50 seconds of CPU time.
Fixed-size chunking (4 MB blocks) is simpler but causes many blocks to change after small edits.
Trade-off: CDC reduces uploads for small edits but increases client CPU usage and complexity.
Rule: if your users edit large files (videos, databases), implement CDC. If most files are small (<4 MB), fixed-size is fine.
Key Takeaway
Deduplication requires a content-addressable store with integrity verification.
Delta sync needs stable block boundaries — CDC is the answer for large edited files.
Rule: always measure your dedup ratio in production before betting on storage savings.
Chunking Strategy Decision Tree
If: Users edit large files (>10 MB) frequently
Use: Content-defined chunking (rolling hash) to minimize delta size
If: Most files are small and rarely edited after creation
Use: Fixed-size chunking (4 MB) for simplicity and speed
If: Storage cost is critical but network bandwidth is cheap
Use: Fixed-size chunking and accept larger uploads; dedup still works
If: Clients are mobile with limited CPU and battery
Use: Fixed-size chunking to avoid CDC overhead; offload CDC to server if needed

Metadata Service: File Tree Storage & Synchronization

The metadata service is the single source of truth for file hierarchy. Each user has a file tree stored in a relational database (MySQL, sharded by user_id). The tree is represented as adjacency list: each row has file_id, parent_id, name, and content_hash. The root of each user's tree is a special entry with parent_id = NULL.

When the client uploads new blocks and gets their hashes, it sends a transaction to the metadata service: "replace the content_hash of file X with new_hash Y". The server validates that the new hash actually exists in the block store (otherwise reject). Then it logs the change in a journal table.

The journal table is the key to conflict resolution and delta sync. Each change is a row: (change_id, user_id, file_id, new_hash, timestamp). The client syncs by requesting changes after a known change_id. The server returns all changes since that ID. This is a classic changelog pattern.
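The cursor-based exchange can be sketched with an in-memory changelog. The `Changelog` class and its method names are hypothetical; in the article's design this would be a query against the journal table on the user's shard.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Change:
    change_id: int
    user_id: int
    file_id: int
    new_hash: str


class Changelog:
    """Append-only journal with 'changes after cursor' reads."""

    def __init__(self):
        self._ids = count(1)  # monotonically increasing change_id
        self._rows = []

    def append(self, user_id, file_id, new_hash):
        row = Change(next(self._ids), user_id, file_id, new_hash)
        self._rows.append(row)
        return row.change_id

    def changes_after(self, user_id, cursor):
        # Equivalent to: WHERE user_id = ? AND change_id > ? ORDER BY change_id
        return [r for r in self._rows if r.user_id == user_id and r.change_id > cursor]


log = Changelog()
log.append(user_id=7, file_id=100, new_hash="aaa")
log.append(user_id=8, file_id=200, new_hash="bbb")
log.append(user_id=7, file_id=101, new_hash="ccc")

cursor = 1  # the client has already applied change 1
pending = log.changes_after(user_id=7, cursor=cursor)
cursor = pending[-1].change_id if pending else cursor
print([c.file_id for c in pending], "new cursor:", cursor)
```

The client only ever stores one integer, the last change_id it applied, which is what makes resuming after a disconnect cheap.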

Read operations (browsing folders) are served from a read replica to reduce load on the primary. Write operations go to the primary. Eventual consistency means a write might take up to 100ms to propagate to read replicas — acceptable because the client's next poll will see the latest state.

io/thecodeforge/dropbox/schema.sqlSQL
-- Dropbox metadata schema (simplified)
CREATE TABLE files (
    file_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    parent_id BIGINT,
    name VARCHAR(255) NOT NULL,
    content_hash VARCHAR(64) NOT NULL,  -- SHA-256 hex
    size BIGINT NOT NULL DEFAULT 0,
    mtime DATETIME NOT NULL,
    created_at DATETIME NOT NULL,
    INDEX idx_user_parent (user_id, parent_id),
    -- Assumes a users(id INT) table exists on the same shard
    FOREIGN KEY (user_id) REFERENCES users(id)
);

CREATE TABLE file_changelog (
    change_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    file_id BIGINT NOT NULL,
    old_hash VARCHAR(64),
    new_hash VARCHAR(64),
    change_type ENUM('create','modify','delete') NOT NULL,
    timestamp DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_user_change (user_id, change_id)
);
Sharding Pitfall: Cross-shard Transactions
When users share folders, metadata spans multiple shards. A shared folder appears in the file trees of multiple users, each on different database shards. Dropbox avoids distributed transactions: the folder owner's shard is the authority. When a change is made in a shared folder, the owner's shard replicates the change to viewer shards asynchronously (via changelog). This can cause temporary inconsistency: user A sees a new file, but user B on another shard doesn't see it for a few seconds. That's acceptable for a sync service.
Production Insight
The changelog table grows without bound. Over time, scanning millions of stale change IDs kills performance.
Dropbox archives changelog entries older than 30 days to cold storage.
If a client syncs after being offline for months, it gets a full file tree snapshot, not the changelog.
Rule: always plan for janitor jobs to trim changelogs and handle offline periods gracefully.
Key Takeaway
Metadata is the single source of truth — protect it with transactional writes and read replicas.
Use a changelog pattern for efficient delta sync.
Rule: archive changelogs regularly; serve full snapshots for stale clients.
Changelog Truncation Strategy
If: Client offline <30 days
Use: Serve incremental changes from changelog
If: Client offline >30 days
Use: Serve full file tree snapshot (dump from current files table)
If: Changelog table exceeds 10 million rows
Use: Archive rows older than 30 days; run weekly truncation

Conflict Resolution: When Two Clients Edit the Same File

The classic problem: user A and user B both edit the same file while offline. When they come online, the server has two versions. Dropbox uses a simple strategy: the first uploaded version wins as the canonical copy. The second version is saved as a conflict copy (e.g., "report (A's conflicted copy 2026-04-22).docx").

This works because it never loses data, and users can manually merge if needed. For office documents, Dropbox offers automatic merge via a custom diff engine (similar to 3-way merge in version control). But that's only for specific file types (Office docs, Google Docs, etc.). For plain text or binaries, the first committed version stays canonical and the later one becomes a conflict copy.

The resolution happens at the metadata level: when the server receives a write for a file, it checks the version (an incrementing counter). If the version in the update doesn't match the current server version, it's a conflict. The server leaves the canonical entry unchanged and stores the incoming content as a new file entry, the conflict copy.

What about concurrent writes while both are online? The server's database transaction ensures serializability. One client's update succeeds, the other gets a 409 Conflict response. The client must then fetch the new version and offer to merge.

One detail that often trips people up: conflict copies are created at the metadata level, not the block level. The server simply creates a new file row with the conflicting content_hash. No duplicate block storage is needed because deduplication already handles the identical blocks. The only cost is the metadata row and the filename.
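The version check and the metadata-only conflict copy can be sketched together. The `MetadataStore` class, its `commit` signature, and the exact conflict-copy naming are assumptions for illustration; only the version-counter idea comes from the text above.

```python
class MetadataStore:
    """Toy metadata service: name -> (version, content_hash)."""

    def __init__(self):
        self.files = {}

    def commit(self, name, base_version, content_hash, author):
        """Apply a write if base_version is current, else save a conflict copy."""
        current_version, _ = self.files.get(name, (0, None))
        if base_version == current_version:
            self.files[name] = (current_version + 1, content_hash)
            return name
        # Stale base version: leave the canonical row alone and add a new row
        # that points at the already-uploaded blocks of the losing edit.
        stem, dot, ext = name.rpartition(".")
        if dot:
            copy_name = f"{stem} ({author}'s conflicted copy){dot}{ext}"
        else:
            copy_name = f"{name} ({author}'s conflicted copy)"
        self.files[copy_name] = (1, content_hash)
        return copy_name


meta = MetadataStore()
# A and B both edited version 0 while offline; A syncs first.
first = meta.commit("report.docx", base_version=0, content_hash="hash-A", author="A")
second = meta.commit("report.docx", base_version=0, content_hash="hash-B", author="B")
print(first, "|", second)
```

Note that no block data moves in the conflict path: the copy is one extra metadata row referencing a hash the block store already holds.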

Conflict Copy Accumulation
  • Each conflict copy is a new file entry that points at the content_hash of the losing version, whose blocks are already uploaded.
  • Block store already has the data; no extra bytes are stored.
  • But metadata storage grows linearly with conflict copies — clean them up periodically.
  • Non-technical users often don't notice conflict copies; surface them clearly in the UI.
  • Add a 'sync history' feature to show all versions including conflicts.
Production Insight
Conflict copies can accumulate silently. Users rarely notice them in their Dropbox folder, but support tickets spike when a non-technical user loses an edit because they didn't see the conflict copy.
Dropbox added a 'sync history' feature that shows all versions of a file (including conflicts), accessible from the web UI.
Rule: always surface conflict copies in the UI with a clear 'this file has a conflict' indicator.
Key Takeaway
Conflict resolution is about not losing data, not about perfect automatic merge.
Version counters on file entries detect conflicts.
Rule: when in doubt, save both versions and let the user decide.

Scaling to 700M+ Files per Day: The Infrastructure Behind Dropbox

Dropbox's infrastructure runs across multiple data centers and AWS (for block storage). Key scaling numbers (as of the 2020s):
  • 500+ PB of user data stored.
  • 700 million+ files uploaded per day.
  • 1.2 billion sync operations per day.
  • Metadata stored in 100+ MySQL shards (each ~5 TB).
  • Block store: custom object store (called 'Magic Pocket') built on top of JBOD servers with replication.

Scaling challenges:
1. Metadata sharding: Users are mapped to shards by user_id hash. Hot users (with millions of files) are split across multiple shards via sub-sharding. This required a custom rebalancing tool that moves user chunks between shards without downtime.
2. Block store throughput: A single 10 GB file upload generates 2560 blocks (4 MB each). For large files, clients upload blocks in parallel (up to 10 concurrent uploads per file). The block store must handle millions of small PUT requests per second. Solution: use a distributed key-value store (such as a Dynamo-inspired database) with in-memory tiers for hot blocks.
3. Notification scalability: Long-polling connections are handled by a dedicated notification service (not metadata). Each notification server handles 500k+ connections. They use a consistent hash ring to route the same user to the same notification server, so the server can track which users are connected.
4. Cache layer: Block store uses a CDN-like edge cache for frequently accessed blocks. The metadata service uses memcached clusters to cache file tree lookups.

deployment.yamlTEXT
# Simplified deployment scale
notification_service:
  replicas: 200
  connections_per_pod: 500k
  backend: consistent-hash ring

metadata_db:
  shards: 128
  machines_per_shard: 3 (primary + 2 replicas)
  storage: MySQL 8.0

block_store:
  type: custom object store (Magic Pocket)
  servers: 10,000+ diskful nodes
  replication: erasure coding (12+3)
  hot_cache: Redis cluster (500 nodes)

client_population:
  active_users: 500M
  devices_per_user: 2.5
  poll_interval: 3s -> ~417M requests/sec if every device polls (before batching)
The 80/20 of Dropbox Scale
  • Top 1% of users account for 40% of block storage.
  • 10% of users generate 80% of sync operations due to automated software syncing (e.g., iOS backups).
  • Caching works well because most files are read once and never read again (long tail).
  • Hot blocks (popular shared files) are cached aggressively.
  • Cold blocks (personal archives) live in slow, cheap storage.
Production Insight
The notification service is the most fragile component. When a notification server crashes, all 500k connections are lost, and clients reconnect to a different server. The new server must scan the changelog to catch up, causing a spike of 500k changelog queries in seconds.
This can overwhelm the metadata database. Solution: have each notification server store its users' last known change_id in a Redis instance that survives the crash of that server (shared rather than purely local), so on reconnection the new server can start from the previous change_id rather than scanning the entire changelog.
Rule: any stateful component (like 'last change seen per user') should be persisted, not just in-memory.
Key Takeaway
Dropbox's scale is dominated by metadata sharding and notification handling, not block storage.
Hot users require sub-sharding and rebalancing.
Rule: notification services must persist user progress to avoid cascading overload on reconnection.

Client Sync Engine: Local File Monitoring and Upload Pipeline

The Dropbox client runs as a background process on each device. It uses operating system file system event APIs (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows) to detect file changes in the designated Dropbox folder. When a change is detected, the client does not immediately upload. It waits for a quiet period (200 ms by default) to batch rapid edits (e.g., during save-as). Then it computes the file tree diff and determines which blocks have changed.
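The quiet-period logic is a debounce. The sketch below keeps time explicit (the caller passes `now`) so the behavior is easy to follow and test. The `Debouncer` class is a hypothetical helper, and the 0.2 s default is simply the value used for this example.

```python
class Debouncer:
    """Release a path for upload only after it has been quiet long enough."""

    def __init__(self, quiet_period=0.2):
        self.quiet_period = quiet_period
        self._last_change = {}  # path -> timestamp of the most recent event

    def record(self, path, now):
        # Every new event restarts the quiet period for that path.
        self._last_change[path] = now

    def due(self, now):
        """Return and forget the paths whose quiet period has elapsed."""
        ready = [
            path for path, changed_at in self._last_change.items()
            if now - changed_at >= self.quiet_period
        ]
        for path in ready:
            del self._last_change[path]
        return ready


debouncer = Debouncer(quiet_period=0.2)
debouncer.record("report.docx", now=0.00)  # first save event
debouncer.record("report.docx", now=0.10)  # editor writes again 100 ms later
too_early = debouncer.due(now=0.25)        # only 150 ms since the last event
ready = debouncer.due(now=0.35)            # 250 ms of quiet: upload once
print(too_early, ready)
```

In a real client, `record` would be called from the file system watcher callback and `due` from a periodic timer; two save events collapse into a single upload.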

For new files, it chunks the entire file and uploads all blocks. For modifications, it uses content-defined chunking to detect changed block boundaries. Uploads are parallelized (up to 10 concurrent connections per file) and retried with exponential backoff. Each block upload includes a SHA-256 hash and file offset. The block store responds with a success or conflict.
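Retry with exponential backoff might look like the following sketch. The `upload_with_retry` helper, the choice of `ConnectionError` as the retryable failure, and the random jitter are assumptions added for illustration; the text above only specifies exponential backoff.

```python
import random
import time


def upload_with_retry(upload, block, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call upload(block), retrying on ConnectionError with capped backoff."""
    for attempt in range(max_attempts):
        try:
            return upload(block)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Delay ceiling doubles each attempt: 0.5 s, 1 s, 2 s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter (an assumption here) spreads out retries from many clients.
            sleep(random.uniform(0, ceiling))


# Demo with a fake uploader that fails twice, then succeeds.
calls = {"count": 0}


def flaky_upload(block):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("simulated network drop")
    return "success"


delays = []
result = upload_with_retry(flaky_upload, b"block-bytes", sleep=delays.append)
print(result, "after", calls["count"], "attempts")
```

Passing `sleep` as a parameter keeps the demo instant; production code would leave the default `time.sleep`.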

After all blocks are uploaded, the client sends a metadata update request to the server with the new file hash. The server validates that all block hashes exist in the block store, then commits the metadata change and appends to the changelog.

One critical detail: the client must handle the case where the server rejects the metadata update because another client already updated the same file. The client then receives the current server state and must merge or create a conflict copy.

io/thecodeforge/dropbox/client_upload_pipeline.pyPYTHON
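# Simplified client upload pipeline.
# The "remote" services are in-memory stand-ins so the sketch runs as-is.
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024   # 4 MB
QUIET_PERIOD_S = 0.2           # debounce window after a change
REMOTE_BLOCKS = {}             # stand-in block store: block hash -> bytes
REMOTE_METADATA = {}           # stand-in metadata service: path -> block hashes

class UploadError(Exception):
    """Raised when a file's blocks cannot be read or uploaded."""

def chunk_file(file_path, block_size=BLOCK_SIZE):
    # Yield fixed-size blocks; the last one may be shorter
    with open(file_path, 'rb') as f:
        while block := f.read(block_size):
            yield block

def upload_block(block):
    # Dedup check: store the block only if its hash is not already present
    block_hash = hashlib.sha256(block).hexdigest()
    REMOTE_BLOCKS.setdefault(block_hash, block)
    return block_hash

def on_file_change(file_path):
    # Wait for quiet period (200ms) so rapid saves are batched
    time.sleep(QUIET_PERIOD_S)

    # Hash and upload blocks in parallel; map() keeps block order
    try:
        with ThreadPoolExecutor(max_workers=10) as executor:
            block_hashes = list(executor.map(upload_block, chunk_file(file_path)))
    except OSError as exc:
        raise UploadError(f'Block upload failed: {exc}') from exc

    # Notify metadata server only after every block is stored
    REMOTE_METADATA[file_path] = block_hashes
    return block_hashes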
Batching vs Latency
  • A 200ms quiet period batches multiple saves from the same application (e.g., auto-save in editors).
  • If set too low, each keystroke triggers a full file scan and upload wave.
  • If set too high, the user sees a delay between saving and the file appearing on other devices.
  • Dropbox's default 200ms quiet period works well for most document types.
  • Rule: make the quiet period configurable per device based on file type patterns.
Production Insight
The quiet period after file change is critical: without it, every auto-save triggers a full file re-scan.
If the quiet period is too short, the client floods the server with partial uploads.
Rule: tune the debounce interval based on file type — 2 seconds for documents, 5 seconds for IDE projects.
Key Takeaway
Client uploads are parallelized and retried with backoff.
The quiet period reduces unnecessary uploads during rapid file saves.
Rule: never upload blocks without first verifying they are actually different from the last known state.
● Production incident · POST-MORTEM · severity: high

The 4 MB Block That Crashed the Sync Engine

Symptom
Users reported missing files or files that opened as garbage. Logs showed blocks returning data for unrelated files.
Assumption
SHA-256 hashes guarantee uniqueness — no need to verify block content after deduplication.
Root cause
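A bug in the chunking library truncated blocks and reused previously computed hashes for them, so different content ended up stored behind the same SHA-256 key. This was a key/content mismatch rather than a true cryptographic collision. The deduplication layer assumed identical hashes meant identical content.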
Fix
Added full block verification on every read: the store recomputes the SHA-256 of the bytes it is about to return and compares it with the requested key. Comparing only the first few bytes is not enough, because a truncated block keeps its original prefix.
Key lesson
  • Even with strong hashing, always verify block content on read for mission-critical data.
  • Never trust that deduplication is lossless without an integrity check layer.
  • Add a block-level CRC checksum stored alongside the hash for double verification.
  • Always implement a background integrity check service that periodically verifies blocks against their hashes.
Production debug guide: common sync failures and the exact commands to diagnose them (5 entries)
Symptom · 01
File appears on one device but not another after 5+ minutes
Fix
Check notification heartbeat: client sends polling request every 3 seconds. If blocked (firewall/throttling), logs will show 'poll timeout'. Use client debug endpoint: GET /debug/notifications
Symptom · 02
Upload stuck at 99% for large files
Fix
Check block upload queue. Each 4 MB block must be acked. If one block fails, entire file stalls. Use client log: grep 'block_upload.*status=failed' and check retry count. If >3, investigate network MTU issues.
Symptom · 03
Conflict copies appearing for every edit
Fix
Check if client clock is skewed. NTP offset >1s triggers false conflicts. Verify with: timedatectl on Linux. Also check if multiple clients share the same API token (rare but happens in testing).
Symptom · 04
Folder sync stuck with 'Syncing... (10,000 items remaining)'
Fix
Check if client is being throttled by the server due to too many file changes in a short period. Look for 'rate_limit' in client debug logs. Use curl to check server rate limit status: curl -s http://metadata.internal/rate-limit/user/{user_id}
Symptom · 05
Upload speed drops to 0 after a few minutes
Fix
Diagnose network connectivity and packet loss. Run ping and traceroute to the block store endpoint. Also check if the block store is throttling: run dd if=/dev/zero bs=1M count=10 | nc -w5 blockstore-host 443 to measure throughput.
★ Dropbox Sync Engineer's Quick Debug Commands: run these commands on the client or server side when a sync anomaly is reported.
File not syncing at all
Immediate action
Check file status in client system tray or via CLI: dropbox status
Commands
dropbox filestatus /path/to/file
grep 'upload.*failed' ~/.dropbox/error.log
Fix now
Pause sync, delete the file's .dropbox.cache entry, restart client.
High CPU usage on client
Immediate action
Identify if it's indexing or uploading. Check CPU per thread.
Commands
top -b -n1 -p $(pgrep dropbox) | grep -E 'PID|dropbox'
ls -l ~/.dropbox/instance1/ && tail -n 50 ~/.dropbox/instance1/debug.log
Fix now
Kill the indexing process (dropbox stop indexing) if not on a critical path.
Server-side sync lag for all users
Immediate action
Check metadata service health and block store replication lag.
Commands
curl -s http://metadata.internal/health | jq .replication_lag
nc -zv blockstore-replica1 443 && time dd if=/dev/zero bs=1M count=1 | nc -w2 blockstore-primary 443
Fix now
If replication lag >30s, manually promote replica to primary after verifying consistency.
Block store returns 503 errors
Immediate action
Check block store health endpoint
Commands
curl -I https://blockstore.internal/health
tail -100 /var/log/blockstore/error.log | grep -E '5[0-9]{2}|timeout'
Fix now
Failover to replica block store by updating DNS or routing.
Metadata update fails with 409 Conflict
Immediate action
Force a full re-sync of the local file tree
Commands
dropbox reset cache --full
curl -s "http://metadata.internal/changelog?user_id=$USER_ID&after=$LAST_ID" | jq .
Fix now
Clear local changelog cursor and trigger full re-sync via API: POST /sync/reset
Concept | Use Case | Example
Design Dropbox | Core usage | See code above
Block Chunking | Splits large files into 4 MB blocks for efficient upload | A 2 GB video becomes ~512 blocks
Content-Addressable Deduplication | Avoids storing duplicate blocks across users | Same installation ISO shared by 1M users → one copy stored
Delta Sync (CDC) | Uploads only changed blocks after edit | Editing one byte in a 500 MB video uploads only one block
Changelog Pattern | Efficiently syncs incremental changes to clients | Client polls with 'give me changes after ID=12345'
Conflict Resolution | Saves both versions when concurrent edits occur | file.txt and file.txt (user's conflicted copy)
Long-Polling Notification | Real-time sync without persistent connection overhead | Client sends /poll every 3 seconds, server holds response up to 30s

Key takeaways

1. You now understand what Design Dropbox is and why it exists.
2. You've seen it working in a real runnable example.
3. Practice daily: the forge only works when it's hot 🔥
4. Dropbox uses block-level chunking with SHA-256 deduplication to save storage and bandwidth.
5. The metadata service uses a sharded MySQL with changelog for efficient delta sync.
6. Conflict resolution creates conflict copies rather than losing data.
7. Notification is pull-based (long-polling) to handle millions of clients without server overload.
8. Content-defined chunking is essential for efficient delta sync of large edited files.
9. Always verify block integrity (CRC) even with strong hashing; the production incident taught this the hard way.
10. The block store's dedup ratio of 60% translates to massive cost savings, but requires integrity checks.
11. Notification service must persist each client's last change_id to avoid cascading overload on reconnection.
12. Client uploads are parallelized and use a quiet period to batch rapid saves; tune this per file type.
13. Don't assume SHA-256 guarantees uniqueness: always verify block content on read.
14. Shard metadata by user_id early; plan for hot user sub-sharding from day one.
15. Changelogs grow fast: archive them or serve full snapshots for stale clients.

Common mistakes to avoid

7 patterns
×

Memorising syntax before understanding the concept

Symptom
Candidate can recite definitions but fails to apply concepts in system design
Fix
Focus on understanding the 'why' behind each component; practice drawing the architecture from memory
×

Skipping practice and only reading theory

Symptom
Can talk about Dropbox architecture but cannot design it under interview time pressure
Fix
Implement a scaled-down version of the sync engine; use whiteboard to trace the flow multiple times
×

Assuming deduplication is always lossless

Symptom
Users report file corruption after data recovery. Investigation reveals hash collision in block store returned wrong block.
Fix
Implement content-addressable storage with integrity checks: store CRC alongside hash, and verify on every read before returning to client.
×

Using push notifications instead of polling for sync

Symptom
Server overwhelmed by connection overhead, clients miss updates when offline, and reconnection causes sync storms.
Fix
Use long-polling or WebSockets only for lightweight notifications. Keep the metadata service stateless and have clients pull changes based on a changelog cursor.
×

Not planning for clients that disappear for months

Symptom
Changelog grows unbounded, and a returning client scans millions of stale changes, causing performance degradation and timeouts.
Fix
Implement changelog archival (move entries older than 30 days to cold storage). For clients older than threshold, serve a full file tree snapshot instead of incremental changes.
×

Using a single metadata database without sharding

Symptom
As user count grows, all queries slow down, timeouts increase, and reads from replicas lag.
Fix
Implement horizontal sharding by user_id early. Use a consistent hashing scheme to distribute users across shards. Plan for hot user rebalancing.
×

Not handling simultaneous upload of the same block by two clients

Symptom
Dedup assumes hash is unique, but two clients may upload the same block concurrently. Block store may get two PUT requests for the same key, leading to race condition.
Fix
Use a conditional put (e.g., 'put if not exists') in the block store. If the block already exists, the second upload is a no-op and the client should treat it as success.
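The conditional put can be sketched with a lock-protected in-memory store. `ConditionalBlockStore` and `put_if_absent` are hypothetical names; a real object store would expose its own conditional-write primitive instead of a Python lock.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class ConditionalBlockStore:
    """Toy block store where concurrent uploads of one block write it once."""

    def __init__(self):
        self._blocks = {}
        self._lock = threading.Lock()
        self.writes = 0

    def put_if_absent(self, data: bytes):
        """Return (key, created); created is False when the block already existed."""
        key = hashlib.sha256(data).hexdigest()
        with self._lock:  # check-and-insert must be one atomic step
            if key in self._blocks:
                return key, False
            self._blocks[key] = data
            self.writes += 1
            return key, True


store = ConditionalBlockStore()
block = b"same block uploaded by many clients"

# Eight "clients" race to upload identical content.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(store.put_if_absent, [block] * 8))

created_count = sum(1 for _, created in results if created)
print("physical writes:", store.writes, "| clients told 'created':", created_count)
```

Every caller gets the same key back and treats the call as a success; only one of them actually wrote bytes.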
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
How would you design a file synchronization service like Dropbox? Focus ...
Q02 · SENIOR
How would you handle a scenario where a user has 1 million files and syn...
Q03 · SENIOR
Explain how eventual consistency affects the user experience in Dropbox ...
Q04 · SENIOR
How would you handle a scenario where a user has 50,000 files in a singl...
Q01 of 04 · SENIOR

How would you design a file synchronization service like Dropbox? Focus on the sync algorithm and conflict resolution.

ANSWER
Start with client-server architecture. The client watches the local file system for changes and splits files into 4MB blocks, computing SHA-256 for each. Blocks are stored in a content-addressable store (e.g., S3). Metadata (file tree, block hashes) lives in a sharded MySQL with a changelog table for delta sync. Conflict detection uses version counters on file entries; when two clients upload different versions concurrently, the second gets a conflict and its file is saved with a 'conflicted copy' suffix. For offline edits, the first client to sync wins. Always keep a changelog of all file changes so clients can poll for incremental updates. Use long-polling for near-real-time notifications. Scale metadata via user-based sharding with sub-sharding for power users. The block store deduplicates via hash; verify integrity with CRC on reads.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01. What is Design Dropbox in simple terms?
02. Why does Dropbox use 4MB blocks instead of smaller/larger sizes?
03. How does Dropbox handle sharing large folders among many users?
04. What happens if a client goes offline for months and then reconnects?
05. How does Dropbox ensure that two clients don't upload the same block concurrently and cause corruption?