Senior 5 min · June 25, 2026

Design Google Docs: Real-Time Collaborative Editing at Scale

Learn how to design Google Docs with operational transformation, conflict resolution, and real-time sync.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.

June 25, 2026
last updated
Quick Answer

The core challenge is handling concurrent edits from multiple users without data loss. Use Operational Transformation (OT) to transform each operation against concurrent ones, ensuring all clients converge to the same document state. Google Docs uses OT with a central server for ordering.

What is Design Google Docs?

Design Google Docs refers to the system architecture behind real-time collaborative document editing, enabling multiple users to edit simultaneously with conflict resolution via Operational Transformation (OT) or Conflict-Free Replicated Data Types (CRDTs).

Plain-English First

Imagine a group of people writing on a whiteboard with markers. If two people try to write at the same spot, you need a rule to decide whose text goes where. Google Docs is like having a referee who catches every marker stroke, reorders them, and tells everyone the final result so no one's work gets erased.

You think building Google Docs is just WebSocket + CRDT? I've seen that assumption crater a startup's demo when two users typed the same word and the document turned into a jumble of 'helloworldhello'. Real-time collaboration is a distributed systems problem in disguise. The naive approach—send diffs and hope—loses data under load. This article walks you through the architecture that powers Google Docs: Operational Transformation, conflict resolution, and the production gotchas that'll kill your latency SLA. By the end, you'll be able to design a collaborative editor that survives concurrent edits, network partitions, and 3 AM pager duty.

Why Operational Transformation? The Problem with Naive Sync

Before OT, collaborative editors used lock-step or diff-merge. Lock-step blocks users, which is unacceptable for real-time editing. Diff-merge loses context: if Alice inserts 'A' at position 0 and Bob inserts 'B' at position 0, each replica applies its own edit first and the other's edit second, so Alice ends up with 'BA' and Bob with 'AB'. The replicas diverge, and neither order was agreed on. OT solves this by transforming each operation against the concurrent ones so that every replica reaches the same state regardless of arrival order. The key insight: operations are functions that can be composed and transformed. Without OT, you get data corruption under concurrent edits. I've seen a production system where two users edited the same paragraph and the server applied both operations without transformation. The result: half the paragraph vanished. The fix was implementing OT with a central sequencer.
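To make the divergence concrete, here is a minimal sketch of two replicas applying the same pair of concurrent inserts with no transformation. The helper `applyRawInsert` and the operation shape `{ position, text }` are assumptions for illustration, not part of any real editor's API.

```javascript
// Hypothetical helper: apply a position-based insert to a plain string.
function applyRawInsert(doc, op) {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

const baseDoc = "hello";
const aliceOp = { position: 0, text: "A" };
const bobOp = { position: 0, text: "B" };

// Alice's replica applies her own op first, then Bob's op exactly as received.
const aliceReplica = applyRawInsert(applyRawInsert(baseDoc, aliceOp), bobOp);
// Bob's replica applies his own op first, then Alice's op exactly as received.
const bobReplica = applyRawInsert(applyRawInsert(baseDoc, bobOp), aliceOp);

console.log(aliceReplica); // "BAhello"
console.log(bobReplica);   // "ABhello"
```

Both replicas saw the same two operations, yet they disagree on the document. Transformation exists to close exactly this gap.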

OperationalTransform.systemdesign
// io.thecodeforge — System Design tutorial

// Simplified OT for insert operations
// Assume document is a string, operations are {type, position, text, userId}

function apply(doc, op) {
  // Only inserts are modeled in this simplified example
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

function transform(op1, op2) {
  // op1 and op2 are concurrent operations created against the same document state
  // Returns op1' such that applying op2 then op1' gives the same result as op1 then op2'
  if (op1.type === 'insert' && op2.type === 'insert') {
    if (op1.position < op2.position) {
      // op1 is to the left of op2, so op2's insert does not move it
      return { ...op1 };
    } else if (op1.position > op2.position) {
      // op1 shifts right by length of op2.text
      return { ...op1, position: op1.position + op2.text.length };
    } else {
      // Same position: use tie-breaker (lower user ID goes first)
      if (op1.userId < op2.userId) {
        return { ...op1 };
      } else {
        return { ...op1, position: op1.position + op2.text.length };
      }
    }
  }
  // Delete, replace, etc. are not handled in this sketch
  throw new Error(`Unsupported operation pair: ${op1.type}/${op2.type}`);
}

// Example:
const doc = "hello";
const opA = { type: 'insert', position: 0, text: 'x', userId: 'A' };
const opB = { type: 'insert', position: 0, text: 'y', userId: 'B' };

// Order 1: apply opA, then opB transformed against opA
const opBPrime = transform(opB, opA); // position becomes 1 (since opA inserted 'x' at 0)
const viaA = apply(apply(doc, opA), opBPrime); // "xhello" -> "xyhello"

// Order 2: apply opB, then opA transformed against opB
const opAPrime = transform(opA, opB); // position stays 0 (user A wins the tie)
const viaB = apply(apply(doc, opB), opAPrime); // "yhello" -> "xyhello"

// Both orders converge because the tie-breaker is deterministic.
// Applying the raw, untransformed ops instead would give "yxhello" on one replica and "xyhello" on the other.
// Pairwise convergence like this is the TP1 property; TP2 only matters when different sites
// can transform operations in different orders (i.e., there is no central sequencer).
console.log(viaA); // "xyhello" (viaB is identical); real OT also has to handle deletes and formatting
Output
xyhello
Production Trap: Non-Convergent Transformations
If your OT functions don't satisfy the transformation properties (TP1, TP2), clients will diverge. Test with random concurrent operations and verify final state is identical. I've seen a team spend weeks debugging 'ghost characters' because their delete transformation didn't handle overlapping ranges.
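A randomized check along the lines suggested above can be sketched as follows. It only covers insert/insert pairs and only the pairwise (TP1) property; the function names, the seeded generator, and the operation shape are assumptions for this example rather than an established test harness.

```javascript
// Insert/insert transform with a deterministic tie-breaker (lower userId goes first).
function transformInsertPair(op1, op2) {
  if (op1.position < op2.position) return { ...op1 };
  if (op1.position > op2.position || op1.userId > op2.userId) {
    return { ...op1, position: op1.position + op2.text.length };
  }
  return { ...op1 };
}

function applyInsertOp(doc, op) {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

// Small seeded generator so that any failure is reproducible.
let seed = 42;
function randomInt(n) {
  seed = (seed * 1664525 + 1013904223) % 4294967296;
  return Math.floor((seed / 4294967296) * n);
}

// Generate random concurrent pairs and compare both application orders.
function findDivergentCases(trials) {
  const divergent = [];
  for (let t = 0; t < trials; t++) {
    const doc = "abcdef".slice(0, randomInt(7));
    const opA = { position: randomInt(doc.length + 1), text: "x".repeat(1 + randomInt(3)), userId: "A" };
    const opB = { position: randomInt(doc.length + 1), text: "y".repeat(1 + randomInt(3)), userId: "B" };
    const viaA = applyInsertOp(applyInsertOp(doc, opA), transformInsertPair(opB, opA));
    const viaB = applyInsertOp(applyInsertOp(doc, opB), transformInsertPair(opA, opB));
    if (viaA !== viaB) divergent.push({ doc, opA, opB, viaA, viaB });
  }
  return divergent;
}

const divergentCases = findDivergentCases(1000);
console.log(`${divergentCases.length} divergent cases out of 1000`);
```

Extending the generator to deletes and overlapping ranges is where this kind of harness tends to earn its keep, since those are the cases that are hardest to reason about by hand.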
[Figure: Real-Time Collaborative Editing at Scale. Architecture and conflict resolution for Google Docs. Panels: Operational Transformation (resolves concurrent edits via transform), Central Sequencer (orders operations to ensure consistency), Conflict Resolution (handles concurrent edits with OT), Cursor & Selection Sync (broadcasts positions for UX), Persistence & Recovery (logs operations for crash survival), Sharding & Caching (scales to millions of documents). Note: OT complexity grows with document size; consider CRDTs for simpler conflict resolution.]

Central Server Architecture: The Sequencer Pattern

Google Docs uses a central server that sequences all operations. Each client sends operations to the server, which assigns each one a monotonically increasing version number (a counter is safer than a timestamp, since clocks can repeat or go backwards). The server transforms each incoming operation against the operations applied since the version the client last saw, then broadcasts the transformed version to all clients. This guarantees total order and simplifies conflict resolution. The downside: single point of failure and latency bottleneck. But for a document editor, the consistency guarantees are worth it. Without a sequencer, you need a distributed consensus protocol (like Raft), which adds complexity. For most use cases, a central server with a standby replica is fine. The classic rookie mistake: not handling server restarts. If the server crashes and loses the operation log, clients will have divergent states. Persist the operation log to a database (e.g., PostgreSQL with WAL) before broadcasting.

SequencerPattern.systemdesign
// io.thecodeforge — System Design tutorial

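// Server-side operation handler (sketch)
// transform, apply, persistOperation and broadcast are assumed helpers defined elsewhere

class DocumentServer {
  constructor() {
    this.version = 0;     // number of operations applied so far
    this.operations = []; // operations[i] is the op that was assigned version i; persisted to DB
    this.document = "";
  }

  handleOperation(clientOp, clientVersion) {
    // clientVersion = number of server operations the client had applied when it created clientOp
    if (!Number.isInteger(clientVersion) || clientVersion < 0 || clientVersion > this.version) {
      throw new RangeError(`Invalid client version: ${clientVersion}`);
    }
    // Transform clientOp against every operation the client has not seen yet
    let transformedOp = clientOp;
    for (let i = clientVersion; i < this.version; i++) {
      transformedOp = transform(transformedOp, this.operations[i]);
    }
    // Apply to server document
    this.document = apply(this.document, transformedOp);
    // Assign the next version number to this operation
    const newVersion = this.version++;
    this.operations.push(transformedOp);
    // Hand the operation to the persistence layer before broadcasting (critical for recovery)
    persistOperation(newVersion, transformedOp);
    // Broadcast to all clients (the sender should treat its own op as an acknowledgement, not re-apply it)
    broadcast({ op: transformedOp, version: newVersion });
  }
}

// Client-side: send each operation together with the last known version
// On receiving a broadcast, apply the op and advance the local version
// If the broadcast version is greater than expected, request the missing ops from the server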
Output
(No direct output; pattern for server logic)
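The client-side rule in the closing comments (apply in order, request anything missing) could look roughly like this. `BroadcastReceiver`, its callbacks, and the assumption that `version` is the 0-based sequence number the server assigned to each op are all illustrative choices, not a documented protocol.

```javascript
// Hypothetical client-side receiver: applies broadcasts strictly in version order.
class BroadcastReceiver {
  constructor(applyOp, requestMissing) {
    this.applyOp = applyOp;               // assumed callback that applies an op locally
    this.requestMissing = requestMissing; // assumed callback: ask server for versions [from, to)
    this.expected = 0;                    // next version this client should apply
    this.buffered = new Map();            // out-of-order ops waiting for the gap to fill
  }

  receive({ op, version }) {
    if (version < this.expected) return;  // duplicate or already applied
    this.buffered.set(version, op);
    if (version > this.expected) {
      // Gap detected: ask the server for everything in between
      this.requestMissing(this.expected, version);
    }
    // Drain every op that is now contiguous with what we have applied
    while (this.buffered.has(this.expected)) {
      this.applyOp(this.buffered.get(this.expected));
      this.buffered.delete(this.expected);
      this.expected++;
    }
  }
}

// Usage: version 2 arrives before version 1
const appliedOps = [];
const receiver = new BroadcastReceiver(
  (op) => appliedOps.push(op),
  (from, to) => console.log(`requesting versions ${from}..${to - 1}`)
);
receiver.receive({ op: "op0", version: 0 });
receiver.receive({ op: "op2", version: 2 }); // buffered, triggers a request for version 1
receiver.receive({ op: "op1", version: 1 }); // fills the gap, op1 and op2 are applied in order
console.log(appliedOps); // [ 'op0', 'op1', 'op2' ]
```

Buffering instead of dropping the early message avoids a second round trip once the gap is filled.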
Senior Shortcut: Batching Operations
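To reduce server load, batch operations from the same client every 50-100ms. Send a list of ops with the client's version. The server transforms the batch as a unit. For a fast typist this replaces many single-op messages with one, which can cut WebSocket message overhead substantially (up to around 10x, depending on typing rate and batch window).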
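A minimal client-side batcher might look like the sketch below. `OpBatcher`, the `send` callback, and the message shape `{ version, ops }` are assumptions for illustration; a real client would also need to wait for the server's acknowledgement before sending the next batch.

```javascript
// Hypothetical batcher: collects ops and sends them as one message per interval.
class OpBatcher {
  constructor(send, intervalMs = 50) {
    this.send = send;             // assumed transport callback, e.g. wrapping WebSocket.send
    this.intervalMs = intervalMs;
    this.pending = [];
    this.baseVersion = null;      // client version when the first op of the batch was created
    this.timer = null;
  }

  add(op, clientVersion) {
    if (this.pending.length === 0) this.baseVersion = clientVersion;
    this.pending.push(op);
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length === 0) return;
    const batch = { version: this.baseVersion, ops: this.pending };
    this.pending = [];
    this.send(batch);
  }
}

// Usage: three keystrokes become one message
const sentBatches = [];
const batcher = new OpBatcher((batch) => sentBatches.push(batch));
batcher.add({ type: "insert", position: 0, text: "a" }, 7);
batcher.add({ type: "insert", position: 1, text: "b" }, 7);
batcher.add({ type: "insert", position: 2, text: "c" }, 7);
batcher.flush(); // normally triggered by the timer
console.log(sentBatches.length, sentBatches[0].ops.length); // 1 3
```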

Conflict Resolution: Handling Concurrent Edits

When two users edit the same word simultaneously, OT transforms both operations so they apply without loss. But edge cases abound: what if Alice deletes a range that Bob inserts into? The delete operation must be transformed to account for the insert. The standard approach is two-phase: first transform the incoming operation against the history the client has not seen, then apply it. For complex edits (e.g., formatting), you need to track character positions with a position index that updates after each operation. An alternative is to give each character a unique ID, so operations reference characters by ID, not position. This avoids the 'shifting index' problem. Production gotcha: if your transformation functions don't satisfy TP1 (both application orders must produce the same document), you'll get different results depending on operation order. Always test with a random concurrent workload.
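The "delete a range that someone else inserts into" case can be sketched for position-based operations as follows. One common policy, assumed here, is to keep the inserted text and delete the original characters on either side of it. The function names and the `{ type, position, length | text }` shapes are illustrative.

```javascript
function applyOp(doc, op) {
  if (op.type === "insert") {
    return doc.slice(0, op.position) + op.text + doc.slice(op.position);
  }
  return doc.slice(0, op.position) + doc.slice(op.position + op.length);
}

// Returns a list of deletes to apply in order after the insert has been applied.
function transformDeleteAgainstInsert(del, ins) {
  const delEnd = del.position + del.length;
  if (ins.position <= del.position) {
    return [{ ...del, position: del.position + ins.text.length }];
  }
  if (ins.position >= delEnd) {
    return [{ ...del }];
  }
  // The insert landed inside the deleted range: split the delete around it.
  const left = { type: "delete", position: del.position, length: ins.position - del.position };
  const right = { type: "delete", position: ins.position + ins.text.length, length: delEnd - ins.position };
  // Right part first, so the left part's positions are still valid when it runs.
  return [right, left];
}

// Returns the insert to apply after the delete has been applied.
function transformInsertAgainstDelete(ins, del) {
  const delEnd = del.position + del.length;
  if (ins.position <= del.position) return { ...ins };
  if (ins.position >= delEnd) return { ...ins, position: ins.position - del.length };
  return { ...ins, position: del.position }; // collapse into the gap left by the delete
}

// Alice deletes "bcd"; Bob concurrently inserts "X" between "b" and "c".
const start = "abcdef";
const aliceDelete = { type: "delete", position: 1, length: 3 };
const bobInsert = { type: "insert", position: 2, text: "X" };

let insertFirst = applyOp(start, bobInsert);
for (const d of transformDeleteAgainstInsert(aliceDelete, bobInsert)) {
  insertFirst = applyOp(insertFirst, d);
}
const deleteFirst = applyOp(applyOp(start, aliceDelete), transformInsertAgainstDelete(bobInsert, aliceDelete));

console.log(insertFirst, deleteFirst); // aXef aXef
```

Whether the inserted text should survive a surrounding delete is a product decision; the point of the sketch is only that both orders must agree on the answer.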

ConflictResolution.systemdesign
// io.thecodeforge — System Design tutorial

// Using character IDs to avoid position shifts
// Each character gets a unique ID (e.g., UUID)
// Operations reference character IDs

class Char {
  constructor(id, value) {
    this.id = id;
    this.value = value;
  }
}

class Document {
  constructor() {
    this.chars = []; // ordered list of Char objects
  }

  // Returns true if the insert was applied
  applyInsert(afterCharId, newChar) {
    if (afterCharId === null) {
      // null means "insert at the start of the document"
      this.chars.unshift(newChar);
      return true;
    }
    const idx = this.chars.findIndex(c => c.id === afterCharId);
    if (idx === -1) return false; // anchor was deleted: caller decides (no-op or insert at end)
    this.chars.splice(idx + 1, 0, newChar);
    return true;
  }

  // Returns true if a character was removed
  applyDelete(charId) {
    const idx = this.chars.findIndex(c => c.id === charId);
    if (idx === -1) return false; // already deleted: no-op (splice(-1, 1) would remove the last char)
    this.chars.splice(idx, 1);
    return true;
  }
}

// Transformation becomes simpler: no position shifting
// But you need to handle the case where the referenced character was deleted
// Solution: if charId not found, the operation is a no-op (or transformed to insert at end)

function transformInsert(op1, op2) {
  // op1 and op2 are concurrent inserts of shape { afterCharId, newChar, userId }
  // (userId is an assumed field used only for tie-breaking)
  // If they reference the same afterCharId, only the higher userId is rewritten;
  // rewriting both sides would make the two application orders disagree
  if (op1.afterCharId === op2.afterCharId && op1.userId > op2.userId) {
    return { ...op1, afterCharId: op2.newChar.id }; // op1 now inserts after op2's char
  }
  // Otherwise, no transformation needed
  return op1;
}
Output
(No direct output; demonstrates character-ID approach)
Interview Gold: Character IDs vs Positions
Referencing characters by unique ID avoids the 'shifting index' problem. This is a common interview question: 'How do you handle concurrent inserts at the same position?' Answer: assign each character a unique ID and reference that, not a numeric index, and break ties between inserts after the same character deterministically (e.g., by user ID).
[Figure: OT Conflict Resolution Flow. Transforming concurrent edits to preserve intent. Steps: Client A inserts 'X' at pos 0 (operation is sent to server); Client B inserts 'Y' at pos 0 (concurrent operation arrives); server assigns versions (monotonic sequence numbers); transform A against B (shift positions to avoid conflict); apply both operations (result: 'XY' or 'YX' deterministically). Note: without transformation, one user's edit would overwrite the other's.]

Cursors and Selections: The UX Nightmare

Showing remote cursors in real time is deceptively hard. Each client sends its cursor position (as a character ID) to the server when the cursor moves, throttled rather than on every event, and the server relays it to the other clients. But if the document changes, the cursor position must be transformed. For example, if Alice's cursor is at character ID 'abc' and Bob deletes that character, Alice's cursor should move to the next valid character. This requires the server to transform cursor positions against operations. The naive approach of sending an absolute position breaks when the document changes. Instead, send the character ID of the character before the cursor (the anchor). When an operation deletes that anchor, the cursor moves to the next character. Production gotcha: if you don't transform cursors, users will see cursors floating in the wrong place. I've seen a demo where a cursor ended up outside the document because the anchor was deleted and the client didn't handle it.

CursorSync.systemdesign
// io.thecodeforge — System Design tutorial

// Cursor representation: { anchorCharId: string | null, focusCharId: string | null }
// anchorCharId is the character before the cursor (or null if at start)
// focusCharId is for selection (same as anchor for cursor)

// chars: ordered array of { id, value } as it was BEFORE the operation was applied
function transformCursor(cursor, operation, chars) {
  // Inserts never invalidate an ID-based anchor, so only deletes need handling
  if (operation.type !== 'delete') return cursor;

  const relocate = (charId) => {
    if (charId !== operation.charId) return charId;
    const idx = chars.findIndex(c => c.id === charId);
    if (idx === -1) return null;
    // Move to the next valid character; if none, fall back to the previous one
    // so the cursor ends up at the end of the document instead of jumping to the start
    const next = chars[idx + 1];
    const prev = chars[idx - 1];
    return next ? next.id : (prev ? prev.id : null);
  };

  // Return a new object instead of mutating the caller's cursor
  return {
    anchorCharId: relocate(cursor.anchorCharId),
    focusCharId: relocate(cursor.focusCharId)
  };
}

// Broadcast cursor updates at most every 50ms to avoid flooding
// Use a separate WebSocket channel for cursor updates (lower priority)
Output
(No direct output; cursor transformation logic)
Never Do This: Broadcast Cursor on Every Mouse Move
You'll saturate the network. Throttle to 20 updates per second max. Use a separate low-priority channel so cursor updates don't block document edits.
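One way to enforce the 20-updates-per-second ceiling is a small throttle that forwards at most one update per interval and always keeps the most recent position. `createCursorThrottle`, the `send` callback, and the injectable `now` clock are assumptions made for this sketch.

```javascript
// Hypothetical throttle: at most one cursor update per interval, latest position wins.
function createCursorThrottle(send, intervalMs = 50, now = () => Date.now()) {
  let lastSent = -Infinity;
  let latest = null; // most recent cursor that has not been sent yet

  function flush() {
    if (latest === null) return;
    send(latest);
    latest = null;
    lastSent = now();
  }

  function update(cursor) {
    latest = cursor;
    if (now() - lastSent >= intervalMs) flush();
    // Otherwise the update stays pending; call flush() from a timer or on idle
    // so the final cursor position is not lost.
  }

  return { update, flush };
}

// Usage with a fake clock so the behaviour is easy to follow
let fakeTime = 0;
const sentCursors = [];
const throttle = createCursorThrottle((c) => sentCursors.push(c), 50, () => fakeTime);
throttle.update({ anchorCharId: "c1" }); // sent immediately
fakeTime = 10;
throttle.update({ anchorCharId: "c2" }); // held back
fakeTime = 20;
throttle.update({ anchorCharId: "c3" }); // replaces c2 as the pending update
fakeTime = 60;
throttle.update({ anchorCharId: "c4" }); // interval elapsed, c4 is sent
console.log(sentCursors.map((c) => c.anchorCharId)); // [ 'c1', 'c4' ]
```

Intermediate positions are dropped on purpose: remote users only need to see where the cursor is now, not every place it has been.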

Persistence and Recovery: Surviving Crashes

The server should make every operation durable before broadcasting it. If the server crashes, it replays the operation log to reconstruct document state. The catch is cost: I've seen a system that persisted every operation synchronously to PostgreSQL, and latency went from 10ms to 200ms. The fix was a write-ahead log (WAL) with async flush: operations are appended to a buffer that is flushed every 10ms or every 100 operations, whichever comes first, and on crash the server replays the WAL. Async flushing opens a small window in which operations were broadcast but not yet persisted. Clients will have applied them, but the recovered server won't know about them. Solution: clients acknowledge operations and hold on to recent ones. The server marks an operation as committed only after receiving acknowledgements from all connected clients, and on recovery it requests missing operations from clients. This is the 'optimistic replication' pattern.

PersistenceRecovery.systemdesign
// io.thecodeforge — System Design tutorial

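// Write-Ahead Log (WAL) for operation persistence
// db is an assumed storage client exposing async insertOperations(ops) and
// getOperations(documentId) (returning ops ordered by version); apply is an assumed helper

class WAL {
  constructor(db, flushIntervalMs = 10, maxBuffered = 100) {
    this.db = db;
    this.buffer = [];
    this.maxBuffered = maxBuffered;
    this.flushing = false;
    this.flushTimer = setInterval(() => this.flush(), flushIntervalMs); // flush every 10ms
  }

  append(operation) {
    this.buffer.push(operation);
    if (this.buffer.length >= this.maxBuffered) {
      this.flush();
    }
  }

  async flush() {
    // Only one flush at a time, so batches reach storage in order
    if (this.flushing || this.buffer.length === 0) return;
    this.flushing = true;
    const batch = this.buffer;
    this.buffer = [];
    try {
      // Write batch to disk (e.g., append to file or DB)
      await this.db.insertOperations(batch);
    } catch (err) {
      // Put the batch back in front so nothing is lost and order is preserved
      this.buffer = batch.concat(this.buffer);
      console.error('WAL flush failed, will retry on next tick', err);
    } finally {
      this.flushing = false;
    }
  }

  async recover(documentId) {
    // Load all operations from DB and replay
    const ops = await this.db.getOperations(documentId);
    let doc = "";
    for (const op of ops) {
      doc = apply(doc, op);
    }
    return doc;
  }

  // Stop the timer and write out whatever is still buffered
  close() {
    clearInterval(this.flushTimer);
    return this.flush();
  }
}

// On server start:
// 1. Recover document state from WAL
// 2. Connect to clients
// 3. For each client, compare last acknowledged version with server version
// 4. Request missing operations from clients if server is behind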
Output
(No direct output; WAL pattern)
Senior Shortcut: Use a Dedicated WAL Service
Don't embed WAL in your app server. Use a separate service (e.g., Apache BookKeeper) that can handle high throughput and provides durability guarantees. This decouples persistence from business logic.

Scaling to Millions of Documents: Sharding and Caching

Google Docs handles billions of documents. The key is sharding by document ID. Each document's operations are stored on a single shard. The shard also handles the OT logic. This keeps the operation history local. For hot documents (e.g., a popular spreadsheet), you can replicate the shard and use a primary-replica pattern: all writes go to primary, reads can go to replicas. But replicas must apply operations in the same order—use the sequencer's version number. Caching: cache the document state (compiled from operations) in memory. Invalidate on new operation. For cold documents, load from persistent storage. Production gotcha: if you cache the compiled state, you must ensure it's consistent with the operation log. Use a version number that increments on each operation. On cache miss, rebuild from log and cache the result.

ShardingCaching.systemdesign
// io.thecodeforge — System Design tutorial

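// Shard assignment: hash(documentId) % numShards
// Each shard is a separate process or container
// Assumes the lru-cache npm package (v10+, which uses a named export) and an apply helper
import { LRUCache } from 'lru-cache';

class DocumentShard {
  constructor(shardId) {
    this.shardId = shardId;
    this.logs = new Map(); // documentId -> operation log (stands in for persistent storage)
    this.cache = new LRUCache({ max: 10000 }); // documentId -> { state, version }
  }

  getOperationLog(documentId) {
    return this.logs.get(documentId) ?? [];
  }

  getDocumentState(documentId) {
    const log = this.getOperationLog(documentId);
    const cached = this.cache.get(documentId);
    // Only trust the cache if it was built from the same number of operations
    if (cached && cached.version === log.length) {
      return cached.state;
    }
    // Rebuild from operation log
    let state = "";
    for (const op of log) {
      state = apply(state, op);
    }
    this.cache.set(documentId, { state, version: log.length });
    return state;
  }

  handleOperation(documentId, operation) {
    // Append to the log (the log length doubles as the version number)
    const log = this.getOperationLog(documentId);
    log.push(operation);
    this.logs.set(documentId, log);
    // Invalidate cache for this document
    this.cache.delete(documentId);
    // Persist operation
    // Broadcast to clients
  }
}

// Load balancer routes requests based on document ID hash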
Output
(No direct output; sharding pattern)
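The routing rule `hash(documentId) % numShards` can be sketched with any stable string hash. The example below uses a 32-bit FNV-1a variant over UTF-16 code units; the choice of hash and the names `fnv1a` and `shardFor` are assumptions, not what any particular load balancer uses.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units (a variant; canonical FNV-1a hashes bytes).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the result an unsigned 32-bit integer
  }
  return hash;
}

// Every request for the same document lands on the same shard.
function shardFor(documentId, numShards) {
  return fnv1a(documentId) % numShards;
}

const shard = shardFor("doc123", 16);
console.log(shard === shardFor("doc123", 16)); // true
```

Plain modulo routing remaps most documents whenever `numShards` changes, so a deployment that resizes often may prefer consistent hashing or an explicit document-to-shard directory.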
Interview Gold: Hot Document Problem
What happens when a document goes viral (e.g., a shared doc with 10k concurrent editors)? The shard becomes a bottleneck. One solution: split the document into sections (e.g., paragraphs) and shard by section, so each section has its own operation log. The trade-off is that edits spanning several sections then need coordination across logs.

Offline Support and Conflict Resolution After Reconnect

Users expect to edit offline and sync later. This is a hard problem: the client accumulates operations locally. On reconnect, it sends them to the server. The server must transform these operations against any concurrent operations that happened while the client was offline. This is the same OT problem, but with a large batch. The server processes the client's operations in order, transforming each against the server's history since the client's last version. OT merges most edits automatically. Where the merged result would be ambiguous (e.g., both client and server rewrote the same word), you need a policy: let the server's version win, apply 'last writer wins', or flag the conflict to the user in a merge UI. Production gotcha: if the client was offline for a long time, the transformation may produce unexpected results. Limit offline duration (e.g., 30 days) and force a full sync after that.

OfflineSync.systemdesign
// io.thecodeforge — System Design tutorial

// Client-side offline operation queue
// applyLocal is an assumed helper that applies an op to the local document
class OfflineQueue {
  constructor() {
    this.queue = [];
    this.lastSyncedVersion = 0;
  }

  addOperation(op) {
    this.queue.push(op);
  }

  async sync(server) {
    // Assumes local editing is paused while the sync request is in flight
    // Send all queued operations with lastSyncedVersion
    const response = await server.syncOperations({
      operations: this.queue,
      lastVersion: this.lastSyncedVersion
    });
    // The server returns the ops this client missed, already transformed against
    // the client's queued ops, so they apply cleanly on top of the local state
    for (const op of response.transformedOps) {
      applyLocal(op);
    }
    this.queue = [];
    this.lastSyncedVersion = response.newVersion;
  }
}

// Server-side sync handler
// serverOps, serverVersion and applyServer are assumed module-level state and helpers;
// applyServer(op) applies op, appends it to serverOps and increments serverVersion
function handleSync(clientOps, clientVersion) {
  // Server ops the client has not seen yet
  let missed = serverOps.slice(clientVersion);
  for (const op of clientOps) {
    let transformed = op;
    const rebased = [];
    for (const serverOp of missed) {
      // Transform each side against the other so both paths converge
      rebased.push(transform(serverOp, transformed));
      transformed = transform(transformed, serverOp);
    }
    // Later client ops must be transformed against the rebased server ops
    missed = rebased;
    applyServer(transformed);
  }
  // missed now holds the server ops rewritten to apply after all of the client's ops.
  // The client's own ops are not sent back: it has already applied them locally.
  return { transformedOps: missed, newVersion: serverVersion };
}
Output
(No direct output; offline sync pattern)
The Classic Bug: Offline Queue Overflow
If the client is offline for days, the queue can grow to millions of operations. This causes memory pressure and slow sync. Implement a max queue size (e.g., 10k ops) and force a full document download if exceeded.
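A capped queue can be sketched as below. In addition to the size limit it merges consecutive typing into a single insert, which is an extra technique (operation coalescing) not described above but which keeps the queue far smaller in practice. `BoundedOfflineQueue` and its fields are hypothetical names for this example.

```javascript
// Hypothetical offline queue with a hard cap and coalescing of adjacent inserts.
class BoundedOfflineQueue {
  constructor(maxOps = 10000) {
    this.maxOps = maxOps;
    this.queue = [];
    this.needsFullSync = false; // set once the cap is hit; caller should fetch a fresh snapshot
  }

  // Returns true if the op was queued, false if the cap was reached.
  addOperation(op) {
    const last = this.queue[this.queue.length - 1];
    const continuesTyping =
      last !== undefined &&
      last.type === "insert" &&
      op.type === "insert" &&
      op.position === last.position + last.text.length;
    if (continuesTyping) {
      last.text += op.text; // merge into the previous insert instead of growing the queue
      return true;
    }
    if (this.queue.length >= this.maxOps) {
      // Existing ops are kept; the caller decides what to do (e.g., go read-only until resync)
      this.needsFullSync = true;
      return false;
    }
    this.queue.push({ ...op }); // store a copy so merging never mutates the caller's object
    return true;
  }
}

// Usage: typing "hello" one character at a time yields a single queued op
const offlineQueue = new BoundedOfflineQueue();
[..."hello"].forEach((ch, i) => offlineQueue.addOperation({ type: "insert", position: i, text: ch }));
console.log(offlineQueue.queue.length, offlineQueue.queue[0].text); // 1 hello
```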

When Not to Use OT: CRDTs as an Alternative

OT requires a central server for ordering. If you need peer-to-peer collaboration (no central server), CRDTs are a better fit. CRDTs guarantee convergence without a central coordinator by using commutative operations. However, CRDTs have larger metadata overhead (each character carries a unique ID and a tombstone for deletions). For a document editor, OT is simpler and more efficient when you have a server. Use CRDTs only if you need offline-first with no server or if you're building a decentralized app. Production gotcha: CRDTs can cause unbounded metadata growth if you don't implement garbage collection for tombstones. I've seen a CRDT-based editor where deleted characters accumulated and the document size grew 10x. The fix was periodic tombstone compaction.

CRDTvsOT.systemdesign
// io.thecodeforge — System Design tutorial

// CRDT approach: each character has a unique ID and a list of 'causally ready' parents
// Insert operation: { charId, value, afterId: [list of IDs that this char should follow] }
// Delete operation: { charId, tombstone: true }

// Convergence: all clients apply operations in any order, result is the same
// because inserts are commutative (they specify explicit ordering via afterId)

// Example: two clients insert 'x' and 'y' after the same character 'a'
// Client A: insert { charId: 'x', afterId: ['a'] }
// Client B: insert { charId: 'y', afterId: ['a'] }
// Both clients will have 'axy' or 'ayx' depending on tie-breaker (e.g., charId comparison)
// But both will converge to the same order because tie-breaker is deterministic

// OT would require a server to order these operations; CRDT doesn't.
Output
(No direct output; CRDT vs OT comparison)
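The convergence claim in the comments above can be tried out with a small RGA-style sequence sketch. It assumes Lamport-style IDs of the form `[counter, siteId]` and causal delivery (a character's parent arrives before the character); `SequenceCRDT` and `compareIds` are illustrative names, and production libraries such as Automerge or Yjs use considerably more elaborate structures.

```javascript
// Order IDs by counter first, then by site ID as a deterministic tie-breaker.
function compareIds(a, b) {
  if (a[0] !== b[0]) return a[0] - b[0];
  if (a[1] === b[1]) return 0;
  return a[1] < b[1] ? -1 : 1;
}

class SequenceCRDT {
  constructor() {
    this.elems = []; // { id, value, deleted }
  }

  // op = { id, value, afterId }; afterId === null means "at the start"
  insert(op) {
    if (this.elems.some((e) => compareIds(e.id, op.id) === 0)) return; // already applied
    let i = 0;
    if (op.afterId !== null) {
      const parent = this.elems.findIndex((e) => compareIds(e.id, op.afterId) === 0);
      if (parent === -1) throw new Error("parent not delivered yet (causal delivery assumed)");
      i = parent + 1;
    }
    // Skip concurrent siblings with a larger ID so every replica picks the same order
    while (i < this.elems.length && compareIds(this.elems[i].id, op.id) > 0) i++;
    this.elems.splice(i, 0, { id: op.id, value: op.value, deleted: false });
  }

  // Deletes leave a tombstone so later inserts can still find their parent
  delete(id) {
    const elem = this.elems.find((e) => compareIds(e.id, id) === 0);
    if (elem) elem.deleted = true;
  }

  toString() {
    return this.elems.filter((e) => !e.deleted).map((e) => e.value).join("");
  }
}

const insA = { id: [1, "S"], value: "a", afterId: null };
const insX = { id: [2, "A"], value: "x", afterId: [1, "S"] }; // from client A
const insY = { id: [2, "B"], value: "y", afterId: [1, "S"] }; // from client B, concurrent

const replica1 = new SequenceCRDT();
[insA, insX, insY].forEach((op) => replica1.insert(op));
const replica2 = new SequenceCRDT();
[insA, insY, insX].forEach((op) => replica2.insert(op));

console.log(replica1.toString(), replica2.toString()); // ayx ayx
```

Note that `delete` never removes the element. Those tombstones are the metadata overhead discussed in this section.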
Interview Gold: OT vs CRDT Trade-offs
OT: simpler metadata, requires server, lower storage overhead. CRDT: no server needed, higher metadata, eventual consistency. Choose OT for server-based apps like Google Docs; choose CRDT for peer-to-peer or offline-first apps, for example ones built on libraries such as Automerge or Yjs.
[Figure: OT vs CRDT for Collaborative Editing. Choosing the right sync strategy. Operational Transform: requires central server for ordering; lower metadata overhead per character; mature, battle-tested in Google Docs; hard to support peer-to-peer offline. CRDT: works without central coordinator; larger metadata per character (IDs); converges automatically via commutativity; better for offline-first and P2P apps. Summary: use OT for server-centric apps; CRDT for decentralized or offline-heavy use.]
Production incident · Post-mortem · Severity: high

The 4GB Container That Kept Dying

Symptom
Server OOM-killed every 30 minutes under 100 concurrent editors. No obvious memory leak in heap dumps.
Assumption
Thought it was a memory leak in the OT transformation cache.
Root cause
The operation history buffer stored every operation since session start, never compacted. With 100 users typing at 5 ops/sec, history grew 500 ops/sec. After 30 minutes: 900k operations, each ~4KB JSON → 3.6GB. The OOM killer fired.
Fix
Implemented sliding window compaction: keep last 1000 operations per document, archive older ones to disk with a Bloom filter for conflict checks. Memory dropped to 200MB.
Key lesson
  • Always bound operation history.
  • Unbounded history is a memory bomb waiting to explode.
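The arithmetic behind the root cause, and the sliding-window fix, can be sketched together. `BoundedHistory`, the `archive` callback, and the decision to return `null` for clients that have fallen out of the window are assumptions for this example; the Bloom filter mentioned in the fix is left out.

```javascript
// Growth of an unbounded history: users * ops/sec * seconds * bytes per op
function historyBytes(users, opsPerSecond, seconds, bytesPerOp) {
  return users * opsPerSecond * seconds * bytesPerOp;
}
console.log(historyBytes(100, 5, 30 * 60, 4000) / 1e9); // 3.6 (GB after 30 minutes)

// Hypothetical sliding-window history: keeps the newest maxOps in memory.
class BoundedHistory {
  constructor(maxOps = 1000, archive = () => {}) {
    this.maxOps = maxOps;
    this.archive = archive; // assumed callback that writes old ops to disk
    this.ops = [];
    this.firstVersion = 0;  // version number of ops[0]
  }

  get version() {
    return this.firstVersion + this.ops.length;
  }

  append(op) {
    this.ops.push(op);
    if (this.ops.length > this.maxOps) {
      const overflow = this.ops.splice(0, this.ops.length - this.maxOps);
      this.archive(overflow, this.firstVersion);
      this.firstVersion += overflow.length;
    }
  }

  // Ops a client at clientVersion still needs, or null if it fell out of the window
  // (in which case the server should send a full snapshot instead).
  since(clientVersion) {
    if (clientVersion < this.firstVersion) return null;
    return this.ops.slice(clientVersion - this.firstVersion);
  }
}

const history = new BoundedHistory(1000);
for (let i = 0; i < 2500; i++) history.append({ type: "insert", position: 0, text: "x" });
console.log(history.ops.length, history.firstVersion, history.version); // 1000 1500 2500
```

With the window at 1000 operations of roughly 4KB each, the in-memory history stays near 4MB per document regardless of session length.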
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit. 3 entries.
Symptom · 01
Users report document state diverges between clients
Fix
1. Check server operation log for missing transformations. 2. Verify OT functions satisfy TP1/TP2 with random test harness. 3. Ensure all clients apply operations in the same order (version number). 4. If using character IDs, check for duplicate IDs.
Symptom · 02
High latency on edits (>1 second)
Fix
1. Profile WebSocket message size (JSON vs binary). 2. Check server CPU: OT transformation is O(n) per operation. 3. Reduce broadcast frequency: batch operations. 4. Consider sharding hot documents.
Symptom · 03
Server OOM after hours of operation
Fix
1. Check operation history size. 2. Implement compaction: archive old ops, keep sliding window. 3. Profile memory usage of cached document states. 4. Set max cache size with LRU eviction.
Design Google Docs Triage Cheat Sheet
First-response commands for when things go wrong (copy-paste ready).
Document state mismatch between clients: `Error: Version mismatch`
Immediate action
Check server operation log for gaps
Commands
SELECT COUNT(*) FROM operations WHERE document_id = 'doc123' AND version > :last_known_version; -- substitute the client's last acknowledged version
Check client last acknowledged version in logs
Fix now
Force client to full sync: send current document state as a snapshot.
High latency on edits: `p95 latency > 500ms`
Immediate action
Check WebSocket message size
Commands
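tcpdump -i eth0 -A port <plaintext-backend-port> | grep 'op:' | head -100   # capture behind TLS termination; payloads on port 443 are encrypted and will not match the grep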
Measure average operation size in bytes
Fix now
Switch to Protocol Buffers for serialization. Reduce operation frequency by batching.
Server OOM: `OutOfMemoryError: Java heap space`
Immediate action
Check operation history size per document
Commands
jmap -histo <pid> | head -20
Check number of operations in memory: SELECT document_id, COUNT(*) FROM operations GROUP BY document_id;
Fix now
Set max operations per document to 1000. Archive older ops to disk.
Cursors jumping erratically: `Cursor position out of bounds`
Immediate action
Check cursor transformation logic
Commands
Enable debug logging for cursor updates
Verify anchor character IDs exist in current document
Fix now
If anchor deleted, move cursor to next valid character. If none, set to end.
| Feature / Aspect | Operational Transformation (OT) | CRDT |
| --- | --- | --- |
| Central coordinator required | Yes (sequencer) | No |
| Metadata overhead | Low (operation log) | High (per-character IDs, tombstones) |
| Conflict resolution | Transform functions | Commutative operations |
| Offline support | Complex (batch transform) | Natural (merge on reconnect) |
| Scalability | Server bottleneck at high concurrency | Better for P2P, but metadata grows |
| Production maturity | Google Docs, Microsoft Office | Automerge, Yjs |

Key takeaways

1. OT requires a central sequencer for total order; without it, clients diverge.
2. Always bound operation history to prevent OOM; use sliding window compaction.
3. Transform cursor positions against operations to avoid floating cursors.
4. CRDTs are overkill for server-based editors; OT is simpler and more efficient.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): How does Google Docs handle concurrent edits without data loss?
Q02 (Senior): When would you choose CRDTs over OT for a collaborative editor?
Q03 (Senior): What happens when a client reconnects after being offline for an hour wi...
Q04 (Junior): What is the purpose of a sequencer in OT-based systems?
Q05 (Senior): You notice that after a server crash and recovery, some clients have ope...
Q06 (Senior): How would you design a collaborative editor that supports 10,000 concurr...

Q01 of 06 (Senior)

How does Google Docs handle concurrent edits without data loss?

ANSWER
It uses Operational Transformation (OT) with a central sequencer. Each operation is transformed against concurrent operations before application, ensuring all clients converge to the same state. The server assigns a version number to each operation, guaranteeing total order.
FAQ · 4 QUESTIONS

Frequently Asked Questions

1. How does Google Docs handle multiple users typing at the same time?
2. What's the difference between OT and CRDT for collaborative editing?
3. How do I implement offline editing in a collaborative document editor?
4. What happens if the server crashes in the middle of processing an operation?