Senior 5 min · June 25, 2026

Design Spotify: A Production System Architecture That Handles 500M Users

Design Spotify's real-world system architecture for 500M users.

N
Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.

Follow
Production
production tested
June 25, 2026
last updated
1,663
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer

Spotify's architecture uses microservices with gRPC for internal communication, Cassandra for playlist storage, and a custom content delivery network for audio streaming. They rely on event-driven pipelines for recommendations and offline processing for analytics.

✦ Definition~90s read
What is Design Spotify?

Design Spotify is the process of architecting a music streaming service that handles millions of concurrent users, stores billions of playlists, and delivers low-latency audio streaming. It involves distributed systems, microservices, and data pipelines for personalization.

Think of Spotify as a massive jukebox with a librarian who remembers every song you've ever played.
Plain-English First

Think of Spotify as a massive jukebox with a librarian who remembers every song you've ever played. The jukebox (CDN) stores the music files, the librarian (backend) keeps your playlists and history, and a recommendation engine (data pipeline) suggests new songs based on what you and others like. The trick is making all this work instantly for millions of people at once.

You're building a music streaming service. Day one, it works fine. Day 100, your database is on fire because you stored playlists in a relational database with joins. I've seen this exact pattern kill a startup's launch. The problem isn't the music — it's the metadata. Playlists, user history, social features — these are graph-like, write-heavy, and need to be available globally. Spotify handles 500M users with 100M+ tracks. They don't use a single database. They use a carefully chosen set of tools, each optimized for a specific job. After this article, you'll be able to design a system that scales to millions of users without falling over.

Why Microservices? The Monolith That Couldn't Stream

Spotify started as a monolith. By 2012, it was a nightmare. Deploying a change to the recommendation engine required redeploying the entire backend. A bug in playlist sharing could take down search. They split into microservices — each owning a bounded context: playlist service, social graph service, audio delivery service, search service, etc. This let them scale independently. The playlist service, for example, is write-heavy and uses Cassandra. The search service is read-heavy and uses Elasticsearch. They communicate via gRPC for low latency and Kafka for async events. The trade-off? Distributed debugging is harder. You need tracing (Jaeger) and structured logging. But for a system with 500M users, the isolation is worth it.

SpotifyMicroserviceArchitecture.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// io.thecodeforge — System Design tutorial

// High-level service interaction for adding a track to a playlist

// Client -> API Gateway -> Playlist Service
// Playlist Service writes to Cassandra, publishes event to Kafka
// Kafka event consumed by Search Service (to index), Recommendation Service (to update model)

// This is a sequence diagram in text:
// 1. Client POST /playlists/{id}/tracks
// 2. API Gateway authenticates, routes to Playlist Service
// 3. Playlist Service validates track exists (calls Metadata Service via gRPC)
// 4. Playlist Service writes to Cassandra: INSERT INTO playlist_tracks (playlist_id, track_id, added_at, position) VALUES (?, ?, ?, ?)
// 5. Playlist Service publishes event to Kafka topic 'playlist_updates'
// 6. Search Service consumes event, updates Elasticsearch index
// 7. Recommendation Service consumes event, updates user's listening history in Cassandra

// Key: gRPC for synchronous calls (low latency), Kafka for async (decoupling)
Output
No direct output — architectural diagram described.
Production Trap: gRPC Without Timeouts
If you don't set gRPC deadlines, a slow downstream service can exhaust your connection pool. Always set a deadline (e.g., 500ms) and implement circuit breakers. I've seen a metadata service slow query cascade to all playlist writes.
Spotify Production Architecture for 500M Users THECODEFORGE.IO Spotify Production Architecture for 500M Users Microservices, Cassandra, CDN, Kafka, Redis, and more Microservices Decomposition Monolith split into independent services Cassandra for Playlists NoSQL for high write throughput CDN & Adaptive Bitrate Audio streaming with low latency Offline Recommendation Pipeline Batch processing for personalized playlists Elasticsearch Search Scalable full-text search at scale Redis Caching Hot data cache for low latency ⚠ Monolith scaling fails under 500M users Use microservices with event-driven architecture THECODEFORGE.IO
thecodeforge.io
Spotify Production Architecture for 500M Users
Design Spotify

Cassandra for Playlists: Why Not SQL?

Playlists are write-heavy. Users add/remove tracks, reorder, share. A relational database with normalized tables (playlists, playlist_tracks, users) would require joins on every read. With millions of playlists and billions of tracks, joins become a bottleneck. Cassandra is a wide-column store that excels at write throughput and can handle large datasets with no single point of failure. The data model: playlist_tracks table with partition key playlist_id and clustering columns (position, track_id). This allows fast range queries for a playlist's tracks. The downside? No joins, no transactions across partitions. You have to denormalize. For example, store playlist metadata (name, owner) in a separate table with the same partition key. Spotify also uses Cassandra for user library (saved tracks, albums) and listening history. The trade-off is eventual consistency — but for playlists, that's acceptable. If a user adds a track and it doesn't appear immediately on another device, it's fine.

CassandraPlaylistSchema.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// io.thecodeforge — System Design tutorial

// Cassandra schema for playlists

CREATE TABLE IF NOT EXISTS playlist_tracks (
    playlist_id uuid,
    position int,
    track_id text,
    added_at timestamp,
    added_by text,
    PRIMARY KEY (playlist_id, position)
) WITH CLUSTERING ORDER BY (position ASC);

// Query: get all tracks in a playlist ordered by position
SELECT * FROM playlist_tracks WHERE playlist_id = ? ORDER BY position ASC;

// Insert a track at the end
INSERT INTO playlist_tracks (playlist_id, position, track_id, added_at, added_by)
VALUES (?, ?, ?, toTimestamp(now()), ?);

// Note: position is managed by the application. To insert at a specific position, shift subsequent positions.
// This is a batch operation within the same partition (Cassandra supports lightweight transactions for idempotency).

// For playlist metadata:
CREATE TABLE IF NOT EXISTS playlists (
    playlist_id uuid PRIMARY KEY,
    name text,
    owner_id text,
    description text,
    created_at timestamp,
    updated_at timestamp
);
Output
No direct output — schema definition.
Senior Shortcut: Use TimeUUID for Clustering
If you need time-ordered tracks, use TimeUUID as clustering key instead of position. This avoids the complexity of shifting positions. But if users expect manual ordering, stick with position.
Playlist Storage: SQL vs CassandraTHECODEFORGE.IOPlaylist Storage: SQL vs CassandraWhy Spotify chose wide-column over relationalSQL (Relational)Joins on every playlist readNormalized tables: playlists, tracks, usersWrite bottleneck under high concurrencyScales vertically, expensive shardingCassandra (Wide-Column)Denormalized: playlist as single rowNo joins, fast reads for write-heavy loadLinear horizontal scalingTunable consistency for availabilityCassandra's write-optimized model handles billions of track operationsTHECODEFORGE.IO
thecodeforge.io
Playlist Storage: SQL vs Cassandra
Design Spotify

Audio Streaming: The CDN and Adaptive Bitrate

Streaming audio is the core of Spotify. They don't serve audio from their own servers — that would be insane. They use a CDN (Google Cloud CDN, Akamai, etc.) to cache audio files close to users. Audio files are encoded in multiple bitrates (96kbps, 160kbps, 320kbps Ogg Vorbis). The client selects the appropriate bitrate based on network conditions (adaptive bitrate streaming). The challenge: how does the client know which CDN edge to hit? Spotify uses a 'delivery service' that returns a signed URL to the closest CDN edge. The URL includes a token for authentication. The client then fetches the audio file directly from the CDN. This offloads traffic from Spotify's infrastructure. The CDN handles the heavy lifting of global distribution. The trade-off: cost. CDN egress is expensive. Spotify optimizes by caching popular tracks aggressively and using peer-to-peer for some scenarios (though that's less common now).

AudioDeliveryFlow.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
// io.thecodeforge — System Design tutorial

// Client requests audio stream
// 1. Client sends GET /stream/{track_id} to API Gateway
// 2. API Gateway authenticates, calls Delivery Service
// 3. Delivery Service checks user's subscription (free vs premium) to determine bitrate
// 4. Delivery Service generates a signed URL: https://cdn.spotify.com/audio/{track_id}.ogg?token={jwt}&expires={timestamp}
// 5. Client receives URL, fetches from CDN
// 6. CDN checks if file is cached. If not, fetches from origin storage (Google Cloud Storage)
// 7. Client plays audio, requests next chunk if needed

// Key: signed URL prevents unauthorized access. Expiry time is short (e.g., 1 hour).
// For live streaming, use HLS (HTTP Live Streaming) with .m3u8 playlist.
Output
No direct output — flow description.
Interview Gold: CDN Cache Invalidation
When a new version of a track is uploaded (e.g., remastered), you need to invalidate the CDN cache. Use a versioned URL (e.g., /audio/{track_id}/{version}.ogg) or purge the cache via CDN API. Spotify uses versioned URLs to avoid cache invalidation storms.

Recommendations: The Offline Pipeline That Never Sleeps

Spotify's recommendation engine is the secret sauce. It's not real-time — it's a batch processing pipeline that runs periodically (daily or hourly). It uses collaborative filtering, natural language processing (on song lyrics, blog posts), and audio analysis (tempo, key, loudness). The pipeline is built on Apache Beam (or Scio, a Scala wrapper for Beam) running on Google Cloud Dataflow. It processes user listening history (stored in Cassandra) and track features (stored in Bigtable) to generate recommendations. The output is stored in a 'recommendation table' in Cassandra, keyed by user_id. When a user opens the app, the client fetches recommendations from this table. The trade-off: recommendations are not real-time. If a user listens to a new genre, it won't affect recommendations until the next pipeline run. But that's acceptable — users don't expect instant personalization.

RecommendationPipeline.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
// io.thecodeforge — System Design tutorial

// Pseudo-code for recommendation pipeline (simplified)

// 1. Read user listening history from Cassandra (last 30 days)
// 2. For each user, compute vector of liked tracks (weighted by play count, skip rate)
// 3. Use collaborative filtering to find similar users
// 4. For each user, find tracks liked by similar users that the user hasn't heard
// 5. Rank tracks by predicted affinity score
// 6. Write top 50 recommendations to Cassandra: INSERT INTO recommendations (user_id, track_id, score) VALUES (?, ?, ?)

// This runs as a Dataflow pipeline with sliding windows.
// For real-time personalization (e.g., Discover Weekly), use a separate pipeline that updates daily.
Output
No direct output — pipeline description.
Production Trap: Data Skew in Collaborative Filtering
Popular tracks get recommended to everyone, creating a feedback loop. Mitigate by downweighting popular tracks or using a 'popularity penalty' in the scoring function. Otherwise, your recommendations become a top-40 chart.

Search: Elasticsearch at Scale

Search is critical. Users search for tracks, albums, artists, playlists. Spotify uses Elasticsearch with a custom scoring function that combines text relevance (BM25) with popularity signals. The index is sharded across multiple nodes. Updates come from Kafka events (when a new track is added, metadata changes). The challenge: keeping the index fresh. Spotify uses near-real-time indexing with a refresh interval of 1 second. For large updates (e.g., new album), they use bulk indexing. The trade-off: Elasticsearch is memory-hungry. Each shard consumes heap. They optimize by using doc-values for aggregations and avoiding heavy analyzers on large fields.

ElasticsearchMapping.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// io.thecodeforge — System Design tutorial

// Elasticsearch mapping for tracks
PUT /tracks
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "refresh_interval": "1s"
  },
  "mappings": {
    "properties": {
      "track_name": { "type": "text", "analyzer": "standard" },
      "artist_name": { "type": "text", "analyzer": "standard" },
      "album_name": { "type": "text", "analyzer": "standard" },
      "popularity": { "type": "integer" },
      "duration_ms": { "type": "integer" },
      "genre": { "type": "keyword" },
      "release_date": { "type": "date" }
    }
  }
}

// Query: search for tracks by name or artist
GET /tracks/_search
{
  "query": {
    "multi_match": {
      "query": "bohemian rhapsody",
      "fields": ["track_name^3", "artist_name^2", "album_name"],
      "type": "best_fields"
    }
  },
  "sort": [
    { "popularity": "desc" }
  ]
}
Output
No direct output — mapping and query.
Never Do This: Indexing Without Aliases
Always use index aliases for zero-downtime reindexing. If you need to change mappings, create a new index, reindex, then swap alias. Otherwise, you'll have downtime or corrupted data.

Social Features: Graph Database for Friends and Follows

Spotify has social features: follow friends, see their playlists, collaborative playlists. This is a graph problem. They use a graph database (Neo4j) for the social graph. Why not relational? Because queries like 'find all friends of friends who listened to this track' are expensive in SQL with recursive joins. Neo4j handles these with ease. The social graph is write-heavy (follow/unfollow) but read-heavy for recommendations. They replicate the graph to Cassandra for read-heavy workloads (denormalized). The trade-off: maintaining consistency between Neo4j and Cassandra. They use an event-driven approach: when a follow happens, write to Neo4j, then publish event to Kafka, which updates Cassandra. This is eventually consistent, but acceptable.

SocialGraphNeo4j.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
// io.thecodeforge — System Design tutorial

// Neo4j Cypher query to find friends who listened to a track
MATCH (u:User {id: $user_id})-[:FOLLOWS]->(friend:User)-[:LISTENED]->(track:Track {id: $track_id})
RETURN friend.id

// This is efficient because Neo4j traverses relationships in constant time per hop.
// In SQL, this would require multiple joins or recursive CTEs.

// For write-heavy operations, use batch inserts:
UNWIND $relationships AS rel
MATCH (u:User {id: rel.follower}), (v:User {id: rel.followee})
MERGE (u)-[:FOLLOWS]->(v)
Output
No direct output — Cypher queries.
Senior Shortcut: Use a Separate Graph Service
Isolate the graph database behind a microservice. This prevents schema changes in the graph from affecting other services. Also, it allows you to cache frequently accessed graph queries in Redis.

Event-Driven Architecture with Kafka

Kafka is the backbone of Spotify's async communication. Every significant action (play track, add to playlist, follow user) publishes an event to Kafka. Multiple consumers process these events: search indexing, recommendation pipeline, analytics, billing (for premium). This decouples services and allows independent scaling. The key design: use Avro for schema evolution. Spotify has a schema registry to ensure compatibility. They also use Kafka Streams for real-time processing (e.g., updating user session state). The trade-off: Kafka adds latency (milliseconds) and operational complexity. But for a system with hundreds of microservices, it's essential.

KafkaEventSchema.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// io.thecodeforge — System Design tutorial

// Avro schema for track play event
{
  "type": "record",
  "name": "TrackPlayEvent",
  "namespace": "com.spotify.event",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "track_id", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "context", "type": ["null", "string"], "default": null}, // e.g., "playlist:123"
    {"name": "source", "type": ["null", "string"], "default": null} // e.g., "search", "recommendation"
  ]
}

// Producer (in Python using confluent-kafka)
producer.produce('track_plays', value=avro_serializer(event, schema))

// Consumer (in Java)
KafkaConsumer<String, TrackPlayEvent> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("track_plays"));
while (true) {
    ConsumerRecords<String, TrackPlayEvent> records = consumer.poll(100);
    for (ConsumerRecord<String, TrackPlayEvent> record : records) {
        processPlayEvent(record.value());
    }
}
Output
No direct output — schema and code.
Production Trap: Schema Evolution Backward Compatibility
Always set default values for new fields. Otherwise, old consumers will break when they encounter a new schema. Use the schema registry to enforce compatibility checks.
Event Flow: User Action to Downstream ServicesTHECODEFORGE.IOEvent Flow: User Action to Downstream ServicesKafka decouples producers from consumersUser ActionPlay track, add to playlist, follow userKafka EventPublished to topic (e.g. track_plays)Search IndexerUpdates Elasticsearch indexRecommendation PipelineFeeds batch processing jobsAnalytics & BillingUsage metrics, premium charges⚠ Kafka enables async, fault-tolerant fan-out to many consumersTHECODEFORGE.IO
thecodeforge.io
Event Flow: User Action to Downstream Services
Design Spotify

Caching: Redis for Hot Data

Spotify uses Redis extensively for caching: user sessions, recently played tracks, top charts, and playlist metadata for popular playlists. The key is to cache only hot data. For example, the top 1% of playlists account for 80% of reads. They cache these in Redis with a TTL of 5 minutes. For less popular playlists, they fall through to Cassandra. They also use Redis for rate limiting and distributed locks. The trade-off: cache invalidation is hard. They use a write-through cache: when a playlist is updated, they invalidate the Redis key and let the next read repopulate it from Cassandra. This ensures consistency at the cost of a cache miss.

RedisCachingPlaylist.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// io.thecodeforge — System Design tutorial

// Pseudo-code for caching playlist tracks

// On read:
function getPlaylistTracks(playlistId):
    cached = redis.get("playlist:" + playlistId)
    if cached:
        return deserialize(cached)
    tracks = cassandra.execute("SELECT * FROM playlist_tracks WHERE playlist_id = ?", playlistId)
    redis.setex("playlist:" + playlistId, 300, serialize(tracks)) // TTL 5 min
    return tracks

// On write (add track):
function addTrackToPlaylist(playlistId, trackId):
    cassandra.execute("INSERT INTO playlist_tracks ...", playlistId, trackId)
    redis.del("playlist:" + playlistId) // invalidate cache
    // Also publish event to Kafka for search/recommendations
Output
No direct output — pseudo-code.
The Classic Bug: Cache Stampede
When a popular playlist's cache expires, multiple requests may hit Cassandra simultaneously. Mitigate with a mutex lock (SETNX) or use a 'stale-while-revalidate' pattern: serve stale data and refresh in background.

Monitoring and Observability

With hundreds of services, you need centralized logging, metrics, and tracing. Spotify uses Google Cloud Monitoring for metrics, Jaeger for distributed tracing, and ELK stack for logs. Every service exposes Prometheus metrics (request rate, latency, error rate). They have dashboards for each service and SLOs (e.g., playlist read latency p99 < 100ms). Tracing is critical for debugging latency issues across services. They use a correlation ID propagated via gRPC metadata. The trade-off: instrumentation adds overhead. They sample traces (1% of requests) to keep costs down. But when debugging an incident, they can increase sampling rate dynamically.

TracingSetup.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// io.thecodeforge — System Design tutorial

// In Go, using OpenTelemetry
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/trace"
)

// Create a tracer
tracer := otel.Tracer("playlist-service")

// In a request handler
func handleGetPlaylist(w http.ResponseWriter, r *http.Request) {
    ctx, span := tracer.Start(r.Context(), "getPlaylist")
    defer span.End()

    // Call downstream service with context
    metadataResp, err := metadataService.GetTrack(ctx, trackID)
    if err != nil {
        span.RecordError(err)
        http.Error(w, err.Error(), 500)
        return
    }
    // ...
}

// Propagate context via gRPC metadata
// The gRPC interceptor automatically extracts/injects trace context.
Output
No direct output — code snippet.
Interview Gold: How to Debug a Slow Request
Look at the Jaeger trace. If the span for 'getPlaylist' is 500ms but the child span 'getTrackMetadata' is 450ms, the metadata service is the bottleneck. Then drill into that service's metrics.
● Production incidentPOST-MORTEMseverity: high

The Playlist That Broke Cassandra

Symptom
Playlist reads for a popular artist's album timed out for 30% of users during a new release Friday.
Assumption
Cassandra nodes were overloaded from traffic spike.
Root cause
The playlist partition key was (user_id, playlist_id). For the artist's official playlist, millions of users added it to their library, creating a 'hot partition' on the playlist_id. Cassandra's partition size limit (100MB default) was exceeded, causing compaction storms and read timeouts.
Fix
Redesign partition key to (playlist_id, user_id) with a bucketing strategy. Use a separate table for playlist metadata (small) and a wide-row table for playlist tracks with a composite key. Set gc_grace_seconds to 0 for that table to speed up repairs.
Key lesson
  • Never let a single partition grow unbounded.
  • Always estimate max partition size and use bucketing or time-based partitioning.
Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries
Symptom · 01
Cassandra read timeouts for playlist queries
Fix
1. Check if partition size > 100MB (nodetool cfstats). 2. If yes, redesign partition key with bucketing. 3. Increase read consistency to ONE (not QUORUM) to reduce latency. 4. Add more nodes to spread load.
Symptom · 02
Elasticsearch search returns stale results
Fix
1. Check refresh interval (GET /tracks/_settings). 2. If > 1s, reduce to 1s. 3. Check if indexing pipeline is lagging (Kafka consumer lag). 4. Increase consumer instances.
Symptom · 03
Audio streaming fails with 403 Forbidden
Fix
1. Check if signed URL expired (token expiry). 2. Verify user's subscription status. 3. Check CDN configuration for allowed origins.
★ Design Spotify Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.
Cassandra read timeout `CassandraTimeoutException: Operation timed out`
Immediate action
Check partition size
Commands
nodetool cfstats playlist_tracks | grep 'Partition size'
nodetool tablestats playlist_tracks
Fix now
If partition size > 100MB, redesign partition key with bucketing.
Elasticsearch high CPU `elasticsearch[pid]: high CPU usage`+
Immediate action
Check search rate and shard count
Commands
GET /_cat/nodes?v&h=name,cpu
GET /_cat/shards?v
Fix now
Reduce shard count by merging indices or increasing node count.
Kafka consumer lag `Consumer group 'playlist-indexer' has lag of 100000`+
Immediate action
Check consumer throughput
Commands
kafka-consumer-groups --bootstrap-server localhost:9092 --group playlist-indexer --describe
Check CPU/memory of consumer instances
Fix now
Increase number of consumer instances or optimize processing logic.
Redis cache miss rate > 50% `cache_miss_rate: 0.55`+
Immediate action
Check TTL and cache key pattern
Commands
redis-cli info stats | grep keyspace_misses
redis-cli --bigkeys
Fix now
Increase TTL for hot keys or pre-populate cache on write.
Feature / AspectCassandraRelational Database (PostgreSQL)
Write throughputVery high (linear scalability)Moderate (vertical scaling limit)
Read latencyLow for single partition queriesLow for indexed queries, but joins degrade
Data model flexibilityDenormalized, wide-columnNormalized, schema-on-write
ConsistencyEventual (tunable)Strong (ACID)
Operational complexityHigh (repair, compaction)Moderate (backup, replication)
Best forWrite-heavy, high-volume, simple queriesComplex queries, transactions, small datasets

Key takeaways

1
Polyglot persistence
use the right database for each workload — Cassandra for write-heavy playlists, Elasticsearch for search, Neo4j for social graphs.
2
Offload audio streaming to a CDN with signed URLs to reduce infrastructure load and improve latency.
3
Use event-driven architecture with Kafka to decouple services and enable async processing for search indexing, recommendations, and analytics.
4
Cache hot data in Redis to reduce database load, but beware of cache stampedes
use mutex locks or stale-while-revalidate.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does Spotify handle playlist consistency across devices when a user ...
Q02SENIOR
Why does Spotify use Cassandra for playlists instead of a relational dat...
Q03SENIOR
What happens when a popular playlist becomes a hot partition in Cassandr...
Q04JUNIOR
Explain how Spotify's recommendation pipeline works at a high level.
Q05SENIOR
A user reports that search for a new track returns no results even thoug...
Q06SENIOR
Design a system to handle 1 billion daily active users for a music strea...
Q01 of 06SENIOR

How does Spotify handle playlist consistency across devices when a user adds a track on mobile and then opens the desktop app?

ANSWER
Playlist data is stored in Cassandra with eventual consistency. When a track is added, it's written to Cassandra and a Kafka event is published. The desktop app fetches the playlist from Cassandra on load. Since Cassandra is eventually consistent, there may be a small delay (seconds) before the update is visible. Spotify accepts this trade-off for high write throughput.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
How does Spotify store playlists at scale?
02
What's the difference between Spotify's use of Cassandra and a relational database?
03
How do I implement a recommendation system like Spotify's?
04
What happens if a Cassandra node fails in Spotify's architecture?
N
Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.

Follow
Verified
production tested
June 25, 2026
last updated
1,663
articles · all by Naren
🔥

That's Real World. Mark it forged?

5 min read · try the examples if you haven't

Previous
Design Tinder
31 / 40 · Real World
Next
Design Zoom