Spotify's architecture uses microservices with gRPC for internal communication, Cassandra for playlist storage, and a custom content delivery network for audio streaming. They rely on event-driven pipelines for recommendations and offline processing for analytics.
✦ Definition~90s read
What is Design Spotify?
Design Spotify is the process of architecting a music streaming service that handles millions of concurrent users, stores billions of playlists, and delivers low-latency audio streaming. It involves distributed systems, microservices, and data pipelines for personalization.
★
Think of Spotify as a massive jukebox with a librarian who remembers every song you've ever played.
Plain-English First
Think of Spotify as a massive jukebox with a librarian who remembers every song you've ever played. The jukebox (CDN) stores the music files, the librarian (backend) keeps your playlists and history, and a recommendation engine (data pipeline) suggests new songs based on what you and others like. The trick is making all this work instantly for millions of people at once.
You're building a music streaming service. Day one, it works fine. Day 100, your database is on fire because you stored playlists in a relational database with joins. I've seen this exact pattern kill a startup's launch. The problem isn't the music — it's the metadata. Playlists, user history, social features — these are graph-like, write-heavy, and need to be available globally. Spotify handles 500M users with 100M+ tracks. They don't use a single database. They use a carefully chosen set of tools, each optimized for a specific job. After this article, you'll be able to design a system that scales to millions of users without falling over.
Why Microservices? The Monolith That Couldn't Stream
Spotify started as a monolith. By 2012, it was a nightmare. Deploying a change to the recommendation engine required redeploying the entire backend. A bug in playlist sharing could take down search. They split into microservices — each owning a bounded context: playlist service, social graph service, audio delivery service, search service, etc. This let them scale independently. The playlist service, for example, is write-heavy and uses Cassandra. The search service is read-heavy and uses Elasticsearch. They communicate via gRPC for low latency and Kafka for async events. The trade-off? Distributed debugging is harder. You need tracing (Jaeger) and structured logging. But for a system with 500M users, the isolation is worth it.
// io.thecodeforge — SystemDesign tutorial
// High-level service interaction for adding a track to a playlist
// Client -> APIGateway -> PlaylistService
// PlaylistService writes to Cassandra, publishes event to Kafka
// Kafka event consumed by SearchService (to index), RecommendationService (to update model)
// This is a sequence diagram in text:
// 1. ClientPOST /playlists/{id}/tracks
// 2. APIGateway authenticates, routes to PlaylistService
// 3. PlaylistService validates track exists (calls MetadataService via gRPC)
// 4. PlaylistService writes to Cassandra: INSERTINTOplaylist_tracks (playlist_id, track_id, added_at, position) VALUES (?, ?, ?, ?)
// 5. PlaylistService publishes event to Kafka topic 'playlist_updates'
// 6. SearchService consumes event, updates Elasticsearch index
// 7. RecommendationService consumes event, updates user's listening history in Cassandra
// Key: gRPC for synchronous calls (low latency), Kafkaforasync (decoupling)
Output
No direct output — architectural diagram described.
Production Trap: gRPC Without Timeouts
If you don't set gRPC deadlines, a slow downstream service can exhaust your connection pool. Always set a deadline (e.g., 500ms) and implement circuit breakers. I've seen a metadata service slow query cascade to all playlist writes.
thecodeforge.io
Spotify Production Architecture for 500M Users
Design Spotify
Cassandra for Playlists: Why Not SQL?
Playlists are write-heavy. Users add/remove tracks, reorder, share. A relational database with normalized tables (playlists, playlist_tracks, users) would require joins on every read. With millions of playlists and billions of tracks, joins become a bottleneck. Cassandra is a wide-column store that excels at write throughput and can handle large datasets with no single point of failure. The data model: playlist_tracks table with partition key playlist_id and clustering columns (position, track_id). This allows fast range queries for a playlist's tracks. The downside? No joins, no transactions across partitions. You have to denormalize. For example, store playlist metadata (name, owner) in a separate table with the same partition key. Spotify also uses Cassandra for user library (saved tracks, albums) and listening history. The trade-off is eventual consistency — but for playlists, that's acceptable. If a user adds a track and it doesn't appear immediately on another device, it's fine.
CassandraPlaylistSchema.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// io.thecodeforge — SystemDesign tutorial
// Cassandra schema for playlists
CREATETABLEIFNOTEXISTSplaylist_tracks (
playlist_id uuid,
position int,
track_id text,
added_at timestamp,
added_by text,
PRIMARYKEY (playlist_id, position)
) WITHCLUSTERINGORDERBY (position ASC);
// Query: get all tracks in a playlist ordered by position
SELECT * FROM playlist_tracks WHERE playlist_id = ? ORDERBY position ASC;
// Insert a track at the end
INSERTINTOplaylist_tracks (playlist_id, position, track_id, added_at, added_by)
VALUES (?, ?, ?, toTimestamp(now()), ?);
// Note: position is managed by the application. To insert at a specific position, shift subsequent positions.
// This is a batch operation within the same partition (Cassandra supports lightweight transactions for idempotency).
// For playlist metadata:
CREATETABLEIFNOTEXISTSplaylists (
playlist_id uuid PRIMARYKEY,
name text,
owner_id text,
description text,
created_at timestamp,
updated_at timestamp
);
Output
No direct output — schema definition.
Senior Shortcut: Use TimeUUID for Clustering
If you need time-ordered tracks, use TimeUUID as clustering key instead of position. This avoids the complexity of shifting positions. But if users expect manual ordering, stick with position.
thecodeforge.io
Playlist Storage: SQL vs Cassandra
Design Spotify
Audio Streaming: The CDN and Adaptive Bitrate
Streaming audio is the core of Spotify. They don't serve audio from their own servers — that would be insane. They use a CDN (Google Cloud CDN, Akamai, etc.) to cache audio files close to users. Audio files are encoded in multiple bitrates (96kbps, 160kbps, 320kbps Ogg Vorbis). The client selects the appropriate bitrate based on network conditions (adaptive bitrate streaming). The challenge: how does the client know which CDN edge to hit? Spotify uses a 'delivery service' that returns a signed URL to the closest CDN edge. The URL includes a token for authentication. The client then fetches the audio file directly from the CDN. This offloads traffic from Spotify's infrastructure. The CDN handles the heavy lifting of global distribution. The trade-off: cost. CDN egress is expensive. Spotify optimizes by caching popular tracks aggressively and using peer-to-peer for some scenarios (though that's less common now).
AudioDeliveryFlow.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
// io.thecodeforge — SystemDesign tutorial
// Client requests audio stream
// 1. Client sends GET /stream/{track_id} to APIGateway
// 2. APIGateway authenticates, calls DeliveryService
// 3. DeliveryService checks user's subscription (free vs premium) to determine bitrate
// 4. DeliveryService generates a signed URL: https://cdn.spotify.com/audio/{track_id}.ogg?token={jwt}&expires={timestamp}
// 5. Client receives URL, fetches from CDN
// 6. CDN checks if file is cached. If not, fetches from origin storage (GoogleCloudStorage)
// 7. Client plays audio, requests next chunk if needed
// Key: signed URL prevents unauthorized access. Expiry time is short (e.g., 1 hour).
// For live streaming, use HLS (HTTPLiveStreaming) with .m3u8 playlist.
Output
No direct output — flow description.
Interview Gold: CDN Cache Invalidation
When a new version of a track is uploaded (e.g., remastered), you need to invalidate the CDN cache. Use a versioned URL (e.g., /audio/{track_id}/{version}.ogg) or purge the cache via CDN API. Spotify uses versioned URLs to avoid cache invalidation storms.
Recommendations: The Offline Pipeline That Never Sleeps
Spotify's recommendation engine is the secret sauce. It's not real-time — it's a batch processing pipeline that runs periodically (daily or hourly). It uses collaborative filtering, natural language processing (on song lyrics, blog posts), and audio analysis (tempo, key, loudness). The pipeline is built on Apache Beam (or Scio, a Scala wrapper for Beam) running on Google Cloud Dataflow. It processes user listening history (stored in Cassandra) and track features (stored in Bigtable) to generate recommendations. The output is stored in a 'recommendation table' in Cassandra, keyed by user_id. When a user opens the app, the client fetches recommendations from this table. The trade-off: recommendations are not real-time. If a user listens to a new genre, it won't affect recommendations until the next pipeline run. But that's acceptable — users don't expect instant personalization.
RecommendationPipeline.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
// io.thecodeforge — SystemDesign tutorial
// Pseudo-code for recommendation pipeline (simplified)
// 1. Read user listening history from Cassandra (last 30 days)
// 2. For each user, compute vector of liked tracks (weighted by play count, skip rate)
// 3. Use collaborative filtering to find similar users
// 4. For each user, find tracks liked by similar users that the user hasn't heard
// 5. Rank tracks by predicted affinity score
// 6. Write top 50 recommendations to Cassandra: INSERTINTOrecommendations (user_id, track_id, score) VALUES (?, ?, ?)
// This runs as a Dataflow pipeline with sliding windows.
// For real-time personalization (e.g., DiscoverWeekly), use a separate pipeline that updates daily.
Output
No direct output — pipeline description.
Production Trap: Data Skew in Collaborative Filtering
Popular tracks get recommended to everyone, creating a feedback loop. Mitigate by downweighting popular tracks or using a 'popularity penalty' in the scoring function. Otherwise, your recommendations become a top-40 chart.
Search: Elasticsearch at Scale
Search is critical. Users search for tracks, albums, artists, playlists. Spotify uses Elasticsearch with a custom scoring function that combines text relevance (BM25) with popularity signals. The index is sharded across multiple nodes. Updates come from Kafka events (when a new track is added, metadata changes). The challenge: keeping the index fresh. Spotify uses near-real-time indexing with a refresh interval of 1 second. For large updates (e.g., new album), they use bulk indexing. The trade-off: Elasticsearch is memory-hungry. Each shard consumes heap. They optimize by using doc-values for aggregations and avoiding heavy analyzers on large fields.
Always use index aliases for zero-downtime reindexing. If you need to change mappings, create a new index, reindex, then swap alias. Otherwise, you'll have downtime or corrupted data.
Social Features: Graph Database for Friends and Follows
Spotify has social features: follow friends, see their playlists, collaborative playlists. This is a graph problem. They use a graph database (Neo4j) for the social graph. Why not relational? Because queries like 'find all friends of friends who listened to this track' are expensive in SQL with recursive joins. Neo4j handles these with ease. The social graph is write-heavy (follow/unfollow) but read-heavy for recommendations. They replicate the graph to Cassandra for read-heavy workloads (denormalized). The trade-off: maintaining consistency between Neo4j and Cassandra. They use an event-driven approach: when a follow happens, write to Neo4j, then publish event to Kafka, which updates Cassandra. This is eventually consistent, but acceptable.
SocialGraphNeo4j.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
// io.thecodeforge — SystemDesign tutorial
// Neo4jCypher query to find friends who listened to a track
MATCH (u:User {id: $user_id})-[:FOLLOWS]->(friend:User)-[:LISTENED]->(track:Track {id: $track_id})
RETURN friend.id
// This is efficient because Neo4j traverses relationships in constant time per hop.
// InSQL, this would require multiple joins or recursive CTEs.
// For write-heavy operations, use batch inserts:
UNWIND $relationships AS rel
MATCH (u:User {id: rel.follower}), (v:User {id: rel.followee})
MERGE (u)-[:FOLLOWS]->(v)
Output
No direct output — Cypher queries.
Senior Shortcut: Use a Separate Graph Service
Isolate the graph database behind a microservice. This prevents schema changes in the graph from affecting other services. Also, it allows you to cache frequently accessed graph queries in Redis.
Event-Driven Architecture with Kafka
Kafka is the backbone of Spotify's async communication. Every significant action (play track, add to playlist, follow user) publishes an event to Kafka. Multiple consumers process these events: search indexing, recommendation pipeline, analytics, billing (for premium). This decouples services and allows independent scaling. The key design: use Avro for schema evolution. Spotify has a schema registry to ensure compatibility. They also use Kafka Streams for real-time processing (e.g., updating user session state). The trade-off: Kafka adds latency (milliseconds) and operational complexity. But for a system with hundreds of microservices, it's essential.
Production Trap: Schema Evolution Backward Compatibility
Always set default values for new fields. Otherwise, old consumers will break when they encounter a new schema. Use the schema registry to enforce compatibility checks.
thecodeforge.io
Event Flow: User Action to Downstream Services
Design Spotify
Caching: Redis for Hot Data
Spotify uses Redis extensively for caching: user sessions, recently played tracks, top charts, and playlist metadata for popular playlists. The key is to cache only hot data. For example, the top 1% of playlists account for 80% of reads. They cache these in Redis with a TTL of 5 minutes. For less popular playlists, they fall through to Cassandra. They also use Redis for rate limiting and distributed locks. The trade-off: cache invalidation is hard. They use a write-through cache: when a playlist is updated, they invalidate the Redis key and let the next read repopulate it from Cassandra. This ensures consistency at the cost of a cache miss.
RedisCachingPlaylist.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// io.thecodeforge — SystemDesign tutorial
// Pseudo-code for caching playlist tracks
// On read:
function getPlaylistTracks(playlistId):
cached = redis.get("playlist:" + playlistId)
if cached:
returndeserialize(cached)
tracks = cassandra.execute("SELECT * FROM playlist_tracks WHERE playlist_id = ?", playlistId)
redis.setex("playlist:" + playlistId, 300, serialize(tracks)) // TTL5 min
return tracks
// Onwrite (add track):
function addTrackToPlaylist(playlistId, trackId):
cassandra.execute("INSERT INTO playlist_tracks ...", playlistId, trackId)
redis.del("playlist:" + playlistId) // invalidate cache
// Also publish event to Kafkafor search/recommendations
Output
No direct output — pseudo-code.
The Classic Bug: Cache Stampede
When a popular playlist's cache expires, multiple requests may hit Cassandra simultaneously. Mitigate with a mutex lock (SETNX) or use a 'stale-while-revalidate' pattern: serve stale data and refresh in background.
Monitoring and Observability
With hundreds of services, you need centralized logging, metrics, and tracing. Spotify uses Google Cloud Monitoring for metrics, Jaeger for distributed tracing, and ELK stack for logs. Every service exposes Prometheus metrics (request rate, latency, error rate). They have dashboards for each service and SLOs (e.g., playlist read latency p99 < 100ms). Tracing is critical for debugging latency issues across services. They use a correlation ID propagated via gRPC metadata. The trade-off: instrumentation adds overhead. They sample traces (1% of requests) to keep costs down. But when debugging an incident, they can increase sampling rate dynamically.
TracingSetup.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// io.thecodeforge — SystemDesign tutorial
// InGo, using OpenTelemetryimport (
"go.opentelemetry.io/otel""go.opentelemetry.io/otel/trace"
)
// Create a tracer
tracer := otel.Tracer("playlist-service")
// In a request handler
func handleGetPlaylist(w http.ResponseWriter, r *http.Request) {
ctx, span := tracer.Start(r.Context(), "getPlaylist")
defer span.End()
// Call downstream service with context
metadataResp, err := metadataService.GetTrack(ctx, trackID)
if err != nil {
span.RecordError(err)
http.Error(w, err.Error(), 500)
return
}
// ...
}
// Propagate context via gRPC metadata
// The gRPC interceptor automatically extracts/injects trace context.
Output
No direct output — code snippet.
Interview Gold: How to Debug a Slow Request
Look at the Jaeger trace. If the span for 'getPlaylist' is 500ms but the child span 'getTrackMetadata' is 450ms, the metadata service is the bottleneck. Then drill into that service's metrics.
● Production incidentPOST-MORTEMseverity: high
The Playlist That Broke Cassandra
Symptom
Playlist reads for a popular artist's album timed out for 30% of users during a new release Friday.
Assumption
Cassandra nodes were overloaded from traffic spike.
Root cause
The playlist partition key was (user_id, playlist_id). For the artist's official playlist, millions of users added it to their library, creating a 'hot partition' on the playlist_id. Cassandra's partition size limit (100MB default) was exceeded, causing compaction storms and read timeouts.
Fix
Redesign partition key to (playlist_id, user_id) with a bucketing strategy. Use a separate table for playlist metadata (small) and a wide-row table for playlist tracks with a composite key. Set gc_grace_seconds to 0 for that table to speed up repairs.
Key lesson
Never let a single partition grow unbounded.
Always estimate max partition size and use bucketing or time-based partitioning.
Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries
Symptom · 01
Cassandra read timeouts for playlist queries
→
Fix
1. Check if partition size > 100MB (nodetool cfstats). 2. If yes, redesign partition key with bucketing. 3. Increase read consistency to ONE (not QUORUM) to reduce latency. 4. Add more nodes to spread load.
Symptom · 02
Elasticsearch search returns stale results
→
Fix
1. Check refresh interval (GET /tracks/_settings). 2. If > 1s, reduce to 1s. 3. Check if indexing pipeline is lagging (Kafka consumer lag). 4. Increase consumer instances.
Symptom · 03
Audio streaming fails with 403 Forbidden
→
Fix
1. Check if signed URL expired (token expiry). 2. Verify user's subscription status. 3. Check CDN configuration for allowed origins.
★ Design Spotify Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.
Increase number of consumer instances or optimize processing logic.
Redis cache miss rate > 50% `cache_miss_rate: 0.55`+
Immediate action
Check TTL and cache key pattern
Commands
redis-cli info stats | grep keyspace_misses
redis-cli --bigkeys
Fix now
Increase TTL for hot keys or pre-populate cache on write.
Feature / Aspect
Cassandra
Relational Database (PostgreSQL)
Write throughput
Very high (linear scalability)
Moderate (vertical scaling limit)
Read latency
Low for single partition queries
Low for indexed queries, but joins degrade
Data model flexibility
Denormalized, wide-column
Normalized, schema-on-write
Consistency
Eventual (tunable)
Strong (ACID)
Operational complexity
High (repair, compaction)
Moderate (backup, replication)
Best for
Write-heavy, high-volume, simple queries
Complex queries, transactions, small datasets
Key takeaways
1
Polyglot persistence
use the right database for each workload — Cassandra for write-heavy playlists, Elasticsearch for search, Neo4j for social graphs.
2
Offload audio streaming to a CDN with signed URLs to reduce infrastructure load and improve latency.
3
Use event-driven architecture with Kafka to decouple services and enable async processing for search indexing, recommendations, and analytics.
4
Cache hot data in Redis to reduce database load, but beware of cache stampedes
use mutex locks or stale-while-revalidate.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
How does Spotify handle playlist consistency across devices when a user ...
Q02SENIOR
Why does Spotify use Cassandra for playlists instead of a relational dat...
Q03SENIOR
What happens when a popular playlist becomes a hot partition in Cassandr...
Q04JUNIOR
Explain how Spotify's recommendation pipeline works at a high level.
Q05SENIOR
A user reports that search for a new track returns no results even thoug...
Q06SENIOR
Design a system to handle 1 billion daily active users for a music strea...
Q01 of 06SENIOR
How does Spotify handle playlist consistency across devices when a user adds a track on mobile and then opens the desktop app?
ANSWER
Playlist data is stored in Cassandra with eventual consistency. When a track is added, it's written to Cassandra and a Kafka event is published. The desktop app fetches the playlist from Cassandra on load. Since Cassandra is eventually consistent, there may be a small delay (seconds) before the update is visible. Spotify accepts this trade-off for high write throughput.
Q02 of 06SENIOR
Why does Spotify use Cassandra for playlists instead of a relational database like PostgreSQL?
ANSWER
Playlists are write-heavy with simple queries (get all tracks by playlist_id). Cassandra provides linear scalability for writes and low latency for single-partition queries. A relational database would require joins and struggle with write throughput at Spotify's scale. The trade-off is giving up joins and transactions, which are not needed for playlists.
Q03 of 06SENIOR
What happens when a popular playlist becomes a hot partition in Cassandra? How would you mitigate it?
ANSWER
A hot partition occurs when a single partition (playlist_id) receives too many reads/writes. Symptoms: high latency, compaction storms. Mitigation: use a composite partition key (e.g., playlist_id + bucket_id) to distribute load. Also, cache the playlist in Redis to reduce Cassandra reads.
Q04 of 06JUNIOR
Explain how Spotify's recommendation pipeline works at a high level.
ANSWER
It's a batch pipeline using Apache Beam on Dataflow. It processes user listening history (from Cassandra) and track features (from Bigtable) to compute recommendations using collaborative filtering and NLP. Results are stored in Cassandra and served to clients. It runs daily, so recommendations are not real-time.
Q05 of 06SENIOR
A user reports that search for a new track returns no results even though it was added an hour ago. What could be wrong?
ANSWER
The search index may not have been updated. Check the Kafka consumer lag for the search indexing pipeline. If lag is high, the consumer may be overwhelmed. Also check Elasticsearch refresh interval (should be 1s). If the track was added via a bulk import, ensure the indexing job completed.
Q06 of 06SENIOR
Design a system to handle 1 billion daily active users for a music streaming service. What are the key components?
ANSWER
Key components: API Gateway, microservices (playlist, search, social, audio delivery), Cassandra for playlists, Elasticsearch for search, Neo4j for social graph, Kafka for async events, CDN for audio, Redis for caching, and a batch pipeline for recommendations. Each component must scale independently. Use sharding and replication for databases. Use global load balancers and multiple data centers for low latency.
01
How does Spotify handle playlist consistency across devices when a user adds a track on mobile and then opens the desktop app?
SENIOR
02
Why does Spotify use Cassandra for playlists instead of a relational database like PostgreSQL?
SENIOR
03
What happens when a popular playlist becomes a hot partition in Cassandra? How would you mitigate it?
SENIOR
04
Explain how Spotify's recommendation pipeline works at a high level.
JUNIOR
05
A user reports that search for a new track returns no results even though it was added an hour ago. What could be wrong?
SENIOR
06
Design a system to handle 1 billion daily active users for a music streaming service. What are the key components?
SENIOR
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
How does Spotify store playlists at scale?
Spotify uses Cassandra, a distributed NoSQL database, to store playlists. Each playlist is a partition key, and tracks are stored as rows with a clustering column for order. This allows fast reads and high write throughput.
Was this helpful?
02
What's the difference between Spotify's use of Cassandra and a relational database?
Cassandra is optimized for write-heavy workloads and simple queries, while relational databases support complex joins and transactions. Spotify chose Cassandra for playlists because they need to handle billions of writes with low latency and linear scalability.
Was this helpful?
03
How do I implement a recommendation system like Spotify's?
Use a batch processing pipeline (e.g., Apache Beam) that runs periodically. Process user listening history and track features using collaborative filtering and content-based filtering. Store results in a fast database like Cassandra for serving.
Was this helpful?
04
What happens if a Cassandra node fails in Spotify's architecture?
Cassandra is designed for fault tolerance. Data is replicated across multiple nodes (typically 3). If a node fails, requests are routed to replicas. The failed node is automatically repaired when it comes back online using hinted handoff and read repair.