Senior 5 min · June 25, 2026

Design Spotify: A Production System Architecture That Handles 500M Users

Q: How does Spotify store playlists at scale?

Spotify uses Cassandra, a distributed NoSQL database, to store playlists. Each playlist is a partition key, and tracks are stored as rows with a clustering column for order. This allows fast reads and high write throughput.

Q: What's the difference between Spotify's use of Cassandra and a relational database?

Cassandra is optimized for write-heavy workloads and simple queries, while relational databases support complex joins and transactions. Spotify chose Cassandra for playlists because they need to handle billions of writes with low latency and linear scalability.

Q: How do I implement a recommendation system like Spotify's?

Use a batch processing pipeline (e.g., Apache Beam) that runs periodically. Process user listening history and track features using collaborative filtering and content-based filtering. Store results in a fast database like Cassandra for serving.

Q: What happens if a Cassandra node fails in Spotify's architecture?

Cassandra is designed for fault tolerance. Data is replicated across multiple nodes (typically 3). If a node fails, requests are routed to replicas. The failed node is automatically repaired when it comes back online using hinted handoff and read repair.

Design Spotify's real-world system architecture for 500M users.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

Spotify's architecture uses microservices with gRPC for internal communication, Cassandra for playlist storage, and a custom content delivery network for audio streaming. They rely on event-driven pipelines for recommendations and offline processing for analytics.

✦ Definition~90s read

What is Design Spotify?

Design Spotify is the process of architecting a music streaming service that handles millions of concurrent users, stores billions of playlists, and delivers low-latency audio streaming. It involves distributed systems, microservices, and data pipelines for personalization.

★

Think of Spotify as a massive jukebox with a librarian who remembers every song you've ever played.

Plain-English First

Think of Spotify as a massive jukebox with a librarian who remembers every song you've ever played. The jukebox (CDN) stores the music files, the librarian (backend) keeps your playlists and history, and a recommendation engine (data pipeline) suggests new songs based on what you and others like. The trick is making all this work instantly for millions of people at once.

You're building a music streaming service. Day one, it works fine. Day 100, your database is on fire because you stored playlists in a relational database with joins. I've seen this exact pattern kill a startup's launch. The problem isn't the music — it's the metadata. Playlists, user history, social features — these are graph-like, write-heavy, and need to be available globally. Spotify handles 500M users with 100M+ tracks. They don't use a single database. They use a carefully chosen set of tools, each optimized for a specific job. After this article, you'll be able to design a system that scales to millions of users without falling over.

Why Microservices? The Monolith That Couldn't Stream

Spotify started as a monolith. By 2012, it was a nightmare. Deploying a change to the recommendation engine required redeploying the entire backend. A bug in playlist sharing could take down search. They split into microservices — each owning a bounded context: playlist service, social graph service, audio delivery service, search service, etc. This let them scale independently. The playlist service, for example, is write-heavy and uses Cassandra. The search service is read-heavy and uses Elasticsearch. They communicate via gRPC for low latency and Kafka for async events. The trade-off? Distributed debugging is harder. You need tracing (Jaeger) and structured logging. But for a system with 500M users, the isolation is worth it.

SpotifyMicroserviceArchitecture.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// High-level service interaction for adding a track to a playlist

// Client -> API Gateway -> Playlist Service
// Playlist Service writes to Cassandra, publishes event to Kafka
// Kafka event consumed by Search Service (to index), Recommendation Service (to update model)

// This is a sequence diagram in text:
// 1. Client POST /playlists/{id}/tracks
// 2. API Gateway authenticates, routes to Playlist Service
// 3. Playlist Service validates track exists (calls Metadata Service via gRPC)
// 4. Playlist Service writes to Cassandra: INSERT INTO playlist_tracks (playlist_id, track_id, added_at, position) VALUES (?, ?, ?, ?)
// 5. Playlist Service publishes event to Kafka topic 'playlist_updates'
// 6. Search Service consumes event, updates Elasticsearch index
// 7. Recommendation Service consumes event, updates user's listening history in Cassandra

// Key: gRPC for synchronous calls (low latency), Kafka for async (decoupling)

Output

No direct output — architectural diagram described.

Production Trap: gRPC Without Timeouts

If you don't set gRPC deadlines, a slow downstream service can exhaust your connection pool. Always set a deadline (e.g., 500ms) and implement circuit breakers. I've seen a metadata service slow query cascade to all playlist writes.

thecodeforge.io

Spotify Production Architecture for 500M Users

Design Spotify

Cassandra for Playlists: Why Not SQL?

Playlists are write-heavy. Users add/remove tracks, reorder, share. A relational database with normalized tables (playlists, playlist_tracks, users) would require joins on every read. With millions of playlists and billions of tracks, joins become a bottleneck. Cassandra is a wide-column store that excels at write throughput and can handle large datasets with no single point of failure. The data model: playlist_tracks table with partition key playlist_id and clustering columns (position, track_id). This allows fast range queries for a playlist's tracks. The downside? No joins, no transactions across partitions. You have to denormalize. For example, store playlist metadata (name, owner) in a separate table with the same partition key. Spotify also uses Cassandra for user library (saved tracks, albums) and listening history. The trade-off is eventual consistency — but for playlists, that's acceptable. If a user adds a track and it doesn't appear immediately on another device, it's fine.

CassandraPlaylistSchema.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Cassandra schema for playlists

CREATE TABLE IF NOT EXISTS playlist_tracks (
    playlist_id uuid,
    position int,
    track_id text,
    added_at timestamp,
    added_by text,
    PRIMARY KEY (playlist_id, position)
) WITH CLUSTERING ORDER BY (position ASC);

// Query: get all tracks in a playlist ordered by position
SELECT * FROM playlist_tracks WHERE playlist_id = ? ORDER BY position ASC;

// Insert a track at the end
INSERT INTO playlist_tracks (playlist_id, position, track_id, added_at, added_by)
VALUES (?, ?, ?, toTimestamp(now()), ?);

// Note: position is managed by the application. To insert at a specific position, shift subsequent positions.
// This is a batch operation within the same partition (Cassandra supports lightweight transactions for idempotency).

// For playlist metadata:
CREATE TABLE IF NOT EXISTS playlists (
    playlist_id uuid PRIMARY KEY,
    name text,
    owner_id text,
    description text,
    created_at timestamp,
    updated_at timestamp
);

Output

No direct output — schema definition.

Senior Shortcut: Use TimeUUID for Clustering

If you need time-ordered tracks, use TimeUUID as clustering key instead of position. This avoids the complexity of shifting positions. But if users expect manual ordering, stick with position.

thecodeforge.io

Playlist Storage: SQL vs Cassandra

Design Spotify

Audio Streaming: The CDN and Adaptive Bitrate

Streaming audio is the core of Spotify. They don't serve audio from their own servers — that would be insane. They use a CDN (Google Cloud CDN, Akamai, etc.) to cache audio files close to users. Audio files are encoded in multiple bitrates (96kbps, 160kbps, 320kbps Ogg Vorbis). The client selects the appropriate bitrate based on network conditions (adaptive bitrate streaming). The challenge: how does the client know which CDN edge to hit? Spotify uses a 'delivery service' that returns a signed URL to the closest CDN edge. The URL includes a token for authentication. The client then fetches the audio file directly from the CDN. This offloads traffic from Spotify's infrastructure. The CDN handles the heavy lifting of global distribution. The trade-off: cost. CDN egress is expensive. Spotify optimizes by caching popular tracks aggressively and using peer-to-peer for some scenarios (though that's less common now).

AudioDeliveryFlow.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Client requests audio stream
// 1. Client sends GET /stream/{track_id} to API Gateway
// 2. API Gateway authenticates, calls Delivery Service
// 3. Delivery Service checks user's subscription (free vs premium) to determine bitrate
// 4. Delivery Service generates a signed URL: https://cdn.spotify.com/audio/{track_id}.ogg?token={jwt}&expires={timestamp}
// 5. Client receives URL, fetches from CDN
// 6. CDN checks if file is cached. If not, fetches from origin storage (Google Cloud Storage)
// 7. Client plays audio, requests next chunk if needed

// Key: signed URL prevents unauthorized access. Expiry time is short (e.g., 1 hour).
// For live streaming, use HLS (HTTP Live Streaming) with .m3u8 playlist.

Output

No direct output — flow description.

Interview Gold: CDN Cache Invalidation

When a new version of a track is uploaded (e.g., remastered), you need to invalidate the CDN cache. Use a versioned URL (e.g., /audio/{track_id}/{version}.ogg) or purge the cache via CDN API. Spotify uses versioned URLs to avoid cache invalidation storms.

Recommendations: The Offline Pipeline That Never Sleeps

Spotify's recommendation engine is the secret sauce. It's not real-time — it's a batch processing pipeline that runs periodically (daily or hourly). It uses collaborative filtering, natural language processing (on song lyrics, blog posts), and audio analysis (tempo, key, loudness). The pipeline is built on Apache Beam (or Scio, a Scala wrapper for Beam) running on Google Cloud Dataflow. It processes user listening history (stored in Cassandra) and track features (stored in Bigtable) to generate recommendations. The output is stored in a 'recommendation table' in Cassandra, keyed by user_id. When a user opens the app, the client fetches recommendations from this table. The trade-off: recommendations are not real-time. If a user listens to a new genre, it won't affect recommendations until the next pipeline run. But that's acceptable — users don't expect instant personalization.

RecommendationPipeline.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Pseudo-code for recommendation pipeline (simplified)

// 1. Read user listening history from Cassandra (last 30 days)
// 2. For each user, compute vector of liked tracks (weighted by play count, skip rate)
// 3. Use collaborative filtering to find similar users
// 4. For each user, find tracks liked by similar users that the user hasn't heard
// 5. Rank tracks by predicted affinity score
// 6. Write top 50 recommendations to Cassandra: INSERT INTO recommendations (user_id, track_id, score) VALUES (?, ?, ?)

// This runs as a Dataflow pipeline with sliding windows.
// For real-time personalization (e.g., Discover Weekly), use a separate pipeline that updates daily.

Output

No direct output — pipeline description.

Production Trap: Data Skew in Collaborative Filtering

Popular tracks get recommended to everyone, creating a feedback loop. Mitigate by downweighting popular tracks or using a 'popularity penalty' in the scoring function. Otherwise, your recommendations become a top-40 chart.

Search: Elasticsearch at Scale

Search is critical. Users search for tracks, albums, artists, playlists. Spotify uses Elasticsearch with a custom scoring function that combines text relevance (BM25) with popularity signals. The index is sharded across multiple nodes. Updates come from Kafka events (when a new track is added, metadata changes). The challenge: keeping the index fresh. Spotify uses near-real-time indexing with a refresh interval of 1 second. For large updates (e.g., new album), they use bulk indexing. The trade-off: Elasticsearch is memory-hungry. Each shard consumes heap. They optimize by using doc-values for aggregations and avoiding heavy analyzers on large fields.

ElasticsearchMapping.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Elasticsearch mapping for tracks
PUT /tracks
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "refresh_interval": "1s"
  },
  "mappings": {
    "properties": {
      "track_name": { "type": "text", "analyzer": "standard" },
      "artist_name": { "type": "text", "analyzer": "standard" },
      "album_name": { "type": "text", "analyzer": "standard" },
      "popularity": { "type": "integer" },
      "duration_ms": { "type": "integer" },
      "genre": { "type": "keyword" },
      "release_date": { "type": "date" }
    }
  }
}

// Query: search for tracks by name or artist
GET /tracks/_search
{
  "query": {
    "multi_match": {
      "query": "bohemian rhapsody",
      "fields": ["track_name^3", "artist_name^2", "album_name"],
      "type": "best_fields"
    }
  },
  "sort": [
    { "popularity": "desc" }
  ]
}

Output

No direct output — mapping and query.

Never Do This: Indexing Without Aliases

Always use index aliases for zero-downtime reindexing. If you need to change mappings, create a new index, reindex, then swap alias. Otherwise, you'll have downtime or corrupted data.

Spotify has social features: follow friends, see their playlists, collaborative playlists. This is a graph problem. They use a graph database (Neo4j) for the social graph. Why not relational? Because queries like 'find all friends of friends who listened to this track' are expensive in SQL with recursive joins. Neo4j handles these with ease. The social graph is write-heavy (follow/unfollow) but read-heavy for recommendations. They replicate the graph to Cassandra for read-heavy workloads (denormalized). The trade-off: maintaining consistency between Neo4j and Cassandra. They use an event-driven approach: when a follow happens, write to Neo4j, then publish event to Kafka, which updates Cassandra. This is eventually consistent, but acceptable.

SocialGraphNeo4j.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Neo4j Cypher query to find friends who listened to a track
MATCH (u:User {id: $user_id})-[:FOLLOWS]->(friend:User)-[:LISTENED]->(track:Track {id: $track_id})
RETURN friend.id

// This is efficient because Neo4j traverses relationships in constant time per hop.
// In SQL, this would require multiple joins or recursive CTEs.

// For write-heavy operations, use batch inserts:
UNWIND $relationships AS rel
MATCH (u:User {id: rel.follower}), (v:User {id: rel.followee})
MERGE (u)-[:FOLLOWS]->(v)

Output

No direct output — Cypher queries.

Senior Shortcut: Use a Separate Graph Service

Isolate the graph database behind a microservice. This prevents schema changes in the graph from affecting other services. Also, it allows you to cache frequently accessed graph queries in Redis.

Event-Driven Architecture with Kafka

Kafka is the backbone of Spotify's async communication. Every significant action (play track, add to playlist, follow user) publishes an event to Kafka. Multiple consumers process these events: search indexing, recommendation pipeline, analytics, billing (for premium). This decouples services and allows independent scaling. The key design: use Avro for schema evolution. Spotify has a schema registry to ensure compatibility. They also use Kafka Streams for real-time processing (e.g., updating user session state). The trade-off: Kafka adds latency (milliseconds) and operational complexity. But for a system with hundreds of microservices, it's essential.

KafkaEventSchema.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Avro schema for track play event
{
  "type": "record",
  "name": "TrackPlayEvent",
  "namespace": "com.spotify.event",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "track_id", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "context", "type": ["null", "string"], "default": null}, // e.g., "playlist:123"
    {"name": "source", "type": ["null", "string"], "default": null} // e.g., "search", "recommendation"
  ]
}

// Producer (in Python using confluent-kafka)
producer.produce('track_plays', value=avro_serializer(event, schema))

// Consumer (in Java)
KafkaConsumer<String, TrackPlayEvent> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("track_plays"));
while (true) {
    ConsumerRecords<String, TrackPlayEvent> records = consumer.poll(100);
    for (ConsumerRecord<String, TrackPlayEvent> record : records) {
        processPlayEvent(record.value());
    }
}

Output

No direct output — schema and code.

Production Trap: Schema Evolution Backward Compatibility

Always set default values for new fields. Otherwise, old consumers will break when they encounter a new schema. Use the schema registry to enforce compatibility checks.

thecodeforge.io

Event Flow: User Action to Downstream Services

Design Spotify

Caching: Redis for Hot Data

Spotify uses Redis extensively for caching: user sessions, recently played tracks, top charts, and playlist metadata for popular playlists. The key is to cache only hot data. For example, the top 1% of playlists account for 80% of reads. They cache these in Redis with a TTL of 5 minutes. For less popular playlists, they fall through to Cassandra. They also use Redis for rate limiting and distributed locks. The trade-off: cache invalidation is hard. They use a write-through cache: when a playlist is updated, they invalidate the Redis key and let the next read repopulate it from Cassandra. This ensures consistency at the cost of a cache miss.

RedisCachingPlaylist.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Pseudo-code for caching playlist tracks

// On read:
function getPlaylistTracks(playlistId):
    cached = redis.get("playlist:" + playlistId)
    if cached:
        return deserialize(cached)
    tracks = cassandra.execute("SELECT * FROM playlist_tracks WHERE playlist_id = ?", playlistId)
    redis.setex("playlist:" + playlistId, 300, serialize(tracks)) // TTL 5 min
    return tracks

// On write (add track):
function addTrackToPlaylist(playlistId, trackId):
    cassandra.execute("INSERT INTO playlist_tracks ...", playlistId, trackId)
    redis.del("playlist:" + playlistId) // invalidate cache
    // Also publish event to Kafka for search/recommendations

Output

No direct output — pseudo-code.

The Classic Bug: Cache Stampede

When a popular playlist's cache expires, multiple requests may hit Cassandra simultaneously. Mitigate with a mutex lock (SETNX) or use a 'stale-while-revalidate' pattern: serve stale data and refresh in background.

Monitoring and Observability

With hundreds of services, you need centralized logging, metrics, and tracing. Spotify uses Google Cloud Monitoring for metrics, Jaeger for distributed tracing, and ELK stack for logs. Every service exposes Prometheus metrics (request rate, latency, error rate). They have dashboards for each service and SLOs (e.g., playlist read latency p99 < 100ms). Tracing is critical for debugging latency issues across services. They use a correlation ID propagated via gRPC metadata. The trade-off: instrumentation adds overhead. They sample traces (1% of requests) to keep costs down. But when debugging an incident, they can increase sampling rate dynamically.

TracingSetup.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// In Go, using OpenTelemetry
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/trace"
)

// Create a tracer
tracer := otel.Tracer("playlist-service")

// In a request handler
func handleGetPlaylist(w http.ResponseWriter, r *http.Request) {
    ctx, span := tracer.Start(r.Context(), "getPlaylist")
    defer span.End()

    // Call downstream service with context
    metadataResp, err := metadataService.GetTrack(ctx, trackID)
    if err != nil {
        span.RecordError(err)
        http.Error(w, err.Error(), 500)
        return
    }
    // ...
}

// Propagate context via gRPC metadata
// The gRPC interceptor automatically extracts/injects trace context.

Output

No direct output — code snippet.

Interview Gold: How to Debug a Slow Request

Look at the Jaeger trace. If the span for 'getPlaylist' is 500ms but the child span 'getTrackMetadata' is 450ms, the metadata service is the bottleneck. Then drill into that service's metrics.

● Production incidentPOST-MORTEMseverity: high

The Playlist That Broke Cassandra

Symptom

Playlist reads for a popular artist's album timed out for 30% of users during a new release Friday.

Assumption

Cassandra nodes were overloaded from traffic spike.

Root cause

The playlist partition key was (user_id, playlist_id). For the artist's official playlist, millions of users added it to their library, creating a 'hot partition' on the playlist_id. Cassandra's partition size limit (100MB default) was exceeded, causing compaction storms and read timeouts.

Fix

Redesign partition key to (playlist_id, user_id) with a bucketing strategy. Use a separate table for playlist metadata (small) and a wide-row table for playlist tracks with a composite key. Set gc_grace_seconds to 0 for that table to speed up repairs.

Key lesson

Never let a single partition grow unbounded.
Always estimate max partition size and use bucketing or time-based partitioning.

Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries

Symptom · 01

Cassandra read timeouts for playlist queries

→

Fix

1. Check if partition size > 100MB (nodetool cfstats). 2. If yes, redesign partition key with bucketing. 3. Increase read consistency to ONE (not QUORUM) to reduce latency. 4. Add more nodes to spread load.

Symptom · 02

Elasticsearch search returns stale results

→

Fix

1. Check refresh interval (GET /tracks/_settings). 2. If > 1s, reduce to 1s. 3. Check if indexing pipeline is lagging (Kafka consumer lag). 4. Increase consumer instances.

Symptom · 03

Audio streaming fails with 403 Forbidden

→

Fix

1. Check if signed URL expired (token expiry). 2. Verify user's subscription status. 3. Check CDN configuration for allowed origins.

★ Design Spotify Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.

Cassandra read timeout `CassandraTimeoutException: Operation timed out`−

Immediate action

Check partition size

Commands

nodetool cfstats playlist_tracks | grep 'Partition size'

nodetool tablestats playlist_tracks

Fix now

If partition size > 100MB, redesign partition key with bucketing.

Elasticsearch high CPU `elasticsearch[pid]: high CPU usage`+

Kafka consumer lag `Consumer group 'playlist-indexer' has lag of 100000`+

Redis cache miss rate > 50% `cache_miss_rate: 0.55`+

Feature / Aspect	Cassandra	Relational Database (PostgreSQL)
Write throughput	Very high (linear scalability)	Moderate (vertical scaling limit)
Read latency	Low for single partition queries	Low for indexed queries, but joins degrade
Data model flexibility	Denormalized, wide-column	Normalized, schema-on-write
Consistency	Eventual (tunable)	Strong (ACID)
Operational complexity	High (repair, compaction)	Moderate (backup, replication)
Best for	Write-heavy, high-volume, simple queries	Complex queries, transactions, small datasets

Key takeaways

Polyglot persistence

use the right database for each workload — Cassandra for write-heavy playlists, Elasticsearch for search, Neo4j for social graphs.

Offload audio streaming to a CDN with signed URLs to reduce infrastructure load and improve latency.

Use event-driven architecture with Kafka to decouple services and enable async processing for search indexing, recommendations, and analytics.

Cache hot data in Redis to reduce database load, but beware of cache stampedes

use mutex locks or stale-while-revalidate.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How does Spotify handle playlist consistency across devices when a user ...

Q02SENIOR

Why does Spotify use Cassandra for playlists instead of a relational dat...

Q03SENIOR

What happens when a popular playlist becomes a hot partition in Cassandr...

Q04JUNIOR

Explain how Spotify's recommendation pipeline works at a high level.

Q05SENIOR

A user reports that search for a new track returns no results even thoug...

Q06SENIOR

Design a system to handle 1 billion daily active users for a music strea...

Q01 of 06SENIOR

How does Spotify handle playlist consistency across devices when a user adds a track on mobile and then opens the desktop app?

ANSWER

Playlist data is stored in Cassandra with eventual consistency. When a track is added, it's written to Cassandra and a Kafka event is published. The desktop app fetches the playlist from Cassandra on load. Since Cassandra is eventually consistent, there may be a small delay (seconds) before the update is visible. Spotify accepts this trade-off for high write throughput.

FAQ · 4 QUESTIONS

Frequently Asked Questions

How does Spotify store playlists at scale?

What's the difference between Spotify's use of Cassandra and a relational database?

How do I implement a recommendation system like Spotify's?

What happens if a Cassandra node fails in Spotify's architecture?

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.

✓ Verified

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

🔥

That's Real World. Mark it forged?

5 min read · try the examples if you haven't

Design Spotify: A Production System Architecture That Handles 500M Users

Why Microservices? The Monolith That Couldn't Stream

Cassandra for Playlists: Why Not SQL?

Audio Streaming: The CDN and Adaptive Bitrate

Recommendations: The Offline Pipeline That Never Sleeps

Search: Elasticsearch at Scale

Social Features: Graph Database for Friends and Follows

Event-Driven Architecture with Kafka

Caching: Redis for Hot Data

Monitoring and Observability

The Playlist That Broke Cassandra

Key takeaways

Interview Questions on This Topic

Frequently Asked Questions

That's Real World. Mark it forged?