Netflix System Design — CDN Cache Miss Storms at Scale
CDN cache hit ratio dropped from 95% to 40%, causing widespread user buffering.
- Netflix architecture: tiered CDN delivery + fault-tolerant microservices + ML recommendations
- CDN caches video chunks at ISP edges — travel distance <10km for most users
- Adaptive bitrate (ABR) chunks video into 2-second segments at multiple bitrates
- Client-side ABR algorithm selects bitrate based on current bandwidth and buffer
- Microservices are isolated with circuit breakers — one service failing doesn't cascade
- Chaos Engineering (Chaos Monkey) proactively kills instances to test resilience
Netflix streams to over 250 million subscribers across 190 countries, serving more than 100 million hours of video every single day. At peak hours in North America, Netflix alone accounts for roughly 15% of all downstream internet traffic. When you're designing a system at that scale, every architectural decision — from how you encode a video file to how you route a DNS query — has a measurable dollar cost and a direct impact on whether someone abandons their Friday-night movie before the credits roll. This isn't an academic exercise; it's the kind of problem where a 500ms latency increase costs millions in subscriber churn.
The core challenge Netflix solves is fundamentally different from a typical web application. A social media post weighs a few kilobytes; a 4K HDR episode of a prestige drama weighs 15–20 gigabytes. You can't serve that from a handful of origin servers the way you'd serve a JSON API. You need a tiered delivery hierarchy, intelligent client-side adaptation, and fault-tolerant microservices that can independently fail without taking the whole platform down. On top of the delivery problem sits the discovery problem: recommending the right title to the right person at the right moment, at a scale where even a 1% improvement in recommendation accuracy translates to hundreds of millions in retained subscriptions.
By the end of this article you'll be able to walk into a system design interview and confidently reason through Netflix's full architecture — from the moment a user clicks play to the moment a video frame renders on screen. You'll understand the tradeoffs behind each major subsystem, know which database engines Netflix actually uses and why, recognize the failure modes that keep SREs awake at night, and speak credibly about how adaptive bitrate streaming actually works under the hood. Let's build it from the ground up.
System Requirements and Scale
Start with the numbers that drive every decision. Netflix serves 250M+ subscribers, each watching an average of 2 hours per day. That's 500 million hours of streaming daily — 57,000 years of video every 24 hours.
Storage: A 4K movie is roughly 20GB. With 20,000+ titles, the master library is 400TB. But you don't serve masters — you serve 50+ encoded versions per title (different resolutions, codecs, bitrates). That pushes total storage to tens of petabytes.
Bandwidth: Peak hours see 15% of all downstream internet traffic in North America. You need multi-terabit-per-second capacity at the CDN edge. Every 1% improvement in compression efficiency saves millions in bandwidth costs.
Latency: A 500ms increase in start-up time drops viewer retention by 20%. Your architecture must deliver the first video frame in under 2 seconds.
High-Level Architecture
Netflix's architecture breaks into four layers: Client Apps (web, mobile, TV), CDN Layer (Open Connect Appliances), Backend Services (microservices running on AWS), and Data Layer (Cassandra, EVCache, S3).
Client apps are thin — they fetch manifests, chunks, and metadata from APIs. The CDN is a massive network of custom appliances deployed inside ISP data centers, serving 95%+ of traffic. Backend services are hundreds of microservices, each owned by a small team, communicating via REST or gRPC. Data is spread across multiple database technologies: Cassandra for user viewing history and profiles, EVCache (Memcached-based) for session and recommendation caches, and S3 for raw video and assets.
The glue that holds it together: a service mesh (Envoy) for traffic management, Hystrix for circuit breaking, and Chaos Monkey to continuously test resilience.
Content Delivery: CDN and Adaptive Bitrate Streaming
Netflix doesn't serve a single video file. Each title is encoded into 50+ versions (resolutions from 360p to 4K, codecs like H.264, AV1, VP9). These are sliced into 2-second chunks and placed on CDN edge servers (Open Connect Appliances). When you press play, the client fetches a manifest file (MPD or M3U8) listing available bitrates and chunk URLs. The client's ABR algorithm — typically BOLA or a custom buffer-based scheme — monitors download speed and buffer fill level. If the buffer drops below 5 seconds, it requests a lower bitrate chunk. If the buffer is healthy, it might try the next higher bitrate.
Open Connect is Netflix's secret weapon. They lease space in ISP data centers, install custom servers pre-loaded with popular content. A user's DNS request resolves to the nearest Open Connect appliance. If the content isn't there (cache miss), the appliance fetches from a regional hub, which may fetch from Netflix's own origin in AWS. The hop count is at most 2 or 3, keeping latency low.
Microservices and Fault Tolerance
Netflix runs hundreds of microservices, each responsible for a narrow domain (user profile, playback, billing, recommendations, etc.). They communicate via synchronous REST or gRPC, and asynchronously via Kafka for events like 'new title added' or 'user watched'.
The critical pattern here is the Circuit Breaker, implemented via Netflix's Hystrix library. Each remote call is wrapped in a HystrixCommand. If the call fails or times out (e.g., 90% of calls fail within 10 seconds), the circuit breaker opens. Subsequent calls fail immediately without hitting the backend. After a configurable sleep window (e.g., 30 seconds), a single probe is allowed through. If it succeeds, the circuit closes. This prevents cascading failures when one service degrades.
Chaos Monkey is an extension of this philosophy: it randomly terminates instances in production during business hours, forcing engineers to build systems that survive individual failures. Over time, this has made the platform remarkably resilient — Netflix can lose an entire AWS Availability Zone and still serve users seamlessly.
Data Storage and Caching
Netflix uses multiple database engines, each chosen for a specific workload:
- Cassandra: Used for all user metadata — profiles, viewing history, ratings. Netflix runs one of the largest Cassandra clusters in the world (~2000 nodes). The data model is heavily denormalized for fast reads. Each row is keyed by user ID, with columns for watch history, recommendations metadata, etc.
- EVCache: A Memcached-based distributed cache layer. Used for session data, short-lived recommendations, and any data that needs sub-millisecond latency. EVCache is built on top of Amazon EC2 instances with SSDs, and data is replicated across two AZs for durability.
- S3 (Simple Storage Service): All raw video masters, encoded chunks, thumbnail images, and static assets live in S3. The CDN cache is populated from S3 via a pre-fetch pipeline.
- Elasticsearch: Used for log aggregation and search within internal tools (e.g., finding playback errors per content ID).
The key tradeoff: Cassandra provides high write throughput (ingesting viewing events) and tunable consistency, but doesn't support complex queries. That's why Netflix uses separate services for search and recommendations.
Recommendations Engine
Netflix's recommendations are responsible for 80% of content discovery. The system uses a mix of collaborative filtering, content-based filtering, and deep learning models. The pipeline has two parts:
Offline training: Machine learning models (e.g., factorization machines, neural networks) are trained on historical viewing data (Cassandra + S3). This produces user embeddings and item embeddings. Training runs daily on large Spark clusters.
Online serving: When a user opens Netflix, the recommendation service fetches their embeddings from EVCache, scores all available titles using nearest-neighbor search (approximate via FAISS or similar), and returns the top 20-40 titles. This must happen in under 200ms.
Personalization also considers time of day, device type, and recent searches. A/B testing is continuous — Netflix runs thousands of experiments simultaneously to evaluate new models.
| Database | Primary Use | Why Not the Others |
|---|---|---|
| Cassandra | User metadata, viewing history | High write throughput, wide rows. Not good for complex joins (use Elasticsearch for search). |
| EVCache | Session cache, recommendation lookups | Sub-millisecond reads. Not durable (data can be lost). Uses replication for availability. |
| S3 | Video masters and encoded chunks | Durable, cheap, but high latency for random reads. CDN caches mitigate this. |
| Elasticsearch | Log search, debugging | Full-text search power, but not designed as primary data store. |
Key Takeaways
- Netflix system design is a masterclass in distributed systems at extreme scale
- The CDN is the most critical infrastructure — every millisecond saved comes from proximity
- Circuit breakers and bulkheading prevent cascading failures in microservices
- Polyglot persistence: use the right database for each workload
- Chaos engineering (Chaos Monkey) is not optional — it forces resilience
- Adaptive bitrate streaming is a client-side algorithm that adapts to network conditions
Common Mistakes to Avoid
- Single database for both writes and reads
Symptom: Reading user history takes >1s during high write load; Cassandra compaction throttles reads.
Fix: Use Cassandra for writes and EVCache for reads. Derive read models via event-based CQRS. - No circuit breaker on downstream calls
Symptom: When one microservice slows, all threads are blocked, causing cascading timeouts across the system.
Fix: Wrap all remote calls with Hystrix or resilience4j. Set short timeouts (2s) and provide fallbacks. - Assuming CDN cache hit ratio is always high (99%)
Symptom: Cache miss storm during a major release causes origin overload and buffering for large regions.
Fix: Pre-warm CDN caches based on predicted popularity. Monitor per-region hit ratio and alert if it drops below 90%.
Interview Questions on This Topic
- QHow would you design the CDN cache hierarchy for Netflix?SeniorReveal
- QNetflix uses Chaos Monkey. Why is it effective, and how would you implement a similar system for a microservice architecture?SeniorReveal
- QExplain the tradeoffs between serving video chunks from a CDN vs a centralized origin.Mid-levelReveal
Frequently Asked Questions
What database does Netflix use for user data?
Cassandra, one of the largest clusters (~2000 nodes). It provides high write throughput for ingesting viewing events and a wide-row model for user profiles.
How does Netflix handle traffic spikes (e.g., new season of a hit show)?
They pre-warm CDN caches based on ML prediction of which titles will be popular. They also automatically scale microservices via AWS autoscaling and use Spinnaker for rapid deployment of additional capacity.
What is Chaos Monkey and why is it used?
Chaos Monkey is a tool that randomly terminates production instances. It forces engineering teams to build systems that survive individual failures without user impact. Over time, this has made Netflix's infrastructure extremely resilient.
How does adaptive bitrate streaming work?
The client measures download speed of video chunks and maintains a buffer. If the buffer drops below a threshold (e.g., 5 seconds), it requests a lower quality chunk. If the buffer is full, it may try to upgrade. Netflix uses custom BOLA and buffer-based algorithms to balance quality and stability.
That's Real World. Mark it forged?
5 min read · try the examples if you haven't