Senior 11 min · March 06, 2026

Live Video Streaming — Why CPU Metrics Lie on Spikes

Input bitrate spikes 3x during fast pans, causing encoder backup and viewer stutter while CPU looks normal.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Core concept: Live streaming delivers video from one source to many viewers in seconds.
  • Key component 1: Ingest captures raw video and audio from a broadcaster.
  • Key component 2: Transcoding converts the stream into multiple bitrate/resolution versions.
  • Key component 3: CDN caches and serves content from edge nodes close to viewers.
  • Performance insight: End-to-end latency targets range from 2–10 seconds for live sports to <1 second for interactive apps.
  • Production insight: A single CDN cache miss during peak load can double playback startup latency.
  • Biggest mistake: Assuming the ingest pipeline has infinite bandwidth — uplink saturation is the #1 cause of stream drops.
Plain-English First

Imagine a TV news van parked outside a stadium. The van captures the game, compresses the footage, beams it to a satellite, which fans it out to thousands of TV towers, which finally push it to millions of TVs — all in under 10 seconds. A live streaming system is exactly that van-to-TV pipeline, just built from software on commodity servers instead of broadcasting hardware. The 'hard part' isn't capturing the video — it's making sure those millions of TVs all get a smooth picture even when some viewers are on slow Wi-Fi and others are on fibre.

Every time a Twitch streamer goes live, a surgeon broadcasts a remote operation, or a stadium replays a controversial goal in real-time, an enormously complex distributed system quietly does its job. Live streaming is one of the few domains where every engineering trade-off — bandwidth, latency, consistency, cost — hits you at the same time, at scale, with zero tolerance for downtime because the event is happening right now and can never be replayed.

The core problem is a mismatch of supply and demand. One camera produces one stream. But a million viewers want to consume it simultaneously, from different continents, on devices that range from a 2015 Android phone on 3G to a 4K smart TV on gigabit fibre. You need to ingest one stream, transform it into many adaptive versions, store it for replay, distribute it globally, and do all of this with end-to-end latency measured in seconds — not minutes.

By the end of this article you'll be able to whiteboard a production-grade live streaming architecture from the broadcaster's camera all the way to a viewer's screen. You'll understand why each component exists, what breaks under load, how platforms like YouTube Live and Twitch actually solve adaptive bitrate, and what interviewers are really probing when they ask you to 'design a live streaming platform'.

Don't let the complexity scare you. The fundamental pipeline hasn't changed in a decade — what changes is how each stage handles failure at scale. That's what separates a hobby stream from a 10-million-viewer broadcast.

What Is a Live Video Streaming System?

At its heart, live streaming is a real-time data pipeline that turns a single video source into a globally distributed, multi-format experience. The pipeline has four stages: ingest, transcode, deliver, and play. Each stage introduces its own failure modes. Ingest fails when the broadcaster's uplink drops. Transcode fails when the encoder can't keep up with motion. Delivery fails when the CDN cache misses. Playback fails when the player can't negotiate the right bitrate.

The key insight: live streaming is one of the few system design problems where you can't retry. A dropped frame during a live event is gone forever. That's what separates a toy demo from a production system.

But understanding the pipeline isn't enough — you also need to know how each stage interacts. A bottleneck in one stage cascades: an overloaded transcoder backs up the ingest buffer, increasing latency. A misconfigured CDN forces retries, filling the player buffer with stale segments. Treat each stage as a system that can fail independently, but the overall architecture must absorb those failures gracefully.

Here's a trap senior engineers see all the time: teams build perfect pipelines for 10,000 viewers and then hit a million. The CDN tier needs to handle flash crowds, and the transcoder tier needs to scale horizontally. Build for 10x load from day one. If you don't, your first major event will be your last.

ForgeExample.java · SYSTEM DESIGN
// io.thecodeforge.media.LiveStreamPipeline
// Conceptual pipeline stages for a live stream
public class LiveStreamPipeline {
    public enum Stage {
        INGEST, TRANSCODE, DELIVER, PLAY
    }
    public static void main(String[] args) {
        Stage[] stages = Stage.values();
        for (Stage s : stages) {
            System.out.printf("Stage: %s — failure mode: %s%n", s, getFailureMode(s));
        }
    }
    static String getFailureMode(Stage stage) {
        return switch (stage) {
            case INGEST -> "Uplink drop";
            case TRANSCODE -> "Encoder overload";
            case DELIVER -> "Cache miss";
            case PLAY -> "Bandwidth mismatch";
        };
    }
}
Output
Stage: INGEST — failure mode: Uplink drop
Stage: TRANSCODE — failure mode: Encoder overload
Stage: DELIVER — failure mode: Cache miss
Stage: PLAY — failure mode: Bandwidth mismatch
Forge Tip:
Type this code yourself rather than copy-pasting. The muscle memory of writing it will help it stick.
Production Insight
Ingest is the single point of failure in most live setups.
Broadcaster uplink loss means black screen for all viewers.
Always design for broadcaster redundancy — dual send to two ingest regions.
Key Takeaway
Live streaming has no retry — every dropped frame is permanent.
Understand each pipeline stage's failure mode before building.
The system that works at 100 viewers breaks at 10,000.
Live Streaming vs VOD: Which Architecture?
If: Content is pre-recorded and viewers can pause/rewind
Use: VOD. Static files, no ingest latency, cheaper infrastructure.
If: Content is happening now and can't be replayed
Use: Live streaming. Real-time pipeline, redundant ingest, higher cost.
If: You need both live and on-demand from the same source
Use: Live with recording. Transcode once, store segments, serve from CDN.

Ingest Pipeline: Capturing the Stream at Source

The ingest pipeline is where live streaming begins. A camera sends raw video to an encoder, which compresses it using a codec like H.264 or HEVC. The encoder packetizes the data into a transport protocol — typically RTMP (Real-Time Messaging Protocol) for push, or SRT (Secure Reliable Transport) for lossy networks. The ingest server receives this stream, validates it, and forwards it to the transcoding layer.

Most production systems use ingest clusters behind a load balancer that routes based on geolocation. The broadcaster connects to the nearest ingest endpoint to minimize latency. If the ingest server fails mid-stream, the broadcaster must reconnect — session continuity is achieved via redundant pushes to multiple ingest nodes.

One often overlooked detail: the ingest server must buffer a few seconds of content to absorb network jitter. Too little buffer and packet loss causes glitches. Too much buffer and you add latency before transcoding even starts. A typical production buffer is 2–4 seconds.

Choosing the right ingest protocol matters. RTMP is simple and widely supported, but it runs over TCP: on a lossy public-internet path, TCP retransmissions and head-of-line blocking stall the stream until the encoder's send queue overflows and drops frames. SRT runs over UDP and adds selective retransmission within a bounded latency window, AES encryption, and congestion control, which makes it a common default for production contribution links in 2026. Always test your ingest path with a synthetic stream before going live.

One production trap: using RTMP over a cellular uplink. Under packet loss the TCP connection stalls, the encoder's send queue backs up, and frames get dropped to catch up. Prefer SRT for any mobile broadcaster; many major platforms now accept SRT for ingest. Don't assume your broadcaster's connection is stable: it almost never is.

Here's something most whiteboard designs miss: the ingest server should expose a health endpoint that returns the current buffer depth and packet loss ratio. If you're at 10,000 concurrent pushes, you need to know which ingest edge is about to fail before it does.
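The health payload described above can be sketched as a small snapshot type. This is a minimal illustration under stated assumptions: the class name `IngestHealth`, the JSON field names, and the 2% loss and 1-second buffer thresholds are hypothetical values chosen for the example, not figures from a specific ingest server.

```java
import java.util.Locale;

// Hypothetical health snapshot for one ingest edge; names and thresholds are illustrative.
class IngestHealth {
    private final long packetsReceived;
    private final long packetsLost;
    private final double bufferDepthSeconds;

    IngestHealth(long packetsReceived, long packetsLost, double bufferDepthSeconds) {
        this.packetsReceived = packetsReceived;
        this.packetsLost = packetsLost;
        this.bufferDepthSeconds = bufferDepthSeconds;
    }

    // Loss ratio = lost / (received + lost); 0 when no packets have been seen yet.
    double packetLossRatio() {
        long total = packetsReceived + packetsLost;
        return total == 0 ? 0.0 : (double) packetsLost / total;
    }

    // Assumed policy: unhealthy if loss exceeds 2% or the buffer falls below 1 second.
    boolean isHealthy() {
        return packetLossRatio() <= 0.02 && bufferDepthSeconds >= 1.0;
    }

    // Body that a /health handler could return.
    String toJson() {
        return String.format(Locale.ROOT,
            "{\"bufferDepthSeconds\":%.2f,\"packetLossRatio\":%.4f,\"healthy\":%b}",
            bufferDepthSeconds, packetLossRatio(), isHealthy());
    }

    public static void main(String[] args) {
        // 100 lost out of 10,000 packets is a 1% loss ratio
        System.out.println(new IngestHealth(9_900, 100, 2.5).toJson());
    }
}
```

A load balancer or autoscaler can poll this payload per edge and drain an edge whose loss ratio climbs before broadcasters notice.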

Buffer Sizing Tip
Set the ingest buffer to 2 seconds minimum. With less than 1 second, SRT retransmissions often arrive after their playout deadline, so lost packets turn into frame drops and encoder backpressure.
Production Insight
Ingest uplink saturation is the #1 cause of stream drops.
Monitor broadcaster's actual bitrate vs configured bitrate.
Use SRT instead of RTMP over unreliable networks — it handles packet loss with retransmission.
Key Takeaway
Ingest is the single point of failure for the entire pipeline.
Redundant pushes and SRT for lossy links are non-negotiable.
If the source stream drops, everything downstream goes black.
Choosing an Ingest Protocol
If: Broadcaster has a stable, high-bandwidth uplink (e.g., datacenter)
Use: RTMP. Low overhead, widely supported, no extra encryption overhead.
If: Broadcaster is on an unreliable network (e.g., cellular, Wi-Fi)
Use: SRT. Packet retransmission and congestion control prevent drops.
If: Latency must be under 1 second for real-time interaction
Use: WebRTC for ingest, but note it is typically UDP-based, bypasses CDN caching, and may need TURN relays.

Transcoding and Adaptive Bitrate (ABR)

Transcoding converts the single high-bitrate stream into multiple renditions at different resolutions and bitrates. This enables Adaptive Bitrate Streaming (ABR) — viewers automatically switch to the best rendition based on their network conditions. The transcoder decomposes the video into short segments (2–6 seconds) and encodes each segment at multiple quality levels.

The key trade-off: more renditions improve viewer experience but increase processing cost and storage. A typical production ladder includes 6–8 variants from 240p at 400 kbps to 1080p at 8 Mbps. Encoding parameters like GOP size, encoder preset, and rate control mode directly impact latency, quality, and CPU cost.

You'll also need to decide on a codec: H.264 is universal, but HEVC (H.265) reduces bitrate by ~40% at the cost of higher encoder CPU. AV1 is emerging, but in 2026 software AV1 encoding is still too slow for most live workloads, so it is mostly used for pre-recorded content. Most platforms use H.264 for live and offer HEVC as an option for capable devices.

One trap: using the same encoder preset for all renditions. Low-bitrate renditions (240p, 360p) suffer more from encoder noise — use a slower preset for those to keep quality acceptable. High-bitrate renditions can use faster presets. Profile-based encoding (encoding multiple renditions in one pass) can cut CPU usage by 30% but requires careful GOP alignment.

Another common mistake: not accounting for content type. A talking-head show needs less bitrate than a fast-action sports event. Use content-aware encoding presets if your encoder supports them. Some cloud transcoding services offer per-scene encoding that adapts GOP size and bitrate dynamically — it's more expensive but worth it for high-motion content.

A real production gotcha: if your encoder preset is too slow for the content, you'll drop frames on every sprint. We learned this the hard way during a World Cup qualifier: the encoder couldn't keep up with fast player movement, and frames backed up during every bitrate spike. Moving the 1080p rendition to a faster preset fixed it (in x264, 'fast' is a faster preset than 'medium', and 'veryfast' is faster still). Always test your encoder at the expected motion level, not just on a static scene.

TranscodePipeline.java · SYSTEM DESIGN
// io.thecodeforge.media.TranscodePipeline
// Demonstrates ABR ladder generation config
public class TranscodePipeline {
    public enum Resolution {
        RES_240(426, 240, 400_000),
        RES_360(640, 360, 800_000),
        RES_480(854, 480, 1_200_000),
        RES_720(1280, 720, 2_500_000),
        RES_1080(1920, 1080, 5_000_000);

        public final int width, height, bitrate;

        Resolution(int width, int height, int bitrate) {
            this.width = width;
            this.height = height;
            this.bitrate = bitrate;
        }
    }

    public static void main(String[] args) {
        for (Resolution res : Resolution.values()) {
            System.out.printf("Rendition: %dx%d at %d bps%n", res.width, res.height, res.bitrate);
        }
    }
}
Output
Rendition: 426x240 at 400000 bps
Rendition: 640x360 at 800000 bps
Rendition: 854x480 at 1200000 bps
Rendition: 1280x720 at 2500000 bps
Rendition: 1920x1080 at 5000000 bps
Production Pitfall: Locking GOP Size
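A GOP size (keyframe interval) that does not divide evenly into the segment duration, or that differs between renditions, breaks ABR switching on some players. Make the segment duration an integer multiple of the GOP (e.g., 2-second segments with a 1-second GOP), or set the GOP equal to the segment duration, so that every segment starts on a keyframe and chunk boundaries stay clean.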
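The relationship between frame rate, keyframe interval, and segment duration can be checked with a few lines of arithmetic. The sketch below is illustrative only: `GopAlignment` and its method names are hypothetical, and it assumes a constant frame rate.

```java
// Illustrative helper: checks that a keyframe interval lines up with segment boundaries.
class GopAlignment {
    // Keyframe interval in frames for a given frame rate and GOP duration.
    static int gopFrames(int fps, double gopSeconds) {
        return (int) Math.round(fps * gopSeconds);
    }

    // True when a segment holds a whole number of GOPs, so every segment starts on a keyframe.
    static boolean isAligned(int fps, int gopFrames, double segmentSeconds) {
        long segmentFrames = Math.round(fps * segmentSeconds);
        return gopFrames > 0 && segmentFrames % gopFrames == 0;
    }

    public static void main(String[] args) {
        int fps = 30; // assumed constant frame rate
        System.out.println(gopFrames(fps, 1.0));      // 30
        System.out.println(isAligned(fps, 30, 2.0));  // true: two GOPs per 2-second segment
        System.out.println(isAligned(fps, 45, 2.0));  // false: 60 frames is not a multiple of 45
    }
}
```

Running the same check for every rendition in the ladder before an event catches misaligned keyframes while they are still cheap to fix.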
Production Insight
Transcoder queue depth is the real bottleneck signal, not CPU utilization.
Use encoding presets that balance speed and quality: "veryfast" for live, "slow" for VOD.
Over-provision transcoder slots by 30% to handle bitrate spikes during high-motion scenes.
Key Takeaway
ABR makes streaming resilient but increases complexity.
Rendition ladder design directly impacts viewer experience and infra cost.
Always test ABR switching under load — it often fails in production.
Codec Selection for Live Streaming
If: Maximum compatibility across devices and browsers
Use: H.264. Supported everywhere, but higher bitrate for the same quality.
If: Bandwidth is limited and viewers are on HEVC-capable devices
Use: H.265 (HEVC). 40% bitrate reduction, but higher encoder CPU cost.
If: You need to support WebRTC sub-second latency
Use: VP8 or VP9. H.264 in WebRTC has licensing issues on some platforms.

Content Delivery: CDN and Edge Caching

A CDN (Content Delivery Network) distributes the transcoded segments across globally distributed edge servers. When a viewer requests the stream, the manifest points to the nearest CDN edge, which serves cached segments. If a segment is not cached (cache miss), the edge fetches it from the origin server, adding latency.

Live streaming puts a unique strain on CDNs: content is constantly updated (new segments every few seconds), and a popular event generates millions of simultaneous requests. CDNs use techniques like segment prewarm (pushing new segments to edges before viewers fetch them) and tiered caching to reduce origin load. Key metrics: cache hit ratio, origin request rate, and segment fetch time.

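Don't make the mistake of treating the CDN as a black box. You need to know which edge locations serve your audience, what their cache fill latency is, and whether your manifest TTL causes excessive re-fetches. A live media playlist changes with every new segment, so its TTL must stay below the segment duration (about 1 second for 2-second segments): a longer TTL serves stale playlists and viewers stall at the live edge, while a near-zero TTL burns origin bandwidth. Segments never change once published, so they can be cached for minutes.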

A common oversight: geo-DNS routing. Viewers in a region with no nearby CDN edge get directed to a faraway origin, increasing latency by 100ms+. Use latency-based DNS routing instead of naive geo routing. Also, be aware that some CDNs have different pricing for live streaming; cost can spike if you don't negotiate prewarm and egress rates.

One more nuance: cache stampede. When a popular segment expires, thousands of viewers request it simultaneously from origin. Use staggered TTLs or a write-through cache layer to avoid origin thundering herd. Some CDNs offer 'origin shield' — a mid-tier cache that absorbs these bursts.
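One way a mid-tier cache absorbs such bursts is request coalescing: concurrent misses for the same segment share a single origin fetch. Note this is a different technique from the staggered TTLs mentioned above. The sketch below is a minimal in-process illustration; `CoalescingFetcher` and the `originFetch` function are hypothetical stand-ins for a real cache layer and HTTP client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Coalesces concurrent requests for the same segment into one origin fetch.
class CoalescingFetcher {
    private final ConcurrentHashMap<String, CompletableFuture<byte[]>> inFlight = new ConcurrentHashMap<>();
    private final Function<String, byte[]> originFetch; // hypothetical origin client

    CoalescingFetcher(Function<String, byte[]> originFetch) {
        this.originFetch = originFetch;
    }

    CompletableFuture<byte[]> fetch(String segmentKey) {
        CompletableFuture<byte[]> fresh = new CompletableFuture<>();
        CompletableFuture<byte[]> existing = inFlight.putIfAbsent(segmentKey, fresh);
        if (existing != null) {
            return existing; // join the fetch that is already in flight
        }
        // This caller is the leader: perform the single origin fetch for everyone waiting.
        CompletableFuture.supplyAsync(() -> originFetch.apply(segmentKey))
            .whenComplete((bytes, err) -> {
                inFlight.remove(segmentKey, fresh);
                if (err != null) fresh.completeExceptionally(err);
                else fresh.complete(bytes);
            });
        return fresh;
    }

    public static void main(String[] args) {
        AtomicInteger originCalls = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);
        CoalescingFetcher fetcher = new CoalescingFetcher(key -> {
            originCalls.incrementAndGet();
            try { release.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return new byte[] {1, 2, 3};
        });
        List<CompletableFuture<byte[]>> waiters = new ArrayList<>();
        for (int i = 0; i < 1000; i++) waiters.add(fetcher.fetch("seg42.ts"));
        release.countDown(); // let the simulated origin respond
        for (CompletableFuture<byte[]> w : waiters) w.join();
        System.out.println("origin fetches: " + originCalls.get()); // origin fetches: 1
    }
}
```

In a real edge the completed segment would then be written to the cache, so requests arriving after the fetch finishes are served locally instead of starting a new origin call.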

Here's a painful lesson: we once had a CDN provider that throttled origin fetches after 10,000 requests per second. Our event hit 15,000, and the origin shield wasn't enabled. Viewers in half the regions got 503s. Enable origin shield and prewarm for every event. It's cheap insurance.

cdn_prewarm.sh · BASH
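#!/bin/bash
# io.thecodeforge.cdn.prewarm: prewarm CDN edges for a live event
# Usage: ./cdn_prewarm.sh <origin-url> <comma-separated-region-list>
# Note: cdn-api.example.com is a placeholder; substitute your CDN's prewarm API.
set -euo pipefail
if [ "$#" -ne 2 ]; then
  echo "Usage: $0 <origin-url> <region-list>" >&2
  exit 1
fi
ORIGIN=$1
# Split the comma-separated region list into an array
IFS=',' read -ra REGIONS <<< "$2"
for region in "${REGIONS[@]}"; do
  echo "Prewarming $region..."
  # --fail makes curl exit non-zero on HTTP errors, so set -e stops the script
  curl --fail -sS -X POST "https://cdn-api.example.com/prewarm?region=$region" \
    -H "Content-Type: application/json" \
    -d "{\"origin\": \"$ORIGIN\", \"segments\": [\"seg1.ts\", \"seg2.ts\"]}"
done
echo "Prewarm complete."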
Output
Prewarming us-east...
Prewarming eu-west...
...
CDN Cost Trap
Some CDNs charge by the number of origin fetches, not just egress. Prewarming and tiered caching reduce origin fetches — without them, a popular event can cause a $50k surprise bill.
Production Insight
A cache miss on a single segment causes a ripple effect: every viewer behind that edge requests it from origin, increasing load.
Prewarm CDN edges for expected audience regions before a major event.
Tiered caching with an intermediate layer can halve origin requests during traffic spikes.
Key Takeaway
CDN is not a black box — monitor cache hit ratio per edge.
Prewarming prevents origin overload during flash crowds.
Segment prewarm logic must respect segment expiry to avoid serving stale data.
CDN Strategy for Live Events
If: Audience is concentrated in a few regions (e.g., US only)
Use: Pre-select CDN edges in those regions and prewarm them. Simpler orchestration.
If: Audience is global and unpredictable
Use: A multi-CDN approach with real-time failover. Prewarm all major regions.
If: Event is short and low-bitrate (e.g., webinar)
Use: A single CDN is fine. Prewarming is not necessary because the cache-miss impact is small.

Latency vs Quality Trade-offs: When Milliseconds Matter

Live streaming latency is the time between a broadcaster's camera capturing a frame and a viewer seeing it. Different use cases have different tolerances: live sports tolerate 30-60 seconds with traditional CDN, but interactive streaming (e.g., live auctions, gaming) needs <5 seconds.

Reducing latency requires trade-offs: shorter segments (2 seconds instead of 6) increase manifest download overhead and CDN requests. Lower latency also limits buffer size, making viewers more susceptible to network jitter. Techniques like CMAF (Common Media Application Format) chunked encoding and LL-HLS (Low Latency HLS) push latency below 3 seconds but require specific player support.
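The request-rate cost of shorter segments is simple arithmetic. The sketch below is a back-of-the-envelope model, not a capacity plan: the one-million-viewer audience is an assumed figure, and the model ignores manifest requests, retries, and ABR switches.

```java
import java.util.Locale;

// Back-of-the-envelope model of steady-state segment request rate.
class SegmentLoad {
    // Each viewer fetches one segment per segment duration.
    static double segmentRequestsPerSecond(long viewers, double segmentSeconds) {
        if (segmentSeconds <= 0) throw new IllegalArgumentException("segmentSeconds must be positive");
        return viewers / segmentSeconds;
    }

    public static void main(String[] args) {
        long viewers = 1_000_000; // assumed audience size
        double sixSecond = segmentRequestsPerSecond(viewers, 6.0);
        double twoSecond = segmentRequestsPerSecond(viewers, 2.0);
        System.out.printf(Locale.ROOT, "6 s segments: %.0f req/s%n", sixSecond);
        System.out.printf(Locale.ROOT, "2 s segments: %.0f req/s%n", twoSecond);
        System.out.printf(Locale.ROOT, "ratio: %.1fx%n", twoSecond / sixSecond);
    }
}
```

Moving from 6-second to 2-second segments triples the request rate the CDN has to absorb, before counting the extra manifest refreshes.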

Here's the reality: if your business model doesn't demand sub-5-second latency, don't build for it. The infrastructure complexity is significant. WebRTC-based streaming, while giving sub-second latency, requires relay servers (TURN) and doesn't benefit from standard CDN caching. You need to carefully justify every millisecond reduction with actual user impact.

One more nuance: latency is not uniform across viewers. A viewer on a fast network may see 2-second latency while another on a congested path sees 10 seconds — all from the same stream. Player-side buffering strategies (like catch-up logic) can compensate, but they add complexity. Always measure p95 and p99 latency, not just average.
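To see why the average hides the slow tail, here is a small nearest-rank percentile sketch. The sample latencies are made up for illustration, and `LatencyPercentiles` is a hypothetical helper; a production system would more likely use a histogram from its metrics library.

```java
import java.util.Arrays;

// Nearest-rank percentiles over viewer-reported latency samples (seconds).
class LatencyPercentiles {
    // Smallest sample such that at least p percent of samples are at or below it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElseThrow();
    }

    public static void main(String[] args) {
        // Illustrative data: eight viewers near 2 s, two on congested paths
        double[] latencies = {2.1, 2.3, 2.0, 2.4, 2.2, 2.5, 2.1, 2.3, 9.8, 12.0};
        System.out.println("mean: " + mean(latencies));           // roughly 4 s, describes nobody
        System.out.println("p50:  " + percentile(latencies, 50)); // 2.3
        System.out.println("p95:  " + percentile(latencies, 95)); // 12.0
    }
}
```

The median says most viewers are fine while p95 exposes the viewers who are 10 seconds behind; the mean matches neither group.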

Measuring latency properly is tricky. Don't rely on server-side timestamps alone. Inject a clock overlay in the video and compare broadcaster time vs viewer time. Run synthetic viewers that report offset. Some platforms use SCTE-35 cues to inject timestamps into the stream for automated latency measurement.

A nuance often missed: latency measurement from the server side is always optimistic. We once had a system where server-to-edge time was 2 seconds, but the CDN added 5 seconds of buffering to smooth out origin fetches. Our viewer-side latency was 12 seconds while dashboards showed 3. Always measure end-to-end from a real player.

low_latency_ffmpeg.sh · FFMPEG
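#!/bin/bash
# io.thecodeforge.media.low_latency: 2-second HLS segments in three renditions
# Short segments cut latency; full LL-HLS also relies on partial segments,
# which this command does not produce.
# -g 30 assumes a 30 fps source, giving a 1-second GOP.
ffmpeg -i rtmp://ingest.example.com/live/stream \
  -map 0:v -map 0:a -map 0:v -map 0:a -map 0:v -map 0:a \
  -c:v libx264 -preset veryfast -tune zerolatency \
  -g 30 -keyint_min 30 -sc_threshold 0 \
  -b:v:0 5000k -s:v:0 1920x1080 \
  -b:v:1 2500k -s:v:1 1280x720 \
  -b:v:2 1200k -s:v:2 854x480 \
  -c:a aac -b:a 128k \
  -f hls -hls_time 2 -hls_list_size 10 -hls_flags independent_segments \
  -hls_segment_type mpegts \
  -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2" \
  -master_pl_name master.m3u8 \
  /var/www/live/stream_%v.m3u8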
Output
// Produces HLS with 2-second segments, three renditions, low-latency tuning
Latency Monitoring Trap
Don't rely on server-side timestamps alone. Inject a frame-accurate timestamp in the video (e.g., a clock overlay) and measure from a real viewer. CDN buffering can add 5-10 seconds of uncounted latency.
Production Insight
Cutting segments from 6 seconds to 2 seconds triples the segment request rate, and with it CDN origin load.
Base decision on business requirements, not engineering preferences.
Test latency under load — buffering ratios spike when segments are too short.
Key Takeaway
There's no free latency reduction — shorter segments cost more.
Match latency target to viewer expectation and infrastructure budget.
Measure end-to-end latency from broadcaster to viewer's screen, not just server-side.
Choose Your Latency Approach
If: Audience tolerance >30 seconds, standard CDN acceptable
Use: Traditional HLS with 6-second segments. Simple, reliable, lowest cost.
If: Need 10-30 seconds, want better ABR switching
Use: LL-HLS or CMAF with partial segments. Slightly more encoder CPU, but player experience improves.
If: Require <5 seconds for real-time interaction
Use: WebRTC with relay infrastructure. Higher server cost, no CDN caching, but sub-second latency.

Monitoring and Observability for Live Streams

Live streaming demands real-time monitoring across the entire pipeline. At minimum, track: ingest bitrate, encoder frame drops, transcoder queue depth, CDN cache hit ratio, segment fetch time, and viewer playback errors. Use metrics like TTLV (Time to Live Video) to measure how quickly a viewer starts playing after hitting play.

Alerting should distinguish between transient glitches and systemic failures. A single frame drop in the encoder isn't a problem, but sustained frame drops over 5 seconds indicate a bottleneck. Use distributed tracing to correlate viewer playback issues with transcoder or CDN health.

One critical monitoring gap: you need synthetic streams. Run a test stream 24/7 that goes through the full pipeline. If the synthetic stream breaks, you know the problem is platform-wide before any customer reports it. Real events produce too much noise — synthetic streams give you a clean baseline.

Another area often missed: player-side metrics. Collect playback stall rate, average bitrate switch time, and rebuffering frequency from the client. These reflect the actual viewer experience and can pinpoint issues that server-side metrics miss, like a CDN with high latency to a particular ISP.

For alerting thresholds: encoder queue depth > 50 for 10 seconds = auto-scale. CDN cache hit ratio < 90% for 2 minutes = prewarm. Viewer rebuffering rate > 5% = investigate player ABR or CDN latency. Synthetic stream fails = page the whole team.
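A "threshold held for N seconds" rule like the ones above can be sketched as a tiny state machine. This is an illustration rather than how any particular alerting system implements it; `SustainedThresholdAlert` is a hypothetical name, and timestamps are passed in explicitly so the logic is easy to test.

```java
// Fires only when a metric stays above its threshold for a minimum duration.
class SustainedThresholdAlert {
    private final double threshold;
    private final long holdMillis;
    private long breachStartMillis = -1; // -1 means "not currently breaching"

    SustainedThresholdAlert(double threshold, long holdMillis) {
        this.threshold = threshold;
        this.holdMillis = holdMillis;
    }

    // Feed one sample; returns true once the value has stayed above the threshold for holdMillis.
    boolean observe(long nowMillis, double value) {
        if (value <= threshold) {
            breachStartMillis = -1; // any recovery resets the timer
            return false;
        }
        if (breachStartMillis < 0) {
            breachStartMillis = nowMillis;
        }
        return nowMillis - breachStartMillis >= holdMillis;
    }

    public static void main(String[] args) {
        // Rule from the text: encoder queue depth > 50 for 10 seconds
        SustainedThresholdAlert queueAlert = new SustainedThresholdAlert(50, 10_000);
        System.out.println(queueAlert.observe(0, 60));       // false: breach just started
        System.out.println(queueAlert.observe(5_000, 70));   // false: only 5 s so far
        System.out.println(queueAlert.observe(10_000, 55));  // true: above 50 for 10 s
        System.out.println(queueAlert.observe(11_000, 40));  // false: recovered, timer reset
    }
}
```

The reset on recovery is what separates a transient glitch from a sustained bottleneck: a single spike never fires, however large.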

One more practice: set up a dashboard that shows pipeline health per-stream. If you have 50 concurrent streams, you need to know which one is about to fail. We use a heatmap of encoder queue depth per stream during events. It's saved us more than once.

MetricsEndpoint.java · SYSTEM DESIGN
// io.thecodeforge.monitoring.LiveStreamMetrics
// Simple Prometheus metrics endpoint for a live stream pipeline
import io.prometheus.client.*;
public class LiveStreamMetrics {
    static final Gauge encoderQueueDepth = Gauge.build()
        .name("transcoder_queue_depth")
        .help("Current depth of transcoder input queue")
        .register();
    static final Gauge cdnCacheHitRatio = Gauge.build()
        .name("cdn_cache_hit_ratio")
        .help("Cache hit ratio for live segments")
        .register();
    static final Counter frameDrops = Counter.build()
        .name("encoder_frame_drops_total")
        .help("Total frames dropped by encoder")
        .register();

    public static void updateMetrics(int queueDepth, double cdnHit, int drops) {
        encoderQueueDepth.set(queueDepth);
        cdnCacheHitRatio.set(cdnHit);
        frameDrops.inc(drops);
    }
}
Output
// Registered metrics: transcoder_queue_depth, cdn_cache_hit_ratio, encoder_frame_drops_total
Pro Tip: Synthetic Streams Save Your Weekend
Run a 24/7 synthetic live stream that goes through the entire pipeline — ingest to playback. If it breaks, you know the platform is down before any user complains. Use a test pattern with embedded timestamps to automatically detect latency regressions.
Production Insight
Most monitoring dashboards miss the encoder queue depth metric.
Correlate viewer error rate with transcoder queue depth — when queue depth exceeds 100, frame drops are imminent.
Set up synthetic heartbeat streams (test streams running 24/7) to detect platform issues before real events.
Key Takeaway
Monitor the pipeline end-to-end, not just individual components.
Encoder queue depth and CDN cache hit ratio are the two most under-monitored metrics.
Synthetic streams catch problems before users do.
Monitoring Alerts: What to Trigger On
If: Encoder queue depth > 50 for more than 10 seconds
Use: Critical. Transcoder falling behind; auto-scale or reduce renditions.
If: CDN cache hit ratio < 90% for more than 2 minutes
Use: Warning. Origin may be overloaded; prewarm edges.
If: Viewer rebuffering rate > 5% for any segment
Use: Critical. Investigate player ABR or CDN latency.
If: Synthetic stream playback fails
Use: Pager duty. Platform-wide issue.

Player-Side Delivery and Playback Optimization

The final mile in live streaming is the player running on the viewer's device. Even if the pipeline is perfect, a badly configured player can ruin the experience. Key player concerns: manifest fetching strategy, buffer management, and ABR logic.

Manifest files (M3U8 for HLS, MPD for DASH) list available renditions and segment URLs. Players fetch the manifest periodically: too often and they waste bandwidth, too rarely and they miss new segments. Standard practice for live HLS: reload the media playlist about once per target duration, and use conditional requests (ETag with If-None-Match) so an unchanged manifest is not downloaded again.

Buffer management is critical. A player that fills too much buffer adds latency; a player that keeps too small a buffer stutters on network jitter. Many production players use a dynamic buffer target: start with a small buffer for quick startup, then ramp up to absorb jitter. Use a 3-second buffer for low-latency streams, 10-15 seconds for standard.
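The dynamic buffer target described above can be sketched as a simple ramp. This is one possible policy under stated assumptions: the linear ramp, the 30-second ramp window, and the name `DynamicBufferTarget` are illustrative choices, and real players use their own heuristics.

```java
// Linear ramp from a small startup buffer to a larger steady-state buffer.
class DynamicBufferTarget {
    // Target buffer in seconds after playedSeconds of stall-free playback.
    static double targetSeconds(double initial, double max, double playedSeconds, double rampSeconds) {
        double progress = Math.min(1.0, Math.max(0.0, playedSeconds / rampSeconds));
        return initial + progress * (max - initial);
    }

    public static void main(String[] args) {
        // Start at 2 s for quick startup, grow to 15 s over an assumed 30 s ramp
        System.out.println(targetSeconds(2, 15, 0, 30));   // 2.0
        System.out.println(targetSeconds(2, 15, 15, 30));  // 8.5
        System.out.println(targetSeconds(2, 15, 60, 30));  // 15.0
    }
}
```

A player would typically reset `playedSeconds` after a stall, so the buffer target restarts from the small value and the viewer gets back to playback quickly.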

ABR logic varies wildly between players. The best players consider bandwidth, buffer health, and device capabilities. One common failure: the player never switches down because the bandwidth estimation algorithm is too slow. Implement a conservative bandwidth estimator that halves on a single rebuffer event, and increases slowly on success.
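The conservative estimator described above, halving on a rebuffer and increasing slowly on success, can be sketched as follows. The class name, the 0.1 step-up fraction, and the 0.85 safety factor are assumptions for illustration; production players combine this kind of rule with buffer-level signals.

```java
// Sketch of a conservative estimator: multiplicative decrease on rebuffer, gradual increase on success.
class ConservativeBandwidthEstimator {
    private final double minBps;
    private final double stepUpFraction; // share of the gap closed per good segment (assumed)
    private double estimateBps;

    ConservativeBandwidthEstimator(double initialBps, double minBps, double stepUpFraction) {
        this.estimateBps = initialBps;
        this.minBps = minBps;
        this.stepUpFraction = stepUpFraction;
    }

    // A single rebuffer halves the estimate, never dropping below the floor.
    void onRebuffer() {
        estimateBps = Math.max(minBps, estimateBps / 2.0);
    }

    // Move up slowly toward a higher measurement; follow a lower one immediately.
    void onSegmentDownloaded(long bytes, double seconds) {
        double measuredBps = bytes * 8.0 / seconds;
        if (measuredBps > estimateBps) {
            estimateBps += stepUpFraction * (measuredBps - estimateBps);
        } else {
            estimateBps = Math.max(minBps, measuredBps);
        }
    }

    double estimateBps() {
        return estimateBps;
    }

    // Highest rendition that fits within safetyFraction of the estimate; lowest rung as fallback.
    int pickBitrate(int[] ladderAscending, double safetyFraction) {
        int chosen = ladderAscending[0];
        for (int bitrate : ladderAscending) {
            if (bitrate <= estimateBps * safetyFraction) chosen = bitrate;
        }
        return chosen;
    }

    public static void main(String[] args) {
        int[] ladder = {400_000, 800_000, 1_200_000, 2_500_000, 5_000_000};
        ConservativeBandwidthEstimator est = new ConservativeBandwidthEstimator(5_000_000, 400_000, 0.1);
        est.onRebuffer();                                   // 5.0 Mbps halves to 2.5 Mbps
        System.out.println(est.pickBitrate(ladder, 0.85));  // 1200000
        est.onSegmentDownloaded(1_000_000, 1.0);            // measured 8 Mbps, estimate rises to about 3.05 Mbps
        System.out.println(est.pickBitrate(ladder, 0.85));  // 2500000
    }
}
```

The asymmetry is deliberate: one fast download only nudges the estimate up, while one rebuffer cuts it in half, so the player errs toward smooth playback over peak quality.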

Don't forget codec support. Many mobile devices can't decode HEVC in hardware at high resolutions. Always provide an H.264 baseline profile rendition for maximum compatibility. Test on real devices in the regions your audience uses.

One more thing: preload the manifest before the user hits play. Many players fetch the manifest only after user interaction, adding 1-2 seconds of startup delay. Preload the manifest and the first segment in the background for instant start.

Here's a debugging story: we once had a 20% buffering rate on iOS but not on Android. Turned out the iOS player was requesting segments with an incorrect byte range. The fix was to update the player SDK version. Always test across devices and update player SDKs regularly.

player_config.json · JSON
{
  "player": {
    "manifestRefreshIntervalMs": 2000,
    "initialBufferTarget": 2,
    "maxBufferTarget": 15,
    "dynamicBuffer": true,
    "abr": {
      "algorithm": "bandwidth-buffer",
      "downSwitchOnRebuffer": true,
      "upSwitchBandwidthFraction": 0.85,
      "maxUpSwitchBitratePerStep": 1.5
    },
    "renditions": {
      "minBitrate": 400000,
      "maxBitrate": 8000000
    }
  }
}
Output
// Configures dynamic buffer, conservative ABR, and 400kbps minimum rendition
ABR Testing with Throttling
Use Chrome DevTools network throttling (preset: Slow 3G) to verify ABR switches down within 5 seconds and doesn't stay on a failing rendition. Many players fail this test.
Production Insight
Player-side ABR is the least monitored link in the pipeline.
A misconfigured bandwidth estimator causes 30% more rebuffers.
Always include a 240p rendition plus an audio-only fallback for extreme low bandwidth, and test on 3G throttled connections.
Key Takeaway
The player is the last mile — configure it carefully.
Buffer size and ABR aggressiveness trade off latency vs smoothness.
Test player behavior on real networks, not just localhost.
Player Strategy for Different Use Cases
If: Viewers are on desktop with stable connections
Use: Fixed 10-second buffer, standard ABR with 4-second segments. Lower CPU overhead.
If: Viewers are on mobile with variable networks
Use: Dynamic buffer (start 2s, grow to 8s), aggressive ABR downswitching. Add a 240p rendition.
If: Real-time interaction needed (<1s latency)
Use: WebRTC-based playback with no buffer. Tolerate packet loss with FEC. No ABR, single bitrate.
● Production incident · POST-MORTEM · severity: high

Transcoding Farm Overload During a Major Event

Symptom
Viewers reported periodic freezes and audio desync, but only in certain geographic regions. The ingest was fine, CDN logs showed 200s, and transcoding nodes appeared normal in CPU metrics.
Assumption
Engineers assumed the CDN was the bottleneck because viewer regions mapped to edge locations with high load.
Root cause
The transcoding farm was scaled based on average bitrate demand, but during peak action (fast camera pans) the input bitrate spiked 3x, causing the H.264 encoders to fall behind. Frame drops and PTS resets propagated through the pipeline, causing stutter on viewers regardless of CDN health.
Fix
Added per-stream bitrate monitoring and dynamic transcoding slot allocation. Switched from a fixed GOP size to adaptive GOP to smooth encoder load. Implemented a backpressure signal from transcoder to ingest to reduce frame rate during sustained spikes.
Key lesson
  • Transcoding capacity must be provisioned based on peak input bitrate, not average.
  • CPU metrics on transcoder nodes are misleading — monitor encoder queue depth and frame drop counts.
  • Use bitrate smoothing and encoder backpressure to handle spikes without dropping frames.
  • Always over-provision transcoder slots by 30% to absorb bitrate swings during high-motion content.
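The difference between provisioning for average and for peak input is easy to quantify. The sketch below uses assumed numbers: a 10 Mbps average input, a slot that sustains 8 Mbps, and the hypothetical helper `TranscoderCapacity`; only the 3x spike and the 30% headroom come from the incident above.

```java
// Slot count for a given input bitrate, per-slot capacity, and headroom.
class TranscoderCapacity {
    // Slots needed to carry inputBps when one slot sustains perSlotBps, plus fractional headroom.
    static int slotsNeeded(double inputBps, double perSlotBps, double headroom) {
        if (perSlotBps <= 0) throw new IllegalArgumentException("perSlotBps must be positive");
        return (int) Math.ceil(inputBps / perSlotBps * (1.0 + headroom));
    }

    public static void main(String[] args) {
        double averageBps = 10_000_000;  // assumed average input bitrate
        double peakBps = 3 * averageBps; // 3x spike during fast pans
        double perSlotBps = 8_000_000;   // assumed capacity of one transcoder slot
        System.out.println(slotsNeeded(averageBps, perSlotBps, 0.30)); // 2
        System.out.println(slotsNeeded(peakBps, perSlotBps, 0.30));    // 5
    }
}
```

Sizing on the average leaves the farm with 2 slots where the spike needs 5, which is the gap that showed up as stutter while CPU dashboards looked normal.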
Production debug guide
Symptom → Action guide for the most common live streaming failures (5 entries)
Symptom · 01
Playback stalls or buffers after 30 seconds
Fix
Check CDN cache hit ratio. Low ratio? Prewarm edge nodes for the expected audience regions. Then check the manifest TTL: too long and players receive stale playlists and stall at the live edge; too short and every refresh hits origin.
Symptom · 02
Audio/video out of sync (desync)
Fix
Verify PTS (Presentation Timestamp) alignment in the transcoder output. Compare source and transcoded streams with ffprobe. If desync only appears after transcoding, check encoder preset — ultrafast presets often drop reference frames causing PTS drift.
Symptom · 03
Stream drops entirely for some viewers
Fix
Inspect ingest server logs for dropped packets. Check broadcaster's uplink bandwidth vs. stream bitrate. If ingest is fine, trace CDN routing — regional ISP peering may cause packet loss. Use traceroute from affected viewer IPs.
Symptom · 04
High latency (>15 seconds) but no buffering
Fix
Check segment duration in HLS/DASH manifest. Large segments (6s+) increase latency. Reduce segment size to 2–4 seconds. Also check transcoder keyframe interval — matching it to segment duration reduces decode delay.
Symptom · 05
Playback fails on mobile devices only
Fix
Verify the manifest includes a lower bitrate rendition. Mobile browsers and apps often have bandwidth detection that fails if the minimum rate is too high. Ensure at least one sub-1 Mbps rendition (e.g., 480p at 800 kbps).
★ Quick Debug Cheat Sheet: Live Streaming Playback Problems
Immediate steps to diagnose and resolve the most common live streaming issues, from buffering to desync.
Playback stalls every 30-60 seconds
Immediate action
Check CDN cache hit ratio via metrics dashboard
Commands
curl -I https://cdn.example.com/live/manifest.m3u8
ffprobe -v error -show_entries stream=index,codec_type,codec_name -of default=nokey=1:noprint_wrappers=1 input.ts
Fix now
Force prewarm: curl -X POST https://cdn-api.example.com/prewarm -d '{"url":"https://cdn.example.com/live/*"}'
Audio/video desync detected
Immediate action
Compare PTS of audio and video tracks using ffprobe
Commands
ffprobe -v quiet -show_entries packet=pts_time,stream_index -of csv=p=0 input.ts | head -20
ffmpeg -i input.ts -af apad -c:v copy -c:a aac -b:a 128k -shortest -f mpegts output.ts
Fix now
Remux and regenerate timestamps: ffmpeg -fflags +genpts -i input.ts -c copy -f mpegts output.ts
Stream not loading on some devices
Immediate action
Verify the HLS manifest contains multiple bitrate renditions
Commands
curl -s https://cdn.example.com/live/manifest.m3u8 | grep 'RESOLUTION'
ffplay -sync ext -max_delay 100000 https://cdn.example.com/live/manifest.m3u8
Fix now
Add a 480p 800kbps rendition to the encoding ladder and update the manifest
High latency despite low buffering
Immediate action
Check segment duration in HLS manifest
Commands
curl -s https://cdn.example.com/live/manifest.m3u8 | grep '#EXTINF' | head -1
ffmpeg -i input -c:v libx264 -g 30 -sc_threshold 0 -c:a aac -f hls -hls_time 2 -hls_list_size 10 output.m3u8
Fix now
Reduce segment duration to 2 seconds in encoder, rebuild manifest
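To see why segment duration dominates latency, the sketch below reads `#EXTINF` durations from a media playlist and produces a rough estimate. It assumes the player starts about three segments behind the live edge (the HLS specification advises clients not to start closer than three target durations from the end of the playlist) and adds a hypothetical fixed `pipeline_overhead_s` for encode and CDN delay. Both parameters are illustrative, not measured values.

```python
def segment_durations(media_playlist):
    """Return the duration in seconds of every #EXTINF entry in a media playlist."""
    durations = []
    for line in media_playlist.splitlines():
        if line.startswith("#EXTINF:"):
            value = line[len("#EXTINF:"):].split(",", 1)[0]
            durations.append(float(value))
    return durations


def estimated_latency_s(media_playlist, buffered_segments=3, pipeline_overhead_s=2.0):
    """Rough latency estimate: buffered segments plus an assumed fixed overhead."""
    durations = segment_durations(media_playlist)
    if not durations:
        raise ValueError("no #EXTINF entries found")
    return buffered_segments * max(durations) + pipeline_overhead_s


six_second = "#EXTM3U\n#EXTINF:6.0,\nseg1.ts\n#EXTINF:6.0,\nseg2.ts\n"
two_second = "#EXTM3U\n#EXTINF:2.0,\nseg1.ts\n#EXTINF:2.0,\nseg2.ts\n"
print(estimated_latency_s(six_second), estimated_latency_s(two_second))
```

With these assumptions the estimate falls from 20 seconds to 8 seconds when segments shrink from 6 seconds to 2 seconds, which matches the direction of the fix above. Real players differ, so treat the result as a sanity check and measure actual glass-to-glass latency separately.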
Live Streaming Protocols Comparison
| Protocol | Use Case | Latency | CDN Friendliness | Browser Support |
|---|---|---|---|---|
| HLS | VOD and live (Apple ecosystem) | 6-30s (standard), 2-6s (LL-HLS) | Excellent (segments cache well) | Native in Safari (iOS/macOS); MSE-based players elsewhere |
| DASH | VOD and live (universal) | 4-20s (standard), 2-5s (low-latency) | Excellent (segment-based) | Requires MSE, widely supported |
| RTMP | Ingest to encoder/transcoder | 1-3s | Poor (persistent connection) | None (playback depended on Flash, now discontinued) |
| SRT | Ingest over lossy networks | 0.5-2s | N/A (ingest only) | None (ingest protocol) |
| WebRTC | Real-time interactive (calling, gaming) | <500ms | Poor (UDP, no caching) | Native in modern browsers |

Key takeaways

1
You now understand what a live video streaming system is and why it exists
2
You've seen it working in a real runnable example
3
Practice daily: the forge only works when it's hot 🔥
4
Ingest is the single point of failure; always plan for source drop redundancy.
5
ABR ladder design is a cost-quality trade-off; test under real network conditions.
6
CDN prewarm and tiered caching prevent origin overload during flash crowds.
7
Latency decisions must align with business requirements, not just engineering preferences.
8
Monitor encoder queue depth and CDN cache hit ratio: these are the silent killers.
9
Player-side configuration is as critical as server-side: test on real networks.
10
Synthetic streams and proper alerting save your weekends.
11
Always test for peak motion, not just average bitrate.
12
Enable origin shield to prevent thundering herd on your origin servers.

Common mistakes to avoid

10 patterns
×

Memorising syntax before understanding the concept

Symptom
Engineers can recite protocol names but cannot explain when to use HLS vs DASH vs WebRTC.
Fix
Focus on the trade-offs: latency vs reliability vs CDN support. Build a mental model of the pipeline before diving into tools.
×

Skipping practice and only reading theory

Symptom
Unable to debug a real stream failure (e.g., stuttering) because they've never worked with actual logs or metrics.
Fix
Set up a test live stream on a small server (e.g., FFmpeg + nginx-rtmp) and simulate failures. Theory is of little use without hands-on practice.
×

Assuming CDN caching works the same for live as for static content

Symptom
Cache miss spikes at the start of a live event, causing origin overload.
Fix
Pre-populate CDN edges with the first few segments before the event begins. Use tiered caching with a mid-tier cache.
×

Using a fixed encoding preset without considering content type

Symptom
High-motion sports event causes frame drops on an encoder preset optimized for talking heads.
Fix
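Match the encoder configuration to the content type. High-motion content costs more CPU per frame, so for live encoding give it a faster preset or more encoder capacity (and a higher bitrate to hold quality) instead of reusing the talking-heads settings. Adjust GOP size to avoid excessive I-frames.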
×

Not testing ABR switching under real network conditions

Symptom
Viewers on slow connections never switch to lower quality, causing buffering.
Fix
Use synthetic throttling (e.g., Chrome DevTools) to simulate various bandwidths and verify ABR logic works end-to-end.
×

Relying on single ingest server without redundancy

Symptom
Broadcaster loses connection to the ingest server — stream goes black for all viewers until reconnection.
Fix
Configure dual ingest pushes from the broadcaster to two geographically separate ingest regions. SRT connection bonding in main/backup mode is one way to get automatic failover between the two links.
×

Over-provisioning renditions without considering viewer device mix

Symptom
High CPU/GPU cost on encoder for renditions that few viewers use (e.g., 4K for a mobile-only audience).
Fix
Analyze viewer device capabilities from player analytics. Drop renditions that serve <5% of viewers. Add them back only if needed.
×

Neglecting player-side ABR configuration

Symptom
Player never switches down during congestion, leading to constant buffering.
Fix
Configure conservative bandwidth estimation in the player. Provide at least one sub-1Mbps rendition. Test on throttled connections.
×

Using default TTL for manifest files without tuning

Symptom
Short TTL causes excessive origin requests; long TTL causes viewers to miss segments.
Fix
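Tune the manifest TTL to the segment duration instead of leaving the CDN default. A live manifest changes every segment, so a TTL of about one segment duration or less (for example, 2 seconds with 2-second segments) keeps viewers near the live edge without hammering the origin. Media segments do not change once published and can carry a much longer TTL. Monitor origin load after each change.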
×

Forgetting to enable origin shield in CDN configuration

Symptom
When a popular segment expires, thousands of edge nodes fetch from origin simultaneously, causing origin overload and 503s.
Fix
Enable origin shield (or super-pop) in your CDN config. This creates a mid-tier cache that absorbs the thundering herd before it reaches your origin servers.
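The effect of an origin shield can be shown with a toy simulation. The classes below are hypothetical stand-ins, not a CDN API: `Origin` counts fetches, `ShieldCache` is a mid-tier cache in front of it, and `simulate` replays the same segment request from many edge nodes. Requests run sequentially and there is no TTL, which are simplifying assumptions.

```python
class Origin:
    """Stand-in origin server that counts how many fetches reach it."""

    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"segment-bytes:{key}"


class ShieldCache:
    """Mid-tier cache: at most one origin fetch per key while it stays cached."""

    def __init__(self, origin):
        self.origin = origin
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self.origin.get(key)
        return self._store[key]


def simulate(edge_count, use_shield):
    """Return how many requests hit the origin when every edge misses at once."""
    origin = Origin()
    upstream = ShieldCache(origin) if use_shield else origin
    for _ in range(edge_count):
        upstream.get("seg_001.ts")
    return origin.fetches


print(simulate(1000, use_shield=False), simulate(1000, use_shield=True))
```

In this simplified model 1000 edge misses become 1000 origin fetches without the shield and a single fetch with it. A real shield also has to coalesce requests that arrive concurrently while the first fetch is still in flight, which this sequential sketch does not model.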
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
You're designing a live streaming platform for a global sports event. Ou...
Q02 · SENIOR
Explain the difference between HLS and DASH. When would you choose one o...
Q03 · SENIOR
Our live stream keeps buffering after 30 seconds, but the CDN shows 100%...
Q04 · SENIOR
How do you measure end-to-end latency in a live streaming system?
Q05 · SENIOR
What factors influence the number of renditions in an ABR ladder?
Q06 · SENIOR
How would you debug a situation where a specific geographic region exper...
Q07 · SENIOR
What is the role of the manifest file in live streaming, and what happen...
Q08 · SENIOR
How would you design a failover mechanism for the ingest pipeline?
Q01 of 08 · SENIOR

You're designing a live streaming platform for a global sports event. Outline the key components and trade-offs.

ANSWER
Start with ingest: redundant pushes to multiple data centres using SRT for lossy uplinks. Then transcoding: ABR ladder with 5-7 renditions (240p to 1080p), using H.264 with 'veryfast' preset. CDN: tiered caching with edge prewarm for expected audience regions. Latency target ~15 seconds using HLS with 4-second segments. Key trade-offs: more renditions increase cost but improve cross-device experience; shorter segments reduce latency but increase CDN origin load. Monitoring: track encoder queue depth, CDN cache hit ratio, and viewer play failure rate. Use synthetic heartbeat streams to detect platform issues.
FAQ · 9 QUESTIONS

Frequently Asked Questions

01
What is the difference between live streaming and on-demand streaming?
02
Why do some streaming platforms have 30-second latency while others have <5 seconds?
03
How does adaptive bitrate (ABR) work?
04
What is a CDN and why do I need one for live streaming?
05
What protocol should I use for ingest in 2026?
06
How do I handle a transcoder fallback when a node fails?
07
What is the most common cause of playback failure on mobile devices?
08
Can I use the same CDN for both VOD and live streaming?
09
How do synthetic streams work and why are they important?