Intermediate 4 min · March 06, 2026

CDN Caching — Why Your 24-Hour TTL Blocks Content Updates

A CDN purge returned success, but 200+ edge PoPs served stale images for 30 minutes.

Naren · Founder
Quick Answer
  • CDN stands for Content Delivery Network — a globally distributed network of edge servers.
  • Edge servers cache static assets (images, CSS, JS) close to users to reduce latency.
  • DNS routing directs each user to the nearest edge server based on geographic location.
  • Cache TTL controls how long content stays fresh — too short increases origin load, too long causes stale content.
  • Invalidation is hard: purging all edge servers takes time due to propagation delay.
  • Biggest mistake: assuming cache headers are set correctly — one wrong header can bypass the entire CDN.
  • Use X-Cache header to verify hit/miss status — if it's missing, your CDN may be bypassed entirely.
  • Real challenge: cache key fragmentation from random query params can silently destroy hit ratios.
Plain-English First

Imagine your favourite pizza place only has one kitchen in New York. If you order from Los Angeles, your pizza travels 2,800 miles — cold and late. Now imagine that pizza place opens mini-kitchens in every major city, each stocked with the most popular pizzas ready to go. That's a CDN. Instead of every user fetching files from one distant server, a CDN places copies of your content on dozens (or hundreds) of servers worldwide, so users always get served from the kitchen closest to them.

Every second of load time costs you users. Amazon famously found that a 100ms delay costs them 1% in sales. Netflix streams to 190 countries without melting a single origin server. Both rely on the same invisible infrastructure: Content Delivery Networks. CDNs are not just a performance luxury — for any application with a global or even national audience, they're table stakes.

The problem CDNs solve is simple but brutal: physics. Data travels through fibre optic cables at roughly two-thirds the speed of light. A user in Tokyo requesting an image hosted in Frankfurt will wait 150–200ms just for the round trip — before a single byte of content is transferred. Multiply that by dozens of assets per page and you've already lost the user. A CDN collapses that distance by caching content at geographically distributed edge servers so the round trip becomes 5–20ms instead.

By the end of this article you'll understand exactly what happens from the moment a browser requests a CDN-backed URL to the moment the content arrives. You'll know the difference between origin pull and push CDNs, how cache invalidation actually works (and why it's harder than it sounds), and how to configure cache headers so your CDN behaves exactly as you intend — not randomly. You'll also walk away with the mental models that senior engineers use when debugging CDN behaviour in production.

Here's a truth most engineers miss: your CDN is only as good as the weakest link in the chain. That could be a misconfigured Vary header, a stale DNS record, or a single line of code that sets Cache-Control: private. I've seen all three take down a production deployment. The goal of this guide is to make you the person who finds those weak links before they become incidents.

What Is a CDN and How Does It Work?

How a CDN works is a core concept in CS fundamentals. Rather than starting with a dry definition, let's see it in action and understand why it exists.

A CDN is not a single server — it's a globally distributed network of servers called edge nodes. Each edge node stores copies of your static content. When a user requests a file, the CDN routes them to the nearest edge node, reducing network round trips. The key components are: origin server (where your original files live), edge servers (cached copies), DNS routing (to find the closest edge), and cache control headers (to manage freshness).

You'll often hear about 'cache hit ratio' — that's the percentage of requests served from edge without contacting origin. Top CDNs hit 90-95% for well-configured static assets. Below 80% and you're paying for bandwidth you could offload.

In practice, CDN nodes can number in the hundreds — Cloudflare has 330+ cities, Akamai over 4000 locations. But more edges don't automatically mean better performance. The real win comes from intelligent routing and proper cache configuration. A misconfigured CDN can actually increase latency if it sends requests through unnecessary layers or fails to cache correctly.

Here's a deeper nuance: Edge selection isn't always geographic proximity. Some CDNs use latency-based routing — they measure actual response times from each PoP to the user's ISP and pick the fastest, not the closest. That can make a 10ms difference. Also, anycast routing (where the same IP is announced from multiple PoPs) can cause routing asymmetries if BGP tables change. Always validate with real user tests.

One more thing: don't assume your CDN is working just because a curl returns 200. Always check the X-Cache or CF-Cache-Status header. If you don't see it, your CDN may not even be in the path. I've seen production outages where a DNS change accidentally bypassed the CDN entirely.

Edge selection algorithms vary by provider. Cloudflare relies mainly on anycast, so BGP decides which PoP receives a request. Akamai uses a proprietary mapping system considering load, availability, and network conditions. Some CDNs support customer-specified routing policies like geographic affinity or ASN-based routing. Understanding which algorithm your CDN uses helps debug routing anomalies.

CdnPerformanceTest.java (Java)
package io.thecodeforge.cdn;

import java.net.HttpURLConnection;
import java.net.URL;

public class CdnPerformanceTest {
    public static void main(String[] args) throws Exception {
        String url = "https://cdn.example.com/logo.png";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        // No Cache-Control request header here: "public" and "max-age" are response
        // directives, and sending them on a request does not change CDN caching.
        long start = System.nanoTime();
        int code = conn.getResponseCode(); // connects and waits for the status line
        long durationMs = (System.nanoTime() - start) / 1_000_000;
        // The header name varies by provider; Cloudflare reports CF-Cache-Status.
        String cacheStatus = conn.getHeaderField("X-Cache");
        if (cacheStatus == null) {
            cacheStatus = conn.getHeaderField("CF-Cache-Status");
        }
        System.out.printf("Status: %d%nTime: %dms%nCache status: %s%n", code, durationMs, cacheStatus);
        conn.disconnect();
    }
}
Beware the Missing X-Cache Header
If you don't see an X-Cache header, the CDN might not even be in the request path. Check DNS resolution first — a misconfigured CNAME can bypass the CDN entirely.
Production Insight
A CDN's routing is as important as its caching.
Misconfigured DNS geo-IP can send users to a distant edge.
Rule: always test routing from multiple real user locations, not just synthetic cloud instances.
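Request headers do not change which edge answers, so to verify edge selection use curl --resolve to pin the hostname to specific edge IPs, or run dig against resolvers in different regions.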
For mobile users, the CDN's DNS resolver might be far from the device's actual location — check with real mobile network tests.
Key Takeaway
CDN = distributed caching + intelligent routing.
Both are needed for performance gains.
Never treat CDN as a black box — understand the path from user to edge to origin.
Check routing first when users in one region complain of slowness.
If you can't see X-Cache header, your CDN might be bypassed entirely.

How CDN Routing Works

When a user requests a CDN-backed URL, the browser first does a DNS lookup. The CDN's DNS server uses the user's IP to determine their geographic location and returns the IP of the nearest edge server. Some CDNs use anycast routing where the same IP is announced from multiple points and the internet's BGP routing chooses the closest. That's faster because it avoids an extra DNS hop. But it also means routing can drift if BGP paths change.

You can't control which edge a user hits — but you can test. Use tools like dig to see which CDN IP resolves for different DNS servers around the world. If you see a user in Brazil hitting a server in Texas, that's a routing problem worth investigating.

Edge servers don't just cache — they also terminate TLS, compress responses, and sometimes even execute edge-side includes. Every one of these features adds processing overhead, so you don't want to enable them all blindly. Measure before and after.

One production gotcha: DNS-based routing can misidentify users if their ISP uses a DNS resolver far from their actual location. Mobile users on 4G/5G may appear to be at the core network location, not their phone's location. Anycast avoids this but can cause asymmetric routing if BGP routes change.

For a deeper debugging routine: run curl -w '%{http_code} %{time_total} %{time_connect} %{time_starttransfer}' -o /dev/null -s https://yourdomain.com/file from a server in the affected region. A long time_connect suggests routing latency between user and edge. A long time_starttransfer may indicate origin response delay.
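
The same reading of the timings can be automated. The following is a minimal sketch, assuming curl's output arrives in the field order of the -w format string above. The diagnose function and its thresholds (100 ms to connect, 500 ms between connect and first byte) are illustrative assumptions, not values from any CDN's documentation.

```python
def diagnose(curl_output, connect_limit=0.1, wait_limit=0.5):
    """Interpret '<http_code> <time_total> <time_connect> <time_starttransfer>'."""
    _code, _total, connect, starttransfer = curl_output.split()
    connect = float(connect)
    starttransfer = float(starttransfer)

    findings = []
    # time_connect covers the TCP handshake, so it reflects the network path
    # between the client and the edge that answered.
    if connect > connect_limit:
        findings.append("slow connect: check routing between client and edge")
    # The gap between connect and first byte includes the TLS handshake and the
    # time the edge spent waiting (for example on an origin fetch after a miss).
    if starttransfer - connect > wait_limit:
        findings.append("slow first byte: check origin response time or cache misses")
    if not findings:
        findings.append("timings look healthy")
    return findings


if __name__ == "__main__":
    for message in diagnose("200 0.90 0.30 0.85"):
        print(message)
```

Tune the thresholds to your own baseline; a value that is normal on a mobile network would be alarming between two data centres.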

Another subtle point: CDN providers often have multiple tiers of routing — some use latency-based routing that measures real-time conditions via probes. That can shift traffic between PoPs dynamically, so a user's edge may change hour by hour. That's fine for static content but can cause issues for stateful edge compute. Plan accordingly.

Here's a practical way to compare which edge IPs different public DNS resolvers return, using a simple Python script:

cdn_routing_test.py (Python)
import subprocess

# Public resolvers used as rough stand-ins for different vantage points.
# These resolvers are themselves anycast, so the city labels are approximate:
# run the script from hosts in each region for a true picture.
locations = [
    ("London", "8.8.8.8"),
    ("Sydney", "1.1.1.1"),
    ("Sao Paulo", "208.67.222.222"),
]

domain = "example.com"

for city, dns in locations:
    # Ask each resolver which edge IPs it returns for the domain
    result = subprocess.run(
        ["dig", "@" + dns, domain, "+short"],
        capture_output=True, text=True
    )
    ip = ", ".join(result.stdout.split())
    print(f"{city} via {dns}: {ip}")
DNS vs Anycast Routing
DNS-based routing is simple but subject to resolver location. Anycast is more deterministic but can shift with BGP. Choose based on your traffic pattern: anycast for consistency, DNS for fine-grained control.
Production Insight
DNS-based routing can be tricked by DNS resolvers far from the user.
Anycast avoids that but introduces BGP dependency.
Always test from real mobile networks; cloud instances don't reflect mobile routing.
Raise a ticket with your CDN provider if geo-IP data is inaccurate. They can adjust their database.
For anycast, monitor BGP announcements: a misconfiguration can blackhole traffic for entire regions.
Key Takeaway
CDN routing is only as accurate as the geo-IP database.
Always validate routing with real-world tests.
Anycast is more deterministic but not immune to BGP changes.
Use dig from representative locations to audit which edge IPs are resolved.
Mobile users may be routed incorrectly due to carrier DNS, so test with real devices.

Caching Strategies and Cache Control Headers

CDNs respect HTTP cache headers set by the origin. The most important is Cache-Control: public, max-age=3600. That tells both the CDN and browsers to store a copy for one hour; adding s-maxage changes the TTL for the CDN only. During that hour, the CDN serves the cached copy without contacting your origin. After the TTL expires, the CDN revalidates: if the content hasn't changed, the origin answers 304 Not Modified and the cached copy gets a new TTL.

Revalidation relies on the ETag or Last-Modified headers. With them, the CDN can ask your origin "Still the same?" after the TTL expires and skip re-downloading the body. The round trip still happens, though, so longer TTLs are better for performance.

The trick is balancing freshness and load. Too short a TTL means your origin gets hammered. Too long means users see stale content. The answer is versioned URLs: change the file name or path on every deploy. Then you can set max-age to a year, and the CDN never needs to revalidate.

A common mistake: confusing max-age=0, no-cache, and no-store. With max-age=0 the response is stored but immediately stale, so every request costs a revalidation round trip to origin. no-cache states the same intent explicitly: store the response, but revalidate before every reuse. no-store forbids storing the response at all.

Another nuance: s-maxage overrides max-age for shared caches like CDNs, while max-age applies to browser caches. So you can set Cache-Control: public, max-age=3600, s-maxage=86400, and browsers cache for 1 hour while the CDN caches for 24 hours. That's useful when you trust the CDN's invalidation more than browser cache clearing.

Also watch the Vary header. Vary: Accept-Encoding is fine: it splits cache entries by encoding. But Vary: User-Agent is dangerous. It creates a separate cache entry for every distinct User-Agent string, destroying your hit ratio. Only use Vary when you absolutely must, and prefer normalizing or stripping it via CDN configuration.

Two modern directives: immutable tells the browser not to revalidate on reload while the response is still fresh. And stale-while-revalidate allows serving a stale response while a fresh one is fetched in the background. Both improve perceived performance; stale-while-revalidate does so by accepting slightly stale content for a bounded window.

Here's a practical code snippet to set headers correctly from a Java backend:

CacheHeaderFilter.java (Java)
package io.thecodeforge.cdn;

import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

// Assumes Servlet 4.0+, where Filter.init() and destroy() have default implementations.
public class CacheHeaderFilter implements Filter {
    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        HttpServletResponse resp = (HttpServletResponse) response;
        // Browser TTL 1 hour (max-age), CDN TTL 1 day (s-maxage)
        resp.setHeader("Cache-Control", "public, max-age=3600, s-maxage=86400");
        // Keep separate cache entries per content encoding only
        resp.setHeader("Vary", "Accept-Encoding");
        // Legacy fallback for HTTP/1.0 caches; matches the browser max-age
        resp.setDateHeader("Expires", System.currentTimeMillis() + 3600000L);
        // Targeted header for CDNs that support it; other caches ignore it
        resp.setHeader("CDN-Cache-Control", "max-age=86400");
        chain.doFilter(request, response);
    }
}
Avoid Vary: User-Agent
It creates dozens of cache entries per URL, killing hit ratio. Only use Vary for Accept-Encoding or when you absolutely need to differentiate by user-agent.
Choosing a Cache Strategy
  • Assets change rarely (e.g., images that stay the same for years): use versioned URLs + long max-age (e.g., 1 year). No purges needed.
  • Assets change on deploy but not between users (e.g., JS bundles): use a versioned URL (hash in filename) + max-age of one year. Purge only for a security patch.
  • Assets change frequently and need immediate freshness (e.g., pricing): use short TTL + cache tags + purge on content update. Monitor purge propagation.
  • Assets are personalized (e.g., user dashboard): consider not caching (Cache-Control: no-store) or use Edge Side Includes for fragments.
Production Insight
s-maxage overrides max-age for CDN caches. Use it to set a longer CDN TTL than browser TTL.
Versioned URLs are the simplest way to achieve permanent cacheability without stale content.
Bad Cache-Control can destroy your CDN hit ratio: one header disables caching entirely.
Test cache headers with curl -I and look for both Cache-Control and X-Cache headers.
If you see 'MISS' on every request, your caching policy is not working.
Key Takeaway
Set Cache-Control carefully: public, max-age for browser, s-maxage for CDN.
Versioned URLs allow max-age=31536000 and eliminate revalidation.
Avoid Vary: User-Agent. Use Accept-Encoding only.
Always verify with curl headers before declaring it working.
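
The precedence between max-age and s-maxage is easy to get wrong, so it can help to see it as code. The following is a minimal sketch of how a cache might derive a freshness lifetime from a Cache-Control value. The function names parse_cache_control and ttl_seconds are hypothetical, and the logic is deliberately simplified: it ignores Expires, Age, and heuristic freshness.

```python
def parse_cache_control(header):
    """Split a Cache-Control value into a {directive: value-or-None} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def ttl_seconds(header, shared):
    """Freshness lifetime in seconds for a shared cache (CDN) or a browser."""
    d = parse_cache_control(header)
    # no-store forbids storing; no-cache stores but must revalidate every time,
    # so neither yields a usable freshness lifetime.
    if "no-store" in d or "no-cache" in d:
        return 0
    # Shared caches must not reuse responses marked private.
    if shared and "private" in d:
        return 0
    # For shared caches, s-maxage takes precedence over max-age.
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0


if __name__ == "__main__":
    header = "public, max-age=3600, s-maxage=86400"
    print("CDN TTL:", ttl_seconds(header, shared=True))
    print("Browser TTL:", ttl_seconds(header, shared=False))
```

Running it against the header from the filter above shows the split described in the text: one day at the CDN, one hour in the browser.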

Cache Invalidation and Purge Strategies

Even with perfect TTLs, you'll eventually need to invalidate cached content: a security patch, a pricing update, a typo fix. CDNs provide purge APIs to remove content from all edge servers. But purging is not instant.

Propagation delay is real. Figures quoted for Akamai are 1-5 seconds and for Cloudflare up to 30 seconds; treat these as rough guides and check your provider's current documentation. For large files, propagation can take minutes. The purge command marks the content as stale, but each edge server must fetch the new version on the next request. If a user hits an edge that hasn't received the purge signal yet, they'll get the old content.

To handle this, use cache tags (if your CDN supports them). Tag each asset with a group label (e.g., "product-images"). When you update product images, purge by tag instead of individual URLs. That's faster and less error-prone.

Another strategy: use versioned URLs. Instead of purging, just change the version number. Old URLs become orphaned and expire naturally via TTL. This adds complexity to your build system but eliminates purge delays entirely.

If you must purge, always verify. Use the CDN provider's API to check purge status where it offers one. Also test from multiple geographic locations. Cache-status headers such as X-Cache and Age show whether an edge has refetched the object since the purge.

One production scenario: you purge a URL, but your origin also has a cache layer that still serves stale content. Your CDN then caches the stale version again. Always flush your origin cache before purging the CDN.

For automated pipelines, integrate purge commands into your deployment script. But add a manual gate for critical assets, because an accidental purge of a million URLs can cause a stampede to your origin.

cdn_purge.py (Python)
import time
import requests

# Sketch against Fastly's single-URL purge endpoint (POST /purge/<host>/<path>).
# Verify the endpoint and response shape against the provider's current API docs.
api_key = "your-api-key"
url_to_purge = "https://cdn.example.com/images/product.jpg"

headers = {"Fastly-Key": api_key, "Accept": "application/json"}

# The purge path takes the URL without its scheme
target = url_to_purge.split("://", 1)[1]
purge_response = requests.post(
    f"https://api.fastly.com/purge/{target}", headers=headers, timeout=10
)

if purge_response.status_code == 200:
    print(f"Purge accepted: {purge_response.json().get('id')}")
    # Acceptance is not proof of propagation, so refetch the object and check
    # that the edge reports a fresh copy (a low Age). This only verifies the
    # edge that answers this client; repeat from other regions.
    for _ in range(30):
        check = requests.get(url_to_purge, timeout=10)
        age = int(check.headers.get("Age", "0"))
        print(f"X-Cache={check.headers.get('X-Cache')} Age={age}")
        if age < 5:
            print("This edge is serving a fresh copy")
            break
        time.sleep(1)
else:
    print(f"Purge failed: {purge_response.status_code}")
Purge Propagation Times
Commonly quoted figures: Cloudflare ~30 seconds, Akamai 1-5 seconds, Fastly <150ms. Know your provider's current SLA and build verification checks accordingly.
Production Insight
Purge propagation is not instant. Verify from multiple edge locations.
Cache tags let you invalidate groups of related content in one call.
Always flush origin cache before purging CDN or you may re-cache stale content.
Automate purge in deployment pipelines but add a manual gate for bulk purges.
Monitor purge API response times. Slow purges may indicate CDN provider issues.
Use the purge API with a callback or a verification poll to avoid assuming success.
Key Takeaway
Purge = mark stale, not delete. Propagation takes time.
Versioned URLs avoid purge entirely.
Use cache tags for group invalidation.
Always verify purge with curl from multiple regions.
Flush origin first, then CDN.
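
The versioned-URL strategy is usually implemented by putting a content hash in the file name at build time. The following is a minimal sketch of that idea. The versioned_name helper and the 8-character digest length are illustrative assumptions; real build tools such as bundlers do this for you.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content, digest_len=8):
    """Return the path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    # app.js -> app.<digest>.js ; the directory part is left untouched
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


if __name__ == "__main__":
    old = versioned_name("/static/app.js", b"console.log('v1');")
    new = versioned_name("/static/app.js", b"console.log('v2');")
    print(old)
    print(new)
```

Because the name changes whenever the bytes change, the new URL is a guaranteed cache miss at every edge and the old one can simply age out, so no purge is involved.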

CDN Security: DDoS Protection and WAF

CDNs are often the first line of defense against DDoS attacks. By absorbing traffic at edge servers, they shield your origin from huge volumes. But not all CDN security features are equal.

Most CDNs offer Web Application Firewall (WAF) rules that inspect HTTP requests for OWASP Top 10 threats: SQL injection, XSS, path traversal. But these rules can introduce false positives. A legitimate customer request might be blocked because it contains the word "DROP" in a parameter. Tune your WAF rules carefully and use logging-only mode initially.

Another key feature: rate limiting. You can configure the CDN to block IPs that exceed a certain request rate. This prevents brute-force attacks and API abuse. But be careful: a mobile app with many real users behind a single NAT IP can trigger rate limits. Use the CDN's advanced rate limiting that can consider headers like User-Agent or custom tokens.

Also, CDNs can terminate TLS at the edge, which offloads encryption overhead from your origin. But this means the CDN sees your decrypted traffic. If you have compliance requirements (PCI, HIPAA), you may need to use end-to-end encryption where the CDN only passes encrypted traffic through, which also means the CDN cannot cache that traffic.

A common mistake: assuming the CDN passes security headers through unchanged. If you set CSP or HSTS headers on your origin, ensure the CDN forwards them. Some CDNs strip headers by default. Use curl to compare response headers from origin and edge.

check_security_headers.sh (Shell)
# Compare security headers from origin (bypassing the CDN) vs. the edge.
# -s silences progress output; grep -i matches header names in any case.
# Direct to origin
curl -sI https://origin-server.example.com/resource | grep -iE '^(content-security-policy|strict-transport-security|x-frame-options):'
# Via CDN
curl -sI https://cdn.example.com/resource | grep -iE '^(content-security-policy|strict-transport-security|x-frame-options):'
CDN Strips Security Headers
Some CDN configurations strip or override HSTS and CSP headers. Check your CDN's "honor origin headers" setting and whitelist security headers.
Production Insight
CDN WAF can block legitimate traffic. Always test in log-only mode first.
Rate limiting at edge protects origin but can break mobile users behind NAT.
TLS termination at edge offloads encryption but exposes decrypted traffic to CDN.
Security headers must be explicitly passed through. CDNs may strip them.
DDoS absorption capacity varies: Cloudflare advertises unmetered mitigation, while other providers may charge for attack traffic.
Key Takeaway
CDN is a security shield but not a silver bullet.
Always test WAF rules in log-only mode before enforcement.
Ensure security headers are forwarded from origin.
Understand your CDN's DDoS coverage limits.
Rate limit carefully and test with real user traffic patterns.
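
The NAT problem described above comes down to what the limiter uses as its key. The following is a minimal sketch of a fixed-window limiter keyed on IP plus a per-user token. The RateLimiter class, its parameters, and the token argument are illustrative assumptions and not any CDN's actual implementation.

```python
from collections import defaultdict


class RateLimiter:
    """Fixed-window request counter keyed on (client IP, token)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        # Counters are never evicted here; a real limiter would expire old windows.
        self.counts = defaultdict(int)

    def allow(self, client_ip, token, now):
        # Keying on IP alone would make every user behind one NAT share a budget.
        # Adding the token gives each user a separate budget on the same IP.
        window_index = int(now // self.window)
        key = (client_ip, token, window_index)
        self.counts[key] += 1
        return self.counts[key] <= self.limit


if __name__ == "__main__":
    limiter = RateLimiter(limit=2, window_seconds=60)
    shared_ip = "203.0.113.7"  # documentation-range address standing in for a carrier NAT
    for user in ("user-a", "user-a", "user-a", "user-b"):
        print(user, limiter.allow(shared_ip, user, now=0))
```

In the example the third request from user-a is rejected, while user-b on the same IP is still allowed.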

Origin Shielding and Tiered Caching

When a cache miss occurs at an edge PoP, the CDN requests the content from your origin. If many edges miss simultaneously (e.g., after a purge), your origin gets hammered. Origin shielding solves this by inserting a middle-tier cache.

Origin shielding works like this: all edges that miss will ask a designated shield PoP (or regional hub) for the content. The shield PoP checks its cache first; if it has the object, it serves it. Only if the shield also misses does it go to your origin. This drastically reduces origin load.

Tiered caching takes this further with a hierarchy of caches: edge -> regional -> national -> origin. Each tier adds latency but multiplies cache efficiency. For global sites, this can reduce origin traffic by 80-90%.

The trade-off: additional latency on cache misses. A miss that would have gone directly to origin now takes two hops (edge to shield to origin). But in practice, the shield is often much closer to origin than the edge is, so the penalty is minimal.

Configure shield locations based on your origin's geography. If your origin is in Frankfurt, use a shield in Frankfurt. If you have multiple origins, use multiple shields. Some CDNs auto-select the shield based on latency.

Monitor shield hit ratio separately from edge hit ratio. If shield miss ratio is high, your origin is still taking too many requests. Consider increasing shield TTL or adding more shield layers.

One gotcha: shield PoPs have their own IPs. If your origin firewall allows only CDN IPs, make sure shield IPs are included. Also, for signed URLs, ensure the shield can authenticate with your origin.

cdn_tier_analysis.py (Python)
# Analyze CDN logs to identify origin load patterns
import gzip

def analyze_shield_effectiveness(log_path):
    edge_only_misses = 0  # edge missed, shield served the object
    shield_misses = 0     # edge and shield both missed, origin was contacted
    with gzip.open(log_path, 'rt') as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            # Assumed log format: timestamp edge_ip cache_status url
            cache_status = parts[2]  # e.g., 'HIT', 'MISS', 'SHIELD_MISS'
            if cache_status == 'MISS':
                edge_only_misses += 1
            elif cache_status == 'SHIELD_MISS':
                shield_misses += 1
    # Assumption: every SHIELD_MISS is also an edge miss, so the total is the sum
    total_edge_misses = edge_only_misses + shield_misses
    print(f"Edge misses: {total_edge_misses}")
    print(f"Shield misses (origin hits): {shield_misses}")
    print(f"Origin load reduced by {(1 - shield_misses/max(total_edge_misses,1))*100:.1f}%")
Origin Shielding Mindset
Think of it as a single queue in front of your origin rather than hundreds of distracted customers.
  • All edge misses converge to one shield PoP instead of hitting origin directly.
  • Shield PoP acts as a second-level cache, absorbing repeated misses.
  • Reduces origin spikes after purge or traffic surge.
  • Adds one extra hop on miss but protects origin from hammering.
  • Configure shield close to origin for minimal added latency.
Production Insight
Origin shielding reduces origin load by 80-90% during traffic spikes.
Shield miss ratio should be below 10%. If higher, the shield is ineffective.
Ensure shield IPs are in your origin firewall whitelist.
Monitor shield performance separately from edge performance.
Tiered caching adds latency on cold cache but is worth it for global scale.
Key Takeaway
Origin shielding = single point of origin contact, not a crowd.
Tiered caching multiplies cache efficiency but adds hop latency.
Monitor shield miss ratio as key performance indicator.
Configure shield location based on origin geography.
Always whitelist shield IPs in origin firewall.
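
The effect of a shield on origin load after a purge can be shown with a toy model. The following is a minimal sketch, assuming every edge receives one request for the same URL with a cold cache. The Origin and Cache classes and the origin_fetches function are hypothetical, and the model ignores TTLs, concurrency, and request collapsing.

```python
class Origin:
    """Counts how many times it is asked for content."""

    def __init__(self):
        self.fetches = 0

    def get(self, url):
        self.fetches += 1
        return f"content of {url}"


class Cache:
    """A cache tier that asks its upstream (shield or origin) on a miss."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def get(self, url):
        if url not in self.store:
            self.store[url] = self.upstream.get(url)
        return self.store[url]


def origin_fetches(edge_count, use_shield):
    origin = Origin()
    # With shielding, every edge shares one mid-tier cache in front of origin.
    upstream = Cache(origin) if use_shield else origin
    edges = [Cache(upstream) for _ in range(edge_count)]
    for edge in edges:
        edge.get("/logo.png")  # each edge starts cold, as after a purge
    return origin.fetches


if __name__ == "__main__":
    print("without shield:", origin_fetches(200, use_shield=False))
    print("with shield:", origin_fetches(200, use_shield=True))
```

Without the shield, each of the 200 cold edges contacts origin once; with it, only the shield's first miss reaches origin.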

CDN Logging and Analytics for Production Debugging

To troubleshoot and optimize a CDN, you need detailed logs. Most CDN providers offer access logs that record every request: timestamp, client IP, edge location, cache status (HIT/MISS), response size, and latency. Enable these logs and ship them to your analytics pipeline.

Analyze logs to find patterns: which URLs have the most misses? Which regions see the highest latency? Which user agents are causing cache fragmentation? These insights drive configuration changes.

Set up dashboards for key metrics: cache hit ratio over time, origin traffic volume, top missed URLs, average TTFB by region. Alert on anomalies. For example, a sudden drop in hit ratio across all regions often means a deployment broke cache headers.

One often-overlooked metric: purge requests per day. A high purge rate indicates your caching strategy is failing: you're treating purge as a crutch instead of fixing TTLs or versioning.

Also log the X-Cache header from your own applications if you proxy through the CDN. That lets you correlate user-reported issues with cache status at the time.

For large-scale logs, consider using AWS Athena or Google BigQuery to query CDN logs efficiently. Raw log files can be terabytes, but columnar queries make analysis fast and cheap.

cdn_log_parser.py (Python)
import gzip
from collections import defaultdict

def parse_cdn_log(filepath):
    hits = 0
    misses = 0
    url_misses = defaultdict(int)

    with gzip.open(filepath, 'rt') as f:
        for line in f:
            parts = line.split()
            if len(parts) < 7:
                continue  # skip blank or malformed lines
            cache_status = parts[6]  # assumes index 6 (7th field) is the cache status
            url = parts[3]           # assumes index 3 (4th field) is the request URL
            if cache_status == 'HIT':
                hits += 1
            else:
                misses += 1
                url_misses[url] += 1

    total = hits + misses
    if total == 0:
        print('No parsable log lines found')
        return
    print(f'Hit ratio: {hits/total*100:.1f}%')
    print('Top 10 missed URLs:')
    for url, count in sorted(url_misses.items(), key=lambda x: -x[1])[:10]:
        print(f'{url}: {count}')

if __name__ == '__main__':
    parse_cdn_log('cdn_access.log.gz')
Log Retention and Cost
CDN logs can be huge. For a site with 1M requests/day, logs can run from hundreds of megabytes to several gigabytes per day, depending on how many fields you log. Set retention to 30 days for raw logs, or aggregate metrics to save costs. Use services like Cloudflare's Logpush to stream directly to your analytics platform.
Production Insight
CDN logs are gold for debugging but costly to store, so set retention wisely.
Alert on sudden hit ratio drops, which often indicate a bad deployment.
High purge rate means your caching strategy is broken.
Use columnar query engines for fast analysis of large logs.
Correlate X-Cache headers with user reports for targeted investigations.
Key Takeaway
Enable CDN access logs immediately. They are essential for debugging.
Monitor hit ratio, purge rate, and origin traffic proactively.
Use tools like Athena or BigQuery for scalable log analysis.
Set up alerts for anomalies. Don't wait for users to complain.
Correlate backend logs with CDN logs to trace full request path.
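
The hit-ratio alert mentioned above can be reduced to a comparison between a baseline window and a recent window. The following is a minimal sketch, assuming you already have lists of cache statuses for each window. The function names and the 10-percentage-point threshold are illustrative assumptions.

```python
def hit_ratio(statuses):
    """Fraction of requests with status 'HIT', or None for an empty window."""
    if not statuses:
        return None
    return sum(1 for s in statuses if s == "HIT") / len(statuses)


def should_alert(baseline_statuses, recent_statuses, max_drop=0.10):
    """True when the recent hit ratio fell more than max_drop below baseline."""
    baseline = hit_ratio(baseline_statuses)
    recent = hit_ratio(recent_statuses)
    if baseline is None or recent is None:
        return False  # not enough data to judge
    return baseline - recent > max_drop


if __name__ == "__main__":
    baseline = ["HIT"] * 95 + ["MISS"] * 5
    after_deploy = ["HIT"] * 60 + ["MISS"] * 40
    print("alert:", should_alert(baseline, after_deploy))
```

Comparing against a rolling baseline rather than a fixed target avoids false alarms on sites whose normal hit ratio is lower than average.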
    {
      "heading": "CDN for API Caching and Dynamic Content Acceleration",
      "content": "APIs present a different caching challenge than static assets. Most API responses are dynamic: they depend on the authenticated user, query parameters, or real-time data. But even dynamic APIs can benefit from CDN caching. The trick is to cache at the right granularity: a short TTL (or no shared caching at all) for personalized data, and a longer TTL for public endpoints like /products or /pricing.\n\nCDNs now offer surrogate keys (cache tags) and dynamic content optimization. You can tag responses with categories and invalidate them selectively. Some CDNs also support Edge Side Includes (ESI), which assemble a page from cached fragments and dynamic parts served from origin.\n\nAnother technique: GraphQL CDN caching. Because GraphQL uses a single endpoint with varying queries, caching requires normalized cache keys. Some CDNs support automatic cache key generation based on a hash of the query. Use persisted queries for maximum cacheability.\n\nFor APIs, set Cache-Control: public, s-maxage=60 for a 60-second CDN cache. That absorbs traffic spikes without serving stale data for long. For authenticated APIs, either don't cache responses at all or use authorized edge caching, for example with signed URLs.\n\nCommon mistake: caching API responses that include user-specific data. If one user sees another's data, that's a security incident. Always inspect the response for user-specific fields before enabling CDN cache.\n\nHere's an example of setting cache headers in a Python Flask API:",
      "code": {
        "language": "python",
        "filename": "api_cache_config.py",
        "code": "from flask import Flask, jsonify, make_response\n\napp = Flask(__name__)\n\ndef fetch_products():\n    # Placeholder data source; swap in a real query\n    return [{'id': 1, 'name': 'Widget'}]\n\ndef get_user_orders():\n    # Placeholder; a real version would look up the authenticated user\n    return []\n\n@app.route('/api/products')\ndef get_products():\n    # Public endpoint: cacheable at the CDN for 60 seconds\n    products = fetch_products()\n    response = make_response(jsonify(products))\n    response.headers['Cache-Control'] = 'public, s-maxage=60'\n    # Cache tag for selective purge (header name varies by CDN)\n    response.headers['Surrogate-Key'] = 'products'\n    return response\n\n@app.route('/api/orders')\ndef get_orders():\n    # Authenticated endpoint: must never be stored by any cache\n    response = make_response(jsonify(get_user_orders()))\n    response.headers['Cache-Control'] = 'no-store'\n    return response"
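      },
      "callout": {
        "type": "tip",
        "title": "API Cache Granularity",
        "text": "s-maxage applies only to shared caches such as CDNs; browsers ignore it. To keep the browser from caching while the CDN caches for 60 seconds, send Cache-Control: public, max-age=0, s-maxage=60. Don't pair s-maxage with no-store: no-store tells every cache, the CDN included, not to store the response."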
      },
      "production_insight": "API caching at CDN edge can reduce origin load by 50-70%.\nBut caching personalized API responses causes data leaks — always audit response content.\nUse cache tags to invalidate API responses by resource type.\nMonitor API cache hit ratio: if below 30%, caching may not be effective.\nRule: start with short TTL (30s) and increase based on hit ratio and freshness requirements.",
      "key_takeaway": "API caching reduces origin load but requires careful granularity.\nNever cache personalized responses without authorization check.\nUse s-maxage for CDN-only caching, max-age for browser.\nSurrogate keys enable selective invalidation of dynamic content.\nTest with X-Cache header to verify caching is actually working."
    },
    {
      "heading": "CDN Cost Optimization — Understanding Your Bill",
      "content": "CDN costs can spiral if you don't monitor carefully. Most providers charge by total data transfer (egress) from edge to users, plus request counts. But there are hidden costs: origin fetch fees (when cache misses cause the CDN to pull from origin), purge API calls (sometimes metered), and advanced features like WAF or DDoS protection.\n\nKey levers to control cost:\n- Increase cache hit ratio: every percentage point gained reduces origin traffic. Target >95% for static assets.\n- Enable compression (gzip, Brotli): reduces transfer size by 60-80% for text assets such as HTML, CSS, JS, and JSON.\n- Use image optimization (WebP, AVIF): some CDNs can resize and convert images on the fly, reducing bytes.\n- Set proper TTLs: longer TTLs mean fewer revalidations and lower origin-fetch costs.\n- Enable origin shielding: reduces origin bandwidth by consolidating misses.\n- Monitor bandwidth by geographic region: some providers charge more for certain PoPs.\n\nAlso watch for 'surge' pricing: if your traffic spikes (e.g., viral content), some CDNs apply higher rates. Negotiate enterprise agreements if you expect spikes.\n\nCheck your bill regularly and set up cost alerts. A misconfigured asset that bypasses CDN can cost thousands overnight.",
      "code": {
        "language": "python",
        "filename": "cdn_cost_analysis.py",
        "code": "# Simple cost estimation based on CDN logs\nimport gzip\n\ndef estimate_cost(log_path, cost_per_gb=0.085, size_column=4):\n    # size_column: zero-based index of the response-size field (assumed log layout)\n    total_bytes = 0\n    with gzip.open(log_path, 'rt') as f:\n        for line in f:\n            parts = line.split()\n            try:\n                total_bytes += int(parts[size_column])\n            except (IndexError, ValueError):\n                continue  # skip header, blank, or malformed lines\n    total_gb = total_bytes / (1024**3)\n    cost = total_gb * cost_per_gb\n    print(f\"Total data transfer: {total_gb:.2f} GB\")\n    print(f\"Estimated cost: ${cost:.2f}\")\n    return cost\n\ndef compare_with_optimization(log_path, improvement_factor=0.3):\n    cost = estimate_cost(log_path)\n    saved = cost * improvement_factor\n    # Report the factor actually passed in instead of a hard-coded 30%\n    print(f\"With {improvement_factor:.0%} optimization: ${saved:.2f} savings\")\n    return saved\n\nif __name__ == '__main__':\n    compare_with_optimization('cdn_access.log.gz')"
      },
      "callout": {
        "type": "info",
        "title": "Hidden Cost: Origin Fetch Fees",
        "text": "Some CDNs charge for data transferred from origin to CDN (origin fetch). This can equal or exceed edge egress costs if hit ratio is low. Reducing misses directly cuts this cost."
      },
      "production_insight": "CDN bills can explode if cache hit ratio drops below 80%.\nImage optimization at the edge can cut bandwidth by 50%.\nCompression (Brotli) reduces transfer size by up to 70%; ensure both CDN and origin enable it.\nNegotiate enterprise contracts if you expect traffic spikes.\nSet billing alerts to catch cost anomalies early.",
      "key_takeaway": "CDN cost ≈ edge data transfer + request fees + origin fetch fees; a higher hit ratio shrinks the origin fetch share.\nEvery percentage-point increase in hit ratio reduces origin fetch cost.\nCompression and image optimization are the cheapest performance gains.\nMonitor bills weekly: a misconfiguration can cost thousands.\nUse CDN analytics to identify high-cost, low-hit assets."
    }
  ]
● Production incident · POST-MORTEM · severity: high

The Great Image Refresh Failure — 24-Hour Stale Cache Nightmare

Symptom
Product images on the homepage didn't update for a full day after a scheduled refresh. Users complained about outdated product visuals.
Assumption
The team assumed the CDN would automatically pick up new images because they were published on the origin with the same URL.
Root cause
The CDN was configured with a default TTL of 24 hours for image assets, and no cache invalidation was triggered after the upload. The origin server had new images, but the CDN edge servers never requested them because the cache was still considered valid. Even after a manual purge was issued, propagation to all 200+ edge PoPs took over 30 minutes, and the team did not verify that the purge completed globally.
Fix
1. Set a shorter TTL for frequently changing assets (e.g., 1 hour).
2. After publishing new images, issue a CDN purge API call for the affected URLs.
3. Monitor purge status via the CDN provider's dashboard to confirm propagation.
4. Implement cache-busting with versioned URLs (e.g., image.png?v=2).
5. Use cache tags to group related assets and purge them all at once.
6. Automate purge verification by checking X-Cache headers on sample edge nodes after deployment.
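The cache-busting idea in the fix can also be done with content fingerprints instead of a hand-bumped ?v=2. The sketch below is one possible approach, not the team's actual build step: the function name, the 8-character digest length, and the name.digest.ext layout are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Return a name like 'img/hero.<digest>.png' derived from the file content.

    A new upload with different bytes gets a different URL, so edges treat it
    as a brand-new object and no purge is needed for that asset.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    # Insert the digest between the stem and the extension.
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    print(fingerprinted_name("img/hero.png", b"old image bytes"))
    print(fingerprinted_name("img/hero.png", b"new image bytes"))
```

Because the URL changes whenever the bytes change, fingerprinted assets can usually keep a long TTL; only the HTML or manifest that references them needs a short one.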
Key lesson
  • Always know your CDN's default TTL for each asset type.
  • Assume a cached asset will stay cached until explicitly invalidated.
  • Use versioned URLs or fingerprinting to force cache refresh on content change.
  • Always verify purge propagation — don't trust the API response alone.
  • Implement monitoring for cache invalidation completion to catch partial failures.
  • Automate cache invalidation tests in your CI/CD pipeline — simulate a user request from multiple regions to confirm old content is gone.
  • Set up an alert when purge takes longer than the provider's SLA (typically 5 minutes).
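The last two lessons (verify propagation, alert past the SLA) can be sketched as a small check that runs after each purge. This is a minimal illustration under stated assumptions: the EdgeSample shape and check_purge name are hypothetical, the samples are expected to come from your own probes in each region, and the comparison relies on the origin emitting an ETag that changes with content. The 300-second default mirrors the "typically 5 minutes" figure above and should be replaced with your provider's actual SLA.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EdgeSample:
    region: str  # label for where the probe ran (hypothetical naming)
    etag: str    # ETag header the edge returned for the asset


def check_purge(samples: Iterable[EdgeSample], origin_etag: str,
                elapsed_s: float, sla_s: float = 300.0) -> dict:
    """Compare each edge's ETag with the origin's and flag SLA breaches."""
    stale = sorted(s.region for s in samples if s.etag != origin_etag)
    return {
        "stale_regions": stale,
        "propagated": not stale,
        # Alert only when content is still stale past the allowed window.
        "alert": bool(stale) and elapsed_s > sla_s,
    }


if __name__ == "__main__":
    probes = [EdgeSample("fra", '"v2"'), EdgeSample("sin", '"v1"')]
    print(check_purge(probes, origin_etag='"v2"', elapsed_s=420))
```

Keeping the comparison separate from the HTTP probing makes it easy to unit-test in CI and to swap in whatever regional probe mechanism is available.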
Production debug guide · Symptom → Action matrix for common CDN issues · 12 entries
Symptom · 01
User reports slow page load from a specific region
Fix
Use curl --resolve to pin the hostname to a specific edge IP and time requests against edges in different locations. Check the CDN provider's latency map.
Symptom · 02
Content is stale — users see old version of a file
Fix
Verify cache TTL headers (Cache-Control max-age and s-maxage). Check if a purge was issued and confirm propagation via the CDN provider's debug header.
Symptom · 03
Some users get 403 Forbidden or access denied
Fix
Check CDN origin configuration (IP whitelist, signed URLs). Ensure origin server allows CDN IPs. Validate authentication headers are passed correctly.
Symptom · 04
Mixed Content error in browser (HTTP vs HTTPS)
Fix
Ensure CDN forces HTTPS redirect. Check origin serves all assets over HTTPS. Update any hardcoded HTTP links in HTML.
Symptom · 05
High error rate (5xx) from CDN
Fix
Check origin server health and response times. Verify CDN origin timeout settings (default 30s may be too short). Look for origin overload or misconfigured keep-alive.
Symptom · 06
Content not cached despite Cache-Control: public
Fix
Inspect response for Set-Cookie header or Vary: Cookie — these disable CDN caching. Also check Cache-Control: no-store, no-cache, or private. Use browser dev tools or curl -I to see raw headers.
Symptom · 07
Cache hit ratio below 70%
Fix
Analyze cache keys: look for dynamic query parameters (e.g., ?t=timestamp), random session IDs in URLs, or overly broad Vary headers (e.g., Vary: User-Agent). Configure the CDN to ignore irrelevant query parameters.
Symptom · 08
Unexpected high bandwidth bill
Fix
Check cache hit ratio and object size distribution. Enable CDN logging to identify uncached large files. Verify that compression (gzip/brotli) is enabled on both origin and CDN.
Symptom · 09
Purge did not take effect globally
Fix
Check the CDN provider's purge propagation status. Request the object through edges in different regions (for example, curl --resolve against known edge IPs) and compare the cache-status, Age, and ETag headers. Verify the origin serves the new content before re-purging.
Symptom · 10
Cache-Control headers are overwritten by CDN
Fix
Review CDN configuration for 'honor origin' settings. Some CDNs force a default TTL; configure CDN to respect origin headers or adjust origin headers to match.
Symptom · 11
Mobile users experience high latency despite CDN
Fix
Check if CDN has PoPs in the mobile user's region. Test with a real device or emulate mobile network throttling. Verify that the CDN supports HTTP/2 or HTTP/3 for multiplexing on slow connections.
Symptom · 12
CDN is serving stale content after tag-based purge
Fix
Confirm that the cache tag was correctly applied on the origin response (Surrogate-Key header). Some CDNs require the tag to be present in the response; purge by tag may silently ignore untagged objects.
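For Symptom 07, the effect of ignoring irrelevant query parameters can be previewed offline before changing CDN configuration. The sketch below is an approximation of what a CDN's cache-key rules might do, not any provider's actual algorithm; the IGNORED_PARAMS set and the function name are assumptions and should be adapted to the parameters that truly do not affect the response.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change the response body.
IGNORED_PARAMS = frozenset({"t", "sessionid", "utm_source", "utm_medium", "utm_campaign"})


def normalize_cache_key(url: str, ignored=IGNORED_PARAMS) -> str:
    """Drop ignored query parameters and sort the rest for a stable key."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in ignored
    )
    # Lower-case the host and discard the fragment; neither reaches the origin differently.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    urls = [
        "https://example.com/app.js?v=2&t=1700000001",
        "https://example.com/app.js?t=1700000002&v=2",
    ]
    print({normalize_cache_key(u) for u in urls})
```

Running a day of logged URLs through a function like this and comparing the count of distinct raw URLs with the count of distinct normalized keys gives a rough sense of how much fragmentation the ignore list would remove.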
★ CDN Quick Debug Cheat Sheet · Commands and actions to quickly diagnose CDN-related performance and caching issues.
Check if content is served from CDN or origin
Immediate action
Inspect response headers for X-Cache or CF-Cache-Status
Commands
curl -sI https://example.com/image.png | grep -iE 'x-cache|cf-cache-status'
curl -s -o /dev/null -w '%{http_code} %{time_total}\n' https://example.com/image.png
Fix now
If repeated requests keep returning X-Cache: MISS, or the status is DYNAMIC, the asset is probably not covered by a cache policy. Add a Cache-Control header such as public, max-age=86400.
High latency from specific region
Immediate action
Run traceroute to CDN edge IP
Commands
traceroute example.com (or tracert on Windows)
curl -w 'TCP handshake: %{time_connect}s\n' -o /dev/null https://example.com
Fix now
If traceroute shows many hops or high RTT, consider using a different CDN provider with PoPs in that region.
Stale content after purge
Immediate action
Check purge confirmation via CDN provider API
Commands
Check purge request ID in provider dashboard
curl -sI https://example.com/image.png | grep -iE '^(x-cache|cf-cache-status|age|etag|last-modified):'
Fix now
Some CDNs take minutes to propagate. If still stale after 10 minutes, re-issue purge and verify origin is serving new content.
Low cache hit ratio
Immediate action
Analyze cache key uniqueness and TTL settings
Commands
curl -I https://example.com/asset.js | grep -i 'cache-control\|cf-cache-status\|x-cache'
grep 'MISS' /var/log/cdn/access.log | head -20 (or view CDN analytics dashboard)
Fix now
If many unique URLs (e.g., random query params), implement URL normalisation or versioned paths. Increase TTL for stable assets.
SSL/TLS handshake failure
Immediate action
Test SSL connection from multiple locations
Commands
openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -dates
curl -vI https://example.com 2>&1 | grep 'SSL connection'
Fix now
Ensure CDN edge certificate is valid and not expired. Check that origin certificate chain is complete. Use CDN-provided managed certificate for edge domains.
Cache-Control header not being respected by CDN
Immediate action
Inspect full response headers from edge and compare with origin
Commands
curl -I https://example.com/resource | grep -i 'cache-control\|pragma\|expires'
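curl -sI -H 'CDN-Debug: true' https://example.com/resource | grep -iE 'x-cache|cf-cache-status' (CDN-Debug is a placeholder; substitute your provider's debug request header, e.g. Fastly-Debug: 1 on Fastly)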
Fix now
If CDN overwrites Cache-Control (e.g., set default TTL), configure CDN behavior to 'honor origin' or adjust origin headers. Also check for Vary: Accept-Encoding that might split cache keys.
High origin load despite CDN
Immediate action
Check cache hit ratio and number of cache misses
Commands
curl -sI https://example.com/asset -w '%{http_code} %{time_total}\n' -o /dev/null
Watch CDN analytics for 'misses' spike
Fix now
Increase TTL, enable origin shielding, or pre-warm cache for popular content.
Purge tag not working
Immediate action
Check surrogate-key header on origin response
Commands
curl -I https://origin.example.com/asset | grep -i 'surrogate-key'
curl -I https://cdn.example.com/asset | grep -i 'surrogate-key'
Fix now
Origin must send Surrogate-Key response header. If missing, add it from backend. Verify CDN respects surrogate keys — some providers require explicit configuration.
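Several of the checks above come down to reading the cache-status header out of curl -I output. When that needs to run in a script rather than by eye, a small parser like the following may help. It is a sketch only: the function name is hypothetical, it looks at just the two header names this cheat sheet mentions (X-Cache and CF-Cache-Status), and it returns the first token of the value, which will not capture multi-tier values some providers emit.

```python
import re


def cache_status(raw_headers: str) -> str:
    """Return the first token of CF-Cache-Status or X-Cache, upper-cased.

    Returns 'UNKNOWN' when neither header is present, which may mean the
    request bypassed the CDN entirely.
    """
    headers = {}
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skips the status line and blank lines, which have no colon
            headers[name.strip().lower()] = value.strip()
    for name in ("cf-cache-status", "x-cache"):
        value = headers.get(name, "")
        token = re.split(r"[\s,]+", value)[0]
        if token:
            return token.upper()
    return "UNKNOWN"


if __name__ == "__main__":
    sample = "HTTP/2 200\r\ncontent-type: image/png\r\nx-cache: Hit from cloudfront\r\n"
    print(cache_status(sample))
```

The raw header text can be captured with something like curl -sI piped into the script; treating UNKNOWN as a failure in a deployment check catches the case where an asset silently stops going through the CDN.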