Intermediate 5 min · March 06, 2026

CDN Caching — Why Your 24-Hour TTL Blocks Content Updates

A CDN purge returned success, but 200+ edge PoPs served stale images for 30 minutes.

Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Notes here come from systems that actually shipped.


Last updated: May 24, 2026


⚡Quick Answer

CDN stands for Content Delivery Network — a globally distributed network of edge servers.
Edge servers cache static assets (images, CSS, JS) close to users to reduce latency.
DNS routing directs each user to the nearest edge server based on geographic location.
Cache TTL controls how long content stays fresh — too short increases origin load, too long causes stale content.
Invalidation is hard: purging all edge servers takes time due to propagation delay.
Biggest mistake: assuming cache headers are set correctly — one wrong header can bypass the entire CDN.
Use X-Cache header to verify hit/miss status — if it's missing, your CDN may be bypassed entirely.
Real challenge: cache key fragmentation from random query params can silently destroy hit ratios.

✦ Definition · ~90s read

What Is a CDN and How Does It Work?

A CDN (Content Delivery Network) cache is a geographically distributed layer of proxy servers that store copies of your web assets (HTML, CSS, JS, images, videos) closer to end users. When configured correctly, it can reduce origin server load by roughly 60-80% and cut latency by 50-200ms for global audiences.


The core mechanism is simple: on the first request for a resource, the edge server fetches it from your origin, then holds it for a duration set by the Cache-Control header's max-age directive (your TTL). Subsequent requests within that window are served directly from the edge cache, bypassing your server entirely.

This is why a 24-hour TTL means any update you push — a CSS fix, a new product image, a JavaScript bundle — won't be visible to users until the cache expires, unless you explicitly purge or version your assets.

CDN routing, often called 'request routing,' is the mechanism that directs a user's browser to the nearest or fastest edge server. One common implementation is BGP anycast: the CDN announces the same set of IP addresses from multiple data centers worldwide, and the internet's routing protocol automatically sends packets to the closest announced location.

Providers like Cloudflare, Fastly, and Akamai maintain 200+ points of presence (PoPs), and routing decisions are made in under 50ms based on real-time metrics like latency, packet loss, and server load. This is distinct from DNS-based load balancing (e.g., round-robin or geo-DNS), which can be slower and less adaptive.

CDN routing is essential for global performance but introduces a subtlety: even with a short TTL, a user's request might hit a different edge server on each visit, meaning cache state is not globally synchronized unless you use a centralized cache invalidation API or a shared tier like an origin shield.

Plain-English First

Imagine your favourite pizza place only has one kitchen in New York. If you order from Los Angeles, your pizza travels 2,800 miles — cold and late. Now imagine that pizza place opens mini-kitchens in every major city, each stocked with the most popular pizzas ready to go. That's a CDN. Instead of every user fetching files from one distant server, a CDN places copies of your content on dozens (or hundreds) of servers worldwide, so users always get served from the kitchen closest to them.

Every second of load time costs you users. Amazon famously found that a 100ms delay costs them 1% in sales. Netflix streams to 190 countries without melting a single origin server. Both rely on the same invisible infrastructure: Content Delivery Networks. CDNs are not just a performance luxury — for any application with a global or even national audience, they're table stakes.

The problem CDNs solve is simple but brutal: physics. Data travels through fibre optic cables at roughly two-thirds the speed of light. A user in Tokyo requesting an image hosted in Frankfurt will wait 150–200ms just for the round trip — before a single byte of content is transferred. Multiply that by dozens of assets per page and you've already lost the user. A CDN collapses that distance by caching content at geographically distributed edge servers so the round trip becomes 5–20ms instead.
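The physics argument can be checked with a few lines of arithmetic. The sketch below computes only the best-case round trip over a perfectly straight fibre run; real paths are longer and add router and queueing delay, which is why measured figures are well above this floor. The 9,300 km distance is an approximate great-circle figure used here as an assumption.

```python
# Lower bound on round-trip time from distance alone.
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBRE_FACTOR = 2 / 3  # light in fibre travels at roughly two-thirds of c


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round trip in milliseconds over a straight fibre run."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Distances are illustrative assumptions, not measurements.
    for label, km in [("Tokyo-Frankfurt", 9_300), ("nearby edge", 500)]:
        print(f"{label}: {min_rtt_ms(km):.1f} ms minimum RTT")
```

Even the theoretical floor for the long path is an order of magnitude above the nearby-edge case, and that gap is what a CDN removes.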

By the end of this article you'll understand exactly what happens from the moment a browser requests a CDN-backed URL to the moment the content arrives. You'll know the difference between origin pull and push CDNs, how cache invalidation actually works (and why it's harder than it sounds), and how to configure cache headers so your CDN behaves exactly as you intend — not randomly. You'll also walk away with the mental models that senior engineers use when debugging CDN behaviour in production.

Here's a truth most engineers miss: your CDN is only as good as the weakest link in the chain. That could be a misconfigured Vary header, a stale DNS record, or a single line of code that sets Cache-Control: private. I've seen all three take down a production deployment. The goal of this guide is to make you the person who finds those weak links before they become incidents.

How CDN Caching Actually Works — And Why Your 24-Hour TTL Blocks Content Updates

A Content Delivery Network (CDN) is a globally distributed network of proxy servers that cache static and dynamic content closer to end users. The core mechanic is simple: when a user requests a resource, the CDN edge server serves a cached copy if the TTL (time-to-live) hasn't expired, otherwise it fetches a fresh copy from the origin. This reduces latency, offloads origin traffic, and improves availability. The TTL is set via HTTP headers like Cache-Control: max-age=86400 for 24 hours.
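To make the hit/miss mechanic concrete, here is a minimal sketch of a single edge server modelled as an in-memory store with per-entry expiry. The class name, the injected clock, and the `fetch_origin` callback are all illustrative assumptions; a real edge also handles revalidation, eviction, and concurrency.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one edge server: a key-value store with per-entry expiry."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_origin = fetch_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expires_at)

    def get(self, url: str) -> Tuple[bytes, str]:
        """Return (body, "HIT" or "MISS")."""
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"  # still fresh: origin is not contacted
        body = self._fetch_origin(url)  # miss or expired: go to origin
        self._store[url] = (body, now + self._ttl)
        return body, "MISS"

    def purge(self, url: str) -> None:
        """Drop one entry so the next request refetches from origin."""
        self._store.pop(url, None)


if __name__ == "__main__":
    origin = {"/style.css": b"v1"}
    fake_now = [0.0]
    edge = EdgeCache(lambda u: origin[u], ttl_seconds=86400, clock=lambda: fake_now[0])
    print(edge.get("/style.css"))  # first request is a MISS
    origin["/style.css"] = b"v2"   # deploy a fix at the origin
    fake_now[0] = 3600
    print(edge.get("/style.css"))  # one hour later: still the old body, HIT
    fake_now[0] = 86400
    print(edge.get("/style.css"))  # TTL elapsed: new body, MISS
```

The middle call is the whole problem in miniature: the origin already has the fix, but the edge has no reason to ask for it until the TTL runs out or the entry is purged.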

In practice, the CDN acts as a reverse proxy with a key-value cache. On a cache miss, the edge server forwards the request to the origin, caches the response, and returns it. On a cache hit, it returns the cached object without contacting the origin. The cache key is typically the full URL (including query parameters), but can be customized. Stale-while-revalidate and soft-purge allow serving stale content while fetching a new version in the background. Purge APIs invalidate cached objects by URL or tag, but propagation across all edge nodes takes seconds to minutes.
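Because the cache key usually includes the query string, two URLs that return the same bytes can occupy two cache entries. The sketch below shows one way to normalize a URL into a cache key; the list of ignored parameters is a hypothetical example, and in practice this logic lives in your CDN's cache-key configuration rather than in application code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change the response body.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def cache_key(url: str) -> str:
    """Normalize a URL so equivalent requests share one cache entry."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in IGNORED_PARAMS
    )
    # Lowercase the host, drop the fragment, and emit parameters in sorted order.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )
```

Sorting matters as much as stripping: `?a=1&b=2` and `?b=2&a=1` are different strings, so without normalization they are different cache entries.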

Use a CDN for any globally distributed application where latency matters — e.g., serving images, CSS, JS, API responses, or streaming video. It's essential for handling traffic spikes (like Black Friday) because the cache absorbs requests that would otherwise hit your origin servers. Without a CDN, a single server or load balancer becomes a bottleneck and single point of failure. The trade-off: you must design your caching strategy — TTL, cache keys, purge mechanisms — to balance freshness against hit rate.

TTL ≠ Content Freshness

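A 24-hour TTL means users may see stale content for up to a day after you update the origin. A purge does not fully fix this: it clears the CDN edges, but browsers and intermediate proxies that already stored the response keep serving their own copy until its max-age runs out.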

Production Insight

A major e-commerce site pushed a CSS fix for a checkout button but forgot to purge the CDN — users saw the broken button for 6 hours during a flash sale, causing a 12% drop in conversions.

Symptom: CDN returned HTTP 200 with old content despite origin serving new content; browser DevTools showed 'from disk cache' with the old TTL.

Rule of thumb: Always set a short TTL (e.g., 5 minutes) on mutable assets and use versioned filenames (style.v2.css) for immutable ones — never rely on purge alone for time-sensitive updates.

Key Takeaway

CDN caching is a distributed key-value store with TTL-based expiry — not a real-time mirror of your origin.

Cache invalidation via purge is asynchronous and takes seconds to minutes to propagate globally — never assume instant freshness.

Design your cache strategy around TTL, cache keys, and versioning — not around purge — to avoid serving stale content under load.
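Versioned filenames are usually generated from the file's content rather than numbered by hand. The following is a minimal sketch of that idea, assuming a SHA-256 digest truncated to eight characters; the function name and digest length are illustrative choices, and most build tools offer an equivalent feature.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension, e.g. style.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    print(versioned_name("css/style.css", b"body { color: red; }"))
    print(versioned_name("css/style.css", b"body { color: blue; }"))
```

Any change to the file produces a new URL, so the old cached copy is never requested again and can safely carry a one-year max-age.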

[Figure: CDN Caching TTL Pitfalls]

How CDN Routing Works

When a user requests a CDN-backed URL, the browser first does a DNS lookup. The CDN's DNS server uses the IP of the requesting resolver to estimate the user's geographic location and returns the IP of the nearest edge server. Some CDNs use anycast routing instead, where the same IP is announced from multiple points and the internet's BGP routing chooses the closest. That keeps the DNS answer the same for everyone and removes the dependence on geo-IP accuracy, but it also means routing can drift if BGP paths change.

You can't control which edge a user hits — but you can test. Use tools like dig to see which CDN IP resolves for different DNS servers around the world. If you see a user in Brazil hitting a server in Texas, that's a routing problem worth investigating.

Edge servers don't just cache — they also terminate TLS, compress responses, and sometimes even execute edge-side includes. Every one of these features adds processing overhead, so you don't want to enable them all blindly. Measure before and after.

One production gotcha: DNS-based routing can misidentify users if their ISP uses a DNS resolver far from their actual location. Mobile users on 4G/5G may appear to be at the core network location, not their phone's location. Anycast avoids this but can cause asymmetric routing if BGP routes change.

For a deeper debugging routine: run curl -w '%{http_code} %{time_total} %{time_connect} %{time_starttransfer}' -o /dev/null -s https://yourdomain.com/file from a server in the affected region. A long time_connect suggests routing latency between user and edge. A long time_starttransfer may indicate origin response delay.
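The output of that curl command can be classified automatically when it is collected from many regions. This is a rough sketch: the field order matches the format string above, while the two thresholds are hypothetical values to be replaced with your own baseline.

```python
from typing import NamedTuple


class CurlTiming(NamedTuple):
    http_code: int
    total: float
    connect: float
    starttransfer: float


def parse_timing(line: str) -> CurlTiming:
    """Parse '<http_code> <time_total> <time_connect> <time_starttransfer>'."""
    code, total, connect, start = line.split()
    return CurlTiming(int(code), float(total), float(connect), float(start))


# Hypothetical thresholds in seconds; tune them for your own baseline.
SLOW_CONNECT = 0.100
SLOW_WAIT = 0.300


def diagnose(t: CurlTiming) -> str:
    # Time from TCP connect to first byte. For HTTPS this also includes the
    # TLS handshake, so compare against a known-good run before blaming origin.
    wait = t.starttransfer - t.connect
    if t.connect > SLOW_CONNECT:
        return "slow connect: suspect routing between client and edge"
    if wait > SLOW_WAIT:
        return "slow first byte: suspect a cache miss or slow origin"
    return "ok"


if __name__ == "__main__":
    print(diagnose(parse_timing("200 0.412 0.180 0.390")))
```

Running the same check against a URL you know is cached gives a baseline for the first-byte gap on a warm edge.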

Another subtle point: CDN providers often have multiple tiers of routing — some use latency-based routing that measures real-time conditions via probes. That can shift traffic between PoPs dynamically, so a user's edge may change hour by hour. That's fine for static content but can cause issues for stateful edge compute. Plan accordingly.

Here's a practical way to compare which edge IPs different public DNS resolvers return, using a simple Python script:

cdn_routing_test.py (Python)

import subprocess

# Each entry is a public resolver, not a vantage point: these services are
# anycast, so the answer reflects where the resolver sits relative to you.
# For real per-region results, run this from hosts in those regions.
resolvers = [
    ("Google", "8.8.8.8"),
    ("Cloudflare", "1.1.1.1"),
    ("OpenDNS", "208.67.222.222"),
]

domain = "example.com"

for name, dns in resolvers:
    # `dig +short` prints one resolved address (or CNAME target) per line.
    result = subprocess.run(
        ["dig", "@" + dns, domain, "+short"],
        capture_output=True, text=True
    )
    answers = result.stdout.split()
    print(f"{name} via {dns}: {', '.join(answers) or 'no answer'}")

DNS vs Anycast Routing

DNS-based routing is simple but depends on where the user's resolver is located. Anycast is more deterministic but can shift with BGP. Choose based on your traffic pattern: anycast for consistency, DNS for fine-grained control.

Production Insight

DNS-based routing can be misled by DNS resolvers far from the user.
Anycast avoids that but introduces a BGP dependency.
Always test from real mobile networks; cloud instances don't reflect mobile routing.
Raise a ticket with your CDN provider if geo-IP data is inaccurate. They can adjust their database.
For anycast, monitor BGP announcements. A misconfiguration can blackhole traffic for entire regions.

Key Takeaway

DNS-based CDN routing is only as accurate as the geo-IP database.
Always validate routing with real-world tests.
Anycast is more deterministic but not immune to BGP changes.
Use dig from representative locations to audit which edge IPs are resolved.
Mobile users may be routed incorrectly due to carrier DNS, so test with real devices.

Caching Strategies and Cache Control Headers

CDNs respect HTTP cache headers set by the origin. The most important is Cache-Control: public, max-age=3600. That tells both the CDN and browsers to store a copy for one hour; add s-maxage if the CDN should use a different TTL. During that hour, the CDN serves the cached copy without contacting your origin. After the TTL expires, the CDN revalidates: if the content hasn't changed, the origin answers 304 Not Modified and the cached copy gets a fresh TTL.

Revalidation relies on the ETag or Last-Modified headers. With them, the CDN can ask your origin "Still the same?" after the TTL expires and skip re-downloading the body. The round trip still happens, though, so longer TTLs are better for performance.

The trick is balancing freshness and load. Too short a TTL means your origin gets hammered. Too long means users see stale content. The answer is versioned URLs: change the file name or path on every deploy. Then you can set max-age to a year, and the CDN never needs to revalidate.

A common mistake: confusing max-age=0, no-cache, and no-store. max-age=0 marks the response stale immediately, so a cache may still store it but has to revalidate before reusing it. no-cache says the same thing explicitly: store the response, but check with the origin on every request. Neither one prevents storage. Use no-store only when the response must never be written to any cache.

Another nuance: s-maxage overrides max-age for shared caches like CDNs, while max-age still applies to browser caches. So you can set Cache-Control: public, max-age=3600, s-maxage=86400: browsers cache for 1 hour, the CDN for 24 hours. That's useful when you trust the CDN's invalidation more than browser cache clearing.

Also watch the Vary header. Vary: Accept-Encoding is fine; it splits cache entries by encoding. But Vary: User-Agent is dangerous. It creates a separate cache entry per distinct User-Agent string, destroying your hit ratio. Only use Vary when you absolutely must, and prefer normalizing or stripping it via CDN configuration.

Two modern directives: Cache-Control: immutable tells the browser not to revalidate on reload while the response is fresh, so use it only on versioned URLs. And stale-while-revalidate allows serving stale content for a bounded window while a fresh copy is fetched in the background. Both improve perceived performance; stale-while-revalidate does so by accepting brief staleness.

Here's a practical code snippet to set headers correctly from a Java backend:

CacheHeaderFilter.java (Java)

package io.thecodeforge.cdn;

import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

// Requires Servlet 4.0+, where Filter.init() and Filter.destroy() have defaults.
// Map this filter to static asset paths only: applied to every response,
// it would mark personalized pages as publicly cacheable.
public class CacheHeaderFilter implements Filter {
    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        HttpServletResponse resp = (HttpServletResponse) response;
        // Set CDN and browser TTL: browser 1 hour, CDN 1 day
        resp.setHeader("Cache-Control", "public, max-age=3600, s-maxage=86400");
        resp.setHeader("Vary", "Accept-Encoding");
        // Legacy fallback; caches that understand max-age ignore Expires
        resp.setDateHeader("Expires", System.currentTimeMillis() + 3600000L);
        // Only honored by CDNs that support this targeted header
        resp.setHeader("CDN-Cache-Control", "max-age=86400");
        chain.doFilter(request, response);
    }
}

Avoid Vary: User-Agent

It creates dozens of cache entries per URL, killing hit ratio. Only use Vary for Accept-Encoding or when you absolutely need to differentiate by user-agent.

Choosing a Cache Strategy

Assets change rarely (e.g., image files for years): Use versioned URLs + long max-age (e.g., 1 year). No purges needed.
Assets change on deploy but not between users (e.g., JS bundles): Use a versioned URL (hash in filename) + max-age of one year. Purge only for a security patch.
Assets change frequently and need immediate freshness (e.g., pricing): Use short TTL + cache tags + purge on content update. Monitor purge propagation.
Assets are personalized (e.g., user dashboard): Consider not caching (Cache-Control: no-store) or use Edge Side Includes for fragments.

Production Insight

s-maxage overrides max-age for CDN caches. Use it to set a longer CDN TTL than browser TTL.
Versioned URLs are the simplest way to achieve permanent cacheability without stale content.
A bad Cache-Control header can destroy your CDN hit ratio: a single no-store or private disables edge caching entirely.
Test cache headers with curl -I and look for both Cache-Control and X-Cache headers.
If you see 'MISS' on every request, your caching policy is not working.

Key Takeaway

Set Cache-Control carefully: public, max-age for the browser, s-maxage for the CDN.
Versioned URLs allow max-age=31536000 and eliminate revalidation.
Avoid Vary: User-Agent. Use Accept-Encoding only.
Always verify with curl headers before declaring it working.

Cache Invalidation and Purge Strategies

Even with perfect TTLs, you'll eventually need to invalidate cached content: a security patch, a pricing update, a typo fix. CDNs provide purge APIs to remove content from all edge servers. But purging is not instant.

Propagation delay is real. Quoted figures vary by provider and change over time: roughly 1-5 seconds for Akamai and up to 30 seconds for Cloudflare. In bad cases, propagation can take minutes. The purge command marks the content as stale, but each edge server must fetch the new version on the next request. If a user hits an edge that hasn't received the purge signal yet, they'll get the old content.

To handle this, use cache tags (if your CDN supports them). Tag each asset with a group label (e.g., "product-images"). When you update product images, purge by tag instead of individual URLs. That's faster and less error-prone.

Another strategy: use versioned URLs. Instead of purging, just change the version number. Old URLs become orphaned and expire naturally via TTL. This adds complexity to your build system but eliminates purge delays entirely.

If you must purge, always verify. Use the CDN provider's API to check purge status where it offers one. Also test from multiple geographic locations. Response headers such as X-Cache and Age show whether an edge is still serving the old object.

One production scenario: you purge a URL, but a cache in front of your origin is still serving stale content. Your CDN then caches the stale version again. Always flush your origin cache before purging the CDN.

For automated pipelines, integrate purge commands into your deployment script. But add a manual gate for critical assets, because an accidental purge of a million URLs can cause a stampede to your origin.

cdn_purge.py (Python)

import time
import requests

# Sketch against Fastly's URL purge endpoint (POST /purge/<host>/<path>).
# Confirm the endpoint and response fields in your provider's current docs.
api_key = "your-api-key"
url_to_purge = "https://cdn.example.com/images/product.jpg"

headers = {
    "Fastly-Key": api_key,
    "Accept": "application/json"
}

# The purge path takes the URL without its scheme.
purge_target = url_to_purge.split("://", 1)[1]
purge_response = requests.post(
    f"https://api.fastly.com/purge/{purge_target}",
    headers=headers, timeout=10
)

if purge_response.status_code == 200:
    print(f"Purge accepted: {purge_response.json().get('id')}")
    # "Accepted" is not "propagated". Check the edge a bounded number of times
    # instead of looping forever. This only tests the edge this host reaches,
    # so repeat it from other regions.
    for attempt in range(10):
        check = requests.head(url_to_purge, timeout=10)
        print(attempt, check.headers.get("X-Cache"), check.headers.get("Age"))
        if check.headers.get("Age", "0") == "0":
            break  # a freshly fetched object has Age 0 or no Age header
        time.sleep(1)
else:
    print(f"Purge failed: {purge_response.status_code}")

Purge Propagation Times

Approximate figures: Cloudflare ~30 seconds, Akamai 1-5 seconds, Fastly <150ms. These change over time, so check your provider's current documentation and SLA and build verification checks accordingly.

Production Insight

Purge propagation is not instant. Verify from multiple edge locations.
Cache tags let you invalidate groups of related content in one call.
Always flush origin cache before purging the CDN or you may re-cache stale content.
Automate purge in deployment pipelines but add a manual gate for bulk purges.
Monitor purge API response times. Slow purges may indicate CDN provider issues.
Use the purge API with a callback, or poll for completion, to avoid assuming success.

Key Takeaway

Purge = mark stale, not delete. Propagation takes time.
Versioned URLs avoid purge entirely.
Use cache tags for group invalidation.
Always verify purge with curl from multiple regions.
Flush origin first, then CDN.

CDN Security: DDoS Protection and WAF

CDNs are often the first line of defense against DDoS attacks. By absorbing traffic at edge servers, they shield your origin from huge volumes. But not all CDN security features are equal.

Most CDNs offer Web Application Firewall (WAF) rules that inspect HTTP requests for OWASP Top 10 threats such as SQL injection, XSS, and path traversal. But these rules can introduce false positives. A legitimate customer request might be blocked because it contains the word "DROP" in a parameter. Tune your WAF rules carefully and use logging-only mode initially.

Another key feature: rate limiting. You can configure the CDN to block IPs that exceed a certain request rate. This prevents brute-force attacks and API abuse. But be careful: a mobile app with many real users behind a single NAT IP can trigger rate limits. Use the CDN's advanced rate limiting that can consider headers like User-Agent or custom tokens.

Also, CDNs can terminate TLS at the edge, which offloads encryption overhead from your origin. But this means the CDN sees your decrypted traffic. If you have compliance requirements (PCI, HIPAA), you may need end-to-end encryption where the CDN only passes encrypted traffic through, which also means it cannot cache or inspect that traffic.

A common mistake: assuming the CDN passes your security headers through unchanged. If you set CSP or HSTS headers on your origin, ensure the CDN forwards them. Some CDN configurations strip or override headers. Use curl to compare response headers from origin and edge.

check_security_headers.sh (Shell)

# Compare headers from origin (bypass CDN) vs edge
# -s hides the progress meter; -i on grep matches header names in any case
# Direct to origin
curl -sI https://origin-server.example.com/resource | grep -iE '^(content-security-policy|strict-transport-security|x-frame-options)'
# Via CDN
curl -sI https://cdn.example.com/resource | grep -iE '^(content-security-policy|strict-transport-security|x-frame-options)'

CDN Strips Security Headers

Some CDN configurations strip HSTS and CSP headers. Check your CDN's "honor origin headers" setting and whitelist security headers.

Production Insight

A CDN WAF can block legitimate traffic. Always test in log-only mode first.
Rate limiting at the edge protects origin but can break mobile users behind NAT.
TLS termination at the edge offloads encryption but exposes decrypted traffic to the CDN.
Security headers must be explicitly passed through. Some CDNs strip them.
DDoS absorption capacity varies: Cloudflare advertises unmetered mitigation, while other providers may bill for attack traffic.

Key Takeaway

A CDN is a security shield but not a silver bullet.
Always test WAF rules in log-only mode before enforcement.
Ensure security headers are forwarded from origin.
Understand your CDN's DDoS coverage limits.
Rate limit carefully and test with real user traffic patterns.

Origin Shielding and Tiered Caching

When a cache miss occurs at an edge PoP, the CDN requests the content from your origin. If many edges miss simultaneously (e.g., after a purge), your origin gets hammered. Origin shielding solves this by inserting a middle-tier cache.

Origin shielding works like this: all edges that miss will ask a designated shield PoP (or regional hub) for the content. The shield PoP checks its cache first; if it has the object, it serves it. Only if the shield also misses does it go to your origin. This drastically reduces origin load.

Tiered caching takes this further with a hierarchy of caches: edge -> regional -> national -> origin. Each tier adds latency but multiplies cache efficiency. For global sites, this can reduce origin traffic by 80-90%.

The trade-off: additional latency on cache misses. A miss that would have gone directly to origin now takes two hops (edge to shield to origin). But in practice, the shield is often much closer to origin than the edge is, so the penalty is minimal.

Configure shield locations based on your origin's geography. If your origin is in Frankfurt, use a shield in Frankfurt. If you have multiple origins, use multiple shields. Some CDNs auto-select the shield based on latency.

Monitor shield hit ratio separately from edge hit ratio. If the shield miss ratio is high, your origin is still taking too many requests. Consider increasing shield TTL or adding more shield layers.

One gotcha: shield PoPs have their own IPs. If your origin firewall allows only CDN IPs, make sure shield IPs are included. Also, for signed URLs, ensure the shield can authenticate with your origin.

cdn_tier_analysis.py (Python)

# Analyze CDN logs to identify origin load patterns
import gzip

def analyze_shield_effectiveness(log_path):
    edge_misses = 0    # every request the edge could not serve from cache
    shield_misses = 0  # the subset that also missed at the shield (origin hits)
    with gzip.open(log_path, 'rt') as f:
        for line in f:
            parts = line.split()
            if len(parts) < 4:
                continue  # skip blank or malformed lines
            # Assumed log format: timestamp edge_ip cache_status url
            # Assumed statuses: 'HIT', 'MISS' (edge miss, shield hit),
            # 'SHIELD_MISS' (missed at both edge and shield)
            cache_status = parts[2]
            if cache_status in ('MISS', 'SHIELD_MISS'):
                edge_misses += 1
            if cache_status == 'SHIELD_MISS':
                shield_misses += 1
    print(f"Edge misses: {edge_misses}")
    print(f"Shield misses (origin hits): {shield_misses}")
    print(f"Origin load reduced by {(1 - shield_misses/max(edge_misses,1))*100:.1f}%")

Origin Shielding Mindset

Think of it as a single queue in front of your origin rather than hundreds of distracted customers.

All edge misses converge on one shield PoP instead of hitting origin directly.
The shield PoP acts as a second-level cache, absorbing repeated misses.
Reduces origin spikes after a purge or traffic surge.
Adds one extra hop on a miss but protects origin from hammering.
Configure the shield close to origin for minimal added latency.

Production Insight

Origin shielding can reduce origin load by 80-90% during traffic spikes.
As a rule of thumb, shield miss ratio should be below 10%. If it is higher, the shield is not doing much.
Ensure shield IPs are in your origin firewall whitelist.
Monitor shield performance separately from edge performance.
Tiered caching adds latency on a cold cache but is worth it for global scale.

Key Takeaway

Origin shielding = single point of origin contact, not a crowd.
Tiered caching multiplies cache efficiency but adds hop latency.
Monitor shield miss ratio as a key performance indicator.
Configure shield location based on origin geography.
Always whitelist shield IPs in the origin firewall.

CDN Logging and Analytics for Production Debugging

To troubleshoot and optimize a CDN, you need detailed logs. Most CDN providers offer access logs that record every request: timestamp, client IP, edge location, cache status (HIT/MISS), response size, and latency. Enable these logs and ship them to your analytics pipeline.

Analyze logs to find patterns: which URLs have the most misses? Which regions see the highest latency? Which user agents are causing cache fragmentation? These insights drive configuration changes.

Set up dashboards for key metrics: cache hit ratio over time, origin traffic volume, top missed URLs, average TTFB by region. Alert on anomalies. For example, a sudden drop in hit ratio across all regions often means a deployment broke cache headers.

One often-overlooked metric: purge requests per day. A high purge rate indicates your caching strategy is failing. You're treating purge as a crutch instead of fixing TTLs or versioning.

Also log the X-Cache header from your own applications if you proxy through the CDN. That lets you correlate user-reported issues with cache status at the time.

For large-scale logs, consider using AWS Athena or Google BigQuery to query CDN logs efficiently. Raw log files can be terabytes, but columnar queries make analysis fast and cheap.

cdn_log_parser.py (Python)

import gzip
from collections import defaultdict

def parse_cdn_log(filepath):
    hits = 0
    misses = 0
    url_misses = defaultdict(int)

    with gzip.open(filepath, 'rt') as f:
        for line in f:
            parts = line.split()
            if len(parts) < 7:
                continue  # skip blank or truncated lines
            cache_status = parts[6]  # assuming field index 6 (0-based) is X-Cache
            url = parts[3]           # assuming field index 3 (0-based) is request URL
            if cache_status == 'HIT':
                hits += 1
            else:
                misses += 1
                url_misses[url] += 1

    total = hits + misses
    if total == 0:
        print('No parsable log lines found')
        return
    print(f'Hit ratio: {hits/total*100:.1f}%')
    print('Top 10 missed URLs:')
    for url, count in sorted(url_misses.items(), key=lambda x: -x[1])[:10]:
        print(f'{url}: {count}')

parse_cdn_log('cdn_access.log.gz')

Log Retention and Cost

CDN logs can be huge. For a site with 1M requests/day, logs can reach a gigabyte or more per day depending on the fields logged. Set retention to 30 days for raw logs, or aggregate metrics to save costs. Use services like Cloudflare's Logpush to stream directly to your analytics platform.

Production Insight

CDN logs are gold for debugging but costly to store, so set retention wisely.
Alert on sudden hit ratio drops. They usually indicate a bad deployment.
A high purge rate means your caching strategy is broken.
Use columnar query engines for fast analysis of large logs.
Correlate X-Cache headers with user reports for targeted investigations.

Key Takeaway

Enable CDN access logs immediately. They are essential for debugging.
Monitor hit ratio, purge rate, and origin traffic proactively.
Use tools like Athena or BigQuery for scalable log analysis.
Set up alerts for anomalies. Don't wait for users to complain.
Correlate backend logs with CDN logs to trace the full request path.

CDN for API Caching and Dynamic Content Acceleration

APIs present a different caching challenge than static assets. Most API responses are dynamic: they depend on the authenticated user, query parameters, or real-time data. But even dynamic APIs can benefit from CDN caching. The trick is to cache at the right granularity: no shared caching for personalized data, and short TTLs for public endpoints like /products or /pricing.

CDNs now offer surrogate keys (cache tags) and dynamic content optimization. You can tag responses with categories and invalidate them selectively. CDNs can also use Edge Side Includes (ESI) to assemble a page from cached fragments and dynamic parts served from origin.

Another technique: GraphQL CDN caching. Because GraphQL uses a single endpoint with varying queries, caching requires normalized cache keys. Some CDNs support automatic cache key generation based on query hash. Use persisted queries for maximum cacheability.

For APIs, set Cache-Control: public, s-maxage=60 for a 60-second CDN cache. That absorbs traffic spikes without serving stale data for long. For authenticated APIs, either don't cache responses at all or use authorized edge caching with signed URLs.

Common mistake: caching API responses that include user-specific data. If one user sees another's data, that's a security incident. Always inspect the response for user-specific fields before enabling CDN cache.

Here's an example of setting cache headers in a Python Flask API:

api_cache_config.py (Python)

from flask import Flask, jsonify, make_response

app = Flask(__name__)

# Placeholder data access functions so the example runs; replace with real lookups.
def fetch_products():
    return [{"id": 1, "name": "example"}]

def get_user_orders():
    return []

@app.route('/api/products')
def get_products():
    # Public endpoint: cacheable at the CDN for 60 seconds
    products = fetch_products()
    response = make_response(jsonify(products))
    response.headers['Cache-Control'] = 'public, s-maxage=60'
    # Tag header name varies by CDN (Surrogate-Key is the Fastly convention)
    response.headers['Surrogate-Key'] = 'products'
    return response

@app.route('/api/orders')
def get_orders():
    # Authenticated: must not be stored by any cache
    response = make_response(jsonify(get_user_orders()))
    response.headers['Cache-Control'] = 'no-store'
    return response

API Cache Granularity

s-maxage applies only to shared caches such as the CDN. To let the CDN cache a response while browsers revalidate every time, send Cache-Control: public, max-age=0, s-maxage=60. Do not combine s-maxage with no-store: no-store forbids storage in every cache, the CDN included.

Production Insight

API caching at the CDN edge can reduce origin load by 50-70%.
But caching personalized API responses causes data leaks, so always audit response content.
Use cache tags to invalidate API responses by resource type.
Monitor API cache hit ratio: if below 30%, caching may not be effective.
Rule: start with a short TTL (30s) and increase based on hit ratio and freshness requirements.

Key Takeaway

API caching reduces origin load but requires careful granularity.
Never cache personalized responses in a shared cache without an authorization check.
Use s-maxage for the CDN TTL and max-age for the browser TTL.
Surrogate keys enable selective invalidation of dynamic content.
Test with the X-Cache header to verify caching is actually working.
    {
      "heading": "CDN Cost Optimization — Understanding Your Bill",
      "content": "CDN costs can spiral if you don't monitor carefully. Most providers charge by total data transfer (egress) from edge to users, plus request counts. But there are hidden costs: origin fetch fees (when cache misses cause the CDN to pull from origin), purge API calls (sometimes metered), and advanced features like WAF or DDoS protection.\n\nKey levers to control cost:\n- Increase cache hit ratio: every percentage point saved reduces origin traffic. Target >95% for static assets.\n- Enable compression (gzip, Brotli) — reduces transfer size by 60-80%.\n- Use image optimization (WebP, AVIF) — CDN can resize and convert images on the fly, reducing bytes.\n- Set proper TTLs — longer TTLs mean fewer revalidations and lower request costs.\n- Enable origin shielding — reduces origin bandwidth by consolidating misses.\n- Monitor bandwidth by geographic region — some providers charge more for certain PoPs.\n\nAlso watch for 'surge' pricing: if your traffic spikes (e.g., viral content), some CDNs apply higher rates. Negotiate enterprise agreements if you expect spikes.\n\nCheck your bill regularly and set up cost alerts. A misconfigured asset that bypasses CDN can cost thousands overnight.",
      "code": {
        "language": "python",
        "filename": "cdn_cost_analysis.py",
        "code": "# Simple cost estimation based on CDN logs\nimport gzip\nfrom collections import defaultdict\n\ndef estimate_cost(log_path, cost_per_gb=0.085):\n    total_bytes = 0\n    with gzip.open(log_path, 'rt') as f:\n        for line in f:\n            parts = line.split()\n            # Assuming column 4 = response size in bytes\n            response_size = int(parts[4])\n            total_bytes += response_size\n    total_gb = total_bytes / (1024**3)\n    cost = total_gb * cost_per_gb\n    print(f\"Total data transfer: {total_gb:.2f} GB\")\n    print(f\"Estimated cost: ${cost:.2f}\")\n    return cost\n\ndef compare_with_optimization(log_path, improvement_factor=0.3):\n    cost = estimate_cost(log_path)\n    saved = cost * improvement_factor\n    print(f\"With 30% optimization: ${saved:.2f} savings\")\n    \ncompare_with_optimization('cdn_access.log.gz')"
      },
      "callout": {
        "type": "info",
        "title": "Hidden Cost: Origin Fetch Fees",
        "text": "Some CDNs charge for data transferred from origin to CDN (origin fetch). This can equal or exceed edge egress costs if hit ratio is low. Reducing misses directly cuts this cost."
      },
      "production_insight": "CDN bills can explode if cache hit ratio drops below 80%.\nImage optimization at the edge can cut bandwidth by 50%.\nCompression (Brotli) reduces transfer size by up to 70% — ensure both CDN and origin enable it.\nNegotiate enterprise contracts if you expect traffic spikes.\nSet billing alerts to catch cost anomalies early.",
      "key_takeaway": "CDN cost = (data transfer + request count) × cache efficiency.\nEvery % increase in hit ratio reduces cost.\nCompression and image optimization are cheapest performance gains.\nMonitor bills weekly — a misconfiguration can cost thousands.\nUse CDN analytics to identify high-cost, low-hit assets."
    }
  ]
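
To make the hit-ratio lever from the cost section concrete, here is a minimal sketch that estimates a monthly bill from egress volume, request count, and cache hit ratio. Every rate in it is a hypothetical placeholder (the 0.085 per GB figure is borrowed from the log script above; the request and origin-fetch rates are assumptions), and it assumes origin fetch volume scales linearly with the miss ratio. Substitute your provider's price sheet before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdnRates:
    # Hypothetical prices; replace with your provider's published rates.
    egress_per_gb: float = 0.085
    per_10k_requests: float = 0.01
    origin_fetch_per_gb: float = 0.02


def estimate_monthly_cost(egress_gb: float, requests: int, hit_ratio: float,
                          rates: CdnRates = CdnRates()) -> float:
    """Estimate cost as edge egress + request fees + origin fetch on misses."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    egress_cost = egress_gb * rates.egress_per_gb
    request_cost = (requests / 10_000) * rates.per_10k_requests
    # Simplification: bytes pulled from origin scale with the miss ratio.
    origin_cost = egress_gb * (1.0 - hit_ratio) * rates.origin_fetch_per_gb
    return round(egress_cost + request_cost + origin_cost, 2)


if __name__ == "__main__":
    for ratio in (0.60, 0.80, 0.95):
        cost = estimate_monthly_cost(10_000, 50_000_000, ratio)
        print(f"hit ratio {ratio:.0%}: ${cost:,.2f}")
```

In this model only the origin-fetch term moves with hit ratio, so the savings depend on how your provider prices origin traffic and on what your own origin egress costs.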

Why Your DNS Isn't the Only Thing Routing Requests

Most devs think CDN routing is just DNS geolocation. It's not. When you request a file, the CDN's DNS returns an edge IP based on where your resolver (usually your ISP's) sits; that part is true. But many CDNs also route with anycast at the network layer and with load balancers at Layers 4 and 7 inside each PoP. The platform continuously health-checks its PoPs and servers. If the nearest PoP is under DDoS or has a failing disk, traffic shifts to the next-closest healthy node, often within seconds. That's why you see latency jumps of tens of milliseconds during a regional outage: the network is actively failing over. Note what failover does not change: a request still goes to the origin (or to the shield tier, if you use one) whenever the edge serving it has a cache miss. So the CDN can keep cached content available through an edge or datacenter failure, but uncached content still needs a healthy origin. The CDN is a self-healing mesh. Treat it like one. Don't hardcode a single edge IP. That breaks the entire failover model.

check-cdn-routing.sh (Bash)

# io.thecodeforge
# Example: connect to one specific edge IP and inspect the response
# --resolve pins the hostname to that IP for this test only; normal traffic should resolve via DNS
curl -v -o /dev/null -s https://cdn.thecodeforge.io/assets/bundle.js \
    -w "Connected to %{remote_ip}:%{remote_port}\n" \
    --resolve 'cdn.thecodeforge.io:443:52.84.120.1' 2>&1 | grep -E "(Connected|HTTP/2|server)"

# Output:
# * Connected to cdn.thecodeforge.io (52.84.120.1) port 443 (#0)
# * HTTP/2 200
# * server: CloudFront

Output

* Connected to cdn.thecodeforge.io (52.84.120.1) port 443 (#0)

* HTTP/2 200

* server: CloudFront

Production Trap:

If you pin a single edge IP in client config or firewall rules, you bypass the CDN's failover. Address the CDN by DNS name, and when restricting access to your origin, allow the CDN's published origin-facing IP ranges rather than individual addresses.

Key Takeaway

CDN routing is a live traffic cop, not a static map. Always resolve hostnames at request time.

Caching Is a Contract — You're Probably Breaking It

Your 24-hour TTL says 'this content won't change for a day.' But you're updating that JS bundle every sprint. The CDN doesn't know that. It holds the old version until the TTL expires. That's not a bug; it's the contract. The Cache-Control header is a promise between you and every cache in the path. If you set max-age=86400, you are promising not to change the content at that URL for 24 hours. Break it? Users get stale assets. Fix: use content-addressed URLs. Hash your filenames, so bundle.js becomes bundle.a1b2c3.js, unique per build. Now you can set a year-long TTL. Old versions age out naturally, and because every build gets a new URL, the CDN never serves a stale file under it. Put the version in the path (a hashed filename, or a prefix such as /v2/assets/), not in a query string. CDNs handle query parameters differently: depending on configuration, some leave them out of the cache key and others cache every variation separately. Versioned paths are deterministic, and they typically raise your cache hit ratio because each asset can stay cached for as long as it exists.

cdn-cache-buster.go (Go)

// io.thecodeforge
package main

import (
	"crypto/sha256"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// HashBuildAssets creates unique filenames for cache-busting
func HashBuildAssets(root string) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		hash := fmt.Sprintf("%x", sha256.Sum256(data))[:12]
		ext := filepath.Ext(path)
		newName := fmt.Sprintf("%s.%s%s", path[:len(path)-len(ext)], hash, ext)
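		// Stop on rename failure instead of reporting a rename that never happened
		if err := os.Rename(path, newName); err != nil {
			return err
		}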
		fmt.Printf("Renamed: %s -> %s\n", path, newName)
		return nil
	})
}

func main() {
	if err := HashBuildAssets("./dist"); err != nil {
		panic(err)
	}
}

Output

Renamed: ./dist/app.js -> ./dist/app.3f7a2b1c0d9e.js

Renamed: ./dist/style.css -> ./dist/style.8f4e2d1a0b3c.css

Production Trap:

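Query-string versioning (e.g., ?v=2) is unreliable: depending on configuration, a CDN or intermediate proxy may leave the query string out of the cache key, so ?v=2 keeps returning the old object. Always version by path. Check your CDN's docs on 'cache key behavior' before deploying.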

Key Takeaway

Hash your filenames. Set max-age to one year. Never change content under a static URL.
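
The takeaway above implies two different header policies: one for fingerprinted files and one for everything else. The sketch below shows one way to choose between them. The fingerprint pattern (a dot, eight or more lowercase hex characters, then the extension), the 300-second fallback TTL, and the function name `cache_control_for` are all assumptions for illustration, not part of any CDN's API.

```python
import re
from pathlib import PurePosixPath

# Assumed fingerprint convention: name.<8+ hex chars>.ext, e.g. app.3f7a2b1c0d9e.js
FINGERPRINT = re.compile(r"\.[0-9a-f]{8,}\.[A-Za-z0-9]+$")

ONE_YEAR = 31_536_000  # seconds


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value based on whether the URL path is fingerprinted."""
    name = PurePosixPath(path).name
    if FINGERPRINT.search(name):
        # Content never changes under this URL, so it can be cached for a year.
        return f"public, max-age={ONE_YEAR}, immutable"
    if name.endswith(".html") or "." not in name:
        # Entry points reference the hashed assets, so always revalidate them.
        return "no-cache"
    # Unversioned assets: short TTL so updates show up quickly.
    return "public, max-age=300"


if __name__ == "__main__":
    for p in ("/dist/app.3f7a2b1c0d9e.js", "/index.html", "/img/logo.png"):
        print(p, "->", cache_control_for(p))
```

The HTML entry point is deliberately kept on `no-cache`: it is the one URL that cannot be renamed per build, so it has to be revalidated for users to pick up the new hashed filenames.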

● Production incident · POST-MORTEM · severity: high

The Great Image Refresh Failure — 24-Hour Stale Cache Nightmare

Symptom

Product images on the homepage didn't update for a full day after a scheduled refresh. Users complained about outdated product visuals.

Assumption

The team assumed the CDN would automatically pick up new images because they were published on the origin with the same URL.

Root cause

The CDN was configured with a default TTL of 24 hours for image assets, and no cache invalidation was triggered after the upload. The origin server had new images, but the CDN edge servers never requested them because the cache was still considered valid. Even after a manual purge was issued, propagation to all 200+ edge PoPs took over 30 minutes, and the team did not verify that the purge completed globally.

Fix

1. Set a shorter TTL for frequently changing assets that keep the same URL (e.g., 1 hour).
2. After publishing new images, issue a CDN purge API call for the affected URLs.
3. Monitor purge status via the CDN provider's dashboard to confirm propagation.
4. Implement cache-busting with versioned URLs, preferably a fingerprint in the filename (e.g., image.a1b2c3.png) rather than a query string.
5. Use cache tags to group related assets and purge them all at once.
6. Automate purge verification by checking X-Cache headers on sample edge nodes after deployment.

Key lesson

Always know your CDN's default TTL for each asset type.
Assume a cached asset will stay cached until explicitly invalidated.
Use versioned URLs or fingerprinting to force cache refresh on content change.
Always verify purge propagation — don't trust the API response alone.
Implement monitoring for cache invalidation completion to catch partial failures.
Automate cache invalidation tests in your CI/CD pipeline — simulate a user request from multiple regions to confirm old content is gone.
Set up an alert when a purge takes longer than your provider's documented propagation time (this ranges from seconds to several minutes, depending on the provider).
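
The "verify purge propagation" lesson can be sketched as a small polling check. This is a minimal illustration, not a provider integration: `fetch` is a hypothetical hook that returns the body a given edge currently serves (in practice it might wrap an HTTP client pinned to one PoP or region), and the edge names are placeholders.

```python
import hashlib
from typing import Callable, Iterable

# fetch(edge) -> response body bytes as served by that edge (hypothetical hook).
Fetch = Callable[[str], bytes]


def stale_edges(edges: Iterable[str], fetch: Fetch, expected_sha256: str) -> list[str]:
    """Return the edges still serving content that differs from the new version."""
    stale = []
    for edge in edges:
        digest = hashlib.sha256(fetch(edge)).hexdigest()
        if digest != expected_sha256:
            stale.append(edge)
    return stale


def purge_confirmed(edges: Iterable[str], fetch: Fetch, expected_sha256: str,
                    attempts: int = 5,
                    wait: Callable[[int], None] = lambda attempt: None) -> bool:
    """Poll until every edge serves the new content or attempts run out."""
    remaining = list(edges)
    for attempt in range(attempts):
        remaining = stale_edges(remaining, fetch, expected_sha256)
        if not remaining:
            return True
        wait(attempt)  # e.g. time.sleep with backoff in a real pipeline
    return False
```

Comparing a content hash rather than a status code matters here: a stale edge still answers 200, so only the body (or an ETag derived from it) reveals that the old object is being served.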

Production debug guide: Symptom → Action matrix for common CDN issues (12 entries)

Symptom · 01

User reports slow page load from a specific region

→

Fix

Use curl with --resolve flag to test from different edge locations. Check CDN provider's latency map.

Symptom · 02

Content is stale — users see old version of a file

→

Fix

Verify cache TTL headers (Cache-Control max-age). Check if purge was issued and confirm propagation via CDN provider's debug header.

Symptom · 03

Some users get 403 Forbidden or access denied

→

Fix

Check CDN origin configuration (IP whitelist, signed URLs). Ensure origin server allows CDN IPs. Validate authentication headers are passed correctly.

Symptom · 04

Mixed Content error in browser (HTTP vs HTTPS)

→

Fix

Ensure CDN forces HTTPS redirect. Check origin serves all assets over HTTPS. Update any hardcoded HTTP links in HTML.

Symptom · 05

High error rate (5xx) from CDN

→

Fix

Check origin server health and response times. Verify CDN origin timeout settings (default 30s may be too short). Look for origin overload or misconfigured keep-alive.

Symptom · 06

Content not cached despite Cache-Control: public

→

Fix

Inspect response for Set-Cookie header or Vary: Cookie — these disable CDN caching. Also check Cache-Control: no-store, no-cache, or private. Use browser dev tools or curl -I to see raw headers.

Symptom · 07

Cache hit ratio below 70%

→

Fix

Analyze cache keys: look for dynamic query parameters (e.g., ?t=timestamp), random session IDs in URLs, or overly broad Vary headers (e.g., Vary: User-Agent) that split one object into many cache entries. Configure the CDN to ignore irrelevant query parameters.

Symptom · 08

Unexpected high bandwidth bill

→

Fix

Check cache hit ratio and object size distribution. Enable CDN logging to identify uncached large files. Verify that compression (gzip/brotli) is enabled on both origin and CDN.

Symptom · 09

Purge did not take effect globally

→

Fix

Check the CDN provider's purge propagation status. Request the URL from several regions with curl -I and compare the cache-status and Age headers to find edges that still serve the old object. Verify that the origin serves the new content before re-purging.

Symptom · 10

Cache-Control headers are overwritten by CDN

→

Fix

Review CDN configuration for 'honor origin' settings. Some CDNs force a default TTL; configure CDN to respect origin headers or adjust origin headers to match.

Symptom · 11

Mobile users experience high latency despite CDN

→

Fix

Check if CDN has PoPs in the mobile user's region. Test with a real device or emulate mobile network throttling. Verify that the CDN supports HTTP/2 or HTTP/3 for multiplexing on slow connections.

Symptom · 12

CDN is serving stale content after tag-based purge

→

Fix

Confirm that the cache tag was correctly applied on the origin response (Surrogate-Key header). Some CDNs require the tag to be present in the response; purge by tag may silently ignore untagged objects.
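
For Symptom 07, the effect of "ignore irrelevant query parameters" can be sketched as a cache key normalizer. The ignore list below is an assumption for illustration; real CDNs expose this as configuration rather than code, and the right list depends on which parameters actually change your responses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change the response body.
IGNORED_PARAMS = {"t", "fbclid", "gclid", "sessionid"}
IGNORED_PREFIXES = ("utm_",)


def normalize_cache_key(url: str) -> str:
    """Drop irrelevant query parameters and sort the rest for a stable cache key."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS and not key.startswith(IGNORED_PREFIXES)
    ]
    kept.sort()
    # The fragment is never sent to the server, so it is dropped as well.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    a = normalize_cache_key("https://example.com/image.png?t=1717171717&w=200")
    b = normalize_cache_key("https://example.com/image.png?w=200&utm_source=mail")
    print(a == b)  # both requests now map to one cache entry
```

Without normalization, each timestamp or tracking value creates a separate cache entry for identical bytes, which is how hit ratios fall without any visible error.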

★ CDN Quick Debug Cheat Sheet

Commands and actions to quickly diagnose CDN-related performance and caching issues.

Check if content is served from CDN or origin

Immediate action

Inspect response headers for X-Cache or CF-Cache-Status

Commands

curl -sI https://example.com/image.png | grep -iE 'x-cache|cf-cache-status'

curl -s -o /dev/null -w '%{http_code} %{time_total}\n' https://example.com/image.png

Fix now

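If repeated requests keep returning X-Cache: MISS or CF-Cache-Status: DYNAMIC, the response is not being cached (a single MISS on the first request is normal). Add a Cache-Control header the CDN can act on, e.g. public, max-age=86400 for assets that are safe to cache for a day.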

High latency from specific region

Stale content after purge

Low cache hit ratio

SSL/TLS handshake failure

Cache-Control header not being respected by CDN

High origin load despite CDN

Purge tag not working
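
The header check from the first item of this cheat sheet can be scripted for a batch of sampled responses. This is a rough sketch: header values differ between providers, so the mapping below (CloudFront-style "Hit from cloudfront" and Cloudflare-style HIT or REVALIDATED counted as hits, everything else as a miss) is an assumption to adapt, not a complete list.

```python
from collections import Counter

# Assumed set of first tokens that mean "served from edge cache".
HIT_TOKENS = {"hit", "refreshhit", "revalidated"}


def cache_status(headers: dict[str, str]) -> str:
    """Classify a response as 'hit', 'miss', or 'unknown' from common CDN headers."""
    lowered = {key.lower(): value.strip().lower() for key, value in headers.items()}
    value = lowered.get("cf-cache-status") or lowered.get("x-cache") or ""
    if not value:
        return "unknown"  # header absent: the CDN may be bypassed entirely
    # Use the first token, e.g. "hit" from "Hit from cloudfront".
    first = value.split()[0]
    return "hit" if first in HIT_TOKENS else "miss"


def hit_ratio(responses: list[dict[str, str]]) -> float:
    """Hit ratio over the responses that carried a cache-status header."""
    counts = Counter(cache_status(headers) for headers in responses)
    measured = counts["hit"] + counts["miss"]
    return counts["hit"] / measured if measured else 0.0
```

Responses without either header are reported as unknown and excluded from the ratio, since a missing header usually points to a routing problem rather than a caching one.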
