Intermediate 5 min · March 05, 2026

Availability vs Reliability — Why 5 Nines Broke Checkout

Q: What is the difference between availability and reliability in simple terms?

Availability is whether the system is ON and reachable — like a light switch. Reliability is whether the system returns the correct result — like a vending machine that gives you the right snack. A system can be available but unreliable (on but returning errors) or reliable but unavailable (offline but never corrupting data). In production you need both.

Q: How many nines of availability do most production systems target?

The baseline is 99.99% (four nines) — 52.6 minutes of downtime per year. That's achievable with careful redundancy and automated failover. Five nines (99.999% — 5.26 minutes/year) is the gold standard for critical systems but comes with exponentially higher cost and complexity. Anything above that (six nines) is usually marketing — it's technically possible but rarely worth the investment.

Q: Can a system be reliable but not available?

Yes. A system that is down cannot serve requests, so it is not available. But the data it holds may be perfectly consistent and correct. Example: a replica database that has been shut down for maintenance. The data is in a reliable state (no corruption), but the system is not available. In production, you'd bring it back online to restore availability.

Q: What is an error budget and how does it relate to reliability?

An error budget is the amount of unreliability your system is allowed within a given period. For a 99.9% SLO, the error budget is 0.1% of total requests (or time). Once the budget is consumed, the team stops shipping new features and focuses on reliability improvements. Error budgets align product velocity with reliability — you can't ship fast if your system keeps breaking.

Q: How do you measure reliability with synthetic transactions?

Synthetic transactions are scripted API calls that simulate real user behaviour — e.g., 'add item to cart, checkout, confirm order.' They validate not just the HTTP status but also the content of the response. If the order confirmation page shows a wrong total, the transaction fails. Synthetic transactions run on a schedule (every 60 seconds) from multiple geographic locations. They catch reliability failures that would otherwise go unnoticed until users report them.

TCP health checks passed while 10% of users hit checkout errors.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.

✓ Production

production tested

July 18, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts


⚡Quick Answer

Availability: Is the system responding? Measured by uptime and nines.
Reliability: Is the system returning correct results? Measured by error rate and correctness.
Performance insight: 99.99% availability allows 52.6 minutes of downtime per year; reliability failures often hide within that window.
Production insight: A system can be 99.999% available but still corrupt data silently — availability doesn't guarantee correctness.
Biggest mistake: Treating availability and reliability as interchangeable — they optimize for different failure modes.

✦ Definition~90s read

What is Availability and Reliability?

Availability measures whether a system is up and reachable. It's a binary property: the system either responds to requests or it doesn't. Engineers track availability as a percentage of uptime over a given period — typically a month or a year.


Nines are a shorthand: 99% (two nines) means ~3.65 days of downtime per year. 99.999% (five nines) means ~5.26 minutes. Each extra nine costs exponentially more in infrastructure and operational complexity.

Production systems aim for four nines (99.99% – 52.6 minutes/year) as a baseline. Five nines is the gold standard for critical financial or healthcare systems. Anything above that is usually marketing fluff — achieving six nines (99.9999% – 31.5 seconds/year) requires fully redundant, geographically distributed infrastructure and near-instant failover.

Plain-English First

Imagine a vending machine in your school hallway. Availability is whether the machine is ON and ready when you walk up to it — does it respond? Reliability is whether it actually gives you the right snack every time without jamming or eating your money. A machine can be 'on' (available) but still mess up your order (unreliable). You want both — a machine that's always on AND always gets it right. That's exactly what engineers mean when they talk about these two ideas in software systems.

Every time you open Netflix, tap 'Pay' on your phone, or check your bank balance, you're silently depending on someone's backend system to be up and working correctly. When those systems fail — even for seconds — real money is lost, real users churn, and real engineers get paged at 3am. Availability and reliability are the two most fundamental promises a system makes to its users, and understanding the difference between them is what separates junior engineers from architects who design systems that actually survive the real world.

The problem is that most teams treat availability and reliability as the same thing and optimize for only one. They slap a load balancer in front of their app, call it 'highly available', and ship it — only to discover their distributed system now silently returns wrong data under load, or drops 0.3% of transactions without anyone noticing for weeks. High availability without reliability is a liar's guarantee. Your system is 'up', but it's quietly betraying your users.

By the end of this article you'll be able to calculate availability from nines, explain the difference between availability and reliability in a system design interview without freezing, identify the architectural patterns that serve each goal, and spot the common trade-offs that teams get wrong when they chase one metric at the expense of the other. Let's build the mental model from the ground up.

What Is Availability?

Uptime is calculated using this formula:

Availability = (Total Time – Downtime) / Total Time × 100%

io/thecodeforge/AvailabilityCalculator.java (Java)

package io.thecodeforge;

import java.time.Duration;

public class AvailabilityCalculator {

    /**
     * Computes availability percentage for a given period and downtime.
     * @param totalPeriod total observation time in milliseconds, e.g., 365 days
     * @param downtimeMs total downtime in milliseconds
     * @return availability as a percentage
     */
    public static double availability(long totalPeriod, long downtimeMs) {
        if (totalPeriod <= 0) throw new IllegalArgumentException("totalPeriod must be > 0");
        // Downtime outside [0, totalPeriod] would yield a nonsensical percentage
        if (downtimeMs < 0 || downtimeMs > totalPeriod) {
            throw new IllegalArgumentException("downtimeMs must be between 0 and totalPeriod");
        }
        long uptime = totalPeriod - downtimeMs;
        return (double) uptime / totalPeriod * 100.0;
    }

    public static void main(String[] args) {
        long yearMs = Duration.ofDays(365).toMillis();
        long fiveMinMs = Duration.ofMinutes(5).toMillis();
        double avail = availability(yearMs, fiveMinMs);
        System.out.printf("5 minutes downtime per year gives %.5f%% availability%n", avail);
    }
}

Output

5 minutes downtime per year gives 99.99905% availability

🔥Mental Model: The Light Switch

Think of availability like a light switch. When you flip it, the light either turns on or it doesn't. That's availability. It doesn't care if the light is dim or flickering — just whether it's on. In production, a server can be 'on' but returning wrong data — that's where reliability comes in.

📊 Production Insight

A TCP health check will pass even if the application is deadlocked.

To truly measure availability, use application-layer health endpoints.

Rule: every health check must exercise at least one downstream dependency.

🎯 Key Takeaway

Availability = uptime percentage.

Nines are a shorthand for downtime budgets.

Remember: availability tells you nothing about correctness.


What Is Reliability?

Reliability measures whether the system produces the correct output. A system can be 100% available — responding to every request — yet be 0% reliable if every response is wrong. Reliability is probabilistic: we usually talk about the probability that the system returns the correct result for a given request.

Common reliability metrics

Error rate: ratio of failed requests to total requests.
Latency distribution: tail latencies (p99, p999) matter more than averages.
Data integrity: checksum mismatches, corruption rates.
Correctness under failure: tolerance of Byzantine faults.

Reliability is harder to guarantee than availability because it requires ensuring every component in the request path behaves correctly under all conditions — including partial failures, network partitions, and concurrent modifications.

Mental Model: The Vending Machine

Availability means the machine has power and accepts payments. Reliability means it gives you exactly the snack you selected and returns the correct change.

Available: machine is on, touchscreen responsive.
Reliable: pressing A3 gives you a Snickers, not a bag of chips or nothing.
An available but unreliable machine eats your money — users hate it.
A reliable but unavailable machine sits dark — users can't use it.
Production systems must be both: available to accept traffic, reliable to serve correct data.

📊 Production Insight

Error rate SLIs can mask silent data corruption — a request may return HTTP 200 with wrong data.

Instrument every data path with checksums and validation to catch silent failures.

Rule: never trust an optimistic error rate; always measure correctness with synthetic tests.
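One way to instrument a data path with checksums is sketched below. It is a minimal illustration, not a prescribed design: the envelope layout (`data` plus `sha256`) and the helper names `wrap` and `verify` are assumptions made for this example.

```python
import hashlib
import json


def checksum(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def wrap(record: dict) -> dict:
    """Writer side: store the checksum next to the data (hypothetical envelope)."""
    return {"data": record, "sha256": checksum(record)}


def verify(envelope: dict) -> bool:
    """Reader side: recompute the checksum and compare before trusting the data."""
    return checksum(envelope["data"]) == envelope["sha256"]


envelope = wrap({"order_id": 1, "total_cents": 1999})
print(verify(envelope))               # intact record
envelope["data"]["total_cents"] = 1   # simulate silent corruption
print(verify(envelope))               # mismatch is now detectable
```

A reader that calls `verify` can count mismatches as a data-integrity SLI, which stays visible even when every response is an HTTP 200.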

🎯 Key Takeaway

Reliability = correctness under load.

It's probabilistic, not binary.

Measure error rates AND data integrity — they're not the same thing.

How Availability and Reliability Relate — But Differ

Availability and reliability are two independent axes, which gives four possible states: available and reliable (the happy path), available but unreliable (responding, but silently wrong or corrupt), unavailable but reliable (data is correct but inaccessible), and unavailable and unreliable (down, with damaged state to repair).

In distributed systems, they interact

CAP theorem: Network partitions force a trade-off between availability and consistency (a form of reliability).
Circuit breakers: When a dependency is unreliable, the breaker stops calling it and fails fast or serves a cached or degraded response. You give up availability of that one feature to protect the correctness and responsiveness of the rest of the system.
Retries: They improve reliability by recovering from transient failures, but too many retries can degrade availability (thundering herd).
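The retry trade-off in the last bullet is usually managed by capping attempts and spreading retries out in time. The sketch below shows capped exponential backoff with full jitter; the function name, the default limits, and the choice of which exceptions count as transient are all assumptions for illustration.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep,
                       retry_on=(ConnectionError, TimeoutError)):
    """Call operation(), retrying transient failures with jittered backoff.

    The attempt cap bounds the extra load a failing dependency receives;
    the random delay stops many clients from retrying in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # full jitter in [0, cap]
```

The `sleep` parameter is injectable so the behaviour can be tested without real waiting.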

Senior engineers design systems with explicit availability targets and reliability targets — and they know which one to sacrifice when something breaks.

The most expensive production incidents often occur when teams optimised for availability at the expense of reliability: the system stayed up, but served corrupted data to thousands of users before anyone noticed.

⚠ Common Trap

Don't confuse 'the system is up' with 'the system is working.' An available system that returns wrong data is worse than an unavailable system — because you don't know you should be fixing it until users complain.

📊 Production Insight

During a partial network partition, one side may still be 'available' but miss updates — causing stale reads.

If your reliability SLI only checks 200 status, you'll miss the stale-data failure.

Rule: every reliability SLI must validate the response payload, not just the HTTP status.
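As a rough sketch of payload validation for the stale-read case, the check below treats a response as a good SLI event only if it is both a 200 and fresh. The `updated_at` field, its ISO-8601 format with an explicit offset, and the 30-second limit are assumptions for this example, not part of any particular API.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(payload: dict, now: datetime,
             max_age: timedelta = timedelta(seconds=30)) -> bool:
    """True if the payload carries a recent enough 'updated_at' timestamp."""
    raw = payload.get("updated_at")  # hypothetical field name
    if raw is None:
        return False
    updated_at = datetime.fromisoformat(raw)
    if updated_at.tzinfo is None:
        return False  # refuse to guess the timezone of a naive timestamp
    return now - updated_at <= max_age


def is_good_event(status_code: int, payload: dict, now: datetime) -> bool:
    """A 200 with stale data counts as a failure for the reliability SLI."""
    return status_code == 200 and is_fresh(payload, now)


now = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
stale = {"updated_at": "2026-01-01T11:50:00+00:00"}
print(is_good_event(200, stale, now))  # False: status is fine, data is not
```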

🎯 Key Takeaway

Availability and reliability are independent axes.

CAP forces a trade-off during partitions.

Know which one to sacrifice when things break — your design should make that choice explicit.

When to Prioritise Availability vs Reliability

If: Your system serves live transactions (payments, orders)
→ Use: Prioritise reliability: a wrong charge or shipment is worse than a brief outage.

If: Your system serves cached content (CDN, news feed)
→ Use: Prioritise availability: stale content is acceptable; a blank page is not.

If: You're designing a control-plane API (Kubernetes, deployment)
→ Use: Availability first: operators can retry if a command fails, but they can't work if the API is down.

If: You're building a real-time collaboration tool
→ Use: Both matter equally. Partial failures cause subtle conflicts (reliability) and downtime causes user frustration (availability).


Measuring Availability: Nines and Budgets

Availability is calculated from uptime. The classic formula:

Availability = (AGREED_UPTIME – DOWNTIME) / AGREED_UPTIME

'Agreed uptime' is typically the period your SLA covers — often 30 or 365 days. Downtime is any period where the service was not reachable by users.

Nines example table:

| Nines | Availability % | Downtime per year |
|-------|----------------|--------------------|
| 1 | 90% | 36.5 days |
| 2 | 99% | 3.65 days |
| 3 | 99.9% | 8.76 hours |
| 4 | 99.99% | 52.6 minutes |
| 5 | 99.999% | 5.26 minutes |
| 6 | 99.9999% | 31.5 seconds |
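The downtime figures for each level of nines follow directly from the formula. A small sketch, assuming a 365-day year (the function name is made up for this example):

```python
def downtime_per_year_seconds(nines: int, days_per_year: float = 365.0) -> float:
    """Allowed downtime per year, in seconds, for an availability of N nines."""
    if nines < 1:
        raise ValueError("nines must be >= 1")
    unavailability = 10.0 ** (-nines)  # e.g. 3 nines -> 0.001
    return days_per_year * 24 * 3600 * unavailability


for n in range(1, 7):
    seconds = downtime_per_year_seconds(n)
    print(f"{n} nines: {seconds / 60:.2f} minutes/year ({seconds:.1f} s)")
```

Passing `days_per_year=30` gives the budget for a 30-day SLA window instead.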

Measuring correctly requires defining what counts as 'down.' Do you start the clock when the first user reports an issue, when monitoring alerts, or when the load balancer marks the instance unhealthy? Each choice changes the number.

Senior teams define availability measurement in their incident response playbook: clear start and stop conditions for the downtime clock, and a rule for how partial degradation is counted.

🔥Gotcha: Counting Partial Degradation

If your service runs on 10 instances and 1 fails, is that 10% downtime? Most SLAs consider it 'degraded' but not 'down' unless the degradation crosses a threshold (e.g., >5% error rate). Define this in your SLO to avoid disputes.
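One possible way to encode such a rule is to classify each measurement window by its error rate and count only 'down' windows against availability. The 5% threshold mirrors the example above; the state names and the decision to skip windows with no traffic are assumptions of this sketch.

```python
def classify_window(total_requests: int, failed_requests: int,
                    down_threshold: float = 0.05) -> str:
    """Label one measurement window as 'up', 'degraded', 'down' or 'no_traffic'."""
    if total_requests == 0:
        return "no_traffic"
    error_rate = failed_requests / total_requests
    if error_rate == 0:
        return "up"
    return "down" if error_rate > down_threshold else "degraded"


def availability_percent(windows: list) -> float:
    """Availability over equally sized windows; degraded windows count as up."""
    counted = [w for w in windows if w != "no_traffic"]
    if not counted:
        raise ValueError("no windows with traffic")
    down = sum(1 for w in counted if w == "down")
    return (len(counted) - down) / len(counted) * 100.0


labels = [classify_window(1000, f) for f in (0, 30, 100, 0)]
print(labels, availability_percent(labels))
```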

📊 Production Insight

Teams often overcount availability by excluding scheduled maintenance from downtime calculations.

That's fine for internal SLOs, but users don't care if your downtime was 'planned.'

Rule: for external SLAs, include all downtime — planned or unplanned.

🎯 Key Takeaway

Uptime formula is simple — but defining 'down' is the hard part.

Decide measurement criteria before an incident.

Remember: availability is a binary measure of reachability, not health.

Measuring Reliability: SLIs, SLOs, and Error Budgets

Reliability measurement starts with Service Level Indicators (SLIs) — concrete metrics like request error rate, latency percentiles, or data freshness. Each SLI has a target Service Level Objective (SLO), e.g., '99.9% of requests return a correct response within 200ms.'

An error budget is the amount of unreliability you're allowed. For a 99.9% SLO (0.1% error budget) over 30 days, you can have about 43 minutes of errors. Once the budget is spent, you stop shipping features and focus on reliability.

Common reliability SLIs

Request success rate: (successful responses / total requests), where 'successful' means a validated payload, not just a 200 status
Latency: % of requests under threshold (e.g., 99% of requests < 500ms)
Data integrity: checksum mismatch rate
Freshness: time since last update for a data source

Reliability is harder to measure because you need sample payload validation, not just HTTP status. Many teams fake reliability by counting 200s as 'success' — but a 200 with wrong data is a failure. Real reliability measurement requires end-to-end synthetic transactions that validate response correctness.
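The validation half of a synthetic transaction might look like the sketch below. It checks an order confirmation for a correct total rather than stopping at the status code. The response shape (`items`, `unit_price`, `quantity`, `total`) is hypothetical, and the HTTP call that would fetch the body is deliberately left out.

```python
from decimal import Decimal


def validate_order_confirmation(status_code: int, body: dict) -> list:
    """Return a list of problems; an empty list means the probe passed."""
    if status_code != 200:
        return [f"unexpected status {status_code}"]

    problems = []
    items = body.get("items", [])
    if not items:
        problems.append("no line items in confirmation")

    # Recompute the total independently of what the service reported
    expected = sum(
        (Decimal(item["unit_price"]) * item["quantity"] for item in items),
        Decimal("0"),
    )
    reported = body.get("total")
    if reported is None:
        problems.append("missing total")
    elif Decimal(reported) != expected:
        problems.append(f"total {reported} does not match expected {expected}")
    return problems


body = {
    "items": [{"unit_price": "9.99", "quantity": 2},
              {"unit_price": "5.00", "quantity": 1}],
    "total": "19.98",  # wrong on purpose: a 200 with a bad total must fail
}
print(validate_order_confirmation(200, body))
```

A scheduler would run this against the real checkout flow on an interval and emit one pass or fail event per run.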

Mental Model: The Error Budget as a Battery

Think of your error budget like a battery that charges at the start of each month. Every incident drains it. When it's empty, you stop adding new features and fix the reliability debt.

Error budget = 1 - SLO target (e.g., 0.1% if SLO is 99.9%)
Total allowed errors per month: request count × error budget
When budget is spent: roll back risky changes, invest in testing, add circuit breakers.
If you never spend your budget, you're probably over-engineering (too expensive).
If you frequently burn through it, your system quality needs a structural fix.
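The battery arithmetic can be written down in a few lines. This is a minimal request-based sketch; the function name and the returned keys are assumptions, and real setups often track burn rate over rolling windows as well.

```python
def error_budget(slo_target: float, total_requests: int,
                 failed_requests: int) -> dict:
    """Request-based error budget for one SLO period."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    allowed_fraction = 1.0 - slo_target              # e.g. 0.001 for 99.9%
    allowed_failures = total_requests * allowed_fraction
    remaining = allowed_failures - failed_requests
    consumed = (failed_requests / allowed_failures
                if allowed_failures else float("inf"))
    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining,
        "budget_consumed": consumed,       # 1.0 means fully spent
        "freeze_features": remaining <= 0,
    }


print(error_budget(0.999, 1_000_000, 400))
```

With a 99.9% SLO and one million requests, the budget is roughly 1,000 failed requests, so 400 failures leaves about 60% of the battery.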

📊 Production Insight

Many teams only measure error rate on the critical path, ignoring background jobs or data pipelines.

A cron job that silently corrupts a database is a reliability failure that won't show in request error rates.

Rule: instrument every service boundary — not just customer-facing endpoints.

🎯 Key Takeaway

Reliability SLIs must validate correctness, not just HTTP status.

Error budgets decide when to stop shipping and start fixing.

A good SLO balances reliability cost against innovation velocity.


Architectural Patterns for Both Availability and Reliability

Senior architects blend patterns that serve both goals. Here's how each pattern contributes:

For Availability:
- Redundancy (active-passive or active-active): eliminates single points of failure.
- Load balancing with health checks: routes traffic only to healthy instances.
- Multi-region deployment: survives the loss of an entire region.
- Graceful degradation: when a dependency fails, serve a fallback response rather than a 500.

For Reliability:
- Idempotent APIs: safe retry without double-booking.
- Circuit breakers: stop calling a flaky dependency before it corrupts state.
- Data validation layers: reject malformed data at every boundary.
- Transactional outbox pattern: commit the state change and its outgoing event in the same database transaction, so neither is lost without the other.
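To make the idempotency pattern concrete, here is a minimal in-memory sketch. The `PaymentService` class and the idempotency-key scheme are illustrative assumptions; a real service would persist keys in a durable store and guard against concurrent requests with the same key.

```python
class PaymentService:
    """Toy service that makes charge() safe to retry (hypothetical example)."""

    def __init__(self):
        self._results = {}      # idempotency_key -> result of the first call
        self.charges_made = 0   # how many times money actually moved

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with the same key returns the stored result, no second charge
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": f"ch_{self.charges_made}",
                  "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


svc = PaymentService()
first = svc.charge("order-42", 1999)
retry = svc.charge("order-42", 1999)   # e.g. the client timed out and retried
print(first == retry, svc.charges_made)
```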

The intersection is where most incidents hide. For example, a multi-region failover (availability pattern) can cause temporary data inconsistency (reliability failure) if the secondary region hasn't caught up on replication. That's why chaos engineering drills exercise both availability and reliability scenarios.

🔥Senior Tip

Never implement an availability pattern (like failover) without also verifying the reliability implications. Test what happens to data consistency during failover — and measure the error rate during the transition.

📊 Production Insight

Active-active load balancing improves availability but introduces split-brain risk for stateful services.

If both copies accept writes without coordination, they might diverge — a reliability failure.

Rule: if you're running active-active, you must implement conflict resolution or use a shared data store.

🎯 Key Takeaway

Availability patterns improve uptime but can harm reliability.

Reliability patterns protect correctness but add latency.

Always test the interaction between the two sets of patterns.

Choosing Patterns for Your Service Type

If: Stateless service (e.g., API gateway)
→ Use: Availability patterns dominate: add replicas, load balancer, health checks. Reliability is mainly about correct request routing.

If: Stateful service with external DB (e.g., order service)
→ Use: Both matter: use idempotency, circuit breakers, database retry logic. Availability requires DB redundancy.

If: Stateful service with embedded data (e.g., caching node)
→ Use: Mostly reliability: data corruption is the biggest risk. Use replication, consistency checks, validation.

If: Critical data pipeline (e.g., batch processing)
→ Use: Reliability first: use checkpointing, idempotent processing, dead letter queues. Availability is secondary because the job can be retried.

Ways to Achieve High Availability — The Boring Stuff That Saves Your Weekend

High availability isn't magic. It's redundancy with teeth. You need multiple copies of everything — servers, networks, data centers — and you need them to fail over automatically. Manual failover is just extended downtime with extra stress.

Start with active-active clusters. Two or more instances serve traffic simultaneously. If one dies, the others absorb the load. No cold start, no DNS propagation wait. Next, use load balancers with health checks. Don't just round-robin blindly. Probe your backend for real response codes. A 200 that returns garbage is still a failure.

Database HA is harder. Use synchronous replication for zero data loss, but accept the latency hit. Asynchronous replication is faster but risks a gap. Know which you need before the pager goes off.

Pro tip: test your failover monthly. Nothing humbles a team like discovering your replica's credentials expired six months ago.

HealthCheckProbe.py (Python)

# io.thecodeforge: system-design tutorial

import requests
import time

BACKENDS = ["10.0.1.10:8080", "10.0.1.11:8080"]
HEALTH_PATH = "/healthz"

def check_backend(host):
    try:
        resp = requests.get(f"http://{host}{HEALTH_PATH}", timeout=2)
        return resp.status_code == 200
    except (requests.ConnectionError, requests.Timeout):
        return False

while True:
    for backend in BACKENDS:
        if not check_backend(backend):
            print(f"ALERT: {backend} is dead — draining from LB pool")
        else:
            print(f"OK: {backend} healthy")
    time.sleep(5)

Output

OK: 10.0.1.10:8080 healthy

OK: 10.0.1.11:8080 healthy

ALERT: 10.0.1.11:8080 is dead — draining from LB pool

⚠ Production Trap:

Your health check endpoint must test dependencies. If /healthz only returns 'OK' while the database connection pool is exhausted, you'll fail over to a broken node. Probe deeper.
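A deeper health handler could aggregate one probe per dependency, roughly as sketched here. The `deep_health` function and the shape of its result are assumptions; wiring it into a web framework and writing the real probes (for example a `SELECT 1` against the database) are left out.

```python
def deep_health(dependency_checks: dict) -> tuple:
    """Run each dependency probe and return (http_status, body).

    dependency_checks maps a name to a zero-argument callable that returns
    True when healthy. A False result or an exception marks it unhealthy.
    """
    failures = {}
    for name, check in dependency_checks.items():
        try:
            if not check():
                failures[name] = "check returned False"
        except Exception as exc:  # any probe error means unhealthy
            failures[name] = f"{type(exc).__name__}: {exc}"
    status = 200 if not failures else 503
    body = {"status": "ok" if not failures else "unhealthy",
            "failures": failures}
    return status, body


def exhausted_pool():
    raise TimeoutError("connection pool exhausted")


print(deep_health({"database": exhausted_pool, "cache": lambda: True}))
```

Keep each probe cheap and time-bounded, since the load balancer will call this endpoint every few seconds.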

🎯 Key Takeaway

Redundancy without automatic failover is just expensive anxiety. Test your recovery path until it hurts.

System Availability vs Asset Reliability — Stop Confusing the Two

Your system is available when users can reach it. Your asset is reliable when its components don't break. These are not the same thing, and treating them like synonyms will get you a false sense of safety.

A single server with 99.99% uptime is reliable — it rarely fails. But it's not available if it's the only thing running and it goes down for maintenance. Availability demands architectural redundancy. Reliability demands component quality and robust software. You can have a reliable hard drive (low failure rate) but still have unavailable data if you only have one copy.
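The effect of redundancy on availability can be estimated with the standard parallel-system formula, shown here as a rough sketch. It assumes replica failures are independent and failover is instant, which real systems rarely achieve, so treat the result as an upper bound.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of N redundant replicas when any one can serve traffic.

    Assumes independent failures: the system is down only if all are down.
    """
    if not 0 <= single <= 1 or replicas < 1:
        raise ValueError("single must be in [0, 1] and replicas >= 1")
    return 1.0 - (1.0 - single) ** replicas


for n in (1, 2, 3):
    print(f"{n} replica(s) at 99%: {parallel_availability(0.99, n) * 100:.4f}%")
```

Shared dependencies such as a common database, network, or deploy pipeline break the independence assumption and pull the real number down.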

In practice: focus on MTBF (Mean Time Between Failures) for reliability. That tells you how often components break. Focus on MTTR (Mean Time To Recover) for availability. That tells you how fast you recover. Raise MTBF with better components and testing. Cut MTTR with automated pipelines and runbooks.

Most outages are not caused by the hardware failing — they're caused by the team taking 45 minutes to rewrite a config file because no one automated the recovery.

AvailabilityVsReliabilityCalc.py (Python)

# io.thecodeforge: system-design tutorial

# MTBF: Mean Time Between Failures (reliability)
# MTTR: Mean Time To Recover (availability)

MTBF_HOURS = 2000   # One failure every ~83 days
MTTR_HOURS = 0.5    # 30 minutes to recover

# Steady-state availability is a fraction; multiply by 100 to print a percentage
availability = MTBF_HOURS / (MTBF_HOURS + MTTR_HOURS)
print(f"Availability: {availability * 100:.5f}%")

# Same system, slower recovery
MTTR_SLOW = 2.0     # 2 hours to recover (manual process)
availability_slow = MTBF_HOURS / (MTBF_HOURS + MTTR_SLOW)
print(f"With slow MTTR: {availability_slow * 100:.5f}%")

Output

Availability: 99.97501%

With slow MTTR: 99.90010%

🔥Senior Shortcut:

When you see a vendor claim '99.999% reliable hardware', ask about their system availability architecture. One is a component spec. The other is an operational truth.

🎯 Key Takeaway

Reliability is about how often things break. Availability is about how fast you recover. Optimize the metric that hurts you most.

● Production incident · Post-mortem · Severity: high

The 3AM Pager: 5 Nines Available, 10% Error Rate

Symptom

Checkout failures for 10% of users. Monitoring showed all servers responding, pings succeeding, and load balancer reporting healthy backends. No alerts fired because health checks only verified TCP connectivity, not application logic.

Assumption

If the system is up and responding quickly, it must be working correctly. The team assumed that 99.999% uptime automatically meant reliability.

Root cause

A memcached node returned stale, corrupted serialised objects. The deserialisation logic threw exceptions on half the reads. The server processes themselves were alive — the JVM didn't crash — so TCP health checks passed. Only a synthetic transaction test would have caught it.

Fix

1. Add application-layer health checks that exercise the full checkout flow against a shadow database.
2. Implement a circuit breaker on cache reads: after 3 deserialisation failures, fall through to the database.
3. Add an SLI for checkout success rate and alert when it drops below 99.5%.

This caught the issue within 30 seconds on the next occurrence.
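The cache circuit breaker from step 2 could be sketched as follows. This is not the incident's actual code: the class name, the 30-second reset window, the injected `cache_get` and `db_get` callables, and the choice to treat any exception from the cache as a failure are all assumptions.

```python
import time


class CacheCircuitBreaker:
    """Bypass the cache after repeated read failures, then retry it later."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._open_until = None  # None means the breaker is closed

    def read(self, key, cache_get, db_get):
        if self._open_until is not None:
            if self._clock() < self._open_until:
                return db_get(key)       # open: skip the cache entirely
            self._open_until = None      # window elapsed: try the cache again
            self._failures = 0
        try:
            value = cache_get(key)
            self._failures = 0           # a good read resets the counter
            return value
        except Exception:                # e.g. a deserialisation error
            self._failures += 1
            if self._failures >= self._threshold:
                self._open_until = self._clock() + self._reset_after_s
            return db_get(key)           # fall through to the source of truth
```

Every caller still gets a correct answer from the database, so the failure costs latency and database load rather than checkout errors.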

Key lesson

Availability and reliability are not the same metric.
Health checks at the TCP level can hide application-level failures.
Always measure what matters: an SLI for correctness beats any uptime dashboard.
If you only monitor for availability, you'll miss reliability failures until the users complain.

Production debug guide

When you get paged at 2AM, use this symptom-action guide to quickly classify whether you're facing an availability problem or a reliability problem.

Symptom · 01

All servers respond to ping but some requests fail with 500s or timeouts

→

Fix

Check application-level health endpoints. Run synthetic transaction probes. If probes fail but ping succeeds, you have a reliability issue — not an availability issue.

Symptom · 02

Server doesn't respond at all or load balancer marks it unhealthy

→

Fix

Availability problem. Check OS resources, process existence, and network connectivity. Restart service or failover to a redundant instance.

Symptom · 03

Error rate spikes on one host while others are fine

→

Fix

Isolate the host — likely reliability failure (e.g., corrupted cache, disk error). Remove from rotation and investigate root cause. If all hosts spike simultaneously, check dependency health.

Symptom · 04

Uptime dashboard shows 99.999% but customer complaints about wrong data

→

Fix

Review SLI definitions. You're measuring availability, not reliability. Instrument the data path with checksum or validation middleware. Alert on data integrity violations.

★ The Availability vs Reliability Quick Debug Cheat Sheet

Use this when the on-call phone buzzes. It'll save you 20 minutes of guessing.

Server not reachable

Immediate action

Check if process is running and port is open.

Commands

curl -I http://localhost:8080/health

systemctl status my-service

Fix now

Restart the service or trigger failover to replica.

Server reachable but errors returned

No errors in logs but users complain about stale data

Availability vs Reliability at a Glance

| Dimension | Availability | Reliability |
|-----------|--------------|-------------|
| Definition | System is reachable and responds to requests | System returns correct and consistent results |
| Primary Metric | Uptime percentage (nines) | Error rate, latency percentiles, data integrity |
| Measurement Method | Health checks, uptime monitors | Synthetic transactions, log analysis, checksums |
| Worst Failure Mode | System unreachable (outage) | Silent data corruption (users trust wrong data) |
| Typical SLO | 99.9% – 99.999% uptime | < 0.1% error rate, p99 < 500ms |
| Improvement Patterns | Redundancy, failover, multi-region | Idempotency, circuit breakers, validation layers |
| Cost Driver | Infrastructure redundancy (more servers, regions) | Development rigor (testing, observability, retries) |


Key takeaways

Availability is about reachability; reliability is about correctness. Never confuse them.

Measure availability with uptime nines; measure reliability with error rates and data integrity SLIs.

High availability without reliability is a lie: your system is up but serving wrong data.

Error budgets tell you when to stop shipping features and start fixing reliability.

Architectural patterns for availability (redundancy, failover) can harm reliability if not designed carefully.

Always include application-layer health checks and synthetic transaction tests: they catch reliability failures that TCP checks miss.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · Senior: Explain the difference between availability and reliability with a concr...

Q02 · Senior: How do you measure availability for a distributed system with multiple m...

Q03 · Senior: What's the relationship between the CAP theorem and the availability vs ...

Q04 · Senior: You have an SLO of 99.9% reliability (success rate). Your team ships a n...

Q01 of 04 · Senior

Explain the difference between availability and reliability with a concrete production example.

ANSWER

Availability is whether the system is reachable; reliability is whether it returns the correct result. For example, a payment gateway that returns HTTP 200 but never actually charges the card is 100% available but 0% reliable. In a real incident, we had a cache node that corrupted serialised objects — the server was up (available) but on half the reads the deserialisation threw an exception (unreliable). Only application-layer synthetic tests caught it.

