Intermediate 8 min · March 06, 2026

ASP.NET Core Health Checks: Liveness Probe Timeout Restarts

Q: How do I add health checks to an existing ASP.NET Core app without breaking anything?

Add `builder.Services.AddHealthChecks()` in Program.cs and call `app.MapHealthChecks("/healthz")` before `app.Run()`. This adds a new endpoint and touches nothing else in your app. You can start with zero checks registered, in which case the endpoint simply returns 'Healthy', and add real checks incrementally. Existing routes are not affected.
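
As a minimal sketch of that answer, assuming a .NET 6+ minimal-hosting Program.cs in a web project with implicit usings enabled and no checks registered yet:

```csharp
// Program.cs: smallest possible health check setup (sketch).
var builder = WebApplication.CreateBuilder(args);

// Registers the health check services; zero registered checks is valid.
builder.Services.AddHealthChecks();

var app = builder.Build();

// Adds GET /healthz alongside whatever routes the app already maps.
app.MapHealthChecks("/healthz");

app.Run();
```

With no checks registered, the aggregate status is Healthy, so the endpoint answers 200 with the plain-text body 'Healthy'.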

Q: What NuGet packages do I need for health checks in ASP.NET Core?

The core health check middleware lives in `Microsoft.AspNetCore.Diagnostics.HealthChecks`, which ships with the ASP.NET Core SDK, so no extra package is needed. For the visual UI dashboard you need `AspNetCore.HealthChecks.UI`, `AspNetCore.HealthChecks.UI.Client`, and a storage package like `AspNetCore.HealthChecks.UI.InMemory.Storage`. Community packages like `AspNetCore.HealthChecks.SqlServer` exist for common dependencies, but the custom `IHealthCheck` approach shown in this article gives you more control.

Q: Can I use health checks with .NET Framework or only .NET Core?

The health check middleware (`Microsoft.AspNetCore.Diagnostics.HealthChecks`) is an ASP.NET Core feature introduced in version 2.2 and is not available for classic ASP.NET on .NET Framework. If you're on .NET Framework, you'd need to hand-roll a similar pattern using an HTTP handler or a custom endpoint that reports dependency status. Upgrading to .NET 6+ is the practical path to getting the full health check ecosystem.

Q: How do I add a health check for a background service?

Create a singleton class (e.g., `WorkerHealthStatus`) that holds the current health state with thread-safe methods. Have your `BackgroundService` update this object periodically or on failure. Then implement an `IHealthCheck` that reads from the same singleton. Register both the singleton and the health check in DI. This decouples the actual worker work from the health check invocation, keeping the health check lightweight.
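
The pattern described above could be sketched roughly as follows. `WorkerHealthStatus` is the name used in the answer; `WorkerHealthCheck`, the heartbeat approach, and the 30-second threshold are assumptions chosen for illustration, not a prescribed design.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Singleton shared by the worker and the health check (illustrative shape).
public sealed class WorkerHealthStatus
{
    private long _lastHeartbeatTicks = DateTimeOffset.UtcNow.UtcTicks;

    // Called by the BackgroundService after each successful loop iteration.
    public void ReportHeartbeat() =>
        Interlocked.Exchange(ref _lastHeartbeatTicks, DateTimeOffset.UtcNow.UtcTicks);

    public TimeSpan TimeSinceLastHeartbeat =>
        TimeSpan.FromTicks(
            DateTimeOffset.UtcNow.UtcTicks - Interlocked.Read(ref _lastHeartbeatTicks));
}

// Reads the shared state only; it never touches the worker's own dependencies.
public sealed class WorkerHealthCheck : IHealthCheck
{
    // Assumed threshold; tune it to the worker's loop interval.
    private static readonly TimeSpan MaxSilence = TimeSpan.FromSeconds(30);
    private readonly WorkerHealthStatus _status;

    public WorkerHealthCheck(WorkerHealthStatus status) => _status = status;

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        var silence = _status.TimeSinceLastHeartbeat;
        var result = silence <= MaxSilence
            ? HealthCheckResult.Healthy($"Last heartbeat {silence.TotalSeconds:F0}s ago.")
            : new HealthCheckResult(
                context.Registration.FailureStatus,
                $"No heartbeat for {silence.TotalSeconds:F0}s.");
        return Task.FromResult(result);
    }
}

// Registration in Program.cs:
// builder.Services.AddSingleton<WorkerHealthStatus>();
// builder.Services.AddHealthChecks()
//     .AddCheck<WorkerHealthCheck>("worker", tags: new[] { "ready" });
```

Because the check only compares timestamps, it returns immediately no matter how slow the worker's own dependencies are.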

When the database slowed down, the liveness probe timed out after 5s, causing all pods to restart within a minute.

Naren Founder & Principal Engineer

20+ years shipping production .NET services in enterprise systems. Written from production experience, not tutorials.

July 18, 2026

last updated

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts

⚡Quick Answer

ASP.NET Core health checks let external systems (K8s, load balancers) query app and dependency status in a standard way.
Split liveness (pod alive) and readiness (can serve traffic) onto separate endpoints using tag filtering.
Custom IHealthCheck classes with timeouts prevent hung checks from blocking probes – always fail fast.
HTTP status mapping via ResultStatusCodes controls how Degraded vs Unhealthy affects infrastructure decisions.
The Health Checks UI dashboard gives ops teams visual history and drill-down per check.
Never put database checks on the liveness probe – that causes restart storms during partial outages.

✦ Definition~90s read

What Are Health Checks in ASP.NET Core?

ASP.NET Core Health Checks are a built-in middleware pipeline that exposes the operational status of your application via configurable HTTP endpoints. They exist to solve a specific problem: orchestrators like Kubernetes, Docker Swarm, or Azure App Service need a deterministic way to decide if your container should receive traffic (readiness) or be killed and restarted (liveness).

Without them, your app is a black box — the orchestrator can only check if the process is running, not if it's actually serving requests. Health checks give you a structured, standardized mechanism to report on dependencies like databases, message queues, or external APIs, and to control the HTTP status code (200 vs 503) that triggers orchestration decisions.

The pipeline is extensible via IHealthCheck implementations, supports grouping (e.g., 'ready' vs 'live' endpoints), and integrates with the HealthCheckService for programmatic checks. Alternatives include rolling your own endpoint with manual status reporting, but you lose the built-in registration, aggregation, and response writer customization.

Don't use health checks for monitoring or alerting — that's what Prometheus metrics or Application Insights are for. Health checks are for orchestration decisions, not observability.
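
The programmatic route through HealthCheckService mentioned in this definition might look like the sketch below. `StartupGate` and the "ready" tag filter are hypothetical names used for illustration; `HealthCheckService` itself is the service that `AddHealthChecks()` registers.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper: runs the registered 'ready' checks in-process,
// for example before a worker starts pulling messages from a queue.
public sealed class StartupGate
{
    private readonly HealthCheckService _healthChecks;

    // Resolved from DI once AddHealthChecks() has been called.
    public StartupGate(HealthCheckService healthChecks) => _healthChecks = healthChecks;

    public async Task<bool> IsReadyAsync(CancellationToken cancellationToken = default)
    {
        // Same kind of tag filter an HTTP endpoint applies via Predicate.
        HealthReport report = await _healthChecks.CheckHealthAsync(
            registration => registration.Tags.Contains("ready"),
            cancellationToken);

        return report.Status == HealthStatus.Healthy;
    }
}
```

This runs the same checks the endpoint would run, without an HTTP round trip to your own process.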

Plain-English First

Imagine a hospital with a dashboard showing every patient's vital signs — heart rate, blood pressure, oxygen — all on one screen. A doctor glances at it and instantly knows who needs attention. ASP.NET Core health checks are exactly that dashboard for your application. Instead of patients, you're monitoring your database connection, your message queue, your disk space, and any other system your app depends on. One endpoint, one glance, and you know if everything is healthy or something is about to crash.

When your app is running in production, 'it deployed successfully' is just the beginning. Kubernetes needs to know whether to send traffic to your pod. Your load balancer needs to decide if an instance should be taken out of rotation. Your ops team needs an alert before a full outage hits — not after. Without a structured health check system, you're flying blind. You're relying on a user to tell you something is broken, which is the worst possible monitoring strategy.

Health checks solve a specific, painful problem: how do external systems and internal teams get a reliable, machine-readable signal about whether your application and all of its dependencies are functioning correctly? Before ASP.NET Core 2.2, teams would hand-roll ping endpoints, scatter try-catch blocks across random controllers, and end up with inconsistent, unreliable status pages. The built-in health check middleware standardises all of that — with a clean model for registering checks, aggregating results, and exposing them over HTTP.

By the end of this article you'll know how to register built-in and custom health checks, gate them by tags for different audiences (liveness vs readiness), wire up the visual Health Checks UI dashboard, and avoid the three mistakes that catch almost every developer the first time. You'll also have copy-paste-ready code patterns you can drop into a real project today.

Why ASP.NET Core Health Checks Are Not Optional for Liveness Probes

ASP.NET Core health checks expose endpoints that return the operational status of your application — typically as a 200 OK or 503 Service Unavailable. The core mechanic is simple: you register one or more checks (e.g., database connectivity, disk space, external API reachability) and the middleware aggregates their results into a single response. Kubernetes liveness probes call this endpoint to decide whether to restart the pod.

The critical property that bites teams: health checks have a configurable timeout, but none is applied by default. If a check hangs (e.g., a database query deadlocks or an HTTP call to a downstream service waits 30 seconds before timing out), the entire health check endpoint can block for that duration. Kubernetes liveness probes have their own timeout (default 1 second). When the health check exceeds the probe timeout, the probe counts as failed, and after `failureThreshold` consecutive failures (default 3) Kubernetes restarts the container, even if the app is perfectly fine. This creates a restart loop that can cascade across replicas.

You need health checks when running in an orchestrator like Kubernetes that relies on liveness probes for self-healing. Without them, a deadlocked thread or a slow dependency can silently degrade your service. With them, you get automatic recovery — but only if you set timeouts aggressively (e.g., 500ms) and avoid blocking operations inside checks. Use them to detect catastrophic failures, not to monitor latency.

⚠ Timeout Mismatch Kills Pods

If a health check can run longer than the Kubernetes liveness probe timeout (default 1s, with probes sent every 10s by default), every slow check counts as a failed probe, and repeated failures trigger a restart, not a warning.

📊 Production Insight

A team had a health check that called an external payment API with a 30-second timeout. When that API slowed down, every liveness probe timed out after 1 second, Kubernetes restarted all pods, and the service was down for 5 minutes.

Symptom: pods in CrashLoopBackOff with no application exceptions — only 'liveness probe failed' events.

Rule of thumb: set health check timeouts to ≤ 500ms and never call external services synchronously in a liveness check.

🎯 Key Takeaway

Health checks must complete within the Kubernetes probe timeout — default 1 second — or they cause restarts.

Never put slow or external dependencies in a liveness check; use readiness checks for that.

Always set a CancellationToken and a short timeout on every health check registration.
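
One way to attach a short timeout at registration time is the `timeout` parameter of `AddCheck<T>`. The fragment below is a sketch that assumes the `SqlServerHealthCheck` class defined later in this article and uses the 500 ms figure from the rule of thumb above.

```csharp
// Program.cs fragment (sketch): per-registration timeout.
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddCheck<SqlServerHealthCheck>(
        name: "sql-server",
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { "ready" },
        // The framework cancels the token passed to CheckHealthAsync
        // once this much time has elapsed.
        timeout: TimeSpan.FromMilliseconds(500));
```

The timeout is delivered through the CancellationToken, so it only helps checks that pass that token down to every awaited call. A check that ignores its token keeps running.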

thecodeforge.io

Health Checks Aspnet Core

How the Health Check Pipeline Actually Works

Before writing a single line of code, it's worth understanding the architecture — because once you see it, every API decision makes sense.

ASP.NET Core's health check system has three layers. First, you register one or more IHealthCheck implementations with the DI container via AddHealthChecks(). Each check is a small class with a single method — CheckHealthAsync — that returns a HealthCheckResult of Healthy, Degraded, or Unhealthy.

Second, the framework aggregates those results. When the health endpoint is hit, it runs all registered checks (or a filtered subset by tag), collects every result, and computes an overall status. If any check is Unhealthy, the aggregate is Unhealthy. If any is Degraded but none are Unhealthy, the aggregate is Degraded.

Third, the middleware serialises that result and returns an HTTP response. By default it just writes 'Healthy' or 'Unhealthy' as plain text. But you can swap in a custom response writer to return rich JSON — which is exactly what production systems need.

The key insight here is separation of concerns: the check logic, the aggregation logic, and the serialisation logic are all independent. That's what makes the system so composable.
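
The aggregation rule from the second layer can be sketched in a few lines. The `Status` enum below is a local stand-in for the framework's `HealthStatus`, declared only so the example runs without package references; `Aggregation.Aggregate` is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local mirror of HealthStatus for illustration. Lower value = worse status.
public enum Status { Unhealthy = 0, Degraded = 1, Healthy = 2 }

public static class Aggregation
{
    // The overall status is the worst individual status.
    // With no checks at all, the result is Healthy.
    public static Status Aggregate(IEnumerable<Status> results) =>
        results.DefaultIfEmpty(Status.Healthy).Min();
}
```

For example, `Aggregation.Aggregate(new[] { Status.Healthy, Status.Degraded })` yields `Status.Degraded`, and adding a single `Status.Unhealthy` entry makes the whole report `Status.Unhealthy`.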

Program.cs (C#)

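// Program.cs: Minimal API style (.NET 6+)
// This is the absolute foundation. Every health check setup starts here.

using Microsoft.AspNetCore.Diagnostics.HealthChecks;   // HealthCheckOptions
using Microsoft.Extensions.Diagnostics.HealthChecks;   // HealthCheckResult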

var builder = WebApplication.CreateBuilder(args);

// Step 1: Register the health check services with the DI container.
// AddHealthChecks() returns an IHealthChecksBuilder you can chain onto.
builder.Services.AddHealthChecks()
    // Register a named check. The name appears in the JSON response
    // so ops teams know exactly WHICH check failed.
    .AddCheck("self", () => HealthCheckResult.Healthy("App is running"))
    
    // Tags let you group checks for different audiences.
    // 'live' = Kubernetes liveness probe (is the process alive?)
    // 'ready' = Kubernetes readiness probe (can it serve traffic?)
    .AddCheck(
        name: "startup-warmup",
        check: () => HealthCheckResult.Healthy("Warm-up complete"),
        tags: new[] { "live" }
    );

var app = builder.Build();

// Step 2: Map the health check endpoints.
// /healthz/live — only runs checks tagged 'live'
// /healthz/ready — only runs checks tagged 'ready'
// /healthz       — runs ALL checks (useful for ops dashboards)
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    // ResponseWriter controls what gets written to the HTTP response body.
    // WriteResponse is a static helper we define in the next section.
    ResponseWriter = HealthCheckResponseWriter.WriteResponse
});

app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    // Predicate filters which checks run on this endpoint.
    // Here we only run checks tagged 'live'.
    Predicate = check => check.Tags.Contains("live"),
    ResponseWriter = HealthCheckResponseWriter.WriteResponse
});

app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = HealthCheckResponseWriter.WriteResponse
});

app.Run();

Output

// GET /healthz
// HTTP 200 OK
{
  "status": "Healthy",
  "totalDuration": "00:00:00.0012345",
  "entries": {
    "self": {
      "status": "Healthy",
      "description": "App is running",
      "duration": "00:00:00.0001234"
    },
    "startup-warmup": {
      "status": "Healthy",
      "description": "Warm-up complete",
      "duration": "00:00:00.0000987"
    }
  }
}

🔥Why Two Endpoints?

Kubernetes uses both a liveness probe (/healthz/live) and a readiness probe (/healthz/ready). Liveness asks 'is the process crashed?' — if it fails, K8s restarts the pod. Readiness asks 'is this pod ready to receive traffic?' — if it fails, K8s removes it from the load balancer but doesn't restart it. Mixing all your checks on one endpoint means a slow database query could trigger a pod restart, which is almost never what you want.

📊 Production Insight

The default response writer returns only 'Healthy' or 'Unhealthy' as plain text.

Engineers in production need to know which check failed and why.

Always replace the default writer with a custom JSON writer that includes entry details and exceptions.

🎯 Key Takeaway

Health checks have three independent layers: registration, aggregation, serialisation.

Understand how they connect before writing your first check.

Separate concerns mean you can replace any layer without touching the others.

Writing a Real Custom Health Check — Database + External API

The built-in lambda-style checks are fine for demos, but production systems need proper IHealthCheck implementations. This is where the pattern gets genuinely powerful.

A well-written health check does three things: it detects a real failure condition (not just 'can I reach the host'), it includes diagnostic data in the result so engineers can debug without reading logs, and it fails fast — it has a timeout so a slow dependency doesn't hold up your entire health endpoint.

Let's build two concrete examples: a SQL Server check that validates query execution (not just connection), and an external HTTP API check that confirms the downstream service is actually responding correctly.

Notice the pattern in both checks: the try/catch returns Unhealthy with the exception message as the description. That description surfaces in the JSON response, which means your on-call engineer sees the actual error message — not just a red dot on a dashboard.

SqlServerHealthCheck.cs (C#)

// SqlServerHealthCheck.cs
// A production-grade database health check that validates the connection
// AND confirms the database can execute a real query — not just ping.

using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Data.SqlClient;

public class SqlServerHealthCheck : IHealthCheck
{
    private readonly string _connectionString;
    
    // Inject the connection string via DI rather than hard-coding it.
    // In production this comes from IConfiguration / environment variables.
    public SqlServerHealthCheck(IConfiguration configuration)
    {
        _connectionString = configuration.GetConnectionString("DefaultConnection")
            ?? throw new InvalidOperationException("DefaultConnection string is not configured.");
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // Use a short timeout — health checks should fail fast.
            // 5 seconds is a reasonable maximum for a DB ping.
            using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
            cts.CancelAfter(TimeSpan.FromSeconds(5));

            await using var connection = new SqlConnection(_connectionString);
            await connection.OpenAsync(cts.Token);

            // Run a trivial query — SELECT 1 confirms the DB engine is
            // accepting queries, not just that the TCP port is open.
            await using var command = connection.CreateCommand();
            command.CommandText = "SELECT 1";
            await command.ExecuteScalarAsync(cts.Token);

            // Include useful diagnostic data in the result.
            // This appears in the JSON response and in health check UI.
            var data = new Dictionary<string, object>
            {
                { "database", connection.Database },
                { "server", connection.DataSource }
            };

            return HealthCheckResult.Healthy(
                description: "SQL Server is reachable and accepting queries.",
                data: data
            );
        }
        catch (OperationCanceledException)
        {
            // Distinguish a timeout from a general failure —
            // timeouts and connection errors need different ops responses.
            return HealthCheckResult.Unhealthy(
                description: "SQL Server health check timed out after 5 seconds."
            );
        }
        catch (Exception ex)
        {
            // The exception message goes into the description field
            // so it shows up directly in your monitoring dashboard.
            return HealthCheckResult.Unhealthy(
                description: $"SQL Server check failed: {ex.Message}",
                exception: ex
            );
        }
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// ExternalPaymentApiHealthCheck.cs
// Checks that a critical downstream HTTP dependency is healthy.
// Uses a named HttpClient registered via IHttpClientFactory — the correct
// pattern for health checks, which must not create HttpClient instances
// directly (causes socket exhaustion).

public class ExternalPaymentApiHealthCheck : IHealthCheck
{
    private readonly IHttpClientFactory _httpClientFactory;

    public ExternalPaymentApiHealthCheck(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // Use the named client configured in Program.cs.
            // Named clients have pre-configured BaseAddress, timeout, etc.
            var httpClient = _httpClientFactory.CreateClient("PaymentApiClient");

            // Hit the payment API's own health endpoint rather than a
            // business endpoint — avoids triggering real business logic.
            var response = await httpClient.GetAsync("/health", cancellationToken);

            if (response.IsSuccessStatusCode)
            {
                return HealthCheckResult.Healthy(
                    description: $"Payment API responded with {(int)response.StatusCode}."
                );
            }

            // Degraded = the service is reachable but not fully healthy.
            // This is useful when a dependency is slow or partially down.
            return HealthCheckResult.Degraded(
                description: $"Payment API returned unexpected status: {(int)response.StatusCode}."
            );
        }
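        catch (HttpRequestException ex)
        {
            // Use the registration's failureStatus (Degraded for this check)
            // rather than hard-coding Unhealthy, so the severity configured
            // in Program.cs is what actually gets reported.
            return new HealthCheckResult(
                context.Registration.FailureStatus,
                description: $"Cannot reach Payment API: {ex.Message}",
                exception: ex
            );
        }
        catch (TaskCanceledException)
        {
            return new HealthCheckResult(
                context.Registration.FailureStatus,
                description: "Payment API health check timed out."
            );
        }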
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Program.cs — registering both custom checks

builder.Services.AddHttpClient("PaymentApiClient", client =>
{
    client.BaseAddress = new Uri("https://api.paymentprovider.com");
    // Set a tight timeout — do not rely on the default 100s HttpClient timeout.
    client.Timeout = TimeSpan.FromSeconds(8);
});

builder.Services.AddHealthChecks()
    .AddCheck<SqlServerHealthCheck>(
        name: "sql-server",
        failureStatus: HealthStatus.Unhealthy,  // a DB failure = fully unhealthy
        tags: new[] { "ready", "db" }
    )
    .AddCheck<ExternalPaymentApiHealthCheck>(
        name: "payment-api",
        failureStatus: HealthStatus.Degraded,   // payment API down = degraded, not dead
        tags: new[] { "ready", "external" }
    );

Output

// GET /healthz/ready, when SQL Server is fine but the Payment API is returning 503
// HTTP 200 OK (Degraded still returns 200 by default; see Gotchas)
{
  "status": "Degraded",
  "totalDuration": "00:00:00.2341567",
  "entries": {
    "sql-server": {
      "status": "Healthy",
      "description": "SQL Server is reachable and accepting queries.",
      "duration": "00:00:00.0234567",
      "data": {
        "database": "AppDb",
        "server": "prod-sql-01.internal"
      }
    },
    "payment-api": {
      "status": "Degraded",
      "description": "Payment API returned unexpected status: 503.",
      "duration": "00:00:00.2107000",
      "data": {}
    }
  }
}

⚠ Watch Out: Never new up HttpClient in a health check

Creating new HttpClient() inside CheckHealthAsync is a classic socket exhaustion bug — health checks run frequently (every few seconds in K8s), so you'll blow through available sockets fast. Always inject IHttpClientFactory and call CreateClient(). It's one extra line of setup in Program.cs and it eliminates the entire problem.

📊 Production Insight

Health checks that create HttpClient directly cause socket exhaustion.

Factory-managed clients reuse connections and respect DNS changes.

Rule: always use IHttpClientFactory for any HTTP-dependent health check.

🎯 Key Takeaway

Write checks that detect real failure, include diagnostic data, and fail fast.

Timeouts are mandatory: a hung check stalls the whole endpoint response.

Never new up HttpClient inside CheckHealthAsync.

Custom JSON Response Writer and the Health Checks UI Dashboard

The default health check response is a single word — 'Healthy' or 'Unhealthy'. That's fine for Kubernetes probes, but it's useless for a human engineer trying to diagnose a problem. You need a JSON response that includes every check name, its status, its description, and how long it took.

ASP.NET Core lets you swap in a custom ResponseWriter — a delegate of type Func<HttpContext, HealthReport, Task>. You write it once, pass it to every HealthCheckOptions instance, and every endpoint automatically returns rich JSON.

For a visual dashboard, the AspNetCore.HealthChecks.UI NuGet package gives you a ready-made React UI that polls your health endpoints and shows a live status board. It's genuinely useful for ops teams — and it takes about ten minutes to set up.

The UI package needs a separate configuration section in appsettings.json that lists the health check URIs to monitor. This means the UI can monitor multiple services, not just the current app — making it a lightweight centralised health dashboard.

HealthCheckResponseWriter.cs (C#)

// HealthCheckResponseWriter.cs
// A reusable JSON response writer that returns rich diagnostic output.
// Reference this from every MapHealthChecks call.

using System.Text.Json;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class HealthCheckResponseWriter
{
    // Cache the serializer options: creating a new JsonSerializerOptions on
    // every request throws away the serializer's metadata cache each time.
    // camelCase matches the convention of most JSON APIs.
    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        WriteIndented = true,
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    public static Task WriteResponse(HttpContext context, HealthReport report)
    {
        // Always return JSON; never let this endpoint return HTML.
        context.Response.ContentType = "application/json; charset=utf-8";

        // Map each health check entry to a serialisable anonymous object.
        var responseBody = new
        {
            status = report.Status.ToString(),
            totalDuration = report.TotalDuration.ToString(),
            entries = report.Entries.ToDictionary(
                entry => entry.Key,   // check name e.g. "sql-server"
                entry => new
                {
                    status = entry.Value.Status.ToString(),
                    description = entry.Value.Description,
                    duration = entry.Value.Duration.ToString(),
                    // Serialise the exception message if one was captured.
                    // This is invaluable for on-call debugging.
                    exception = entry.Value.Exception?.Message,
                    data = entry.Value.Data
                }
            )
        };

        return context.Response.WriteAsync(
            JsonSerializer.Serialize(responseBody, JsonOptions)
        );
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Program.cs additions for Health Checks UI
// Install: dotnet add package AspNetCore.HealthChecks.UI
//          dotnet add package AspNetCore.HealthChecks.UI.Client
//          dotnet add package AspNetCore.HealthChecks.UI.InMemory.Storage

using HealthChecks.UI.Client;                          // provides UIResponseWriter
using Microsoft.AspNetCore.Diagnostics.HealthChecks;   // provides HealthCheckOptions

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddCheck<SqlServerHealthCheck>("sql-server", tags: new[] { "ready", "db" })
    .AddCheck<ExternalPaymentApiHealthCheck>("payment-api", tags: new[] { "ready", "external" });

// Register the UI services and configure in-memory storage for check history.
builder.Services
    .AddHealthChecksUI(settings =>
    {
        // How often the UI polls the health endpoint (in seconds).
        settings.SetEvaluationTimeInSeconds(15);
        
        // Maximum number of history entries to retain per endpoint.
        settings.MaximumHistoryEntriesPerEndpoint(50);
        
        // Register the endpoint the UI will poll.
        // The name shows up as a label in the UI dashboard.
        settings.AddHealthCheckEndpoint(
            name: "Production App",
            uri: "/healthz"
        );
    })
    .AddInMemoryStorage();   // stores check history in-process (use SQL for multi-instance)

var app = builder.Build();

// The /healthz endpoint uses the UI client's response writer.
// UIResponseWriter.WriteHealthCheckUIResponse outputs the exact JSON format
// the UI dashboard expects — richer than our custom writer.
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

// Serve the Health Checks UI dashboard at /healthchecks-ui
// Restrict this to internal networks in production!
app.MapHealthChecksUI(options =>
{
    options.UIPath = "/healthchecks-ui";
    options.ApiPath = "/healthchecks-api";
});

app.Run();

// ─────────────────────────────────────────────────────────────────────────────
// appsettings.json: configuration-based alternative for multi-service UI monitoring
// (When using AddHealthCheckEndpoint() in code, this section is optional
// but useful for environment-specific overrides via environment variables.)
/*
{
  "HealthChecksUI": {
    "HealthChecks": [
      {
        "Name": "Production App",
        "Uri": "https://myapp.internal/healthz"
      },
      {
        "Name": "Background Worker",
        "Uri": "https://worker.internal/healthz"
      }
    ],
    "EvaluationTimeInSeconds": 15,
    "MaximumHistoryEntriesPerEndpoint": 50
  }
}
*/

Output

// Navigate to https://localhost:5001/healthchecks-ui
// You'll see a dashboard with:
// - A green/yellow/red status badge per registered service
// - A timeline chart showing health history
// - Drill-down per check showing description, duration, exception

// The /healthz JSON response looks like:
{
  "status": "Healthy",
  "totalDuration": "00:00:00.0342100",
  "entries": {
    "sql-server": {
      "status": "Healthy",
      "description": "SQL Server is reachable and accepting queries.",
      "duration": "00:00:00.0234100",
      "exception": null,
      "data": { "database": "AppDb", "server": "prod-sql-01.internal" }
    },
    "payment-api": {
      "status": "Healthy",
      "description": "Payment API responded with 200.",
      "duration": "00:00:00.0108000",
      "exception": null,
      "data": {}
    }
  }
}

💡Pro Tip: Secure your health UI in production

The /healthchecks-ui endpoint exposes infrastructure details: server names, connection strings in exception messages, latency data. Gate it behind a network policy or add app.MapHealthChecksUI().RequireAuthorization("InternalOnly") with an IP-restriction policy. Exposing it publicly is a real security risk.

📊 Production Insight

Health Checks UI exposes server names and exception details.

Without network or authorization gating, you leak internal topology.

Rule: treat the UI endpoint as internal infrastructure — never public.
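
As one possible way to keep such endpoints internal, the sketch below binds a health endpoint to a separate management port with `RequireHost`. Port 5001 is an assumption for illustration; the app must also be configured to listen on it, and the port itself still needs to be unreachable from outside (firewall or network policy).

```csharp
// Program.cs fragment (sketch): serve health data on a management port only.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks();

var app = builder.Build();

// Requests addressed to any other port do not match this endpoint
// and fall through to a 404.
app.MapHealthChecks("/healthz")
    .RequireHost("*:5001");

app.Run();
```

`RequireHost` is an endpoint convention, so it can be chained onto other mapped endpoints in the same way the `RequireAuthorization` call above is chained.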

🎯 Key Takeaway

Custom ResponseWriter gives you full control over JSON shape.

The UI dashboard is a ten-minute setup for a live ops board.

Always secure the UI endpoint behind network policy or auth.

HTTP Status Codes, Failure Thresholds and the ResultStatusCodes Gotcha

Here's something that surprises almost everyone the first time: by default, ASP.NET Core returns HTTP 200 for both Healthy and Degraded results, and HTTP 503 only for Unhealthy. That means Kubernetes readiness probes, which treat any status code from 200 to 399 as success, won't remove a degraded pod from the load balancer. If 'degraded' for you means 'stop sending traffic here', you need to override this.

You control the HTTP status code mapping via HealthCheckOptions.ResultStatusCodes. It's a dictionary from HealthStatus to HTTP status code. Changing Degraded to map to 503 tells K8s to remove the pod from rotation when any check is degraded.

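There's also the FailureStatus concept, set per check registration rather than per endpoint. It is the status the framework reports when a check throws an exception or exceeds its registration timeout. A check that returns its result explicitly has to read context.Registration.FailureStatus itself, because a hard-coded HealthCheckResult.Unhealthy is reported as Unhealthy regardless of the registration. Setting failureStatus: HealthStatus.Degraded on a non-critical check means that check can fail without taking the whole service offline.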

These two levers together give you very fine-grained control over how dependency failures propagate to your infrastructure.
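
To make the FailureStatus lever concrete, here is a minimal sketch of a check that reports whatever severity its registration asked for. `CacheHealthCheck` and its `Func<bool>` probe are hypothetical, introduced only so the example stays self-contained; a real check would wrap an actual client.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check used only to illustrate FailureStatus handling.
public sealed class CacheHealthCheck : IHealthCheck
{
    private readonly Func<bool> _isConnected;

    // The probe delegate is an assumption made for this sketch.
    public CacheHealthCheck(Func<bool> isConnected) => _isConnected = isConnected;

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        if (_isConnected())
        {
            return Task.FromResult(HealthCheckResult.Healthy("Cache is connected."));
        }

        // Report the severity configured at registration time
        // (for example failureStatus: HealthStatus.Degraded)
        // instead of hard-coding Unhealthy.
        return Task.FromResult(new HealthCheckResult(
            context.Registration.FailureStatus,
            description: "Cache is not connected."));
    }
}
```

Registered with `failureStatus: HealthStatus.Degraded`, a disconnected cache then surfaces as Degraded, and the endpoint's ResultStatusCodes mapping decides whether that becomes a 200 or a 503.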

HealthCheckStatusCodeConfig.cs (C#)

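// Program.cs: demonstrating ResultStatusCodes and FailureStatus configuration
// This is the production-ready pattern for a Kubernetes-hosted service.

using HealthChecks.UI.Client;                          // UIResponseWriter
using Microsoft.AspNetCore.Diagnostics.HealthChecks;   // HealthCheckOptions
using Microsoft.Extensions.Diagnostics.HealthChecks;   // HealthStatus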

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // Critical dependency: DB down = service is Unhealthy
    .AddCheck<SqlServerHealthCheck>(
        name: "sql-server",
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { "ready" }
    )
    // Important but non-critical: cache down = service is Degraded
    // The service can still serve traffic without Redis, just slower.
    .AddCheck<RedisCacheHealthCheck>(
        name: "redis-cache",
        failureStatus: HealthStatus.Degraded,  // downgrade the severity
        tags: new[] { "ready" }
    )
    // External dependency: payment API down = Degraded (we can queue transactions)
    .AddCheck<ExternalPaymentApiHealthCheck>(
        name: "payment-api",
        failureStatus: HealthStatus.Degraded,
        tags: new[] { "ready" }
    );

var app = builder.Build();

// Liveness endpoint: runs no registered checks at all.
// A liveness failure triggers a pod RESTART. Keep this minimal.
// Do NOT include DB or external checks here. A slow DB causes
// restart loops, which makes an outage dramatically worse.
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    Predicate = _ => false,   // run NO registered checks — just return 200
    ResponseWriter = HealthCheckResponseWriter.WriteResponse
});

// Readiness endpoint — all 'ready' tagged checks.
// A readiness failure removes the pod from the load balancer.
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = HealthCheckResponseWriter.WriteResponse,

    // THE KEY CHANGE: map Degraded to 503 so K8s stops sending traffic
    // when any dependency is struggling, even if not fully failed.
    ResultStatusCodes =
    {
        [HealthStatus.Healthy]   = StatusCodes.Status200OK,
        [HealthStatus.Degraded]  = StatusCodes.Status503ServiceUnavailable,
        [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
    }
});

// Full check endpoint — for ops dashboards and manual inspection.
// Returns 200 even when Degraded so the dashboard doesn't show false alarms.
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
    // No ResultStatusCodes override — uses default (Degraded = 200)
});

app.Run();

// ─────────────────────────────────────────────────────────────────────────────
// RedisCacheHealthCheck.cs — a lightweight example showing Degraded usage

using StackExchange.Redis;

public class RedisCacheHealthCheck : IHealthCheck
{
    private readonly IConnectionMultiplexer _redis;

    public RedisCacheHealthCheck(IConnectionMultiplexer redis)
    {
        _redis = redis;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // IsConnected is synchronous — no await needed for a connection check.
        if (_redis.IsConnected)
        {
            return Task.FromResult(
                HealthCheckResult.Healthy("Redis is connected.")
            );
        }

        // Return Degraded here — the service can operate without cache,
        // but performance will degrade. Ops should know.
        // The FailureStatus on the registration (Degraded) means even
        // if this throws an exception, it reports as Degraded not Unhealthy.
        return Task.FromResult(
            HealthCheckResult.Degraded("Redis is not connected. Operating without cache.")
        );
    }
}

Output

// GET /healthz/ready: when Redis is disconnected
// HTTP 503 Service Unavailable <-- Kubernetes now removes this pod from rotation
{
  "status": "Degraded",
  "totalDuration": "00:00:00.0089234",
  "entries": {
    "sql-server": {
      "status": "Healthy",
      "description": "SQL Server is reachable and accepting queries."
    },
    "redis-cache": {
      "status": "Degraded",
      "description": "Redis is not connected. Operating without cache."
    },
    "payment-api": {
      "status": "Healthy",
      "description": "Payment API responded with 200."
    }
  }
}

// GET /healthz/live: always 200 while the process is running
// HTTP 200 OK
{
  "status": "Healthy",
  "totalDuration": "00:00:00.0001234",
  "entries": {}
}

⚠ Watch Out: Putting DB checks on the liveness probe

If your SQL Server is slow and you've put the DB health check on /healthz/live, Kubernetes will interpret the timeout as a dead process and restart your pod. Now every instance is restarting simultaneously while the DB recovers — turning a degraded situation into a complete outage. Liveness should only check 'is the process itself alive?'. Readiness handles dependency checks.

📊 Production Insight

Degraded returns HTTP 200 by default — Kubernetes ignores it and keeps sending traffic.

You must explicitly map Degraded to 503 on the readiness endpoint to make K8s react.

Rule: never assume the default status codes match your infrastructure's expectation.

🎯 Key Takeaway

ResultStatusCodes maps HealthStatus to HTTP codes per endpoint.

FailureStatus maps check failures to HealthStatus per registration.

Combine both levers for fine-grained control over dependency failure propagation.
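To make the FailureStatus lever concrete, here is a minimal sketch of a check that defers to its registration instead of hard-coding a status. `CacheProbeHealthCheck` and its probe delegate are hypothetical names used only for illustration. The point it shows: the framework applies failureStatus by itself only when a check throws or times out, so an explicit failure has to read `context.Registration.FailureStatus` to follow the same setting.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check for illustration: the probe delegate stands in for a real
// dependency call (cache ping, HTTP request, and so on).
public sealed class CacheProbeHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task<bool>> _probe;

    public CacheProbeHealthCheck(Func<CancellationToken, Task<bool>> probe)
    {
        _probe = probe ?? throw new ArgumentNullException(nameof(probe));
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // If the probe throws, the health check service catches the exception
        // and reports the registration's FailureStatus for this entry.
        bool reachable = await _probe(cancellationToken);

        if (reachable)
        {
            return HealthCheckResult.Healthy("Dependency responded.");
        }

        // Explicit failure: defer to the registration instead of hard-coding
        // Unhealthy, so failureStatus: HealthStatus.Degraded applies here too.
        return new HealthCheckResult(
            context.Registration.FailureStatus,
            description: "Dependency did not respond.");
    }
}
```

Registered with something like `.AddCheck("redis-cache", new CacheProbeHealthCheck(probe), failureStatus: HealthStatus.Degraded)`, this check reports Degraded both when the probe returns false and when it throws. The endpoint's ResultStatusCodes then decides whether Degraded becomes 200 or 503.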

Health Checks for Background Services and Worker Processes

Not all work happens in request-response cycles. Your app probably runs background services — hosted services that process messages, poll queues, or perform periodic maintenance. If one of those workers stalls, the health endpoint should know about it, even if the main web process is still accepting requests.

The solution is to share state between your BackgroundService and an IHealthCheck implementation, usually via a thread-safe flag or a shared object registered as a singleton. The background service writes its status (last processed timestamp, queue depth, error count), and the health check reads it.

This pattern keeps the health check lightweight and decouples worker logic from health reporting. You get accurate visibility into background activity without making the health check itself execute business logic.

WorkerHealthCheck.csCSHARP

// BackgroundQueueProcessor.cs
// A BackgroundService that updates a shared health status object.

public class BackgroundQueueProcessor : BackgroundService
{
    private readonly ILogger<BackgroundQueueProcessor> _logger;
    private readonly WorkerHealthStatus _status;

    public BackgroundQueueProcessor(
        ILogger<BackgroundQueueProcessor> logger,
        WorkerHealthStatus status)  // registered as singleton
    {
        _logger = logger;
        _status = status;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("Queue processor started.");
        _status.SetHealthy("Worker running, processing queue");

        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                // Simulate processing a batch of messages
                await Task.Delay(1000, stoppingToken);

                // Update health status with last run time
                _status.SetHealthy(
                    $"Queue processed at {DateTime.UtcNow:O}",
                    new Dictionary<string, object>
                    {
                        ["lastRun"] = DateTime.UtcNow,
                        ["processedCount"] = Interlocked.Increment(ref _processedCount)
                    }
                );
            }
            catch (OperationCanceledException)
            {
                // Graceful shutdown
                _status.SetDegraded("Worker stopping");
                break;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Queue processing failed");
                _status.SetUnhealthy($"Queue processing failed: {ex.Message}");
                // Optionally wait before retrying to avoid tight failure loops
                await Task.Delay(5000, stoppingToken);
            }
        }

        _status.SetDegraded("Worker stopped");
    }

    private long _processedCount;
}

// WorkerHealthStatus.cs
// Thread-safe health status holder for background workers.

public class WorkerHealthStatus
{
    private HealthStatus _status = HealthStatus.Unhealthy;
    private string _description = "Not started";
    private Dictionary<string, object> _data = new();
    private readonly object _lock = new();

    public void SetHealthy(string description, Dictionary<string, object> data = null)
    {
        lock (_lock)
        {
            _status = HealthStatus.Healthy;
            _description = description;
            _data = data ?? new Dictionary<string, object>();
        }
    }

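    public void SetDegraded(string description)
    {
        lock (_lock)
        {
            _status = HealthStatus.Degraded;
            _description = description;
            // Drop data from the last healthy run so the report doesn't show stale values.
            _data = new Dictionary<string, object>();
        }
    }

    public void SetUnhealthy(string description)
    {
        lock (_lock)
        {
            _status = HealthStatus.Unhealthy;
            _description = description;
            // Drop data from the last healthy run so the report doesn't show stale values.
            _data = new Dictionary<string, object>();
        }
    }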

    public HealthCheckResult GetResult()
    {
        lock (_lock)
        {
            return new HealthCheckResult(_status, _description, data: _data);
        }
    }
}

// BackgroundWorkerHealthCheck.cs
// IHealthCheck that reads from the shared status object.

public class BackgroundWorkerHealthCheck : IHealthCheck
{
    private readonly WorkerHealthStatus _status;

    public BackgroundWorkerHealthCheck(WorkerHealthStatus status)
    {
        _status = status;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Just delegate to the shared status object — no async work needed.
        return Task.FromResult(_status.GetResult());
    }
}

// Program.cs — registration for worker health check

builder.Services.AddSingleton<WorkerHealthStatus>();
builder.Services.AddHostedService<BackgroundQueueProcessor>();

builder.Services.AddHealthChecks()
    .AddCheck<BackgroundWorkerHealthCheck>(
        name: "queue-worker",
        failureStatus: HealthStatus.Degraded,
        tags: new[] { "ready", "background" }
    );

Output

// GET /healthz/ready: when worker is processing
// HTTP 200 OK
{
  "status": "Healthy",
  "entries": {
    "sql-server": { "status": "Healthy" },
    "payment-api": { "status": "Healthy" },
    "queue-worker": {
      "status": "Healthy",
      "description": "Queue processed at 2026-04-22T14:35:10.123Z",
      "data": {
        "lastRun": "2026-04-22T14:35:10.123Z",
        "processedCount": 42
      }
    }
  }
}

// GET /healthz/ready: when worker has failed
// HTTP 503 Service Unavailable (Unhealthy maps to 503 by default)
{
  "status": "Unhealthy",
  "entries": {
    "sql-server": { "status": "Healthy" },
    "payment-api": { "status": "Healthy" },
    "queue-worker": {
      "status": "Unhealthy",
      "description": "Queue processing failed: Connection to message bus refused",
      "data": {}
    }
  }
}

🔥Why Use a Shared Status Object?

The worker health check doesn't call the message queue every time it's invoked — that would be slow and could overwhelm the queue during an outage. Instead, the background service writes its status periodically, and the health check reads the latest value. This decouples check execution from actual monitoring cost.

📊 Production Insight

Background workers can stall silently while the web layer stays healthy.

A shared status object lets the health check see worker failures immediately.

Rule: always add a health check for each critical BackgroundService — don't assume it's running just because the process is up.

🎯 Key Takeaway

Use a singleton shared status object between BackgroundService and IHealthCheck.

Workers update status periodically; health checks read it — no heavy lifting in the check.

This pattern avoids worker health checks that depend on the very system they monitor.
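One gap worth closing: a worker that hangs without throwing never writes a new status, so the last value it wrote stays Healthy. A minimal sketch of a staleness guard follows. It assumes a hypothetical `WorkerHeartbeat` singleton that the worker touches on every loop iteration, and a 30-second threshold picked purely for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical singleton: the worker calls Beat() after each unit of work.
public sealed class WorkerHeartbeat
{
    // Starts at construction time, so a worker that never runs looks healthy
    // only until the threshold below has elapsed.
    private long _lastBeatTicks = DateTime.UtcNow.Ticks;

    public void Beat() =>
        Interlocked.Exchange(ref _lastBeatTicks, DateTime.UtcNow.Ticks);

    public DateTime LastBeatUtc =>
        new DateTime(Interlocked.Read(ref _lastBeatTicks), DateTimeKind.Utc);
}

// Reports a failure when the worker has been silent for too long.
public sealed class WorkerHeartbeatHealthCheck : IHealthCheck
{
    // Assumed threshold: tune it to the worker's normal loop time.
    private static readonly TimeSpan MaxSilence = TimeSpan.FromSeconds(30);

    private readonly WorkerHeartbeat _heartbeat;

    public WorkerHeartbeatHealthCheck(WorkerHeartbeat heartbeat)
    {
        _heartbeat = heartbeat;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        TimeSpan silence = DateTime.UtcNow - _heartbeat.LastBeatUtc;

        HealthCheckResult result = silence <= MaxSilence
            ? HealthCheckResult.Healthy($"Last heartbeat {silence.TotalSeconds:F0}s ago.")
            : new HealthCheckResult(
                context.Registration.FailureStatus,
                $"No heartbeat for {silence.TotalSeconds:F0}s.");

        return Task.FromResult(result);
    }
}
```

The check still does no I/O. It only compares two timestamps, so it stays as cheap as the shared status object while also catching a silent stall.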

Separate Readiness from Liveness or Your Pods Will Cycle Forever

You cannot use the same health check endpoint for both readiness and liveness probes in any serious Kubernetes deployment. Liveness tells the orchestrator 'kill me and restart me'. Readiness says 'don't send traffic yet'. If your database goes down and both probes point to the same endpoint, Kubernetes kills the pod instead of just removing it from the service load balancer. That means a transient DB timeout becomes a full pod restart, and your team gets paged at 3 AM for a connection pool blip.

The fix is cheap: two endpoints. One lightweight liveness check that only verifies the HTTP pipeline is alive (static file, in-memory, no dependencies). One full readiness check that pings your database, cache, and critical downstream APIs. ASP.NET Core supports this natively with separate MapHealthChecks calls. Map /healthz to a simple 'always healthy' check. Map /ready to your real dependency probes. Configure your orchestrator to use /ready for readiness and /healthz for liveness. Your SRE team will thank you.

Program.csCSHARP

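// io.thecodeforge
using Microsoft.AspNetCore.Diagnostics.HealthChecks; // HealthCheckOptions

var builder = WebApplication.CreateBuilder(args);
// AlwaysHealthyProbe and DatabaseAndCacheProbe are IHealthCheck implementations
// assumed to be defined elsewhere in the project (not shown here).
builder.Services.AddHealthChecks()
    .AddCheck<AlwaysHealthyProbe>("liveness")
    .AddCheck<DatabaseAndCacheProbe>("readiness");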

var app = builder.Build();

// Liveness: lightweight, no I/O
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    Predicate = check => check.Name == "liveness"
});

// Readiness: full dependency check
app.MapHealthChecks("/ready", new HealthCheckOptions
{
    Predicate = check => check.Name == "readiness",
    ResponseWriter = WriteCustomJsonResponse
});

app.Run();

Output

GET /healthz -> 200 OK (always)

GET /ready -> 503 if DB down, 200 if all healthy

⚠ Production Trap:

If you use a single health endpoint for both probes, a database outage kills your pod instead of gracefully draining traffic. That turns a 5-second DB reconnect into a full container restart cycle.

🎯 Key Takeaway

Always run two health endpoints: a zero-dependency /healthz for liveness and a full dependency /ready for readiness. Your orchestrator needs the distinction to avoid pointless restarts.

The Docker HEALTHCHECK Command That Actually Prevents a CrashLoopBackOff

Your Dockerfile HEALTHCHECK command is not the same as a Kubernetes probe. Docker runs the command inside the container every N seconds. After the configured number of consecutive failures (three by default), Docker marks the container as unhealthy. Kubernetes does not read that status: it ignores HEALTHCHECK and relies only on its own probes, so the instruction matters when the image runs under plain Docker, Docker Compose, or Swarm.

The trap is the same in both worlds. Developers write HEALTHCHECK curl --fail http://localhost:5000/healthz without considering the startup grace period. On slow hardware or when the database is cold-starting, that curl fails for the first 10 seconds. Those failures count against the retry budget, and the container is flagged unhealthy. With an equally impatient Kubernetes liveness probe, the pod is restarted into a CrashLoopBackOff that is actually just a slow start.

Fix it with a start period and a generous --retry flag. Or better, write a health check endpoint that returns 503 until the app signals ready. Then HEALTHCHECK can use a simple curl with --retry-connrefused and --retry 5. This gives your app time to warm up EF Core connections, prime caches, and validate database access before anything decides to kill it.

DockerfileDOCKERFILE

# io.thecodeforge
FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base
WORKDIR /app
EXPOSE 8080

FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY ["App.csproj", "."]
RUN dotnet restore
COPY . .
RUN dotnet publish -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=build /app/publish .

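# curl is not guaranteed to be present in the aspnet runtime image. Install it
# (or swap in another probe command) so the HEALTHCHECK below can run.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# curl retries up to 5 times, 3s apart, to ride out a slow startup.
# --timeout must exceed the total retry window, or Docker kills curl mid-retry.
HEALTHCHECK --interval=15s --timeout=20s --start-period=30s --retries=3 \
  CMD curl --fail --retry 5 --retry-connrefused --retry-delay 3 http://localhost:8080/ready || exit 1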

ENTRYPOINT ["dotnet", "App.dll"]

Output

docker ps shows (health: starting) during the 30s start period, when failed probes do not count against --retries, and (healthy) as soon as /ready returns 200

🔥The Startup-Period Gotcha:

Docker HEALTHCHECK has a --start-period flag. Use it. Without it, probes that fail during warmup count toward --retries, and the container can be marked unhealthy before your app has even opened a connection pool.

🎯 Key Takeaway

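Always set --start-period in Docker HEALTHCHECK to match your app's cold-start time. Combine it with --retry-connrefused so slow database initialization doesn't flag a healthy container as unhealthy. Kubernetes ignores HEALTHCHECK, so give its probes the same grace with initialDelaySeconds or a startupProbe.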
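The "return 503 until the app signals ready" idea from this section can be sketched as a small startup gate. The names `StartupGate`, `WarmupService`, and `StartupGateHealthCheck` are hypothetical, and the five-second delay is only a stand-in for real warm-up work such as opening connections or priming caches.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Hosting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSingleton<StartupGate>();
builder.Services.AddHostedService<WarmupService>();
builder.Services.AddHealthChecks()
    .AddCheck<StartupGateHealthCheck>("startup", tags: new[] { "ready" });

var app = builder.Build();

// Unhealthy maps to 503 by default, so /ready answers 503 until warm-up ends.
app.MapHealthChecks("/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.Run();

// Shared flag flipped once by the warm-up service.
public sealed class StartupGate
{
    private volatile bool _completed;

    public bool Completed => _completed;

    public void MarkCompleted() => _completed = true;
}

// Placeholder warm-up: replace the delay with real initialization work.
public sealed class WarmupService : BackgroundService
{
    private readonly StartupGate _gate;

    public WarmupService(StartupGate gate) => _gate = gate;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
        _gate.MarkCompleted();
    }
}

// Reads the flag only: no I/O in the check itself.
public sealed class StartupGateHealthCheck : IHealthCheck
{
    private readonly StartupGate _gate;

    public StartupGateHealthCheck(StartupGate gate) => _gate = gate;

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        return Task.FromResult(_gate.Completed
            ? HealthCheckResult.Healthy("Startup completed.")
            : HealthCheckResult.Unhealthy("Startup still in progress."));
    }
}
```

With the gate in place, the probe command can stay simple: anything polling /ready sees an honest "not yet" instead of a refused connection or a half-initialized app.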

Readiness and Liveness Probes for Kubernetes

In Kubernetes, health checks are configured via probes that determine how your application is treated. Liveness probes indicate whether the container is running; if they fail, Kubernetes restarts the pod. Readiness probes indicate whether the container is ready to serve traffic; if they fail, the pod is removed from service endpoints. It's critical to separate these concerns to avoid unnecessary restarts during startup or temporary unavailability.

For ASP.NET Core, you can expose different endpoints for liveness and readiness. A common pattern is to have /health/ready for readiness (checking dependencies like databases and caches) and /health/live for liveness (a simple check that the process is alive). Configure the health checks middleware accordingly:

```csharp
app.UseHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false // No checks, just returns 200 if process is up
});

app.UseHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = WriteResponse
});
```

In your Kubernetes deployment YAML, define probes referencing these endpoints:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 5
```

This separation ensures that a temporary database outage doesn't cause a pod restart, but only removes it from service rotation until the database recovers.

Startup.csCSHARP

app.UseHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false
});

app.UseHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = WriteResponse
});

⚠ Don't mix liveness and readiness

📊 Production Insight

In production, set appropriate initialDelaySeconds and periodSeconds to account for startup time and dependency latency.

🎯 Key Takeaway

Separate liveness and readiness probes in Kubernetes to avoid restarts during temporary unavailability.
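As a variation on the `Predicate = _ => false` liveness endpoint, some teams prefer a tagged in-process self-check so that both endpoints filter by tag in the same way. A minimal sketch, assuming the tag names "live" and "ready" and the /health/live and /health/ready paths used in this section:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // In-process self-check: no I/O, so a slow dependency cannot fail it.
    .AddCheck(
        "self",
        () => HealthCheckResult.Healthy("Process is responding."),
        tags: new[] { "live" });
    // Dependency checks would be added here with the "ready" tag.

var app = builder.Build();

// Liveness: only checks tagged "live".
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});

// Readiness: only checks tagged "ready". With none registered, this
// endpoint reports Healthy.
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.Run();
```

Either form keeps dependencies off the liveness path. The tagged version simply makes the intent visible in the registration list.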

HealthCheckService for Programmatic Health Verification

ASP.NET Core provides the HealthCheckService class that allows you to programmatically run health checks from within your application. This is useful for scenarios like exposing health status via a custom API, triggering health checks on demand, or integrating with monitoring systems.

To use HealthCheckService, inject it into your controller or service:

```csharp
[ApiController]
[Route("api/[controller]")]
public class HealthController : ControllerBase
{
    private readonly HealthCheckService _healthCheckService;

    public HealthController(HealthCheckService healthCheckService)
    {
        _healthCheckService = healthCheckService;
    }

    [HttpGet]
    public async Task<IActionResult> Get()
    {
        var result = await _healthCheckService.CheckHealthAsync();
        var status = result.Status == HealthStatus.Healthy ? "Healthy" : "Unhealthy";
        return Ok(new { status, totalDuration = result.TotalDuration });
    }
}
```

You can also filter checks by tags or pass a cancellation token. For example, to run only checks tagged as "database":

```csharp
var result = await _healthCheckService.CheckHealthAsync(
    check => check.Tags.Contains("database"));
```

This programmatic approach gives you full control over when and how health checks are executed, enabling custom reporting or conditional logic based on health status.

HealthController.csCSHARP

[ApiController]
[Route("api/[controller]")]
public class HealthController : ControllerBase
{
    private readonly HealthCheckService _healthCheckService;

    public HealthController(HealthCheckService healthCheckService)
    {
        _healthCheckService = healthCheckService;
    }

    [HttpGet]
    public async Task<IActionResult> Get()
    {
        var result = await _healthCheckService.CheckHealthAsync();
        var status = result.Status == HealthStatus.Healthy ? "Healthy" : "Unhealthy";
        return Ok(new { status, totalDuration = result.TotalDuration });
    }
}

💡Use HealthCheckService for custom endpoints

📊 Production Insight

In production, consider caching health check results for a short duration to avoid overwhelming dependencies with frequent checks.

🎯 Key Takeaway

HealthCheckService enables programmatic health verification, giving you flexibility to integrate health checks into custom APIs or monitoring workflows.
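The caching advice above can be sketched as a thin wrapper around HealthCheckService. `CachedHealthReportProvider` is a hypothetical helper, not part of the framework, and the time-to-live is whatever value the caller passes in.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper: returns the last report while it is still fresh and
// re-runs the registered checks only after the time-to-live has expired.
public sealed class CachedHealthReportProvider
{
    private readonly HealthCheckService _healthCheckService;
    private readonly TimeSpan _timeToLive;
    private readonly SemaphoreSlim _gate = new(1, 1);

    private HealthReport? _cachedReport;
    private DateTime _cachedAtUtc;

    public CachedHealthReportProvider(
        HealthCheckService healthCheckService,
        TimeSpan timeToLive)
    {
        _healthCheckService = healthCheckService;
        _timeToLive = timeToLive;
    }

    public async Task<HealthReport> GetReportAsync(
        CancellationToken cancellationToken = default)
    {
        // One caller at a time: concurrent requests wait for the refresh
        // instead of all hitting the dependencies at once.
        await _gate.WaitAsync(cancellationToken);
        try
        {
            bool fresh = _cachedReport is not null
                && DateTime.UtcNow - _cachedAtUtc < _timeToLive;

            if (!fresh)
            {
                _cachedReport = await _healthCheckService.CheckHealthAsync(cancellationToken);
                _cachedAtUtc = DateTime.UtcNow;
            }

            return _cachedReport!;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

It could be registered as a singleton with a factory, for example `builder.Services.AddSingleton(sp => new CachedHealthReportProvider(sp.GetRequiredService<HealthCheckService>(), TimeSpan.FromSeconds(10)))`, and injected into a controller in place of HealthCheckService. The trade-off is staleness: a failure can go unreported for up to one time-to-live.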

Custom Health Check Patterns: Database, Cache, External APIs

Real-world applications depend on various external services. Custom health checks allow you to verify each dependency's availability. Common patterns include checking databases, caches (like Redis), and external APIs.

Database Health Check

Use Entity Framework Core or raw ADO.NET to test connectivity. For example, a simple SQL Server health check:

```csharp
public class SqlServerHealthCheck : IHealthCheck
{
    private readonly string _connectionString;

    public SqlServerHealthCheck(IConfiguration configuration)
    {
        _connectionString = configuration.GetConnectionString("DefaultConnection");
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            using var connection = new SqlConnection(_connectionString);
            await connection.OpenAsync(cancellationToken);
            using var command = connection.CreateCommand();
            command.CommandText = "SELECT 1";
            await command.ExecuteScalarAsync(cancellationToken);
            return HealthCheckResult.Healthy();
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Database is not reachable", ex);
        }
    }
}
```

Cache Health Check (Redis)

For Redis, use the IDatabase.PingAsync() method:

```csharp
public class RedisHealthCheck : IHealthCheck
{
    private readonly IConnectionMultiplexer _redis;

    public RedisHealthCheck(IConnectionMultiplexer redis)
    {
        _redis = redis;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var db = _redis.GetDatabase();
            await db.PingAsync();
            return HealthCheckResult.Healthy();
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Redis is not reachable", ex);
        }
    }
}
```

External API Health Check

Use HttpClient to call a health endpoint of an external service:

```csharp
public class ExternalApiHealthCheck : IHealthCheck
{
    private readonly HttpClient _httpClient;

    public ExternalApiHealthCheck(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            using var response = await _httpClient.GetAsync(
                "https://api.example.com/health", cancellationToken);
            if (response.IsSuccessStatusCode)
                return HealthCheckResult.Healthy();
            return HealthCheckResult.Degraded("API returned non-success status");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("API is not reachable", ex);
        }
    }
}
```

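Register all three checks with the "ready" tag:

```csharp
services.AddHealthChecks()
    .AddCheck<SqlServerHealthCheck>("SQL Server", tags: new[] { "ready" })
    .AddCheck<RedisHealthCheck>("Redis", tags: new[] { "ready" })
    .AddCheck<ExternalApiHealthCheck>("External API", tags: new[] { "ready" });
```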

Tagging checks as "ready" allows you to use them only for readiness probes.

SqlServerHealthCheck.csCSHARP

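using Microsoft.Data.SqlClient; // or System.Data.SqlClient on older projects
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class SqlServerHealthCheck : IHealthCheck
{
    private readonly string _connectionString;

    public SqlServerHealthCheck(IConfiguration configuration)
    {
        _connectionString = configuration.GetConnectionString("DefaultConnection");
    }

    // IHealthCheck requires both parameters; the token lets the caller cancel a slow check.
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            using var connection = new SqlConnection(_connectionString);
            await connection.OpenAsync(cancellationToken);
            using var command = connection.CreateCommand();
            command.CommandText = "SELECT 1";
            await command.ExecuteScalarAsync(cancellationToken);
            return HealthCheckResult.Healthy();
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Database is not reachable", ex);
        }
    }
}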

🔥Tag your health checks

📊 Production Insight

In production, add timeouts to health checks to prevent them from hanging indefinitely, and consider using a circuit breaker pattern for external API checks.

🎯 Key Takeaway

Custom health checks for databases, caches, and external APIs ensure your application accurately reports its dependency health.
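The timeout advice can be sketched with a linked CancellationTokenSource. `TimeBoxedHealthCheck` is a hypothetical wrapper around a caller-supplied probe delegate, and the timeout value is an assumption the caller provides. The probe has to honor the token it receives for the time box to take effect.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical wrapper: runs a probe under a hard time limit.
public sealed class TimeBoxedHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task> _probe;
    private readonly TimeSpan _timeout;

    public TimeBoxedHealthCheck(Func<CancellationToken, Task> probe, TimeSpan timeout)
    {
        _probe = probe ?? throw new ArgumentNullException(nameof(probe));
        _timeout = timeout;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Linked source: cancels on our timeout OR when the caller cancels.
        using var timeoutCts =
            CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(_timeout);

        try
        {
            await _probe(timeoutCts.Token);
            return HealthCheckResult.Healthy("Dependency responded in time.");
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // Our timeout fired, not the caller's token: report a failure.
            return new HealthCheckResult(
                context.Registration.FailureStatus,
                $"No response within {_timeout.TotalSeconds:F0}s.");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return new HealthCheckResult(
                context.Registration.FailureStatus,
                "Dependency probe failed.",
                ex);
        }
    }
}
```

A cancellation that comes from the caller is deliberately left to propagate, so a request that was aborted is not recorded as a dependency failure.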

● Production incident · POST-MORTEM · severity: high

The Restart Storm That Took Down Three Services

Symptom

All pods across three services started restarting in rapid succession. Requests returned 503 errors. The database team reported slow queries due to an unplanned index rebuild, but the apps were crashing, not just slowing down.

Assumption

The team assumed that if the health check returned Unhealthy, Kubernetes would handle it gracefully. They thought setting a high failureThreshold on the liveness probe would buy them time.

Root cause

The liveness probe (/healthz/live) ran the same SQL Server health check as the readiness probe. When the database slowed down, the health check timed out after 5 seconds. Kubernetes saw the timeout, interpreted it as a dead process, and restarted the pod. With multiple replicas, all restarts happened within the same minute, causing complete downtime.

Fix

Changed the liveness endpoint to run zero checks (Predicate = _ => false) so it always returns 200 as long as the process is alive. Moved the database check exclusively to the readiness endpoint. Set the readiness probe's failureThreshold to 3 to tolerate transient slowness before removing pods from rotation.

Key lesson

Liveness probes must only check if the process itself is alive, not its dependencies.

Production debug guide

Common symptoms and the exact actions to diagnose them (5 entries)

Symptom · 01: Health endpoint returns 200 but pod keeps restarting

Fix: Check which endpoint Kubernetes is using as liveness probe. If it includes dependency checks, reconfigure to only use a no-check liveness endpoint.

Symptom · 02: Health endpoint times out after 30 seconds

Fix: Add an explicit CancellationTokenSource with short timeout inside CheckHealthAsync. The default CancellationToken may not enforce a timeout.

Symptom · 03: Health check says Degraded but K8s doesn't stop traffic

Fix: Verify ResultStatusCodes mapping on the readiness endpoint. Degraded returns 200 by default; override to map Degraded to 503.

Symptom · 04: JSON response missing exception details

Fix: Ensure your ResponseWriter serializes entry.Value.Exception?.Message. The default writer omits exception data.

Symptom · 05: Health Checks UI shows 'Unhealthy' but app works fine

Fix: Check if the UI endpoint URI is correct. If using different ports or authentication, the UI might receive a 401 or 404, which it interprets as Unhealthy.
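For symptom 04, here is a minimal sketch of a response writer that includes exception messages. `DetailedHealthResponseWriter` is a hypothetical name, and the JSON shape is an assumption loosely modeled on the outputs shown earlier in this guide.

```csharp
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical writer: serializes each entry's status, description, and the
// message of any exception the check captured.
public static class DetailedHealthResponseWriter
{
    public static Task WriteAsync(HttpContext httpContext, HealthReport report)
    {
        var payload = new
        {
            status = report.Status.ToString(),
            totalDuration = report.TotalDuration.ToString(),
            entries = report.Entries.ToDictionary(
                entry => entry.Key,
                entry => new
                {
                    status = entry.Value.Status.ToString(),
                    description = entry.Value.Description,
                    // Null when the check did not capture an exception.
                    exception = entry.Value.Exception?.Message
                })
        };

        httpContext.Response.ContentType = "application/json; charset=utf-8";
        return httpContext.Response.WriteAsync(JsonSerializer.Serialize(payload));
    }
}
```

It plugs in as `ResponseWriter = DetailedHealthResponseWriter.WriteAsync` on HealthCheckOptions. Exception messages can reveal connection strings or host names, so an endpoint using a writer like this is best kept off the public internet.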

★ Quick Health Check Debug Cheat Sheet

Commands and fixes for the most common health check issues in production

Liveness probe causing restarts

Immediate action

Run `kubectl describe pod <pod-name>` and check the Liveness probe section to see which endpoint and threshold is configured.

Commands

kubectl get pods --field-selector=status.phase=Running -o custom-columns=NAME:.metadata.name,LIVENESS:.spec.containers[0].livenessProbe.httpGet.path

curl -w '%{http_code}' http://localhost:5000/healthz/live

Fix now

Change liveness probe path to /healthz/live and configure that endpoint to run no checks (Predicate = _ => false).

Liveness vs Readiness Probes in Kubernetes

Aspect	Liveness Probe (/healthz/live)	Readiness Probe (/healthz/ready)
Purpose	Is the process itself alive and not deadlocked?	Is the pod ready to receive user traffic?
K8s action on failure	Restarts the pod	Removes pod from load balancer rotation
Recommended checks	Self-check only (return 200 if process runs)	DB, cache, external APIs, message queues
Risk of including DB checks	High — slow DB causes restart storm	Safe — slow DB just pauses traffic to that pod
Typical HTTP success code	200 OK	200 OK (Healthy) or 503 (if Degraded = 503)
Run frequency in K8s	Every 10-30 seconds	Every 10-30 seconds
FailureStatus recommendation	N/A — no checks to configure	Unhealthy for critical, Degraded for non-critical

⚙ Quick Reference

9 commands from this guide

File	Command / Code	Purpose
Program.cs	var builder = WebApplication.CreateBuilder(args);	How the Health Check Pipeline Actually Works
SqlServerHealthCheck.cs	using Microsoft.Extensions.Diagnostics.HealthChecks;	Writing a Real Custom Health Check
HealthCheckResponseWriter.cs	using System.Text.Json;	Custom JSON Response Writer and the Health Checks UI Dashboa
HealthCheckStatusCodeConfig.cs	var builder = WebApplication.CreateBuilder(args);	HTTP Status Codes, Failure Thresholds and the ResultStatusCo
WorkerHealthCheck.cs	public class BackgroundQueueProcessor : BackgroundService	Health Checks for Background Services and Worker Processes
Dockerfile	FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base	The Docker HEALTHCHECK Command That Actually Prevents a Cras
Startup.cs	app.UseHealthChecks("/health/live", new HealthCheckOptions	Readiness and Liveness Probes for Kubernetes
HealthController.cs	[ApiController]	HealthCheckService for Programmatic Health Verification
SqlServerHealthCheck.cs	public class SqlServerHealthCheck : IHealthCheck	Custom Health Check Patterns

Key takeaways

Split liveness and readiness onto separate endpoints with tag filtering: putting database checks on the liveness probe is the #1 cause of Kubernetes restart storms during partial outages.

Always set a hard timeout inside CheckHealthAsync: a health check that hangs for 100 seconds is worse than one that fails fast and returns Unhealthy after 5 seconds.

Use ResultStatusCodes to map Degraded to HTTP 503 on the readiness endpoint if you want Kubernetes to stop routing traffic to a pod when any dependency is struggling.

The FailureStatus per-check registration and the ResultStatusCodes per-endpoint configuration are independent levers: FailureStatus controls what HealthStatus gets reported, ResultStatusCodes controls what HTTP code that status maps to.

Background services need their own health checks via a shared status object: don't assume a healthy web layer means background workers are still running.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

What's the difference between a liveness probe and a readiness probe, an...

Q02 · SENIOR

A colleague says 'our health check endpoint returned Degraded so Kuberne...

Q03 · SENIOR

You have five microservices. The payment service depends on a shared Red...

Q01 of 03 · SENIOR

What's the difference between a liveness probe and a readiness probe, and how do ASP.NET Core health check tags help you implement both correctly?

ANSWER

Liveness probes check if the process is alive – if they fail, Kubernetes restarts the pod. Readiness probes check if the pod can serve traffic – if they fail, Kubernetes removes it from the load balancer. You implement both by mapping separate endpoints (e.g., /healthz/live and /healthz/ready) and using the Predicate option with tag filtering. Liveness endpoints should run zero or only self-checks, readiness endpoints should run all dependency checks. Tags like "live" and "ready" let you group checks cleanly.


Naren, Founder & Principal Engineer

20+ years shipping production .NET services in enterprise systems. Written from production experience, not tutorials.

Last updated: July 18, 2026
