ASP.NET Core Health Checks: Build Production-Ready Monitoring
When your app is running in production, 'it deployed successfully' is just the beginning. Kubernetes needs to know whether to send traffic to your pod. Your load balancer needs to decide if an instance should be taken out of rotation. Your ops team needs an alert before a full outage hits — not after. Without a structured health check system, you're flying blind. You're relying on a user to tell you something is broken, which is the worst possible monitoring strategy.
Health checks solve a specific, painful problem: how do external systems and internal teams get a reliable, machine-readable signal about whether your application and all of its dependencies are functioning correctly? Before ASP.NET Core 2.2, teams would hand-roll ping endpoints, scatter try-catch blocks across random controllers, and end up with inconsistent, unreliable status pages. The built-in health check middleware standardises all of that — with a clean model for registering checks, aggregating results, and exposing them over HTTP.
By the end of this article you'll know how to register built-in and custom health checks, gate them by tags for different audiences (liveness vs readiness), wire up the visual Health Checks UI dashboard, and avoid the three mistakes that catch almost every developer the first time. You'll also have copy-paste-ready code patterns you can drop into a real project today.
How the Health Check Pipeline Actually Works
Before writing a single line of code, it's worth understanding the architecture — because once you see it, every API decision makes sense.
ASP.NET Core's health check system has three layers. First, you register one or more IHealthCheck implementations with the DI container via AddHealthChecks(). Each check is a small class with a single method — CheckHealthAsync — that returns a HealthCheckResult of Healthy, Degraded, or Unhealthy.
Second, the framework aggregates those results. When the health endpoint is hit, it runs all registered checks (or a filtered subset by tag), collects every result, and computes an overall status. If any check is Unhealthy, the aggregate is Unhealthy. If any is Degraded but none are Unhealthy, the aggregate is Degraded.
Third, the middleware serialises that result and returns an HTTP response. By default it just writes 'Healthy' or 'Unhealthy' as plain text. But you can swap in a custom response writer to return rich JSON — which is exactly what production systems need.
The key insight here is separation of concerns: the check logic, the aggregation logic, and the serialisation logic are all independent. That's what makes the system so composable.
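Conceptually, the aggregation layer is just a worst-status-wins fold. The real HealthStatus enum is ordered Unhealthy = 0, Degraded = 1, Healthy = 2, so the aggregate is simply the minimum value across all entries. A minimal, self-contained sketch of that rule — this mirrors, but is not, the framework's internal code:

```csharp
// Illustration only — a simplified model of how the overall status is computed.
// The local enum below copies the ordering of the real HealthStatus enum in
// Microsoft.Extensions.Diagnostics.HealthChecks.
using System;
using System.Linq;

enum HealthStatus { Unhealthy = 0, Degraded = 1, Healthy = 2 }

static class Aggregator
{
    // Worst status wins: the aggregate is the minimum enum value.
    public static HealthStatus Aggregate(params HealthStatus[] results) =>
        results.Length == 0 ? HealthStatus.Healthy : results.Min();
}

class Demo
{
    static void Main()
    {
        // One Unhealthy entry makes the whole report Unhealthy.
        Console.WriteLine(Aggregator.Aggregate(HealthStatus.Healthy, HealthStatus.Unhealthy)); // Unhealthy
        // Degraded outranks Healthy but not Unhealthy.
        Console.WriteLine(Aggregator.Aggregate(HealthStatus.Healthy, HealthStatus.Degraded)); // Degraded
    }
}
```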
```csharp
// Program.cs — Minimal API style (.NET 6+)
// This is the absolute foundation. Every health check setup starts here.
using Microsoft.AspNetCore.Diagnostics.HealthChecks;   // HealthCheckOptions
using Microsoft.Extensions.Diagnostics.HealthChecks;   // HealthCheckResult

var builder = WebApplication.CreateBuilder(args);

// Step 1: Register the health check services with the DI container.
// AddHealthChecks() returns an IHealthChecksBuilder you can chain onto.
builder.Services.AddHealthChecks()
    // Register a named check. The name appears in the JSON response
    // so ops teams know exactly WHICH check failed.
    .AddCheck("self", () => HealthCheckResult.Healthy("App is running"))
    // Tags let you group checks for different audiences.
    // 'live'  = Kubernetes liveness probe  (is the process alive?)
    // 'ready' = Kubernetes readiness probe (can it serve traffic?)
    .AddCheck(
        name: "startup-warmup",
        check: () => HealthCheckResult.Healthy("Warm-up complete"),
        tags: new[] { "live" }
    );

var app = builder.Build();

// Step 2: Map the health check endpoints.
//   /healthz/live  — only runs checks tagged 'live'
//   /healthz/ready — only runs checks tagged 'ready'
//   /healthz       — runs ALL checks (useful for ops dashboards)
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    // ResponseWriter controls what gets written to the HTTP response body.
    // WriteResponse is a static helper we define in a later section.
    ResponseWriter = HealthCheckResponseWriter.WriteResponse
});

app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    // Predicate filters which checks run on this endpoint.
    // Here we only run checks tagged 'live'.
    Predicate = check => check.Tags.Contains("live"),
    ResponseWriter = HealthCheckResponseWriter.WriteResponse
});

app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = HealthCheckResponseWriter.WriteResponse
});

app.Run();
```
// HTTP 200 OK
{
"status": "Healthy",
"totalDuration": "00:00:00.0012345",
"entries": {
"self": {
"status": "Healthy",
"description": "App is running",
"duration": "00:00:00.0001234"
},
"startup-warmup": {
"status": "Healthy",
"description": "Warm-up complete",
"duration": "00:00:00.0000987"
}
}
}
Writing a Real Custom Health Check — Database + External API
The built-in lambda-style checks are fine for demos, but production systems need proper IHealthCheck implementations. This is where the pattern gets genuinely powerful.
A well-written health check does three things: it detects a real failure condition (not just 'can I reach the host'), it includes diagnostic data in the result so engineers can debug without reading logs, and it fails fast — it has a timeout so a slow dependency doesn't hold up your entire health endpoint.
Let's build two concrete examples: a SQL Server check that validates query execution (not just connection), and an external HTTP API check that confirms the downstream service is actually responding correctly.
Notice the pattern in both checks: the try/catch returns Unhealthy with the exception message as the description. That description surfaces in the JSON response, which means your on-call engineer sees the actual error message — not just a red dot on a dashboard.
```csharp
// SqlServerHealthCheck.cs
// A production-grade database health check that validates the connection
// AND confirms the database can execute a real query — not just ping.
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Data.SqlClient;

public class SqlServerHealthCheck : IHealthCheck
{
    private readonly string _connectionString;

    // Inject the connection string via DI rather than hard-coding it.
    // In production this comes from IConfiguration / environment variables.
    public SqlServerHealthCheck(IConfiguration configuration)
    {
        _connectionString = configuration.GetConnectionString("DefaultConnection")
            ?? throw new InvalidOperationException("DefaultConnection string is not configured.");
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // Use a short timeout — health checks should fail fast.
            // 5 seconds is a reasonable maximum for a DB ping.
            using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
            cts.CancelAfter(TimeSpan.FromSeconds(5));

            await using var connection = new SqlConnection(_connectionString);
            await connection.OpenAsync(cts.Token);

            // Run a trivial query — SELECT 1 confirms the DB engine is
            // accepting queries, not just that the TCP port is open.
            await using var command = connection.CreateCommand();
            command.CommandText = "SELECT 1";
            await command.ExecuteScalarAsync(cts.Token);

            // Include useful diagnostic data in the result.
            // This appears in the JSON response and in the health check UI.
            var data = new Dictionary<string, object>
            {
                { "database", connection.Database },
                { "server", connection.DataSource }
            };

            return HealthCheckResult.Healthy(
                description: "SQL Server is reachable and accepting queries.",
                data: data
            );
        }
        catch (OperationCanceledException)
        {
            // Distinguish a timeout from a general failure —
            // timeouts and connection errors need different ops responses.
            return HealthCheckResult.Unhealthy(
                description: "SQL Server health check timed out after 5 seconds."
            );
        }
        catch (Exception ex)
        {
            // The exception message goes into the description field
            // so it shows up directly in your monitoring dashboard.
            return HealthCheckResult.Unhealthy(
                description: $"SQL Server check failed: {ex.Message}",
                exception: ex
            );
        }
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// ExternalPaymentApiHealthCheck.cs
// Checks that a critical downstream HTTP dependency is healthy.
// Uses a named HttpClient registered via IHttpClientFactory — the correct
// pattern for health checks, which must not create HttpClient instances
// directly (that causes socket exhaustion).
public class ExternalPaymentApiHealthCheck : IHealthCheck
{
    private readonly IHttpClientFactory _httpClientFactory;

    public ExternalPaymentApiHealthCheck(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // Use the named client configured in Program.cs.
            // Named clients have a pre-configured BaseAddress, timeout, etc.
            var httpClient = _httpClientFactory.CreateClient("PaymentApiClient");

            // Hit the payment API's own health endpoint rather than a
            // business endpoint — avoids triggering real business logic.
            var response = await httpClient.GetAsync("/health", cancellationToken);

            if (response.IsSuccessStatusCode)
            {
                return HealthCheckResult.Healthy(
                    description: $"Payment API responded with {(int)response.StatusCode}."
                );
            }

            // Degraded = the service is reachable but not fully healthy.
            // This is useful when a dependency is slow or partially down.
            return HealthCheckResult.Degraded(
                description: $"Payment API returned unexpected status: {(int)response.StatusCode}."
            );
        }
        catch (HttpRequestException ex)
        {
            return HealthCheckResult.Unhealthy(
                description: $"Cannot reach Payment API: {ex.Message}",
                exception: ex
            );
        }
        catch (TaskCanceledException)
        {
            return HealthCheckResult.Unhealthy(
                description: "Payment API health check timed out."
            );
        }
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Program.cs — registering both custom checks
builder.Services.AddHttpClient("PaymentApiClient", client =>
{
    client.BaseAddress = new Uri("https://api.paymentprovider.com");
    // Set a tight timeout — do not rely on the default 100s HttpClient timeout.
    client.Timeout = TimeSpan.FromSeconds(8);
});

builder.Services.AddHealthChecks()
    .AddCheck<SqlServerHealthCheck>(
        name: "sql-server",
        failureStatus: HealthStatus.Unhealthy,   // a DB failure = fully unhealthy
        tags: new[] { "ready", "db" }
    )
    .AddCheck<ExternalPaymentApiHealthCheck>(
        name: "payment-api",
        failureStatus: HealthStatus.Degraded,    // payment API down = degraded, not dead
        tags: new[] { "ready", "external" }
    );
```
// HTTP 200 OK (Degraded still returns 200 by default — see Gotchas)
{
"status": "Degraded",
"totalDuration": "00:00:00.2341567",
"entries": {
"sql-server": {
"status": "Healthy",
"description": "SQL Server is reachable and accepting queries.",
"duration": "00:00:00.0234567",
"data": {
"database": "AppDb",
"server": "prod-sql-01.internal"
}
},
"payment-api": {
"status": "Degraded",
"description": "Payment API returned unexpected status: 503.",
"duration": "00:00:00.2107000",
"data": {}
}
}
}
Custom JSON Response Writer and the Health Checks UI Dashboard
The default health check response is a single word — 'Healthy' or 'Unhealthy'. That's fine for Kubernetes probes, but it's useless for a human engineer trying to diagnose a problem. You need a JSON response that includes every check name, its status, its description, and how long it took.
ASP.NET Core lets you swap in a custom ResponseWriter — a delegate of type Func&lt;HttpContext, HealthReport, Task&gt;. You write it once, pass it to every HealthCheckOptions instance, and every endpoint automatically returns rich JSON.
For a visual dashboard, the AspNetCore.HealthChecks.UI NuGet package gives you a ready-made React UI that polls your health endpoints and shows a live status board. It's genuinely useful for ops teams — and it takes about ten minutes to set up.
The UI package needs a separate configuration section in appsettings.json that lists the health check URIs to monitor. This means the UI can monitor multiple services, not just the current app — making it a lightweight centralised health dashboard.
```csharp
// HealthCheckResponseWriter.cs
// A reusable JSON response writer that returns rich diagnostic output.
// Reference this from every MapHealthChecks call.
using System.Text.Json;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class HealthCheckResponseWriter
{
    public static Task WriteResponse(HttpContext context, HealthReport report)
    {
        // Always return JSON — never let this endpoint return HTML.
        context.Response.ContentType = "application/json; charset=utf-8";

        // Map each health check entry to a serialisable anonymous object.
        var responseBody = new
        {
            status = report.Status.ToString(),
            totalDuration = report.TotalDuration.ToString(),
            entries = report.Entries.ToDictionary(
                entry => entry.Key,   // check name, e.g. "sql-server"
                entry => new
                {
                    status = entry.Value.Status.ToString(),
                    description = entry.Value.Description,
                    duration = entry.Value.Duration.ToString(),
                    // Serialise the exception message if one was captured.
                    // This is invaluable for on-call debugging.
                    exception = entry.Value.Exception?.Message,
                    data = entry.Value.Data
                }
            )
        };

        // Use camelCase to match the convention of JSON APIs everywhere.
        var jsonOptions = new JsonSerializerOptions
        {
            WriteIndented = true,
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        };

        return context.Response.WriteAsync(
            JsonSerializer.Serialize(responseBody, jsonOptions)
        );
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Program.cs additions for Health Checks UI
// Install: dotnet add package AspNetCore.HealthChecks.UI
//          dotnet add package AspNetCore.HealthChecks.UI.Client
//          dotnet add package AspNetCore.HealthChecks.UI.InMemory.Storage
using HealthChecks.UI.Client;   // provides UIResponseWriter

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddCheck<SqlServerHealthCheck>("sql-server", tags: new[] { "ready", "db" })
    .AddCheck<ExternalPaymentApiHealthCheck>("payment-api", tags: new[] { "ready", "external" });

// Register the UI services and configure in-memory storage for check history.
builder.Services
    .AddHealthChecksUI(settings =>
    {
        // How often the UI polls the health endpoint (in seconds).
        settings.SetEvaluationTimeInSeconds(15);

        // Maximum number of history entries to retain per endpoint.
        settings.MaximumHistoryEntriesPerEndpoint(50);

        // Register the endpoint the UI will poll.
        // The name shows up as a label in the UI dashboard.
        settings.AddHealthCheckEndpoint(
            name: "Production App",
            uri: "/healthz"
        );
    })
    .AddInMemoryStorage();   // stores check history in-process (use SQL for multi-instance)

var app = builder.Build();

// The /healthz endpoint uses the UI client's response writer.
// UIResponseWriter.WriteHealthCheckUIResponse outputs the exact JSON format
// the UI dashboard expects — richer than our custom writer.
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

// Serve the Health Checks UI dashboard at /healthchecks-ui
// Restrict this to internal networks in production!
app.MapHealthChecksUI(options =>
{
    options.UIPath = "/healthchecks-ui";
    options.ApiPath = "/healthchecks-api";
});

app.Run();

// ─────────────────────────────────────────────────────────────────────────────
// appsettings.json — required for multi-service UI monitoring
// (When using AddHealthCheckEndpoint() in code, this section is optional
// but useful for environment-specific overrides via environment variables.)
/*
{
  "HealthChecksUI": {
    "HealthChecks": [
      { "Name": "Production App",    "Uri": "https://myapp.internal/healthz" },
      { "Name": "Background Worker", "Uri": "https://worker.internal/healthz" }
    ],
    "EvaluationTimeInSeconds": 15,
    "MaximumHistoryEntriesPerEndpoint": 50
  }
}
*/
```
// You'll see a dashboard with:
// - A green/yellow/red status badge per registered service
// - A timeline chart showing health history
// - Drill-down per check showing description, duration, exception
// The /healthz JSON response looks like:
{
"status": "Healthy",
"totalDuration": "00:00:00.0342100",
"entries": {
"sql-server": {
"status": "Healthy",
"description": "SQL Server is reachable and accepting queries.",
"duration": "00:00:00.0234100",
"exception": null,
"data": { "database": "AppDb", "server": "prod-sql-01.internal" }
},
"payment-api": {
"status": "Healthy",
"description": "Payment API responded with 200.",
"duration": "00:00:00.0108000",
"exception": null,
"data": {}
}
}
}
HTTP Status Codes, Failure Thresholds and the ResultStatusCodes Gotcha
Here's something that surprises almost everyone the first time: by default, ASP.NET Core returns HTTP 200 for both Healthy and Degraded results, and HTTP 503 only for Unhealthy. That means Kubernetes readiness probes — which interpret anything other than 2xx as a failure — won't remove a degraded pod from the load balancer. If 'degraded' for you means 'stop sending traffic here', you need to override this.
You control the HTTP status code mapping via HealthCheckOptions.ResultStatusCodes. It's a dictionary from HealthStatus to HTTP status code. Changing Degraded to map to 503 tells K8s to remove the pod from rotation when any check is degraded.
There's also the FailureStatus concept — set per check registration, not per endpoint. It controls what status gets reported when a check throws an exception or returns Unhealthy. Setting failureStatus: HealthStatus.Degraded on a non-critical check means that check can fail without taking the whole service offline.
These two levers together give you very fine-grained control over how dependency failures propagate to your infrastructure.
```csharp
// Program.cs — Demonstrating ResultStatusCodes and FailureStatus configuration
// This is the production-ready pattern for a Kubernetes-hosted service.
using HealthChecks.UI.Client;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // Critical dependency: DB down = service is Unhealthy
    .AddCheck<SqlServerHealthCheck>(
        name: "sql-server",
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { "ready" }
    )
    // Important but non-critical: cache down = service is Degraded.
    // The service can still serve traffic without Redis, just slower.
    .AddCheck<RedisCacheHealthCheck>(
        name: "redis-cache",
        failureStatus: HealthStatus.Degraded,   // downgrade the severity
        tags: new[] { "ready" }
    )
    // External dependency: payment API down = Degraded (we can queue transactions)
    .AddCheck<ExternalPaymentApiHealthCheck>(
        name: "payment-api",
        failureStatus: HealthStatus.Degraded,
        tags: new[] { "ready" }
    );

var app = builder.Build();

// Liveness endpoint — only the self-check.
// A liveness failure triggers a pod RESTART. Keep this minimal.
// Do NOT include DB or external checks here — a slow DB causes
// restart loops, which makes an outage dramatically worse.
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    Predicate = _ => false,   // run NO registered checks — just return 200
    ResponseWriter = HealthCheckResponseWriter.WriteResponse
});

// Readiness endpoint — all 'ready' tagged checks.
// A readiness failure removes the pod from the load balancer.
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = HealthCheckResponseWriter.WriteResponse,
    // THE KEY CHANGE: map Degraded to 503 so K8s stops sending traffic
    // when any dependency is struggling, even if not fully failed.
    ResultStatusCodes =
    {
        [HealthStatus.Healthy] = StatusCodes.Status200OK,
        [HealthStatus.Degraded] = StatusCodes.Status503ServiceUnavailable,
        [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
    }
});

// Full check endpoint — for ops dashboards and manual inspection.
// Returns 200 even when Degraded so the dashboard doesn't show false alarms.
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
    // No ResultStatusCodes override — uses the default (Degraded = 200)
});

app.Run();

// ─────────────────────────────────────────────────────────────────────────────
// RedisCacheHealthCheck.cs — a lightweight example showing Degraded usage
using StackExchange.Redis;

public class RedisCacheHealthCheck : IHealthCheck
{
    private readonly IConnectionMultiplexer _redis;

    public RedisCacheHealthCheck(IConnectionMultiplexer redis)
    {
        _redis = redis;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // IsConnected is synchronous — no await needed for a connection check.
        if (_redis.IsConnected)
        {
            return Task.FromResult(
                HealthCheckResult.Healthy("Redis is connected.")
            );
        }

        // Return Degraded here — the service can operate without cache,
        // but performance will degrade. Ops should know.
        // The FailureStatus on the registration (Degraded) means even
        // if this check throws an exception, it reports as Degraded, not Unhealthy.
        return Task.FromResult(
            HealthCheckResult.Degraded("Redis is not connected. Operating without cache.")
        );
    }
}
```
// HTTP 503 Service Unavailable <-- Kubernetes now removes this pod from rotation
{
"status": "Degraded",
"totalDuration": "00:00:00.0089234",
"entries": {
"sql-server": {
"status": "Healthy",
"description": "SQL Server is reachable and accepting queries."
},
"redis-cache": {
"status": "Degraded",
"description": "Redis is not connected. Operating without cache."
},
"payment-api": {
"status": "Healthy",
"description": "Payment API responded with 200."
}
}
}
// GET /healthz/live — always 200 while the process is running
// HTTP 200 OK
{
"status": "Healthy",
"totalDuration": "00:00:00.0001234",
"entries": {}
}
| Aspect | Liveness Probe (/healthz/live) | Readiness Probe (/healthz/ready) |
|---|---|---|
| Purpose | Is the process itself alive and not deadlocked? | Is the pod ready to receive user traffic? |
| K8s action on failure | Restarts the pod | Removes pod from load balancer rotation |
| Recommended checks | Self-check only (return 200 if process runs) | DB, cache, external APIs, message queues |
| Risk of including DB checks | High — slow DB causes restart storm | Safe — slow DB just pauses traffic to that pod |
| Typical HTTP success code | 200 OK | 200 OK (Healthy) or 503 (if Degraded = 503) |
| Run frequency in K8s | Every 10-30 seconds | Every 10-30 seconds |
| FailureStatus recommendation | N/A — no checks to configure | Unhealthy for critical, Degraded for non-critical |
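The table above maps directly onto a pod spec. As a reference point, here is a sketch of the corresponding Kubernetes probe configuration — the container port and timing values are placeholder assumptions, and the paths are the ones mapped in Program.cs:

```yaml
# Fragment of a Deployment's container spec (port and timings are placeholders).
livenessProbe:
  httpGet:
    path: /healthz/live
    port: 8080
  periodSeconds: 15        # poll every 15 seconds
  timeoutSeconds: 5        # fail the probe if no response within 5 seconds
  failureThreshold: 3      # restart only after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /healthz/ready
    port: 8080
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 2      # pull from rotation after 2 consecutive failures
```

Note that `failureThreshold` gives you slack on top of your in-check timeouts: a single slow response doesn't trigger a restart or removal, only a sustained failure does.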
🎯 Key Takeaways
- Split liveness and readiness onto separate endpoints with tag filtering — putting database checks on the liveness probe is the #1 cause of Kubernetes restart storms during partial outages.
- Always set a hard timeout inside CheckHealthAsync — a health check that hangs for 100 seconds is worse than one that fails fast and returns Unhealthy after 5 seconds.
- Use ResultStatusCodes to map Degraded to HTTP 503 on the readiness endpoint if you want Kubernetes to stop routing traffic to a pod when any dependency is struggling.
- The FailureStatus per-check registration and the ResultStatusCodes per-endpoint configuration are independent levers — FailureStatus controls what HealthStatus gets reported, ResultStatusCodes controls what HTTP code that status maps to.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Exposing /healthz publicly without securing it — The endpoint includes server names, database host names, and exception messages in its JSON response. An attacker can map your entire infrastructure topology from it. Fix: chain `.RequireAuthorization()` onto `MapHealthChecks` and back it with an IP-restriction policy, or restrict the endpoint to internal hosts with `app.MapHealthChecks(...).RequireHost("*.internal")`.
- ✕ Mistake 2: Forgetting that Degraded returns HTTP 200 by default — Developers test their health check, see "Degraded" in the JSON, assume Kubernetes will react, and are confused when degraded pods keep receiving traffic. The fix is to explicitly configure `ResultStatusCodes` in `HealthCheckOptions` and map `HealthStatus.Degraded` to `StatusCodes.Status503ServiceUnavailable` on the readiness endpoint — as shown in the code above.
- ✕ Mistake 3: Running all health checks on the liveness probe — This is the single most dangerous misconfiguration. When the database is slow, the liveness check times out, Kubernetes restarts every pod simultaneously, and a partial outage becomes a total one. The fix is to either use `Predicate = _ => false` on the liveness endpoint (returning 200 unconditionally while the process is up), or only tag lightweight self-checks with 'live' and never tag external dependency checks with it.
Interview Questions on This Topic
- Q: What's the difference between a liveness probe and a readiness probe, and how do ASP.NET Core health check tags help you implement both correctly?
- Q: A colleague says "our health check endpoint returned Degraded, so Kubernetes should have removed the pod from the load balancer, but it didn't". What would you check first, and why?
- Q: You have five microservices. The payment service depends on a shared Redis cluster. If Redis goes down, should the payment service report Unhealthy or Degraded? How does your answer change the Kubernetes behaviour, and how do you configure that distinction in ASP.NET Core?
Frequently Asked Questions
How do I add health checks to an existing ASP.NET Core app without breaking anything?
Add builder.Services.AddHealthChecks() in Program.cs and call app.MapHealthChecks("/healthz") before app.Run(). This adds a new endpoint and touches nothing else in your app. You can start with zero checks registered — it just returns 'Healthy' — and add real checks incrementally. There's no risk of breaking existing routes.
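The entire non-breaking change is a sketch like this (the endpoint path is your choice):

```csharp
// Program.cs — minimal addition to an existing app.
var builder = WebApplication.CreateBuilder(args);
// ...your existing service registrations stay untouched...
builder.Services.AddHealthChecks();   // registering zero checks is valid

var app = builder.Build();
// ...your existing middleware and routes stay untouched...
app.MapHealthChecks("/healthz");      // returns plain-text "Healthy" until real checks are added

app.Run();
```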
What NuGet packages do I need for health checks in ASP.NET Core?
The core health check middleware is built into Microsoft.AspNetCore.Diagnostics.HealthChecks, which ships with the ASP.NET Core SDK — no extra package needed. For the visual UI dashboard you need AspNetCore.HealthChecks.UI, AspNetCore.HealthChecks.UI.Client, and a storage package like AspNetCore.HealthChecks.UI.InMemory.Storage. Community packages like AspNetCore.HealthChecks.SqlServer exist for common dependencies but the custom IHealthCheck approach shown above gives you more control.
Can I use health checks with .NET Framework or only .NET Core?
The built-in Microsoft.Extensions.Diagnostics.HealthChecks middleware is an ASP.NET Core feature introduced in version 2.2 and is not available in .NET Framework. If you're on .NET Framework, you'd need to hand-roll a similar pattern — an HTTP handler or custom endpoint that probes each dependency and returns an appropriate status code. Upgrading to .NET 6+ is the practical path to getting the full health check ecosystem.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.