N+1 Queries Hide in Low CPU — APM Metrics That Expose Them
- APM gives you telemetry — metrics (numerical measurements), traces (request journeys), and logs (discrete events) — to find performance problems before users complain
- Core components: RED method (Rate, Errors, Duration) for services; USE method (Utilisation, Saturation, Errors) for resources; distributed tracing for microservices
- Performance cost: OpenTelemetry adds 2-5% CPU overhead when sampled at 1% (adjust sampling rate based on traffic)
- Production trap: Alerting on CPU usage alone — a 90% CPU alert fires while users are happy (pre-computed cache), and misses when a slow database query makes users wait (low CPU, high latency)
- Biggest mistake: No baseline for normal latency — you can't know p99 is bad if you never tracked p50 when the system was healthy
APM Debug Cheat Sheet
Slow API endpoint: can't tell if it's code, database, or network
curl 'http://apm-collector:14268/api/traces?service=my-api' | jq '.data[].spans[] | {operationName, duration}'
kubectl exec -it jaeger-query -- wget -O- 'http://localhost:16686/api/traces?service=api&limit=1' | jq '.data[0].spans[].duration'
App server CPU at 100%, database CPU normal, latency spiking
async-profiler -d 30 -f /tmp/flamegraph.html <pid>
top -H -p <pid>
Database CPU at 100%, app server CPU normal, slow queries in log
pg_stat_statements (PostgreSQL) to find top queries by total time; EXPLAIN ANALYZE on the slow query
SHOW PROCESSLIST; (MySQL) to see currently running queries
Memory usage growing over time: suspected leak
jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>
jcmd <pid> GC.heap_info
Latency p99 spikes but p50 is fine: 1% of requests are very slow
grep 'duration=.*ms' /var/log/api.log | awk '{print $5}' | sort -n | tail -20
PromQL: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
Production Incident
SELECT * FROM products LEFT JOIN reviews ON products.id = reviews.product_id WHERE products.id = ?.
Added Review as an embedded collection on the Product object using the ORM's eager loading feature. Added an APM custom span around the database query to measure its contribution to total latency. Deployed a migration to add an index on reviews.product_id. After the fix, page latency dropped to 150ms even at peak traffic, and database CPU dropped to 25%.
Production Debug Guide
Symptom → Action mapping for common performance failures
Every time a user clicks 'Buy Now' and nothing happens, a customer is lost, possibly forever. Widely cited industry studies, including ones attributed to Google and Akamai, suggest that even a 100ms increase in page load time can measurably reduce conversion rates, with figures around 1% often quoted. At scale, that's not a UX annoyance; it's a revenue crisis. Yet most engineering teams only find out their app is slow after a flood of support tickets or, worse, a trending tweet. APM exists to flip that script.
The core problem APM solves is invisibility. Code runs inside servers you can't touch, across networks you don't control, on databases holding millions of rows. Without instrumentation, you're flying blind. A query that took 50ms in staging suddenly takes 4 seconds in production under real load — and you have no idea why. APM gives you the telemetry — metrics, traces, logs — to pinpoint the exact line of code, database call, or third-party API dragging your app down.
By the end you'll understand the three pillars of observability, know exactly which metrics to instrument first, set up Prometheus-based collection, configure meaningful alert thresholds (not just 'CPU > 90%'), and read a distributed trace to find hidden latency.
The Three Pillars — Metrics, Traces, Logs
APM rests on three types of telemetry data. Each answers a different question, and you need all three to debug effectively.
Metrics are numerical measurements over time — request rate, error rate, latency percentiles, CPU usage. They answer 'what is happening?' and are cheap to store and query. Metrics are aggregated (averages, sums, counts) and lose individual request details.
Traces track a single request's journey across services — every database call, RPC, and cache hit. They answer 'why is this specific request slow?' A trace is a tree of spans, each representing a unit of work. Traces are sampled (1-10% of requests) because storing every trace is expensive.
Logs are discrete timestamped events such as 'User 123 logged in' or 'Payment failed: insufficient funds'. They answer 'what happened at this exact moment?' Logs carry high-cardinality detail but are often unstructured, so searching them at scale requires indexing.
The relationship: metrics tell you something is wrong (p99 latency spiked). Traces tell you where (database query slow). Logs tell you why (connection pool exhausted). Without all three, you're missing context.
package io.thecodeforge.apm;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes;

import java.util.concurrent.TimeUnit;

/**
 * Production OpenTelemetry instrumentation for a Java service.
 *
 * This adds distributed tracing so you can see exactly where latency
 * is hiding: database calls, HTTP requests, or your own code.
 */
public class OpenTelemetryInstrumentation {

    private final Tracer tracer;

    public OpenTelemetryInstrumentation(String serviceName, String otlpEndpoint) {
        // Configure the OTLP exporter: sends traces to a collector (Jaeger, Tempo, etc.)
        OtlpGrpcSpanExporter spanExporter = OtlpGrpcSpanExporter.builder()
            .setEndpoint(otlpEndpoint) // e.g., "http://jaeger-collector:4317" (4317 is the default OTLP gRPC port)
            .setTimeout(30, TimeUnit.SECONDS)
            .build();

        Resource serviceResource = Resource.getDefault().toBuilder()
            .put(ResourceAttributes.SERVICE_NAME, serviceName)
            .put(ResourceAttributes.SERVICE_VERSION, "1.2.3")
            .build();

        // BatchSpanProcessor exports asynchronously so request threads are not blocked
        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(spanExporter).build())
            .setResource(serviceResource)
            .build();

        OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
            .setTracerProvider(tracerProvider)
            .buildAndRegisterGlobal();

        this.tracer = openTelemetry.getTracer(serviceName, "1.0.0");
    }

    /**
     * Example: instrument a database query with a custom span.
     *
     * This creates a child span under the current request trace.
     * In the APM UI, you'll see exactly how long the database call took
     * and can correlate it with other spans in the same trace.
     */
    public void executeDatabaseQuery(String query) {
        Span dbSpan = tracer.spanBuilder("DB Query")
            .setAttribute("db.statement", query)
            .setAttribute("db.system", "postgresql")
            .startSpan();
        try (Scope scope = dbSpan.makeCurrent()) {
            // Execute the actual database query here
            // connection.execute(query);
            System.out.println("Executing: " + query);
        } catch (Exception e) {
            dbSpan.recordException(e);
            dbSpan.setStatus(StatusCode.ERROR, e.getMessage()); // marks the span as failed in the trace UI
            throw e;
        } finally {
            dbSpan.end(); // Duration is recorded here and becomes visible in the trace UI
        }
    }

    /**
     * Example: instrument an HTTP call to an external API.
     */
    public void callExternalApi(String url) {
        Span httpSpan = tracer.spanBuilder("HTTP " + url)
            .setAttribute("http.url", url)
            .setAttribute("http.method", "GET")
            .startSpan();
        try (Scope scope = httpSpan.makeCurrent()) {
            // Make the actual HTTP call
            // httpClient.get(url);
            System.out.println("Calling: " + url);
        } catch (Exception e) {
            httpSpan.recordException(e);
            httpSpan.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        } finally {
            httpSpan.end();
        }
    }
}
- Metrics: aggregated numbers (rate, errors, duration). Cheap to store, but lose individual request detail.
- Traces: single request journey across services. Expensive to store (sampled at 1-10%). Show exact latency breakdown.
- Logs: discrete events with high cardinality. Unstructured, need indexing for search. Best for debugging 'why' after trace identifies 'where'.
- OpenTelemetry: vendor-neutral API for generating telemetry; send to any backend (Jaeger, Prometheus, Datadog, New Relic).
- Rule: Start with RED metrics (Rate, Errors, Duration) for every service, then add traces for slow endpoints, then structured logs for errors.
The RED Method — Rate, Errors, Duration
The RED method (Rate, Errors, Duration) is the standard for service-level monitoring. For every service, track these three metrics, and you'll know instantly whether users are happy.
Rate is the number of requests per second. A sudden drop in rate (traffic falling off a cliff) often means the service is unavailable or rejecting requests. A sudden spike might indicate a DDoS attack or misconfigured client.
Errors is the proportion of requests that failed: HTTP 5xx responses, thrown exceptions, timeouts, or any response that doesn't meet your SLO. Track errors both as a raw count and as a percentage of total requests. A slow rise in error rate often indicates resource exhaustion (database connections, memory).
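Rate and error percentage are both derived from counters that only ever go up. The sketch below shows the arithmetic on two counter samples taken a known interval apart. The class and method names are hypothetical, and the reset handling is a simplified version of the idea behind Prometheus's rate(), not its exact algorithm.

```java
/** Derives request rate and error percentage from monotonically increasing counters. */
final class RedRates {

    /**
     * Per-second rate between two samples of the same counter.
     * If the later sample is smaller, the process restarted and the counter
     * began again at zero, so the later value itself is the increase.
     */
    static double ratePerSecond(long earlier, long later, double intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("intervalSeconds must be positive");
        }
        long increase = later >= earlier ? later - earlier : later;
        return increase / intervalSeconds;
    }

    /** Errors as a percentage of all requests over the same window. */
    static double errorPercent(double errorsPerSecond, double requestsPerSecond) {
        return requestsPerSecond == 0 ? 0.0 : 100.0 * errorsPerSecond / requestsPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical samples taken 60 seconds apart.
        double requests = ratePerSecond(1_000, 1_600, 60);
        double errors = ratePerSecond(40, 70, 60);
        System.out.printf("rate=%.1f req/s, errors=%.1f%%%n",
                requests, errorPercent(errors, requests));
    }
}
```

Computing the percentage from two rates over the same window keeps the number meaningful when traffic changes: ten errors per minute is alarming at 20 requests per minute and noise at 20,000.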
Duration is how long requests take, measured as latency percentiles: p50 (median), p95, p99. p99 is the headline number for user experience: 1% of requests are slower than this. Average latency hides outliers: a service could serve 1000 requests at 1ms and 1 request at 1000ms for an average of about 2ms, yet roughly 0.1% of users had a terrible experience. An outlier that rare sits above p99 as well, so keep an eye on the maximum or a very high percentile alongside p99.
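To make the average-versus-percentile point concrete, here is a small sketch on a hypothetical sample of 980 requests at 1 ms and 20 at 1000 ms (2% slow, a variation on the numbers above chosen so that p99 lands on the slow group). The class name and the nearest-rank percentile definition are illustrative choices, not something a particular APM tool prescribes.

```java
import java.util.Arrays;

/** Compares the mean with nearest-rank percentiles for a skewed latency sample. */
final class LatencyPercentiles {

    /** Arithmetic mean of the samples, in milliseconds. */
    static double mean(double[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(Double.NaN);
    }

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0 || p < 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 <= p <= 100");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Hypothetical sample: 980 fast requests (1 ms) and 20 slow ones (1000 ms). */
    static double[] skewedSample() {
        double[] samples = new double[1000];
        Arrays.fill(samples, 0, 980, 1.0);
        Arrays.fill(samples, 980, 1000, 1000.0);
        return samples;
    }

    public static void main(String[] args) {
        double[] s = skewedSample();
        System.out.printf("mean=%.2f ms, p50=%.0f ms, p99=%.0f ms%n",
                mean(s), percentile(s, 50), percentile(s, 99));
    }
}
```

On this sample the mean comes out near 21 ms, the median is 1 ms, and p99 is 1000 ms. Neither the mean nor the median hints that one request in fifty took a full second.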
Instrument duration with a histogram: bucket boundaries at 1ms, 5ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 2500ms, 5000ms, 10000ms. This gives you percentiles without storing every latency value.
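The way a percentile falls out of bucket counts can be sketched as follows: find the bucket that contains the target rank, then interpolate linearly inside it. This mirrors the general idea behind Prometheus's histogram_quantile, but it is a simplified illustration with hypothetical names; it assumes cumulative counts and ignores the +Inf bucket.

```java
/** Estimates a quantile from cumulative histogram buckets by linear interpolation. */
final class HistogramQuantile {

    /**
     * @param q                quantile in [0, 1], e.g. 0.99 for p99
     * @param upperBounds      ascending bucket upper bounds, in seconds
     * @param cumulativeCounts cumulativeCounts[i] = observations <= upperBounds[i]
     */
    static double estimate(double q, double[] upperBounds, long[] cumulativeCounts) {
        if (q < 0 || q > 1 || upperBounds.length == 0
                || upperBounds.length != cumulativeCounts.length) {
            throw new IllegalArgumentException("invalid quantile or bucket arrays");
        }
        long total = cumulativeCounts[cumulativeCounts.length - 1];
        if (total == 0) {
            return Double.NaN; // no observations yet
        }
        double rank = q * total;
        for (int i = 0; i < upperBounds.length; i++) {
            if (cumulativeCounts[i] >= rank) {
                double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
                long below = (i == 0) ? 0 : cumulativeCounts[i - 1];
                long inBucket = cumulativeCounts[i] - below;
                if (inBucket == 0) {
                    return upperBounds[i];
                }
                // Assume observations are spread evenly inside the bucket.
                return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
            }
        }
        return upperBounds[upperBounds.length - 1];
    }

    public static void main(String[] args) {
        double[] bounds = {0.1, 0.25, 0.5, 1.0};  // hypothetical buckets, seconds
        long[] counts = {900, 980, 995, 1000};    // cumulative observation counts
        System.out.printf("p50=%.3f s, p99=%.3f s%n",
                estimate(0.50, bounds, counts), estimate(0.99, bounds, counts));
    }
}
```

With these made-up counts the p99 estimate lands at roughly 0.417 s, inside the 0.25 to 0.5 s bucket. The estimate can never be more precise than the bucket it falls in, which is why bucket boundaries should be dense around the latencies you alert on.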
Common RED mistakes: measuring only average latency (hides p99 problems), not tracking errors by type (500 internal server error vs 404 not found are very different), and not breaking down rate by endpoint (a drop in /health is fine; a drop in /checkout is a crisis).
package io.thecodeforge.apm;

import io.prometheus.client.Counter;
import io.prometheus.client.Histogram;
import io.prometheus.client.exporter.HTTPServer;

import java.io.IOException;

/**
 * Production RED metrics (Rate, Errors, Duration) using Prometheus.
 *
 * These three metrics are enough to know if a service is healthy
 * from the user's perspective, without looking at CPU or memory.
 */
public class REDMetrics {

    // RATE: total requests per endpoint.
    // The counter holds the running total; rate is its increase over time.
    private static final Counter requestTotal = Counter.build()
        .name("http_requests_total")
        .labelNames("method", "endpoint", "status")
        .help("Total HTTP requests")
        .register();

    // ERRORS: error counter (a subset of requestTotal).
    // Tracked separately for easier alerting; 5xx errors can also be derived from requestTotal.
    private static final Counter errorTotal = Counter.build()
        .name("http_errors_total")
        .labelNames("method", "endpoint", "error_type")
        .help("Total HTTP errors (status >= 500 or exception)")
        .register();

    // DURATION: request latency histogram.
    // Buckets chosen to capture p50 (5-10ms), p95 (50-100ms), p99 (250-500ms).
    // Adjust buckets based on your service's typical latency.
    private static final Histogram requestDuration = Histogram.build()
        .name("http_request_duration_seconds")
        .labelNames("method", "endpoint")
        .help("HTTP request latency in seconds")
        .buckets(0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10)
        .register();

    /**
     * Record metrics for a completed request.
     * Call this in your API framework's response filter/middleware.
     */
    public static void recordRequest(String method, String endpoint, int statusCode, long durationMs) {
        String status = String.valueOf(statusCode);
        requestTotal.labels(method, endpoint, status).inc();
        if (statusCode >= 500) {
            errorTotal.labels(method, endpoint, "http_5xx").inc();
        }
        // The histogram is in seconds, so convert from milliseconds.
        requestDuration.labels(method, endpoint).observe(durationMs / 1000.0);
    }

    /**
     * Record an exception that wasn't caught by normal status code handling.
     * Avoid calling this for requests already counted as 5xx, or errors are counted twice.
     */
    public static void recordException(String method, String endpoint, String exceptionType) {
        errorTotal.labels(method, endpoint, exceptionType).inc();
    }

    /**
     * Start the Prometheus metrics endpoint on port 8081 (separate from the app port).
     * Prometheus scrapes it at its configured scrape interval (15 seconds in this setup).
     */
    public static void startMetricsServer() throws IOException {
        new HTTPServer(8081); // serves /metrics on a background daemon thread
        System.out.println("Prometheus metrics available at http://localhost:8081/metrics");
    }
}
summary quantiles if you need exact percentiles, but histograms are cheaper and recommended for production.
Distributed Tracing — Following a Request Across Services
In a monolith, you can find a slow function with a profiler. In microservices, a single request might pass through API gateway → auth service → order service → payment service → inventory service. A 2-second latency could be 100ms in each of 20 services, or 1.9 seconds in a single database query. Distributed tracing tells you which.
A trace is a tree of spans. The root span covers the entire request from client to final response. Child spans cover sub-operations: HTTP calls to downstream services, database queries, cache lookups, even internal function calls.
Key fields: trace ID (same across all spans in a request), span ID (unique per operation), parent span ID (links child to parent), name (operation name: 'GET /products', 'SELECT * FROM orders'), start and end timestamps (duration = end - start), attributes (HTTP method, status code, DB statement), events (logs within a span: 'cache miss', 'retry attempt').
Implementation: instrument your HTTP client and server libraries to automatically propagate trace context via headers (the W3C Trace Context standard: traceparent, tracestate). Use OpenTelemetry auto-instrumentation agents for Java, Python, Node.js, Go. Add manual instrumentation for business-critical spans.
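For a sense of what actually travels in the traceparent header, the sketch below parses the version-00 format defined by W3C Trace Context: four dash-separated lowercase hex fields (version, 32-character trace ID, 16-character parent span ID, flags). The class is a hypothetical stand-in for what OpenTelemetry's propagators do for you, and it simplifies the spec by rejecting anything that is not exactly four fields.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal parser for the W3C `traceparent` header (illustration only). */
final class TraceParent {
    private static final Pattern FORMAT = Pattern.compile(
            "([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final String ZERO_SPAN_ID = "0000000000000000";
    private static final String ZERO_TRACE_ID = ZERO_SPAN_ID + ZERO_SPAN_ID;

    final String traceId;       // shared by every span in the request
    final String parentSpanId;  // span ID of the caller
    final boolean sampled;      // lowest bit of the flags field

    private TraceParent(String traceId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    /** Returns empty for a missing or malformed header, so the caller starts a new trace. */
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String version = m.group(1);
        String traceId = m.group(2);
        String spanId = m.group(3);
        // The spec forbids version ff and all-zero IDs.
        if (version.equals("ff") || traceId.equals(ZERO_TRACE_ID) || spanId.equals(ZERO_SPAN_ID)) {
            return Optional.empty();
        }
        boolean sampled = (Integer.parseInt(m.group(4), 16) & 0x01) != 0;
        return Optional.of(new TraceParent(traceId, spanId, sampled));
    }

    /** Header for an outgoing call: same trace ID, this service's span as the new parent. */
    String forChild(String childSpanId) {
        return "00-" + traceId + "-" + childSpanId + "-" + (sampled ? "01" : "00");
    }

    public static void main(String[] args) {
        parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
                .ifPresent(tp -> System.out.println(tp.forChild("b7ad6b7169203331")));
    }
}
```

The trace ID stays constant across every hop while the parent span ID changes at each one. That pairing is what lets a backend stitch spans from different services into a single tree.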
Common tracing mistakes: not propagating trace context across asynchronous boundaries (message queues, background threads) — resulting in broken traces; sampling too aggressively (1% of 1% leaves 0.01% of requests traced); not storing traces long enough (7 days minimum for debugging weekly patterns); and not linking traces to logs (add trace ID to every log line).
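The first mistake, losing context at an asynchronous boundary, is easy to reproduce with nothing but the standard library. The sketch below uses a plain ThreadLocal as a stand-in for the tracing context (an assumption for illustration; OpenTelemetry's Java API keeps its own Context and offers Context.current().wrap(...) for the same purpose). All names here are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Shows a thread-bound trace ID getting lost on a worker thread, and a wrapper that carries it over. */
final class TraceContextHandoff {
    private static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    static String currentTraceId() {
        return CURRENT_TRACE_ID.get();
    }

    /** Captures the caller's trace ID now and restores it around the task on the worker thread. */
    static <T> Callable<T> wrap(Callable<T> task) {
        String captured = CURRENT_TRACE_ID.get(); // read on the submitting thread
        return () -> {
            String previous = CURRENT_TRACE_ID.get();
            CURRENT_TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Restore whatever the worker thread had before, to avoid leaking context.
                if (previous == null) {
                    CURRENT_TRACE_ID.remove();
                } else {
                    CURRENT_TRACE_ID.set(previous);
                }
            }
        };
    }

    /** Submits a task from a thread that has a trace ID and reports what the worker saw. */
    static String traceIdSeenByWorker(String traceId, boolean wrapped) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CURRENT_TRACE_ID.set(traceId);
            Callable<String> task = TraceContextHandoff::currentTraceId;
            Callable<String> toRun = wrapped ? wrap(task) : task;
            return pool.submit(toRun).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("worker task failed", e);
        } finally {
            pool.shutdownNow();
            CURRENT_TRACE_ID.remove();
        }
    }

    public static void main(String[] args) {
        String id = "4bf92f3577b34da6a3ce929d0e0e4736";
        System.out.println("unwrapped worker sees: " + traceIdSeenByWorker(id, false));
        System.out.println("wrapped worker sees:   " + traceIdSeenByWorker(id, true));
    }
}
```

The unwrapped task sees no trace ID at all, which in a real system means the worker starts a fresh trace and the original one ends abruptly. For message queues the same capture-and-restore step happens through message headers instead of a closure.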
package io.thecodeforge.apm;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import io.opentelemetry.context.propagation.TextMapGetter;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashMap;
import java.util.Map;

/**
 * Distributed tracing with context propagation across service boundaries.
 *
 * The key challenge in distributed tracing is propagating the trace context
 * from caller to callee. OpenTelemetry's propagator handles this automatically
 * when using instrumented clients. For custom protocols (message queues),
 * inject the context manually.
 */
public class DistributedTracing {

    private final Tracer tracer;
    private final OpenTelemetry openTelemetry;
    private final HttpClient httpClient;

    public DistributedTracing(OpenTelemetry openTelemetry) {
        this.openTelemetry = openTelemetry;
        this.tracer = openTelemetry.getTracer("api-service");
        this.httpClient = HttpClient.newHttpClient();
    }

    /**
     * Example: calling a downstream service with automatic trace propagation.
     *
     * When using an OpenTelemetry-instrumented HTTP client, the trace context
     * is automatically injected into the `traceparent` header.
     * The downstream service extracts it and creates a child span.
     */
    public String callOrderService(String orderId) throws Exception {
        // Start a child span for this HTTP call
        Span httpSpan = tracer.spanBuilder("HTTP POST /orders")
            .setAttribute("order.id", orderId)
            .startSpan();
        try (Scope scope = httpSpan.makeCurrent()) {
            HttpRequest.Builder builder = HttpRequest.newBuilder()
                .uri(URI.create("http://order-service/api/orders"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"id\":\"" + orderId + "\"}"));

            // With automatic instrumentation, the `traceparent` header is added for you.
            // If manual, inject into the builder before build() (a built HttpRequest is immutable):
            // openTelemetry.getPropagators().getTextMapPropagator()
            //     .inject(Context.current(), builder, (b, k, v) -> b.header(k, v));
            HttpRequest request = builder.build();

            HttpResponse<String> response =
                httpClient.send(request, HttpResponse.BodyHandlers.ofString());
            httpSpan.setAttribute("http.status_code", response.statusCode());
            return response.body();
        } catch (Exception e) {
            httpSpan.recordException(e);
            httpSpan.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        } finally {
            httpSpan.end();
        }
    }

    /**
     * Extract trace context from incoming request headers.
     * OpenTelemetry's server instrumentation does this automatically.
     *
     * This is how a service knows it's part of an existing trace
     * rather than starting a new one.
     */
    public void handleIncomingRequest(String traceparentHeader) {
        // Tells the propagator how to read headers from our carrier type
        TextMapGetter<MapHeaders> getter = new TextMapGetter<>() {
            @Override
            public String get(MapHeaders carrier, String key) {
                return carrier == null ? null : carrier.get(key);
            }

            @Override
            public Iterable<String> keys(MapHeaders carrier) {
                return carrier.keys();
            }
        };

        Context extractedContext = openTelemetry.getPropagators().getTextMapPropagator()
            .extract(Context.current(), new MapHeaders(traceparentHeader), getter);

        // Start a span as a child of the extracted context
        Span span = tracer.spanBuilder("handle request")
            .setParent(extractedContext)
            .startSpan();
        try (Scope scope = span.makeCurrent()) {
            // Process the request here
            System.out.println("Processing request with trace ID: "
                + span.getSpanContext().getTraceId());
        } finally {
            span.end();
        }
    }

    // Helper class for the header propagation example
    static class MapHeaders {
        private final Map<String, String> headers = new HashMap<>();

        MapHeaders(String traceparent) {
            headers.put("traceparent", traceparent);
        }

        String get(String key) {
            return headers.get(key);
        }

        Iterable<String> keys() {
            return headers.keySet();
        }
    }
}
traceparent header) is supported by all major tracing backends. Use it, not proprietary formats.
| Service Type | Rate (R) | Errors (E) | Duration (D) | Key Alert |
|---|---|---|---|---|
| Web API (user-facing) | Requests/sec per endpoint | HTTP 5xx rate, exception rate | p99 latency per endpoint | p99 > 500ms for 5 minutes |
| Background Worker | Jobs processed/sec | Failed job rate | Job age (time from enqueue to completion) | Job age > 5 minutes |
| Database | Queries/sec | Deadlock rate, connection errors | p99 query latency | p99 > 100ms (if indexed properly) |
| Cache (Redis, Memcached) | Operations/sec (GET, SET) | Error rate, miss rate | p99 operation latency | p99 > 5ms or miss rate > 20% |
| Message Queue (Kafka) | Messages published/sec, consumed/sec | Consumer lag (offset difference) | Produce latency, consume latency | Lag > 10,000 messages for 10 minutes |
| Third-party API | Calls/sec | HTTP 5xx, timeout rate | p99 response time | Error rate > 5% or p99 > 2 seconds |
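Most alerts in the table include a "for N minutes" clause, which exists to suppress one-off spikes. A minimal sketch of that logic, assuming a hypothetical evaluator that is fed one metric value per evaluation cycle (roughly what the `for:` clause of a Prometheus alerting rule does on the server side):

```java
import java.time.Duration;
import java.time.Instant;

/** Fires only after a metric has stayed above a threshold for a minimum duration. */
final class SustainedThresholdAlert {
    private final double threshold;
    private final Duration holdFor;
    private Instant breachStart; // null while the metric is at or below the threshold

    SustainedThresholdAlert(double threshold, Duration holdFor) {
        this.threshold = threshold;
        this.holdFor = holdFor;
    }

    /** Feeds one evaluation; returns true while the alert should be firing. */
    boolean evaluate(Instant now, double value) {
        if (value <= threshold) {
            breachStart = null; // any healthy sample resets the clock
            return false;
        }
        if (breachStart == null) {
            breachStart = now;
        }
        return Duration.between(breachStart, now).compareTo(holdFor) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical rule: p99 latency above 500 ms for 5 minutes.
        SustainedThresholdAlert alert = new SustainedThresholdAlert(500.0, Duration.ofMinutes(5));
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        double[] p99Ms = {620, 700, 650, 640, 710, 690, 300};
        for (int minute = 0; minute < p99Ms.length; minute++) {
            boolean firing = alert.evaluate(t0.plus(Duration.ofMinutes(minute)), p99Ms[minute]);
            System.out.println("minute " + minute + " p99=" + p99Ms[minute] + " firing=" + firing);
        }
    }
}
```

The hold duration trades detection speed for noise: a longer window pages less often for transient blips but delays the alert by the same amount when the problem is real.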
🎯 Key Takeaways
- APM gives you three telemetry types: metrics (aggregated numbers), traces (single request journeys), and logs (discrete events). All three are necessary for efficient debugging.
- The RED method (Rate, Errors, Duration) is the standard for service-level monitoring. Track p99 latency, not average — averages hide outliers, and outliers are what users notice.
- Distributed tracing shows you where time is spent across service boundaries. Without trace context propagation (W3C Trace-Context), each service starts a new trace and you lose the end-to-end view.
- Alert on p99 latency and error rate, not CPU usage. A 90% CPU alert fires while users are happy (pre-computed cache) and misses when a slow database query makes users wait (low CPU, high latency).
- Tail-based sampling captures all slow and failed requests without storing every successful trace. Sample 100% of errors, 100% of requests > 500ms, and 1% of normal requests.
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- Q: Explain the difference between p99 latency and average latency, and why p99 matters more for user experience. (Mid-level)
- Q: Walk me through how you would debug a sudden increase in p99 latency from 200ms to 3 seconds in a microservices architecture with 10 services. (Senior)
- Q: What is the difference between metrics, traces, and logs? Give a scenario where you need all three to debug an issue. (Mid-level)
- Q: How would you design an alerting strategy for a new microservice? What metrics would you alert on, and what thresholds would you use? (Senior)
Frequently Asked Questions
What's the difference between APM and Observability?
APM (Application Performance Monitoring) is a product category — tools like Datadog, New Relic, Dynatrace that collect metrics, traces, and logs. Observability is a property of a system: how well you can understand its internal state from its external outputs (telemetry). You achieve observability by instrumenting your code with metrics, traces, and logs. APM tools are one way to achieve observability. OpenTelemetry (vendor-neutral) is the current standard for instrumentation, replacing vendor-specific agents.
How much overhead does APM instrumentation add?
As a rough guide, OpenTelemetry tracing adds around 2-5% CPU overhead at a 1% sampling rate; measure it on your own workload before relying on that figure. Metrics histograms add negligible overhead (~0.5% CPU). Logging at INFO level adds ~1% CPU. The biggest cost is trace export (network, serialisation). Always sample traces (1-10% for high-traffic services) and use async, non-blocking span processors. For extremely latency-sensitive systems (<50µs p99), consider eBPF-based monitoring or kernel tracing instead of code instrumentation.
How long should I store metrics, traces, and logs?
Metrics: 30-90 days for aggregates, 7 days for raw data. Use downsampling: keep 1-minute resolution for 7 days, 5-minute for 30 days, 1-hour for 90 days. Traces: 7-14 days for debugging weekly patterns; errors and slow requests for 30 days. Logs: 30 days for general use, and at least 90 days where compliance applies (PCI DSS, for instance, requires a year of audit-log history with three months immediately available, so check your own obligations). Use tiered storage: hot (SSD) for 7 days, warm (SSD/HDD) for 30 days, cold (S3) for older data. The OpenTelemetry collector supports routing traces to different backends based on attributes (e.g., errors → long-term).
What is tail-based sampling and when should I use it?
Head-based sampling decides at the start of the request (e.g., random 1%). Tail-based sampling makes the decision after the request completes. The OpenTelemetry collector buffers traces for a few seconds, then decides to keep or drop based on criteria: if duration > 500ms, keep; if error occurred, keep; otherwise, sample 1%. This ensures you have traces for all slow and failed requests (the ones you actually want to debug) without storing every successful 50ms request. Use tail-based sampling for high-traffic services (> 100 req/sec) where storing 100% of traces is expensive but you need to debug rare issues. The trade-off is added latency (traces held in buffer) and collector memory usage.
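The keep-or-drop rule described above fits in a few lines. This is a sketch of the decision only, with hypothetical names and an injected random source so it can be tested. In practice the buffering and decision run inside the collector (the OpenTelemetry Collector's contrib distribution ships a tail_sampling processor), not in application code.

```java
import java.util.function.DoubleSupplier;

/** Decides, after a trace has completed, whether it is worth storing. */
final class TailSamplingPolicy {
    private final long slowThresholdMs;
    private final double baselineRate;   // fraction of ordinary traces to keep, e.g. 0.01
    private final DoubleSupplier random; // supplies values in [0, 1); injectable for tests

    TailSamplingPolicy(long slowThresholdMs, double baselineRate, DoubleSupplier random) {
        this.slowThresholdMs = slowThresholdMs;
        this.baselineRate = baselineRate;
        this.random = random;
    }

    /** Keep every failed trace, every slow trace, and a small random share of the rest. */
    boolean keep(long durationMs, boolean hadError) {
        if (hadError) {
            return true;
        }
        if (durationMs > slowThresholdMs) {
            return true;
        }
        return random.getAsDouble() < baselineRate;
    }

    public static void main(String[] args) {
        // Thresholds taken from the text: keep errors, keep > 500 ms, sample 1% of the rest.
        TailSamplingPolicy policy = new TailSamplingPolicy(500, 0.01, Math::random);
        System.out.println("failed 40 ms trace kept: " + policy.keep(40, true));
        System.out.println("slow 900 ms trace kept:  " + policy.keep(900, false));
        System.out.println("normal 50 ms trace kept: " + policy.keep(50, false));
    }
}
```

The first two lines always print true; the third prints true about one time in a hundred. Head-based sampling cannot express the first two rules at all, because duration and error status are unknown when the request starts.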
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.