Senior 10 min · May 23, 2026

Distributed Tracing in Spring Boot 3.x: Micrometer Tracing + Zipkin

Q: Is Spring Cloud Sleuth still supported for Spring Boot 2.x projects?

Spring Cloud Sleuth reached end-of-life and is no longer actively maintained. It works on Spring Boot 2.x but will not receive bug fixes or security patches. Projects still on Spring Boot 2.x should plan migration to Spring Boot 3.x and Micrometer Tracing. Spring Cloud 2022.x and later do not include Sleuth.

Q: Can I use Jaeger or Grafana Tempo instead of Zipkin?

Yes. Zipkin is one backend option. For Jaeger or Grafana Tempo, use the micrometer-tracing-bridge-otel bridge with the OTLP exporter (opentelemetry-exporter-otlp); both backends accept OTLP natively, and the dedicated OpenTelemetry Jaeger exporter has been deprecated in favor of OTLP. For cloud-native APM tools (Datadog, New Relic, Dynatrace), use their respective OTEL exporters or agents. The Micrometer Tracing API stays the same regardless of backend.

Q: How much latency does distributed tracing add to each request?

Span creation (start/end) and MDC population are cheap, typically on the order of a microsecond or less per operation. The reporter batches spans and sends them asynchronously, off the request thread, so it adds no meaningful latency to request processing under normal conditions. The measurable overhead is typically less than 0.1ms per request at 10% sampling. At 100% sampling, the overhead increases slightly because more span objects are created and exported, but it remains well under 1ms for typical applications. Treat these as rough figures and measure in your own service.

Q: What happens to traces if Zipkin is unavailable?

Spans are exported by Zipkin's asynchronous reporter (zipkin-reporter-brave), which holds them in a bounded in-memory queue. If Zipkin is unreachable, spans accumulate until the queue limit is reached (by default a cap of 10,000 queued spans; older zipkin-reporter releases also cap queued bytes at roughly 1% of heap). Once the queue is full, new spans are dropped silently: the application continues functioning normally, just without trace export. By design, Zipkin unavailability does not cause application errors, because observability is not on the critical path.

Q: How do I add custom business data to spans for easier searching in Zipkin?

Use Observation.createNotStarted("name", observationRegistry).lowCardinalityKeyValue("key", "value") for low-cardinality data (method types, currencies, status codes) and highCardinalityKeyValue for unique identifiers (orderId, userId). Alternatively, use the Tracer directly: tracer.currentSpan().tag("payment.orderId", orderId), after checking that currentSpan() is not null. Low-cardinality key-values become tags on both the metric and the span; high-cardinality key-values are added to the span only, so they never inflate metric cardinality. Whether a given tag is searchable in Zipkin depends on your storage backend configuration.

Q: How do I test that trace propagation works correctly in unit and integration tests?

Spring Boot provides @AutoConfigureObservability for integration tests. Use MockMvc or WebTestClient with B3 headers manually set to verify the service creates a child span. Check MDC values in test log output. For verifying Zipkin export, use Zipkin's Docker image in test containers or use a mock Zipkin server (e.g., WireMock recording /api/v2/spans) and verify spans are posted with the correct parent-child relationships.

Q: What is the Observation API and how does it relate to tracing?

Observation is the unified instrumentation API in Micrometer that combines metrics, tracing, and log correlation into a single abstraction. When you create an Observation, Micrometer automatically creates a timer metric (for request duration histograms), a trace span (if Micrometer Tracing is on the classpath), and populates MDC for log correlation. Spring's own framework code (MVC handlers, WebClient, Feign, Kafka) uses the Observation API internally, which is why all of these get automatic tracing and metrics without additional configuration.

Master distributed tracing in Spring Boot 3.

Naren · Founder


⚡Quick Answer

Spring Boot 3.x uses Micrometer Tracing (not Spring Cloud Sleuth) with Zipkin via micrometer-tracing-bridge-brave or otel
TraceId and spanId are propagated across Feign, WebClient, RestTemplate, and Kafka by auto-configured instrumentation; Kafka additionally needs observation enabled
Add %X{traceId} and %X{spanId} to your logback pattern to correlate logs with Zipkin traces
Set management.tracing.sampling.probability=1.0 for full sampling in dev; use 0.1 (10%) or lower in high-throughput production
Choose B3 propagation (Zipkin-native) or W3C TraceContext (the OTEL standard and the Spring Boot 3.x default) based on your ecosystem

✦ Definition

What is Distributed Tracing in Spring Boot 3.x?

Micrometer Tracing is the distributed tracing component of the Micrometer observability library, introduced as the successor to Spring Cloud Sleuth for Spring Boot 3.x. It provides a Tracer abstraction with two bridge implementations: micrometer-tracing-bridge-brave (wrapping Zipkin's Brave library) and micrometer-tracing-bridge-otel (wrapping OpenTelemetry SDK).

The bridge pattern means application code uses Micrometer's Tracer/Span API while the bridge handles serialization format, propagation headers, and exporter protocol.

A trace represents a complete request journey across all participating services, identified by a traceId (128-bit hex string). Each unit of work within the trace is a span, identified by a spanId (64-bit hex string) and linked to its parent span via a parentSpanId.

Spans carry a start time, duration, service name, operation name, status (OK or ERROR), and arbitrary key-value tags. The collection of spans sharing a traceId forms a tree that Zipkin renders as a waterfall diagram.
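
To make the ID structure concrete, here is a minimal plain-JDK sketch of how a root span and a child span relate. SpanSketch and its methods are hypothetical names invented for this illustration; they are not Micrometer or Brave classes, and real tracers carry far more state (timestamps, tags, status).

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative span model; field names mirror the prose, not any library's API. */
record SpanSketch(String traceId, String spanId, String parentSpanId, String name) {

    /** Starts a new trace: fresh 128-bit traceId, fresh 64-bit spanId, no parent. */
    static SpanSketch newRoot(String name) {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        String traceId = hex64(random.nextLong()) + hex64(random.nextLong());
        return new SpanSketch(traceId, hex64(random.nextLong()), null, name);
    }

    /** Creates a child: same traceId, new spanId, parent pointing at this span. */
    SpanSketch newChild(String childName) {
        String childId = hex64(ThreadLocalRandom.current().nextLong());
        return new SpanSketch(traceId, childId, spanId, childName);
    }

    /** Formats a long as exactly 16 lowercase hex characters (64 bits). */
    static String hex64(long value) {
        return String.format("%016x", value);
    }
}
```

Every span in the tree shares the root's traceId, and the parentSpanId links are what let a backend rebuild the waterfall.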

Context propagation is the mechanism that threads a traceId through service boundaries. When service A calls service B over HTTP, the outgoing request includes B3 headers (X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled) or W3C TraceContext headers (traceparent, tracestate).

Service B extracts these headers and creates a child span that continues the same trace. For message-based communication (Kafka, RabbitMQ), trace context is propagated via message headers using the same mechanism. Micrometer Tracing instruments all major Spring communication abstractions — WebClient, RestTemplate, Feign, Kafka — to perform this propagation automatically.
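
The inject/extract handshake can be sketched in plain Java. This is a simplified illustration of the multi-header B3 exchange described above, not the code Micrometer Tracing runs: B3MultiSketch and Context are hypothetical names, header lookup is case-sensitive here (real HTTP headers are not), and a new root is always marked sampled where a real tracer would consult its sampler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

final class B3MultiSketch {
    record Context(String traceId, String spanId, String parentSpanId, boolean sampled) {}

    /** Caller side: copy the current span's identifiers into outgoing headers. */
    static Map<String, String> inject(Context current) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", current.traceId());
        headers.put("X-B3-SpanId", current.spanId());
        if (current.parentSpanId() != null) {
            headers.put("X-B3-ParentSpanId", current.parentSpanId());
        }
        headers.put("X-B3-Sampled", current.sampled() ? "1" : "0");
        return headers;
    }

    /** Callee side: continue the caller's trace, or start a new root if headers are missing. */
    static Context extract(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        String callerSpanId = headers.get("X-B3-SpanId");
        if (traceId == null || callerSpanId == null) {
            // No usable context (for example a proxy stripped the headers): new root trace.
            return new Context(newId() + newId(), newId(), null, true);
        }
        // A missing sampled flag would normally defer to the local sampler; simplified here.
        boolean sampled = !"0".equals(headers.get("X-B3-Sampled"));
        return new Context(traceId, newId(), callerSpanId, sampled);
    }

    private static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }
}
```

The second branch of extract is the failure mode to watch for: when the headers do not arrive, nothing errors, and the request simply continues under a brand-new traceId.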

Plain-English First

Imagine a restaurant where a customer order travels through the host, waiter, kitchen, and bar before arriving at the table. Distributed tracing gives every person handling that order the same ticket number (traceId) and their own step number (spanId). Zipkin is the manager's dashboard that shows the full journey of every order — where it spent the most time, where it failed, and who handled it at each step — so you can find the bottleneck without interrogating every staff member individually.

A user reports that their checkout takes 8 seconds. Your payment microservice logs show no slow queries. Order service looks fine. Inventory service appears healthy. The request touches seven services — which one is actually slow? Without distributed tracing, answering this question means correlating timestamps across seven separate log files, praying that clocks are synchronized, and manually reconstructing the call graph. This is the problem that distributed tracing was built to solve.

Spring Cloud Sleuth was the go-to distributed tracing solution for Spring Boot 2.x. It automatically injected traceId and spanId into MDC, propagated trace context over HTTP headers, and exported spans to Zipkin with minimal configuration. But Sleuth stops at Spring Boot 2.x: it does not support the Spring Framework 6 and Jakarta EE 9+ baseline that Spring Boot 3.0 introduced. The replacement is Micrometer Tracing, a vendor-neutral tracing facade that ships as part of the Micrometer observability stack.

Micrometer Tracing follows the same philosophy as Micrometer Metrics: one API, multiple backends. You write your code against the Micrometer Tracing API, and a bridge library translates to either OpenTelemetry (OTEL) or Brave (the Zipkin tracing library). This separation means you can switch from Zipkin to Jaeger, Grafana Tempo, or Honeycomb without changing application code — only the bridge dependency changes.

Auto-instrumentation in Spring Boot 3.x is comprehensive. Feign clients, WebClient, RestTemplate, Kafka producers/consumers, Spring MVC controllers, Spring WebFlux handlers, JDBC queries (via datasource-micrometer), and scheduled tasks can all gain span creation and context propagation with little or no application code, although some of them (Kafka in particular) need a dependency or an observation flag switched on first. TraceId and spanId appear in MDC automatically, enabling log correlation without custom filters.

Sampling is the critical operational knob. Tracing every request at 1.0 probability generates enormous data volumes in production and adds CPU and network overhead from span serialization and export to the Zipkin collector. Configuring an appropriate sampling rate, and understanding tail-based vs head-based sampling, is essential knowledge for running tracing in production without impacting service performance or collector infrastructure costs.

This guide covers the complete Micrometer Tracing + Zipkin stack for Spring Boot 3.x: dependency configuration, MDC log integration, cross-service propagation mechanics, Feign and Kafka tracing, sampling strategies, Zipkin UI navigation, B3 vs W3C propagation, and the failure modes that trip up teams migrating from Sleuth.

Dependencies and Configuration: Migrating from Sleuth to Micrometer Tracing

The first step for any Spring Boot 3.x tracing setup is understanding the new dependency structure. Spring Cloud Sleuth is not compatible with Spring Boot 3.x — it depends on Spring Boot 2.x and Spring Framework 5. Any Sleuth dependency in a Spring Boot 3 project will cause classpath conflicts.

Micrometer Tracing is the direct replacement. The core tracing API is in micrometer-tracing. The backend bridge is either micrometer-tracing-bridge-brave (for Zipkin/Brave) or micrometer-tracing-bridge-otel (for OpenTelemetry). For Zipkin reporting, add zipkin-reporter-brave. Spring Boot 3.x provides auto-configuration for all of these through spring-boot-actuator — you just need the bridge and reporter on the classpath.

The spring-boot-starter-actuator must be present for the auto-configuration to activate. Spring Boot 3.x detects the tracing bridge on the classpath and automatically configures a Tracer, ObservationRegistry, and span export pipeline. The configuration properties moved from spring.sleuth.* to management.tracing.* and management.zipkin.tracing.*.

For projects using Spring Cloud's bill-of-materials, spring-cloud-starter-sleuth should be replaced with the Micrometer Tracing dependencies. The Spring Cloud 2022.x (Kilburn) and later releases do not include Sleuth; Micrometer Tracing is the standard tracing mechanism.

Micrometer Observations are the unified abstraction in Spring Boot 3.x. An Observation wraps a unit of work and automatically creates metrics (via Micrometer), traces (via Micrometer Tracing), and log correlation (via MDC). Spring's own framework instrumentation (MVC handlers, WebClient, Feign, Kafka) uses Observations internally, which is why auto-instrumentation works without code changes. Custom instrumentation should also use the Observation API rather than directly creating Spans for consistency.

Do Not Mix Sleuth and Micrometer Tracing

spring-cloud-starter-sleuth and micrometer-tracing-bridge-brave cannot coexist in the same application. If you are migrating a Spring Boot 2.x application to 3.x, remove all Sleuth dependencies first and replace them with the Micrometer Tracing equivalents.

Production Insight

The Spring Boot 3.x auto-configuration for Micrometer Tracing activates only when spring-boot-starter-actuator is present along with a tracing bridge. If traces are not appearing, check that actuator is not excluded from the classpath in production builds.

Key Takeaway

Spring Boot 3.x tracing requires micrometer-tracing-bridge-brave + zipkin-reporter-brave (or otel equivalents) alongside spring-boot-starter-actuator — remove Spring Cloud Sleuth entirely as it is incompatible with Spring Boot 3.x.

MDC Log Correlation: Linking Logs to Traces

Distributed tracing only delivers its full value when traces are correlated with logs. A Zipkin trace tells you which service was slow; the correlated logs tell you exactly what was happening inside that service at that moment. Micrometer Tracing automatically populates MDC (Mapped Diagnostic Context) with traceId and spanId so any SLF4J-compatible logger can include them in log output.

The default MDC keys in Spring Boot 3.x with Micrometer Tracing are traceId (the 128-bit or 64-bit trace ID in hex) and spanId (the current span ID). Additional fields such as the parent span ID or the sampled flag are not added by default and need extra correlation configuration. These keys replace the X-B3-TraceId, X-B3-SpanId, and X-B3-ParentSpanId MDC keys used by older Sleuth versions.

Configuring the log pattern is straightforward in Spring Boot: set logging.pattern.level or use a logback-spring.xml with %X{traceId} in the pattern. The key insight is that the MDC is populated for the duration of a span — if your code runs outside a span context (e.g., in a background thread that was not explicitly propagated), MDC values will be empty.

Thread propagation is the main MDC pitfall. When using @Async methods, CompletableFuture, or custom ExecutorService, MDC is not automatically copied to child threads. For @Async, ContextPropagatingTaskDecorator (part of Spring Framework 6.1+, built on Micrometer's context-propagation library) wraps the task to copy the tracing context (and therefore MDC values) to the async thread. Configure this on your async executor.
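
The mechanism behind a context-propagating task decorator can be shown with nothing but the JDK. The sketch below uses a plain ThreadLocal as a stand-in for the MDC; TraceContextSketch, propagating, and observedOnWorker are hypothetical names for illustration and are not how Spring's decorator is implemented internally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TraceContextSketch {
    /** Stand-in for the MDC: one value per thread. */
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Captures the caller's value now and restores it around the task on the worker thread. */
    static Runnable propagating(Runnable task) {
        String captured = TRACE_ID.get();            // evaluated on the submitting thread
        return () -> {
            String previous = TRACE_ID.get();        // evaluated on the worker thread
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Restore the worker's earlier state so pooled threads do not leak context.
                if (previous == null) TRACE_ID.remove(); else TRACE_ID.set(previous);
            }
        };
    }

    /** Runs a task on a fresh single-thread pool and returns the trace ID the task observed. */
    static String observedOnWorker(boolean decorate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String[] seen = new String[1];
            Runnable task = () -> seen[0] = TRACE_ID.get();
            pool.submit(decorate ? propagating(task) : task).get();
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With TRACE_ID set on the calling thread, observedOnWorker(false) returns null because the worker never saw the value, while observedOnWorker(true) returns the caller's ID. That difference is exactly the empty-traceId symptom in async logs.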

For reactive WebFlux applications, MDC does not work with the traditional thread-local model because reactive chains may switch threads multiple times. Micrometer Tracing with WebFlux carries the trace in Reactor's Context instead. With automatic context propagation enabled (Hooks.enableAutomaticContextPropagation(), or spring.reactor.context-propagation=auto in Spring Boot 3.2+), the Reactor Context is restored into thread-locals, and therefore into MDC, when operators run. Custom operators or blocking calls that escape the reactive context will still not have MDC populated: log statements inside Mono/Flux operators will have correct traceId values, while log statements in manually spawned threads will not.

MDC Is Thread-Local: Async Threads Lose Trace Context

Code running in @Async methods, CompletableFuture.runAsync(), or custom thread pools will have empty MDC (no traceId in logs) unless ContextPropagatingTaskDecorator is configured on the executor. This is one of the most common log correlation failures.

Production Insight

In JSON-structured logging (Logstash encoder), MDC fields are automatically included in every log event as top-level JSON fields. This makes querying logs by traceId in Elasticsearch/Kibana or CloudWatch Logs Insights trivial: just filter on traceId = '0123abc...' to find all log events for a specific trace.

Key Takeaway

MDC traceId/spanId correlation bridges logs and traces — but it only works within the span's thread context. Always configure ContextPropagatingTaskDecorator for async executors and verify MDC propagation for each async mechanism your service uses.

Cross-Service Propagation: Feign, WebClient, and Kafka

Micrometer Tracing's value comes from seamless context propagation across service boundaries. The mechanism is different for each transport, and understanding how it works for each helps diagnose propagation failures.

For HTTP clients, Spring's Observation-based instrumentation injects B3 or W3C headers into outgoing requests, and the server-side filters extract them from incoming requests. RestTemplate and WebClient are instrumented through an ObservationRegistry that Spring Boot 3.x sets on them when they are created via the auto-configured RestTemplateBuilder or WebClient.Builder. For Feign clients, spring-cloud-starter-openfeign registers its Micrometer observation capability when the feign-micrometer dependency and Micrometer Tracing are on the classpath.

For Kafka, tracing requires spring-kafka 3.0+ with observation explicitly enabled. The KafkaTemplate is instrumented to add trace headers (b3 or traceparent) to ProducerRecords. On the consumer side, the KafkaListenerContainerFactory must have observation enabled so it extracts trace headers from ConsumerRecords and creates child spans. This creates a trace graph that spans across Kafka topics, showing the producer send as the parent of the consumer process.

For RabbitMQ (Spring AMQP), the RabbitTemplate and SimpleMessageListenerContainer are similarly instrumented via Micrometer Observations in Spring AMQP 3.0+. The same enablement pattern applies.

A common scenario is a gateway or API proxy that strips custom headers. If your Nginx, Envoy, or AWS ALB strips X-B3-* headers, downstream services receive requests without trace context and create new root traces. Configure your proxies to allow-list B3 or W3C propagation headers. For Envoy-based service meshes (Istio), Envoy propagates B3 headers but only to the next hop — your application code must still propagate them from incoming request to outgoing calls. Micrometer Tracing handles this automatically for instrumented clients.

Never Use new RestTemplate() in Spring Boot 3.x

Creating RestTemplate with new RestTemplate() bypasses Spring Boot's auto-configuration and will not have the Micrometer Tracing interceptor. Always inject RestTemplateBuilder and call builder.build() to get a properly instrumented instance.

Production Insight

Test end-to-end trace propagation through each transport type (HTTP, Kafka, RabbitMQ) by making a single request that traverses all of them, then looking up the traceId in Zipkin. If the same request shows up as several traces with different traceIds, you have a propagation gap at the transport boundary where the ID changes.

Key Takeaway

Context propagation is automatic for RestTemplateBuilder-created clients, WebClient.Builder-created clients, Feign, and Kafka (with observation-enabled=true) — but using new RestTemplate() or un-configured Kafka factories silently breaks propagation.

Sampling Strategies: Balancing Observability and Overhead

Sampling determines what fraction of requests generate traces. At management.tracing.sampling.probability=1.0, every request is traced. At 0.1, one in ten requests is traced. At 0.01, one in one hundred. The probability must balance observability needs against three costs: CPU overhead of span creation, network overhead of exporting spans to Zipkin, and storage costs in the Zipkin backend.

Head-based sampling (what Micrometer Tracing implements by default) makes the sampling decision at the trace root — typically the first service to receive a request. This decision is propagated downstream via the X-B3-Sampled header (B3) or the sampled flag in the traceparent header (W3C). All downstream services honor the sampling decision from the root, ensuring complete traces are either fully sampled or fully dropped — never partial.
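
The head-based rule (decide once at the root, honor the decision everywhere else) fits in a few lines. This is an illustrative sketch: HeadSamplerSketch is a hypothetical name, and deriving the root decision deterministically from the trace ID is an assumption made here for testability, not necessarily what the configured bridge does.

```java
final class HeadSamplerSketch {
    /**
     * @param incomingSampled upstream flag ("1" or "0"), or null when this service is the trace root
     * @param traceId         16 or 32 lowercase hex characters
     * @param probability     local sampling probability between 0.0 and 1.0
     */
    static boolean isSampled(String incomingSampled, String traceId, double probability) {
        if (incomingSampled != null) {
            // Downstream service: the root already decided, so local configuration is ignored.
            return "1".equals(incomingSampled);
        }
        // Root: map the low 64 bits of the trace ID onto [0, 1) and compare to the probability.
        long low = Long.parseUnsignedLong(traceId.substring(traceId.length() - 16), 16);
        double bucket = (low >>> 11) / (double) (1L << 53);
        return bucket < probability;
    }
}
```

Note the first branch: a service configured with probability 1.0 still exports nothing for a request that arrives with a sampled flag of 0.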

The right sampling rate depends on request volume. For a service handling 1,000 RPS, a 10% sampling rate gives 100 traces per second, which is plenty for debugging. At 10,000 RPS, 1% yields the same 100 traces per second, and anything higher may start to stress your Zipkin collector. At 100 RPS, consider 100% sampling for complete visibility. Rule of thumb: size the rate by absolute volume, targeting 10-100 traces per second entering your Zipkin collector from the fleet as a whole.

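Custom sampler beans give finer control. You can implement a Sampler that samples 100% of requests containing an X-Debug-Trace header (for on-demand tracing of specific users), 100% of requests on critical routes, and 5% of normal requests. Note that a head-based sampler decides before the request runs, so it cannot select requests that will later fail; guaranteed capture of error traces requires tail-based sampling in a collector that sees finished traces. A rule-based head sampler is still a common production pattern: full visibility where it matters with reduced load from healthy request traffic.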
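
A rule-based decision of this kind might look like the sketch below. It is plain Java rather than a Micrometer or Brave Sampler implementation; RuleSamplerSketch is a hypothetical class, and X-Debug-Trace is the illustrative header from the text, not a standard one. It only uses attributes known when the request arrives, since the outcome of the request is not available at that point.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

final class RuleSamplerSketch {
    private final List<String> alwaysSampledPrefixes;
    private final double defaultProbability;

    RuleSamplerSketch(List<String> alwaysSampledPrefixes, double defaultProbability) {
        this.alwaysSampledPrefixes = List.copyOf(alwaysSampledPrefixes);
        this.defaultProbability = defaultProbability;
    }

    /** Decides at the start of a request, using only the path and headers. */
    boolean shouldSample(String path, Map<String, String> headers) {
        if (headers.containsKey("X-Debug-Trace")) {
            return true;                              // on-demand tracing for a specific caller
        }
        for (String prefix : alwaysSampledPrefixes) {
            if (path.startsWith(prefix)) {
                return true;                          // critical routes are always traced
            }
        }
        // Everything else falls back to plain probability sampling.
        return ThreadLocalRandom.current().nextDouble() < defaultProbability;
    }
}
```

For example, new RuleSamplerSketch(List.of("/payments", "/admin"), 0.05) traces every payment and admin request and roughly one in twenty of the rest.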

For latency-sensitive services, measure tracing overhead empirically. Span creation is cheap, typically a microsecond or less. The cost comes from span export: the asynchronous reporter in zipkin-reporter-brave buffers spans and sends them in batches off the request thread, adding minimal latency to request processing. If the Zipkin collector is unavailable and the buffer fills, the reporter drops spans rather than blocking, which is the correct behavior for a non-critical observability path.
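
The drop-rather-than-block behavior comes down to a non-blocking offer on a bounded queue. The sketch below illustrates the idea with JDK classes only; DroppingReporterSketch is a hypothetical name and this is not the zipkin-reporter implementation, which also bounds by bytes and handles encoding, retries, and flushing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class DroppingReporterSketch {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingReporterSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called on the request thread: never blocks, and drops the span when the queue is full. */
    void report(String encodedSpan) {
        if (!queue.offer(encodedSpan)) {
            dropped.incrementAndGet();
        }
    }

    /** Called by a background sender: drains up to batchSize spans for one export call. */
    List<String> nextBatch(int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

A dropped-span counter like this is worth exporting as a metric, since it is the only signal that traces are being lost while the collector is down.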

Rate-limiting samplers provide a fixed number of traces per second regardless of traffic volume. Spring Boot exposes no property for one, but Brave ships brave.sampler.RateLimitingSampler, and one is easy to implement otherwise. At 1,000 RPS, a 10-traces-per-second rate limiter provides 1% sampling naturally; at 100 RPS, it provides 10%. This is more predictable than probability-based sampling for resource planning.
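
A minimal fixed-window version of the idea is sketched below. RateLimitingSamplerSketch is a hypothetical class written for this illustration; production implementations typically smooth the budget across the second instead of resetting a single counter, and the injectable clock exists only to make the behavior easy to verify.

```java
import java.util.function.LongSupplier;

final class RateLimitingSamplerSketch {
    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private final int tracesPerSecond;
    private final LongSupplier nanoClock;
    private long windowStart;
    private int usedInWindow;

    RateLimitingSamplerSketch(int tracesPerSecond, LongSupplier nanoClock) {
        this.tracesPerSecond = tracesPerSecond;
        this.nanoClock = nanoClock;
        this.windowStart = nanoClock.getAsLong();
    }

    /** Returns true for the first tracesPerSecond calls in each one-second window. */
    synchronized boolean trySample() {
        long now = nanoClock.getAsLong();
        if (now - windowStart >= ONE_SECOND_NANOS) {
            windowStart = now;      // start a new window and reset the budget
            usedInWindow = 0;
        }
        if (usedInWindow < tracesPerSecond) {
            usedInWindow++;
            return true;
        }
        return false;
    }
}
```

In real use the clock would be System::nanoTime, for example new RateLimitingSamplerSketch(10, System::nanoTime).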

Default Sampling Rate Is 10% in Spring Boot 3.x

Spring Boot 3.x defaults to 10% sampling (management.tracing.sampling.probability=0.1). In a low-traffic dev environment, you may trigger requests 5 times and never see a trace in Zipkin. Set 1.0 in dev environments.

Production Insight

Use 100% sampling for batch jobs, critical payment flows, and admin operations regardless of the global rate. Implement a request-aware sampler (for example a Brave SamplerFunction when using the Brave bridge) that checks the request path or a custom header to bump specific routes to 1.0 probability.

Key Takeaway

Head-based sampling ensures complete traces are either fully captured or fully dropped, never partial. Set 100% in dev, tune production sampling based on target traces-per-second for your Zipkin infrastructure, and implement custom samplers for always-on tracing of critical routes; always-on error capture needs tail-based sampling.

B3 vs W3C Propagation: Choosing the Right Header Format

Trace context propagation headers carry the traceId, spanId, and sampling decision across service boundaries. Two competing standards exist: B3 (Zipkin's original format) and W3C TraceContext (a W3C Recommendation, adopted by OpenTelemetry). Choosing the wrong one for your ecosystem causes disconnected traces at service boundaries.

B3 propagation uses multiple headers: X-B3-TraceId (32 hex chars for 128-bit, or 16 for 64-bit), X-B3-SpanId (16 hex chars), X-B3-ParentSpanId (16 hex chars), and X-B3-Sampled (1 or 0). B3 also has a single-header variant: b3: {traceId}-{spanId}-{samplingState}-{parentSpanId}. B3 is the native format of the Brave library, but Spring Boot 3.x does not select it by default: set management.tracing.propagation.type=b3 to use it with micrometer-tracing-bridge-brave.
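
The single-header variant is easy to format and parse by hand, which helps when reading raw headers in a proxy log. The sketch below is illustrative only: B3SingleSketch is a hypothetical class, it assumes the parent ID appears only after a sampling state, and it does not handle the sampling-only forms of the header.

```java
final class B3SingleSketch {
    /** samplingState and parentSpanId are optional and may be null. */
    record Context(String traceId, String spanId, String samplingState, String parentSpanId) {}

    /** Formats {traceId}-{spanId}[-{samplingState}[-{parentSpanId}]]. */
    static String format(Context c) {
        StringBuilder sb = new StringBuilder(c.traceId()).append('-').append(c.spanId());
        if (c.samplingState() != null) {
            sb.append('-').append(c.samplingState());
            if (c.parentSpanId() != null) {
                sb.append('-').append(c.parentSpanId());
            }
        }
        return sb.toString();
    }

    /** Parses a b3 header value, rejecting IDs of the wrong length or alphabet. */
    static Context parse(String header) {
        String[] parts = header.split("-");
        if (parts.length < 2 || parts.length > 4) {
            throw new IllegalArgumentException("expected 2 to 4 fields: " + header);
        }
        if (!parts[0].matches("[0-9a-f]{16}|[0-9a-f]{32}") || !parts[1].matches("[0-9a-f]{16}")) {
            throw new IllegalArgumentException("malformed trace or span ID: " + header);
        }
        String samplingState = parts.length > 2 ? parts[2] : null;
        String parentSpanId = parts.length > 3 ? parts[3] : null;
        return new Context(parts[0], parts[1], samplingState, parentSpanId);
    }
}
```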

W3C TraceContext uses two headers: traceparent (format: 00-{traceId}-{spanId}-{flags}) and tracestate (vendor-specific additional state). The traceparent header is standardized by the W3C and is the default for OpenTelemetry. In Spring Boot 3.x, W3C is the default propagation type for both micrometer-tracing-bridge-otel and micrometer-tracing-bridge-brave.

The practical implication: if every service is a Spring Boot 3.x application on default settings, they all speak W3C and nothing needs configuring. If some services emit B3 (Sleuth-era Spring Boot 2.x applications, Zipkin-instrumented services, or Envoy sidecars) while others use an OpenTelemetry SDK (perhaps a Node.js or Python service instrumented with OTEL) and emit W3C headers, a receiving service continues the trace only if it is configured to understand the format it is sent.

You can configure Micrometer Tracing to support both B3 and W3C simultaneously using a composite propagation format. This is the most flexible configuration for polyglot environments. Incoming requests are checked for both header formats; outgoing requests send both.
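
Conceptually, composite extraction just tries each supported format in turn. The sketch below shows that idea for a W3C traceparent header and the multi-header B3 form; CompositeExtractorSketch and Incoming are hypothetical names, the order of precedence is an assumption made for the example, and the real propagators validate considerably more.

```java
import java.util.Map;
import java.util.Optional;

final class CompositeExtractorSketch {
    /** What the receiving service needs in order to continue a trace. */
    record Incoming(String traceId, String parentSpanId, boolean sampled, String format) {}

    /** Tries W3C first, then B3 multi-header; empty means "start a new root trace". */
    static Optional<Incoming> extract(Map<String, String> headers) {
        Optional<Incoming> w3c = fromTraceparent(headers.get("traceparent"));
        return w3c.isPresent() ? w3c : fromB3Multi(headers);
    }

    static Optional<Incoming> fromTraceparent(String value) {
        // Version 00 layout: 00-{32 hex traceId}-{16 hex spanId}-{2 hex flags}
        if (value == null || !value.matches("00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}")) {
            return Optional.empty();
        }
        String[] parts = value.split("-");
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) == 1;  // lowest flag bit
        return Optional.of(new Incoming(parts[1], parts[2], sampled, "W3C"));
    }

    static Optional<Incoming> fromB3Multi(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        if (traceId == null || spanId == null) {
            return Optional.empty();
        }
        // A missing flag would normally defer to the local sampler; simplified here.
        boolean sampled = !"0".equals(headers.get("X-B3-Sampled"));
        return Optional.of(new Incoming(traceId, spanId, sampled, "B3_MULTI"));
    }
}
```

When neither format is present the result is empty, which is the case where a service silently starts a new root trace.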

Istio/Envoy service meshes use B3 headers by default for their distributed tracing integration. If you are using Istio and your Spring services propagate only W3C headers, the spans reported by the Envoy sidecars will not join your application traces. Align your propagation format with your service mesh's configuration.

Mismatched Propagation Formats Silently Break Trace Correlation

A service that only reads B3 will not extract trace context from W3C traceparent headers, and vice versa. The receiving service creates a new root trace, breaking the end-to-end trace chain. Nothing fails visibly in Zipkin: you just see two separate traces for what should be one request.

Production Insight

In a Kubernetes environment with Istio, set management.tracing.propagation.consume=b3,b3_multi,w3c and management.tracing.propagation.produce=b3_multi,w3c on all services (in Spring Boot's property values, b3_multi is the X-B3-* multi-header form and b3 is the single b3 header). This handles both Istio's B3 headers and any OTEL-instrumented services emitting W3C headers without needing to coordinate a fleet-wide propagation format migration.

Key Takeaway

B3 is Zipkin/Brave native; W3C is the OpenTelemetry default and a W3C standard. Use the composite consume+produce configuration in polyglot environments to accept both formats, and align with your service mesh's tracing configuration.

Zipkin UI Navigation

Zipkin's UI is the primary tool for trace investigation, but its features are not always obvious. Knowing how to navigate it effectively shortens the time needed to root-cause a production issue.

The search page (/) accepts service name, span name, remote service name, annotation, tag key-value, duration (min/max), and time range. The most powerful filter combination for debugging is: service=payment-service, minDuration=1000ms (finding all traces slower than 1 second). This immediately surfaces the slow outliers without scrolling through hundreds of normal traces.

The trace detail page shows a waterfall diagram. The total trace duration is the time from the earliest span's start to the latest span's end. Each bar represents a span. The colors indicate the service (Zipkin assigns colors consistently per service name). A wide gap between the start of a client span and the start of the matching server span represents network or queueing latency on the way to the child service. A long parent bar with only a short child span means the parent spent most of its time in its own code or in un-instrumented calls, not in the child: look at what happened before the child started and between the child returning and the parent span ending.
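
The two numbers worth reading off a waterfall, total trace duration and the time a parent spent outside its children, are simple arithmetic over span start times and durations. The sketch below uses a hypothetical WaterfallSketch class with microsecond timestamps, and the self-time calculation assumes the direct children do not overlap each other.

```java
import java.util.List;

final class WaterfallSketch {
    record Span(String id, String parentId, long startMicros, long durationMicros) {
        long endMicros() {
            return startMicros + durationMicros;
        }
    }

    /** Earliest span start to latest span end. */
    static long traceDurationMicros(List<Span> spans) {
        long first = spans.stream().mapToLong(Span::startMicros).min().orElse(0);
        long last = spans.stream().mapToLong(Span::endMicros).max().orElse(0);
        return last - first;
    }

    /** Parent duration minus time covered by its direct children (assumed non-overlapping). */
    static long selfTimeMicros(Span parent, List<Span> spans) {
        long inChildren = spans.stream()
                .filter(s -> parent.id().equals(s.parentId()))
                .mapToLong(Span::durationMicros)
                .sum();
        return parent.durationMicros() - inChildren;
    }
}
```

An 8-second root span with a single 300ms child has 7.7 seconds of self time, which points the investigation at the parent's own code or at calls it makes without instrumentation.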

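Annotations (small dots on spans) represent timestamped events. cs (client send), sr (server receive), ss (server send), and cr (client receive) are the classic Zipkin annotations for an RPC; newer span formats convey the same information through the span kind (CLIENT or SERVER), timestamp, and duration. The gap between cs and sr is network latency; the gap between sr and ss is the server processing time; the gap between ss and cr is network latency back. Because cs/cr and sr/ss are recorded on different hosts, the network figures are only as accurate as the clock synchronization between them.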
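
As a worked example of that arithmetic, the sketch below breaks one RPC into its three segments. RpcTimelineSketch is a hypothetical record and the timestamps are made-up microsecond values, not measurements from a real trace.

```java
/** Four RPC timestamps in microseconds: client send, server receive, server send, client receive. */
record RpcTimelineSketch(long cs, long sr, long ss, long cr) {

    /** Time on the wire (or queued) before the server started work. */
    long requestNetworkMicros() {
        return sr - cs;
    }

    /** Time the server spent handling the request. */
    long serverMicros() {
        return ss - sr;
    }

    /** Time for the response to travel back to the client. */
    long responseNetworkMicros() {
        return cr - ss;
    }

    /** Total latency as the client experienced it. */
    long clientObservedMicros() {
        return cr - cs;
    }
}
```

For new RpcTimelineSketch(1_000, 1_400, 9_400, 9_900) the split is 400µs out, 8,000µs on the server, and 500µs back, which adds up to the 8,900µs the client observed.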

The dependency graph (/dependency) shows the service-to-service call graph derived from trace data, with call counts and error rates on each edge. This is invaluable for identifying which upstream service most frequently calls the slow one, helping prioritize optimization work.

For production debugging, combine Zipkin search with log correlation: find a slow trace in Zipkin, copy the traceId, then query your log aggregator (Kibana, CloudWatch Logs Insights, Loki) for traceId='...' to see every log statement associated with that specific request across all services. This provides the full picture: the timing from Zipkin and the context from logs.

High-Cardinality Tags Can Overwhelm Zipkin Storage

Tags like userId, orderId, or sessionId are high-cardinality (many unique values). Adding them to every span can exhaust Zipkin's storage and slow down search queries. Add them with highCardinalityKeyValue, which keeps them on spans and out of metrics, only where you actually need to search by them, and configure Zipkin's storage backend (Elasticsearch/Cassandra) with appropriate retention policies.

Production Insight

Zipkin has no built-in alerting, so set up an external check that fires when error-tagged traces exceed a threshold. If you are not using a commercial APM tool, query Zipkin's API on a schedule (for example hourly) for traces carrying an error tag and push the counts to your alerting system, or run the equivalent query in your log aggregator.

Key Takeaway

Zipkin's minDuration filter + error annotation search are your two most powerful debugging tools. Correlate Zipkin traceId with log aggregator queries to combine timing data from Zipkin with contextual data from logs for complete root-cause analysis.

● Production incident · Post-mortem · Severity: high

The Invisible Span: Kafka Consumer Missing from Zipkin Traces

Symptom

In Zipkin, the order-service trace ended at the Kafka producer send. A completely separate trace appeared for fulfillment-service processing the same order message, with no parent-child relationship between them. The 4-second gap was invisible in Zipkin's dependency graph, making it impossible to trace an end-to-end order flow.

Assumption

The team assumed Micrometer Tracing automatically propagated trace context to Kafka messages the same way it did for HTTP calls, since Kafka support was listed in the documentation.

Root cause

The Kafka consumer was using a manually created @KafkaListener with a KafkaListenerContainerFactory that was not configured with the Micrometer observation integration. The spring-kafka version in use (2.9.x) predates Micrometer Observation support, which arrived in spring-kafka 3.0 and is disabled by default; it has to be switched on explicitly with containerFactory.getContainerProperties().setObservationEnabled(true). Without this, the consumer created new root spans instead of child spans of the producer's trace.

Fix

Upgraded to spring-kafka 3.0+, which supports Micrometer Observation, and set observationEnabled=true on the KafkaListenerContainerFactory bean. Also set spring.kafka.listener.observation-enabled=true in application.yml for auto-configured factories, and spring.kafka.template.observation-enabled=true so the producer side injects the trace headers. After the fix, Zipkin showed the complete order flow from HTTP ingress through Kafka to fulfillment processing as a single trace.

Key lesson

Kafka tracing requires explicit opt-in: observation is disabled by default and must be enabled on both the template and the listener container factory.
Always verify end-to-end trace propagation through each transport type (HTTP, Kafka, RabbitMQ) in a staging environment after initial setup — missing spans appear as disconnected traces, not errors.

Production debug guide: symptom → root cause → fix (6 entries)

Symptom · 01

No traces appear in Zipkin UI despite application receiving requests

→

Fix

First verify Zipkin is reachable from the application: check management.zipkin.tracing.endpoint is correctly set (default: http://localhost:9411/api/v2/spans). Confirm management.tracing.sampling.probability is greater than 0.0 — the default in Spring Boot 3.x is 0.1 (10% sampling), so low-traffic testing may result in no sampled requests. Set to 1.0 for debugging. Enable DEBUG logging for io.micrometer.tracing to see span creation and export events. Verify the micrometer-tracing-bridge-brave and zipkin-reporter-brave dependencies are both on the classpath.

Symptom · 02

traceId appears in logs but not in Zipkin for the same request

→

Fix

First rule out sampling: unsampled requests still get a traceId in logs but are never exported. If the request was sampled, the trace is being created locally but not exported, which usually means the Zipkin reporter is failing silently. Enable DEBUG for zipkin2.reporter to see export failures. Check that the application host can reach Zipkin, for example with curl http://zipkin:9411/health. Verify there are no firewall rules blocking port 9411. Check if the Spring Boot app is being shut down before the async reporter flushes: the reporter batches spans and may lose the last batch on ungraceful shutdown, so enable graceful shutdown and configure spring.lifecycle.timeout-per-shutdown-phase to allow flush time.

Symptom · 03

Spans from different services appear in Zipkin but not connected in the same trace

→

Fix

Context propagation is failing between services. Verify both services use the same propagation format: Spring Boot 3.x defaults to W3C for both bridges, while Sleuth-era services and many proxies emit B3. If formats differ, configure one explicitly, for example management.tracing.propagation.type=b3 on both. For Feign clients, verify that feign-micrometer is on the classpath alongside spring-cloud-starter-openfeign. For Kafka, ensure spring.kafka.listener.observation-enabled=true and spring.kafka.template.observation-enabled=true. To test propagation manually, send a request with fixed X-B3-TraceId and X-B3-SpanId headers (or a fixed traceparent header) using curl -H and check whether downstream services report spans under that trace ID.

Symptom · 04

MDC traceId is null in log output even though tracing is configured

→

Fix

The logging pattern must include %X{traceId} and %X{spanId}. Verify your logback-spring.xml or logback.xml uses these MDC keys. In Spring Boot 3.x with Micrometer Tracing, the default MDC keys are traceId and spanId; check your pattern uses exactly these names. Logback's AsyncAppender captures the MDC when the log event is created, so it is rarely the culprit; an empty traceId usually means the log statement ran on a thread that never received the tracing context (@Async, CompletableFuture, custom executors), which is fixed with a context-propagating task decorator. For reactive WebFlux applications, MDC does not propagate automatically through reactive chains; enable Reactor's automatic context propagation (Hooks.enableAutomaticContextPropagation()) or use Reactor's contextWrite and MDC.put patterns.

Symptom · 05

Sampling rate set to 1.0 but only some requests appear in Zipkin

→

Fix

If an upstream service (API gateway, load balancer) sends X-B3-Sampled: 0, or a W3C traceparent whose flags field is 00, the downstream service honors that decision and does not export the span, regardless of its local sampling probability. This is correct behavior: the sampling decision is made at the trace root and propagated. Check your API gateway's sampling configuration first. If you need to override upstream decisions (not recommended), you can implement a custom Sampler bean that ignores the incoming flag. Be aware that this depends on the bridge: Brave consults its sampler only when no decision arrives, so there the flag may have to be dropped at ingress instead.
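The decision order described above can be sketched in a few lines of plain Java. This is a simplified illustration with hypothetical names, not Brave's or OpenTelemetry's sampler; the bucket-based probability check is an assumption chosen to keep the example deterministic.

```java
// Illustrative only: a simplified parent-based decision, not Brave's or OpenTelemetry's sampler.
class SamplingDecision {

    /**
     * @param incomingSampled value of an upstream sampled flag ("1" or "0"), or null at the trace root
     * @param probability     local sampling probability between 0.0 and 1.0
     * @param traceId         trace ID used to make the local decision deterministic per trace
     */
    static boolean shouldExport(String incomingSampled, double probability, long traceId) {
        if (incomingSampled != null) {
            // An upstream decision exists, so the local probability is never consulted.
            return "1".equals(incomingSampled);
        }
        // Root of the trace: map the trace ID onto 10,000 buckets and keep the configured share.
        long bucket = Long.remainderUnsigned(traceId, 10_000L);
        return bucket < Math.round(probability * 10_000);
    }

    public static void main(String[] args) {
        long traceId = 0x463ac35c9f6413adL;
        System.out.println(shouldExport("0", 1.0, traceId));  // upstream said no
        System.out.println(shouldExport(null, 1.0, traceId)); // local root decision
    }
}
```

With probability 1.0 the first call still returns false, because the upstream flag wins; only the second call, which has no incoming flag, uses the local setting.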

Symptom · 06

Zipkin shows extremely high latency for a service but logs show fast response times

→

Fix

Check for clock skew between the services. Zipkin's waterfall view positions spans by their start timestamps, so it depends on accurate clock synchronization (NTP). A service whose clock runs 500 ms ahead has its spans drawn 500 ms later than they really started, which reads as 500 ms of extra latency between caller and callee even though each span's own duration is measured locally. Verify NTP configuration on all nodes. Also check whether the Zipkin span covers database or external call time that the application logs do not include: the span's start and end may correctly represent wall-clock time, including connection pool wait time, that never appears in application-level log statements.
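A small worked example of the skew arithmetic, in plain Java with made-up timestamps (all values and names are hypothetical). It shows how a 10 ms network delay is rendered as a 510 ms gap when the callee's clock runs 500 ms ahead, while the callee's locally measured span duration stays the same.

```java
// Illustrative arithmetic only; all timestamps below are made-up values in milliseconds.
class ClockSkewExample {

    // Offset at which a child span is drawn relative to its parent's start in a waterfall view.
    static long apparentGapMillis(long parentStartOnCallerClock, long childStartOnCalleeClock) {
        return childStartOnCalleeClock - parentStartOnCallerClock;
    }

    public static void main(String[] args) {
        long parentStart = 1_000;     // read from the caller's clock
        long trueNetworkDelay = 10;   // real time before the callee starts work
        long calleeClockSkew = 500;   // callee's clock runs 500 ms ahead

        // Start timestamp as the callee records it, skew included.
        long childStart = parentStart + trueNetworkDelay + calleeClockSkew;
        long childDuration = 20;      // measured locally on the callee, so unaffected by skew

        System.out.println("apparent gap: " + apparentGapMillis(parentStart, childStart) + " ms");
        System.out.println("child duration: " + childDuration + " ms");
    }
}
```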

★ Debug Cheat Sheet

Shell commands for diagnosing Micrometer Tracing and Zipkin issues

Traces not reaching Zipkin

Immediate action

Test Zipkin API reachability and verify app exports spans

Commands

curl -s http://zipkin:9411/api/v2/services | jq .

curl -s -X POST http://zipkin:9411/api/v2/spans -H 'Content-Type: application/json' -d '[{"traceId":"aabbccdd","id":"aabbccdd","name":"test","timestamp":1716400000000000,"duration":1000}]'

Fix now

Set management.tracing.sampling.probability=1.0 and management.zipkin.tracing.endpoint=http://zipkin:9411/api/v2/spans in application.yml

Tracing Solution Comparison: Spring Boot 2.x vs 3.x

Aspect	Spring Cloud Sleuth (Boot 2.x)	Micrometer Tracing (Boot 3.x)
Dependency	spring-cloud-starter-sleuth	micrometer-tracing-bridge-brave + zipkin-reporter-brave
Config prefix	spring.sleuth.*	management.tracing.* + management.zipkin.tracing.*
Bootstrap	Auto (spring-cloud)	Auto (spring-boot-actuator + bridge on classpath)
Feign support	Auto with spring-cloud-openfeign	Auto with spring-cloud-openfeign 4.x (requires feign-micrometer on the classpath)
Kafka tracing	Auto with spring-kafka	Requires spring.kafka.listener.observation-enabled=true and spring.kafka.template.observation-enabled=true
MDC keys	X-B3-TraceId, X-B3-SpanId	traceId, spanId
Default sampling	10% (0.1)	10% (0.1)
Backend options	Zipkin, Jaeger (via reporter)	Any (Zipkin, Jaeger, OTEL Collector, Grafana Tempo)
API	Brave Tracer directly	Micrometer Observation + Tracer abstraction
Spring Boot 3.x compatible	No	Yes

Key takeaways

Spring Boot 3.x uses Micrometer Tracing (micrometer-tracing-bridge-brave + zipkin-reporter-brave), not Spring Cloud Sleuth

remove Sleuth entirely as it is incompatible with Spring Boot 3.x

TraceId/spanId propagate automatically across Feign, WebClient.Builder, and RestTemplateBuilder

but only if using the Spring-configured builder instances, not manually constructed clients

Kafka tracing requires explicit opt-in

spring.kafka.listener.observation-enabled=true and spring.kafka.template.observation-enabled=true — without these, Kafka traces appear as disconnected root traces

MDC traceId/spanId correlation enables log-to-trace linking but is thread-local

configure ContextPropagatingTaskDecorator on async executors to propagate context to background threads

Default sampling is 10% in both Sleuth and Micrometer Tracing

set 1.0 in dev, tune production to 1-10% based on target trace volume, and implement custom samplers for always-on error tracing

B3 and W3C are mutually incompatible propagation formats

use composite consume=[B3, W3C] configuration in polyglot environments to prevent silent trace correlation breaks at service boundaries

Common mistakes to avoid

7 patterns

Using spring-cloud-starter-sleuth with Spring Boot 3.x

Symptom

Application fails to start with classpath conflicts, NoSuchMethodError, or IncompatibleClassChangeError related to Spring Framework 6 / Jakarta EE 10 changes

Fix

Remove spring-cloud-starter-sleuth entirely and add micrometer-tracing-bridge-brave + zipkin-reporter-brave instead

Using new RestTemplate() instead of RestTemplateBuilder.build()

Symptom

HTTP calls between services do not propagate traceId — downstream service starts a new root trace instead of a child span

Fix

Always inject RestTemplateBuilder and call builder.build(). The builder is auto-configured with Micrometer Tracing interceptors

Not enabling Kafka listener observation

Symptom

Kafka consumer spans appear as disconnected root traces in Zipkin instead of child spans of the producer's trace

Fix

Set spring.kafka.listener.observation-enabled=true and spring.kafka.template.observation-enabled=true in application.yml

Leaving sampling.probability=1.0 in production

Symptom

Zipkin collector overwhelmed with span data, high memory usage on application pods from span buffering, Zipkin search slows to minutes per query

Fix

Set management.tracing.sampling.probability=0.05 to 0.1 in production; use custom sampler to always-sample errors and debug requests

Mixing B3 and W3C propagation formats across services

Symptom

End-to-end traces are broken at service boundaries between services using different propagation formats — two separate traces appear in Zipkin for one request

Fix

Standardize on one format fleet-wide, or use management.tracing.propagation.consume=[B3, W3C] and produce=[B3] to accept both and emit B3

Not copying MDC to async threads

Symptom

Log statements inside @Async methods or CompletableFuture chains show empty traceId fields, breaking log correlation for async operations

Fix

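Configure ContextPropagatingTaskDecorator on every ThreadPoolTaskExecutor, including the one that backs @Async. A raw ExecutorService does not accept a TaskDecorator, so wrap it instead, for example with ContextExecutorService from Micrometer's context-propagation library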
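The mechanism behind this fix can be sketched without any Spring dependency. In the example below a ThreadLocal stands in for the MDC and a hypothetical decorate() method mimics the idea of a context-propagating decorator: capture the caller's context when the task is submitted, restore it on the worker thread, and clean up afterwards. It is a simplified illustration, not Spring's or Micrometer's implementation.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: a ThreadLocal stands in for the MDC, and decorate() mimics the idea
// behind a context-propagating task decorator. It is not Spring's implementation.
class ContextCopyExample {

    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Capture the caller's value now, restore it on whichever thread runs the task.
    static Runnable decorate(Runnable task) {
        String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Put back whatever the worker thread had, so pooled threads do not leak context.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    // Runs a task on a worker thread and reports the trace ID that thread observed.
    static String traceIdSeenByWorker(boolean decorated) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String[] seen = new String[1];
            Runnable task = () -> seen[0] = TRACE_ID.get();
            pool.submit(decorated ? decorate(task) : task).get();
            return seen[0];
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        TRACE_ID.set("65a1f0c2d4b3e987");
        System.out.println("plain task:     " + traceIdSeenByWorker(false));
        System.out.println("decorated task: " + traceIdSeenByWorker(true));
    }
}
```

The plain task sees no trace ID because thread-local state stays on the submitting thread; the decorated task sees the caller's value. The same reasoning explains why manually created executors need the decorator while Spring-managed, decorated ones do not.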

Querying Zipkin with wrong service name casing

Symptom

Zipkin search returns no results for a service that is definitely producing traces

Fix

Zipkin service names are derived from spring.application.name, and Zipkin lowercases them on ingestion, so a service configured as Order-Service is stored and must be searched as order-service. Use GET /api/v2/services to see the exact registered service names

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR · Why was Spring Cloud Sleuth deprecated and what replaces it in Spring Bo...

Q02 · SENIOR · Explain how traceId propagates from a Spring MVC service to a downstream...

Q03 · SENIOR · What is the difference between head-based and tail-based sampling, and w...

Q04 · SENIOR · How do you propagate trace context to Kafka consumers in Spring Boot 3.x...

Q05 · SENIOR · A request generates a traceId that appears in logs but never shows up in...

Q06 · JUNIOR · What MDC keys does Micrometer Tracing populate, and how do you include t...

Q07 · SENIOR · How do B3 and W3C TraceContext propagation formats differ, and when woul...

Q08 · SENIOR · How do you ensure trace context propagates correctly in a reactive WebFl...

Q09 · SENIOR · What is the recommended sampling rate for a high-throughput production s...

Q01 of 09 · JUNIOR

Why was Spring Cloud Sleuth deprecated and what replaces it in Spring Boot 3.x?

ANSWER

Spring Cloud Sleuth was built on Spring Boot 2.x and Spring Framework 5 internals and is not compatible with the Jakarta EE 10 / Spring Framework 6 foundation of Spring Boot 3.x. Micrometer Tracing is the replacement: it is part of the Micrometer observability stack and provides a vendor-neutral Tracer API with bridge implementations for both Brave (Zipkin) and OpenTelemetry. The configuration prefix moved from spring.sleuth.* to management.tracing.*, and the MDC keys changed from X-B3-TraceId and X-B3-SpanId to traceId and spanId.

FAQ · 7 QUESTIONS

Frequently Asked Questions

Is Spring Cloud Sleuth still supported for Spring Boot 2.x projects?

Can I use Jaeger or Grafana Tempo instead of Zipkin?

How much latency does distributed tracing add to each request?

What happens to traces if Zipkin is unavailable?

How do I add custom business data to spans for easier searching in Zipkin?

How do I test that trace propagation works correctly in unit and integration tests?

What is the Observation API and how does it relate to tracing?
